In this part of creating your own programming language, we will implement classes as an extension over the previously defined structures. Please check out the previous parts:
The full source code is available
In the first section, we will cover lexical analysis. In short, it’s a process to divide the source code into language lexemes, such as keyword, variable, operator, etc.
You may remember from the previous parts that I was using the regex expressions in the TokenType enum to define all the lexeme types.
Let’s look at the following class prototype and add the missing regex parts to our TokenType
lexemes:
class
word in the Keyword
lexeme's expression to let the lexical analyzer know where our class declaration begins:package org.example.toylanguage.token;
...
public enum TokenType {
...
Keyword("(if|elif|else|end|print|input|fun|return|loop|in|by|break|next|class)(?=\\s|$)"),
...
private final String regex;
}
2) Second, we need the new This
lexeme as a marker for a reference to the current object:
public enum TokenType {
...
This("(this)(?=,|\\s|$)");
private final String regex;
}
In the second section, we will transform the lexemes received from the lexical analyzer into the final statements following our language rules.
When we declare a class or a function, this declaration should be available within the defined isolated bounds. For example, if we declare a function named turn_on []
in the following listing, it will be available for execution after declaration:
But if we declare the same function inside a class scope, this function will no longer be accessed directly from the main block:
DefinitionScope
class and store all the declared definitions inside two sets for classes and functions:package org.example.toylanguage.context.definition;
public class DefinitionScope {
private final Set<ClassDefinition> classes;
private final Set<FunctionDefinition> functions;
public DefinitionScope() {
this.classes = new HashSet<>();
this.functions = new HashSet<>();
}
}
2) In addition, we may want to access the parent’s definition scope. For example, if we declare two separate classes and create an instance of the first class inside the second one:
To provide this ability, we will add the DefinitionScope
parent instance as a reference to the upper layer, which we will use to climb to the top definition layer.
public class DefinitionScope {
private final Set<ClassDefinition> classes;
private final Set<FunctionDefinition> functions;
private final DefinitionScope parent;
public DefinitionScope(DefinitionScope parent) {
this.classes = new HashSet<>();
this.functions = new HashSet<>();
this.parent = parent;
}
}
3) Now, let’s finish the implementation by providing the interfaces to add a definition and retrieve it by name using parent scope:
public class DefinitionScope {
…
public ClassDefinition getClass(String name) {
Optional<ClassDefinition> classDefinition = classes.stream()
.filter(t -> t.getName().equals(name))
.findAny();
if (classDefinition.isPresent())
return classDefinition.get();
else if (parent != null)
return parent.getClass(name);
else
throw new ExecutionException(String.format("Class is not defined: %s", name));
}
public void addClass(ClassDefinition classDefinition) {
classes.add(classDefinition);
}
public FunctionDefinition getFunction(String name) {
Optional<FunctionDefinition> functionDefinition = functions.stream()
.filter(t -> t.getName().equals(name))
.findAny();
if (functionDefinition.isPresent())
return functionDefinition.get();
else if (parent != null)
return parent.getFunction(name);
else
throw new ExecutionException(String.format("Function is not defined: %s", name));
}
public void addFunction(FunctionDefinition functionDefinition) {
functions.add(functionDefinition);
}
}
4) Finally, to manage declared definition scopes and switch between them, we create the context class using java.util.Stack
collection (LIFO):
package org.example.toylanguage.context.definition;
public class DefinitionContext {
private final static Stack<DefinitionScope> scopes = new Stack<>();
public static DefinitionScope getScope() {
return scopes.peek();
}
public static DefinitionScope newScope() {
return new DefinitionScope(scopes.isEmpty() ? null : scopes.peek());
}
public static void pushScope(DefinitionScope scope) {
scopes.push(scope);
}
public static void endScope() {
scopes.pop();
}
}
In this section, we will cover the MemoryScope
to manage class and function variables.
But if we declare a variable within a function or a class, the variable will no longer be available from the main (upper) block of code:
To implement this logic and store variables defined in a specific scope, we create the MemoryScope
class that will contain a map with the variable name as a key and the variable Value
as a value:
public class MemoryScope {
private final Map<String, Value<?>> variables;
public MemoryScope() {
this.variables = new HashMap<>();
}
}
2) Next, similarly to the DefinitionScope
, we provide access to the parent’s scope variables:
public class MemoryScope {
private final Map<String, Value<?>> variables;
private final MemoryScope parent;
public MemoryScope(MemoryScope parent) {
this.variables = new HashMap<>();
this.parent = parent;
}
}
3) Next, we add methods to get and set variables. When we set a variable, we always re-assign the previously set value if there is already a defined variable in the upper MemoryScope
layer:
public class MemoryScope {
...
public Value<?> get(String name) {
Value<?> value = variables.get(name);
if (value != null)
return value;
else if (parent != null)
return parent.get(name);
else
return NullValue.NULL_INSTANCE;
}
public void set(String name, Value<?> value) {
MemoryScope variableScope = findScope(name);
if (variableScope == null) {
variables.put(name, value);
} else {
variableScope.variables.put(name, value);
}
}
private MemoryScope findScope(String name) {
if (variables.containsKey(name))
return this;
return parent == null ? null : parent.findScope(name);
}
}
4) In addition to the set
and get
methods, we add two more implementations to interact with the current (local) layer of MemoryScope
:
public class MemoryScope {
...
public Value<?> getLocal(String name) {
return variables.get(name);
}
public void setLocal(String name, Value<?> value) {
variables.put(name, value);
}
}
These methods will be used later to initialize function arguments or class’s instance arguments. For example, if we create an instance of the Lamp
class and pass the predefined global type
variable, this variable shouldn’t be changed when we try to update the lamp_instance :: type
property:
5) Finally, to manage variables and switch between memory scopes, we create the MemoryContext
implementation using java.util.Stack
collection:
package org.example.toylanguage.context;
public class MemoryContext {
private static final Stack<MemoryScope> scopes = new Stack<>();
public static MemoryScope getScope() {
return scopes.peek();
}
public static MemoryScope newScope() {
return new MemoryScope(scopes.isEmpty() ? null : scopes.peek());
}
public static void pushScope(MemoryScope scope) {
scopes.push(scope);
}
public static void endScope() {
scopes.pop();
}
}
In this section, we will read and store class definitions.
package org.example.toylanguage.statement;
public class ClassStatement {
}
2) Each class can contain nested statements for the initialization and other operations, like a constructor. To store these statements, we extend the CompositeStatement which contains the list of nested statements to execute:
public class ClassStatement extends CompositeStatement {
}
3) Next, we declare the ClassDefinition
to store the class name, its arguments, constructor statements, and the definition scope with the class’s functions:
package org.example.toylanguage.context.definition;
import java.util.List;
@RequiredArgsConstructor
@Getter
@EqualsAndHashCode(onlyExplicitlyIncluded = true)
public class ClassDefinition implements Definition {
@EqualsAndHashCode.Include
private final String name;
private final List<String> arguments;
private final ClassStatement statement;
private final DefinitionScope definitionScope;
}
4) Now, we’re ready to read the class declaration using StatementParser. When we meet the Keyword lexeme with class
value, first we need to read the class name and its arguments inside the square brackets:
package org.example.toylanguage;
public class StatementParser {
...
private void parseClassDefinition() {
Token type = tokens.next(TokenType.Variable);
List<String> arguments = new ArrayList<>();
if (tokens.peek(TokenType.GroupDivider, "[")) {
tokens.next(TokenType.GroupDivider, "["); //skip opening square bracket
while (!tokens.peek(TokenType.GroupDivider, "]")) {
Token argumentToken = tokens.next(TokenType.Variable);
arguments.add(argumentToken.getValue());
if (tokens.peek(TokenType.GroupDivider, ","))
tokens.next();
}
tokens.next(TokenType.GroupDivider, "]"); //skip closing square bracket
}
}
}
5) After the arguments, we expect to read the nested constructor statements:
To store these statements, we create an instance of the previously defined ClassStatement
:
private void parseClassDefinition() {
...
ClassStatement classStatement = new ClassStatement();
}
6) In addition to arguments and nested statements, our classes can also contain functions. To make these functions accessible only within the class definition, we initialize a new layer ofDefinitionScope
:
private void parseClassDefinition() {
...
ClassStatement classStatement = new ClassStatement();
DefinitionScope classScope = DefinitionContext.newScope();
}
7) Next, we initialize an instance of ClassDefinition
and put it to the DefinitionContext
:
private void parseClassDefinition() {
...
ClassStatement classStatement = new ClassStatement();
DefinitionScope classScope = DefinitionContext.newScope();
ClassDefinition classDefinition = new ClassDefinition(type.getValue(), arguments, classStatement, classScope);
DefinitionContext.getScope().addClass(classDefinition);
}
8) Finally, to read the class constructor statements and functions, we call the static StatementParser#parse() method that will collect statements inside classStatement
instance until we meet the finalizing end
lexeme that must be skipped at the end:
private void parseClassDefinition() {
...
//parse class statements
StatementParser.parse(this, classStatement, classScope);
tokens.next(TokenType.Keyword, "end");
}
At this point, we can already read class definitions with constructor statements and functions. Now, let’s parse the class’s instance:
ClassValue
, which will contain the state of each class instance. Classes, unlike functions, should have a permanent MemoryScope
and this scope should be available with all the class's instance arguments and state variables every time we interact with the class instance:public class ClassValue extends IterableValue<ClassDefinition> {
private final MemoryScope memoryScope;
public ClassValue(ClassDefinition definition, MemoryScope memoryScope) {
super(definition);
this.memoryScope = memoryScope;
}
}
2) Next, we provide methods to work with the class’s instance properties using MemoryContext
:
public class ClassValue extends IterableValue<ClassDefinition> {
private final MemoryScope memoryScope;
public ClassValue(ClassDefinition definition, MemoryScope memoryScope) {
super(definition);
this.memoryScope = memoryScope;
}
@Override
public String toString() {
return getValue().getArguments().stream()
.map(t -> t + " = " + getValue(t))
.collect(Collectors.joining(", ", getValue().getName() + " [ ", " ]"));
}
public Value<?> getValue(String name) {
Value<?> result = MemoryContext.getScope().getLocal(name);
return result != null ? result : NULL_INSTANCE;
}
public void setValue(String name, Value<?> value) {
MemoryContext.getScope().setLocal(name, value);
}
}
3) Please notice that by calling theMemoryScope#getLocal()
and MemoryScope#setLocal()
methods, we work with the current layer of MemoryScope
variables. But before accessing the class’s instance state, we need to put its MemoryScope
to the MemoryContext
and release it when we finish:
public class ClassValue extends IterableValue<ClassDefinition> {
...
public Value<?> getValue(String name) {
MemoryContext.pushScope(memoryScope);
try {
Value<?> result = MemoryContext.getScope().getLocal(name);
return result != null ? result : NULL_INSTANCE;
} finally {
MemoryContext.endScope();
}
}
public void setValue(String name, Value<?> value) {
MemoryContext.pushScope(memoryScope);
try {
MemoryContext.getScope().setLocal(name, value);
} finally {
MemoryContext.endScope();
}
}
}
4) Next, we can implement the remaining ClassExpression
that will be used to construct defined class instances during syntax analysis. To declare class’s instance definition, we provide the ClassDefinition
and list of Expression
arguments that will be transformed into the final Value
instances during statement execution:
package org.example.toylanguage.expression;
@RequiredArgsConstructor
@Getter
public class ClassExpression implements Expression {
private final ClassDefinition definition;
private final List<Expression> arguments;
@Override
public Value<?> evaluate() {
...
}
}
5) Let's implement the Expression#evaluate()
method that will be used during execution to create an instance of the previously defined ClassValue
. First, we evaluate the Expression
arguments into the Value
arguments:
@Override
public Value<?> evaluate() {
//initialize class arguments
List<Value<?>> values = arguments.stream()
.map(Expression::evaluate)
.collect(Collectors.toList());
}
6) Next, we create an empty memory scope that should be isolated from the other variables and can contain only the class's instance state variables:
@Override
public Value<?> evaluate() {
//initialize class arguments
List<Value<?>> values = arguments.stream().map(Expression::evaluate).collect(Collectors.toList());
//get class's definition and statement
ClassStatement classStatement = definition.getStatement();
//set separate scope
MemoryScope classScope = new MemoryScope(null);
MemoryContext.pushScope(classScope);
try {
...
} finally {
MemoryContext.endScope();
}
}
7) Next, we create an instance of ClassValue
and write the class’s Value
arguments to the isolated memory scope:
try {
//initialize constructor arguments
ClassValue classValue = new ClassValue(definition, classScope);
IntStream.range(0, definition.getArguments().size()).boxed()
.forEach(i -> MemoryContext.getScope()
.setLocal(definition.getArguments().get(i), values.size() > i ? values.get(i) : NullValue.NULL_INSTANCE));
...
} finally {
MemoryContext.endScope();
}
Please notice that we transformed the Expression
arguments into the Value
arguments before setting up an empty MemoryScope
. Otherwise, we will not be able to access the class’s instance arguments, for example:
8) And finally, we can execute the ClassStatement
. But before that, we should set the class’s DefinitionScope
to be able to access the class’s functions within constructor statements:
try {
//initialize constructor arguments
ClassValue classValue = new ClassValue(definition, classScope);
ClassInstanceContext.pushValue(classValue);
IntStream.range(0, definition.getArguments().size()).boxed()
.forEach(i -> MemoryContext.getScope()
.setLocal(definition.getArguments().get(i), values.size() > i ? values.get(i) : NullValue.NULL_INSTANCE));
//execute function body
DefinitionContext.pushScope(definition.getDefinitionScope());
try {
classStatement.execute();
} finally {
DefinitionContext.endScope();
}
return classValue;
} finally {
MemoryContext.endScope();
ClassInstanceContext.popValue();
}
9*) One last thing, we can make the classes more flexible and allow a user to create class instances before declaring the class definition:
This can be done by delegating the ClassDefinition
initialization to the DefinitionContext
and access it only when we evaluate an expression:
public class ClassExpression implements Expression {
private final String name;
private final List<Expression> arguments;
@Override
public Value<?> evaluate() {
//get class's definition and statement
ClassDefinition definition = DefinitionContext.getScope().getClass(name);
...
}
}
You can do the same delegation for the FunctionExpression to invoke functions before definition.
10) Finally, we can finish reading the class instances with the ExpressionReader
. There is no difference between the previously defined structure instances. We just need to read the Expression
arguments and construct the ClassExpression
:
public class ExpressionReader
...
// read class instance: new Class[arguments]
private ClassExpression readClassInstance(Token token) {
List<Expression> arguments = new ArrayList<>();
if (tokens.peekSameLine(TokenType.GroupDivider, "[")) {
tokens.next(TokenType.GroupDivider, "["); //skip opening square bracket
while (!tokens.peekSameLine(TokenType.GroupDivider, "]")) {
Expression value = ExpressionReader.readExpression(this);
arguments.add(value);
if (tokens.peekSameLine(TokenType.GroupDivider, ","))
tokens.next();
}
tokens.next(TokenType.GroupDivider, "]"); //skip closing square bracket
}
return new ClassExpression(token.getValue(), arguments);
}
}
At this moment, we can create a class and execute the class's constructor statements. But we are still unable to execute the class's functions.
FunctionExpression#evaluate
method that will accept the ClassValue
as a reference to the class instance we want to invoke a function from:package org.example.toylanguage.expression;
public class FunctionExpression implements Expression {
...
public Value<?> evaluate(ClassValue classValue) {
}
}
2) The next step is to transform the function Expression
arguments into the Value
arguments using current MemoryScope
:
public Value<?> evaluate(ClassValue classValue) {
//initialize function arguments
List<Value<?>> values = argumentExpressions.stream()
.map(Expression::evaluate)
.collect(Collectors.toList());
}
3) Next, we need to pass the class's MemoryScope
and DefinitionScope
to the context:
...
//get definition and memory scopes from class definition
ClassDefinition classDefinition = classValue.getValue();
DefinitionScope classDefinitionScope = classDefinition.getDefinitionScope();
//set class's definition and memory scopes
DefinitionContext.pushScope(classDefinitionScope);
MemoryContext.pushScope(memoryScope);
4) Lastly for this implementation, we invoke default FunctionExpression#evaluate(List<Value<?>> values) method and pass evaluated Value
arguments:
public Value<?> evaluate(ClassValue classValue) {
//initialize function arguments
List<Value<?>> values = argumentExpressions.stream().map(Expression::evaluate).collect(Collectors.toList());
//get definition and memory scopes from class definition
ClassDefinition classDefinition = classValue.getValue();
DefinitionScope classDefinitionScope = classDefinition.getDefinitionScope();
MemoryScope memoryScope = classValue.getMemoryScope();
//set class's definition and memory scopes
DefinitionContext.pushScope(classDefinitionScope);
MemoryContext.pushScope(memoryScope);
try {
//proceed function
return evaluate(values);
} finally {
DefinitionContext.endScope();
MemoryContext.endScope();
}
}
5) To invoke the class’s function, we will use the double colon ::
operator. Currently, this operator is managed by the ClassPropertyOperator
(StructureValueOperator) implementation, which is responsible for accessing the class’s properties:
Let’s improve it to support function invocations with the same double colon ::
character:
Class’s function can be managed by this operator only when the left expression is the ClassExpression
and the second one is the FunctionExpression
:
package org.example.toylanguage.expression.operator;
public class ClassPropertyOperator extends BinaryOperatorExpression implements AssignExpression {
public ClassPropertyOperator(Expression left, Expression right) {
super(left, right);
}
@Override
public Value<?> evaluate() {
Value<?> left = getLeft().evaluate();
if (left instanceof ClassValue) {
if (getRight() instanceof VariableExpression) {
// access class's property
// new ClassInstance[] :: class_argument
return ((ClassValue) left).getValue(((VariableExpression) getRight()).getName());
} else if (getRight() instanceof FunctionExpression) {
// execute class's function
// new ClassInstance[] :: class_function []
return ((FunctionExpression) getRight()).evaluate((ClassValue) left);
}
}
throw new ExecutionException(String.format("Unable to access class's property `%s``", getRight()));
}
@Override
public void assign(Value<?> value) {
Value<?> left = getLeft().evaluate();
if (left instanceof ClassValue && getRight() instanceof VariableExpression) {
String propertyName = ((VariableExpression) getRight()).getName();
((ClassValue) left).setValue(propertyName, value);
}
}
}
In this part, we implemented classes. We can now create more complex things, such as stack implementation:
main []
fun main []
stack = new Stack []
loop num in 0..5
# push 0,1,2,3,4
stack :: push [ num ]
end
size = stack :: size []
loop i in 0..size
# should print 4,3,2,1,0
print stack :: pop []
end
end
class Stack []
arr = {}
n = 0
fun push [ item ]
this :: arr << item
n = n + 1
end
fun pop []
n = n - 1
item = arr { n }
arr { n } = null
return item
end
fun size []
return this :: n
end
end