In this part of creating your own programming language, we’ll continue improving our language by implementing nested classes and slightly upgrading the classes introduced in the previous part. Please check out the previous parts:
The full source code is available over on GitHub.
In the first section, we will cover lexical analysis. In short, it’s a process to divide the source code into language lexemes, such as keyword, variable, operator, etc.
You may remember from the previous parts that I was using the regex expressions listed in the TokenType enum to define all the lexeme types:
package org.example.toylanguage.token;
public enum TokenType {
Comment("\\#.*"),
LineBreak("[\\n\\r]"),
Whitespace("[\\s\\t]"),
Keyword("(if|elif|else|end|print|input|class|fun|return|loop|in|by|break|next)(?=\\s|$)"),
GroupDivider("(\\[|\\]|\\,|\\{|}|[.]{2})"),
Logical("(true|false)(?=\\s|$)"),
Numeric("([-]?(?=[.]?[0-9])[0-9]*(?![.]{2})[.]?[0-9]*)"),
Null("(null)(?=,|\\s|$)"),
This("(this)(?=,|\\s|$)"),
Text("\"([^\"]*)\""),
Operator("(\\+|-|\\*|/{1,2}|%|>=|>|<=|<{1,2}|={1,2}|!=|!|:{2}|\\(|\\)|(new|and|or)(?=\\s|$))"),
Variable("[a-zA-Z_]+[a-zA-Z0-9_]*");
private final String regex;
}
Let’s look into this nested classes prototype, and add the missing regex expressions into our TokenType
lexemes:
When we create an instance of a nested class, we use the following construction with two expressions and one operator between them:
The left expression class_instance
is an instance of the class we are referring to get a nested class from. The right expression NestedClass [args]
is a nested class with the properties to create an instance.
Lastly, as an operator to create a nested class, I’ll be using the following expression: :: new
, meaning that we refer to the class instance property with the two colon ::
operator, and then we create an instance with the new
operator.
With the current set of lexemes, we only need to add a regular expression for the :: new
operator. This operator can be validated by the following regex expression:
:{2}\\s+new
Let’s add this expression in the Operator
lexeme as OR expression before the :{2}
part standing for accessing class property:
public enum TokenType {
...
Operator("(\\+|-|\\*|/{1,2}|%|>=|>|<=|<{1,2}|={1,2}|!=|!|:{2}\\s+new|:{2}|\\(|\\)|(new|and|or)(?=\\s|$))"),
...
}
In the second section, we will convert lexemes received from the lexical analyzer into the final statements following our language rules.
To evaluate math expressions, we’re using Dijkstra's Two-Stack algorithm. Each operation in this algorithm can be presented by a unary operator with one operand or by a binary operator with respectively two operands:
Nested class’s instantiation is a binary operation where the left operand is a class instance we use to refer to the class where the nested class is defined, and the second operand is a nested class we create an instance of:
Let’s create the NestedClassInstanceOperator
implementation by extending
package org.example.toylanguage.expression.operator;
public class NestedClassInstanceOperator extends BinaryOperatorExpression {
public NestedClassInstanceOperator(Expression left, Expression right) {
super(left, right);
}
@Override
public Value<?> evaluate() {
...
}
}
Next, we should complete the evaluate()
method that will perform nested class instantiation:
First, we evaluate the left operand’s expression into the Value
expression:
@Override
public Value<?> evaluate() {
// ClassExpression -> ClassValue
Value<?> left = getLeft().evaluate();
}
Next, we need to evaluate()
the right operand. In this case, we can't directly invoke Expression#evaluate()
because the nested classes' definitions are declared in the parent class's DefinitionScope (in the left operand).
To access the definitions of nested classes, we should create an auxiliary ClassExpression#evaluate(ClassValue)
method that will take the left operand and use its DefinitionScope to access the nested class definition and create an instance of:
@Override
public Value<?> evaluate() {
Value<?> left = getLeft().evaluate();
if (left instanceof ClassValue && getRight() instanceof ClassExpression) {
// instantiate nested class
// new Class [] :: new NestedClass []
return ((ClassExpression) getRight()).evaluate((ClassValue) left);
} else {
throw new ExecutionException(String.format("Unable to access class's nested class `%s``", getRight()));
}
}
Lastly, let’s implement the missing ClassExpression#evaluate(ClassValue)
method.
This implementation will be similar to the ClassExpression#evaluate()
method with the only difference being that we should set the ClassDefinition#getDefinitionScope()
in order to retrieve nested class definitions:
package org.example.toylanguage.expression;
…
public class ClassExpression implements Expression {
private final String name;
private final List<Expression> argumentExpressions;
@Override
public Value<?> evaluate() {
//initialize class arguments
List<Value<?>> values = argumentExpressions.stream().map(Expression::evaluate).collect(Collectors.toList());
return evaluate(values);
}
/**
* Evaluate nested class
*
* @param classValue instance of the parent class
*/
public Value<?> evaluate(ClassValue classValue) {
//initialize class arguments
List<Value<?>> values = argumentExpressions.stream().map(Expression::evaluate).collect(Collectors.toList());
//set parent class's definition
ClassDefinition classDefinition = classValue.getValue();
DefinitionContext.pushScope(classDefinition.getDefinitionScope());
try {
return evaluate(values);
} finally {
DefinitionContext.endScope();
}
}
private Value<?> evaluate(List<Value<?>> values) {
//get class's definition and statement
ClassDefinition definition = DefinitionContext.getScope().getClass(name);
ClassStatement classStatement = definition.getStatement();
//set separate scope
MemoryScope classScope = new MemoryScope(null);
MemoryContext.pushScope(classScope);
try {
//initialize constructor arguments
ClassValue classValue = new ClassValue(definition, classScope);
ClassInstanceContext.pushValue(classValue);
IntStream.range(0, definition.getArguments().size()).boxed()
.forEach(i -> MemoryContext.getScope()
.setLocal(definition.getArguments().get(i), values.size() > i ? values.get(i) : NullValue.NULL_INSTANCE));
//execute function body
DefinitionContext.pushScope(definition.getDefinitionScope());
try {
classStatement.execute();
} finally {
DefinitionContext.endScope();
}
return classValue;
} finally {
MemoryContext.endScope();
ClassInstanceContext.popValue();
}
}
}
All the operators we’re using to evaluate math expressions are stored in the Operator enum with the corresponding precedence, character, and OperatorExpression
type that we refer to calculate the result of each operation:
...
public enum Operator {
Not("!", NotOperator.class, 7),
ClassInstance("new", ClassInstanceOperator.class, 7),
...
private final String character;
private final Class<? extends OperatorExpression> type;
private final Integer precedence;
Operator(String character, Class<? extends OperatorExpression> type, Integer precedence) {
this.character = character;
this.type = type;
this.precedence = precedence;
}
public static Operator getType(String character) {
return Arrays.stream(values())
.filter(t -> Objects.equals(t.getCharacter(), character))
.findAny().orElse(null);
}
...
}
We already have the ClassInstance
value for a regular class’s initialization. Let’s add a new value to manage nested class instances.
The new NestedClassInstance
value will have the same character expression as we defined in the TokenType earlier and the same precedence as the regular class's instance.
For the OperatorExpression type, we will use the previously defined NestedClassInstanceOperator
:
...
public enum Operator {
Not("!", NotOperator.class, 7),
ClassInstance("new", ClassInstanceOperator.class, 7),
NestedClassInstance(":{2}\\s+new", NestedClassInstanceOperator.class, 7),
...
}
You may notice that we don’t have regex expressions in the character property except for this new operator. To read the NestedClassInstance
operator using a regex expression, we should update the Operator#getType()
method to match an operator with a regular expression:
public enum Operator {
...
public static Operator getType(String character) {
return Arrays.stream(values())
.filter(t -> character.matches(t.getCharacter()))
.findAny().orElse(null);
}
...
}
Lastly, we should add two backslashes \\
before a character for operations containing the following symbols: +, *, (, )
to make sure these characters are not treated as regex search symbols:
Multiplication("\\*", MultiplicationOperator.class, 6),
Addition("\\+", AdditionOperator.class, 5),
LeftParen("\\(", 3),
RightParen("\\)", 3),
After we introduced the NestedClassInstance
operator, we should inject it into the ExpressionReader class that actually parses math expressions into operands and operators. We just need to find the line where we read the class instance:
if (!operators.isEmpty() && operators.peek() == Operator.ClassInstance) {
operand = readClassInstance(token);
}
To support reading the NestedClassInstance
operator, we add the corresponding condition for the current operator in the operator’s stack:
if (!operators.isEmpty() && (operators.peek() == Operator.ClassInstance || operators.peek() == Operator.NestedClassInstance)) {
operand = readClassInstance(token);
}
The readClassInstance()
method will read a nested class's declaration with properties the same way it reads a regular class declaration. This method returns ClassExpression
instance as a whole operand expression.
Everything is ready now. In this part, we implemented nested classes as one more step toward making a complete programming language.