In this part of creating your own programming language, we’ll continue improving our language by implementing nested classes and slightly upgrading the classes introduced in the previous part. Please check out the previous parts: Building Your Own Programming Language From Scratch Building Your Own Programming Language From Scratch: Part II - Dijkstra's Two-Stack Algorithm Build Your Own Programming Language Part III: Improving Lexical Analysis with Regex Lookaheads Building Your Own Programming Language From Scratch Part IV: Implementing Functions Building Your Own Programming Language From Scratch: Part V - Arrays Building Your Own Programming Language From Scratch: Part VI - Loops Building Your Own Programming Language From Scratch: Part VII - Classes The full source code is available . over on GitHub 1 Lexical Analysis In the first section, we will cover lexical analysis. In short, it’s a process to divide the source code into language lexemes, such as keyword, variable, operator, etc. You may remember from the previous parts that I was using the regex expressions listed in the enum to define all the lexeme types: TokenType package org.example.toylanguage.token; public enum TokenType { Comment("\\#.*"), LineBreak("[\\n\\r]"), Whitespace("[\\s\\t]"), Keyword("(if|elif|else|end|print|input|class|fun|return|loop|in|by|break|next)(?=\\s|$)"), GroupDivider("(\\[|\\]|\\,|\\{|}|[.]{2})"), Logical("(true|false)(?=\\s|$)"), Numeric("([-]?(?=[.]?[0-9])[0-9]*(?![.]{2})[.]?[0-9]*)"), Null("(null)(?=,|\\s|$)"), This("(this)(?=,|\\s|$)"), Text("\"([^\"]*)\""), Operator("(\\+|-|\\*|/{1,2}|%|>=|>|<=|<{1,2}|={1,2}|!=|!|:{2}|\\(|\\)|(new|and|or)(?=\\s|$))"), Variable("[a-zA-Z_]+[a-zA-Z0-9_]*"); private final String regex; } Let’s look into this nested classes prototype, and add the missing regex expressions into our lexemes: TokenType When we create an instance of a nested class, we use the following construction with two expressions and one operator between them: The left expression is an instance of the class we are referring to get a nested class from. The right expression is a nested class with the properties to create an instance. class_instance NestedClass [args] Lastly, as an operator to create a nested class, I’ll be using the following expression: , meaning that we refer to the class instance property with the two colon operator, and then we create an instance with the operator. :: new :: new With the current set of lexemes, we only need to add a regular expression for the operator. This operator can be validated by the following regex expression: :: new :{2}\\s+new Let’s add this expression in the lexeme as OR expression before the part standing for accessing class property: Operator :{2} public enum TokenType { ... Operator("(\\+|-|\\*|/{1,2}|%|>=|>|<=|<{1,2}|={1,2}|!=|!|:{2}\\s+new|:{2}|\\(|\\)|(new|and|or)(?=\\s|$))"), ... } 2 Syntax Analysis In the second section, we will convert lexemes received from the lexical analyzer into the final statements following our language rules. 2.1 OperatorExpression To evaluate math expressions, we’re using . Each operation in this algorithm can be presented by a unary operator with one operand or by a binary operator with respectively two operands: Dijkstra's Two-Stack algorithm Nested class’s instantiation is a binary operation where the left operand is a class instance we use to refer to the class where the nested class is defined, and the second operand is a nested class we create an instance of: Let’s create the implementation by extending NestedClassInstanceOperator : BinaryOperatorExpression package org.example.toylanguage.expression.operator; public class NestedClassInstanceOperator extends BinaryOperatorExpression { public NestedClassInstanceOperator(Expression left, Expression right) { super(left, right); } @Override public Value<?> evaluate() { ... } } Next, we should complete the method that will perform nested class instantiation: evaluate() First, we evaluate the left operand’s expression into the expression: Value @Override public Value<?> evaluate() { // ClassExpression -> ClassValue Value<?> left = getLeft().evaluate(); } Next, we need to the right operand. In this case, we can't directly invoke because the nested classes' definitions are declared in the parent class's DefinitionScope (in the left operand). evaluate() Expression#evaluate() To access the definitions of nested classes, we should create an auxiliary method that will take the left operand and use its DefinitionScope to access the nested class definition and create an instance of: ClassExpression#evaluate(ClassValue) @Override public Value<?> evaluate() { Value<?> left = getLeft().evaluate(); if (left instanceof ClassValue && getRight() instanceof ClassExpression) { // instantiate nested class // new Class [] :: new NestedClass [] return ((ClassExpression) getRight()).evaluate((ClassValue) left); } else { throw new ExecutionException(String.format("Unable to access class's nested class `%s``", getRight())); } } Lastly, let’s implement the missing method. ClassExpression#evaluate(ClassValue) This implementation will be similar to the method with the only difference being that we should set the in order to retrieve nested class definitions: ClassExpression#evaluate() ClassDefinition#getDefinitionScope() package org.example.toylanguage.expression; … public class ClassExpression implements Expression { private final String name; private final List<Expression> argumentExpressions; @Override public Value<?> evaluate() { //initialize class arguments List<Value<?>> values = argumentExpressions.stream().map(Expression::evaluate).collect(Collectors.toList()); return evaluate(values); } /** * Evaluate nested class * * @param classValue instance of the parent class */ public Value<?> evaluate(ClassValue classValue) { //initialize class arguments List<Value<?>> values = argumentExpressions.stream().map(Expression::evaluate).collect(Collectors.toList()); //set parent class's definition ClassDefinition classDefinition = classValue.getValue(); DefinitionContext.pushScope(classDefinition.getDefinitionScope()); try { return evaluate(values); } finally { DefinitionContext.endScope(); } } private Value<?> evaluate(List<Value<?>> values) { //get class's definition and statement ClassDefinition definition = DefinitionContext.getScope().getClass(name); ClassStatement classStatement = definition.getStatement(); //set separate scope MemoryScope classScope = new MemoryScope(null); MemoryContext.pushScope(classScope); try { //initialize constructor arguments ClassValue classValue = new ClassValue(definition, classScope); ClassInstanceContext.pushValue(classValue); IntStream.range(0, definition.getArguments().size()).boxed() .forEach(i -> MemoryContext.getScope() .setLocal(definition.getArguments().get(i), values.size() > i ? values.get(i) : NullValue.NULL_INSTANCE)); //execute function body DefinitionContext.pushScope(definition.getDefinitionScope()); try { classStatement.execute(); } finally { DefinitionContext.endScope(); } return classValue; } finally { MemoryContext.endScope(); ClassInstanceContext.popValue(); } } } 2.2 Operator All the operators we’re using to evaluate math expressions are stored in the enum with the corresponding precedence, character, and type that we refer to calculate the result of each operation: Operator OperatorExpression ... public enum Operator { Not("!", NotOperator.class, 7), ClassInstance("new", ClassInstanceOperator.class, 7), ... private final String character; private final Class<? extends OperatorExpression> type; private final Integer precedence; Operator(String character, Class<? extends OperatorExpression> type, Integer precedence) { this.character = character; this.type = type; this.precedence = precedence; } public static Operator getType(String character) { return Arrays.stream(values()) .filter(t -> Objects.equals(t.getCharacter(), character)) .findAny().orElse(null); } ... } We already have the value for a regular class’s initialization. Let’s add a new value to manage nested class instances. ClassInstance The new value will have the same character expression as we defined in the earlier and the same precedence as the regular class's instance. NestedClassInstance TokenType For the OperatorExpression type, we will use the previously defined : NestedClassInstanceOperator ... public enum Operator { Not("!", NotOperator.class, 7), ClassInstance("new", ClassInstanceOperator.class, 7), NestedClassInstance(":{2}\\s+new", NestedClassInstanceOperator.class, 7), ... } You may notice that we don’t have regex expressions in the character property except for this new operator. To read the operator using a regex expression, we should update the method to match an operator with a regular expression: NestedClassInstance Operator#getType() public enum Operator { ... public static Operator getType(String character) { return Arrays.stream(values()) .filter(t -> character.matches(t.getCharacter())) .findAny().orElse(null); } ... } Lastly, we should add two backslashes before a character for operations containing the following symbols: to make sure these characters are not treated as regex search symbols: \\ +, *, (, ) Multiplication("\\*", MultiplicationOperator.class, 6), Addition("\\+", AdditionOperator.class, 5), LeftParen("\\(", 3), RightParen("\\)", 3), After we introduced the operator, we should inject it into the class that actually parses math expressions into operands and operators. We just need to find the line where we read the class instance: NestedClassInstance ExpressionReader if (!operators.isEmpty() && operators.peek() == Operator.ClassInstance) { operand = readClassInstance(token); } To support reading the operator, we add the corresponding condition for the current operator in the operator’s stack: NestedClassInstance if (!operators.isEmpty() && (operators.peek() == Operator.ClassInstance || operators.peek() == Operator.NestedClassInstance)) { operand = readClassInstance(token); } The method will read a nested class's declaration with properties the same way it reads a regular class declaration. This method returns instance as a whole operand expression. readClassInstance() ClassExpression 3. Wrap Up Everything is ready now. In this part, we implemented nested classes as one more step toward making a complete programming language.
Share Your Thoughts