In this part of creating your programming language, we will implement loops. Please see the previous parts: Building Your Own Programming Language From Scratch Building Your Own Programming Language From Scratch: Part II - Dijkstra's Two-Stack Algorithm Build Your Own Programming Language Part III: Improving Lexical Analysis with Regex Lookaheads Building Your Own Programming Language From Scratch Part IV: Implementing Functions Building Your Own Programming Language From Scratch: Part V - Arrays The full source code is available over on . GitHub Our language will provide the following types of loops: loop - to iterate through a range: For loop - to iterate as long as a specified condition is true: While loop - to iterate through elements of arrays and structures: For-each (Iterable) 1 Lexical analysis In the first section, we will cover lexical analysis. It’s a process to divide the source code into language lexemes, such as keyword, identifier/variable, operator, etc. To make our lexical analysis work with loop constructions, we add the following lexemes: , , to the TokenType enum: Keyword loop in by package org.example.toylanguage.token; ... public enum TokenType { ... Keyword("(if|elif|else|then|end|print|input|struct|fun|return|loop|in|by)(?=\\s|$)"), ... } Then, in order to separate the lower and upper bounds of the For loop range, we add the two-dot lexeme: GroupDivider [.]{2} ... public enum TokenType { ... GroupDivider("(\\[|\\]|\\,|\\{|}|[.]{2})"), ... } And finally, if we don’t want the lexeme to catch the two-dot lexeme declared after the loop’s lower bound, we add a negative look ahead construction before the dot expression: Numeric GroupDivider (?![.]{2}) ... public enum TokenType { ... Numeric("([-]?(?=[.]?[0-9])[0-9]*(?![.]{2})[.]?[0-9]*)"), ... } 2 Syntax analysis Let’s move to the next section - the syntax analysis, which will convert the lexemes received from the previous step into the final statements following our language rules. As was said at the beginning, our language will support three types of loops: For, While and Iterable. Therefore, first we will declare the class, which will contain common execution process for all the loop types: AbstractLoopStatement package org.example.toylanguage.statement.loop; public abstract class AbstractLoopStatement extends CompositeStatement { protected abstract void init(); protected abstract boolean hasNext(); protected abstract void preIncrement(); protected abstract void postIncrement(); } Inside this class, we declare the following abstract methods, which can have different implementation for each loop type: - it will be invoked to perform initialization operations, like initializing a counter variable when a loop starts the execution. init() - It will be called before each iteration to define that we still have remaining items to iterate. hasNext() - It’s a prefix operation that will be called before each iteration. preIncrement() - It’s a postfix operation that will be called at the end of each iteration. postIncrement() Before creating the loop implementations, we complete the method by utilizing declared abstract methods. To start the execution, we invoke the method to initialize the loop state, but before that, we should open the new MemoryScope for the variables declared within the loop, as they shouldn’t be accessed after the loop ends. In the end of the execution, we release the earlier opened MemoryScope: Statement#execute() init() MemoryContext.newScope(); try { init(); } finally { MemoryContext.endScope(); } Next, to iterate a loop, we utilize the abstract method. Inside each iteration, we invoke the loop’s inner statements, defined within the inherited CompositeStatement class. When the iteration starts, we invoke the method to perform the prefix increment operations. At the end of each iteration, we call the method to perform the postfix increment operations accordingly: hasNext() preIncrement() postIncrement() while (hasNext()) { preIncrement(); try { // execute inner statements for (Statement statement : getStatements2Execute()) { statement.execute(); } } finally { postIncrement(); } } Next, to make sure that variables declared inside the loop statements are isolated from each iteration, we open the inner MemoryScope during the statements' execution. And finally, if our loop is defined inside the function and one of the loop’s inner statements contains a return statement, we should stop the iteration immediately because it’s a sign to exit a function. Therefore, we validate that the function's return statement hasn’t been called after each statement execution. The complete method: AbstractLoopStatement#execute() public abstract class AbstractLoopStatement extends CompositeStatement { ... @Override public void execute() { MemoryContext.newScope(); try { init(); while (hasNext()) { preIncrement(); MemoryContext.newScope(); try { // execute inner statements for (Statement statement : getStatements2Execute()) { statement.execute(); if (ReturnContext.getScope().isInvoked()) return; } } finally { MemoryContext.endScope(); // release each iteration memory postIncrement(); } } } finally { MemoryContext.endScope(); // release loop memory } } } 2.1 For loop Let’s begin with the first For loop implementation. By our language design, it will contain the variable expression, the lower bound, the upper bound, and the step expression: package org.example.toylanguage.statement.loop; @RequiredArgsConstructor public class ForLoopStatement extends AbstractLoopStatement { private final VariableExpression variable; private final Expression lowerBound; private final Expression uppedBound; private final Expression step; } Next, we implement the abstract methods. Within the first method, we assign the lower bound to the variable by evaluating an instance of AssignmentOperator, which accepts the expression as a first operand and the expression as a second operand: init() variable lowerBound ... public class ForLoopStatement extends AbstractLoopStatement { ... @Override protected void init() { new AssignmentOperator(variable, lowerBound) .evaluate(); } } Next, we override the . It should return a boolean value if we remaining have items. To make a compare operation, we create an instance of the LessThenOperator by passing the expression as a first operand, and the as a second operand. As a result, we expect to get the LogicalValue with boolean value inside it: hasNext() variable upperBound ... public class ForLoopStatement extends AbstractLoopStatement { ... @Override protected boolean hasNext() { LessThanOperator hasNext = new LessThanOperator(variable, uppedBound); Value<?> value = hasNext.evaluate(); return value instanceof LogicalValue && ((LogicalValue) value).getValue(); } } Lastly, to perform the increment operations for each iteration, we need to implement the and methods. The For loop should use the step increment only at the end of each iteration. Therefore, we perform the increment only within the method. To implement the increment operation, we create an instance of the AdditionOperator, which will sum up the value with the value, and assign the result to the expression again using an instance of the AssignmentOperator. If the step expression hasn’t been provided, we accumulate the NumericValue instance with the default value equal to : preIncrement() postIncrement() postIncrement() variable step variable 1.0 ... public class ForLoopStatement extends AbstractLoopStatement { ... @Override protected void preIncrement() { } @Override protected void postIncrement() { AdditionOperator stepOperator = new AdditionOperator(variable, Objects.requireNonNullElseGet(step, () -> new NumericValue(1.0))); new AssignmentOperator(variable, stepOperator) .evaluate(); } } 2.2 While loop The next loop implementation is the While loop. It’s more straightforward than the For loop, as we don’t need to perform addition and assignment operations at all. This operator will contain only the expression, which will only serve to calculate the result. The other overridden methods will be empty: hasNext hasNext() package org.example.toylanguage.statement.loop; @RequiredArgsConstructor public class WhileLoopStatement extends AbstractLoopStatement { private final Expression hasNext; @Override protected void init() { } @Override protected boolean hasNext() { Value<?> value = hasNext.evaluate(); return value instanceof LogicalValue && ((LogicalValue) value).getValue(); } @Override protected void preIncrement() { } @Override protected void postIncrement() { } } 2.3 Iterable loop The last loop implementation is the Iterable loop. It will allow us to perform the for-each iterations with arrays and structures. First, we define the new Value implementation, which will override the interface and, accordingly, will provide us with the implementation: java.util.Iterable java.util.Iterator package org.example.toylanguage.expression.value; public abstract class IterableValue<T> extends Value<T> implements Iterable<Value<?>> { public IterableValue(T value) { super(value); } } Next, we extend the ArrayValue by implementing IterableValue and passing array values to the Iterator instance: package org.example.toylanguage.expression.value; public class ArrayValue extends IterableValue<List<Value<?>>> { ... @Override public Iterator<Value<?>> iterator() { return getValue().iterator(); } } And we do the same extension for StructureValue as well. For now, we will iterate only the structure’s values. Later, we can introduce the dictionary data type to operate with key-value types: package org.example.toylanguage.expression.value; public class StructureValue extends IterableValue<Map<String, Value<?>>> { ... @Override public Iterator<Value<?>> iterator() { return getValue().values().iterator(); } } Now we have the two IterableValue implementations which can provide us with the Iterator instance, we can complete the Iterable loop with the corresponding implementation. It will contain the expression and the expression. Also, we declare an auxiliary non-final field that will be used to iterate the loop: variable iterable Iterator<Value<?> package org.example.toylanguage.expression.value; @RequiredArgsConstructor public class IterableLoopStatement extends AbstractLoopStatement { private final VariableExpression variable; private final Expression iterableExpression; private Iterator<Value<?>> iterator; } Next, we override and resolve the abstract methods. Inside the method, we initialize the field using the IterableValue’s method: init() iterator iterator() ... public class IterableLoopStatement extends AbstractLoopStatement { ... @Override protected void init() { Value<?> value = iterableExpression.evaluate(); if (!(value instanceof IterableValue)) throw new ExecutionException(String.format("Unable to loop non IterableValue `%s`", value)); this.iterator = ((IterableValue<?>) value).iterator(); } } The method will utilize the implementation. Lastly, we should perform the prefix increment operation to know the value that is currently being iterated. Within , we create an instance of the AssignmentOperator, which will accept the as a first operand, and the value received from the iterator as a second operand: hasNext() Iterator#hasNext() preIncrement() variable ... public class IterableLoopStatement extends AbstractLoopStatement { ... @Override protected boolean hasNext() { return iterator.hasNext(); } @Override protected void preIncrement() { Value<?> next = iterator.next(); new AssignmentOperator(variable, next) .evaluate(); } @Override protected void postIncrement() { } } 2.4 StatementParser To make the loop statements work, we need to map the loop lexemes to the defined statements using the StatementParser class. All the loops defined by the language grammar start with the keyword. Therefore, inside the parseExpression() method, we add a new case for the corresponding Keyword lexeme: loop loop package org.example.toylanguage; public class StatementParser { ... private Statement parseExpression() { Token token = tokens.next(TokenType.Keyword, TokenType.Variable, TokenType.Operator); switch (token.getType()) { case Variable: case Operator: ... case Keyword: switch (token.getValue()) { case "print": ... case "input": ... case "if": ... case "struct": ... case "fun": ... case "return": ... case "loop": ... } default: ... } } } After the Keyword, we read the next expression using the Dijkstra Two-Stack algorithm defined within the ExpressionReader class. If we read the For or the Iterable loop, we expect this expression to be only a variable. If we read the While loop, this expression will act as a loop condition, which can be an instance of any Expression implementation, including operator or function invocation. Therefore, we split the execution. In case the expression is the VariableExpression instance and the next lexeme is the Keyword, we continue to read the For and Iterable loop. In the negative case, we build the WhileLoopStatement using the condition expression: loop in case "loop": { Expression loopExpression = new ExpressionReader().readExpression(); AbstractLoopStatement loopStatement; if (loopExpression instanceof VariableExpression && tokens.peek(TokenType.Keyword, "in")) { // loop <variable> in <bounds> } else { // loop <condition> loopStatement = new WhileLoopStatement(loopExpression); } Inside the For and Iterable loop clause, we skip the next Keyword lexeme, and then read the following bounds expression. This expression can be an instance of any Expression implementation which can be evaluated to the lower bound value or IterableValue only when the code will be executed. To define which loop type it is, we can rely on the next lexeme. In the case of the For loop, after the lower bound expression, we expect to read the two-dot GroupDivider as a spliterator for lower and upper bounds. In the negative condition, we build the IterableLoopStatement: in if (loopExpression instanceof VariableExpression && tokens.peek(TokenType.Keyword, "in")) { // loop <variable> in <bounds> VariableExpression variable = (VariableExpression) loopExpression; tokens.next(TokenType.Keyword, "in"); Expression bounds = new ExpressionReader().readExpression(); if (tokens.peek(TokenType.GroupDivider, "..")) { // loop <variable> in <lower_bound>..<upper_bound> } else { // loop <variable> in <iterable> loopStatement = new IterableLoopStatement(variable, bounds); } } To build the last For loop statement, we skip the two-dot GroupDivider and read the upper bound expression. In our language grammar, for the For loop, we can define the step expression with the keyword. Therefore, if the next expression is the lexeme, we read the following step expression. At the end, we build an instance of the ForLoopStatement: by by if (tokens.peek(TokenType.GroupDivider, "..")) { // loop <variable> in <lower_bound>..<upper_bound> tokens.next(TokenType.GroupDivider, ".."); Expression upperBound = new ExpressionReader().readExpression(); Expression step = null; if (tokens.peek(TokenType.Keyword, "by")) { // loop <variable> in <lower_bound>..<upper_bound> by <step> tokens.next(TokenType.Keyword, "by"); step = new ExpressionReader().readExpression(); } loopStatement = new ForLoopStatement(variable, bounds, upperBound, step); } We have read every loop type declaration. Finally, we can read the body of the loop. To gather all internal statements, we invoke the recursive method, which will return an instance of the Statement interface. We should stop the process when we reach the finalizing lexeme: StatementParser#parseExpression() end while (!tokens.peek(TokenType.Keyword, "end")) { Statement statement = parseExpression(); loopStatement.addStatement(statement); } At the end, we skip this keyword and return the loopStatement. The complete clause: end loop case "loop": { Expression loopExpression = new ExpressionReader().readExpression(); AbstractLoopStatement loopStatement; if (loopExpression instanceof VariableExpression && tokens.peek(TokenType.Keyword, "in")) { // loop <variable> in <bounds> VariableExpression variable = (VariableExpression) loopExpression; tokens.next(TokenType.Keyword, "in"); Expression bounds = new ExpressionReader().readExpression(); if (tokens.peek(TokenType.GroupDivider, "..")) { // loop <variable> in <lower_bound>..<upper_bound> tokens.next(TokenType.GroupDivider, ".."); Expression upperBound = new ExpressionReader().readExpression(); Expression step = null; if (tokens.peek(TokenType.Keyword, "by")) { // loop <variable> in <lower_bound>..<upper_bound> by <step> tokens.next(TokenType.Keyword, "by"); step = new ExpressionReader().readExpression(); } loopStatement = new ForLoopStatement(variable, bounds, upperBound, step); } else { // loop <variable> in <iterable> loopStatement = new IterableLoopStatement(variable, bounds); } } else { // loop <condition> loopStatement = new WhileLoopStatement(loopExpression); } while (!tokens.peek(TokenType.Keyword, "end")) { Statement statement = parseExpression(); loopStatement.addStatement(statement); } tokens.next(TokenType.Keyword, "end"); return loopStatement; } 3 Tests We have finished embedding loops in both lexical and syntax analyzers. Now we should be able to perform For, While and Iterable loops. Let’s create and test a more real task - the bubble sort algorithm: fun bubble_sort [ arr, n ] loop i in 0..n - 1 loop j in 0..n - i - 1 if arr{j+1} < arr{j} then temp = arr{j} arr{j} = arr{j + 1} arr{j + 1} = temp end end end end fun is_sorted [ arr, n ] loop i in 0..n - 1 if arr{i} > arr{i+1} then return false end end return true end arr = {} arr_len = 20 loop i in 0..arr_len by 1 arr << i * 117 % 17 - 1 end print arr # [-1, 14, 12, 10, 8, 6, 4, 2, 0, 15, 13, 11, 9, 7, 5, 3, 1, -1, 14, 12] print is_sorted [ arr, arr_len ] # false bubble_sort [ arr, arr_len ] print arr # [-1, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 12, 13, 14, 14, 15] print is_sorted [ arr, arr_len ] # true