Hello everyone. In the last article I introduced the Turtle Graphics Android App project, with implementation details and resources about the scripting language, the editor, generating documentation, etc. After I published the app, it got more than 2000 downloads and good ratings and feedback, so I decided to add support for code formatting. In this article, I will explain in detail how a simple code formatter works and how I implemented it in the Turtle Graphics app.
As programmers, we rely on code formatters in our day-to-day jobs. Well-formatted code is easier to read, but have you ever asked yourself how a formatter works?
Before talking about code formatters, let’s first look at how compilers turn your code from text into a data structure that they can process, for example to do type checking.
Let’s start our story from a file that contains a simple hello world example:
fun main() {
    print("Hello, World!")
}
The first step is to read this text file and convert it into a list of tokens. A token is an object that represents a keyword, number, bracket, string, etc., together with its position in the source code, for example:
data class Token (
    val kind : TokenKind,
    val literal : String,
    val line : Int,
)
We can also save the file name and the start and end columns, so that when we report an error we can give useful information about its position, for example:
Error in File Main Line 10: Missing semicolon :D
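As a rough sketch of how such a message could be built, the snippet below stores the position in a small record and formats it. The names Position and formatError, and the exact wording of the column part, are assumptions made for illustration and are not taken from the Turtle Graphics code.

```kotlin
// Hypothetical position record; the field names are illustrative only.
data class Position(
    val file: String,
    val line: Int,
    val columnStart: Int,
    val columnEnd: Int
)

// Builds a message shaped like the example above, with the column range added.
fun formatError(position: Position, message: String): String =
    "Error in File ${position.file} Line ${position.line}, " +
        "Columns ${position.columnStart}-${position.columnEnd}: $message"
```

A token could hold one Position instead of a single line number, so every later step can report errors with the same level of detail.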
This step is called scanning, lexing or tokenizing, and the component that performs it is called a scanner, lexer or tokenizer. At the end, we have a list of tokens, for example:
{ FUN_KEYWORD, "fun", 1 }
{ IDENTIFIER, "main", 1 }
{ LEFT_PAREN, "(", 1 }
{ RIGHT_PAREN, ")", 1 }
{ LEFT_BRACE, "{", 1 }
{ IDENTIFIER, "print", 2 }
{ LEFT_PAREN, "(", 2 }
{ STRING, "Hello, World!", 2 }
{ RIGHT_PAREN, ")", 2 }
{ RIGHT_BRACE, "}", 3 }
In code, the result is a list of tokens:
val tokens : List<Token> = tokenizer(input)
Note that in this step we can already check for some errors, such as an unterminated string or char, unsupported symbols, etc.
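To make the scanning step more concrete, here is a minimal sketch of a tokenizer that handles only the hello world example. The TokenKind enum and the tokenize function are assumptions written for this article, and a real scanner would also handle numbers, operators, comments, escape sequences and columns.

```kotlin
// Hypothetical token kinds, just enough for the hello world example.
enum class TokenKind {
    FUN_KEYWORD, IDENTIFIER, STRING,
    LEFT_PAREN, RIGHT_PAREN, LEFT_BRACE, RIGHT_BRACE
}

data class Token(val kind: TokenKind, val literal: String, val line: Int)

// Walks the input one character at a time and groups characters into tokens.
fun tokenize(input: String): List<Token> {
    val tokens = mutableListOf<Token>()
    var line = 1
    var i = 0
    while (i < input.length) {
        val c = input[i]
        when {
            c == '\n' -> { line++; i++ }
            c.isWhitespace() -> { i++ }
            c == '(' -> { tokens.add(Token(TokenKind.LEFT_PAREN, "(", line)); i++ }
            c == ')' -> { tokens.add(Token(TokenKind.RIGHT_PAREN, ")", line)); i++ }
            c == '{' -> { tokens.add(Token(TokenKind.LEFT_BRACE, "{", line)); i++ }
            c == '}' -> { tokens.add(Token(TokenKind.RIGHT_BRACE, "}", line)); i++ }
            c == '"' -> {
                // Single-line strings only: find the closing quote or report an error.
                val end = input.indexOf('"', i + 1)
                if (end == -1) error("Line $line: unterminated string")
                tokens.add(Token(TokenKind.STRING, input.substring(i + 1, end), line))
                i = end + 1
            }
            c.isLetter() || c == '_' -> {
                // Read a whole word, then decide whether it is a keyword or an identifier.
                val start = i
                while (i < input.length && (input[i].isLetterOrDigit() || input[i] == '_')) i++
                val word = input.substring(start, i)
                val kind = if (word == "fun") TokenKind.FUN_KEYWORD else TokenKind.IDENTIFIER
                tokens.add(Token(kind, word, line))
            }
            else -> error("Line $line: unsupported symbol '$c'")
        }
    }
    return tokens
}
```

Calling tokenize on the hello world file should give the ten tokens listed above, and an input with a missing closing quote or an unknown symbol stops with an error that mentions the line.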
After this step, you can forget about your text file and deal only with this list of tokens. Now we should convert groups of tokens into nodes according to our language grammar. For example, when we see FUN_KEYWORD, we know we are building a function declaration node, so we expect a name, parentheses, parameters, etc.
For this step, we need a data structure that represents the program in a way we can traverse and validate later. It is called an Abstract Syntax Tree (AST). Each node in the AST represents a statement, such as an if, a while, a function declaration or a variable declaration, or an expression, such as an assignment or a unary operation. Each node stores the information required by the next steps, for example:
Function Declaration
data class Function (
    var name : String,
    var arguments : List<Argument>,
    var body : List<Statement>
)
Variable Declaration
data class Var (
    var name : String,
    var value : Expression
)
This step is called parsing and we will end up with an AST object that we can use later to traverse all nodes.
var astNode = parse(tokens)
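The parsing step can be sketched as a small recursive descent parser. The example below only understands declarations of the form var name = value, and everything in it (the token kinds, NumberLiteral, Variable, the Parser class) is a simplified assumption for illustration, not the actual Turtle Graphics parser.

```kotlin
// Hypothetical token kinds for a tiny language that only has `var name = value`.
enum class TokenKind { VAR_KEYWORD, IDENTIFIER, EQUAL, NUMBER }

data class Token(val kind: TokenKind, val literal: String, val line: Int)

sealed interface Expression
data class NumberLiteral(val value: Int) : Expression
data class Variable(val name: String) : Expression

sealed interface Statement
data class Var(val name: String, val value: Expression) : Statement

class Parser(private val tokens: List<Token>) {
    private var current = 0

    // Parses statements until the tokens run out.
    fun parse(): List<Statement> {
        val statements = mutableListOf<Statement>()
        while (current < tokens.size) statements.add(parseStatement())
        return statements
    }

    // The first token of a statement decides which node we build.
    private fun parseStatement(): Statement {
        val token = tokens[current]
        return when (token.kind) {
            TokenKind.VAR_KEYWORD -> parseVarDeclaration()
            else -> error("Line ${token.line}: unexpected token '${token.literal}'")
        }
    }

    // Grammar rule: var <name> = <expression>
    private fun parseVarDeclaration(): Var {
        expect(TokenKind.VAR_KEYWORD)
        val name = expect(TokenKind.IDENTIFIER).literal
        expect(TokenKind.EQUAL)
        return Var(name, parseExpression())
    }

    private fun parseExpression(): Expression {
        val token = next()
        return when (token.kind) {
            TokenKind.NUMBER -> NumberLiteral(token.literal.toInt())
            TokenKind.IDENTIFIER -> Variable(token.literal)
            else -> error("Line ${token.line}: expected an expression")
        }
    }

    private fun next(): Token =
        tokens.getOrNull(current++) ?: error("Unexpected end of input")

    // Consumes the next token and fails if it is not the kind the grammar requires.
    private fun expect(kind: TokenKind): Token {
        val token = next()
        if (token.kind != kind) {
            error("Line ${token.line}: expected $kind but found '${token.literal}'")
        }
        return token
    }
}

// Convenience wrapper matching the call used in the article.
fun parse(tokens: List<Token>): List<Statement> = Parser(tokens).parse()
```

The expect helper is where syntax errors surface: if the grammar requires a token that is not there, parsing stops with a message that points at the offending line.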
If the language is statically typed, such as Java, C, Go, etc., we go next to the type checker step. The goal of this step is to check that the user uses types correctly. For example, if the user declares a variable with type int, it should store only integers, and an if condition must be a boolean (or an integer in a language like C).
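The two rules just mentioned can be sketched as follows. This is a deliberately tiny type checker over a made-up AST; the names Type, TypedVar, If, typeOf and typeCheck are assumptions for illustration, and a real checker would also track variable scopes, function signatures and operators.

```kotlin
// Hypothetical types and AST nodes, kept minimal for the sketch.
enum class Type { INT, BOOLEAN, STRING }

sealed interface Expression
data class IntLiteral(val value: Int) : Expression
data class BoolLiteral(val value: Boolean) : Expression
data class StringLiteral(val value: String) : Expression

sealed interface Statement
data class TypedVar(val name: String, val declaredType: Type, val value: Expression) : Statement
data class If(val condition: Expression, val body: List<Statement>) : Statement

// Computes the type of an expression; only literals exist in this sketch.
fun typeOf(expression: Expression): Type = when (expression) {
    is IntLiteral -> Type.INT
    is BoolLiteral -> Type.BOOLEAN
    is StringLiteral -> Type.STRING
}

// Walks the statements and collects one message per type error.
fun typeCheck(statements: List<Statement>): List<String> {
    val errors = mutableListOf<String>()
    for (statement in statements) {
        when (statement) {
            is TypedVar -> {
                val actual = typeOf(statement.value)
                if (actual != statement.declaredType) {
                    errors.add(
                        "Variable ${statement.name} is declared as " +
                            "${statement.declaredType} but its value is $actual"
                    )
                }
            }
            is If -> {
                if (typeOf(statement.condition) != Type.BOOLEAN) {
                    errors.add("If condition must be BOOLEAN")
                }
                // Nested statements are checked with the same rules.
                errors.addAll(typeCheck(statement.body))
            }
        }
    }
    return errors
}
```

Collecting the errors in a list instead of stopping at the first one lets the compiler report several problems in one run.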
Other checks work the same way. For example, suppose that we want all developers to declare variables without using _ inside the name. To check that, we traverse our AST to find all Var nodes and check each one:
fun checkVarDeclaration(node : Var) {
    if (node.name.contains("_")) {
        reportError("Oops, your variable name ${node.name} contains _")
    }
}
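The function above checks a single node, so something still has to find every Var node in the tree, including the ones nested inside function bodies. A minimal sketch of that traversal is shown below; the simplified Var (without a value) and Function (without arguments) and the checkVarNames function are assumptions made to keep the example short.

```kotlin
// Simplified AST for the sketch: Var has no value and Function has no arguments.
sealed interface Statement
data class Var(val name: String) : Statement
data class Function(val name: String, val body: List<Statement>) : Statement

// Visits every statement, recursing into function bodies,
// and collects an error for each variable name that contains an underscore.
fun checkVarNames(
    statements: List<Statement>,
    errors: MutableList<String> = mutableListOf()
): List<String> {
    for (statement in statements) {
        when (statement) {
            is Var -> {
                if (statement.name.contains("_")) {
                    errors.add("Oops, your variable name ${statement.name} contains _")
                }
            }
            is Function -> {
                checkVarNames(statement.body, errors)
            }
        }
    }
    return errors
}
```

Because the traversal is recursive, adding a new kind of node that contains statements only needs one more branch that calls checkVarNames on its body.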
But now we need to format the code, so how do we do that? It works the same way: we traverse our AST and, for each node, write it back to text, but formatted, for example:
fun formatVarDeclaration(node : Var) : String {
    // indentation and formatValue are assumed to be defined elsewhere in the formatter
    val builder = StringBuilder()
    builder.append(indentation)
    builder.append("var ")
    builder.append(node.name)
    builder.append(" = ")
    builder.append(formatValue(node.value))
    builder.append("\n")
    return builder.toString()
}
In this simple method, we write the node back to a string with the correct indentation and add a new line after it, so no two variables are declared on the same line. The value is also formatted, using another function. You can use the Visitor design pattern to make it easy to handle all node types.
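Here is one way the Visitor idea could look for a formatter, as a sketch only. The Visitor and Node interfaces, the AstFormatter class and the four-space indent are assumptions for illustration. The AST is reduced to three node types, and functions are printed without parameters.

```kotlin
// Each node type gets one visit method; R is the result type of the traversal.
interface Visitor<R> {
    fun visitNumber(node: NumberLiteral): R
    fun visitVar(node: Var): R
    fun visitFunction(node: Function): R
}

interface Node {
    fun <R> accept(visitor: Visitor<R>): R
}

data class NumberLiteral(val value: Int) : Node {
    override fun <R> accept(visitor: Visitor<R>): R = visitor.visitNumber(this)
}

data class Var(val name: String, val value: Node) : Node {
    override fun <R> accept(visitor: Visitor<R>): R = visitor.visitVar(this)
}

data class Function(val name: String, val body: List<Node>) : Node {
    override fun <R> accept(visitor: Visitor<R>): R = visitor.visitFunction(this)
}

// A visitor that turns the AST back into formatted text.
class AstFormatter(private val indentUnit: String = "    ") : Visitor<String> {
    private var depth = 0

    private fun indentation(): String = indentUnit.repeat(depth)

    override fun visitNumber(node: NumberLiteral): String = node.value.toString()

    // One declaration per line, indented to the current depth.
    override fun visitVar(node: Var): String =
        "${indentation()}var ${node.name} = ${node.value.accept(this)}\n"

    override fun visitFunction(node: Function): String {
        val builder = StringBuilder()
        builder.append(indentation()).append("fun ").append(node.name).append("() {\n")
        depth++ // statements inside the body are indented one level deeper
        for (statement in node.body) builder.append(statement.accept(this))
        depth--
        builder.append(indentation()).append("}\n")
        return builder.toString()
    }
}
```

With this structure, formatting a whole program is a single call such as program.accept(AstFormatter()), and supporting a new node type means adding one visit method instead of touching a large when or if chain.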
At the end of this step, we have a string that represents the same input file but formatted, and then we write it back to the file.
This is the basic implementation of a code formatter. A real production code formatter must handle more cases. For example, what if the code is not valid? Should it format only valid code? Should it read the whole program every time we want to format or compile the code?
Now back to Turtle Graphics. In this project, I had already done all the required steps and had an AST ready, so I just wrote it back as text, as you saw above ^_^. In my case, I read the code from the UI, format it and write it back to the UI.
If you are interested and want to read more I suggest
I hope you enjoyed my article and you can find me on
Enjoy Programming 😋.