The inspiration for this post came from a project at work. I was building a service that required comparing two JSON objects. The catch was that I needed to be able to replace keys, filter out paths, and apply comparison functions to specific nodes.
Obviously, a standard library comparison function such as reflect.DeepEqual() would not work. 😧
The solution was to build an AST (Abstract Syntax Tree) modeled off of the JSON objects. Every node in the tree represents either a string, integer, array, or object.
Structuring the data this way gives us the flexibility to apply algorithms to it more easily.
To build this, we’ll start with the lexer to generate tokens, then move on to the parser, which matches the tokens against the JSON grammar. Finally, we’ll add AST hooks to generate the tree.
The final directory structure:
.
main.go
/lexer
  lexer.go
  lexer_test.go
/token
  token.go
/parser
  parser.go
/ast
  ast.go
If you want to see and run the final results:
cd $GOPATH/src/github.com/Lebonesco
git clone https://github.com/Lebonesco/json_parser.git
go run main.go ./examples/test.json
The lexer’s job is to take in the JSON data and convert it into a stream of tokens. These tokens include: INVALID, EOF, COMMA, COLON, LBRACE, RBRACE, LBRACKET, RBRACKET, STRING, and INTEGER.
Note: A lexer is also referred to as a scanner.
Let’s start down below 👇
cd $GOPATH/src/github.com/Lebonesco/json_parser
mkdir token
cd token
touch token.go
You have some freedom in how you want to define your tokens. The more data you add to a token, the easier it is to debug.
Note: We’ll be using a rune array, []rune, to store our token literals to allow for Unicode characters.
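Here’s a minimal sketch of what token.go could look like. The constant names match the token types listed above; anything beyond Type and Literal (like the Line field) is my own addition for illustration, not necessarily what the repo uses.

```go
// token/token.go — a minimal sketch; fields beyond Type and Literal are assumptions.
package token

type TokenType string

const (
	INVALID  TokenType = "INVALID"
	EOF      TokenType = "EOF"
	COMMA    TokenType = "COMMA"
	COLON    TokenType = "COLON"
	LBRACE   TokenType = "LBRACE"
	RBRACE   TokenType = "RBRACE"
	LBRACKET TokenType = "LBRACKET"
	RBRACKET TokenType = "RBRACKET"
	STRING   TokenType = "STRING"
	INTEGER  TokenType = "INTEGER"
)

// Token pairs a type with its literal runes; extra fields make debugging easier.
type Token struct {
	Type    TokenType
	Literal []rune // []rune so Unicode characters survive intact
	Line    int    // optional position info, handy when reporting errors
}
```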
Next, let’s jump into our lexer 👍
mkdir lexer
cd lexer
touch lexer.go
touch lexer_test.go
The lexer will track where we are in the input along with any necessary character look-aheads. In terms of functionality, it will need to be able to create a new token and peek ahead to the next one.
Note: This scanner doesn’t support Boolean values, but they could easily be added.
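A sketch of lexer.go along those lines, assuming the token package above (names like New, NextToken, and peekChar are my own choices, not necessarily the repo’s):

```go
// lexer/lexer.go — a minimal sketch, assuming the token package sketched earlier.
package lexer

import "github.com/Lebonesco/json_parser/token"

// Lexer tracks where we are in the input plus a one-rune look-ahead.
type Lexer struct {
	input   []rune
	pos     int  // index of the current rune
	readPos int  // index of the next rune to read
	ch      rune // current rune under examination
}

func New(input string) *Lexer {
	l := &Lexer{input: []rune(input)}
	l.readChar() // prime the first rune
	return l
}

// readChar advances one rune; 0 signals end of input.
func (l *Lexer) readChar() {
	if l.readPos >= len(l.input) {
		l.ch = 0
	} else {
		l.ch = l.input[l.readPos]
	}
	l.pos = l.readPos
	l.readPos++
}

// peekChar looks at the next rune without consuming it.
func (l *Lexer) peekChar() rune {
	if l.readPos >= len(l.input) {
		return 0
	}
	return l.input[l.readPos]
}

// NextToken skips whitespace and returns the next token in the stream.
func (l *Lexer) NextToken() token.Token {
	l.skipWhitespace()
	var tok token.Token
	switch l.ch {
	case '{':
		tok = newToken(token.LBRACE, l.ch)
	case '}':
		tok = newToken(token.RBRACE, l.ch)
	case '[':
		tok = newToken(token.LBRACKET, l.ch)
	case ']':
		tok = newToken(token.RBRACKET, l.ch)
	case ':':
		tok = newToken(token.COLON, l.ch)
	case ',':
		tok = newToken(token.COMMA, l.ch)
	case '"':
		tok = token.Token{Type: token.STRING, Literal: l.readString()}
	case 0:
		tok = token.Token{Type: token.EOF, Literal: []rune{}}
	default:
		if isDigit(l.ch) {
			return token.Token{Type: token.INTEGER, Literal: l.readNumber()}
		}
		tok = newToken(token.INVALID, l.ch)
	}
	l.readChar()
	return tok
}

func newToken(t token.TokenType, ch rune) token.Token {
	return token.Token{Type: t, Literal: []rune{ch}}
}

func (l *Lexer) skipWhitespace() {
	for l.ch == ' ' || l.ch == '\t' || l.ch == '\n' || l.ch == '\r' {
		l.readChar()
	}
}

// readString returns the runes between the opening and closing quotes.
func (l *Lexer) readString() []rune {
	start := l.pos + 1
	for {
		l.readChar()
		if l.ch == '"' || l.ch == 0 {
			break
		}
	}
	return l.input[start:l.pos]
}

// readNumber consumes a run of digits and leaves l.ch on the rune after them.
func (l *Lexer) readNumber() []rune {
	start := l.pos
	for isDigit(l.ch) {
		l.readChar()
	}
	return l.input[start:l.pos]
}

func isDigit(ch rune) bool { return '0' <= ch && ch <= '9' }
```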
Here we will take in a JSON string and make sure the lexer outputs the correct stream of tokens.
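A small test along those lines might look like this; the input and expected token stream are just an illustration, assuming the lexer sketch above.

```go
// lexer/lexer_test.go — a small sketch of the kind of test described above.
package lexer

import (
	"testing"

	"github.com/Lebonesco/json_parser/token"
)

func TestLexer(t *testing.T) {
	input := `{"name": "bob", "ids": [1, 22]}`

	expected := []token.TokenType{
		token.LBRACE,
		token.STRING, token.COLON, token.STRING, token.COMMA,
		token.STRING, token.COLON,
		token.LBRACKET, token.INTEGER, token.COMMA, token.INTEGER, token.RBRACKET,
		token.RBRACE,
		token.EOF,
	}

	l := New(input)
	for i, want := range expected {
		tok := l.NextToken()
		if tok.Type != want {
			t.Fatalf("token %d: expected %s, got %s (%q)", i, want, tok.Type, string(tok.Literal))
		}
	}
}
```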
To run the test:
go test -v
=== RUN   TestLexer
--- PASS: TestLexer (0.00s)
PASS
ok      github.com/Lebonesco/json_parser/lexer 0.433s
You now have a working lexer 🎉 🎉 🎉
This is the part where we take our token stream and match it against the JSON grammar to produce AST nodes.
If we were to define JSON in terms of a grammar, it would be represented by the rules defined below 👇
JSON     : Value
Value    : Object | Array | String | Integer | Bool | Null
Array    : '[' [Value] {',' Value} ']'
Object   : '{' [Property] {',' Property} '}'
Property : String ':' Value
In the above syntax, [expression] means the expression is optional (it appears at most once) and {expression} means that it occurs zero or more times.
There are tools like gocc that will generate a lexer and/or parser for you if you provide the token definitions and grammar. That’s the recommended way to go if you’re dealing with something more complicated.
But since JSON is fairly simple, we’ll do it by hand! 👐
Let’s construct the AST nodes that will represent our final result.
mkdir ast
cd ast
touch ast.go
The nodes are fairly simple. If we cared more about error handling and tracking node hashes, like in my use case, we could store more data.
Note: Because Go uses composition instead of inheritance, we need to implement the TokenLiteral() method on each node type in order for each to be treated as a JSON node.
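A sketch of ast.go under those assumptions; the JSON interface name and the node fields are illustrative, not necessarily the repo’s exact definitions.

```go
// ast/ast.go — a minimal sketch; names and fields are assumptions.
package ast

import "github.com/Lebonesco/json_parser/token"

// JSON is the interface every node satisfies; since Go favors composition
// over inheritance, each node type implements TokenLiteral() itself.
type JSON interface {
	TokenLiteral() string
}

type String struct {
	Token token.Token
	Value string
}

func (s *String) TokenLiteral() string { return string(s.Token.Literal) }

type Integer struct {
	Token token.Token
	Value int64
}

func (i *Integer) TokenLiteral() string { return string(i.Token.Literal) }

type Array struct {
	Token    token.Token // the '[' token
	Children []JSON
}

func (a *Array) TokenLiteral() string { return string(a.Token.Literal) }

type Object struct {
	Token    token.Token // the '{' token
	Children map[string]JSON
}

func (o *Object) TokenLiteral() string { return string(o.Token.Literal) }
```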
Now onto the Parser!
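Here’s a hand-rolled recursive-descent sketch that follows the grammar above, assuming the token, lexer, and AST sketches from earlier (Bool and Null are omitted, just like in the lexer). Each parse function leaves the current token on the last token of the construct it parsed, so the caller is responsible for advancing past it.

```go
// parser/parser.go — a recursive-descent sketch of the grammar above; not the repo's exact code.
package parser

import (
	"fmt"
	"strconv"

	"github.com/Lebonesco/json_parser/ast"
	"github.com/Lebonesco/json_parser/lexer"
	"github.com/Lebonesco/json_parser/token"
)

// Parser keeps the current token and a one-token look-ahead.
type Parser struct {
	l         *lexer.Lexer
	curToken  token.Token
	peekToken token.Token
}

func New(l *lexer.Lexer) *Parser {
	p := &Parser{l: l}
	p.nextToken() // fill peekToken
	p.nextToken() // fill curToken
	return p
}

func (p *Parser) nextToken() {
	p.curToken = p.peekToken
	p.peekToken = p.l.NextToken()
}

// Parse implements JSON : Value.
func (p *Parser) Parse() (ast.JSON, error) {
	return p.parseValue()
}

// parseValue implements Value : Object | Array | String | Integer.
func (p *Parser) parseValue() (ast.JSON, error) {
	switch p.curToken.Type {
	case token.LBRACE:
		return p.parseObject()
	case token.LBRACKET:
		return p.parseArray()
	case token.STRING:
		return &ast.String{Token: p.curToken, Value: string(p.curToken.Literal)}, nil
	case token.INTEGER:
		v, _ := strconv.ParseInt(string(p.curToken.Literal), 10, 64) // lexer only emits digit runs here
		return &ast.Integer{Token: p.curToken, Value: v}, nil
	default:
		return nil, fmt.Errorf("unexpected token %s", p.curToken.Type)
	}
}

// parseObject implements Object : '{' [Property] {',' Property} '}'.
func (p *Parser) parseObject() (ast.JSON, error) {
	obj := &ast.Object{Token: p.curToken, Children: map[string]ast.JSON{}}
	p.nextToken() // move past '{'
	for p.curToken.Type != token.RBRACE {
		if p.curToken.Type != token.STRING {
			return nil, fmt.Errorf("expected property name, got %s", p.curToken.Type)
		}
		key := string(p.curToken.Literal)
		p.nextToken() // move past the key
		if p.curToken.Type != token.COLON {
			return nil, fmt.Errorf("expected ':', got %s", p.curToken.Type)
		}
		p.nextToken() // move past ':'
		val, err := p.parseValue()
		if err != nil {
			return nil, err
		}
		obj.Children[key] = val
		p.nextToken() // move past the value
		if p.curToken.Type == token.COMMA {
			p.nextToken()
		}
	}
	return obj, nil
}

// parseArray implements Array : '[' [Value] {',' Value} ']'.
func (p *Parser) parseArray() (ast.JSON, error) {
	arr := &ast.Array{Token: p.curToken}
	p.nextToken() // move past '['
	for p.curToken.Type != token.RBRACKET {
		val, err := p.parseValue()
		if err != nil {
			return nil, err
		}
		arr.Children = append(arr.Children, val)
		p.nextToken() // move past the value
		if p.curToken.Type == token.COMMA {
			p.nextToken()
		}
	}
	return arr, nil
}
```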
Let’s bring it all together and write our driver, main.go.
Note: json.MarshalIndent() is a nice substitute for json.Marshal() to get prettier JSON output.
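A minimal driver sketch, assuming the lexer and parser sketches above; it reads the file, lexes and parses it, then pretty-prints the resulting tree with json.MarshalIndent().

```go
// main.go — a minimal driver sketch wiring the lexer and parser together.
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"os"

	"github.com/Lebonesco/json_parser/lexer"
	"github.com/Lebonesco/json_parser/parser"
)

func main() {
	if len(os.Args) < 2 {
		log.Fatal("usage: go run main.go <file.json>")
	}

	data, err := os.ReadFile(os.Args[1])
	if err != nil {
		log.Fatal(err)
	}

	l := lexer.New(string(data))
	p := parser.New(l)

	tree, err := p.Parse()
	if err != nil {
		log.Fatal(err)
	}

	// MarshalIndent gives a prettier dump of the tree than plain Marshal.
	out, err := json.MarshalIndent(tree, "", "  ")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(out))
}
```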
To run:
go run main.go ./examples/test.json
All Done 😄
Now that we can generate an AST, it’s easy to add a rolling hash to each node and perform other custom operations, such as replacing keys, filtering out paths, and applying comparison functions to specific nodes.
I’ll leave those extensions to the reader. Feel free to drop links to your work in the comments. I’d love to see what you come up with. 👍
Thank you for taking the time to read this article.
If you found it helpful or interesting please let me know 👏👏👏.