I love pet projects; they are a great excuse to use libraries and other technologies that you can't use at work. Lately I've been working on a bigger pet project that needs to parse Go files. I've used ANTLR before to do this kind of thing, but unfortunately ANTLR's Go target has poor performance. So I began to search for alternatives written in pure Go and came across this one: Participle, which takes a different approach to creating parsers with Go. But before we look at how this library differs from the others, let's cover some basic concepts about parsing.

Sorry, Parsing?

For the sake of "formality", here is what Wikipedia has to say:

> Parsing, syntax analysis, or syntactic analysis is the process of analyzing a string of symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal grammar. The term parsing comes from Latin pars (orationis), meaning part (of speech).

Now let's try to break it down using an example. If you use Go you're probably familiar with go.mod files; but if you aren't, go.mod files are used to manage your project's dependencies. Here is a simple go.mod file:

```
module github.com/matang28/go-latest

go 1.13

require (
	github.com/some/dependency1 v1.2.3
	github.com/some/dependency2 v5.1.0 // indirect
)

replace github.com/some/dependency1 => github.com/some/dependency1@dev latest
```

Before we can parse this file, we need two things:

1. A stream of symbols that will be obtained from the file itself.
2. A grammar definition that will be used by the parser to validate the syntax.

The first one is formally called a Lexer (or lexing). Lexers are all about turning strings into a list of predefined symbols that our parser uses; any symbol (or token) used by our language should be identified by the Lexer. In our case we need to capture the following tokens:

- Keywords: module, go, require, replace.
- Strings such as: github.com/some/dependency1, 1.13.
- Version strings such as: v1.2.3, v1.4.21-haijkjasd9ijasd.
- Other misc symbols such as: =>, //, (, ), whitespaces, line-terminators and tabs.

The second one will give these tokens a syntactic meaning. You can see that go.mod files follow some basic rules:

- You have to declare your module name with the module directive.
- You may declare your Go version using the go directive.
- You may list your dependencies with the require directive.
- You may replace certain dependencies with the replace directive.
- You may exclude certain dependencies with the exclude directive.

These rules are often called the grammar of the language (the go.mod language), and there are more formal ways to represent them; one of them is the EBNF standard:

```
grammar GoMod;

// A source file should be structured in the following way:
sourceFile : moduleDecl stmt* EOF;
stmt : (versionDecl | reqDecl | repDecl | excDecl);

// The module directive: 'module' STRING
moduleDecl : 'module' STRING;

// The version directive: 'go' STRING
versionDecl : 'go' STRING;

// The require directive, it can be either a single line:
// require github.com/some/dep1
// OR multi line:
// require (
//     github.com/some/dep1
//     github.com/some/dep2
// )
reqDecl : 'require' dependency | 'require (' dependency* ')';

// The replace directive: 'replace' STRING '=>' dependency
repDecl : 'replace' STRING '=>' dependency;

// The exclude directive: 'exclude' dependency
excDecl : 'exclude' dependency;

dependency : STRING version comment?;
version : VERSION | 'latest';
comment : '//' STRING;

VERSION : [v]STRING+;
STRING : [a-zA-Z0-9_+.@/-]+;
WS : [ \n\r\t\u000C]+ -> skip;
```

Now, back to the definition: a Parser takes a stream of tokens (e.g. the tokenization of our text file) and tries to match each token against the grammar. If our file doesn't follow the grammar we will get a parse error; but if our file does follow the grammar, we know for sure that the input file is valid (in terms of syntax) and we can get the parse tree, which we can traverse through the matched grammar rules.
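To make the lexing step concrete, here is a minimal, self-contained sketch of a regex-based tokenizer for single go.mod lines. It is only an illustration of the idea; the Tokenize helper, the group names, and the output format are my own, not part of any parsing library:

```go
package main

import (
	"fmt"
	"regexp"
)

// tokenPattern classifies each symbol with named regex groups, one
// alternative per token type. Order matters: keywords and versions are
// tried before the catch-all String group.
var tokenPattern = regexp.MustCompile(
	`(?P<Keyword>\b(?:module|go|require|replace|exclude)\b)` +
		`|(?P<Version>v[a-zA-Z0-9_+.@/-]+)` +
		`|(?P<String>[a-zA-Z0-9_+.@/-]+)` +
		`|(?P<Arrow>=>)` +
		`|(?P<Paren>[()])`)

// Tokenize returns "Name(value)" pairs for every symbol found in the
// input; whitespace is simply skipped because no alternative matches it.
func Tokenize(line string) []string {
	var tokens []string
	names := tokenPattern.SubexpNames()
	for _, match := range tokenPattern.FindAllStringSubmatch(line, -1) {
		for i, group := range match {
			if i > 0 && group != "" {
				tokens = append(tokens, fmt.Sprintf("%s(%s)", names[i], group))
			}
		}
	}
	return tokens
}

func main() {
	fmt.Println(Tokenize("require github.com/some/dependency1 v1.2.3"))
	// [Keyword(require) String(github.com/some/dependency1) Version(v1.2.3)]
}
```

Note how `require` comes back as a Keyword while `github.com/some/dependency1` falls through to String; that ordering trick is exactly what the real lexer we build below relies on.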
Parsing is all about making this magic happen… taking a stream of tokens and building a parse tree from it (iff the syntax matches the grammar).

What's so different?

At the beginning of the article I said that one library was the spark for this pet project. Why is that? Well, parse trees aren't the most comfortable data structure to extract data from; "walking" a parse tree can be pretty confusing. Most of the tools available today use code-generation techniques to generate lexers and parsers from a grammar file, and in order to be as generic as possible, most of them give you a listener interface that gets notified when the parser enters or exits each grammar rule.

Participle takes a different approach, one that's more common for Go developers. It uses Go's struct system to represent the grammar (or, more precisely, the parts that we want to capture from the grammar), and it utilizes struct tags to define the grammar rules. Wow! That was a mouthful; let's learn by doing. First we need some structs that represent a standard `go.mod` file:

```go
// The root level object that represents a go.mod file
type GoModFile struct {
	Module     string
	Statements []Statement
}

type Statement struct {
	GoVersion    *string
	Requirements []Dependency
	Replacements []Replacement
	Excludes     []Dependency
}

// A struct that represents a go.mod dependency
type Dependency struct {
	ModuleName string
	Version    string
	Comment    *string
}

// A struct that represents a replace directive
type Replacement struct {
	FromModule string
	ToModule   Dependency
}
```

In order to write a parser we need a stream of tokens, and Participle has us covered by giving us a regex-based lexer (their docs say they've been working on an EBNF-style one, which could be really useful since this is the most cumbersome part of Participle, IMO).
Let's see that in action:

```go
import "github.com/alecthomas/participle/lexer"

// The lexer uses named regex groups to tokenize the input:
var iniLexer = lexer.Must(lexer.Regexp(
	/* We want to ignore these characters, so no name needed */
	`([\s\n\r\t]+)` +
		/* Parentheses [(,)] */
		`|(?P<Parentheses>[\(\)])` +
		/* Arrow [=>] */
		`|(?P<Arrow>(=>))` +
		/* Version [v STRING] */
		`|(?P<Version>[v][a-zA-Z0-9_\+\.@\-\/]+)` +
		/* String [a-z,A-Z,0-9,_,+,.,@,-,/]+ */
		`|(?P<String>[a-zA-Z0-9_\+\.@\-\/]+)`,
))
```

Now we can move on to define our grammar; think of it as a combination of the EBNF file and our Go struct representation. Here are some of the basic syntax operators Participle has to offer:

- Use quotes to match a literal symbol. For example, "module" will try to match the symbol module.
- Use @ to capture an expression (i.e. a named group defined in our lexer). For example, @String will match the String symbol and extract it into the struct field.
- Use @@ to let the underlying struct match the input; this is useful when you have multiple alternatives for a rule.
- Use the common regex operators to capture groups: * to match zero or more, + to match at least one, ? to match zero or one, etc.
```go
// The root level object that represents a go.mod file
type GoModFile struct {
	Module     string      `"module" @String`
	Statements []Statement `@@*`
}

type Statement struct {
	GoVersion    *string       `( "go" @String )`
	Requirements []Dependency  `| (("require" "(" @@* ")") | ("require" @@))`
	Replacements []Replacement `| (("replace" "(" @@* ")") | ("replace" @@))`
	Excludes     []Dependency  `| (("exclude" "(" @@* ")") | ("exclude" @@))`
}

// A struct that represents a go.mod dependency
type Dependency struct {
	ModuleName string  `@String`
	Version    string  `(@Version | @"latest")`
	Comment    *string `("//" @String)?`
}

// A struct that represents a replace directive
type Replacement struct {
	FromModule string     `@String "=>"`
	ToModule   Dependency `@@`
}
```

All we have to do now is build the parser from our grammar:

```go
import "github.com/alecthomas/participle"

func Parse(source string) (*GoModFile, error) {
	p, err := participle.Build(&GoModFile{},
		participle.Lexer(iniLexer),
	)
	if err != nil {
		return nil, err
	}

	ast := &GoModFile{}
	err = p.ParseString(source, ast)
	return ast, err
}
```

And we're done; we can parse `go.mod` files now!
Let's write a simple test to make sure everything works as expected:

```go
func TestParse_WithMultipleRequirements(t *testing.T) {
	file, err := Parse(`
module github.com/matang28/go-latest

go 1.12

require (
	github.com/bla1/bla1 v1.23.1 // indirect
	github.com/bla2/bla2 v2.25.8-20190701-fuasdjhasd8
)
`)

	assert.Nil(t, err)
	assert.NotNil(t, file)
	assert.EqualValues(t, "github.com/matang28/go-latest", file.Module)
	assert.EqualValues(t, "1.12", *file.GoVersion)
	assert.EqualValues(t, 2, len(file.Requirements))

	assert.EqualValues(t, "github.com/bla1/bla1", file.Requirements[0].ModuleName)
	assert.EqualValues(t, "v1.23.1", file.Requirements[0].Version)
	assert.EqualValues(t, "indirect", *file.Requirements[0].Comment)

	assert.EqualValues(t, "github.com/bla2/bla2", file.Requirements[1].ModuleName)
	assert.EqualValues(t, "v2.25.8-20190701-fuasdjhasd8", file.Requirements[1].Version)
	assert.Nil(t, file.Requirements[1].Comment)

	assert.Nil(t, file.Replacements)
}
```

Where is the pet project?

Picking go.mod files for the examples wasn't accidental: the tool I decided to build is a simple go.mod automation tool. If you use Go in your organization, you probably know that editing go.mod files manually can be tedious.

go-latest to the rescue! go-latest will scan for go.mod files recursively, patching every dependency that matches your query to latest. For example, for this file tree:

```
.
├── go.mod
├── subModule1
│   └── go.mod
└── subModule2
    └── go.mod
```

if I want to patch the dependencies that match my organization name to their latest version, I'll type:

```
$> go-latest "organization.com" .
```

See?! I told you, pet projects are fun :) And I don't care that we could have solved this problem by using simple regular expressions, because this is the essence of pet projects! For more info about go-latest, check out its GitHub page.

And as always, thanks for reading...
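For a rough idea of how such a scan-and-patch loop could look, here is a self-contained sketch. Everything in it is my own illustration, not go-latest's actual implementation: findGoMods and patchToLatest are hypothetical names, and the naive line rewriting stands in for real go.mod editing with the parser we built above.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"strings"
)

// findGoMods walks root recursively and returns the path of every
// go.mod file it finds.
func findGoMods(root string) ([]string, error) {
	var paths []string
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if !d.IsDir() && d.Name() == "go.mod" {
			paths = append(paths, path)
		}
		return nil
	})
	return paths, err
}

// patchToLatest rewrites the version of every dependency line whose
// module path contains query, setting it to "latest". This is a naive
// line-based sketch, not real go.mod rewriting.
func patchToLatest(content, query string) string {
	lines := strings.Split(content, "\n")
	for i, line := range lines {
		fields := strings.Fields(line)
		if len(fields) >= 2 && strings.Contains(fields[0], query) && strings.HasPrefix(fields[1], "v") {
			lines[i] = strings.Replace(line, fields[1], "latest", 1)
		}
	}
	return strings.Join(lines, "\n")
}

func main() {
	paths, err := findGoMods(".")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, p := range paths {
		b, err := os.ReadFile(p)
		if err != nil {
			continue
		}
		_ = patchToLatest(string(b), "organization.com") // the real tool writes this back
		fmt.Println("scanned", p)
	}
}
```

The real tool gets to be much smarter about which lines are dependencies precisely because it has the parsed structure instead of raw lines.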
Picture Credits (by the order of their appearance):

- Photo by Avi Richards on Unsplash
- Photo by Camylla Battani on Unsplash
- Photo by Caleb Woods on Unsplash
- Photo by Curology on Unsplash
- Photo by Josh Rakower on Unsplash