Creating a Scripting Language with ANTLR — Part 2

by Tobe Osakwe, May 12th, 2016



In Part 1 (https://medium.com/@thosakwe/creating-a-scripting-language-with-antlr-part-1-1b42c3e4d718), we learned the basics of compiler theory and the role of ANTLR in computerized language recognition. Now, we will move on to using ANTLR to build an AST in code.

If you do not have ANTLR installed already, follow the instructions on the official Getting Started Page.

If you don’t know how to write or read ANTLR 4 grammars, then you can learn via the very detailed ANTLR 4 documentation. This, believe it or not, is not a tutorial on ANTLR syntax itself. Regardless, you will be able to get through reading this guide without really knowing ANTLR at all.

Now that we have ANTLR available on our system, we can design our language and write our grammar.

We will follow the example from Part 1 and write a grammar that recognizes code like this:


set sky to "blue"
set roses to 3

say "The sky is colored ${sky}."
say "I have ${roses} roses."

Our grammar file might look like this:

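grammar Foo;

// Parser rules
compilationUnit: stmt*;
stmt: assignStmt | invocationStmt;
assignStmt: SET ID TO expr;
// 'say' is lexed as the SAY keyword rather than as an ID, so the name must accept both.
invocationStmt: name=(SAY | ID) ((expr COMMA)* expr)?;

expr: ID | INT | STRING;

// Lexer rules. Keywords are declared before ID so that they win when both match.
COMMA: ',';
SAY: 'say';
SET: 'set';
TO: 'to';

INT: [0-9]+;
STRING: '"' (~('\n' | '"'))* '"';
ID: [a-zA-Z_] [a-zA-Z0-9_]*;

// Discard whitespace between tokens instead of reporting it as unrecognized input.
WS: [ \t\r\n]+ -> skip;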
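To get a feel for what the lexer rules describe, here is a rough hand-written approximation in plain JavaScript. It is only a sketch and not what ANTLR generates: the names `RULES`, `WHITESPACE` and `tokenize` are made up for illustration, it assumes whitespace between tokens is simply skipped, and it throws on an unknown character where ANTLR would report an error and carry on. It does mimic the way the lexer picks a token: the longest match wins, and on a tie the rule listed first wins.

```javascript
// Hypothetical stand-in for the lexer rules in Foo.g4, in the same order.
// The "y" (sticky) flag makes each pattern match only at lastIndex.
var RULES = [
  ['COMMA', /,/y],
  ['SAY', /say/y],
  ['SET', /set/y],
  ['TO', /to/y],
  ['INT', /[0-9]+/y],
  ['STRING', /"[^\n"]*"/y],
  ['ID', /[a-zA-Z_][a-zA-Z0-9_]*/y]
];

// Assumption of this sketch: whitespace separates tokens and is thrown away.
var WHITESPACE = /[ \t\r\n]+/y;

function tokenize(text) {
  var tokens = [];
  var pos = 0;

  while (pos < text.length) {
    // Skip any run of whitespace at the current position.
    WHITESPACE.lastIndex = pos;
    var ws = WHITESPACE.exec(text);
    if (ws) {
      pos += ws[0].length;
      continue;
    }

    // Try every rule; keep the longest match, earliest rule on a tie.
    var best = null;
    for (var i = 0; i < RULES.length; i++) {
      var pattern = RULES[i][1];
      pattern.lastIndex = pos;
      var match = pattern.exec(text);
      if (match && (best === null || match[0].length > best.text.length)) {
        best = { type: RULES[i][0], text: match[0] };
      }
    }

    if (best === null) {
      throw new Error('Unexpected character ' + JSON.stringify(text[pos]) + ' at offset ' + pos);
    }

    tokens.push(best);
    pos += best.text.length;
  }

  return tokens;
}

var sample = 'set roses to 3';
console.log(tokenize(sample).map(function (t) { return t.type; }).join(' '));
// prints "SET ID TO INT"
```

Note that `say` comes out as a SAY token rather than an ID because its rule is listed first, while a longer name such as `sayHello` still comes out as an ID because the longer match wins.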

We can then invoke ANTLR from the command line to generate the lexer and parser. At its most basic, we just have to supply the name of the input grammar, plus “-Dlanguage=JavaScript” to tell ANTLR that we want JavaScript output.

antlr4 -Dlanguage=JavaScript Foo.g4

This will leave you with a handful of generated files, including FooLexer.js, FooParser.js and FooListener.js.

Using this in code is remarkably easy. The following function takes input text and returns the generated AST.



var antlr4 = require('antlr4');
var FooLexer = require('./FooLexer').FooLexer;
var FooParser = require('./FooParser').FooParser;

function buildAst(inputText) {
  // Wrap the raw text in a character stream for the lexer.
  var chars = new antlr4.InputStream(inputText);
  var lexer = new FooLexer(chars);
  // Buffer the lexer's tokens so the parser can look ahead.
  var tokens = new antlr4.CommonTokenStream(lexer);
  var parser = new FooParser(tokens);
  parser.buildParseTrees = true;

  // Parse starting from the grammar's top-level rule.
  return parser.compilationUnit();
}

module.exports = buildAst;
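Once we have a tree, a quick way to inspect it is to print it with indentation. The sketch below is one possible approach, not part of the generated code: `dumpTree`, `token`, `rule` and `mockTree` are hypothetical names. It assumes that rule nodes carry a numeric `ruleIndex` and a `children` array and that every node has `getText()`, which matches the ANTLR JavaScript runtime as far as we know, but is worth checking against your version. The tree here is built by hand so that the example runs without a generated parser.

```javascript
// Render a parse tree as an array of indented lines.
// ruleNames maps a rule index to its name, in grammar order.
function dumpTree(node, ruleNames, depth) {
  depth = depth || 0;
  var pad = '  '.repeat(depth);

  // Assumption: only rule nodes have a numeric ruleIndex; tokens do not.
  if (typeof node.ruleIndex !== 'number') {
    return [pad + node.getText()];
  }

  var lines = [pad + ruleNames[node.ruleIndex]];
  (node.children || []).forEach(function (child) {
    lines = lines.concat(dumpTree(child, ruleNames, depth + 1));
  });
  return lines;
}

// Hand-built stand-ins for ANTLR nodes (hypothetical helpers, for illustration only).
function token(text) {
  return { getText: function () { return text; } };
}

function rule(ruleIndex, children) {
  return { ruleIndex: ruleIndex, children: children };
}

// Rule names in the order they appear in Foo.g4.
var ruleNames = ['compilationUnit', 'stmt', 'assignStmt', 'invocationStmt', 'expr'];

// Roughly the shape we would expect for: set roses to 3
var mockTree = rule(0, [
  rule(1, [
    rule(2, [token('set'), token('roses'), token('to'), rule(4, [token('3')])])
  ])
]);

console.log(dumpTree(mockTree, ruleNames).join('\n'));
```

With a real tree from buildAst, the rule names should be available from the generated parser as `ruleNames`. In the generated code each context appears to keep a reference to its parser, so something like `dumpTree(tree, tree.parser.ruleNames)` should work, though that detail is an assumption to verify.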

With ANTLR, it takes a matter of milliseconds to create both a lexer and a parser. Imagine writing these by hand!

In Part 3, we will wrap everything up by generating JavaScript from our compiler. Stay tuned!
