How to create language that compiles to JavaScript by @jcubic

Jakub T. Jankiewicz

Front-end developer with Backend, SEO, and security skills. Open Source maintainer. Blogger. Polish Wikipedia redactor.

Have you ever wanted to create your own programming language? In this article, I will show how to quickly write a simple language that compiles to JavaScript using free tools and the PEG.js parser generator. It covers everything you need to get a first version of your own language working.

What is a Parser Generator

A parser generator, as the name suggests, is a program that generates the source code of a parser for you, based on a grammar (the language specification) written in a specific syntax. In this article, we will use the PEG.js parser generator. It generates JavaScript code that parses programs written in your language and outputs an AST.

What is an AST

AST stands for Abstract Syntax Tree. It is a way to represent code as a tree of nodes that tools can process. We will use the AST format produced by Esprima, a JavaScript parser.

JavaScript Code generation

What's cool about the Esprima format is that there are tools that generate code from its AST. One example is escodegen, which takes an Esprima AST as input and outputs JavaScript code. You might think you could generate the code by concatenating strings, but that approach does not scale. This tutorial shows only a single if statement, but with more complex code you would quickly run into problems such as correct nesting, parentheses, and string escaping.

Simple PEG.js parser example

Here I will show you how to create a simple parser grammar for an if statement. It outputs an AST that is later translated into JavaScript code.

The syntax of PEG.js is not very complicated. Each rule consists of a name, a matching expression, and an optional block of JavaScript (an action) that runs when the expression matches; the value it returns becomes the result of the rule.

Here is a simple example provided by the PEG.js documentation:

{
  function makeInteger(o) {
    return parseInt(o.join(""), 10);
  }
}

start
  = additive

additive
  = left:multiplicative "+" right:additive { return left + right; }
  / multiplicative

multiplicative
  = left:primary "*" right:multiplicative { return left * right; }
  / primary

primary
  = integer
  / "(" additive:additive ")" { return additive; }

integer "integer"
  = digits:[0-9]+ { return makeInteger(digits); }

The parser generated from this grammar can parse and evaluate simple arithmetic expressions. For example, 10+2*3 evaluates to 16. You can test this parser at the PEG.js Online Tool.
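To make the structure of the grammar concrete, here is a hand-written sketch of roughly what it describes, as a small recursive-descent evaluator in plain JavaScript. It is not the code PEG.js generates; the function names (evaluate, additive, multiplicative, primary, integer) are made up for illustration and simply mirror the rule names. Unlike a PEG parser it does not backtrack; it peeks at the next character instead.

```javascript
// Hand-written sketch mirroring the arithmetic grammar above.
// Not PEG.js output: names and structure are illustrative assumptions.
function evaluate(input) {
  let pos = 0;

  // additive = multiplicative "+" additive / multiplicative
  function additive() {
    const left = multiplicative();
    if (input[pos] === "+") {
      pos++;
      return left + additive();
    }
    return left;
  }

  // multiplicative = primary "*" multiplicative / primary
  function multiplicative() {
    const left = primary();
    if (input[pos] === "*") {
      pos++;
      return left * multiplicative();
    }
    return left;
  }

  // primary = integer / "(" additive ")"
  function primary() {
    if (input[pos] === "(") {
      pos++;
      const value = additive();
      if (input[pos] !== ")") {
        throw new SyntaxError('Expected ")" at position ' + pos);
      }
      pos++;
      return value;
    }
    return integer();
  }

  // integer = [0-9]+
  function integer() {
    const start = pos;
    while (pos < input.length && input[pos] >= "0" && input[pos] <= "9") {
      pos++;
    }
    if (start === pos) {
      throw new SyntaxError("Expected integer at position " + pos);
    }
    return parseInt(input.slice(start, pos), 10);
  }

  const result = additive();
  if (pos !== input.length) {
    throw new SyntaxError("Unexpected input at position " + pos);
  }
  return result;
}

console.log(evaluate("10+2*3")); // 16
```

Each grammar rule becomes one function, and the order in which the functions call each other gives multiplication a higher precedence than addition, just as in the grammar.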

But what we need is not to interpret the code and return a single value, but to return an Esprima AST. To see what an Esprima AST looks like, open AST Explorer, select Esprima as the parser, and type some JavaScript.

Here is a simple example:

if (foo == "bar") {
   10 + 10
   10 * 20
}

The output in JSON format looks like this:

{
  "type": "Program",
  "body": [
    {
      "type": "IfStatement",
      "test": {
        "type": "BinaryExpression",
        "operator": "==",
        "left": {
          "type": "Identifier",
          "name": "foo",
          "range": [
            4,
            7
          ]
        },
        "right": {
          "type": "Literal",
          "value": "bar",
          "raw": "\"bar\"",
          "range": [
            11,
            16
          ]
        },
        "range": [
          4,
          16
        ]
      },
      "consequent": {
        "type": "BlockStatement",
        "body": [
          {
            "type": "ExpressionStatement",
            "expression": {
              "type": "BinaryExpression",
              "operator": "+",
              "left": {
                "type": "Literal",
                "value": 10,
                "raw": "10",
                "range": [
                  23,
                  25
                ]
              },
              "right": {
                "type": "Literal",
                "value": 10,
                "raw": "10",
                "range": [
                  28,
                  30
                ]
              },
              "range": [
                23,
                30
              ]
            },
            "range": [
              23,
              30
            ]
          },
          {
            "type": "ExpressionStatement",
            "expression": {
              "type": "BinaryExpression",
              "operator": "*",
              "left": {
                "type": "Literal",
                "value": 10,
                "raw": "10",
                "range": [
                  34,
                  36
                ]
              },
              "right": {
                "type": "Literal",
                "value": 20,
                "raw": "20",
                "range": [
                  39,
                  41
                ]
              },
              "range": [
                34,
                41
              ]
            },
            "range": [
              34,
              41
            ]
          }
        ],
        "range": [
          18,
          43
        ]
      },
      "alternate": null,
      "range": [
        0,
        43
      ]
    }
  ],
  "sourceType": "module",
  "range": [
    0,
    43
  ]
}

You don't need to care about "range" and "raw". They are just part of the parser output, and the AST we build will leave them out.

Let's break the JSON down into its parts:

If statement

The if statement needs to be in the format:

{
    "type": "IfStatement",
    "test": {
    },
    "consequent": {
    },
    "alternate": null
}

Here "test" is an expression and "consequent" is a statement, in our case a block.

if statement condition

The condition can be any expression, but here we will use a binary expression that compares two things:

{
  "type": "BinaryExpression",
  "operator": "==",
  "left": {},
  "right": {}
}

Variables

Using a variable looks like this:

{
  "type": "Identifier",
  "name": "foo"
}

Literal string

A literal string that is used in our code looks like this:

{
    "type": "Literal",
    "value": "bar"
}

Block with curly braces

The block inside if is created like this:

{
    "type": "BlockStatement",
    "body": [ ]
}

Whole program

And the whole program is created like this:

{
  "type": "Program",
  "body": [ ]
}
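The node shapes above can also be built by hand with small helper functions, which is essentially what the grammar actions later in this article do. The helper names below (identifier, literal, binary, and so on) are hypothetical and not part of Esprima or escodegen; this is only a sketch of how the pieces nest.

```javascript
// Hypothetical helpers that return plain objects in the Esprima AST format.
const identifier = (name) => ({ type: "Identifier", name });
const literal = (value) => ({ type: "Literal", value });
const binary = (operator, left, right) => ({
  type: "BinaryExpression", operator, left, right
});
const expressionStatement = (expression) => ({
  type: "ExpressionStatement", expression
});
const block = (body) => ({ type: "BlockStatement", body });
const ifStatement = (test, consequent, alternate = null) => ({
  type: "IfStatement", test, consequent, alternate
});
const program = (body) => ({ type: "Program", body });

// AST for: if (foo == "bar") { 10 + 10; 10 * 20; }
const ast = program([
  ifStatement(
    binary("==", identifier("foo"), literal("bar")),
    block([
      expressionStatement(binary("+", literal(10), literal(10))),
      expressionStatement(binary("*", literal(10), literal(20)))
    ])
  )
]);

console.log(JSON.stringify(ast, null, 2));
```

The printed JSON has the same structure as the AST Explorer output shown earlier, minus the "range", "raw", and "sourceType" fields.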

PEG Parser for your own language that compiles to JavaScript

For our demo language we will create code that looks similar to Ruby:

if foo == "bar" then
  10 + 10
  10 * 20
end

We will parse it into an AST, which escodegen will then turn into JavaScript code.

The PEG grammar for the if statement looks like this:

// An empty body yields [] instead of the whitespace characters matched by _.
if = "if" _ expression:(comparison / expression) _ "then" body:(statements / _ { return []; }) _ "end" {
   return {
     "type": "IfStatement",
     "test": expression,
     "consequent": {
        "type": "BlockStatement",
        "body": body
     },
     "alternate": null
   };
}

The rule matches the "if" token, then a condition that is either a comparison or a plain expression, then "then", a body that is either a list of statements or just whitespace, and finally "end". The _ rule matches optional whitespace, which is ignored:

_ = [ \t\n\r]*

The comparison looks like this:

comparison = _ left:expression _ "==" _ right:expression _ {
   return {
        "type": "BinaryExpression",
        "operator": "==",
        "left": left,
        "right": right
   };
}

The expression looks like this:

expression = expression:(variable / literal) { return expression; }

A variable is built from three rules:

variable = !keywords variable:name {
  return {
    "type": "Identifier",
    "name": variable
  }
}

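// Only whole words count as keywords, so names like endpoint are still valid variables.
keywords = ("if" / "then" / "end") ![A-Z_a-z0-9]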

name = [A-Z_$a-z][A-Z_a-z0-9]* { return text(); }
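The ! in front of keywords is a negative lookahead: the variable rule only proceeds if the input at that position is not a keyword, and the lookahead itself consumes nothing. As a rough illustration outside PEG.js, the same idea can be sketched with a JavaScript regular expression. The function name matchVariable is made up for this example, and the \b word boundary is an assumption of this sketch, so that names such as endpoint, which merely start with a keyword, are still accepted.

```javascript
// Rough regex equivalent of: variable = !keywords name
// (?!...) is a negative lookahead: it checks the input without consuming it.
// The \b is an assumption of this sketch, so that only whole keywords are rejected.
const VARIABLE = /^(?!(?:if|then|end)\b)[A-Za-z_$][A-Za-z_0-9]*/;

// Hypothetical helper: returns an Identifier node, or null when the input
// does not start with a valid variable name.
function matchVariable(input) {
  const match = VARIABLE.exec(input);
  return match ? { type: "Identifier", name: match[0] } : null;
}

console.log(matchVariable("foo == 10")); // { type: 'Identifier', name: 'foo' }
console.log(matchVariable("end"));       // null
console.log(matchVariable("endpoint"));  // { type: 'Identifier', name: 'endpoint' }
```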

Now let's look at statements:

statements = _ head:(if / expression_statement) _ tail:(!"end" _ (if / expression_statement))* {
    return [head].concat(tail.map(function(element) {
        return element[2];
    })); 
  }

expression_statement = expression:expression {
    return  {
      "type": "ExpressionStatement",
      "expression": expression
    };
}
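The element[2] in the statements rule may look cryptic. In PEG.js a sequence returns an array with one entry per part, so each item of tail is a three-element array: the result of !"end" (undefined, because a successful negative lookahead yields undefined), the whitespace matched by _ (an array of characters), and finally the statement node. The sketch below imitates that shape with hand-written data to show what the action does; the sample values are made up.

```javascript
// Made-up sample of what `head` and `tail` look like inside the action.
const head = {
  type: "ExpressionStatement",
  expression: { type: "Literal", value: 1 }
};
const tail = [
  // [ result of !"end", result of _, result of (if / expression_statement) ]
  [undefined, [], {
    type: "ExpressionStatement",
    expression: { type: "Literal", value: 2 }
  }],
  [undefined, ["\n", " ", " "], {
    type: "ExpressionStatement",
    expression: { type: "Identifier", name: "foo" }
  }]
];

// Same logic as the action in the statements rule: keep only the nodes.
const statements = [head].concat(tail.map(function(element) {
  return element[2];
}));

console.log(statements.map((node) => node.expression.type));
// [ 'Literal', 'Literal', 'Identifier' ]
```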

And the last thing is literals:

literal = value:(string / Integer) {
   return {"type": "Literal", "value": value };
}

// A string is a run of ordinary characters or backslash escapes such as \".
string = "\"" ([^"\\] / "\\" .)* "\"" {
  return JSON.parse(text());
}

Integer "integer"
  = _ [0-9]+ { return parseInt(text(), 10); }
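The string rule relies on a small trick: text() returns the raw matched source, including the surrounding quotes and any backslash escapes, and because that is also valid JSON string syntax, JSON.parse turns it into the actual string value. The sketch below shows the same conversion in plain JavaScript, together with the parseInt call used by the Integer rule. The variable names are illustrative, and String.raw is used only to write the sample inputs without double escaping.

```javascript
// Raw source text as the parser would see it (what text() returns).
const rawString = String.raw`"bar"`;
const rawEscaped = String.raw`"say \"hi\""`;
const rawInteger = "  42"; // the Integer rule also matches leading whitespace

// Same conversions as in the grammar actions.
const literals = [
  { type: "Literal", value: JSON.parse(rawString) },
  { type: "Literal", value: JSON.parse(rawEscaped) },
  { type: "Literal", value: parseInt(rawInteger, 10) }
];

console.log(literals.map((node) => node.value));
// [ 'bar', 'say "hi"', 42 ]
```

Because parseInt skips leading whitespace, the whitespace matched by the _ at the start of the Integer rule does not affect the resulting number.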

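That is the whole grammar for the if statement; the generated parser outputs an AST. Note that the expression rule above only covers variables and literals, so lines such as 10 + 10 from the demo program still need arithmetic rules, similar to additive and multiplicative in the PEG.js example but returning BinaryExpression nodes instead of numbers. Once we have an Esprima AST, all that is left is to generate the code with escodegen.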

The code that generates the AST and creates JavaScript code looks like this:

const ast = parser.parse(code);
const js_code = escodegen.generate(ast);

Here parser is the generated parser object; its name is whatever you chose when you generated the parser with PEG.js.
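To get a feel for what escodegen does with the AST, here is a deliberately tiny generator that handles only the node types used in this article. It is a toy sketch, not escodegen's implementation or API: the generate function below is made up, ignores operator precedence and else branches, and produces its own formatting. For real code, use escodegen.

```javascript
// Toy code generator for the handful of node types used in this article.
// NOT escodegen: a simplified sketch that ignores precedence and many node types.
function generate(node, indent = "") {
  switch (node.type) {
    case "Program":
      return node.body.map((stmt) => generate(stmt, indent)).join("\n");
    case "IfStatement":
      if (node.alternate) {
        throw new Error("else branches are not supported in this sketch");
      }
      return indent + "if (" + generate(node.test) + ") " +
        generate(node.consequent, indent);
    case "BlockStatement": {
      const inner = node.body.map((stmt) => generate(stmt, indent + "    "));
      return "{\n" + inner.join("\n") + (inner.length ? "\n" : "") + indent + "}";
    }
    case "ExpressionStatement":
      return indent + generate(node.expression) + ";";
    case "BinaryExpression":
      return generate(node.left) + " " + node.operator + " " + generate(node.right);
    case "Identifier":
      return node.name;
    case "Literal":
      // JSON.stringify quotes and escapes strings; numbers pass through.
      return JSON.stringify(node.value);
    default:
      throw new Error("Unsupported node type: " + node.type);
  }
}

// The AST our grammar is meant to produce for the demo program.
const ifNode = {
  type: "IfStatement",
  test: {
    type: "BinaryExpression",
    operator: "==",
    left: { type: "Identifier", name: "foo" },
    right: { type: "Literal", value: "bar" }
  },
  consequent: {
    type: "BlockStatement",
    body: [
      { type: "ExpressionStatement", expression: {
        type: "BinaryExpression", operator: "+",
        left: { type: "Literal", value: 10 }, right: { type: "Literal", value: 10 } } },
      { type: "ExpressionStatement", expression: {
        type: "BinaryExpression", operator: "*",
        left: { type: "Literal", value: 10 }, right: { type: "Literal", value: 20 } } }
    ]
  },
  alternate: null
};

console.log(generate(ifNode));
// if (foo == "bar") {
//     10 + 10;
//     10 * 20;
// }
```

Even this small version shows why working from a tree is easier than gluing strings together: indentation, quoting, and nesting are each handled once, in the case for the node type they belong to.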

Here is the simple demo that I used while writing the parser. You can play with the grammar and create a different syntax for your own programming language that compiles to JavaScript.

Parser Generator Demo.

This simple application saves your code in localStorage on each change, as long as it compiles without errors, so you can safely use it to create your own language. I can't guarantee that you will not lose your work, though, so you may want to use something more robust.

NOTE: The original PEG.js project is no longer maintained, but there is a maintained fork, Peggy, which is backward compatible with PEG.js, so switching should be easy.

Conclusion

Writing a language that compiles to JavaScript is simple. The techniques explained in this article should allow you to create your own programming language that compiles to JavaScript. This can be a way to build a proof of concept (POC) of a language that you want to design; as far as I know, it is the fastest way to get something working. You can also use the language as is: create your own DSL (Domain Specific Language), write code in it, and let JavaScript do the hard work.
