Let’s Build a Programming Language

The Great Wave off Kanagawa, Katsushika Hokusai As someone who writes code, you undoubtedly do so using one or multiple programming languages. You probably enjoy writing code in some programming languages because of their elegance, expressive power or any other reason and you also have probably kept your distance from other programming languages because of, maybe, some of their features that you think are badly implemented. Have you, however, thought about how those programming languages we love and hate came to be? How those particular features we like (or do not like) are designed and implemented and why? How those magic black boxes that are compilers and interpreters work? How code written in JavaScript, Ruby, Python, etc, turns into an executable program? Or have you ever thought about building your own programming language? If your answer to any of the above questions is yes, welcome to this series of articles about building a programming language. This series will take you from 0 to 1 in building a functional interpreter for a programming language. At the end of the series, you will have an interpreter, that you built from scratch, run programs written in a programming language that we would have designed together. Why? Why learn how to implement a programming language? Building a compiler or interpreter will make you a better programmer. Compilers and interpreters encompass interesting data structures and algorithms, the knowledge of which is applicable and useful to other domains. You can apply this knowledge to writing parsers for different file formats, building domain specific languages, for example a database query language, …. You will also better understand how and when to optimize code, be better equipped to choose the best programming language for a particular task and you will finally make sense of some of those weird error messages your compiler/interpreter throws sometimes :). Contribute to the development of your favorite programming language. Many interesting programming languages are open source and welcome new contributors but often, the knowledge necessary to contribute is a barrier to entry for most people who never took a CS compiler course. If you care about your preferred programming language and/or have ever thought about contributing to its evolution, you’ll be better equipped to do so after having built a toy compiler or interpreter. It’s a lot of fun. :) How? This article is the introduction to a series of which each article will introduce the concepts and knowledge necessary to complete one specific step in implementing a programming language. Each article ends with a challenge (it wouldn’t be fun if there was no challenge, right?). The challenge will prompt you to implement a well-defined component of your interpreter. There will be some test files to download at the end of each article and you complete the challenge by writing the code to make all the tests pass. By the time you complete all the challenges, you will have a full working interpreter that can run code written in our programming language. Although any programming language can be used to complete the challenges, for this series, our implementation language will be JavaScript, (ES6 more precisely) mostly so that the series will be accessible to a wider audience. What? If all the above sounds good, let’s start by describing the programming language we will be implementing. Our programming language, specifically designed for the purpose of this series, is called Blink. It’s an interpreted, object-oriented and statically typed programming language inspired by Swift, Kotlin and Scala. A tour of Blink Hello, Blink! The typical “Hello, World” is written as follows. Console.println("Hello, Blink!") Data types Blink supports values of familiar types like Int, Double, String and Bool. Additionally, there is a special type Unit to express the absence of value (similar to void in Java) and all types in Blink inherit from a supertype Object. Comments A comment in Blink starts with //. Anything from the // symbol to the end of line will be ignored by the Blink interpreter. // This is a comment. Declaring variables Variables are declared using a let expression. let message: String = "Hello, Blink!" in {Console.println(message)} A let expression comprises 2 parts: the declaration (before the in keyword) where the variable is declared and possibly initialized. A variable declaration is made of the name of the variable followed by a colon :, followed by the type of the variable message: String. To initialize the variable at its declaration, append equal = and the value of the variable message: String = "Hello, Blink!". the body (after the in keyword) where the variable can be accessed. Variables declared with let are only accessible in the associated body. Omitting the type of the variable. Blink supports type inference meaning that if a variable is initialized at its declaration then its type can be omitted and Blink will infer the correct type of the variable based on its value. The previous example could have been written like this: let message = "Hello, World!" in {Console.println(message)} Declaring multiple variables at once. To declare multiple variables, separate them with commas , in the declaration part of the let expression. let a = 42**,** b = 12**,** c = 24 in {Console.println(a + b + c)} Each variable can access all the variables declared before it in its initialization expression. So something like the following is perfectly possible. let a = 42, b = a * 2, c = b * 3 in {Console.println(a)Console.println(b)Console.println(c)} The above will print 4284252 Omitting the curly braces. A final note on the let expression is that if the body is made of only one expression, the curly braces can be omitted. The first example could be written more succinctly as follows. let message = "Hello, Blink!" in Console.println(message) Conditions Like in most programming languages, conditions are expressed using an if-else expression. if ( ) { } else { } The condition must evaluate to a value of type Bool. The else block can be omitted if needed and the curly braces can be omitted for the if or else block if the block is made of only one expression. if (answer == 42) {Console.println("That's the answer of life!")} else {Console.println("That's not the answer of life.")} Looping A while expression is used to execute one or more expressions as long as a condition holds true. while ( ) { } Again, the curly braces can be omitted if the body of the while is made of only one expression. Defining functions Functions are defined using the func keyword. func sum(a: Int, b: Int): Int = {a + b} Function parameters in Blink are separated by commas , and enclosed in parenthesis (a: Int, b: Int). Each parameter is defined by its name followed by a colon :, followed by its type a: Int. Return type. The return type is defined by adding a colon : followed by the type of the value returned by the function after the parameter list. If the function does not return a value, the return type must be Unit. func greet(): Unit = {Console.println("Hello, Blink!")} However, as a syntactic sugar, if a function does not declare any return type, Blink will automatically assume that the return type is Unit. So the previous example could be written as follows. func greet() = {Console.println("Hello, Blink!")} Function body. After the return type, comes the equal = operator and then the body of the function enclosed in curly braces. There is no return keyword in Blink, the value of the last expression in the body of the function is the value returned by the function. If the body is made of only one expression, the curly braces can be omitted. Our first example could be rewritten like this: func sum(a: Int, b: Int) = a + b Defining classes Classes are defined using the class keyword. class Person {} This defines a simple (albeit useless) class Person. Objects of the class Person can now be created using the new keyword. let p = new Person() in { } Properties. A class can have one or more properties declared with the var keyword. class Person {var firstname: String = "Klaus"var lastname: Stringvar age: Int} A property can be optionally initialized at its declaration. If a property is initialized, the initialization expression will be evaluated when the object is being created. Properties in Blink are private. They can not be made public. Getters and setters (which are just functions) can be created to access properties outside of the class. Functions. A class can have functions (or methods if you prefer). class Person {var firstname: String = "Klaus"var lastname: Stringvar age: Int **func** firstname(): **String** = { firstname } **func** setFirstname(name: **String**) = { firstname = name } } In the example above, we have a getter and a setter for the property firstname. A more idiomatic way to write those functions would be to omit the curly braces because there is only one expression in the body of each of those functions. class Person {var firstname: String = "Klaus"var lastname: Stringvar age: Int **func** firstname(): **String** = firstname **func** setFirstname(name: **String**) = firstname = name } Functions are public by default in Blink. However, it’s possible to make a function private by adding the private modifier before the func keyword. private func age(): Int = // ... Constructor. A class in Blink has one and only one constructor. By default, a class in Blink has a default constructor which takes no parameter. An explicit custom constructor can be defined by adding a list of parameters enclosed in parenthesis to the name of the class. class Person**(firstname: String,** lastname**: String)** {func firstname(): String = firstnamefunc setFirstname(name: String) = firstname = name **func** lastname(): **String** = lastname **func** setLastname(name: **String**) = lastname = name } The constructor parameters automatically act as properties of the class and can be accessed from the body of all the functions in the class or from the initialization expression of any additional property. let person = new Person("Klaus", "Baudelaire") in {Console.println(person**.firstname())Console.println(person.lastname()**)} The above will print KlausBaudelaire Inheritance. Inheritance in Blink is expressed with the extends keyword. class Employee extends Person {} If the class being inherited (the superclass) has an explicit constructor, the inheriting class must honor the superclass’s constructor by passing the arguments necessary to create an object of the superclass. class Employee**(firstname: String,** lastname**: String,** company**: String)** extends Person**(firstname, lastname)** {} Overriding functions. The override modifier is used to override a function in the superclass. class Person(firstname: String, lastname: String) {override func toString(): String = firstname + " " + lastname} In the above example, we’re overriding the toString available in the Object class. All classes inherit from Object in Blink. Defining operators Just like it’s possible to use arithmetic operators +, -, * or / to perform operations between 2 or more Ints, Blink allows us to define the behavior of the binary operators +, -, *, /, % and unary operators — and ! for our own classes. A binary operator is defined by adding a function which name is the symbol of the operator to define. For example, let’s define a + operator for a class Rational (representing rational numbers like 3/4). class Rational(num: Int, den: Int) {func +(other: Rational): Rational = // ...} Given two variables a and b of type Rational, we can now write let c: Rational = a + b. Blink automatically converts the expression a + b to the function call a.+(b). A unary operator is defined by adding a function which name starts with unary_ followed by the symbol of the operator being defined. class Rational(num: Int, den: Int) {func unary_-(): Rational = new Rational(-num, -den)} In the example above, we’re defining the behavior of the unary — operator on our class Rational. Given a variable a of type Rational, we can write let b: Rational = -a. Blink converts the expression -a to the function call a.unary_-(). Operator overloading is also supported in Blink, meaning that it is possible to define multiple + operators, for example, as long as the parameters list of each operator function is different. For instance, for our Rational class, we may want to add 2 Rationals together but also want to add a Rational and an Int. Operator overloading allows us to do just that. class Rational(num: Int, den: Int) {func +(other: Rational): Rational = // ...func +(integer: Int): Rational = // ...func +(double: Double): Rational = // ...} With the above example and given 2 variables a and b of type Rational, we can write expressions like a + b, a + 42 or b + 3.14. Everything is an object Everything in Blink is an object. Even primitives like Ints or Doubles are actually objects. And an arithmetic expression like 32 + 21 is, in fact, the function call 32.+(21). This is a neat feature that enables us, among others things, to define operators for our own types in Blink. There is no statement in Blink, only expressions (and definitions) To conclude our tour of Blink, it is worth noting that unlike other programming languages like Java or JavaScript, there is no statement in Blink, only expressions (and definitions like function or property definitions). The difference between an expression and a statement is that an expression always evaluates to a value while a statement just performs some action. For example in JavaScript, the if statement executes a certain block of code if a condition is true or false. In Blink, if is an expression and evaluates to a value. That means, in Blink, it’s perfectly valid to write let isLegallyAdult = if (age > 18) { true } else { false } isLegallyAdult will be equal to true if age > 18 or will be equal to false if not. This concludes our quick tour of Blink. As you can see, Blink has the necessary concepts to express almost any program, is simple enough to tackle writing an interpreter for it in a relatively small amount of time and has some interesting features like its usage of objects and expressions. If you’re ready to jump in, head over to the second article which will give a high level overview of compilers and interpreters. The Great Wave off Kanagawa , Katsushika Hokusai The Great Wave off Kanagawa Katsushika Hokusai As someone who writes code, you undoubtedly do so using one or multiple programming languages. You probably enjoy writing code in some programming languages because of their elegance, expressive power or any other reason and you also have probably kept your distance from other programming languages because of, maybe, some of their features that you think are badly implemented. Have you, however, thought about how those programming languages we love and hate came to be? How those particular features we like (or do not like) are designed and implemented and why? How those magic black boxes that are compilers and interpreters work? How code written in JavaScript, Ruby, Python, etc, turns into an executable program? Or have you ever thought about building your own programming language? If your answer to any of the above questions is yes, welcome to this series of articles about building a programming language . building a programming language This series will take you from 0 to 1 in building a functional interpreter for a programming language. At the end of the series, you will have an interpreter, that you built from scratch, run programs written in a programming language that we would have designed together. Why? Why learn how to implement a programming language? Building a compiler or interpreter will make you a better programmer. Compilers and interpreters encompass interesting data structures and algorithms, the knowledge of which is applicable and useful to other domains. You can apply this knowledge to writing parsers for different file formats, building domain specific languages, for example a database query language, …. You will also better understand how and when to optimize code, be better equipped to choose the best programming language for a particular task and you will finally make sense of some of those weird error messages your compiler/interpreter throws sometimes :). Contribute to the development of your favorite programming language. Many interesting programming languages are open source and welcome new contributors but often, the knowledge necessary to contribute is a barrier to entry for most people who never took a CS compiler course. If you care about your preferred programming language and/or have ever thought about contributing to its evolution, you’ll be better equipped to do so after having built a toy compiler or interpreter. It’s a lot of fun. :) Building a compiler or interpreter will make you a better programmer. Compilers and interpreters encompass interesting data structures and algorithms, the knowledge of which is applicable and useful to other domains. You can apply this knowledge to writing parsers for different file formats, building domain specific languages, for example a database query language, …. You will also better understand how and when to optimize code, be better equipped to choose the best programming language for a particular task and you will finally make sense of some of those weird error messages your compiler/interpreter throws sometimes :). Building a compiler or interpreter will make you a better programmer. Contribute to the development of your favorite programming language . Many interesting programming languages are open source and welcome new contributors but often, the knowledge necessary to contribute is a barrier to entry for most people who never took a CS compiler course. If you care about your preferred programming language and/or have ever thought about contributing to its evolution, you’ll be better equipped to do so after having built a toy compiler or interpreter. Contribute to the development of your favorite programming language It’s a lot of fun . :) It’s a lot of fun How? This article is the introduction to a series of which each article will introduce the concepts and knowledge necessary to complete one specific step in implementing a programming language. Each article ends with a challenge (it wouldn’t be fun if there was no challenge, right?). The challenge will prompt you to implement a well-defined component of your interpreter. There will be some test files to download at the end of each article and you complete the challenge by writing the code to make all the tests pass. By the time you complete all the challenges, you will have a full working interpreter that can run code written in our programming language. Although any programming language can be used to complete the challenges, for this series, our implementation language will be JavaScript , ( ES6 more precisely) mostly so that the series will be accessible to a wider audience. JavaScript ES6 What? If all the above sounds good, let’s start by describing the programming language we will be implementing. Our programming language, specifically designed for the purpose of this series, is called Blink . It’s an interpreted, object-oriented and statically typed programming language inspired by Swift , Kotlin and Scala . Blink Swift Kotlin Scala A tour of Blink Hello, Blink! The typical “Hello, World” is written as follows. “Hello, World” Console.println("Hello, Blink!") Data types Blink supports values of familiar types like Int , Double , String and Bool . Additionally, there is a special type Unit to express the absence of value (similar to void in Java) and all types in Blink inherit from a supertype Object . Int Double String Bool Unit void Object Comments A comment in Blink starts with // . Anything from the // symbol to the end of line will be ignored by the Blink interpreter. // // // This is a comment. Declaring variables Variables are declared using a let expression . let expression let message: String = "Hello, Blink!" in {Console.println(message)} let String in A let expression comprises 2 parts: let the declaration (before the in keyword) where the variable is declared and possibly initialized. A variable declaration is made of the name of the variable followed by a colon :, followed by the type of the variable message: String. To initialize the variable at its declaration, append equal = and the value of the variable message: String = "Hello, Blink!". the body (after the in keyword) where the variable can be accessed. Variables declared with let are only accessible in the associated body. the declaration (before the in keyword) where the variable is declared and possibly initialized. A variable declaration is made of the name of the variable followed by a colon : , followed by the type of the variable message: String . To initialize the variable at its declaration, append equal = and the value of the variable message: String = "Hello, Blink!" . declaration in : message: String = message: String = "Hello, Blink!" the body (after the in keyword) where the variable can be accessed. Variables declared with let are only accessible in the associated body. body in let Omitting the type of the variable. Blink supports type inference meaning that if a variable is initialized at its declaration then its type can be omitted and Blink will infer the correct type of the variable based on its value. The previous example could have been written like this: Omitting the type of the variable. type inference let message = "Hello, World!" in {Console.println(message)} let in Declaring multiple variables at once. To declare multiple variables, separate them with commas , in the declaration part of the let expression . Declaring multiple variables at once. , let expression let a = 42**,** b = 12**,** c = 24 in {Console.println(a + b + c)} let in Each variable can access all the variables declared before it in its initialization expression. So something like the following is perfectly possible. let a = 42, b = a * 2 , c = b * 3 in {Console.println(a)Console.println(b)Console.println(c)} let a * 2 b * 3 in The above will print 4284252 Omitting the curly braces. A final note on the let expression is that if the body is made of only one expression , the curly braces can be omitted. The first example could be written more succinctly as follows. Omitting the curly braces. let expression expression let message = "Hello, Blink!" in Console.println(message) let in Conditions Like in most programming languages, conditions are expressed using an if-else expression . if-else expression if ( ) { } else { } if else The condition must evaluate to a value of type Bool . The else block can be omitted if needed and the curly braces can be omitted for the if or else block if the block is made of only one expression . Bool else if else expression if (answer == 42) {Console.println("That's the answer of life!")} else {Console.println("That's not the answer of life.")} if else Looping A while expression is used to execute one or more expressions as long as a condition holds true. while expression expressions while ( ) { } while Again, the curly braces can be omitted if the body of the while is made of only one expression. while expression. Defining functions Functions are defined using the func keyword. func func sum(a: Int , b: Int ) : Int = {a + b} func Int Int : Int = Function parameters in Blink are separated by commas , and enclosed in parenthesis (a: Int, b: Int) . Each parameter is defined by its name followed by a colon : , followed by its type a: Int . Function parameters , (a: Int, b: Int) : a: Int Return type. The return type is defined by adding a colon : followed by the type of the value returned by the function after the parameter list. If the function does not return a value, the return type must be Unit . Return type. : Unit func greet() : Unit = {Console.println("Hello, Blink!")} func : Unit However, as a syntactic sugar, if a function does not declare any return type, Blink will automatically assume that the return type is Unit . So the previous example could be written as follows. Unit func greet() = {Console.println("Hello, Blink!")} func Function body. After the return type, comes the equal = operator and then the body of the function enclosed in curly braces. There is no return keyword in Blink, the value of the last expression in the body of the function is the value returned by the function. Function body. = no return keyword expression is If the body is made of only one expression , the curly braces can be omitted. Our first example could be rewritten like this: expression func sum(a: Int, b: Int) = a + b func = Defining classes Classes are defined using the class keyword. class class Person {} class This defines a simple (albeit useless) class Person . Objects of the class Person can now be created using the new keyword. Person Person new let p = new Person() in { } let new Person() in Properties. A class can have one or more properties declared with the var keyword. Properties. var class Person { var firstname: String = "Klaus" var lastname: Stringvar age: Int } class var String var Stringvar Int A property can be optionally initialized at its declaration. If a property is initialized, the initialization expression will be evaluated when the object is being created. expression Properties in Blink are private . They can not be made public . Getters and setters (which are just functions) can be created to access properties outside of the class. Properties in Blink are private They can not be made public Functions. A class can have functions (or methods if you prefer). Functions. methods class Person { var firstname: String = "Klaus" var lastname: Stringvar age: Int class var String var Stringvar Int **func** firstname(): **String** = { firstname } **func** setFirstname(name: **String**) = { firstname = name } **func** firstname(): **String** = { firstname } **func** setFirstname(name: **String**) = { firstname = name } } In the example above, we have a getter and a setter for the property firstname . A more idiomatic way to write those functions would be to omit the curly braces because there is only one expression in the body of each of those functions. firstname expression class Person { var firstname: String = "Klaus" var lastname: Stringvar age: Int class var String var Stringvar Int **func** firstname(): **String** = firstname **func** setFirstname(name: **String**) = firstname = name **func** firstname(): **String** = firstname **func** setFirstname(name: **String**) = firstname = name } Functions are public by default in Blink . However, it’s possible to make a function private by adding the private modifier before the func keyword. Functions are public by default in Blink private func private func age(): Int = // ... private func Constructor. A class in Blink has one and only one constructor. By default, a class in Blink has a default constructor which takes no parameter. An explicit custom constructor can be defined by adding a list of parameters enclosed in parenthesis to the name of the class. Constructor. one and only one class Person**( firstname : String,** lastname**: String)** { func firstname(): String = firstname func setFirstname(name: String ) = firstname = name class firstname func String func String **func** lastname(): **String** = lastname **func** setLastname(name: **String**) = lastname = name **func** lastname(): **String** = lastname **func** setLastname(name: **String**) = lastname = name } The constructor parameters automatically act as properties of the class and can be accessed from the body of all the functions in the class or from the initialization expression of any additional property. expression let person = new Person("Klaus" , "Baudelaire" ) in {Console.println(person**.firstname() )Console.println(person .lastname()**)} let new Person("Klaus" ) in )Console.println(person The above will print KlausBaudelaire Inheritance. Inheritance in Blink is expressed with the extends keyword. Inheritance. extends class Employee extends Person {} class extends If the class being inherited (the superclass) has an explicit constructor, the inheriting class must honor the superclass’s constructor by passing the arguments necessary to create an object of the superclass. class Employee**( firstname : String,** lastname**: String,** company**: String)** extends Person**(firstname, lastname)** {} class firstname extends Overriding functions. The override modifier is used to override a function in the superclass. Overriding functions. override class Person(firstname: String , lastname: String ) { override func toString(): String = firstname + " " + lastname} class String String override func String In the above example, we’re overriding the toString available in the Object class. All classes inherit from Object in Blink. toString Object Object Defining operators Just like it’s possible to use arithmetic operators + , - , * or / to perform operations between 2 or more Int s, Blink allows us to define the behavior of the binary operators + , - , * , / , % and unary operators — and ! for our own classes. + - * / Int + - * / % — ! A binary operator is defined by adding a function which name is the symbol of the operator to define. For example, let’s define a + operator for a class Rational (representing rational numbers like 3/4 ). A binary operator + Rational 3/4 class Rational(num: Int , den: Int ) { func +(other: Rational ): Rational = // ...} class Int Int func Rational Rational Given two variables a and b of type Rational , we can now write let c: Rational = a + b . Blink automatically converts the expression a + b to the function call a.+(b) . a b Rational let c: Rational = a + b a + b a.+(b) A unary operator is defined by adding a function which name starts with unary_ followed by the symbol of the operator being defined. A unary operator unary_ class Rational(num: Int , den: Int ) { func unary_-(): Rational = new Rational (-num, -den)} class Int Int func Rational new Rational In the example above, we’re defining the behavior of the unary — operator on our class Rational . Given a variable a of type Rational , we can write let b: Rational = -a . Blink converts the expression -a to the function call a.unary_-() . — Rational a Rational let b: Rational = -a expression -a a.unary_-() Operator overloading is also supported in Blink, meaning that it is possible to define multiple + operators, for example, as long as the parameters list of each operator function is different. Operator overloading + For instance, for our Rational class, we may want to add 2 Rational s together but also want to add a Rational and an Int . Operator overloading allows us to do just that. Rational Rational Rational Int class Rational(num: Int , den: Int ) { func +(other: Rational ): Rational = // ... func +(integer: Int ): Rational = // ... func +(double: Double ): Rational = // ...} class Int Int func Rational Rational func Int Rational func Double Rational With the above example and given 2 variables a and b of type Rational , we can write expressions like a + b , a + 42 or b + 3.14 . a b Rational expressions a + b a + 42 b + 3.14 Everything is an object Everything in Blink is an object. Even primitives like Int s or Double s are actually objects. And an arithmetic expression like 32 + 21 is, in fact, the function call 32.+(21) . This is a neat feature that enables us, among others things, to define operators for our own types in Blink. Int Double 32 + 21 32.+(21) There is no statement in Blink, only expressions (and definitions) To conclude our tour of Blink, it is worth noting that unlike other programming languages like Java or JavaScript, there is no statement in Blink, only expressions (and definitions like function or property definitions) . The difference between an expression and a statement is that an expression always evaluates to a value while a statement just performs some action. expressions . expression expression value For example in JavaScript, the if statement executes a certain block of code if a condition is true or false . In Blink, if is an expression and evaluates to a value. if true false if That means, in Blink, it’s perfectly valid to write let isLegallyAdult = if (age > 18) { true } else { false } let if true false isLegallyAdult will be equal to true if age > 18 or will be equal to false if not. isLegallyAdult true age > 18 false This concludes our quick tour of Blink. As you can see, Blink has the necessary concepts to express almost any program, is simple enough to tackle writing an interpreter for it in a relatively small amount of time and has some interesting features like its usage of objects and expressions. If you’re ready to jump in, head over to the second article which will give a high level overview of compilers and interpreters. the second article