Uri Shaked

Senior Software Engineer at BlackBerry

Instead of Semicolons, What if You Could Code with… Dragons?

An Adventure into the Magical Lands of Node.js, TypeScript, and Dragon Emoji 🐉

While attending the Jazoon Tech Days conference last week (which was great, BTW), I overheard people reviving the old discussion of whether or not to use semicolons in JavaScript. It then crossed my mind that it would be funny if we could swap semicolons for something more fun, perhaps some kind of emoji, say... dragons!

As soon as I got back home, I sat down and quickly decided on a game plan: first, I wanted to add this capability to V8, the JavaScript engine behind Chrome and Node.js.

I figured out that the easiest way would just be to grab Node.js code and start playing with it. This turned out to be super-easy on my Ubuntu virtual machine:

git clone https://github.com/nodejs/node
cd node
./configure
make -j4
out/Release/node

Yes. That’s it. A handful of simple commands, a cup of tea while you wait for it to compile (about 20 minutes on my machine), and you’ve got yourself a fresh new Node.js build. Kudos to the Node.js team for making it so simple.

Note: For you folks on Macs, the build process should be very similar once you have all the prerequisites. The Windows build is also pretty easy.

Now, for the fun part!

Let There Be Dragons! (Adding Dragons to Node)

Having no prior experience with either the V8 or the Node.js code base, I started by simply searching the code for the word SEMICOLON, and found a rather promising result:

So my thought was to simply replace the ";" with my very own dragon and recompile the code. Unfortunately, changing this had no effect on the language behavior. It still accepted ; as a separator between statements, but I had no such success with dragons:

Back to searching! The second hit was much better, in the middle of a file called deps/v8/src/parsing/scanner.cc, inside the Scan() method here:

This piece of code looked very promising, as it looked very much like a parser that reads the program code and emits a list of tokens. I quickly tried to change the ; to # and see if the following would work:

console.log('hello') # console.log('world')

And it did!

So the only thing that was left to figure out was how to put a dragon character there. Unfortunately, when it comes to emoji, things are not quite as simple.

If I type ‘🐉’ into my source code, it is actually saved as a 4-byte sequence: f0 9f 90 89. That’s how it is encoded in UTF-8, and the editor that I work with saves source files in UTF-8. If you are not sure what the difference is between Unicode, UTF-8 and UTF-16, I suggest that you read this article, which will help you make sense of these terms.
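The UTF-8 bytes of the emoji can be checked directly. Here is a minimal sketch, assuming a runtime that provides the standard TextEncoder global (recent Node.js versions and modern browsers do); the variable names are only for illustration:

```javascript
// Encode the dragon emoji as UTF-8 and print each byte in hexadecimal.
const dragonBytes = new TextEncoder().encode('🐉');
const dragonHex = Array.from(dragonBytes, (b) => b.toString(16).padStart(2, '0')).join(' ');

console.log(dragonBytes.length); // 4
console.log(dragonHex); // f0 9f 90 89
```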

To skip ahead to the point at hand, in order to modify the code to match the dragon emoji, I first needed to figure out whether the V8 scanner represented the character using some encoding or as code points (I thought it might be UTF-16 at first, as JavaScript uses it internally).

After adding some debug printing, I figured out that by the time the V8 scanner runs, the source file has already been decoded to Unicode code points, so we would simply need to put in the code point number for the dragon emoji, which we can find using the codePointAt() JavaScript String method:

console.log('🐉'.codePointAt(0));

And this will give us the desired magical number — 128009.
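As a quick sanity check on that number, the sketch below (variable names are mine) prints it in hexadecimal, which is how code points are usually written (U+1F409), and converts it back into the emoji with String.fromCodePoint():

```javascript
// Round-trip between the dragon emoji and its Unicode code point.
const dragonCodePoint = '🐉'.codePointAt(0);
const dragonCodePointHex = dragonCodePoint.toString(16);

console.log(dragonCodePoint); // 128009
console.log(dragonCodePointHex); // 1f409
console.log(String.fromCodePoint(dragonCodePoint) === '🐉'); // true
```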

So, by changing the scanner code to read:

compiling node again, and feeding it with some nutritious dragon food, it finally worked!

The princess looks a little different in the Linux terminal, doesn’t she?

Actually, this turned out to be a lot easier than I originally imagined: I started as a total Node.js source code noob and added the feature that I wanted in less than an hour, and that’s including the 20 minutes I spent waiting for the computer to compile and build node. And all we needed was just a single line of code… amazing, isn’t it?

So now that I have a working implementation (without any tests though… but tests? Who needs them?!), I can submit it as a Strawman proposal for TC39 — I think it’s clear that the world could use a project of this scale ;)

I am an untested but working implementation, hear me roar!

I have a feeling that they are not very likely to adopt it into the language, so I came up with a backup plan:

Dragonizing TypeScript

I use Angular a lot, as you can tell from some of my previous blog posts (e.g. Angular and Accessibility, ng-simon or my AngularConnect Workshop post). And as an Angular developer, I don’t just use plain JavaScript — I actually write my code in TypeScript, which is then compiled to JavaScript.

In addition, while we managed to add dragons to Node, we still have no browser that supports them. It happens that Chrome also uses the V8 engine, so adding dragons there would probably be similar (except for a much longer build time: 4 hours last time I tried). But if we can get TypeScript to transform the dragons into semicolons for us automatically, we will be able to use dragons everywhere, even on IE 8 (I feel really sorry for you if you still have to support it, though).

So the next logical step would be to try and add the new dragon syntax to TypeScript.

Getting started with TypeScript development was super easy:

git clone https://github.com/Microsoft/TypeScript
cd TypeScript
npm install
npm run build

This built the TypeScript compiler at built/local/tsc.js, which means I was now able to compile any source file into JavaScript by running:

node built/local/tsc.js somefile.ts

and get the compiled code in somefile.js. Great start! Now, into the dragon part…

When we Dragonized Node.js, we found the relevant code inside scanner.cc. I quickly looked for a file with a similar name, and found that they have a compiler/scanner.ts module in their source code, which looked really promising. I opened it and searched for the word “semicolon” (do you notice the pattern here?) and quickly found the giant switch statement that splits the TypeScript source code into tokens:

At first sight, it seemed like I would be able to get it to recognize the new dragon by simply adding this line just below the highlighted one:

    case 128009:

(In case you forgot, 128009 is the Unicode code point for dragons). But since tsc, the TypeScript Compiler, is actually a JavaScript program, it uses UTF-16 internally to represent string characters. And represented as a UTF-16 string, the dragon emoji is actually comprised of two characters. You can easily confirm this by pasting the following code in your browser’s console:

console.log("🐉".length);

which gives you a result of 2. Surprising, isn’t it?

We can extract the two UTF-16 character codes by simply calling the charCodeAt JavaScript method:

console.log("🐉".charCodeAt(0), "🐉".charCodeAt(1));

Which reveals the UTF-16 character codes for this emoji: 55357 and 56329.
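These two numbers are not arbitrary: they follow from the standard UTF-16 surrogate pair rule for code points above U+FFFF. The helper below is a small sketch of that arithmetic (the function name toSurrogatePair is mine, not part of any library):

```javascript
// Split a code point above U+FFFF into its UTF-16 surrogate pair.
function toSurrogatePair(codePoint) {
  const offset = codePoint - 0x10000;     // 20-bit value to split in two
  const high = 0xd800 + (offset >> 10);   // top 10 bits
  const low = 0xdc00 + (offset & 0x3ff);  // bottom 10 bits
  return [high, low];
}

const [highUnit, lowUnit] = toSurrogatePair(128009);
console.log(highUnit, lowUnit); // 55357 56329
console.log(String.fromCharCode(highUnit, lowUnit) === '🐉'); // true
```

The same high unit (55357) is shared by many other emoji, which is why the second unit has to be checked as well.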

Finally, armed with this knowledge, we can augment TypeScript to accept the dragon as a valid semicolon by adding a switch case that matches the first UTF-16 character (55357) and then looking ahead and checking for a match of the next character:

    case 55357: // 🐉
        if (text.charCodeAt(pos + 1) === 56329) {
            pos += 2; // Move to next token
            return token = SyntaxKind.SemicolonToken;
        }
        continue;

That’s it! It only took 6 lines of code to bring Dragons to TypeScript. And now we can play with dragons in TypeScript, too!
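If you want to play with the look-ahead idea without rebuilding a compiler, here is a tiny standalone sketch in plain JavaScript. It is not the TypeScript scanner: splitStatements is a hypothetical helper of my own that only splits text on ; or 🐉 and knows nothing about strings or comments.

```javascript
const SEMICOLON = 0x3b;     // ';'
const DRAGON_HIGH = 55357;  // first UTF-16 code unit of 🐉
const DRAGON_LOW = 56329;   // second UTF-16 code unit of 🐉

// Hypothetical helper: split source text into statements on ';' or 🐉.
// Unlike a real scanner, it does not skip string literals or comments.
function splitStatements(text) {
  const statements = [];
  let start = 0;
  let pos = 0;
  while (pos < text.length) {
    const ch = text.charCodeAt(pos);
    let width = 0; // length of the separator found at pos, 0 if none
    if (ch === SEMICOLON) {
      width = 1;
    } else if (ch === DRAGON_HIGH && text.charCodeAt(pos + 1) === DRAGON_LOW) {
      width = 2; // look ahead one code unit to confirm it is a dragon
    }
    if (width > 0) {
      statements.push(text.slice(start, pos).trim());
      pos += width;
      start = pos;
    } else {
      pos += 1; // always advance so the loop cannot get stuck
    }
  }
  const rest = text.slice(start).trim();
  if (rest) statements.push(rest);
  return statements;
}

const dragonStatements = splitStatements('let a = 1 🐉 let b = 2; log(a + b) 🐉');
console.log(dragonStatements); // three statements: let a = 1, let b = 2, log(a + b)
```

Note that the loop moves forward even when the first code unit matches but the second does not. Other emoji, such as 🐈, start with the same code unit, so they are simply treated as ordinary text.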

p.s. — Shoutout to the TypeScript team for making it so simple!

Takeaways

Though I could “drag on” and on about this project, I think the main takeaway of this little experiment is that though hacking away at Node.js or TypeScript can be scary, it’s well within reach for many of us. Yes, they are both very big projects with large code bases, and that might give you the feeling that there is a steep learning curve even if you just want to build the thing, not to mention modifying their internals or adding a new feature. But in practice, both are really well-maintained projects and have a very straightforward build process.

I managed to hack my desired feature into each project in about one hour (and that includes the time to clone the project, figure out the build process and find where to make the changes), and learned a lot about the internal structure of these projects throughout the process.

So now that we’ve shown that it’s pretty easy to replace JavaScript syntax markers/keywords, what other emoji do you think might be fun to include in our new “Dragon Language?” Or maybe there’s a different emoji you think might be fun to use? Regardless, if anyone else is brave enough to face the scary dragons, I’d love to hear about your experience coding with dragon semicolons in the comments section — I, at least, thought it was a pretty fun way to do things :)

Also, thanks to Tracy, Ben, and everyone else who helped out with the Dragon Punning on Twitter — fire dragons indeed!
