Josh Haas

@jphaas1

How we built a cloud-based programming language: ASTs in the Cloud

Building Bubble. In this series of posts, our engineering team talks about the inner workings of Bubble, the cloud-based visual programming language that’s making programming accessible to everyone.

If you’ve ever tried to build a compiler or an interpreter, you are probably familiar with Abstract Syntax Trees, generally abbreviated ASTs. An AST represents what code looks like to a computer. It’s the naked structure of the program, stripped of all the punctuation and spaces, organized hierarchically in a tree.

For instance, in JavaScript, the code

if (x === 3) {
    alert('hi!');
}

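might turn into a tree whose root is an “if” node with two children: a condition that compares x to 3, and a body that calls alert with the string 'hi!'.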

Whenever a computer compiles or runs a computer program, it first converts the code that the programmer types into an AST, and then navigates the AST to actually execute it.
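To make that concrete, here is a rough sketch of the tree for the snippet above written as a nested object, plus a toy function that walks it. The node names loosely follow the ESTree convention used by many JavaScript parsers, simplified for brevity; they are an assumption for illustration and not Bubble's format.

```javascript
// Illustrative AST for: if (x === 3) { alert('hi!'); }
// Node names are simplified ESTree-style names, not Bubble's own.
const ast = {
  type: 'IfStatement',
  test: {
    type: 'BinaryExpression',
    operator: '===',
    left: { type: 'Identifier', name: 'x' },
    right: { type: 'Literal', value: 3 },
  },
  consequent: [
    {
      type: 'CallExpression',
      callee: { type: 'Identifier', name: 'alert' },
      arguments: [{ type: 'Literal', value: 'hi!' }],
    },
  ],
};

// A toy tree-walking evaluator. It handles only the node types used above.
function evaluate(node, env) {
  switch (node.type) {
    case 'Literal':
      return node.value;
    case 'Identifier':
      return env[node.name];
    case 'BinaryExpression': {
      const left = evaluate(node.left, env);
      const right = evaluate(node.right, env);
      if (node.operator === '===') return left === right;
      throw new Error('Unsupported operator: ' + node.operator);
    }
    case 'CallExpression': {
      const fn = evaluate(node.callee, env);
      const args = node.arguments.map((arg) => evaluate(arg, env));
      return fn(...args);
    }
    case 'IfStatement':
      if (evaluate(node.test, env)) {
        node.consequent.forEach((statement) => evaluate(statement, env));
      }
      return undefined;
    default:
      throw new Error('Unknown node type: ' + node.type);
  }
}

// Usage: supply x and a stand-in for alert through the environment.
const messages = [];
evaluate(ast, { x: 3, alert: (message) => messages.push(message) });
evaluate(ast, { x: 4, alert: (message) => messages.push(message) });
```

Only the first call records a message, because the condition is false when x is 4.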

Bubble, unlike a traditional programming language, doesn’t have code. It’s just an AST, without any syntax. Instead of storing Bubble apps as code, we just store the tree itself. Our web-based visual editor manipulates the AST directly.

Because Bubble apps are stored as ASTs instead of code, it is much easier for us to evolve the Bubble language over time than it is in a traditional programming language. We can make radical changes to the user interface that Bubble programmers work with without breaking compatibility with existing user applications, because we can write code that interprets the existing ASTs and translates them into the new format.
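As a sketch of what such a translation could look like, suppose an older format stored a button's label as a plain string and a newer one stores it as an expression node. Both formats, and every name below, are hypothetical and invented for this example.

```javascript
// Hypothetical migration: version 1 button nodes stored `label` as a string;
// version 2 stores `text` as an expression node instead.
function migrateNode(node) {
  if (Array.isArray(node)) return node.map((item) => migrateNode(item));
  if (node === null || typeof node !== 'object') return node;

  // Copy the node, migrating its children first.
  const migrated = {};
  for (const [key, value] of Object.entries(node)) {
    migrated[key] = migrateNode(value);
  }

  // Then rewrite this node if it uses the old shape.
  if (migrated.type === 'Button' && typeof migrated.label === 'string') {
    migrated.text = { type: 'Literal', value: migrated.label };
    delete migrated.label;
  }
  return migrated;
}

const oldApp = {
  version: 1,
  pages: { index: { elements: [{ type: 'Button', label: 'Sign up' }] } },
};
const newApp = { ...migrateNode(oldApp), version: 2 };
```

The function builds a new tree rather than editing the old one in place, so the original document stays available if the migration needs to be re-run.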

For this reason, Bubble is interpreted, not compiled: when running a Bubble app, we feed the raw AST to an interpreter that knows how to execute it, which means that when we change how Bubble works, there’s no need to re-compile existing applications.

So, because ASTs are so core to the Bubble language — they are, basically, the Bubble language — one of the very first decisions we faced building Bubble was what technology to use to store and transmit them.

Spoiler alert: we picked JSON.

JSON is an obvious choice for representing data trees in a web-based environment, for a few reasons. First, it’s a standard format that can easily be sent over the internet as plain text. Second, it’s very flexible and easy to build trees with: you just nest objects, using any combination of keys and values. Finally, Bubble is written in JavaScript (well, CoffeeScript, to be precise, but we’ll discuss that in another post), so a JSON document translates directly into a JavaScript object graph, without any weird impedance mismatches.
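A small illustration of that round trip, using only the standard JSON.stringify and JSON.parse functions. The tree itself is made up for the example; its keys are not Bubble's.

```javascript
// A made-up fragment of an app tree; the keys are illustrative only.
const tree = {
  pages: {
    index: {
      elements: { button1: { type: 'Button', text: 'Sign up' } },
    },
  },
};

// Serialize to plain text for transmission or storage...
const wire = JSON.stringify(tree);

// ...and parse it back into an ordinary object graph on the other side.
const received = JSON.parse(wire);
const buttonText = received.pages.index.elements.button1.text;
```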

When we first started working on Bubble, we imagined that each Bubble app would be a single JSON document that we would send around the web as needed. We figured we could just store the whole thing in memory, either in the user’s browser or on the server. Ha! In retrospect, that was blindingly naive.

The first time we saw an application with a 5 MB document, we had an “oh shit” moment. Our code was loading the entire 5 MB into memory to do anything, and we watched the Bubble platform grind to a halt. (Today, there are Bubble applications that are over 100 MB.)

Obviously, we had to break applications into chunks somehow, and only send the chunks we needed to perform any given operation. In some cases, this was pretty easy. For instance, when running a Bubble workflow on the server, all the code that represents that workflow is in a single branch of the AST.

In other cases, though, it gets harder. There are a lot of things in Bubble that reference other parts of the app: elements reference styles, actions reference elements, and expressions reference user-defined types, for instance.

As a result, it can get complicated to figure out what part of the app we need to load in order to do something. Often we need to load data in one part of the app, and that data will contain a reference to another part of the app, which we’ll also need to load, but we won’t know that until we load the first part.

In some cases, the efficient thing to do is to just load a big chunk of the app upfront. When a user displays a Bubble page in the web browser, we know we need the AST for rendering all the elements on the page, so it makes sense to bundle that together and just send it.

In other cases, it is more efficient to lazy-load data as we discover we need it: when running a workflow on the server, if a server action references something else in the app, we want to be able to pull just the node that it references — often only a few bytes of data — temporarily into memory.
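The lazy-loading case can be sketched as a loop that loads one node, scans it for references, and queues whatever it finds. Everything here is an assumption for illustration: the `ref` key, the path strings, and the `fetchNode` stand-in are invented, and the lookup is synchronous only to keep the sketch short (a real one would hit a database or the network).

```javascript
// Stand-in for the stored app, keyed by path. The shape and the `ref`
// convention are hypothetical.
const storedNodes = {
  'workflows.signup': { action: 'show', element: { ref: 'elements.banner' } },
  'elements.banner': { type: 'Text', style: { ref: 'styles.heading' } },
  'styles.heading': { fontSize: 24 },
  'styles.unused': { fontSize: 10 },
};

let fetchCount = 0;
function fetchNode(path) {
  fetchCount += 1;
  return storedNodes[path];
}

// Collect every `ref` path mentioned anywhere inside a node.
function findRefs(value, found = []) {
  if (value !== null && typeof value === 'object') {
    if (typeof value.ref === 'string') found.push(value.ref);
    Object.values(value).forEach((child) => findRefs(child, found));
  }
  return found;
}

// Load a starting node, then keep loading whatever it turns out to reference.
function loadWithReferences(startPath) {
  const loaded = new Map();
  const queue = [startPath];
  while (queue.length > 0) {
    const path = queue.shift();
    if (loaded.has(path)) continue; // already in memory
    const node = fetchNode(path);
    loaded.set(path, node);
    queue.push(...findRefs(node));
  }
  return loaded;
}

const loaded = loadWithReferences('workflows.signup');
```

Starting from the workflow, the loop pulls in the banner element and then the heading style, and never touches `styles.unused`.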

Having a messy problem like this so close to the core mechanic of our product is no good. We want to be able to develop Bubble without having to constantly juggle the question of what data is available when and where. So, we did what we always do when we encounter a messy problem we don’t want to deal with: we invented an abstraction layer.

We call the layer — a little uncreatively — JSONBase. It’s an abstraction over a node in a JSON document. Each object that inherits from JSONBase has an application name and application version that identifies which document it belongs to, and it has a path that indicates which node in the document it points to. For instance, path ‘a.b.c’ refers to

{
    a: {
        b: {
            c: ← THIS
        }
    }
}

JSONBase objects (or JSONs, for short) are entirely abstract. The key, or even the entire sub-tree, that a JSON points to may or may not actually exist. This means we can work with them without worrying about errors on null references, or whether or not the data is loaded. We’ve abstracted the position in a JSON document from the actual JSON object itself.

JSONs expose methods for navigating the tree and accessing the data. For instance, calling json.child('d') on the JSON that represents ‘a.b.c’ yields the JSON ‘a.b.c.d’; calling json.parent() yields ‘a.b’. json.raw() returns a JavaScript object representing the data stored at this location in the tree; json.exists() returns a boolean indicating whether any data is stored there at all.
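A minimal sketch of what such an interface might look like. The method names child, parent, raw, and exists come from the description above; the constructor arguments, the `_root` and `_withPath` hooks, the `toString` helper, and the `InMemoryJSON` subclass are assumptions made up for this example, not Bubble's code.

```javascript
// Sketch of a path-based handle into a JSON document.
class JSONBase {
  constructor(appName, appVersion, path) {
    this.appName = appName;
    this.appVersion = appVersion;
    this.path = path; // array of keys, e.g. ['a', 'b', 'c']
  }

  // Hooks for subclasses: where the document lives, and how to make a sibling handle.
  _root() {
    throw new Error('not implemented');
  }
  _withPath(path) {
    throw new Error('not implemented');
  }

  child(key) {
    return this._withPath([...this.path, key]);
  }
  parent() {
    return this._withPath(this.path.slice(0, -1));
  }

  // Walk down the path; return undefined instead of throwing if anything is missing.
  raw() {
    let node = this._root();
    for (const key of this.path) {
      const isObject = node !== null && typeof node === 'object';
      if (!isObject || !Object.prototype.hasOwnProperty.call(node, key)) {
        return undefined;
      }
      node = node[key];
    }
    return node;
  }
  exists() {
    return this.raw() !== undefined;
  }
  toString() {
    return this.path.join('.');
  }
}

// Simplest possible backing store: the whole document is already in memory.
class InMemoryJSON extends JSONBase {
  constructor(appName, appVersion, path, document) {
    super(appName, appVersion, path);
    this.document = document;
  }
  _root() {
    return this.document;
  }
  _withPath(path) {
    return new InMemoryJSON(this.appName, this.appVersion, path, this.document);
  }
}

const doc = { a: { b: { c: { greeting: 'hi' } } } };
const json = new InMemoryJSON('my-app', 'live', ['a', 'b', 'c'], doc);
const missing = json.child('d').child('e'); // safe even though 'd' does not exist
```

Because a handle is just a name, a version, and a path, creating `missing` costs nothing and cannot fail; only raw() and exists() touch the data.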

JSONBase is an abstract base class; it defines an interface and some functionality, but it doesn’t say anything about how the data is stored or loaded. We actually have three separate implementations of JSONBase, corresponding to the three main environments that Bubble code runs in: in the Bubble visual editor, in the user’s web browser when visiting a page built on Bubble, and on our web servers.

The simplest is RuntimeJSON: this is what we use in the browser when rendering a Bubble app. RuntimeJSON is all about speed: when we are rendering a page, we want it to be as fast as humanly possible. RuntimeJSON doesn’t have the ability to fetch data at all. We pre-compute on the server what parts of the application are needed to render a page, and send them over: RuntimeJSON is a thin wrapper around the actual data loaded in memory.
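One way to picture the split between server and browser, as a sketch under assumptions: the server copies out only the subtrees a page needs, and the browser reads from that bundle with no way to ask for more. The helper names (`getPath`, `buildBundle`, `RuntimeView`), the flat path-keyed bundle, and the sample app are all invented here.

```javascript
// Follow a dotted path into a document, returning undefined if it runs out.
function getPath(document, path) {
  return path.split('.').reduce(
    (node, key) => (node !== null && typeof node === 'object' ? node[key] : undefined),
    document
  );
}

// Server side: collect just the subtrees a page needs, keyed by path.
function buildBundle(document, neededPaths) {
  const bundle = {};
  for (const path of neededPaths) {
    bundle[path] = getPath(document, path);
  }
  return bundle;
}

// Browser side: a thin read-only view over the bundle. There is no fetch
// fallback; anything that was not bundled simply is not there.
class RuntimeView {
  constructor(bundle) {
    this.bundle = bundle;
  }
  raw(path) {
    for (const [prefix, subtree] of Object.entries(this.bundle)) {
      if (path === prefix) return subtree;
      if (path.startsWith(prefix + '.')) {
        return getPath(subtree, path.slice(prefix.length + 1));
      }
    }
    return undefined;
  }
}

const app = {
  pages: {
    index: { title: 'Home', elements: { button1: { type: 'Button' } } },
    admin: { title: 'Admin' },
  },
  styles: { heading: { fontSize: 24 } },
};

// Simulate sending the bundle over the wire as JSON text.
const bundleText = JSON.stringify(buildBundle(app, ['pages.index', 'styles.heading']));
const view = new RuntimeView(JSON.parse(bundleText));
```

Here `view.raw('pages.index.title')` finds its data in the bundle, while `view.raw('pages.admin.title')` comes back undefined because the admin page was never sent.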

In contrast, EditorJSON, which we use in the web browser as part of the Bubble editor, is all about flexibility. The subset of application data we need when someone is using the Bubble editor changes on the fly as they navigate to different parts of the app, so EditorJSON is capable of dynamically querying the server for more data as needed.

EditorJSON also has the ability to handle changes to the application object. Each EditorJSON object has a .set(data) method that overwrites that part of the application tree. When set is called, EditorJSON both saves the new data in memory in the web browser and sends a request to the server to permanently save it to our database.

Finally, EditorJSON has a notification mechanism to let other parts of the editor code that rely on that data know that there’s been an update, so that they can re-draw the UI as needed. (We use data-binding similar to the way React does… that’s a topic for another post!) Because multiple users might be editing the same app at the same time, EditorJSON periodically polls the server for any updates made by other users, so that we can display those edits in real time.
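The write path described above (update the local copy, ask the server to save, notify listeners) could be sketched roughly as follows. This is not Bubble's implementation: the `EditorStore` class, its `subscribe` method, and the `sendToServer` callback are hypothetical, notifications fire only for an exact path match, and polling for other users' edits is left out.

```javascript
// Hypothetical editor-side store: one in-memory document, a callback that
// stands in for the save request, and per-path change listeners.
class EditorStore {
  constructor(document, sendToServer) {
    this.document = document;
    this.sendToServer = sendToServer;
    this.listeners = new Map(); // path string -> Set of callbacks
  }

  subscribe(path, callback) {
    if (!this.listeners.has(path)) this.listeners.set(path, new Set());
    this.listeners.get(path).add(callback);
    return () => this.listeners.get(path).delete(callback); // unsubscribe
  }

  set(path, data) {
    const keys = path.split('.');
    const last = keys.pop();
    let node = this.document;
    for (const key of keys) {
      // Create intermediate objects as needed.
      if (node[key] === null || typeof node[key] !== 'object') node[key] = {};
      node = node[key];
    }
    node[last] = data; // 1. update the local copy
    this.sendToServer({ path, data }); // 2. ask the server to persist it
    this._notify(path, data); // 3. tell interested UI code
  }

  _notify(path, data) {
    const callbacks = this.listeners.get(path) || [];
    callbacks.forEach((callback) => callback(data));
  }
}

// Usage: record outgoing requests and notifications instead of doing real I/O.
const sent = [];
const seen = [];
const store = new EditorStore({ elements: {} }, (request) => sent.push(request));
const unsubscribe = store.subscribe('elements.button1.text', (value) => seen.push(value));
store.set('elements.button1.text', 'Sign up');
unsubscribe();
store.set('elements.button1.text', 'Log in');
```

After the second set, two save requests have gone out but the listener has fired only once, because it unsubscribed in between.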

As you might imagine, the code for EditorJSON is significantly more complicated than the code for RuntimeJSON, and it’s not quite as performant. But because they share the same JSONBase interface, most of our code doesn’t need to know which of the two it is dealing with: we can write code compatible with either, and then gain the performance of RuntimeJSON when running an app (we call this “run mode”), or the flexibility of real-time data updates in the Bubble visual editor (“edit mode”).

ServerJSON, which is used in run mode on the Bubble servers, is somewhere in between RuntimeJSON and EditorJSON. We load data on the servers in order to build the HTML for pages, to calculate what chunks of the application tree we need to send to the web browser, and to execute workflows that run on the server.

So, like EditorJSON, ServerJSON needs to load data on the fly as we need it, since at any time our server could need to execute any part of any user’s app. In contrast to EditorJSON, though, performance is a top priority, since when running an app, we want to make things as fast as possible. Luckily, we can make it simpler and faster than EditorJSON by not worrying about the app owner making changes. While executing a single page or workflow, we want to present a consistent view of the application, so we neither need nor want to take changes into account.
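One simple way to get both properties, load on demand but ignore edits during a run, is a cache scoped to a single page render or workflow execution. The sketch below is an assumption about how that could work, not a description of ServerJSON: `RunContext`, `fetchFromDatabase`, and the sample data are invented, and a cache like this only freezes nodes that have already been loaded.

```javascript
// Stand-in for the database; the app owner can change it at any time.
const database = { 'styles.heading': { fontSize: 24 } };

let fetchCount = 0;
function fetchFromDatabase(path) {
  fetchCount += 1;
  const value = database[path];
  // Deep-copy so later edits to the database cannot leak into loaded data.
  return value === undefined ? undefined : JSON.parse(JSON.stringify(value));
}

// One context per page render or workflow run (a hypothetical design).
class RunContext {
  constructor() {
    this.cache = new Map();
  }
  load(path) {
    if (!this.cache.has(path)) {
      this.cache.set(path, fetchFromDatabase(path)); // first access: fetch lazily
    }
    return this.cache.get(path); // later accesses: same snapshot
  }
}

const run = new RunContext();
const before = run.load('styles.heading').fontSize;
database['styles.heading'] = { fontSize: 48 }; // the owner edits the app mid-run
const during = run.load('styles.heading').fontSize;
const nextRun = new RunContext().load('styles.heading').fontSize;
```

The in-flight run keeps seeing the value it first loaded and never re-fetches it, while a run started after the edit picks up the new value.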

JSONBase is a testament to the power of abstraction. By wrapping a layer around the raw JSON documents that represent the AST of a Bubble application, we’re able to write code that manages multi-megabyte documents, loading only the parts into memory that we actually need, without the complexity of that enterprise leaking its way into the rest of our codebase.

JSONs aren’t the only level of abstraction in the Bubble language interpreter. A raw AST is all fine and good, but by itself, we don’t know what all those JSON trees mean. You can read about that in our next post in the series, Trees in the Clouds, Part II.

Found this interesting? We’re always looking for great engineers to join us!

Originally published at blog.bubble.is on May 2, 2018.
