turtleDB: A JavaScript Framework for building offline-first, collaborative web apps

Building an offline-first, collaborative web app can be done in 2 easy steps:

npm install turtledb
npm install tortoisedb

Done!

Before we get into what turtleDB and tortoiseDB are…

Offline first AND collaborative? What does that even mean?

In this day and age, those of us who live in modern, densely populated cities like SF, NYC, and Toronto take for granted that we have internet access at our fingertips almost everywhere we go.

However, there are still times when you lose your connection and there’s nothing you can do about it. Bad Wi-Fi in your hotel room? Getting on a plane for a 10-hour flight? You get the idea. And that’s before we even consider parts of the world where internet access is unreliable to begin with.

Look familiar? It’s everybody’s favourite dinosaur from Chrome’s offline page! OK, that game is pretty fun, but I’m willing to bet you’d rather have your site actually load.

Welcome to the world of offline-first

When you’re browsing a website or using an app, a request must be sent to a server somewhere in order to retrieve your desired content. This is the underlying principle of the traditional client-server model.

Although this is what most users are accustomed to, the model has flaws, the most significant being: how can an app built on the client-server model function without an internet connection? It can’t! This is the problem an offline-first approach attempts to solve.

Instead of querying the server first, offline-first web applications request data that’s stored locally on your machine, in your browser. Likewise, responses from the server that would normally change an app’s data modify your local data first, and the local store then serves those changes up to the app.
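To make that data flow concrete, here is a minimal sketch. It assumes a hypothetical `LocalFirstStore` class, with a plain `Map` standing in for in-browser storage such as IndexedDB; it is not turtleDB’s API. The app only ever reads from the local store, and data arriving from the server is written there first, before the UI sees it.

```javascript
// Hypothetical local-first store: a Map stands in for in-browser storage (e.g. IndexedDB).
class LocalFirstStore {
  constructor() {
    this.docs = new Map();      // local copy of the data; reads never wait on the network
    this.listeners = new Set(); // UI callbacks that re-render when local data changes
  }

  // Reads are always served from local storage, online or offline.
  get(id) {
    return this.docs.get(id);
  }

  // Local edits are written locally first; syncing to a server can happen later.
  put(doc) {
    this.docs.set(doc._id, doc);
    this.notify(doc);
  }

  // Data arriving from the server also lands in local storage first,
  // and the app picks it up from there rather than from the response itself.
  applyRemote(doc) {
    this.put(doc);
  }

  subscribe(listener) {
    this.listeners.add(listener);
    return () => this.listeners.delete(listener);
  }

  notify(doc) {
    for (const listener of this.listeners) listener(doc);
  }
}

// Usage: the UI subscribes to the local store, not to the network.
const store = new LocalFirstStore();
const rendered = [];
store.subscribe((doc) => rendered.push(doc.name));
store.put({ _id: "1", name: "Bob" });           // works with no connection
store.applyRemote({ _id: "2", name: "Alice" }); // server data takes the same local path
console.log(store.get("2").name); // prints "Alice"
console.log(rendered);            // prints [ 'Bob', 'Alice' ]
```

Because both local edits and server responses go through the same write path, the UI code does not need to know whether it is online.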

How can you tell if a web app is offline-first? Short answer: you can’t. A well-designed offline-first app feels no different from a traditional app. All the magic happens behind the scenes: instead of relying solely on the client-server model we’re all used to, offline-first apps take advantage of technologies built into the browser so users experience a seamless transition from online to offline.

Since we’re talking specifically about web apps here, we need a way to store data locally in the browser. The 3 main options you have are:

LocalStorage — around 5MB limit, can only store strings
WebSQL — deprecated
IndexedDB (IDB) — pretty much your only choice for document storage

If we don’t sound particularly enthusiastic about IDB, it’s because not many people are.

“IndexedDB API is powerful, but may seem too complicated for simple cases.” — MDN

That’s right: this quote comes directly from the MDN website. MDN effectively admits that IDB is a pain to work with, and boy is that true. Asynchronous JavaScript already gives people headaches, and because IDB is event-driven rather than promise-based, it takes about 30 lines of tedious code just to insert your first piece of data.

Introducing turtleDB

In the comparison image, the left side shows what typical native IDB code may look like to insert “Bob” into the database; turtleDB can do the same in one line (right).

We really believe that the complexity of IDB is a big obstacle for the offline-first world, and it was one of the driving factors behind building turtleDB. We want developers to be able to create offline-first applications without learning all the quirks of IDB. So we wrapped IDB in a promise-based API (turtle-db.github.io/api) that feels more natural to work with and lets developers perform each CRUD operation in a single line.
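The heart of a promise-based wrapper like this can be sketched in a few lines. An `IDBRequest` reports its outcome through `onsuccess` and `onerror` handlers and exposes `result` and `error` properties, so a wrapper can resolve or reject a promise from those handlers. The helper name `promisifyRequest` and the `people` store are our own illustrations, not turtleDB’s actual internals; because Node has no IndexedDB, the usage drives the helper with a fake request object of the same shape.

```javascript
// Hypothetical helper: wrap an event-driven IDBRequest (or anything with the same
// shape: result, error, onsuccess, onerror) in a Promise.
function promisifyRequest(request) {
  return new Promise((resolve, reject) => {
    request.onsuccess = () => resolve(request.result);
    request.onerror = () => reject(request.error);
  });
}

// In a browser, with `db` an open IDBDatabase and "people" a hypothetical object
// store created with { autoIncrement: true }, an insert becomes one awaited line:
//   const key = await promisifyRequest(
//     db.transaction("people", "readwrite").objectStore("people").add({ name: "Bob" })
//   );

// Stand-in for an IDBRequest so the helper can run outside the browser.
// Like the real thing, it fires its handler asynchronously.
function fakeRequest(result, error = null) {
  const request = { result: undefined, error: null, onsuccess: null, onerror: null };
  setTimeout(() => {
    if (error) {
      request.error = error;
      request.onerror();
    } else {
      request.result = result;
      request.onsuccess();
    }
  }, 0);
  return request;
}

promisifyRequest(fakeRequest(1)).then((key) => console.log("inserted with key", key));
```

A fuller wrapper would also turn transaction completion (the transaction’s `complete` event) into a promise, which this sketch leaves out.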

So now that we can perform CRUD operations in the browser, who would actually use them?

Our target

We took some examples of common web apps and placed them on a chart according to how easy or difficult it would be to convert them to an offline-first model. In the “Easy” category are apps that don’t need data storage of any kind; all they need to work offline is some basic static asset caching. Next up are apps that require some kind of data persistence but aren’t collaborative in any way. We categorized these as “Medium” difficulty because you still need to work with some kind of in-browser storage solution.

“Hard” is where things get really interesting. Not only do these apps need to cache static assets and store data locally, they also need to send and receive data to and from other users once they get back online. Email (without extremely large attachments), turn-based games like chess, and project management apps such as Trello, Pivotal Tracker, and Basecamp could all be converted to an offline-first architecture. These are the kinds of applications we set out to make easier to build as offline-first.

Before moving on, we want to briefly mention the last class of apps on the chart. These would be pretty much impossible to convert to an offline-first architecture. Apps such as Twitch and Facebook can’t adopt an offline-first approach without losing some of their core functionality. Streaming, for example, requires a persistent and reliable internet connection. Add extremely large data sets on top of that, and there isn’t even enough disk space on the device to store that information. We put chat apps in this category too because, even though they could technically be offline-first, a disconnected chat app might as well be email.

turtleDB Architecture

Because turtleDB aims to be the solution for apps in the “Hard” category, we kept three major requirements in mind. First, it needed to be very user-friendly. If an API is clunky, nobody will use it (just look at IDB!). Second, we’re targeting collaborative apps, where multiple people work together on the same dataset; think chess, where two players share the same board. Lastly, working collaboratively on datasets introduces conflicts. We’ll get into this in a bit, but conflicts occur when clients “disagree” on what the dataset should be. Having already touched on how easy turtleDB is to use, let’s get into the second and third points.

Achieving Consistency

In order for an application to be collaborative, the state of that app must be shared across its network of users. How can multiple people work on the same document if they each see something different? That just wouldn’t work. This means collaborative apps must have a way of achieving consistency across their user base.

We wanted to push turtleDB far beyond a user-friendly API for IndexedDB. What would it take to let developers build collaborative applications that can also function offline? Above all, a way to keep a consistent view of the data across all users, even when some of them have been offline.

Bi-directional synchronization is how turtleDB solves this problem: each client pushes its changes to a shared remote server and pulls down everyone else’s changes in return. We wrote a lengthy paper on how we built this functionality, as it was by far one of the biggest engineering challenges we faced while working on turtleDB (turtle-db.github.io/about#synchronization). Without going into too much detail, here’s a quick demo of what a sync between turtleDB and MongoDB looks like:

Example: Synchronization
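Here is a heavily simplified sketch of bi-directional sync through a shared server. It assumes a hypothetical `Replica` class in which each document is a set of revisions keyed by made-up revision ids such as `"1-a"`; turtleDB’s real protocol (described in the paper linked above) is more involved. Syncing pushes the revisions the server lacks, then pulls the revisions the client lacks.

```javascript
// Hypothetical replica: each document is stored as a set of revisions keyed by
// revision id, so syncing never overwrites anything; it only copies what is missing.
class Replica {
  constructor(name) {
    this.name = name;
    this.docs = new Map(); // docId -> Map(revId -> { revId, body })
  }

  write(docId, revId, body) {
    if (!this.docs.has(docId)) this.docs.set(docId, new Map());
    this.docs.get(docId).set(revId, { revId, body });
  }

  // Every revision this replica has that `other` does not.
  missingFrom(other) {
    const missing = [];
    for (const [docId, revs] of this.docs) {
      const otherRevs = other.docs.get(docId) ?? new Map();
      for (const [revId, rev] of revs) {
        if (!otherRevs.has(revId)) missing.push({ docId, rev });
      }
    }
    return missing;
  }

  apply(changes) {
    for (const { docId, rev } of changes) this.write(docId, rev.revId, rev.body);
  }
}

// Bi-directional sync between one client and the shared server: push, then pull.
function sync(client, server) {
  server.apply(client.missingFrom(server)); // push local changes up
  client.apply(server.missingFrom(client)); // pull everyone else's changes down
}

// "I can create a document, you can create a document..."
const server = new Replica("server");
const me = new Replica("me");
const you = new Replica("you");
me.write("doc-a", "1-a", { title: "My doc" });
you.write("doc-b", "1-b", { title: "Your doc" });

sync(me, server);
sync(you, server);
sync(me, server); // I pick up your document on my next sync

console.log([...me.docs.keys()].sort());  // [ 'doc-a', 'doc-b' ]
console.log([...you.docs.keys()].sort()); // [ 'doc-a', 'doc-b' ]
```

Because revisions are only ever added, never overwritten, two independent edits to the same document would simply end up side by side after a sync, which is exactly where conflicts come from.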

With syncing in place, users can now use our framework to push changes to each other. I can create a document, you can create a document, and after we both sync, we’ll each have not only our own document but the other person’s as well. This is great! But what happens if we both update the same document independently? Whose changes do we ultimately see?

Conflicts

Working asynchronously and collaboratively introduces conflicts. Conflicts arise when two or more people working on the same document make independent, different changes to that same document.

Have you ever worked on a Google Doc with friends and “accidentally” deleted each other’s work? You could overwrite a letter, a word, or even a whole paragraph someone else wrote. That’s usually fine, because real-time collaborative apps have the luxury of providing live updates, so those changes can easily be spotted and undone.

In an asynchronous, offline-first setting, being able to overwrite other people’s work can be disastrous. Here’s what I mean.

Imagine you’re a project manager (PM) putting together boards for your team on a turtleDB-powered Trello, with the help of another PM.

*If you don’t know what Trello is, just imagine that you’re collaborating on a database with someone else: making inserts, updates, and deletes.

There’s a huge lightning storm in your area, and your internet goes down for 15 minutes. No problem! Thanks to turtleDB, Trello seamlessly transitions to offline mode, and you keep working without even noticing that you lost your connection.

But once you get back online and sync, you notice all the work you did was deleted. Where did all of it go? Was there an issue with Trello? You can’t figure out how or why. A few minutes later you get a message from the other PM saying he deleted the boards you created because they’re no longer needed. You’re outraged. All that work you put in is lost and the worst part is, you didn’t even have a say in it.

In other words, asynchronous changes can lead to a lot of lost work if they aren’t handled well. Thankfully, that was a made-up scenario, because turtleDB never lets this happen.

Instead, turtleDB ensures that when multiple clients sync with a common remote server, conflicts are not only surfaced but also easy to resolve. It does this by storing every change that every client has ever made (the document histories) and tracking all of those versions in a tree-like data structure, so all conflicting versions remain available.

Competing changes make branches in the tree. If one person deletes a version, the other can continue working on theirs.

As with synchronization, conflict resolution is a complicated process, which we describe in detail on our website (turtle-db.github.io/about#conflicts).

The short version is that because turtleDB keeps all document histories, your work is never lost. For example, if somebody were to delete the document you’re working on, you could keep working on it unless you also chose to delete it!
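Here is a minimal sketch of such a revision tree for a single document. It assumes CouchDB-style conventions that the post does not spell out: each revision records its parent, a deletion is stored as a revision flagged `deleted` (a “tombstone”), and the revision ids are made up. turtleDB’s actual structure is described on the website and may differ. Competing changes to the same parent become sibling leaves, so a delete on one branch leaves the other branch alive.

```javascript
// Hypothetical revision tree for ONE document. Each revision points to its parent;
// concurrent edits of the same parent become sibling branches instead of overwriting.
class RevisionTree {
  constructor() {
    this.revs = new Map(); // revId -> { revId, parent, body, deleted }
  }

  add(revId, parent, body, deleted = false) {
    if (parent !== null && !this.revs.has(parent)) {
      throw new Error(`unknown parent revision ${parent}`);
    }
    this.revs.set(revId, { revId, parent, body, deleted });
  }

  // Leaves are revisions that no other revision builds on.
  leaves() {
    const parents = new Set([...this.revs.values()].map((r) => r.parent));
    return [...this.revs.values()].filter((r) => !parents.has(r.revId));
  }

  // Leaves that are not deletions: the versions someone is still working on.
  liveLeaves() {
    return this.leaves().filter((r) => !r.deleted);
  }

  // More than one live leaf means competing edits that need resolving.
  hasConflict() {
    return this.liveLeaves().length > 1;
  }
}

const board = new RevisionTree();
board.add("1-a", null, { title: "Sprint board" });

// While you are offline, you keep editing the board...
board.add("2-mine", "1-a", { title: "Sprint board", cards: 12 });
// ...and the other PM deletes it. The delete is just another revision.
board.add("2-theirs", "1-a", null, true);

console.log(board.leaves().map((r) => r.revId));     // [ '2-mine', '2-theirs' ]
console.log(board.liveLeaves().map((r) => r.revId)); // [ '2-mine' ]: your branch survives

// You can keep building on your branch after the sync.
board.add("3-mine", "2-mine", { title: "Sprint board", cards: 13 });

// Two competing edits (not deletes) of the same parent leave two live leaves.
board.add("2-other", "1-a", { title: "Q3 board" });
console.log(board.hasConflict()); // true
```

Treating a delete as just another leaf is what makes “you could keep working on it” possible: nothing on your branch is removed when someone else deletes theirs.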

You might be thinking: “Wait... if you store all the document histories, won’t you run out of space?”

Scalability

The answer to that last question is a resounding “yes”. Avoiding a “last write wins” scenario and being able to resolve conflicts is a luxury that comes with a cost: disk space. But it probably won’t be a problem unless your app is dealing with huge amounts of data.

Typical storage capacities of modern mobile devices

This table shows common storage capacities of mobile devices and laptops after accounting for the operating system. We performed some rough calculations to determine how much space is available to someone using turtleDB (or any IDB wrapper, for that matter). Although the numbers in the right-hand column may seem large, a write-heavy application will eat through them quickly. We explain exactly how we arrived at these calculations here: https://turtle-db.github.io/about#idb-limits

This is what the disk usage of a write-heavy application could look like. Keep in mind that we’re only showing how a single document could scale.
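The arithmetic behind this kind of growth is simple if you assume that every revision stores a full copy of the document. The sketch below uses made-up inputs (a 1,000-byte document updated 100 times a day for a year), ignores IndexedDB’s own per-record overhead, and is not derived from the chart’s data; it only shows why the totals climb so quickly.

```javascript
// Illustrative estimate of one document's history size when every revision keeps a
// full copy of the document. Assumes no compaction and ignores IndexedDB's own
// per-record overhead; all inputs below are hypothetical.
function historyBytes({ docBytes, writesPerDay, days }) {
  return docBytes * writesPerDay * days;
}

// A 1,000-byte document updated 100 times a day for a year:
const bytes = historyBytes({ docBytes: 1_000, writesPerDay: 100, days: 365 });
console.log(`${bytes / 1e6} MB`); // prints "36.5 MB"

// The same usage pattern across 1,000 documents:
console.log(`${(bytes * 1_000) / 1e9} GB`); // prints "36.5 GB"
```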

For a large dataset with multiple collaborators, disk space can easily become a limiting factor. However, we’re willing to make this tradeoff because:

Having somebody overwrite your work is far more annoying than losing some storage space
We came up with a partial solution called compaction

Compaction is an optional tool that lets users free up disk space. If you have a long document history that you know will never be useful again, those old versions can be deleted with our compaction feature. This is comparable to clearing your browser history or, in programming terms, garbage collection.
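One way compaction could work is sketched below, assuming the same parent-linked history described in the Conflicts section: keep every revision id and parent link, so later syncs can still detect conflicts, but discard the bodies of revisions that have been superseded. This roughly mirrors how CouchDB-style databases compact; whether turtleDB’s compaction keeps the ids or deletes whole revisions is an assumption here, and the `compact` function is hypothetical.

```javascript
// Hypothetical compaction pass over one document's revision history.
// Each revision is { revId, parent, body }. Compaction keeps the tree's shape
// (ids and parent links) but drops the bodies of superseded (non-leaf) revisions.
function compact(revisions) {
  const parents = new Set(revisions.map((r) => r.parent));
  return revisions.map((r) =>
    parents.has(r.revId)
      ? { ...r, body: null } // interior revision: its contents are no longer needed
      : r                    // leaf revision: a current version, kept intact
  );
}

const history = [
  { revId: "1-a", parent: null, body: { title: "Draft", notes: "x".repeat(500) } },
  { revId: "2-a", parent: "1-a", body: { title: "Draft v2", notes: "y".repeat(500) } },
  { revId: "3-a", parent: "2-a", body: { title: "Final" } },
];

const size = (revs) => JSON.stringify(revs).length;
const compacted = compact(history);
console.log(compacted.map((r) => r.body));    // [ null, null, { title: 'Final' } ]
console.log(size(compacted) < size(history)); // true
```

Keeping every leaf intact means that compacting never discards a version someone is still working on, including the competing branches of an unresolved conflict.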

Final thoughts

Clearly you have an interest in the world of offline-first applications if you’ve made it this far. We believe this space has a ton of potential but is often overshadowed by trending topics such as blockchain and decentralized apps.

We spent a lot of time writing up our entire process of building turtleDB and the pitfalls we encountered. If you’d like to use, contribute to, or just read about our project, please check out turtle-db.github.io.

The team behind turtleDB

We’re scattered across North America, so turtleDB was built entirely remotely. If you have opportunities or just want to chat, don’t hesitate to get in touch!

Steven Shen — LinkedIn
Max Appleton — LinkedIn
Andrew Houston-Floyd — LinkedIn

Inspiration

Finally, we want to give a shout-out to the projects that inspired us to embark on this journey. If you’re just looking to store data in the browser, check out these awesome IndexedDB libraries:

However, if you need something more powerful that can also give your app collaborative abilities, check out:

PouchDB (uses a CouchDB back-end)
Firebase (Google’s proprietary storage option)
and of course our very own, turtleDB!