G├ęza Radics

@holdfenytolvaj

Javascript: experience of migrating from MongoDB to PostgreSQL

Background

Our project (labcup.net) started as a hobby weekend project and as it is.. naturally a good place to try new technologies, this is how we chose MongoDB circa 2 years ago. The project has grown to a full time work and started to have paying customers as well. Though the experimenting was a right decision back at the time, MongoDB was is clearly the wrong tool for the task now (we have relatively small databases with few users but strong need for consistency).
We knew it but had to keep on moving till this summer, when we spent so much time on discovering and fixing database inconsistencies that we could not live with it longer, we had to act.
Note: that inconsistencies arise not from lack of tests, but lack of type checks (or silent conversion), lack of transactions and from wrong assumptions; — unless you check the code, you can only guess about the structure and types.
Note: the size of the project is around 60K LOC (without comments, according to cloc) and it is typescript only, so it is definitely not a small project.
I also have to mention here, that typescript helped a lot on the type checks and structure definitions but obviously more was needed.
Note: there were other issues as well with MongoDB. E.g.:
- the ObjectId occupied 32 chars which is a lot to send over and store on client side (we cache data on the browser’s indexedDb).
- not the biggest issue, but annoying and error prone to always convert the ObjectId for queries (e.g. if string was used for the query, it just silently returned with no result)
- some of the mysterious inconsistencies were probably due to surprising index usage, where not all records has been returned. (see here https://engineering.meteor.com/MongoDB-queries-dont-always-return-all-matching-documents-654b6594a827#.j1n0zyk2d )

Why PostgreSQL?

Previously we both had experience with MySQL/MariaDb, but PostgreSQL has better support for JSON columns and operators, yet alone this is enough reason to choose it, but posts like (http://www.wikivs.com/wiki/MySQL_vs_PostgreSQL) also leaved us with very good impression on Postgre that we want to dive into more. And last but not least I’m not expecting Oracle to put much effort to develop MySQL in long term. So PostgreSQL was a natural choice for our purpose.

Which js library for PostgreSQL?

As we both were biased against ORMs, mostly because
- they tend to be over-engineered
- they usually don’t support extra features (because of the least common denominator)
- usually you have to learn some new syntax instead of using pure SQL
- usually you have to predefine many things, that is a lot of overhead
- it’s is not really an abstraction since you have to be good of both the database and the ORM, in this way it just stands on the way.

… and we won’t change database again.

What we were looking for is a simple library that with ease covers 95% of the queries (the rest can be done with SQL). It should not use callback, at least support Promises, but async-await would have been even more awesome.
Now pg.js is supporting async-await, but as it is, it was too simple.
Massive.js on the other hand looked just as what we want. Unfortunately when we started to use it, more and more limitation has surfaced. E.g. no transaction, no pooling, every query open and close a connection, no logger, it only guess about JSON types that fails when you insert an empty array []. It’s primary to use either relational or fully JSON column tables. 
After few days of playing around we have decided to write our own library. With full typescript and async-await support built on pg.js and as a good artist s̶t̶e̶a̶l̶ borrow some idea from Massive and ended up with writing pogi.

At first it looked like it will take longer to write our own handler lib, but we were actually faster to write directly what we need, rather than work around issues.

How long it took?

All together it was around 2 and a half man-months. Around 1 month for conversion, and 1 more for testing every bit of the application. (A lot of time was wasted on fixing inconsistencies also.)

The good thing is that with jsonb columns we could cut a lot of corner. Just had to convert the important fields (one with foreign keys, that need (special) checks, must or should exists, the one that can have special index etc.) the rest could go to jsonb column so basically needed no conversion from MongoDB.

As for the queries, the syntax doesn’t differ much from MongoDB, so it was trivial to change almost everything, except the join queries. But it was a joy to delete a lot of code and replace it with a small elegant join.

To be fair, it was also a good time to work on some technical depts. Since full test was carried out anyway, so few days went on there as well.

What do you like/dislike most from PostgreSQL so far?

Definitely like the:
- jsonb columns naturally
- partial and transformed indexes, e.g.:

CREATE UNIQUE INDEX “magical_idx” ON “users” (LOWER(“name”), NULLIF(“removed”, true));

So it makes impossible to create two users with only case differences, but if the user is “removed”, you can reuse the name. (Note: this is only an example, not necessarily a good design.)
- array type, principally combined with the enumerations, it just feels natural.

Don’t like:
- add the created/updated timestamp was easy in MySQL –could be added to the table definition- here it’s a bit more trouble to add triggers for every table
- the complex type feels a bit like hack, without foreign key check, and troublesome queries, I just don’t feel worth using them.

What were the tricky parts?

  • in MongoDB you have unique keys (uuid), which is hard guessable and makes brute force scans a bit harder. Fortunately there is an easy way to create non-guessable indexes in PostgreSQL, so you can have unique keys across tables with randomness. By defining the column as:
“id” varchar PRIMARY KEY NOT NULL DEFAULT (‘us’ || nextval(‘users_id_seq’)::text || (LPAD(floor(random()*1000)::text, 3, ‘0’))),

In this way every table has a unique prefix (‘us’ for ‘users’) then a serial id and finally a random number at the end to make iterating over users/files or other things harder (or impossible with blocking in place). This can be even shortened by converting the number to a base where all letters are used (https://www.postgresql.org/message-id/482B7FAF.7000902%40lorenso.com).

  • type conversion
    With full json document store there is no issue with conversion (only the ObjectId is troublesome for MongoDB), but when you start to use two systems, it is inevitable that few things will have different representation. E.g. the PostgreSQL bigInt can represent larger number precisely than the javascript. In our library, we have solved this in the way, that if the number is larger than the Number.MAX_SAFE_INTEGER, then we throw an exception but otherwise convert the number. This is a fine compromise for 95% of cases, in the rest one need to handle the conversion on it’s own. 
    For handling arrays you also have to choose:
    - use json(b) column, but then you have to validate the manage data consistency yourself 
    - use PostgreSQL e.g. enumeration, string, number or complex type array
    For the second case lib automatically convert the first 3 basic types, but for complex type it would need some additional work (either convert it to JSON in PostgreSQL or parse the string representation in javascript).
  • delete vs set to null
    For MongoDB (or more like for JSON), you are operating on objects, e.g. deleting a field really means to delete the field from the object:
    delete record.price;
    then send the whole record to the server for overwriting the previous.
    But for relational database, since you only overwrite what has changed, you have to set the field to null. Overall nothing big, but this was a small fundamental difference, that needed some special care.

And that is all, it was easier to change than we expected first (and why we kept delaying), it was just hard to find the time to completely stop development and fully focus on this huge technical dept (as we were not the target audience for MongoDB). As expected now we have transaction support, many queries became faster and have a lot of type checks that also serves as documentation …

Topics of interest

More Related Stories