Peter Cooper

@peterc

A Look at Ten New Database Systems Released in 2017

December 28th 2017

As editor of Database Weekly, a weekly newsletter on what’s new in the world of databases and data storage generally, I enjoy poking around new database systems and seeing what ideas might end up affecting everyday developers in the decades to come.

The database world isn’t packed with mind-bending announcements on a weekly basis, but over the course of a year it never fails to surprise me how many new things we do see, and how unrelenting the progression is. 2017 was no exception, so I want to reflect on some of the interesting new releases, including a transactional graph database, a geo-replicated multi-model database, and a new high-performance key/value store.

TimescaleDB — A Postgres-based Time-Series Database with Automatic Partitioning

One of several exciting new extensions for PostgreSQL, Timescale is Apache 2 licensed but backed by a PhD-packed startup.

Timescale adds time-series storage features to Postgres with automatic partitioning, all wrapped up in the usual Postgres interface and tooling. Queries are performed using regular SQL against a “hypertable” that provides an interface to the time-series data.
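
To make the hypertable idea a little more concrete, here is a toy Python sketch of time-based partitioning. It is not Timescale's implementation or API: the `ToyHypertable` class, its one-day chunk interval, and its method names are all made up for illustration. It only shows the general idea that writes are routed to a chunk derived from their timestamp, and that a time-range query needs to look at just the chunks overlapping that range.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


class ToyHypertable:
    """Hypothetical toy model: one logical table, many time-bucketed chunks."""

    def __init__(self, chunk_interval=timedelta(days=1)):
        # Assumed chunk size; chosen here only for illustration.
        self.chunk_interval = chunk_interval
        self.chunks = defaultdict(list)  # chunk index -> list of (ts, value)

    def _chunk_index(self, ts):
        # Number of whole intervals between the epoch and the timestamp.
        return (ts - EPOCH) // self.chunk_interval

    def insert(self, ts, value):
        # The caller never picks a partition; it is derived from the timestamp.
        self.chunks[self._chunk_index(ts)].append((ts, value))

    def query(self, start, end):
        """Return rows with start <= ts < end, scanning only overlapping chunks."""
        rows = []
        for idx in range(self._chunk_index(start), self._chunk_index(end) + 1):
            for ts, value in self.chunks.get(idx, []):
                if start <= ts < end:
                    rows.append((ts, value))
        return sorted(rows)


# Three days of hourly readings end up in three separate chunks.
table = ToyHypertable()
t0 = datetime(2017, 12, 1, tzinfo=timezone.utc)
for hour in range(72):
    table.insert(t0 + timedelta(hours=hour), float(hour))

# Only the chunks overlapping the second day are scanned for this query.
day_two = table.query(t0 + timedelta(days=1), t0 + timedelta(days=2))
```

In the real thing you would write ordinary SQL against the hypertable and let the extension handle the chunking; the sketch is only meant to show why partitioning by time keeps range queries cheap.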

Microsoft Azure Cosmos DB — Microsoft’s Multi-Model Database

Cosmos DB is essentially a rebranding and redevelopment of Azure’s older DocumentDB that makes it easy to globally distribute data across Azure’s various datacenters. Global distribution is Cosmos DB’s killer feature, and it’s possible to have database requests routed to the nearest region containing your data with no config changes.

The “multi-model” part is also significant. While everything’s schema-less JSON under the hood, there’s an SQL API to query it, as well as a MongoDB API, Cassandra API, and even a graph database API (based on Gremlin).

One of the better ways to learn more about Cosmos is the 15-minute video introduction on Microsoft’s Channel 9.

Cloud Spanner — Google’s Globally Distributed Relational Database

Google’s Cloud Spanner has been in the works for a long time, being first publicly explained in a pretty interesting 2012 academic paper (though development began in 2007). It was initially developed because Google needed a globally-distributed, high-availability storage system for itself, but it’s now available to the public too.

Google recognizes that the features that make Cloud Spanner suitable for its own purposes make it attractive to enterprises too, so it promises 99.999% availability, no planned downtime, and “enterprise-grade” security.

Supporting ANSI 2011 SQL, Cloud Spanner brings a battle-tested, high-availability, horizontally-scaling relational database to developers already familiar with relational database concepts.

Neptune — Amazon’s Fully Managed Graph Database Service

We’ve covered Microsoft and Google, so why not Amazon? Neptune is another database limited to a specific cloud, and a preview of it was announced at Amazon’s recent re:Invent conference.

Neptune promises to be a fast and reliable graph database service, designed to quickly bring the insights graph databases can offer to developers without any of the operational headache. For a price, of course.

Neptune supports two standards for querying your graphs: Gremlin, which is increasingly seen as the “gold standard”, and SPARQL (where your graph is treated as RDF).
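
If you haven't used a graph query language before, this toy Python sketch may help show what a traversal expresses. It does not talk to Neptune or use any Gremlin library: the tiny in-memory graph and the `has`, `out`, and `values` helpers are hypothetical, named only to mirror the steps of a Gremlin traversal along the lines of `g.V().has('name', 'alice').out('knows').values('name')`.

```python
# Toy property graph (made-up data): vertices with a name, and directed,
# labeled edges stored as (source, label, destination) tuples.
vertices = {1: {"name": "alice"}, 2: {"name": "bob"}, 3: {"name": "carol"}}
edges = [(1, "knows", 2), (1, "knows", 3), (2, "knows", 3)]


def has(key, value):
    """Select the ids of vertices whose property `key` equals `value`."""
    return [vid for vid, props in vertices.items() if props.get(key) == value]


def out(vertex_ids, label):
    """Follow outgoing edges with the given label from the given vertices."""
    return [dst for src, lbl, dst in edges if src in vertex_ids and lbl == label]


def values(vertex_ids, key):
    """Read one property from each vertex."""
    return [vertices[vid][key] for vid in vertex_ids]


# "Who does alice know?" expressed as a chain of traversal steps.
friends = values(out(has("name", "alice"), "knows"), "name")

# Two hops: "who do the people alice knows know?"
friends_of_friends = values(
    out(out(has("name", "alice"), "knows"), "knows"), "name"
)
```

The appeal of a graph database is that multi-hop questions like the second one stay this short, rather than turning into a chain of self-joins.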

YugaByte — An Open Source, Cloud-Native Database

YugaByte popped out of “stealth mode” this year and offers a database that supports both SQL and NoSQL modes of operation. Aimed directly at use in the cloud, it’s designed to “serve as the stateful complement to containers”.

Open source and built with C++, it supports Cassandra’s query language (CQL), as well as the Redis protocol. Support for the PostgreSQL protocol is on the way, and Spark apps can run on top of it.

YugaByte is another startup-backed effort (founded by engineers who scaled the Apache HBase platform at Facebook, no less) with the business model being an “enterprise edition” that sits alongside the open source, community edition and adds features like multi-cloud cluster orchestration, monitoring and alerting, tiered storage, and support.

Peloton — A “Self Driving” SQL DBMS

Peloton explores some interesting ideas, particularly in the area of using AI to automatically optimize the database. It also has support for byte-addressable NVM storage technology and is Apache-licensed open source.

The idea behind “self driving” databases is that it could be possible for a DBMS to operate and tweak itself autonomously. It could predict workload trends and prepare itself accordingly, without a DBA or ops people at the helm.

Perhaps unsurprisingly, Peloton stems from an academic project (from Carnegie Mellon, specifically) and one of its creators wrote an extensive article about why it was built. It’s been in development for a few years but has become more open in 2017.

JanusGraph — A Java-based Distributed Graph Database

JanusGraph is a practical, ready-to-go database with lots of integrations baked in and built on the solid foundation of TitanDB. It’s optimized for scalability and storing and querying huge graph databases while supporting transactions and a high number of simultaneous users.

It can use Cassandra, HBase, Google Cloud Bigtable, or BerkeleyDB as its storage backend, and can integrate with Spark, Giraph, and Hadoop out of the box. It even supports full-text and geolocation searches by integrating with Elasticsearch, Solr, or Lucene.

Aurora Serverless — An Instantly Scalable, “Pay As You Go” Relational Database on AWS

Another announcement from Amazon’s re:Invent conference was a serverless version of their successful Aurora database service, Aurora Serverless.

Plugging neatly into the recent trend of “serverless” platforms that promise to rid you of your scaling and operations headaches forever, the idea behind Aurora Serverless is that many database use cases don’t require a consistent level of performance or usage and that, instead, you could “pay as you go” (on a second-by-second basis) for a database that scales as you need it to.
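
A quick back-of-the-envelope sketch shows why per-second billing matters for spiky workloads. Every number here is an assumption for illustration: the price per capacity-unit-hour, the capacity, and the usage pattern are made up and are not Amazon's actual pricing.

```python
SECONDS_PER_HOUR = 3600


def per_second_cost(active_seconds, capacity_units, price_per_unit_hour):
    """Cost when you are billed only for the seconds the database is active."""
    return active_seconds * capacity_units * price_per_unit_hour / SECONDS_PER_HOUR


def always_on_cost(hours, capacity_units, price_per_unit_hour):
    """Cost of keeping the same capacity provisioned around the clock."""
    return hours * capacity_units * price_per_unit_hour


# Hypothetical workload: busy for two hours a day over a 30-day month.
PRICE = 0.10          # assumed price per capacity unit per hour
UNITS = 2             # assumed capacity while active
active_seconds = 30 * 2 * SECONDS_PER_HOUR

pay_as_you_go = per_second_cost(active_seconds, UNITS, PRICE)
provisioned = always_on_cost(30 * 24, UNITS, PRICE)
savings_ratio = provisioned / pay_as_you_go
```

Under these made-up numbers the always-on database costs twelve times as much, simply because it is idle for 22 hours a day. A workload that is busy around the clock would see no such gap.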

It’s currently just in preview but promises to be a big deal in 2018.

TileDB — Storage of Massive Dense and Sparse Multi-Dimensional Arrays

TileDB, which started life at MIT and Intel, is a database designed for storing multi-dimensional array data, a requirement commonly found in areas like genomics, medical imaging, and financial time series.

It supports numerous compression mechanisms (such as gzip, lz4, Blosc, and RLE) and storage backends (such as GFS, S3, and HDFS).

Memgraph — A High Performance, In-Memory Graph Database

The motivation behind Memgraph is to provide a tool for rapidly analyzing and using the data that comes from artificial and machine intelligence and the increasing interconnectivity of devices and IoT. The priorities, therefore, are “speed, scale, and simplicity.”

It’s still early in Memgraph’s life and it isn’t open source but it can be downloaded by request. It supports the openCypher graph query language, supports in-memory ACID transactions, and has a disk-based persistence mechanism.

How to Keep Up to Date in Future

Enjoy this roundup? I’m doing it every week in Database Weekly, a weekly newsletter devoted to the world of databases and looking at what’s new, what’s on the horizon, and what’s getting updated.
