As editor of Database Weekly, a weekly newsletter on what’s new in the world of databases and data storage generally, I enjoy poking around new database systems and seeing what ideas might end up affecting everyday developers in the decades to come.
The database world doesn’t deliver mind-bending announcements every week, but over the course of a year it never fails to surprise me how many new things we do see, and how unrelenting the progress is. 2017 was no exception, so I want to reflect on some of the year’s interesting releases, including a transactional graph database, a geo-replicated multi-model database, and a new high-performance key/value store.
Timescale adds time-series storage features to Postgres, with automatic partitioning wrapped up in the usual Postgres interface and tooling. Queries are performed using regular SQL against a “hypertable” that provides an interface to the time-series data.
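The easiest way to grasp automatic partitioning is as bucketing rows by time. Here’s a minimal pure-Python sketch of the idea — not Timescale’s actual implementation, and the one-hour chunk interval and row format are my own stand-ins:

```python
from datetime import datetime, timedelta

CHUNK_INTERVAL = timedelta(hours=1)  # assumed partition width

def chunk_for(ts: datetime, interval: timedelta = CHUNK_INTERVAL) -> datetime:
    """Floor a timestamp to the start of its chunk (like time_bucket() in SQL)."""
    epoch = datetime(1970, 1, 1)
    buckets = (ts - epoch) // interval  # timedelta floor-division -> int
    return epoch + buckets * interval

# Route incoming rows into per-chunk groups, mimicking automatic partitioning.
rows = [
    (datetime(2017, 12, 1, 10, 15), 21.5),
    (datetime(2017, 12, 1, 10, 45), 22.0),
    (datetime(2017, 12, 1, 11, 5), 20.9),
]
chunks = {}
for ts, temp in rows:
    chunks.setdefault(chunk_for(ts), []).append((ts, temp))

for start, members in sorted(chunks.items()):
    print(start.isoformat(), len(members))
```

The hypertable abstraction means you never address these chunks directly; you query one logical table and the planner prunes chunks for you.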
Timescale: an open-source time-series SQL database, fully compatible with Postgres and optimized for fast ingest and complex queries (www.timescale.com)
Cosmos DB is essentially a rebranding and redevelopment of Azure’s older DocumentDB but makes it easy to globally distribute data across Azure’s various datacenters. Global distribution is Cosmos DB’s killer feature, and it’s possible to have database requests routed to the nearest region containing your data with no config changes.
The “multi-model” part is also significant. While everything’s schema-less JSON under the hood, there’s an SQL API to query it, as well as a MongoDB API, Cassandra API, and even a graph database API (based on Gremlin).
One of the better ways to learn more about Cosmos is this 15-minute video introduction on Microsoft’s Channel 9:
Google’s Cloud Spanner has been in the works for a long time, first publicly described in a pretty interesting 2012 academic paper (though development began in 2007). It was initially built because Google needed a globally-distributed, high-availability storage system of its own, but it’s now available to the public too.
Google recognizes that the features that make Cloud Spanner suitable for its own purposes make it attractive to enterprises too, so it promises 99.999% availability, no planned downtime, and “enterprise-grade” security.
Supporting ANSI 2011 SQL, Cloud Spanner brings a battle-tested, high-availability, horizontally-scaling relational database to developers already familiar with relational database concepts.
Cloud Spanner: automatic sharding with transactional consistency at scale, billed as the only enterprise-grade, globally-distributed, and strongly consistent relational database service (cloud.google.com)
We’ve covered Microsoft and Google, so why not Amazon? Another database tied to a specific cloud, Neptune was announced in preview at Amazon’s recent re:Invent conference.
Neptune promises to be a fast and reliable graph database service, designed to quickly bring developers the insights graph databases can offer without any of the operational headache. For a price, of course.
Neptune supports two standards for querying your graphs: Gremlin, increasingly the gold standard, as well as SPARQL (which treats your graph as RDF).
YugaByte popped out of “stealth mode” this year and offers a database that supports both SQL and NoSQL modes of operation. Aimed directly at use in the cloud, it’s designed to “serve as the stateful complement to containers”.
Open source and built in C++, it supports Cassandra’s query language (CQL) as well as the Redis protocol. Support for the PostgreSQL protocol is on the way, and Spark apps can run on top of it.
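Supporting “the Redis protocol” means speaking RESP, the simple text framing Redis clients put on the wire. A quick sketch of how a client encodes a command — this is plain RESP, nothing YugaByte-specific, and the key and value are made up:

```python
def encode_resp(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    out = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        data = arg.encode()
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)

# Any server that parses RESP -- Redis itself or a compatible store --
# can accept this frame over a plain TCP socket.
frame = encode_resp("SET", "user:1", "alice")
print(frame)
```

Wire compatibility like this is the whole appeal: existing Redis client libraries work unchanged against the new backend.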
The YugaByte team
YugaByte is another startup-backed effort (founded by engineers who scaled the Apache HBase platform at Facebook, no less). The business model is an “enterprise edition” that sits alongside the open-source community edition and adds features like multi-cloud cluster orchestration, monitoring and alerting, tiered storage, and support.
Peloton explores some interesting ideas, particularly in the area of using AI to automatically optimize the database. It also has support for byte-addressable NVM storage technology and is Apache-licensed open source.
The idea behind “self driving” databases is that it could be possible for a DBMS to operate and tweak itself autonomously. It could predict workload trends and prepare itself accordingly, without a DBA or ops people at the helm.
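That “predict, then prepare” loop can be made concrete with a toy example. The exponentially weighted moving average forecaster and the threshold below are my own stand-ins for illustration, not Peloton’s actual models:

```python
def forecast_next(qps_history, alpha=0.5):
    """Exponentially weighted moving average: a (very) crude workload forecast."""
    estimate = qps_history[0]
    for qps in qps_history[1:]:
        estimate = alpha * qps + (1 - alpha) * estimate
    return estimate

history = [100, 120, 150, 200, 260]  # queries/sec over recent intervals
predicted = forecast_next(history)

# A self-driving DBMS would act on the prediction *before* the load arrives,
# e.g. pre-building indexes or migrating hot data to faster storage.
if predicted > 200:
    print("prepare for a spike: predicted", predicted, "qps")
```

The hard part, of course, is not the forecasting but choosing and safely applying the physical-design changes — which is where the machine learning research comes in.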
Perhaps unsurprisingly, Peloton stems from an academic project (from Carnegie Mellon, specifically) and one of its creators wrote an extensive article about why it was built. It’s been in development for a few years but has become more open in 2017.
JanusGraph is a practical, ready-to-go database with lots of integrations baked in, built on the solid foundation of TitanDB. It’s optimized for scalability, storing and querying huge graphs while supporting transactions and a high number of simultaneous users.
It can use Cassandra, HBase, Google Cloud Bigtable, or BerkeleyDB as its storage backend, and can integrate with Spark, Giraph, and Hadoop out of the box. It even supports full-text and geo searches by integrating with Elasticsearch, Solr, or Lucene.
Another announcement from Amazon’s re:Invent conference was a serverless version of their successful Aurora database service, Aurora Serverless.
Plugging neatly into the recent trend of “serverless” platforms that promise to rid you of scaling and operations headaches forever, Aurora Serverless is built on the idea that many database use cases don’t require a consistent level of performance or usage. Instead, you “pay as you go” (on a second-by-second basis) for a database that scales as you need it to.
It’s currently just in preview but promises to be a big deal in 2018.
TileDB is a database that started life at MIT and Intel, designed for storing multi-dimensional array data, a requirement commonly found in areas like genomics, medical imaging, and financial time series.
It supports numerous compression mechanisms (such as gzip, lz4, Blosc, and RLE) and storage backends (such as GFS, S3, and HDFS).
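Of those compressors, run-length encoding (RLE) is the simplest, and it shines on the sparse, repetitive data typical of large arrays. A toy version in pure Python, for illustration only:

```python
from itertools import groupby

def rle_encode(values):
    """Collapse runs of equal values into (value, run_length) pairs."""
    return [(v, sum(1 for _ in group)) for v, group in groupby(values)]

def rle_decode(pairs):
    """Inverse of rle_encode: expand each pair back into a run."""
    return [v for v, n in pairs for _ in range(n)]

cells = [0, 0, 0, 0, 7, 7, 0, 0, 0]  # one row of a mostly-empty array
packed = rle_encode(cells)
print(packed)
assert rle_decode(packed) == cells   # lossless round trip
```

Nine cells shrink to three pairs here; on genuinely sparse arrays the savings are dramatic, which is why array stores offer RLE alongside general-purpose compressors like gzip and lz4.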
The motivation behind Memgraph is to provide a tool for rapidly analyzing and using the data generated by artificial intelligence, machine learning, and the increasing interconnectivity of devices and the IoT. The priorities, therefore, are “speed, scale, and simplicity.”
It’s still early in Memgraph’s life, and it isn’t open source, but it can be downloaded on request. It supports the openCypher graph query language, offers in-memory ACID transactions, and has a disk-based persistence mechanism.
Memgraph: the high-performance, in-memory, transactional graph database (memgraph.com)
Enjoy this roundup? I do one every week in Database Weekly, a newsletter devoted to the world of databases: what’s new, what’s on the horizon, and what’s getting updated.