Database storage is a crucial aspect of any data-intensive application. How data is stored, organized, and accessed can have a significant impact on the performance, scalability, and reliability of the system. In this article, we will explore Log Structured Merge Trees that use SSTables and Memtables to provide fast and reliable storage. We will explore each component in turn and understand how they work together.
Before we explore how Cassandra stores data, let’s understand how it models data. Cassandra uses tables similar to SQL databases. It also supports an SQL-like language (CQL) that enables us to insert, update and delete rows. However, these tables are more akin to key-value stores when we consider how they are accessed.
Here is an example table for employee data.
Department (partition key) |
Join Year (cluster column) |
Name |
---|---|---|
Accounting |
2022 |
John |
Engineering |
2021 |
Jack |
Engineering |
2022 |
James |
Sales |
2020 |
Sam |
In any SQL database, a row is uniquely identified by a Primary Key.
In Cassandra and Scylla, a Primary Key also serves as the only way to access a row. A Primary Key in Cassandra can be either Simple or Composite.
Cassandra is a distributed database that runs on multiple nodes. The Partition Key’s hash determines which node (or nodes) will store the row.
With a Simple Primary Key (i.e., only a Partition Key), our table acts as a key-value store. We can retrieve the row’s contents by providing the key to Cassandra.
Select * from Employees where department = ‘Engineering’ and Year >2020 and Year < 2022;
In a distributed database like Cassandra, each node stores only a fraction of the table’s rows.
Next, let’s examine how Memtables and SSTables store these rows.
Memtables stand for Memory Tables. It is an in-memory data structure that holds data before it is flushed to disk as an SSTable. A Memtable is basically a hash map.
SSTables stands for Sorted Strings Table. It is a persistent file format that stores data on disk in a sorted and immutable way. An SSTable consists of multiple data blocks, each containing a set of key-value pairs. The keys are sorted in ascending order within each block, and the blocks are sorted by their first key. Each SSTable also has an index file that maps each key to its corresponding block location, allowing for fast lookups.
SSTables are created when data stored in memory (in Memtables) is flushed to disk periodically or when a certain memory threshold is reached. Once an SSTable is written to disk, it cannot be modified. Therefore, any updates or deletes on existing data are stored in a new SSTable. This means that there can be multiple SSTables for the same data, with different versions or timestamps.
An LSM tree is a tree or collection of Memtable and SSTables. The topmost level consists of a single Memtable. The second Level and below are one or more SSTables. As the levels grow so does the amount of SSTables or sizes of SSTables (this depends on the compaction strategy). The basic workflow is as follows:
When data is inserted into the LSM tree, the following happens.
As the number of SSTables grows, we see the following trends
Compaction helps to reduce these problems by merging two or more SSTables together. The resulting SSTable will only hold the newest copy of data. Since SSTables are sorted this process of merging is very efficient. Note here that we never alter an existing SSTable, as said before they are immutable. This means we can do compaction lock-free while the system is still using the SSTables.
There are many different compaction strategies. Size Tiered and Leveled compaction are two of the more popular strategies:
In Size Tiered Compaction Strategy the size of SSTable grows as we go down the levels. Simply once we have a fixed number of SSTables in level 1, they are all combined to create a larger SSTable in level 2. The SSTables in level 1 are deleted. When there is a predefined number of SSTables collected in level 2, they are again combined/compacted to create an even larger SSTable in level 3.
Compaction is a complex topic that we will not cover in detail in this article, but it is important to note that the choice of compaction strategy can affect the database performance significantly.
Log Structured Merge Tree, Memtables and SSTables are common database storage concepts that are used by some NoSQL databases, such as Apache Cassandra and ScyllaDB, to store data on disk and in memory, respectively. They offer a trade-off between performance and durability, while also enabling scalability and fault tolerance in distributed systems. However, they also pose some challenges that require careful design and implementation choices.
The lead image for this article was generated by HackerNoon's AI Image Generator via the prompt "a tree with computer fruits"