paint-brush
Dynamodb Partition Keys: How They Work Under the Hoodby@praveenmuthu22
147 reads

Dynamodb Partition Keys: How They Work Under the Hood

by Praveen MuthukumaranaAugust 8th, 2024
Read on Terminal Reader
Read this story w/o Javascript
tldt arrow

Too Long; Didn't Read

Dynamodb is a distributed system. This means that unlike in traditional Relational Database Management System (RDBMS), all the data is not stored in a single server. In a Dynamodb table, items are grouped into partitions. The partitions in a table are stored physically in different nodes (servers) within the region in which the table was created.
featured image - Dynamodb Partition Keys: How They Work Under the Hood
Praveen Muthukumarana HackerNoon profile picture

Primary Key

In a relational database, records are uniquely identified by the Primary Key. No two records can have the same primary key.


This concept is the same in Dynamodb as well. The Primary Key of a table is either the Partition key or a combination of the Partition key and the Sort key.


In the table given below, the student ID is the partition key. It is unique for each of the items.

student_id (Partition Key)

name

email

date_of_birth

cb15ffa1-5e86-4c51-b341

John Doe

[email protected]

2002-01-15

cbfe7f85-6b85-4d75-8030

Jane Smith

[email protected]

2000-05-22

2accafed-6a3e-4f10-b16e

Michael Johnson

[email protected]

2001-08-30

93b98d8b-6de3-452e-s

Emily Davis

[email protected]

1999-11-10

93b98d8b-6de3-452e-8

William Brown

[email protected]

2003-03-25

In the below example, we store the content of articles in this table. Since the article content is long, it is broken into multiple parts and stored in multiple table items to prevent exceeding the 400kb limit Dynamodb imposes on item size.

ArticleId (Partition Key)

Part (Sort Key)

Content

101

1

<p>Introduction to DynamoDB with examples and illustrations...</p>

101

2

<p>Setting up your DynamoDB environment with detailed steps...</p>

101

3

<p>Basic operations in DynamoDB such as <strong>CRUD</strong>...</p>

102

1

<p>Understanding different types of databases with pros and cons...</p>

102

2

<p>Benefits of NoSQL databases over SQL databases...</p>

103

1

<p>Advanced DynamoDB features like <em>global tables</em>...</p>

103

2

<p>Performance optimization tips and tricks in DynamoDB...</p>

A single article includes multiple items, so, the partition key cannot be unique. Therefore, a combination of the partition key and the sort key is used to uniquely identify each item. This is called a composite primary key.

Partitions

Dynamodb is a distributed system. This means that unlike in traditional Relational Database Management System (RDBMS), all the data is not stored in a single server. Instead, they are spread across multiple servers.


In a Dynamodb table, items are grouped into partitions. Think of partitions as boxes within the table into which each item is placed. The partitions in a table are stored physically in different nodes (servers) within the region in which the table was created.


To simplify, you put each item of the table into different partitions (think of them as boxes) and then keep those boxes in different locations.

How Dynamodb Stores Items

This is where the partition key comes into the picture. When you add a new item to the table, the partition key goes through a hash function. A hash function is an algorithm that takes in a value as an input and then outputs a hash code. In this case, the hash code is a fixed-size integer. When the same value is input into the hash function multiple times, it will always produce the same hash code.


Each partition in a table is associated with a hash code range.


To simplify, let’s say that a table has the below partitions associated with the given ranges.

  • Partition 1 — 100 - 200
  • Partition 2 — 201 - 301
  • Partition 3 — 302 - 402
  • Partition 4 — 403 - 503


We are adding this item to the table:

{
  "student_id": "cb15ffa1-5e86-4c51-b341-48d1aed0f84f",
  "name": "John Doe",
  "email": "[email protected]",
  "date_of_birth": "2002-01-15"
}

When the student_id is input to the hash function, the output is 352.


Since this is within 302 - 402, the item will go into Partition 3.

How Dynamodb Retrieves Items

Unlike relational databases, Dynamodb does not offer too much flexibility on how data is queried.


You can only query data using a Partition Key. For example, if you want to query John Doe from the Students table, you can only do so if you know his student_id. To query by any other attribute, you need to create a Global Secondary Index (discussed later). Or perform a scan operation which is computationally very expensive.


This limitation is because items in the table are stored in different partitions that are located at different nodes. Dynamodb needs to know the partition where the item is stored so that it only has to search within that partition. In contrast, performing a scan operation is going through all the items in the table to find the one you are looking for.


Suppose we are querying for John Doe using his student_id which is the partition key. First, Dynamodb will input the student_id into the hash function. The hash function will output the same hash code that was generated when the item was added to the table. By looking at the hash code, Dynamodb can know the partition in which the item is located.


Inside the partition, items are stored using a data structure called a B-Tree where the items are sorted by the partition key if it is unique. If we are using a composite primary key and the partition key is not unique, items will be sorted by the sort key. This allows Dynamodb to search the item within the partition very efficiently with time complexity of O(logn).

Pros and Cons of This Behavior

As mentioned earlier, the limitation of Dynamodb is that queries are not as flexible as in relational databases. When you are designing a Dynamodb table, you have to consider all querying patterns that will be used by your application and decide on the Primary Key pattern and indexes to enable them.


But on the upside, for these query patterns that are supported by your table keys, data can be fetched extremely efficiently compared to an RDBMS. This is because, unlike in an RDBMS, when the Dynamodb table grows in size, the underlying infrastructure can be scaled horizontally by spreading the data across more nodes. The query knows the precise location of each item using the partition key.


Therefore, there will be no change in performance or latency of the table as it grows in size.

How Partition Key Works in a Local Secondary Index (LSI)

Consider this table.

Department (Partition Key)

empId (Sort Key)

Name

Age

Location

DOJ

HR

1001

John Doe

30

New York

2020-05-15 21:35:51

IT

1002

Jane Smith

28

San Francisco

2019-08-10 01:20:55

Finance

1003

Mike Brown

35

Chicago

2018-03-22 08:49:11

Marketing

1004

Linda White

32

Seattle

2021-06-30 21:22:48

IT

1005

James Green

29

Austin

2017-11-12 07:14:55

HR

1006

Emily Davis

27

New York

2022-02-05 21:07:59

There is a requirement to query the employees in a specific department who joined after a specific date.


This can be achieved by creating an LSI with DOJ as the sort key.

Department (Partition Key)

DOJ (Sort Key)

empId

Name (Projected Attribute)

IT

2017-11-12 07:14:55

1005

James Green

Finance

2018-03-22 08:49:11

1003

Mike Brown

IT

2019-08-10 01:20:55

1002

Jane Smith

HR

2020-05-15 21:35:51

1001

John Doe

Marketing

2021-06-30 21:22:48

1004

Linda White

HR

2022-02-05 21:07:59

1006

Emily Davis

Note that the LSI has the same partition key as the associated item in the base table. Therefore, LSI items are stored in the same partition as their base table counterpart. LSI is the same items as the base table sorted differently. In this case, they are sorted by DOJ. Now, you can use this LSI to query for employees who joined within a specific date range.


Note that similar to the base table, the primary key of an LSI should be unique. There cannot be two employees who are from the same department and joined on the exact same date and time!

How Partition Key Works in a Global Secondary Index (GSI)

Consider the same table as above.

Department (Partition Key)

empId (Sort Key)

Name

Age

Location

DOJ

HR

1001

John Doe

30

New York

2020-05-15 21:35:51

IT

1002

Jane Smith

28

San Francisco

2019-08-10 01:20:55

Finance

1003

Mike Brown

35

Chicago

2018-03-22 08:49:11

Marketing

1004

Linda White

32

Seattle

2021-06-30 21:22:48

IT

1005

James Green

29

Austin

2017-11-12 07:14:55

HR

1006

Emily Davis

27

New York

2022-02-05 21:07:59

We have another requirement to query employees by their ID without knowing their department. This query cannot be performed using an LSI since we don’t know the partition key.


So, we create a GSI where the partition key is the empId.

empId (Partition Key)

Department

Name

Age

Location

DOJ

1001

HR

John Doe

30

New York

2020-05-15 21:35:51

1002

IT

Jane Smith

28

San Francisco

2019-08-10 01:20:55

1003

Finance

Mike Brown

35

Chicago

2018-03-22 08:49:11

1004

Marketing

Linda White

32

Seattle

2021-06-30 21:22:48

1005

IT

James Green

29

Austin

2017-11-12 07:14:55

1006

HR

Emily Davis

27

New York

2022-02-05 21:07:59

Since the partition key of the GSI is different from the base table, the items in the GSI will be stored in a partition different from its corresponding item in the base table.


Think of the GSI as a separate copy of the table (with limited attributes that you select from the base table). Same with the LSI, the primary key should be unique.


You can use the GSI to query employees by their ID.

Recap

  • Primary Key uniquely identifies items in a Dynamodb table. It can be just the partition key if that is unique or a combination of the partition key and sort key if the partition key is not unique.


  • Based on the partition key of a record, Dynamodb puts it into different partitions. Partitions in a table are stored in different nodes (servers) within the region where the table was created.


  • In queries, Dynamodb uses the partition key to know the partition in which the item is stored and hence the exact server where the item resides.


  • Within a partition, items are stored in a sorted order of partition key if it is unique or else the sort key. This allows Dynamodb to efficiently search for an item within the partition.


  • LSI is like a copy of the table sorted using a different attribute but items still have the same partition key and are stored in the same partition as the corresponding base table item.


  • GSI is also like a copy of the table but the items have a different partition key, and they are stored in partitions different to the corresponding items in the base table.