If there's one database that promises both performance and cost efficiency and actually delivers on both, it's Amazon DynamoDB. Single-digit millisecond latency, fully managed with automatic scaling, pay per use - DynamoDB is genuinely impressive. But here's the thing: hardly anyone talks about what happens when your data model doesn't fit DynamoDB's worldview, or when you discover a limitation (I'd rather say: understand DynamoDB's architecture better) three months into production that forces a complete redesign.
I've spent considerable time working with DynamoDB across various projects… some successful, some educational. Along the way, I've collected a list of things I desperately wish someone had told me before I started. These aren't just the basics that we can find in tutorials. These are some gotchas that surface when your table has a few million items and your access patterns have evolved beyond the original design. Let’s dive in.
1. Items & Attributes are NOT Rows and Columns
In DynamoDB, every record is an item, with attributes. If you come from a SQL background, it's very easy and tempting to map them to rows and columns. Even the table view in the AWS console looks like a SQL table, with rows and columns.
But they are NOT! And this fundamental difference shapes everything else about how DynamoDB works.
In SQL, a row is just… a row. It has a primary key, a unique identifier for that single row, and we can query it however we want. Need to find all rows that have year = 1994? Sure, just use a WHERE clause. Need to join two tables on a particular column? Simple.
But none of this is possible in DynamoDB. DynamoDB does have a primary key, which includes a partition key that gets hashed and used to decide which partition the item lives on. This isn't just a storage optimization detail we can ignore. It fundamentally constrains how we can access our data. We cannot query items without knowing their partition key, because DynamoDB literally doesn't know where to look.
In SQL databases, columns are uniform across rows: every row has the same columns (nullable or not), and we can add an index on any column. In DynamoDB, attributes are per item. Each item can have different attributes, and there is no enforced schema. This doesn't mean DynamoDB is similar to MongoDB or Couchbase, which are document databases. Attributes that aren't part of the key structure or a secondary index are essentially invisible to queries. And if you're thinking "what about GSIs?", hold on to that thought for a few minutes.
Why does this matter? Because every "limitation" I'm about to describe flows from this fundamental architecture. If you take one thing from this section, let it be this: stop thinking about DynamoDB as "SQL but NoSQL." It's a key-value store with some nice features. The moment you internalize that, the rest of its behavior starts making sense.
2. The Partition Key is Not Just a Primary Key
If you have a SQL background, it's tempting to think of DynamoDB's partition key as just another primary key. Pick something unique and you are done. Right?
But that’s so very wrong.
Here's what actually happens: when we write an item to DynamoDB, it takes the partition key value and feeds it through an internal hash function. The output of that hash determines which physical partition the data lands on. All items with the same partition key end up on the same partition, stored together in what is called an "item collection." This isn't just a storage detail, it fundamentally shapes what we can and cannot do with our data.
The limitations that bite:
Partition key queries require exact equality. We cannot do LIKE, CONTAINS, BEGINS_WITH, or range operations on partition keys. DynamoDB needs the complete partition key value to compute the hash and locate the partition. There's no negotiating here.
Sort key operations are limited too. We get =, <, >, <=, >=, BETWEEN, and BEGINS_WITH. Notably missing? CONTAINS, ENDS_WITH, and anything resembling SQL's LIKE.
Let me give an example of how this might become an issue. Say we're building an employee management system:
Table: employees
Partition Key: orgId
Sort Key: empId
This seems reasonable. However, if this table holds data from companies that vary widely in size, say one company with 5-10 employees and another with hundreds or thousands of employees, we already have partition skew.
To make things worse, what if empId is a non-sortable id, i.e., a UUID? Now we have many more restrictions. We cannot do any range-based queries to fetch just a subset of employees unless we know their full IDs. That is,
SELECT * FROM "employees" WHERE "orgid" = 'org#001' AND BEGINS_WITH("empid", '100')
vs
SELECT * FROM "employees" WHERE "orgid" = 'org#001' AND "empid" IN ('ca17b525-c67b-4351-bddc-724efba7f966', 'b2f78338-17c5-47d4-8fb0-4732446cb598', 'c0af48b2-1101-4eb5-ad4d-ccd535e4a7ff')
Why does this happen? Sort keys are stored in a B+ tree structure. DynamoDB can efficiently traverse ranges (slide from point A to point B) but cannot teleport to arbitrary non-sequential values. It's a fundamental architectural constraint, not a missing feature.
Now, with a skewed partition, if I need to find a couple of employee records in an organization that has 5,000 employees, and I don't have their empIds (the sort key in this case), I have to read all 5K items and filter them in the application for the 2 items I actually need.
This is why knowing the access patterns ahead of time and choosing the right keys is so crucial for a DynamoDB model. Best practices for managing such situations include write sharding to avoid hot partitions, and composite keys with prefixes such as STATUS#ACTIVE#USER#1001 based on anticipated access patterns. But the key point is: use DynamoDB only if it's the right fit for your project.
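To make write sharding a bit more concrete, here's a minimal Python sketch. The shard count, key layout, and helper name are all my own choices for illustration, not a prescribed scheme; the trade-off is that reading "all employees of an org" now fans out one query per shard and merges the results.

import hashlib

NUM_SHARDS = 10  # assumption: tune this to the write volume of your largest org

def sharded_partition_key(org_id: str, emp_id: str) -> str:
    # Derive a stable shard suffix from the employee id so the same item
    # always maps to the same shard, while one org's writes spread across
    # NUM_SHARDS partition key values instead of a single hot one.
    shard = int(hashlib.md5(emp_id.encode()).hexdigest(), 16) % NUM_SHARDS
    return f"{org_id}#SHARD#{shard}"

# Writes use sharded_partition_key(org_id, emp_id) as the partition key.
# A read of "all employees of org#001" issues one Query per shard
# (org#001#SHARD#0 ... org#001#SHARD#9) and merges the results.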
3. Filter Expressions are not a traditional WHERE clause
Remember how I said attributes that aren't keys are "invisible to queries"? Let's dig into what that actually means. DynamoDB has a feature called FilterExpression that lets us filter query results by any attribute. Sounds very similar to a WHERE clause, right? Except that it is not. Filter expressions are applied after DynamoDB reads the data, not before.
When we run a Query with a FilterExpression:
- DynamoDB uses the key condition to find matching items
- DynamoDB reads those items from storage (consuming RCUs)
- DynamoDB applies the filter
- DynamoDB returns only the filtered results
Steps 2 and 3 are the key insight. We pay for everything read in step 2, even if step 3 throws most of it away.
Query: Get orders for user123 where status = 'PENDING'
Key condition: userId = 'user123' (returns 10,000 orders)
Filter: status = 'PENDING' (matches 50 orders)
We pay for: 10,000 items read
We receive: 50 items
That's a 99.5% waste rate on read capacity. 😬
The 1MB page limit also applies before filtering. So if we're filtering a lot of data, we might need multiple pagination requests to get a small number of results. Each page reads up to 1MB, filters it down, returns maybe a handful of items, and we go back for another page.
This goes back to our fundamental point: attributes aren't columns. In SQL, a WHERE clause on any column can use indexes and query planning to minimize data read. In DynamoDB, if the attribute isn't in the key structure, we're doing a read-then-filter operation.
And this is where the composite keys I mentioned above generally come in. A sort key such as STATUS#PENDING#2024-01-15 bakes several filterable attributes directly into the key.
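Here's a small boto3 sketch of the difference, assuming a hypothetical orders table with userId as the partition key and a sort key modeled as STATUS#<status>#<date>; the table and attribute names are mine, not a prescribed schema.

import boto3
from boto3.dynamodb.conditions import Key, Attr

table = boto3.resource("dynamodb").Table("orders")  # hypothetical table

# Read-then-filter: consumes RCUs for every order of user123,
# then discards everything that isn't PENDING.
filtered = table.query(
    KeyConditionExpression=Key("userId").eq("user123"),
    FilterExpression=Attr("status").eq("PENDING"),
)

# Key-based: with a composite sort key like STATUS#PENDING#2024-01-15,
# the same question becomes a narrow key condition and only reads
# the items it actually returns.
keyed = table.query(
    KeyConditionExpression=Key("userId").eq("user123")
    & Key("sk").begins_with("STATUS#PENDING#"),
)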
4. Secondary Indexes Are Complete Data Copies
So far we've talked about querying with partition keys, filtering on attributes. The natural question becomes: "What if we need to query by a different attribute efficiently?" The answer is secondary indexes.
If we're used to SQL databases, we might think of indexes as lightweight pointers to the main table. A few extra bytes per row, nothing dramatic. DynamoDB's Global Secondary Indexes (GSIs) are a completely different beast. They're not pointers. They're not lightweight. They're entire separate tables with their own partition infrastructure.
Think about this table.
| orgId (PK) | empId (SK) | Name | Department |
|---|---|---|---|
| Org#001 | emp#001 | John | Engineering |
| Org#001 | emp#002 | Dave | Sales |
If you had to represent that as key-value pairs, the simplest way would be
{
"org#001+emp#001" = {'Name': 'John', 'Department': 'Engineering'},
"org#001+emp#002" = {'Name': 'Dave', 'Department': 'Sales'}
}
Now, if you have to query these employees by department, the only option is to create another map like the one below, isn't it?
{
"org#001+dept#engineering" = {'Name': 'John', 'id'': 'emp#001'},
"org#001+dept#sales" = {'Name': 'Dave', 'id': 'emp#002'}
}
This is what happens when we create a GSI in DynamoDB. While DynamoDB abstracts all these details and keeps the table simple and easy to use, behind the scenes the data is copied.
So, when we write to the base table, DynamoDB replicates the relevant attributes to every affected GSI. This means:
- Every write to the base table triggers writes to all GSIs that include that item
- Updating an indexed attribute costs 2 GSI writes — one to delete the old entry, one to insert the new one
- Storage is duplicated for every projected attribute across every GSI
- GSIs have their own throughput that needs to be provisioned (or paid for in on-demand mode)
Let's do some quick math that nobody tells us upfront:
Base table write: 1 WCU
+ GSI #1 (ALL projection): 1 WCU
+ GSI #2 (ALL projection): 1 WCU
+ GSI #3 (ALL projection): 1 WCU
= 4 WCUs for a single item write
Three GSIs with ALL projection means 4x the write costs. And if we update an attribute that's a key in one of those GSIs? That GSI alone costs 2 WCUs for the update. Surprise! 😬
I've seen teams add GSIs like they're free, then wonder why their DynamoDB bill tripled.
Some good practices for using GSIs:
- Use KEYS_ONLY or INCLUDE projection instead of ALL. Only project what's actually needed in queries against that index (see the sketch after this list).
- Audit GSIs regularly. Remove any that aren't being queried; they cost money even when unused.
- Design sparse indexes where possible. Only items with the GSI's key attributes get indexed, so we can filter out items that don't need to be in the index.
- Consider single-table design patterns before adding multiple GSIs. Sometimes a well-designed sort key eliminates the need for additional indexes entirely.
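As an illustration of the projection advice, here's a boto3 sketch of the employees table from earlier with a single "by department" GSI that projects only the Name attribute. The table and index names are hypothetical, and with on-demand billing the index needs no separate provisioned throughput.

import boto3

client = boto3.client("dynamodb")

# Base key: orgId / empId. One GSI keyed on orgId / department,
# projecting only what the "employees by department" screen needs.
client.create_table(
    TableName="employees",
    BillingMode="PAY_PER_REQUEST",
    AttributeDefinitions=[
        {"AttributeName": "orgId", "AttributeType": "S"},
        {"AttributeName": "empId", "AttributeType": "S"},
        {"AttributeName": "department", "AttributeType": "S"},
    ],
    KeySchema=[
        {"AttributeName": "orgId", "KeyType": "HASH"},
        {"AttributeName": "empId", "KeyType": "RANGE"},
    ],
    GlobalSecondaryIndexes=[
        {
            "IndexName": "by-department",
            "KeySchema": [
                {"AttributeName": "orgId", "KeyType": "HASH"},
                {"AttributeName": "department", "KeyType": "RANGE"},
            ],
            "Projection": {
                "ProjectionType": "INCLUDE",
                "NonKeyAttributes": ["Name"],
            },
        }
    ],
)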
5. Strongly Consistent Reads Don't Work on GSIs
Another key point that most people miss: Global Secondary Indexes only support eventually consistent reads. We cannot request strongly consistent reads from a GSI. Period.
If your code does something like:
QueryRequest request = QueryRequest.builder()
.tableName("MyTable")
.indexName("MyGSI")
.consistentRead(true) // This will fail!
.build();
DynamoDB will reject this with a validation error.
But it gets worse. GSI reads are also not monotonic. What does that mean? During replication lag, we can read a value, then read an older value, then read the newer value again:
Base table: Update item.status from "PENDING" to "COMPLETED"
GSI Read #1: Returns "COMPLETED" ✅
GSI Read #2: Returns "PENDING" 😱 Wait, what?
GSI Read #3: Returns "COMPLETED" ✅
This is completely valid DynamoDB behavior. The GSI is eventually consistent, and "eventually" means exactly that.
Why can't GSIs support strong consistency? Because GSIs live on completely separate partition infrastructure from the base table. Updates are replicated asynchronously. DynamoDB could theoretically coordinate this, but it would destroy the performance characteristics that make DynamoDB valuable.
6. Single Table Design: Powerful When Done Right, Painful When Not
If we've spent any time researching DynamoDB best practices, we've probably encountered the concept of "single table design." AWS documentation even states that we should "maintain as few tables as possible in a DynamoDB application."
But here's the thing: single table design has become one of the most misunderstood and misapplied patterns in the DynamoDB ecosystem. Some teams implement it brilliantly and reap massive benefits. Others cargo-cult the pattern without understanding the "why," ending up with a convoluted mess that's harder to maintain than the multi-table design they were trying to avoid.
What Single Table Design Actually Solves
The core problem single table design addresses is this: DynamoDB doesn't support joins. In a relational database, if we need a Customer and their Orders, we join two tables. In DynamoDB with separate tables, we'd need to:
- Query the Customers table for the customer
- Take the customerId from that response
- Query the Orders table using that customerId
- Combine the results in application code
That's two sequential network round trips. At scale, this pattern gets slower and more expensive. The latency compounds.
Single table design solves this by pre-joining related data. We store Customers and Orders in the same table, using the same partition key (e.g., customerId). Now one Query retrieves both the Customer record and all their Orders in a single request:
Query: pk = 'CUSTOMER#12345'
Returns:
- { pk: 'CUSTOMER#12345', sk: 'PROFILE', name: 'Alice', email: '...' }
- { pk: 'CUSTOMER#12345', sk: 'ORDER#001', total: 50.00, ... }
- { pk: 'CUSTOMER#12345', sk: 'ORDER#002', total: 75.00, ... }
One request. One network round trip. All related data together. This is the item collection pattern, and it's genuinely powerful.
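In boto3, that item collection read is a single Query. The table name and the PROFILE / ORDER# sort-key convention below come from the example above, not from anything DynamoDB mandates.

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("app-data")  # hypothetical single-table name

# One request, one round trip: the customer profile and all of its
# orders come back together because they share a partition key.
resp = table.query(KeyConditionExpression=Key("pk").eq("CUSTOMER#12345"))

profile = next(item for item in resp["Items"] if item["sk"] == "PROFILE")
orders = [item for item in resp["Items"] if item["sk"].startswith("ORDER#")]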
When Single Table Design Adds Real Value
Single table design shines when:
- We frequently need related entities together. Customer + Orders, User + Preferences + Sessions, Product + Reviews: if the access pattern is "get parent and children together," single table design eliminates the join problem.
- Access patterns are well-defined and stable. Single table design requires knowing our queries upfront. If we can confidently list the access patterns, we can model the table to serve them efficiently.
- We're operating at scale where latency matters. The difference between 1 request and 3 sequential requests might not matter at 100 QPS. At 100,000 QPS, it's the difference between a responsive app and a sluggish one.
- Entities share a natural relationship. Orders belong to Customers. Comments belong to Posts. When there's a clear hierarchical relationship, modeling them in the same item collection makes intuitive sense.
When Single Table Design Adds Zero Value (Or Makes Things Worse)
1: Storing unrelated data in the same table
If our application has Users, Products, Inventory, and Analytics Events, do they all need to be in one table? Probably not. If we never query Users and Inventory together, putting them in the same table gains us nothing. We're just making the table harder to understand and operate.
Worse, we lose operational flexibility. Different data types might need different backup strategies, different capacity modes, different storage classes. With separate tables, we can configure each appropriately. With one mega-table, we're stuck with one-size-fits-all.
2: Implementing single table design but still making multiple requests
I've seen this pattern too many times: a team implements single table design with complex composite keys and overloaded GSIs, but their application code still makes separate queries for each entity type. They've added all the complexity of single table design without any of the benefits.
If the code does GetItem(customer) followed by Query(orders) followed by Query(addresses), it doesn't matter that they're all in the same table. We're still making three requests. We've gained nothing except confusion.
3: Ignoring the "adjacent" multi-table alternative
Here's a secret: we can get most of the latency benefits without full single table design. If Customer and Orders are in separate tables but both keyed by customerId, we can fetch them in parallel:
# Parallel requests - not sequential!
from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor(max_workers=2) as executor:
    # get_customer / get_orders each query their own table, keyed by customerId
    customer_future = executor.submit(get_customer, customer_id)
    orders_future = executor.submit(get_orders, customer_id)
    customer = customer_future.result()
    orders = orders_future.result()
Total latency is the max of the two requests, not the sum. For many applications, this "parallel multi-table" approach provides 80% of the benefit with 20% of the complexity.
4: Forcing single table design when access patterns are unknown
Single table design requires knowing access patterns upfront. If we're building an early-stage product where requirements change weekly, locking ourselves into a rigid single-table schema is asking for pain. We'll end up with expensive data migrations every time the product evolves.
For rapidly evolving applications, a simpler multi-table design (or even a different database entirely) might be more appropriate until access patterns stabilize.
While the points above focus on data modeling with DynamoDB, I also want to call out a few operational issues that are worth being aware of.
7. JSON Stored as String = No Partial Updates
DynamoDB has two ways to store structured data: the String type (JSON serialized to a string) and Document types (native Map and List). If we're storing JSON as a String:
{
"address": "{\"city\":\"Seattle\",\"zip\":\"98101\"}"
}
We cannot do partial updates. Want to change just the city? We must:
- Read the entire item
- Deserialize the JSON string
- Modify the city
- Re-serialize to JSON
- Write the entire attribute back
That's a read + write for every tiny change, plus application code to handle the serialization.
With native Document types (Map/List):
{
"address": {
"M": {
"city": {"S": "Seattle"},
"zip": {"S": "98101"}
}
}
}
We can do:
UpdateExpression: "SET address.city = :newCity"
One update, one attribute, done. No read required. Much cheaper, much simpler.
Gotcha within the gotcha: The parent map must exist before we can update nested attributes. If address doesn't exist yet on an item, SET address.city = :value will fail. We need to initialize the structure first or use a more complex update expression.
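Here's a minimal boto3 sketch of both points, assuming a hypothetical users table keyed by userId: the first update guards against the missing-parent gotcha with if_not_exists, the second does the cheap partial update.

import boto3

table = boto3.resource("dynamodb").Table("users")  # hypothetical table

key = {"userId": "user123"}

# One-time guard: create the parent map if it doesn't exist yet,
# without overwriting it if it does.
table.update_item(
    Key=key,
    UpdateExpression="SET address = if_not_exists(address, :empty)",
    ExpressionAttributeValues={":empty": {}},
)

# Partial update of a single nested field: no read, no re-serializing a JSON blob.
table.update_item(
    Key=key,
    UpdateExpression="SET address.city = :city",
    ExpressionAttributeValues={":city": "Seattle"},
)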
Why do people use String for JSON anyway? Often it's because we're serializing objects from application code without thinking about it. The ORM or SDK might default to JSON string serialization. Or the data came from another system as a JSON blob. Either way, we've now lost the ability to efficiently update parts of that data.
The best suggestion here is to think about data types early, check how they are actually stored in the table, and pick the types that best fit the application's needs. If you need frequent partial updates to JSON data, the native Map type is the much better option, because you can actually UPDATE an attribute in place instead of doing a READ + UPSERT.
8. Global Tables + DAX = Stale Cache Problem
DynamoDB Accelerator (DAX) is fantastic for read-heavy workloads. Microsecond latency from an in-memory cache. DynamoDB Global Tables are fantastic for multi-region deployments — automatic replication across regions.
Using them together? That's where things get complicated.
The problem: DAX is regional. It only knows about writes that happen in its region. Global Tables replicate directly to DynamoDB in other regions, completely bypassing DAX.
Here's what happens:
Region A: Write item → Updates DynamoDB (A) → Updates DAX (A)
↓
Replicates to DynamoDB (B) → DAX (B) has no idea
User in Region B reads via DAX → Gets stale data until TTL expires
Region B users might be reading data that's hours old, depending on DAX TTL settings, while the actual DynamoDB table has the latest data sitting right there.
Some best practices to follow when using Global Tables with DAX:
- Use very short TTL values (seconds, not minutes)
- Accept and design for eventual consistency across regions
- Consider implementing explicit cache invalidation for critical data
For strong cross-region consistency, skip DAX entirely and accept the latency of going directly to DynamoDB.
9. We Cannot Bulk Delete by Partition Key
This one genuinely surprised me the first time I encountered it. Coming from SQL, I expected something like:
DELETE FROM Orders WHERE userId = 'user123'
DynamoDB has no equivalent. The DeleteItem API requires the complete primary key — partition key AND sort key (for composite keys). There's no DeleteByPartitionKey operation.
So how do we delete all orders for user123 if there are 10,000 of them?
Step 1: Query(userId = 'user123') → paginate through all 10,000 items
Step 2: For each item, extract the orderId (sort key)
Step 3: Call DeleteItem for each, or use BatchWriteItem in batches of 25
Step 4: That's 400+ API calls minimum
Step 5: 😭
BatchWriteItem can delete up to 25 items per request, but we still need the complete primary key for each item. There's no shortcut.
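For reference, the whole dance looks roughly like this in boto3 (the Orders table and the userId/orderId key names are just this example's): page through the item collection with keys-only reads, then let batch_writer group the deletes into batches of 25.

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("Orders")  # hypothetical table

query_kwargs = {
    "KeyConditionExpression": Key("userId").eq("user123"),
    "ProjectionExpression": "userId, orderId",  # we only need the keys back
}

# batch_writer groups DeleteRequests into BatchWriteItem calls of 25
# and retries unprocessed items for us.
with table.batch_writer() as batch:
    while True:
        page = table.query(**query_kwargs)
        for item in page["Items"]:
            batch.delete_item(Key={"userId": item["userId"], "orderId": item["orderId"]})
        if "LastEvaluatedKey" not in page:
            break
        query_kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]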
The only truly "bulk" delete? Drop and recreate the table. I'm not joking — for large datasets, this is often faster and cheaper than iterating through millions of items.
This is one reason to think about TTLs upfront and add the necessary timestamp attributes, because cleaning up obsolete data later in production can be a nightmare.
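Enabling TTL is a one-time call; the attribute name below (expiresAt, an epoch-seconds number) is just a convention I'm assuming.

import boto3

client = boto3.client("dynamodb")

# Items whose expiresAt (epoch seconds) is in the past are deleted by
# DynamoDB in the background, at no write cost to us.
client.update_time_to_live(
    TableName="Orders",  # hypothetical table
    TimeToLiveSpecification={"Enabled": True, "AttributeName": "expiresAt"},
)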
10. PartiQL: The SQL That Can Hide Expensive Scans
I was genuinely excited about PartiQL support in DynamoDB, not just because of its SQL-like syntax, but because some of the operations we discussed above are possible only through PartiQL, and the Java SDK, at least, doesn't support all of them natively. And then I learned why it's actually dangerous.
PartiQL looks like SQL, feels like SQL, but it absolutely does not behave like SQL. The critical difference? We cannot tell by looking at a PartiQL statement whether it will execute as an efficient Query or an expensive full-table Scan.
Pop quiz: which of these will scan the entire table?
-- Query 1
SELECT * FROM Orders WHERE OrderID = 100
-- Query 2
SELECT * FROM Orders WHERE OrderID > 100
-- Query 3
SELECT * FROM Orders WHERE Status = 'PENDING'
-- Query 4
SELECT * FROM Orders WHERE OrderID = 100 OR Status = 'PENDING'
If OrderID is the partition key:
- Query 1: Efficient Query (exact partition key match)
- Query 2: Full Table Scan (range on partition key not allowed)
- Query 3: Full Table Scan (Status isn't a key)
- Query 4: Full Table Scan (OR breaks the partition key optimization)
They all look like simple regular SQL. Three of them will read every single item in the table.
With the native DynamoDB API, we'd explicitly call Scan() and feel the pain in our fingers as we type it. PartiQL lets us accidentally scan a million-item table with a query that looks completely reasonable.
And even though PartiQL looks like SQL, it can only support what DynamoDB supports. That is:
- No JOINs (it's still NoSQL)
- No aggregate functions (COUNT, SUM, AVG? Nope.)
- No subqueries or CTEs
- ORDER BY requires a WHERE clause on the partition key
- GSIs must be explicitly named in the FROM clause; there's no automatic index selection
- No LIKE operator
The best and easiest way to avoid this mishap is to restrict table scans with an IAM policy, which prevents accidental full-table scans from a poorly written PartiQL statement.
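Here's a sketch of that guardrail, attached with boto3 to a hypothetical role. It relies on the dynamodb:FullTableScan condition key that IAM documents for PartiQLSelect, so double-check the exact key and your ARNs against the current docs before relying on it.

import json
import boto3

iam = boto3.client("iam")

# Deny PartiQL SELECT statements that would execute as full table scans.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyPartiQLFullTableScans",
            "Effect": "Deny",
            "Action": "dynamodb:PartiQLSelect",
            "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/Orders",  # hypothetical ARN
            "Condition": {"Bool": {"dynamodb:FullTableScan": ["true"]}},
        }
    ],
}

iam.put_role_policy(
    RoleName="app-service-role",            # hypothetical role
    PolicyName="deny-partiql-full-scans",
    PolicyDocument=json.dumps(policy),
)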
11. Transactions: Handle With Extreme Caution
DynamoDB transactions are powerful — ACID guarantees across up to 100 items! But they come with gotchas that can turn a production environment into a debugging nightmare.
The basics we probably know:
- Max 100 items per transaction (increased from 25 in September 2022)
- Each transactional operation consumes 2x the capacity units of a regular operation
- 4MB total data limit per transaction
The part that will haunt us:
DynamoDB uses Optimistic Concurrency Control (OCC), not locks. This means transactions can fail due to conflicts with other concurrent operations — and debugging these failures is genuinely painful.
When a transaction fails, we get a TransactionCanceledException with a CancellationReasons array. Sounds helpful, right? Here's the catch: in most SDKs, what we get by default is a less-than-helpful string representation of those reasons.
And the reasons themselves? They tell us that something failed, not which specific item or why it conflicted:
TransactionCanceledException: Transaction cancelled, please refer
cancellation reasons for specific reasons [TransactionConflict, None, None, None]
Great. One of the four items conflicted. Which one? Why? The error doesn't say. Good luck!
Why this is especially insidious: In development and staging environments with low traffic, we rarely hit transaction conflicts. Everything works beautifully. Then we deploy to production with 1000x the concurrency, and suddenly TransactionConflict errors are everywhere — and we have no idea why because proper logging was never instrumented for them.
As best practices:
- Log everything. Specifically, log the complete CancellationReasons array with context about what items were in the transaction:
catch (TransactionCanceledException e) {
log.error("Transaction failed for items: {}. Reasons: {}",
itemKeys,
e.getCancellationReasons().stream()
.map(r -> r.getCode() + ": " + r.getMessage())
.collect(Collectors.toList()));
}
- Keep transactions small. Fewer items = lower probability of conflicts.
- Design for idempotency when possible. Sometimes transactions can be avoided entirely with conditional writes and careful operation ordering.
- Implement exponential backoff retry specifically for TransactionConflict errors; they're often transient (see the sketch after this list).
- Load test with production-like concurrency to surface timing-related conflicts before they hit production.
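As a Python sketch of those last two points (boto3 surfaces the per-item reasons in the error response; the transact_items list is whatever TransactWriteItem dicts you've built): log the codes and retry only on transient conflicts, with exponential backoff.

import time
import boto3

client = boto3.client("dynamodb")

def write_transaction_with_retry(transact_items, max_attempts=5):
    for attempt in range(1, max_attempts + 1):
        try:
            client.transact_write_items(TransactItems=transact_items)
            return
        except client.exceptions.TransactionCanceledException as e:
            reasons = e.response.get("CancellationReasons", [])
            # One code per item, in the same order as transact_items.
            print(f"attempt {attempt} cancelled: {[r.get('Code') for r in reasons]}")
            transient = any(r.get("Code") == "TransactionConflict" for r in reasons)
            if transient and attempt < max_attempts:
                time.sleep(0.1 * (2 ** attempt))  # exponential backoff before retrying
                continue
            raise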
Final Thoughts: DynamoDB Rewards the Prepared
DynamoDB is genuinely powerful. For the right use cases — high-scale, predictable access patterns, single-digit millisecond requirements — it's hard to beat. But it's also unforgiving. The constraints I've described aren't bugs; they're fundamental to how DynamoDB achieves its performance guarantees.
The engineers I've seen struggle most with DynamoDB are those who approach it like a flexible SQL database. They design their schema first, figure out queries later, and add indexes when things get slow. That approach works reasonably well with PostgreSQL. With DynamoDB, it leads to expensive rewrites and frustrated teams.
The engineers who thrive with DynamoDB do the opposite: they start with their access patterns, work backward to the data model, and treat every limitation as a design constraint to work within, not around.
If there's one thing I hope you take from this article, it's this: DynamoDB's documentation tells us what we can do. Understanding what we can't do — and why — is what separates successful DynamoDB projects from painful ones.
Know the access patterns. Understand the constraints. Design accordingly. And maybe bookmark this article for the next time someone's tempted to add "just one more GSI."
Happy building!
