This is the finale in a three-part series on how Django transactions work. I cover everything from the app and framework code all the way to the database layer.
In the last part of this series, we explored which DB operations and code to avoid putting in transactions to prevent slow responses and outages, plus what you can put in the transaction to maintain its atomic behavior.
Reversible operations on the DB
Reversible and related business logic
Slow queries
Operations on multiple tables
Data migrations
Changes to schema (table/column structure)
By managing the code within transactions, we control which locks are held in transactions and for how long. This lets us minimize the amount of time transactions take, lower the DB’s latency (locks aren’t held for too long), and ensure other critical DB operations are not blocked. But how do DB operations block each other? Which ones are conflicting?
Locks exist because DBs can have multiple users executing commands simultaneously, e.g., one user performs a DROP TABLE command while another user reads from that table. This would cause an error or, worse, data corruption. To prevent this, every DB operation acquires and holds a lock specific to the permissions that the operation needs, e.g., DROP TABLE acquires an ACCESS EXCLUSIVE lock on the table being dropped because it requires exclusive access to drop that table safely. In contrast, SELECT acquires an ACCESS SHARE lock, which allows it to read the table data but does not block other operations from reading or writing to that table. If an operation cannot acquire its lock because another operation is holding a conflicting one, it waits for the conflicting lock to be released.
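As a sketch of such a conflict (assuming a transactions table exists), you can reproduce the blocking behavior with two psql sessions:

```sql
-- Session 1: the open transaction acquires ACCESS EXCLUSIVE on the table
-- and keeps holding it until COMMIT or ROLLBACK.
BEGIN;
DROP TABLE transactions;

-- Session 2: ACCESS SHARE conflicts with ACCESS EXCLUSIVE, so even a
-- plain read now hangs until session 1 commits or rolls back.
SELECT * FROM transactions;
```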
There are two categories of PSQL locks — table and row.
Row-level locks affect a single row/record. These locks block operations on that particular record, so they are low risk, but if we put a table migration in a transaction, we can inadvertently hold a lock on every row of that table, effectively locking the entire table. With these locks, we want to minimize how many we hold: the more row locks we hold, the more records in the table are inaccessible.
Here are the PSQL commands (operations) broken down by the severity of what they block.
Everything (reads/writes)
Writes (inserting, updating, and deleting data)
Continue reading for more details.
Check the DB Operations That Block Table Access breakdown for each sub-command.
Locks exist to handle the problems that arise from concurrent connections. When we allow our databases to be operated (read/written) by multiple connections simultaneously, we run into conflicts. What if one connection tries to delete a row while another is querying? Or worse, what if one connection adds a new column to a table while another inserts a new row using the old schema? Locks prevent the errors and data corruption that can occur with simultaneous operations on the DB. The locks are held by the transactions initiated by the connection — every PSQL command is wrapped automatically in a transaction.
Each lock blocks operations that might cause an error or data corruption if executed simultaneously with the current operation. Some locks are very permissive and let almost all other operations occur simultaneously, e.g., the locks held by SELECT queries allow most other commands to be executed. These promiscuous locks can be held by multiple transactions at a time because they don’t conflict with each other (they don’t block each other’s operations). Other locks are exclusive and block many or all other operations, e.g., the locks held by certain ALTER TABLE/COLUMN sub-commands. These locks can only be held by one transaction at a time because they block all other operations and hence block themselves — we can’t execute two DROP TABLE commands on the same table simultaneously.
The pg_locks table tracks all of this, along with which transaction is holding each lock. We can use this table to diagnose outages and timeouts caused by lockouts and deadlocks.
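For instance, a query along these lines (run in psql against a live database) joins pg_locks with pg_stat_activity to show which query holds or is waiting for each lock:

```sql
-- granted = false means the backend is still waiting for the lock.
SELECT a.pid,
       a.query,
       l.locktype,
       l.mode,
       l.granted
FROM pg_locks l
JOIN pg_stat_activity a ON a.pid = l.pid
ORDER BY a.pid;
```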
Every operation on the DB asks for and acquires a table lock before executing. Most of these locks are loose and allow other operations — running a SELECT query (filter in Django) will allow all other operations except ALTER INDEX/TABLE (Django migrations that change a column or table).
This table makes it obvious that migrations (changes to the schema) are the riskiest types of changes — CREATE INDEX and ALTER INDEX/TABLE are run in migrations.
Run:
./manage.py sqlmigrate APP_NAME MIGRATION_NUM
to find out which operations a migration executes and whether they are risky. If they are, make sure the migration is quick, and run it after hours (when no other long-running transactions are active).
DROP TABLE is not included in this because it is obvious that it blocks all other commands (it acquires the same ACCESS EXCLUSIVE lock as ALTER INDEX/TABLE). Paradoxically, DROP TABLE won’t cause your app to crash due to a lockout because it does not usually take long to execute — unless there are many constraints on, and references to, the table.
Besides table locks, there are row or data locks that block write access to a specific row in a table. Unlike table locks, row locks don’t block reading data, so they are less risky, but they can cause the same damage a table lock does in specific scenarios.
Suppose we are migrating data in a table — going through all the rows in it and updating a deprecated constant or denormalizing a JSON object, for example. We will lock the entire table if this migration happens in a transaction, i.e., if we wrap the function that is doing that migration in an atomic() decorator or context. This is due to the operations on each record locking each row on the table and the transaction holding onto the row locks until the entire migration is finished. This will block writes to the table until every row is migrated.
A better solution is to NOT wrap the entire migration in a transaction block. If we need to alter multiple columns for each row, wrap only the per-row operations in a transaction. This way, the locks are released after each row is changed.
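Here is a minimal sketch of the per-row pattern. It uses the stdlib sqlite3 module so it runs anywhere, and the table and column names are made up; in Django, the equivalent is opening a transaction.atomic() block inside the loop rather than around it.

```python
import sqlite3

# Set up a throwaway table with a deprecated constant to migrate.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO transactions (status) VALUES (?)", [("OLD_CONSTANT",)] * 5
)
conn.commit()

# BAD (in Django terms): wrapping the whole loop in one transaction.atomic()
# block keeps every migrated row locked until the loop finishes.

# BETTER: one transaction per row, so each row's lock is released immediately.
row_ids = [row_id for (row_id,) in conn.execute("SELECT id FROM transactions")]
for row_id in row_ids:
    conn.execute(
        "UPDATE transactions SET status = ? WHERE id = ?",
        ("NEW_CONSTANT", row_id),
    )
    conn.commit()  # end this row's transaction; its lock is released here

migrated = conn.execute(
    "SELECT COUNT(*) FROM transactions WHERE status = 'NEW_CONSTANT'"
).fetchone()[0]
print(migrated)  # 5
```

The trade-off is that a failure mid-loop leaves the table partially migrated, so the per-row operation should be written to be safely re-runnable.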
./manage.py sqlmigrate APP_NAME MIGRATION_NUMBER/NAME
This will print the operations that will be run during the migration. A comment in the query will explain what each operation does.
For a no-op migration — a migration that does not change the DB, like an update to a TextChoices field’s choices — the result will look like this:
BEGIN;
--
-- Alter field transaction_status on transactions
--
COMMIT;
An empty transaction with a comment inside of it describing what the transaction represents.
This migration does nothing—hence the SQL comments in the transaction block. The django_migrations table will be updated to track which migrations have been run, but that shouldn’t impact the app.
For migrations that actually change the DB, the output will look like this:
BEGIN;
--
-- Add field reverted to transactions
--
ALTER TABLE "transactions" ADD COLUMN "reverted" timestamp with time zone NULL;
COMMIT;
This migration alters the table by adding a column. While the ALTER TABLE statement runs, it holds an ACCESS EXCLUSIVE lock, blocking all reads and writes to the table.