Contents

- Part 1 Summary - Deep Dive Into Django Transactions
- TLDR, Summary
- How Can Transactions Harm Our Apps?
- In
- Risky
- Out
- Next - How PSQL Commands Block Each Other
- Sources

## Part 1 Summary - Deep Dive Into Django Transactions

In the prior post, we learned what happens when a function decorated with `transaction.atomic` is called and what happens inside `with transaction.atomic()`. In summary:

1. A connection to the DB is created or retrieved.
2. A transaction is started, e.g., `BEGIN;` is sent to the DB (if the DB is PSQL or another SQL variant).
3. From now until the function exits or the `with` statement concludes - whether with an error or successfully - we are inside the transaction, and the DB is waiting for us. This means that (at least for ACID DBs like PSQL) the transaction will hold locks on the tables and rows it is changing.
4. When a Django DB operation is executed, the DB grabs the corresponding lock and prevents any conflicting operations on that table or row. This can cause other connections to time out while waiting for the lock to be released.
5. If an operation fails or a runtime error occurs, the DB rolls back the entire transaction and releases the locks.
6. If the entire transaction succeeds, all the changes are committed and become available to other DB connections. The locks are released.
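As a one-screen refresher, the two forms look like this. This is a minimal sketch, not code from the prior post; `Account` is a stand-in model.

```python
from django.db import transaction
from django.db.models import F


@transaction.atomic  # BEGIN is sent when the function is entered
def deposit(account_id, amount):
    # This UPDATE takes a row lock that is held until the transaction ends.
    Account.objects.filter(pk=account_id).update(balance=F("balance") + amount)
    # COMMIT on return; ROLLBACK if an exception escapes the function


def deposit_with_block(account_id, amount):
    with transaction.atomic():  # same semantics, scoped to the with block
        Account.objects.filter(pk=account_id).update(balance=F("balance") + amount)
```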
Now, we will discuss what to put in a transaction and what to avoid. Certain operations are dangerous inside a transaction block because of the following transaction behaviors:

- Transactions hold on to locks until the transaction fails or completes.
- Transactions reverse all of their DB operations when they fail.
- Transactions request locks as soon as a DB operation is executed.

## TLDR, Summary

In:

- Reversible operations on the DB - If a DB operation can't be reversed and the transaction fails, the DB is left in a bad state: the transaction will attempt to roll back the changes automatically, but they are irreversible, so the rollback will fail.
- Related operations on the DB that are required - We can't create a new bank account record without a user record, so we need to create both in the same transaction.
- Reversible and related business logic - Calculating the total balance the customer has after creating a new account record with a deposit (even this can be moved out of a transaction with clever modeling and coordination).

Risky:

- Slow queries - Fetching data is usually fast, except for three scenarios covered below. These queries slow down the transaction and extend the time it holds locks, which has adverse effects on other users.
- Operations on multiple tables - A transaction with operations on multiple tables can lock each table until it is done. This is especially prevalent in Django migrations - another reason to keep migrations small and focused on one or a few tables at a time.
- Data Migrations - Since transactions hold onto a lock from when the query is executed until the transaction fails or ends, a migration that operates on every row in a table ends up locking the entire table, preventing reads or writes on each row.
- (For PSQL and SQLite only*) Changing tables or columns - These operations require the strictest form of locks and therefore prevent reads/writes on the entire table. They are the most likely to cause an outage.

Out:

- Irreversible operations - Everything in the transaction should be reversible if the transaction is rolled back; if we put an API call in the transaction, it can't be undone.
- Blocking calls - Since transactions prevent all other queries from operating on the tables/rows the transaction is changing, any code that increases the duration of the transaction keeps the DB locked longer, causing timeouts and unresponsiveness in the apps that depend on the DB.

Keep on reading for alternatives and code examples.

## How Can Transactions Harm Our Apps?

The primary risk is that transactions hold locks until they are done, both to prevent conflicting operations on tables and rows and to keep the transaction reversible - this is essential to make the DB operations in the transaction atomic. This means a long-running transaction that operates on multiple tables, or on a few critical ones, may cause outages by hogging locks and preventing reads/writes to those tables/rows. In essence, if we put the wrong code in a transaction block, we can effectively take down the DB by blocking every other connection's operations on it.

The secondary risk is that transactions need to be reversible and are EXPECTED to be reversible. The DB automatically reverses every operation if an error occurs in the transaction, so the DB operations we put in the transaction should be reversible - for the most part, we don't need to worry about this with PSQL. But what about the rest of the code? Often, when we change our data, we need to do follow-up tasks like firing events, updating services, sending push notifications, etc. These tasks are NOT reversible - we can't unsend an event, a request, or a notification. If an error occurs, the data changes are rolled back, but we've already sent the push notification saying, "Your report has been generated; click here to view it." What happens when the user or other services act on this false information? There will be a cascade of failures. Therefore, any code that can't be reversed should not be in a transaction, or we risk leaving our system in a bad state when an error occurs in the transaction.

## In

- Reversible operations on the DB - If a DB operation can't be reversed and the transaction fails, the DB is left in a bad state: the transaction will attempt to roll back the changes automatically, but they are irreversible, so the rollback will fail.
- Related operations on the DB that are required - We can't create a new bank account record without a user record, so we need to create both in the same transaction (see the sketch after this list).
- Reversible and related business logic - Calculating the total balance the customer has after creating a new account record with a deposit (even this can be moved out of a transaction with clever modeling and coordination).
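Here is the bank-account case from the list above as a minimal sketch; the `User` and `Account` models mirror the examples later in this post.

```python
from django.db import transaction


def open_account(user_data, initial_deposit):
    # Both rows must exist together: if the Account insert fails, the User insert
    # is rolled back with it, so we never end up with an account-less user.
    with transaction.atomic():
        user = User.objects.create(**user_data)
        account = Account.objects.create(user=user, balance=initial_deposit)
    return user, account
```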
## Risky

These are things that, depending on how much data is being processed and how much traffic the DB gets, can cause outages by holding locks for too long. All of them are fine if they don't take too long.

### Slow queries

Fetching data is usually fast, except for these three scenarios. These queries slow down the transaction and extend the time it holds locks, which has adverse effects on other users.

Examples:

- Queries on unindexed columns
- Queries on large tables
- Joins

```python
from django.db import transaction
from django.db.models import Count, F, Q, Sum


@transaction.atomic
def process_large_order_report(start_date, end_date, min_order_value=1000):
    # Complex query with multiple joins and aggregations
    large_orders = Order.objects.filter(
        created_at__range=(start_date, end_date),
        total_amount__gte=min_order_value,
        status='completed'
    ).select_related(
        'customer', 'shipping_address', 'billing_address'
    ).prefetch_related(
        'items__product__category',
        'items__product__supplier'
    ).annotate(
        item_count=Count('items'),
        total_weight=Sum('items__product__weight'),
        discount_percentage=F('discount_amount') * 100 / F('total_amount')
    ).filter(
        # Additional complex filtering
        Q(customer__user__is_active=True) &
        (Q(items__product__category__name='Electronics') |
         Q(items__product__category__name='Furniture')) &
        ~Q(shipping_address__country='US')
    ).order_by('-total_amount')

    # do the transactional work with the large_orders queryset
```

Alternative: do these queries before and outside the transaction.

```python
# fixed
from django.db import transaction
from django.db.models import Count, F, Q, Sum


def process_large_order_report(start_date, end_date, min_order_value=1000):
    # Complex query with multiple joins and aggregations.
    # Evaluating it with list() forces the query to run here, outside the
    # transaction (querysets are lazy and would otherwise execute later,
    # inside the atomic block).
    large_orders = list(Order.objects.filter(
        created_at__range=(start_date, end_date),
        total_amount__gte=min_order_value,
        status='completed'
    ).select_related(
        'customer', 'shipping_address', 'billing_address'
    ).prefetch_related(
        'items__product__category',
        'items__product__supplier'
    ).annotate(
        item_count=Count('items'),
        total_weight=Sum('items__product__weight'),
        discount_percentage=F('discount_amount') * 100 / F('total_amount')
    ).filter(
        # Additional complex filtering
        Q(customer__user__is_active=True) &
        (Q(items__product__category__name='Electronics') |
         Q(items__product__category__name='Furniture')) &
        ~Q(shipping_address__country='US')
    ).order_by('-total_amount'))

    # Start the transaction block
    with transaction.atomic():
        # do the transactional work with the large_orders data
        ...
```

### Operations on multiple tables

A transaction with operations on multiple tables can lock each table until it is done. This is especially prevalent in Django migrations - another reason to keep migrations small and focused on one or a few tables at a time.

Examples: see "Changing tables or columns" below.

```python
from django.db import migrations, models


class Migration(migrations.Migration):
    dependencies = [("migrations", "0001_initial")]

    # too many operations
    operations = [
        migrations.RemoveField("Author", "age"),
        migrations.AddField("Author", "rating", models.IntegerField(default=0)),
        migrations.AlterField("Book", "price", models.DecimalField(max_digits=5, decimal_places=2)),
    ]
```

Alternative: split the transaction into multiple smaller transactions and chain them in `on_commit` callbacks (a non-migration sketch of the chaining follows the migrations below).

```python
# fixed
from django.db import migrations, models


# 1st migration
class Migration(migrations.Migration):
    dependencies = [("migrations", "0001_initial")]

    operations = [
        migrations.RemoveField("Author", "age"),
    ]


# 2nd migration
class Migration(migrations.Migration):
    dependencies = [("migrations", "0002_initial")]

    operations = [
        migrations.AddField("Author", "rating", models.IntegerField(default=0)),
    ]


# 3rd migration
class Migration(migrations.Migration):
    dependencies = [("migrations", "0003_initial")]

    operations = [
        migrations.AlterField("Book", "price", models.DecimalField(max_digits=5, decimal_places=2)),
    ]
```
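For application code (outside migrations), the same splitting idea can be sketched with `transaction.on_commit`: each step runs in its own small transaction, and the next step is scheduled only after the previous one commits. The `archive_orders`/`rebuild_report` functions and the `batch_id`, `archived`, and `stale` fields are made up for illustration.

```python
from functools import partial

from django.db import transaction


def archive_orders(batch_id):
    with transaction.atomic():
        # First, small transaction: touches only the orders table.
        Order.objects.filter(batch_id=batch_id, status='completed').update(archived=True)
        # Schedule the next step; it runs only after this transaction commits.
        transaction.on_commit(partial(rebuild_report, batch_id))


def rebuild_report(batch_id):
    with transaction.atomic():
        # Second, independent transaction: touches only the report table.
        Report.objects.filter(batch_id=batch_id).update(stale=True)
```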
### Data Migrations

Since transactions hold onto a lock from when the query is executed until the transaction fails or ends, a migration that operates on every row in a table ends up locking the entire table, preventing reads or writes on each row.

```python
def migrate_user_profiles():
    # Get all users with legacy profiles
    users_with_profiles = User.objects.filter(
        legacy_profile__isnull=False
    ).select_related('legacy_profile')

    # Process all users in a single transaction
    with transaction.atomic():
        # Track progress
        total = users_with_profiles.count()
        print(f"Migrating {total} user profiles...")

        # Process each user
        for i, user in enumerate(users_with_profiles):
            if i % 100 == 0:
                print(f"Processed {i}/{total} profiles")

            legacy = user.legacy_profile
            legacy.update_new_user_profile()
```

Alternative: wrap the transaction around the individual operations, not the entire migration. By putting each row's update in its own transaction, we only hold the locks for a short period of time.

```python
# fixed
def migrate_user_profiles():
    # Get all users with legacy profiles
    users_with_profiles = User.objects.filter(
        legacy_profile__isnull=False
    ).select_related('legacy_profile')

    # Track progress
    total = users_with_profiles.count()
    print(f"Migrating {total} user profiles...")

    # Process each user in its own short transaction
    for i, user in enumerate(users_with_profiles):
        if i % 100 == 0:
            print(f"Processed {i}/{total} profiles")

        with transaction.atomic():
            legacy = user.legacy_profile
            legacy.update_new_user_profile()
```

### (For PSQL and SQLite only*) Changing tables or columns

These operations require the strictest form of locks and therefore prevent reads/writes on the entire table. They are the most likely to cause an outage.

Examples:

- Alter Column
- Alter Table

Alternatives:

- Run them later. These queries are necessary, BUT we don't have to run them during business hours. The best policy here is to reduce the risk of an outage: determine how crucial the table is, estimate how long the migration may take, execute the migration when the DB has the least traffic, and prepare a rollback plan.
- Shrink the amount of time the transaction takes. This can be done by partitioning the table and running the migration on individual partitions (see PSQL Partitions & Django in the Sources).
- A related option, shown after the example below, is to opt the migration out of the wrapping transaction entirely with `atomic = False`.

*Django only wraps transactions around migrations for PSQL and SQLite.

```python
from django.db import migrations


class Migration(migrations.Migration):
    dependencies = [("migrations", "0001_initial")]

    # This migration, if run on a large table, can slow down and block other
    # operations - do it later, outside business hours.
    operations = [
        migrations.RemoveField("Users", "middle_name"),
    ]
```
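As the Django docs quoted in the Sources note, a migration can opt out of the wrapping transaction by setting `atomic = False`. A sketch of how that looks; the trade-off is that a failure partway through leaves the migration partially applied, so it pairs best with small, single-operation migrations.

```python
from django.db import migrations


class Migration(migrations.Migration):
    # Opt out of the single wrapping transaction (only relevant on PSQL/SQLite).
    # Locks are then held per operation rather than for the whole migration, but
    # a failure partway through leaves the migration partially applied.
    atomic = False

    dependencies = [("migrations", "0001_initial")]

    operations = [
        migrations.RemoveField("Users", "middle_name"),
    ]
```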
## Out

### Irreversible operations

Everything in the transaction should be reversible if the transaction is rolled back; if we put an API call in the transaction, it can't be undone.

Examples:

- Adding an event to a queue - triggers other events
- Sending an API call - status updates, triggering jobs, etc.

```python
def create_user(user_data, user_files):
    with transaction.atomic():
        user = User.objects.create(**user_data)
        # Irreversible: if the transaction rolls back, this email has already been sent.
        async_notification_service.send_email(user.email, "You can login now!")
        Account.objects.create(user=user, balance=0)
        # rest of user creation process
```

Alternatives:

- Put critical business logic that must follow the data change in the transaction's `on_commit` callback - the `on_commit` callback is ALWAYS called after the transaction commits successfully, and all the updates from the transaction are visible inside it.
- We can also make the operation reversible, i.e., create a way to delete events or send an undo API call - this is a nontrivial task for the team that owns the event queue or the API. I don't recommend it.

```python
def create_user(user_data, user_files):
    with transaction.atomic():
        user = User.objects.create(**user_data)
        Account.objects.create(user=user, balance=0)
        # rest of user creation process

        # The transaction is still in progress here, so it can still be rolled back;
        # nothing is committed until the with block exits. Putting the notification
        # here is not a good idea - especially if the notified job immediately tries
        # to read the data. That creates a race condition.
        async_notification_service.send_email(user.email, "You can login now!")
```

```python
from functools import partial


def create_user(user_data, user_files):
    with transaction.atomic():
        user = User.objects.create(**user_data)
        Account.objects.create(user=user, balance=0)
        # rest of user creation process
        transaction.on_commit(partial(async_notification_service.send_email, user.email, "You can login now!"))
```
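Why `partial` and not a lambda? Both work here, but `partial` freezes the argument values at the moment the callback is scheduled, while a lambda looks its variables up only when it finally runs (late binding). The difference matters when callbacks are scheduled in a loop. A small sketch, reusing the notification service from above:

```python
from functools import partial

from django.db import transaction


def notify_users(users):
    with transaction.atomic():
        for user in users:
            # ... update each user inside the transaction ...

            # Safe: partial captures this iteration's user.email right now.
            transaction.on_commit(partial(async_notification_service.send_email, user.email, "Welcome!"))

            # Late-binding trap: every callback would read `user` after the loop
            # has finished and email only the *last* user.
            # transaction.on_commit(lambda: async_notification_service.send_email(user.email, "Welcome!"))
```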
### Blocking calls

Since transactions prevent all other queries from operating on the tables/rows the transaction is changing, any code that increases the duration of the transaction keeps the DB locked longer, causing timeouts and unresponsiveness in the apps that depend on the DB.

Examples:

- Network calls - Sending requests to APIs to get data to put into the transaction.
- Logging falls under this too - Depending on the logger, the library may ship the logs while we are still in the transaction; with something like Datadog, another process saves the logs, but we still pay the price of holding them in memory until the batch job writes them to a file.
- Disk operations - Loading a CSV to insert into a table, or exporting a table to a CSV.
- CPU-heavy tasks - Matrix multiplication or any heavy math/data transformation blocks the CPU and everything else - Python's Global Interpreter Lock forces threads to use the CPU one at a time, and Django is single-processed (every operation in the transaction happens serially).

```python
def create_user(user_data, user_files):
    with transaction.atomic():
        user = User.objects.create(**user_data)

        for file_data in user_files:
            # The transaction waits for this upload, and so do all other connections
            # that need access to the tables/rows the transaction uses.
            url = Cloudinary.upload_file(file_data['data'])
            Files.objects.create(**file_data['meta_data'], user=user, url=url)

        Account.objects.create(user=user, balance=0)
        # rest of user creation process
```

Alternatives:

- Get the data ready before the transaction - Load disk/network data beforehand. Once the data is ready, execute the transaction with it.
- Use placeholder data - Create a set of obvious placeholder data (no way to mistake it for production data) that is easily retrievable and identifiable. Load it before the transaction and use it.
- Generate data - If uniqueness constraints on the fields prevent placeholder data from being used. This is risky - it can lead to unpredictable failures due to the nature of pseudo-random algorithms - so use a good random algorithm with a good seed to avoid surprise failures.
- Remove or modify constraints - If the constraints are not allowing us to safely create our records, there is a problem with our schema. Make the constraints dependent on a state column that specifies when the record is complete and should be monitored.

```python
# not bad
def create_user(user_data, user_files):
    user = None
    with transaction.atomic():
        user = User.objects.create(**user_data)
        Account.objects.create(user=user, balance=0)
        # rest of user creation process

    # Uploads happen after the transaction has committed, so no locks are held here.
    for file_data in user_files:
        url = Cloudinary.upload_file(file_data['data'])
        Files.objects.create(**file_data['meta_data'], user=user, url=url)
```

```python
# best fix
from functools import partial


def create_user(user_data, user_files):
    user = None
    with transaction.atomic():
        user = User.objects.create(**user_data)
        Account.objects.create(user=user, balance=0)
        # rest of user creation process

        # partial builds a callable from the function and its arguments, so the
        # function is called with those exact values once the transaction commits.
        transaction.on_commit(partial(create_user_files, user_files, user))


def create_user_files(user_files, user):
    for file_data in user_files:
        url = Cloudinary.upload_file(file_data['data'])
        Files.objects.create(**file_data['meta_data'], user=user, url=url)
```

## Next - How PSQL Commands Block Each Other

In the next post, we will dive into PSQL and find out:

- How different commands lock tables and rows
- Which commands are the riskiest to execute

## Sources

- Atomic Django Migrations & Transactions: "On databases that support DDL transactions (SQLite and PostgreSQL), all migration operations will run inside a single transaction by default. In contrast, if a database doesn't support DDL transactions (e.g. MySQL, Oracle) then all operations will run without a transaction. You can prevent a migration from running in a transaction by setting the atomic attribute to False. [...] It's also possible to execute parts of the migration inside a transaction using atomic() or by passing atomic=True to RunPython."
- PSQL Partitions & Django: "Django's ORM doesn't have built-in support for partitioned tables, so if you want to use partitions in your application, it's going to take a little extra work. One way to use partitions is to roll your own migrations that run raw SQL. This will work, but it means you're going to have to manually manage the migrations for all changes you make to the table in the future. Another option is to use a package called django-postgres-extra. Django-postgres-extra offers support for several PostgreSQL features that are not built into Django's ORM, for example, support for TRUNCATE TABLE and table partitioning."
- PSQL Transactions
- Python: How Django Transactions Work
When a Django DB operation is executed, the DB grabs the corresponding lock and prevents any conflicting operations on that table or row. This can cause other connections to timeout due to waiting for the lock to be released. If the operation fails or if there is a runtime error, the DB rolls back the entire transaction and releases the locks. If the entire transaction succeeds, then all the changes are committed and available to other DB connections. The locks are released. A connection to the DB is created or retrieved. A transaction is created, e.g., BEGIN; is sent to the DB (if the DB is PSQL or another SQL variant). BEGIN; From now until the function exists or the with statement concludes - whether with an error or successfully - we will be in the transaction, and DB will be waiting for us. This means that (at least for ACID DBs like PSQL) the transaction will hold locks on tables and rows it is changing. When a Django DB operation is executed, the DB grabs the corresponding lock and prevents any conflicting operations on that table or row. This can cause other connections to timeout due to waiting for the lock to be released. If the operation fails or if there is a runtime error, the DB rolls back the entire transaction and releases the locks. If the entire transaction succeeds, then all the changes are committed and available to other DB connections. The locks are released. Now, we will discuss what to put in a transaction and what to avoid. Due to the following transaction behaviors, certain operations are dangerous when placed in a transaction block. Transactions hold on to locks until the transaction fails or is complete. Transactions reverse all their DB operations when they fail. Transactions request locks as soon as a DB operation is executed. Transactions hold on to locks until the transaction fails or is complete. Transactions reverse all their DB operations when they fail. Transactions request locks as soon as a DB operation is executed. TLDR, Summary In Reversible operations on the DB - If the DB operation can't be reversed and the transaction fails, the DB will be left in a bad state because the transaction will attempt to roll back the changes automatically, but they are irreversible, so it will fail. Related operations on the DB that are required - We can't create a new bank account record without a user record, so we need to create both in the same transaction. Reversible and related business logic - Calculating the total balance the customer has after creating a new account record with a deposit (even this can be moved out of a transaction with clever modeling and coordination). Reversible operations on the DB - If the DB operation can't be reversed and the transaction fails, the DB will be left in a bad state because the transaction will attempt to roll back the changes automatically, but they are irreversible, so it will fail. Reversible operations on the DB Related operations on the DB that are required - We can't create a new bank account record without a user record, so we need to create both in the same transaction. Related operations on the DB that are required Reversible and related business logic - Calculating the total balance the customer has after creating a new account record with a deposit (even this can be moved out of a transaction with clever modeling and coordination). Reversible and related business logic Risky Slow queries - Fetching data is usually fast, except for these three scenarios. 
These queries will slow down the transaction and extend the time it holds locks, which will have adverse effects on other users. Operations on multiple tables - A transaction with operations on multiple tables can lock each table until it is done. This is especially prevalent in Django migrations - another reason to keep migrations small and focused on one or a few tables at a time. Data Migrations - Since transactions hold onto a lock from when the query is executed until the transaction fails or ends, a migration that operates on every row in the table will end up locking the entire table, either by preventing reads or writes on each row. (For PSQL and SQLite only*) Changing tables or columns - these operations require the strictest form of locks and, therefore, will prevent reads/writes on the entire table. These operations are the most likely to cause an outage. Slow queries - Fetching data is usually fast, except for these three scenarios. These queries will slow down the transaction and extend the time it holds locks, which will have adverse effects on other users. Slow queries Operations on multiple tables - A transaction with operations on multiple tables can lock each table until it is done. This is especially prevalent in Django migrations - another reason to keep migrations small and focused on one or a few tables at a time. Operations on multiple tables Data Migrations - Since transactions hold onto a lock from when the query is executed until the transaction fails or ends, a migration that operates on every row in the table will end up locking the entire table, either by preventing reads or writes on each row. Data Migrations (For PSQL and SQLite only*) Changing tables or columns - these operations require the strictest form of locks and, therefore, will prevent reads/writes on the entire table. These operations are the most likely to cause an outage. Changing tables or columns Out Irreversible operations - Everything in the transaction should be reversible if the transaction is rolled back; if we put an API call in the transaction, it can't be undone. Blocking calls - Since transactions prevent all other queries from operating on the tables/rows the transaction is changing, any code that increases the duration of the transaction will cause the DB to be locked, causing timeouts and unresponsiveness in the apps dependent on the DB. Irreversible operations - Everything in the transaction should be reversible if the transaction is rolled back; if we put an API call in the transaction, it can't be undone. Irreversible operations Blocking calls - Since transactions prevent all other queries from operating on the tables/rows the transaction is changing, any code that increases the duration of the transaction will cause the DB to be locked, causing timeouts and unresponsiveness in the apps dependent on the DB. Blocking calls Keep on reading for alternatives and code examples. How Can Transactions Harm Our Apps? The primary risk is that transactions hold locks until they are done to prevent conflicting operations on tables and rows and allow the transaction to be reversible - this is essential to make the DB operations in the transaction atomic. This means a long-running transaction that operates on multiple tables or a few critical ones may cause outages by hogging locks and preventing reads/writes to those tables/rows. 
In essence, if we put the wrong code in a transaction block, we can effectively take down the DB by blocking all other connections to the DB from doing operations on it. The secondary risk is that transactions need to be reversible and are EXPECTED to be reversible. The DB automatically reverses every operation if an error occurs in the transaction. Therefore, the DB operations we put in the transaction should be reversible - for the most part, we don't need to worry about this with PSQL. But what about other codes? Oftentimes, when we change our data, we need to do follow-up tasks like firing events, updating services, sending push notifications, etc. These tasks are NOT reversible - we can't unsend an event, a request, or a notification. If an error occurs, the data changes are rolled back, but we've already sent the push notification saying, "Your report has been generated; click here to view it." What happens when the user or other services act on this false information? There will be a cascade of failures. Therefore, any code that can't be reversed should not be in a transaction, or we risk leaving our system in a bad state when an error occurs in the transaction. In Reversible operations on the DB - If the DB operation can't be reversed and the transaction fails, the DB will be left in a bad state because the transaction will attempt to roll back the changes automatically, but they are irreversible, so it will fail. Related operations on the DB that are required - We can't create a new bank account record without a user record, so we need to create both in the same transaction. Reversible and related business logic - Calculating the total balance the customer has after creating a new account record with a deposit (even this can be moved out of a transaction with clever modeling and coordination). Reversible operations on the DB - If the DB operation can't be reversed and the transaction fails, the DB will be left in a bad state because the transaction will attempt to roll back the changes automatically, but they are irreversible, so it will fail. Reversible operations on the DB Related operations on the DB that are required - We can't create a new bank account record without a user record, so we need to create both in the same transaction. Related operations on the DB that are required Reversible and related business logic - Calculating the total balance the customer has after creating a new account record with a deposit (even this can be moved out of a transaction with clever modeling and coordination). Reversible and related business logic Risky These are things that, depending on how much data is being processed and DB traffic, can cause outages due to holding locks for too long. All of these things are fine if they don’t take too long. Slow queries — Fetching data is usually fast, except for these three scenarios. These queries will slow down the transaction and extend the time it holds locks, which will have adverse effects on other users. 
Examples Queries on unindexed columns Queries on large tables Joins @transaction.atomic def process_large_order_report(start_date, end_date, min_order_value=1000): # Complex query with multiple joins and aggregations large_orders = Order.objects.filter( created_at__range=(start_date, end_date), total_amount__gte=min_order_value, status='completed' ).select_related( 'customer', 'shipping_address', 'billing_address' ).prefetch_related( 'items__product__category', 'items__product__supplier' ).annotate( item_count=Count('items'), total_weight=Sum('items__product__weight'), discount_percentage=F('discount_amount') * 100 / F('total_amount') ).filter( # Additional complex filtering Q(customer__user__is_active=True) & (Q(items__product__category__name='Electronics') | Q(items__product__category__name='Furniture')) & ~Q(shipping_address__country='US') ).order_by('-total_amount') # do the transactional work with the large_orders queryset Alternative Do these queries before and outside the transaction. Slow queries — Fetching data is usually fast, except for these three scenarios. These queries will slow down the transaction and extend the time it holds locks, which will have adverse effects on other users. Examples Queries on unindexed columns Queries on large tables Joins @transaction.atomic def process_large_order_report(start_date, end_date, min_order_value=1000): # Complex query with multiple joins and aggregations large_orders = Order.objects.filter( created_at__range=(start_date, end_date), total_amount__gte=min_order_value, status='completed' ).select_related( 'customer', 'shipping_address', 'billing_address' ).prefetch_related( 'items__product__category', 'items__product__supplier' ).annotate( item_count=Count('items'), total_weight=Sum('items__product__weight'), discount_percentage=F('discount_amount') * 100 / F('total_amount') ).filter( # Additional complex filtering Q(customer__user__is_active=True) & (Q(items__product__category__name='Electronics') | Q(items__product__category__name='Furniture')) & ~Q(shipping_address__country='US') ).order_by('-total_amount') # do the transactional work with the large_orders queryset Alternative Do these queries before and outside the transaction. Slow queries — Fetching data is usually fast, except for these three scenarios. These queries will slow down the transaction and extend the time it holds locks, which will have adverse effects on other users. 
Slow queries Examples Queries on unindexed columns Queries on large tables Joins Examples Queries on unindexed columns Queries on large tables Joins Queries on unindexed columns Queries on large tables Joins Queries on unindexed columns Queries on large tables Joins @transaction.atomic def process_large_order_report(start_date, end_date, min_order_value=1000): # Complex query with multiple joins and aggregations large_orders = Order.objects.filter( created_at__range=(start_date, end_date), total_amount__gte=min_order_value, status='completed' ).select_related( 'customer', 'shipping_address', 'billing_address' ).prefetch_related( 'items__product__category', 'items__product__supplier' ).annotate( item_count=Count('items'), total_weight=Sum('items__product__weight'), discount_percentage=F('discount_amount') * 100 / F('total_amount') ).filter( # Additional complex filtering Q(customer__user__is_active=True) & (Q(items__product__category__name='Electronics') | Q(items__product__category__name='Furniture')) & ~Q(shipping_address__country='US') ).order_by('-total_amount') # do the transactional work with the large_orders queryset @transaction.atomic def process_large_order_report(start_date, end_date, min_order_value=1000): # Complex query with multiple joins and aggregations large_orders = Order.objects.filter( created_at__range=(start_date, end_date), total_amount__gte=min_order_value, status='completed' ).select_related( 'customer', 'shipping_address', 'billing_address' ).prefetch_related( 'items__product__category', 'items__product__supplier' ).annotate( item_count=Count('items'), total_weight=Sum('items__product__weight'), discount_percentage=F('discount_amount') * 100 / F('total_amount') ).filter( # Additional complex filtering Q(customer__user__is_active=True) & (Q(items__product__category__name='Electronics') | Q(items__product__category__name='Furniture')) & ~Q(shipping_address__country='US') ).order_by('-total_amount') # do the transactional work with the large_orders queryset Alternative Do these queries before and outside the transaction. Alternative Do these queries before and outside the transaction. Do these queries before and outside the transaction. Do these queries before and outside the transaction. 
# fixed def process_large_order_report(start_date, end_date, min_order_value=1000): # Complex query with multiple joins and aggregations large_orders = Order.objects.filter( created_at__range=(start_date, end_date), total_amount__gte=min_order_value, status='completed' ).select_related( 'customer', 'shipping_address', 'billing_address' ).prefetch_related( 'items__product__category', 'items__product__supplier' ).annotate( item_count=Count('items'), total_weight=Sum('items__product__weight'), discount_percentage=F('discount_amount') * 100 / F('total_amount') ).filter( # Additional complex filtering Q(customer__user__is_active=True) & (Q(items__product__category__name='Electronics') | Q(items__product__category__name='Furniture')) & ~Q(shipping_address__country='US') ).order_by('-total_amount') # Start the transaction block with transaction.atomic(): # do the transactional work with the large_orders queryset # fixed def process_large_order_report(start_date, end_date, min_order_value=1000): # Complex query with multiple joins and aggregations large_orders = Order.objects.filter( created_at__range=(start_date, end_date), total_amount__gte=min_order_value, status='completed' ).select_related( 'customer', 'shipping_address', 'billing_address' ).prefetch_related( 'items__product__category', 'items__product__supplier' ).annotate( item_count=Count('items'), total_weight=Sum('items__product__weight'), discount_percentage=F('discount_amount') * 100 / F('total_amount') ).filter( # Additional complex filtering Q(customer__user__is_active=True) & (Q(items__product__category__name='Electronics') | Q(items__product__category__name='Furniture')) & ~Q(shipping_address__country='US') ).order_by('-total_amount') # Start the transaction block with transaction.atomic(): # do the transactional work with the large_orders queryset Operations on multiple tables — a transaction with operations on multiple tables can lock each table until it is done. This is especially prevalent in Django migrations — another reason to keep migrations small and focused on one or a few tables at a time. Examples See Changing table or column structure class Migration(migrations.Migration): dependencies = [("migrations", "0001_initial")] # too many operations operations = [ migrations.RemoveField("Author", "age"), migrations.AddField("Author", "rating", models.IntegerField(default=0)), migrations.AlterField("Book", "price", models.DecimalField(max_digits=5, decimal_places=2)), ] Alternative Split the transaction into multiple smaller transactions and chain them in the oncommit callbacks. Operations on multiple tables — a transaction with operations on multiple tables can lock each table until it is done. This is especially prevalent in Django migrations — another reason to keep migrations small and focused on one or a few tables at a time. Examples See Changing table or column structure class Migration(migrations.Migration): dependencies = [("migrations", "0001_initial")] # too many operations operations = [ migrations.RemoveField("Author", "age"), migrations.AddField("Author", "rating", models.IntegerField(default=0)), migrations.AlterField("Book", "price", models.DecimalField(max_digits=5, decimal_places=2)), ] Alternative Split the transaction into multiple smaller transactions and chain them in the oncommit callbacks. Operations on multiple tables — a transaction with operations on multiple tables can lock each table until it is done. 
This is especially prevalent in Django migrations — another reason to keep migrations small and focused on one or a few tables at a time. Operations on multiple tables Examples See Changing table or column structure Examples See Changing table or column structure See Changing table or column structure See Changing table or column structure class Migration(migrations.Migration): dependencies = [("migrations", "0001_initial")] # too many operations operations = [ migrations.RemoveField("Author", "age"), migrations.AddField("Author", "rating", models.IntegerField(default=0)), migrations.AlterField("Book", "price", models.DecimalField(max_digits=5, decimal_places=2)), ] class Migration(migrations.Migration): dependencies = [("migrations", "0001_initial")] # too many operations operations = [ migrations.RemoveField("Author", "age"), migrations.AddField("Author", "rating", models.IntegerField(default=0)), migrations.AlterField("Book", "price", models.DecimalField(max_digits=5, decimal_places=2)), ] Alternative Split the transaction into multiple smaller transactions and chain them in the oncommit callbacks. Alternative Split the transaction into multiple smaller transactions and chain them in the oncommit callbacks. Split the transaction into multiple smaller transactions and chain them in the oncommit callbacks. Split the transaction into multiple smaller transactions and chain them in the oncommit callbacks. # fixed # 1st migration class Migration(migrations.Migration): dependencies = [("migrations", "0001_initial")] operations = [ migrations.RemoveField("Author", "age"), ] # 2nd migration class Migration(migrations.Migration): dependencies = [("migrations", "0002_initial")] operations = [ migrations.AddField("Author", "rating", models.IntegerField(default=0)), ] # 3rd migration class Migration(migrations.Migration): dependencies = [("migrations", "0003_initial")] operations = [ migrations.AlterField("Book", "price", models.DecimalField(max_digits=5, decimal_places=2)), ] # fixed # 1st migration class Migration(migrations.Migration): dependencies = [("migrations", "0001_initial")] operations = [ migrations.RemoveField("Author", "age"), ] # 2nd migration class Migration(migrations.Migration): dependencies = [("migrations", "0002_initial")] operations = [ migrations.AddField("Author", "rating", models.IntegerField(default=0)), ] # 3rd migration class Migration(migrations.Migration): dependencies = [("migrations", "0003_initial")] operations = [ migrations.AlterField("Book", "price", models.DecimalField(max_digits=5, decimal_places=2)), ] Data Migrations — Since transactions hold onto a lock from when the query is executed until the transaction fails or ends, a migration that operates on every row in the table will end up locking the entire table, either by preventing reads or writes on each row. def migrate_user_profiles(): # Get all users with legacy profiles users_with_profiles = User.objects.filter( legacy_profile__isnull=False ).select_related('legacy_profile') # Process all users in a single transaction with transaction.atomic(): # Track progress total = users_with_profiles.count() print(f"Migrating {total} user profiles...") # Process each user for i, user in enumerate(users_with_profiles): if i % 100 == 0: print(f"Processed {i}/{total} profiles") legacy = user.legacy_profile legacy.update_new_user_profile() Alternative Wrap the transaction around the individual operations, not the entire migration. 
By putting the updates to each row in the transaction, we only hold on to the locks for a short period of time. Data Migrations — Since transactions hold onto a lock from when the query is executed until the transaction fails or ends, a migration that operates on every row in the table will end up locking the entire table, either by preventing reads or writes on each row. def migrate_user_profiles(): # Get all users with legacy profiles users_with_profiles = User.objects.filter( legacy_profile__isnull=False ).select_related('legacy_profile') # Process all users in a single transaction with transaction.atomic(): # Track progress total = users_with_profiles.count() print(f"Migrating {total} user profiles...") # Process each user for i, user in enumerate(users_with_profiles): if i % 100 == 0: print(f"Processed {i}/{total} profiles") legacy = user.legacy_profile legacy.update_new_user_profile() Alternative Wrap the transaction around the individual operations, not the entire migration. By putting the updates to each row in the transaction, we only hold on to the locks for a short period of time. Data Migrations — Since transactions hold onto a lock from when the query is executed until the transaction fails or ends, a migration that operates on every row in the table will end up locking the entire table, either by preventing reads or writes on each row. Data Migrations def migrate_user_profiles(): # Get all users with legacy profiles users_with_profiles = User.objects.filter( legacy_profile__isnull=False ).select_related('legacy_profile') # Process all users in a single transaction with transaction.atomic(): # Track progress total = users_with_profiles.count() print(f"Migrating {total} user profiles...") # Process each user for i, user in enumerate(users_with_profiles): if i % 100 == 0: print(f"Processed {i}/{total} profiles") legacy = user.legacy_profile legacy.update_new_user_profile() def migrate_user_profiles(): # Get all users with legacy profiles users_with_profiles = User.objects.filter( legacy_profile__isnull=False ).select_related('legacy_profile') # Process all users in a single transaction with transaction.atomic(): # Track progress total = users_with_profiles.count() print(f"Migrating {total} user profiles...") # Process each user for i, user in enumerate(users_with_profiles): if i % 100 == 0: print(f"Processed {i}/{total} profiles") legacy = user.legacy_profile legacy.update_new_user_profile() Alternative Wrap the transaction around the individual operations, not the entire migration. By putting the updates to each row in the transaction, we only hold on to the locks for a short period of time. Alternative Wrap the transaction around the individual operations, not the entire migration. By putting the updates to each row in the transaction, we only hold on to the locks for a short period of time. Wrap the transaction around the individual operations, not the entire migration. By putting the updates to each row in the transaction, we only hold on to the locks for a short period of time. Wrap the transaction around the individual operations, not the entire migration. By putting the updates to each row in the transaction, we only hold on to the locks for a short period of time. 
# fixed def migrate_user_profiles(): # Get all users with legacy profiles users_with_profiles = User.objects.filter( legacy_profile__isnull=False ).select_related('legacy_profile') # Process all users in a single transaction # Track progress total = users_with_profiles.count() print(f"Migrating {total} user profiles...") # Process each user for i, user in enumerate(users_with_profiles): if i % 100 == 0: print(f"Processed {i}/{total} profiles") with transaction.atomic(): legacy = user.legacy_profile legacy.update_new_user_profile() # fixed def migrate_user_profiles(): # Get all users with legacy profiles users_with_profiles = User.objects.filter( legacy_profile__isnull=False ).select_related('legacy_profile') # Process all users in a single transaction # Track progress total = users_with_profiles.count() print(f"Migrating {total} user profiles...") # Process each user for i, user in enumerate(users_with_profiles): if i % 100 == 0: print(f"Processed {i}/{total} profiles") with transaction.atomic(): legacy = user.legacy_profile legacy.update_new_user_profile() (For PSQL and SQLite only*) Changing tables or columns — these operations require the strictest form of locks and, therefore, will prevent reads/writes on the entire table. These operations are the most likely to cause an outage. Example Alter Column Alter Table Alternative Run them later. These queries are necessary, BUT we don’t have to run them during business hours. The best policy here is to reduce the risk of an outage by determining how crucial the table is, estimating how long the migration may take, executing the migration when the DB has the least traffic, and preparing a rollback plan. Shrink the amount of time the transaction takes. This can be done by partitioning the table and running the migration on individual partitions. PSQL Partitions & Django (For PSQL and SQLite only *) Changing tables or columns — these operations require the strictest form of locks and, therefore, will prevent reads/writes on the entire table. These operations are the most likely to cause an outage. Example Alter Column Alter Table Alternative Run them later. These queries are necessary, BUT we don’t have to run them during business hours. The best policy here is to reduce the risk of an outage by determining how crucial the table is, estimating how long the migration may take, executing the migration when the DB has the least traffic, and preparing a rollback plan. Shrink the amount of time the transaction takes. This can be done by partitioning the table and running the migration on individual partitions. PSQL Partitions & Django only Changing tables or columns Example Alter Column Alter Table Alternative Run them later. These queries are necessary, BUT we don’t have to run them during business hours. The best policy here is to reduce the risk of an outage by determining how crucial the table is, estimating how long the migration may take, executing the migration when the DB has the least traffic, and preparing a rollback plan. Shrink the amount of time the transaction takes. This can be done by partitioning the table and running the migration on individual partitions. PSQL Partitions & Django Example Alter Column Alter Table Alter Column Alter Table Alter Column Alter Table Alternative Run them later. These queries are necessary, BUT we don’t have to run them during business hours. 
The best policy here is to reduce the risk of an outage by determining how crucial the table is, estimating how long the migration may take, executing the migration when the DB has the least traffic, and preparing a rollback plan. Shrink the amount of time the transaction takes. This can be done by partitioning the table and running the migration on individual partitions. PSQL Partitions & Django Run them later. These queries are necessary, BUT we don’t have to run them during business hours. The best policy here is to reduce the risk of an outage by determining how crucial the table is, estimating how long the migration may take, executing the migration when the DB has the least traffic, and preparing a rollback plan. Shrink the amount of time the transaction takes. This can be done by partitioning the table and running the migration on individual partitions. PSQL Partitions & Django Run them later. These queries are necessary, BUT we don’t have to run them during business hours. The best policy here is to reduce the risk of an outage by determining how crucial the table is, estimating how long the migration may take, executing the migration when the DB has the least traffic, and preparing a rollback plan. Run them later. These queries are necessary, BUT we don’t have to run them during business hours. The best policy here is to reduce the risk of an outage by determining how crucial the table is, estimating how long the migration may take, executing the migration when the DB has the least traffic, and preparing a rollback plan. Shrink the amount of time the transaction takes. This can be done by partitioning the table and running the migration on individual partitions. PSQL Partitions & Django Shrink the amount of time the transaction takes. This can be done by partitioning the table and running the migration on individual partitions. PSQL Partitions & Django PSQL Partitions & Django *Django only wraps transactions around migrations for PSQL and SQLite. class Migration(migrations.Migration): dependencies = [("migrations", "0001_initial")] # this migration, if on a large table, can slow down and block other operations # do it later operations = [ migrations.RemoveField("Users", "middle_name"), ] class Migration(migrations.Migration): dependencies = [("migrations", "0001_initial")] # this migration, if on a large table, can slow down and block other operations # do it later operations = [ migrations.RemoveField("Users", "middle_name"), ] Out Irreversible operations — Everything in the transaction should be reversible if the transaction is rolled back; if we put an API call in the transaction, it can’t be undone. Examples Adding an event to a queue — trigger other events Sending an API Call — status updates, trigger jobs, etc. def transaction(user_data, user_files): with transaction.atomic(): user = User.objects.create(**user_data) async_notification_service.send_email(user.email, "You can login now!") Account.objects.create(user=user, balance=0) # rest of user creation proccess Alternatives Do critical business logic that needs to happen in the oncommit callback of the transaction — The oncommit callback is ALWAYS called after a transaction is successful, and all the updates from the transaction are available in the oncommit callback. We can also make the operation reversible, i.e., create a way to delete events or send an undo API call — This is a nontrivial task for the team that owns the event queue or the API. I don’t recommend it. 
Irreversible operations — Everything in the transaction should be reversible if the transaction is rolled back; if we put an API call in the transaction, it can’t be undone. Examples Adding an event to a queue — trigger other events Sending an API Call — status updates, trigger jobs, etc. def transaction(user_data, user_files): with transaction.atomic(): user = User.objects.create(**user_data) async_notification_service.send_email(user.email, "You can login now!") Account.objects.create(user=user, balance=0) # rest of user creation proccess Alternatives Do critical business logic that needs to happen in the oncommit callback of the transaction — The oncommit callback is ALWAYS called after a transaction is successful, and all the updates from the transaction are available in the oncommit callback. We can also make the operation reversible, i.e., create a way to delete events or send an undo API call — This is a nontrivial task for the team that owns the event queue or the API. I don’t recommend it. Irreversible operations — Everything in the transaction should be reversible if the transaction is rolled back; if we put an API call in the transaction, it can’t be undone. Irreversible operations Examples Adding an event to a queue — trigger other events Sending an API Call — status updates, trigger jobs, etc. Examples Adding an event to a queue — trigger other events Sending an API Call — status updates, trigger jobs, etc. Adding an event to a queue — trigger other events Sending an API Call — status updates, trigger jobs, etc. Adding an event to a queue — trigger other events Sending an API Call — status updates, trigger jobs, etc. def transaction(user_data, user_files): with transaction.atomic(): user = User.objects.create(**user_data) async_notification_service.send_email(user.email, "You can login now!") Account.objects.create(user=user, balance=0) # rest of user creation proccess def transaction(user_data, user_files): with transaction.atomic(): user = User.objects.create(**user_data) async_notification_service.send_email(user.email, "You can login now!") Account.objects.create(user=user, balance=0) # rest of user creation proccess Alternatives Do critical business logic that needs to happen in the oncommit callback of the transaction — The oncommit callback is ALWAYS called after a transaction is successful, and all the updates from the transaction are available in the oncommit callback. We can also make the operation reversible, i.e., create a way to delete events or send an undo API call — This is a nontrivial task for the team that owns the event queue or the API. I don’t recommend it. Alternatives Do critical business logic that needs to happen in the oncommit callback of the transaction — The oncommit callback is ALWAYS called after a transaction is successful, and all the updates from the transaction are available in the oncommit callback. We can also make the operation reversible, i.e., create a way to delete events or send an undo API call — This is a nontrivial task for the team that owns the event queue or the API. I don’t recommend it. Do critical business logic that needs to happen in the oncommit callback of the transaction — The oncommit callback is ALWAYS called after a transaction is successful, and all the updates from the transaction are available in the oncommit callback. We can also make the operation reversible, i.e., create a way to delete events or send an undo API call — This is a nontrivial task for the team that owns the event queue or the API. 
I don’t recommend it. Do critical business logic that needs to happen in the oncommit callback of the transaction — The oncommit callback is ALWAYS called after a transaction is successful, and all the updates from the transaction are available in the oncommit callback. We can also make the operation reversible, i.e., create a way to delete events or send an undo API call — This is a nontrivial task for the team that owns the event queue or the API. I don’t recommend it. def transaction(user_data, user_files): with transaction.atomic(): user = User.objects.create(**user_data) Account.objects.create(user=user, balance=0) # rest of user creation proccess # the transaction is still in progress, so it can still be rolled back, it is not # committed until the transaction block is exited, so putting the notification here # is not a good idea - especially if the job starts immediately tries to read the data # this creates a race condition async_notification_service.send_email(user.email, "You can login now!") def transaction(user_data, user_files): with transaction.atomic(): user = User.objects.create(**user_data) Account.objects.create(user=user, balance=0) # rest of user creation proccess transaction.on_commit(partial(async_notification_service.send_email, user.email, "You can login now!")) def transaction(user_data, user_files): with transaction.atomic(): user = User.objects.create(**user_data) Account.objects.create(user=user, balance=0) # rest of user creation proccess # the transaction is still in progress, so it can still be rolled back, it is not # committed until the transaction block is exited, so putting the notification here # is not a good idea - especially if the job starts immediately tries to read the data # this creates a race condition async_notification_service.send_email(user.email, "You can login now!") def transaction(user_data, user_files): with transaction.atomic(): user = User.objects.create(**user_data) Account.objects.create(user=user, balance=0) # rest of user creation proccess transaction.on_commit(partial(async_notification_service.send_email, user.email, "You can login now!")) Blocking calls — since transactions prevent all other queries from operating on the tables/rows the transaction is changing, any code that increases the duration of the transaction will cause the DB to be locked, causing timeouts and unresponsiveness in the apps dependent on the DB. Examples Network calls — Sending requests to APIs to get data to put into the transaction. Logging goes under this — Depending on the logger we use, the library may send the logs when we are in the transaction, or if we are using datadog, it will have another process save the logs, but we still pay the price of storing the logs in memory until the batch job saves them to a file. Disk operations — Loading a CSV to insert into a table or exporting a table to a CSV. CPU heavy tasks — Matrix multiplication or any heavy math/data transformation will block the CPU and all other operations — Python’s Global Interpreter Lock forces every thread to use the CPU one at a time, and Django is single-processed (every operation in the transaction happens serially). 
from django.db import transaction

def create_user(user_data, user_files):
    with transaction.atomic():
        user = User.objects.create(**user_data)
        for file_data in user_files:
            # the transaction waits for this upload, and so does every other connection
            # that needs access to the tables/rows this transaction has locked
            url = Cloudinary.upload_file(file_data['data'])
            Files.objects.create(**file_data['meta_data'], user=user, url=url)
        Account.objects.create(user=user, balance=0)
        # rest of user creation process

Alternatives

Get the data ready before the transaction — Load disk/network data beforehand. Once the data is ready, execute the transaction with it.
Use placeholder data — Create a set of obvious placeholder data (no way to mistake it for production data) that is easily retrievable and identifiable. Load it before the transaction and use it instead of the slow-to-fetch real data.
Generate data — If there are uniqueness constraints on the fields that prevent placeholder data from being reused, generate the values instead. This is risky — pseudo-random generation can collide unpredictably — so use a good random algorithm with a good seed to avoid surprise failures.
Remove or modify constraints — If the constraints are not allowing us to safely create our records, there is a problem with our schema. Make the constraints conditional on a state column that marks when the record is complete and should be enforced (a sketch follows the code examples below).
Two ways to move the uploads out of the transaction:

# not bad: upload the files after the transaction has finished, so the locks are already released
from django.db import transaction

def create_user(user_data, user_files):
    user = None
    with transaction.atomic():
        user = User.objects.create(**user_data)
        Account.objects.create(user=user, balance=0)
        # rest of user creation process
    for file_data in user_files:
        url = Cloudinary.upload_file(file_data['data'])
        Files.objects.create(**file_data['meta_data'], user=user, url=url)

# best fix: only schedule the uploads once the transaction has committed
from functools import partial

from django.db import transaction

def create_user(user_data, user_files):
    user = None
    with transaction.atomic():
        user = User.objects.create(**user_data)
        Account.objects.create(user=user, balance=0)
        # rest of user creation process
        # partial builds a callable with the function and its arguments already bound,
        # so on_commit can call it with no arguments after the transaction is committed;
        # a lambda would also work here, but partial binds user_files and user eagerly,
        # which avoids late-binding surprises if those names are reassigned later
        transaction.on_commit(partial(create_user_files, user_files, user))

def create_user_files(user_files, user):
    for file_data in user_files:
        url = Cloudinary.upload_file(file_data['data'])
        Files.objects.create(**file_data['meta_data'], user=user, url=url)
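For the last alternative, here is a minimal sketch of a state-dependent constraint. The Profile model, its state values, and the constraint name are assumptions for illustration only; the idea is that uniqueness is enforced solely on rows marked complete, so placeholder rows created by a fast transaction do not trip the constraint.

from django.db import models

class Profile(models.Model):
    # "draft" rows can hold placeholder data; only "complete" rows must be unique
    state = models.CharField(max_length=20, default="draft")
    external_id = models.CharField(max_length=64)

    class Meta:
        constraints = [
            models.UniqueConstraint(
                fields=["external_id"],
                condition=models.Q(state="complete"),
                name="unique_external_id_when_complete",
            ),
        ]

A record starts in the draft state, the slow work happens outside any transaction, and a small follow-up update flips it to complete, at which point the constraint applies.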
Next — How PSQL Commands Block Each Other

In the next post, we will dive into PSQL and find out:

How different commands lock tables and rows
Which commands are the riskiest to execute

Sources

Atomic

Django Migrations & Transactions

On databases that support DDL transactions (SQLite and PostgreSQL), all migration operations will run inside a single transaction by default. In contrast, if a database doesn’t support DDL transactions (e.g. MySQL, Oracle) then all operations will run without a transaction. You can prevent a migration from running in a transaction by setting the atomic attribute to False. For example: It’s also possible to execute parts of the migration inside a transaction using atomic() or by passing atomic=True to RunPython.
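The example the quoted docs refer to did not survive extraction; as I understand the Django documentation, a non-atomic migration looks roughly like this (dependencies and operations left empty for the sketch):

from django.db import migrations

class Migration(migrations.Migration):
    # run this migration's operations without a single wrapping transaction
    atomic = False

    dependencies = []
    operations = []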
PSQL Partitions & Django

Django’s ORM doesn’t have built-in support for partitioned tables, so if you want to use partitions in your application, it’s going to take a little extra work. One way to use partitions is to roll your own migrations that run raw SQL. This will work, but it means you’re going to have to manually manage the migrations for all changes you make to the table in the future. Another option is to use a package called django-postgres-extra. Django-postgres-extra offers support for several PostgreSQL features that are not built into Django’s ORM, for example, support for TRUNCATE TABLE and table partitioning.

PSQL Transactions

Python: How Django Transactions Work