Database management is one of the most crucial aspects of backend development. A properly optimized database can help to reduce the response time and hence lead to a better user experience.
In this article, we will be discussing the ways to optimize the database for speed in Django applications. Although, we won’t be diving deep into each concept individually, hence, please refer to the official Django documentation for full details.
Understanding Queries in Django
Understanding querysets in Django is the key to optimization, hence, please remember the following:
- Querysets are lazy, which means that no corresponding database request is made until you perform certain actions on the queryset, such as iterating over it.
- Always limit the result of a database query by specifying the number of values to be returned.
- In Django, querysets can be evaluated by iteration, slicing, caching, and python methods such as
len()
,count()
etc. Make sure that you make the best use of them. - Django querysets are cached, such that if you re-use the same queryset, multiple database requests won’t be made, thus minimizing database access.
- Retrieve everything that you would need at once, but ensure that you are retrieving only what you need.
Query Optimization in Django
Database Indexing
Database indexing is a technique for speeding up queries while retrieving records from a database. As the application increases in size, it may slow down, and users will notice since it will take substantially longer to obtain the required data. Thus, indexing is a non-negotiable operation when working with large databases that generate a vast volume of data.
Indexing is a method of sorting a large number of data based on various fields. When you create an index on a field in a database, you create another data structure that contains the field value as well as a pointer to the record to which it is related. This index structure is then sorted, making Binary Searches possible.
For example, here is a Django model named Sale:
# models.py
from django.db import models
class Sale(models.Model):
sold_at = models.DateTimeField(
auto_now_add=True,
)
charged_amount = models.PositiveIntegerField()
Database Indexing can be added to a particular field while defining a Django model as follows:
# models.py
from django.db import models
class Sale(models.Model):
sold_at = models.DateTimeField(
auto_now_add=True,
db_index=True, #DB Indexing
)
charged_amount = models.PositiveIntegerField()
If you run the migrations for this model, Django will create a database index on the table Sales, and it will be locked until the index is completed. On a local development setup, with a small amount of data and very few connections, this migration might feel instantaneous, but when we talk about the production environment, there are large datasets with many concurrent connections that can cause downtime as obtaining a lock and creating a database index can take a lot of time.
You can also create a single index for two fields as shown below:
# models.py
from django.db import models
class Sale(models.Model):
sold_at = models.DateTimeField(
auto_now_add=True,
db_index=True, #DB Indexing
)
charged_amount = models.PositiveIntegerField()
class Meta:
indexes = [
["sold_at", "charged_amount"]]
Database Caching
Database caching is one of the best approaches to getting a fast response from a database. It ensures that fewer calls are made to the database, preventing overload. A standard caching operation follows the below structure:
Django provides a caching mechanism that can use different caching backends like Memcached and Redis that let you avoid running the same queries multiple times.
Memcached is an open-source in-memory system that guarantees to provide cached results in less than a millisecond. It is simple to set up and scale. Redis, on the other hand, is an open-source caching solution with similar characteristics to Memcached. Most offline apps employ previously cached data, which implies that the majority of queries never ever reach the database.
User sessions should be saved in a cache in your Django application, and because Redis maintains data on disk, all sessions for logged-in users originate from the cache rather than the database.
To use Memcache with Django, we need to define the following:
- BACKEND: To define the cache backend to use.
- LOCATION:
ip:port
values whereip
is the IP address of the Memcached daemon andport
is the port on which Memcached is running, or the URL pointing to your Redis instance, using the appropriate scheme.
To enable database caching with Memcached, install pymemcache
using pip using the following command:
pip install pymemcache
Then, you can configure the cache settings in your settings.py
as follows:
CACHES = {
'default': {
'BACKEND': 'django.core.cache.backends.memcached.PyMemcacheCache',
'LOCATION': '127.0.0.1:11211',
}
}
In the above example, Memcached is running on localhost (127.0.0.1) port 11211, using the pymemcache
binding:
Similarly, to enable database caching using Redis, install Redis using pip using the command below:
pip install redis
Then configure the cache settings in your settings.py
by adding the following code:
CACHES = {
'default': {
'BACKEND': 'django.core.cache.backends.redis.RedisCache',
'LOCATION': 'redis://127.0.0.1:6379',
}
}
Memcached and Redis can also be used to store user authentication tokens. Because every person who logs in must supply a token, all of these procedures can result in significant database overhead. Using cached tokens will result in considerably quicker database access.
Using Iterator When Possible
A queryset in Django, typically, caches its result when evaluation happens and for any further operation with that queryset, it first checks if there are cached results. However, when you use iterator()
, it doesn’t check for cache and reads results directly from the database, neither saves the results to the queryset.
Now, you must be wondering how this is helpful. Consider a queryset that returns a large number of objects with lots of memory to cache but has to be used only once, in such case, you should use an iterator()
.
For instance, in the following code, all records will be fetched from the database and then loaded in the memory and then we’ll iterate through each one:
queryset = Product.objects.all()
for each in queryset:
do_something(each)
Whereas if we use iterator()
, Django will hold the SQL connection open and read each record, and call do_something()
before reading the next record:
queryset = Product.objects.all().iterator()
for each in queryset:
do_something(each)
Using a Persistence Database Connection
Django creates a new database connection for each request and closes it after the request is complete. This behavior is caused by CONN_MAX_AGE
, which has a default value of 0. But how long should it be set to? That is determined by the volume of traffic on your site; the higher the volume, the more seconds are required to maintain the connection. It is usually recommended to start with a low number, such as 60.
You need to wrap your extra options in OPTIONS
, as detailed in the
DATABASES = {
'default': {
'ENGINE': 'django.db.backends.mysql',
'NAME': 'dashboard',
'USER': 'root',
'PASSWORD': 'root',
'HOST': '127.0.0.1',
'PORT': '3306',
'OPTIONS': {
'CONN_MAX_AGE': '60',
}
}
}
Using Query Expressions
Query expressions define a value or a computation that may be utilized in an update, create, filter, order by, annotation, or aggregate operation.
A commonly used built-in query expression in Django is the F expression. Let’s see how it works and can be useful.
Note: These expressions are defined in django.db.models.expressions
and django.db.models.aggregates
, but for convenience, they’re available and usually imported from django.db.models
.
F Expression
In the Django Queryset API, F()
expressions are used to refer to the model field values directly. It allows you to refer to model field values and conduct database actions on them without having to fetch them from the database and into Python memory. Instead, Django employs the F()
object to produce a SQL phrase that defines the needed database activity.
For example, let’s say that we want to increase the price of all products by 20%, then the code would look something like this:
products = Product.objects.all()
for product in products:
product.price *= 1.2
product.save()
However, if we use F()
, we can do this in a single query as follows:
from django.db.models import F
Product.objects.update(price=F('price') * 1.2)
Using select_related()
and prefetch_related()
Django provides select_related()
and prefetch_related()
arguments for optimizing your querysets by minimizing the number of database requests.
According to the official Django Documentation:
select_related()
"follows" foreign-key relationships, selecting additional related-object data when it executes its query.
prefetch_related()
does a separate lookup for each relationship, and does the "joining" in Python.
select_related()
We use select_related()
when the item to be selected is a single object which means forward ForeignKey
, OneToOne
, and backward OneToOne
field.
You may use select_related()
to create a single query that returns all the related objects for a single instance for one-many and one-to-one connections. When the query is performed, select_related()
retrieves any extra related-object data from foreign-key relationships.
select_related()
works by generating a SQL join and includes the related object's columns in the SELECT
expression. As a result, select_related()
returns related items in the same database query.
Though select_related()
produces a more sophisticated query, the data acquired is cached, thus handling of the data obtained does not necessitate any extra database requests.
The syntax simply looks like this:
queryset = Tweet.objects.select_related('owner').all()
prefetch_related()
In contrast, prefetch_related()
is utilized for many-to-many and many-to-one connections. It produces a single query that includes all of the models and filters are given in the query.
The syntax simply looks like this:
Book.objects.prefetch_related('author').get(id=1).author.first_name
NOTE: ManyToMany relationships should not be handled using SQL because many performance issues could appear when dealing with large tables. That's why the prefetch_related method joins tables inside Python avoiding making large SQL joins.
Read about the difference between select_related()
and prefetch_related()
in detail here.
Using bulk_create()
and bulk_update()
bulk_create()
is a method that creates the provided list of objects into the database with one query. Similarly, bulk_update()
is a method that updates the given fields on the provided model instances with one query.
For example, if we have a posts model as shown below:
class Post(models.Model):
title = models.CharField(max_length=300, unique=True)
time = models.DateTimeField(auto_now_add=True)
def __str__(self):
return self.title
Now, let’s say that we want to add multiple data records to this model, then we can use bulk_create()
like this:
#articles
articles = [Post(title="Hello python"), Post(title="Hello django"), Post(title="Hello bulk")]
#insert data
Post.objects.bulk_create(articles)
And the output would look like this:
>>> Post.objects.all()
<QuerySet [<Post: Hello python>, <Post: Hello django>, <Post: Hello bulk>]>
And if we want to update the data, then we can use bulk_update()
like this:
update_queries = []
a = Post.objects.get(id=14)
b = Post.objects.get(id=15)
c = Post.objects.get(id=16)
#set update value
a.title="Hello python updated"
b.title="Hello django updated"
c.title="Hello bulk updated"
#append
update_queries.extend((a, b, c))
Post.objects.bulk_update(update_queries, ['title'])
And the output would look like this:
>>> Post.objects.all()
<QuerySet [<Post: Hello python updated>, <Post: Hello django updated>, <Post: Hello bulk updated>]>
Conclusion
In this article, we covered the tips to optimize the performance of the database, reduce bottlenecks and save resources in a Django application.
I hope you found it helpful. Keep reading!