Haki Benita

@hakibenita

All you need to know about prefetching in Django

For a better reading experience, check out this article on my website.

I have recently worked on a ticket ordering system for a conference. It was very important for the customer to see a table of orders including a column with a list of program names in each order:

The models looked (roughly) like this:

class Program(models.Model):
name = models.CharField(max_length=20)

class Price(models.Model):
program = models.ForeignKey(Program)
from_date = models.DateTimeField()
to_date = models.DateTimeField()

class Order(models.Model):
state = models.CharField(max_length=20)
items = models.ManyToManyField(Price)
  • Program is a session, lecture or a conference day.
  • Prices can change over time so we used a model called Price, modeled as a type 2 slowly changing dimension (SCD) that represents the price of a program at a certain time.
  • User’s can register to one or more programs. Each item in an Order is a program price at the time the order was made.

Before we start

Throughout this article we are going to monitor the queries executed by Django. To log the queries add the following to the LOGGING settings in settings.py:

LOGGING = {

...

'loggers': {
'django.db.backends': {
'level': 'DEBUG',
},

},
}
Let’s see you trying to find an image related to prefetching…

What’s the problem?

Let’s try to fetch the program names for a single order:

> o = Order.objects.filter(state=’completed’).first()
(0.002) SELECT … FROM “orders_order” WHERE “orders_order”.”state” = ‘completed’ ORDER BY “orders_order”.”id” ASC LIMIT 1;
> [p.program.name for p in o.items.all()]
(0.002) SELECT … FROM “events_price” INNER JOIN “orders_order_items” ON (“events_price”.”id” = “orders_order_items”.”price_id”) WHERE “orders_order_items”.”order_id” = 29; args=(29,)
(0.001) SELECT … FROM “events_program” WHERE “events_program”.”id” = 8; args=(8,)
[‘Day 1 Pass’]
  • To fetch completed orders we need one query.
  • To fetch the program names for each order we need two more queries.

I previously wrote about the N+1 problem and this is a classic case. If we need two queries for each order, the number of queries for 100 orders will be 1 + 100 * 2 = 201 queries — that’s a lot!

Let’s use Django to reduce the amount of queries:

> o.items.values_list(‘program__name’)
(0.003) SELECT “events_program”.”name” FROM “events_price” INNER JOIN “orders_order_items” ON (“events_price”.”id” = “orders_order_items”.”price_id”) INNER JOIN “events_program” ON (“events_price”.”program_id” = “events_program”.”id”) WHERE “orders_order_items”.”order_id” = 29 LIMIT 21;
[‘Day 1 Pass’]

Great! Django performed a join between Price and Program and reduced the amount of queries to just one per order.

At this point instead of 201 queries we only need 101 queries for 100 orders.

Can we do better?

Why can’t we join?

The first question that should come to mind is “why can’t we join the tables?”

If we have a foreign key we can use select_related or use snake case like we did above to fetch the related fields in a single query.

For example, we fetched the program name for a list of prices in a single query using values_list(‘program__name’). We were able to do that because each price is related to exactly one program.

If the relation between two models is many to many we can’t do that. Every order has one or more related prices — if we join the two tables we get duplicate orders:

SELECT
o.id AS order_id,
p.id AS price_id
FROM
orders_order o
JOIN orders_order_items op ON (o.id = op.order_id)
JOIN events_price p ON (op.price_id = p.id)
ORDER BY
1,
2;
order_id | price_id 
----------+----------
45 | 38
45 | 56
70 | 38
70 | 50
70 | 77
71 | 38

Orders 70 and 45 have multiple items so they come up more than once in the result — Django can’t handle that.

Enter prefetch_related

Django has a nice, built-in way, of dealing with this problem called prefetch_related:

> o = Order.objects.filter(
state='completed',
).prefetch_related(
'items__program',
)
.first()
(0.002) SELECT ... FROM "orders_order" WHERE "orders_order"."state" = 'completed' ORDER BY "orders_order"."id" ASC LIMIT 1;
(0.001) SELECT ("orders_order_items"."order_id") AS "_prefetch_related_val_order_id", "events_price"... FROM "events_price" INNER JOIN "orders_order_items" ON ("events_price"."id" = "orders_order_items"."price_id") WHERE "orders_order_items"."order_id" IN (29);
(0.001) SELECT "events_program"."id", "events_program"."name" FROM "events_program" WHERE "events_program"."id" IN (8);

We told Django we intend to fetch items__program from the result set. In the second and third query we can see that Django fetched the through table orders_order_items and the relevant programs from events_program. The results of the prefetch are cached on the objects.

What happens when we try to fetch program names from the result?

> [p.program.name for p in o.items.all()]
[‘Day 1 Pass’]

No additional queries — exactly what we wanted!

When using prefetch, it’s important to work on the object and not on the query. Trying to fetch the program names with a query will produce the same outcome but will result in an additional query:

> o.items.values_list(‘program__name’)
(0.002) SELECT “events_program”.”name” FROM “events_price” INNER JOIN “orders_order_items” ON (“events_price”.”id” = “orders_order_items”.”price_id”) INNER JOIN “events_program” ON (“events_price”.”program_id” = “events_program”.”id”) WHERE “orders_order_items”.”order_id” = 29 LIMIT 21;
[‘Day 1 Pass’]

At this point, fetching 100 orders requires only 3 queries.

Can we do even better?

Introducing Prefetch

At version 1.7 Django introduced a new Prefetch object that extends the capabilities of prefetch_related.

The new object allows the developer to override the query used by Django to prefetch the related objects.

In our previous example Django used two queries for the prefetch — one for the through table and one for the program table. What if we could tell Django to join these two together?

> prices_and_programs = Price.objects.select_related(‘program’)
> o = Order.objects.filter(
state='completed'
).prefetch_related(
Prefetch('items', queryset=prices_and_programs)
).first()
(0.001) SELECT … FROM “orders_order” WHERE “orders_order”.”state” = ‘completed’ ORDER BY “orders_order”.”id” ASC LIMIT 1;
(0.001) SELECT (“orders_order_items”.”order_id”) AS “_prefetch_related_val_order_id”, “events_price”…., “events_program”…. INNER JOIN “events_program” ON (“events_price”.”program_id” = “events_program”.”id”) WHERE “orders_order_items”.”order_id” IN (29);

We created a query that joins prices with programs. Than we told Django to use this query to prefetch the values. This is like telling Django that you intend to fetch both items and programs for each order.

Fetching program names for an order:

> [p.program.name for p in o.items.all()]
[‘Day 1 Pass’]

No additional queries — it worked!

Taking it to the next level

When we talked earlier about the models we mentioned that the prices are modeled as an SCD table. This means we might want to query only active prices at a certain date.

A price is active at a certain date if it’s between from_date and end_date:

> now = timezone.now()
> active_prices = Price.objects.filter(
from_date__lte=now,
to_date__gt=now,
)

Using the Prefetch object we can tell Django to store the prefetched objects in a new attribute of the result set:

> active_prices_and_programs = (
Price.objects.filter(
from_date__lte=now,
to_date__gt=now,
).select_related('program')
)
> o = Order.objects.filter(
state='completed'
).prefetch_related(
Prefetch(
'items',
queryset=active_prices_and_programs,
to_attr='active_prices',
),

).first()
(0.001) SELECT … FROM “orders_order” WHERE “orders_order”.”state” = ‘completed’ ORDER BY “orders_order”.”id” ASC LIMIT 1;
(0.001) SELECT … FROM “events_price” INNER JOIN “orders_order_items” ON (“events_price”.”id” = “orders_order_items”.”price_id”) INNER JOIN “events_program” ON (“events_price”.”program_id” = “events_program”.”id”) WHERE (“events_price”.”from_date” <= ‘2017–04–29T07:53:00.210537+00:00’::timestamptz AND “events_price”.”to_date” > ‘2017–04–29T07:53:00.210537+00:00’::timestamptz AND “orders_order_items”.”order_id” IN (29));

We can see in the log that Django performed only two queries, and the prefetch query now include the custom filter we defined.

To fetch the active prices we can use the new attribute defined in to_attr:

> [p.program.name for p in o.active_prices]
[‘Day 1 Pass’]

No additional query!

Final Words

Prefetch is a very powerful feature of Django ORM. I strongly recommend going over the documentation — you are bound to strike a gem.

If you like Django optimization tips check out my other stories on medium:

Hacker Noon is how hackers start their afternoons. We’re a part of the @AMI family. We are now accepting submissions and happy to discuss advertising & sponsorship opportunities.
If you enjoyed this story, we recommend reading our latest tech stories and trending tech stories. Until next time, don’t take the realities of the world for granted!

More by Haki Benita

Topics of interest

More Related Stories