Listen to this story
Object-relational mappers (ORMs) are often used in Python programming when one needs to develop an application that works with databases. Examples of Python ORMs are SQLAlchemy, Peewee, Pony-ORM and Django. When choosing an ORM, performance plays a crucial role. But how are these toolsets compared? ORM performance benchmarks offer a measure of clarity but leave considerable room for improvement. I examine and extend the qualitative ORM benchmark to develop a stronger metric.
The qualitative Python ORM benchmark, Tortoise ORM, (link to the repository) analyzes the speed of six ORMs for eleven types of SQL queries. In general, the Tortoise benchmark makes it possible to evaluate the speed of query execution for the various ORMs. However there is a flaw with this approach to testing: most ORMs are selected for use in web applications. In such contexts multiple users send all manner of queries to a database often at the same time. Because no benchmark measurement tools evaluated could rate performance of Python ORMs in a scenario like this, I decided to write my own comparing PonyORM and SQLAlchemy. As a basis, I took the TPC-C benchmark.
Since 1988, TPC has been developing tests in the field of data processing. They have long become an industry standard and are used by almost all vendors of equipment on various samples of hardware and software. The main feature of these tests is that they are focused on testing under enormous load in conditions as close as possible to real ones.
TPC-C simulates a warehouse network. It includes a combination of five simultaneously executed transactions of various types and complexity.The purpose of the test is to evaluate the speed of transaction processing when several virtual users simultaneously access the database.
I decided to test two Python ORMs (SQLALchemy and PonyORM) using the TPC-C testing method adapted for this task. The purpose of the test is to evaluate the speed of transaction processing when several virtual users access the database at the same time.
Step one is to create and populate the database of a warehouse network. The database schema looks like this:
The database consists of eight relations:
Databases for Pony and SQLAlchemy are identical. Only primary and foreign keys are indexed. Pony made those indexes automatically. In SQLAlchemy I made it manually.
During the test, different types of transactions are sent to the database from several virtual users. Each transaction consists of several requests. In total, there are five types of transactions that are submitted for processing with different probability of occurrence:
Transactions:
The probability of occurrence of transactions is the same as in the original TPC-C test.
However, bearing in mind that toriginal TPC-C test is conducted on servers with 64+ GB of RAM (requiring a large number of processors and huge disk space), due to technical limitations and and the fact that I wanted to test the performance of ORMs and not the resistance of hardware to enormous load, this test is somewhat simplified.
The main differences from the TPC-C test are as follows:
Main differences:
Next, I will briefly describe what each transaction does.
Two ORMs are participating in the test:
Below are the results of running the test for 10 minutes with 2 parallel processes accessing the database. Processes are launched using the “multiprocessing” module.
X axis - time in minutes
Y axis - number of completed transactions
As DBMS I use PostgreSQL
First, I ran the test with all five transactions, as expected in the TPC-C test. As a result of this test, Pony was about twice as fast.
Average speed:
After that, I decided to evaluate the performance of ORMs on each of the five transactions separately. Below are the results for each individual transaction.
Average speed:
Average speed:
Average speed:
Average speed:
Average speed:
After receiving the results, I analyzed why they are like this and came to the following conclusions:
In 4 out of 5 transactions, PonyORM turned out to be faster, because, when generating SQL code, PonyORM remembers the result of translating Python expressions into SQL. Because of this Pony does not translate the expression again when the query is repeated, while SQLAlchemy is forced to generate SQL code every time when needed to execute a query.
Example of this kind of query in Pony:
stocks = select(stock for stock in Stock
if stock.warehouse == whouse
and stock.item in items).order_by(Stock.id).for_update()
Generated SQL:
SELECT "stock"."id", "stock"."warehouse", "stock"."item",
"stock"."quantity", "stock"."ytd", "stock"."order_cnt",
"stock"."remote_cnt", "stock"."data"
FROM "stock" "stock"
WHERE "stock"."warehouse" = %(p1)s
AND "stock"."item" IN (%(p2)s, %(p3)s)
ORDER BY "stock"."id"
FOR UPDATE
{'p1':7, 'p2':7, 'p3':37}
SQLAlchemy:
stocks = session.query(Stock).filter(
Stock.warehouse == whouse, Stock.item.in_(
items)).order_by(text("id")).with_for_update()
Generated SQL:
SELECT stock.id AS stock_id, stock.warehouse_id AS stock_warehouse_id,
stock.item_id AS stock_item_id, stock.quantity AS stock_quantity,
stock.ytd AS stock_ytd, stock.order_cnt AS stock_order_cnt,
stock.remote_cnt AS stock_remote_cnt, stock.data AS stock_data
FROM stock
WHERE stock.warehouse_id = %(warehouse_id_1)s AND stock.item_id IN
(%(item_id_1)s, %(item_id_2)s) ORDER BY id FOR UPDATE
{'warehouse_id_1': 7, 'item_id_1': 53, 'item_id_2': 54}
But, Apparently, SQLAlchemy performs delivery type transactions faster because it can combine several UPDATE operations applied to different objects into a single command.
Example:
INFO:sqlalchemy.engine.base.Engine:UPDATE order_line SET delivery_d=%
(delivery_d)s WHERE order_line.id = %(order_line_id)s
INFO:sqlalchemy.engine.base.Engine:(
{'delivery_d': datetime.datetime(2020, 4, 6, 14, 33, 6, 922281),
'order_line_id': 316},
{'delivery_d': datetime.datetime(2020, 4, 6, 14, 33, 6, 922272),
'order_line_id': 317},
{'delivery_d': datetime.datetime(2020, 4, 6, 14, 33, 6, 922261))
Pony in situations like this sends separate query for each update.
Based on the results of this test, I can say that Pony selects from the database much faster. On the other hand SQLAlchemy in some cases can produce queries of Update type with significantly higher speed.
In the future, I plan to test other ORMs (Peewee, Django) this way.
Code of the test: repository
SQLAlchemy: documentation, community
Pony: documentation, community