Performance Testing of Python ORMs Based on the TPC-C benchmark

Written by evgeniy | Published 2020/04/15
Tech Story Tags: python | python-programming | orm | sql | benchmark | testing | python3 | coding

TLDR: Python ORM performance benchmarks offer a measure of clarity but leave considerable room for improvement. I examine and extend a qualitative ORM benchmark to develop a stronger metric, testing two Python ORMs (SQLAlchemy and PonyORM) with the TPC-C testing method adapted for this task. The purpose of the test is to evaluate the speed of transaction processing when several virtual users access the database at the same time; the main difference is that the test runs with fewer virtual users than the original.

Object-relational mappers (ORMs) are often used in Python programming when one needs to develop an application that works with databases. Examples of Python ORMs are SQLAlchemy, Peewee, PonyORM and the Django ORM. When choosing an ORM, performance plays a crucial role. But how are these toolsets compared? ORM performance benchmarks offer a measure of clarity but leave considerable room for improvement. I examine and extend the qualitative ORM benchmark to develop a stronger metric.
The qualitative Python ORM benchmark from Tortoise ORM (link to the repository) analyzes the speed of six ORMs for eleven types of SQL queries. In general, the Tortoise benchmark makes it possible to evaluate the speed of query execution for the various ORMs. However, there is a flaw with this approach to testing: most ORMs are selected for use in web applications, where multiple users send all manner of queries to a database, often at the same time. Because none of the benchmark tools I evaluated could rate the performance of Python ORMs in a scenario like this, I decided to write my own, comparing PonyORM and SQLAlchemy. As a basis, I took the TPC-C benchmark.
Since 1988, the TPC (Transaction Processing Performance Council) has been developing benchmarks in the field of data processing. They have long since become an industry standard and are used by almost all equipment vendors on a wide range of hardware and software configurations. The main feature of these tests is that they focus on performance under enormous load, in conditions as close as possible to real ones.
TPC-C simulates a warehouse network. It includes a combination of five simultaneously executed transactions of various types and complexity. The purpose of the test is to evaluate the speed of transaction processing when several virtual users simultaneously access the database.
I decided to test two Python ORMs (SQLAlchemy and PonyORM) using the TPC-C testing method adapted for this task.

Test description

Step one is to create and populate the database of a warehouse network. The database consists of eight relations:
  1. Warehouse 
  2. District
  3. Order 
  4. OrderLine
  5. Stock
  6. Item
  7. Customer
  8. History
The databases for Pony and SQLAlchemy are identical. Only primary and foreign keys are indexed; Pony created those indexes automatically, while in SQLAlchemy I created them manually.
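The actual entity definitions live in the test repository; as a minimal sketch (table and attribute names here are illustrative assumptions, not copied from the benchmark code), a pair of related entities could be declared like this.
Pony, where the foreign key gets its index automatically:
from pony.orm import Database, Required, Set

db = Database()

class Warehouse(db.Entity):
    name = Required(str)
    districts = Set('District')

class District(db.Entity):
    warehouse = Required(Warehouse)  # Pony indexes this foreign key itself
    name = Required(str)
SQLAlchemy, where the index on the foreign key column is declared explicitly:
from sqlalchemy import Column, ForeignKey, Integer, String
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship

Base = declarative_base()

class Warehouse(Base):
    __tablename__ = 'warehouse'
    id = Column(Integer, primary_key=True)
    name = Column(String)

class District(Base):
    __tablename__ = 'district'
    id = Column(Integer, primary_key=True)
    # index=True creates the index that Pony would add on its own
    warehouse_id = Column(Integer, ForeignKey('warehouse.id'), index=True)
    name = Column(String)
    warehouse = relationship(Warehouse)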
During the test, different types of transactions are sent to the database from several virtual users. Each transaction consists of several requests. In total, there are five types of transactions, submitted for processing with different probabilities of occurrence:
Transactions:
  1. new_order - 45%
  2. payment - 43%
  3. order_status - 4%
  4. delivery - 4%
  5. stock_level - 4%
The probability of occurrence of transactions is the same as in the original TPC-C test.
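During the run, each virtual user repeatedly draws the next transaction type according to these weights. A minimal sketch of such a weighted draw (the function names here are placeholders, not the ones used in the benchmark code):
import random

def new_order(): ...       # placeholder bodies; the real transactions
def payment(): ...         # are described in the next section
def order_status(): ...
def delivery(): ...
def stock_level(): ...

TRANSACTIONS = [new_order, payment, order_status, delivery, stock_level]
WEIGHTS = [45, 43, 4, 4, 4]   # probabilities in percent, as listed above

def run_random_transaction():
    # pick one transaction according to the TPC-C-like probabilities
    transaction = random.choices(TRANSACTIONS, weights=WEIGHTS, k=1)[0]
    transaction()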
However, the original TPC-C test is conducted on servers with 64+ GB of RAM, a large number of processors and huge disk space. Due to technical limitations, and because I wanted to test the performance of the ORMs rather than the resistance of hardware to enormous load, this test is somewhat simplified.
The main differences from the TPC-C test are as follows:
  1. The test runs with fewer virtual users than in the original test
  2. My test has fewer table entries. For example, the number of entries in the “Stock” relation in the original test is calculated using the formula 100,000 * W, where W is the number of warehouses; in this test it is 100 * W.
  3. In TPC-C, some transactions have several options for querying data from the database. For example, in a Payment transaction, the customer is sometimes requested from the database by ID and sometimes by last name and first name. At the moment, my test only queries by ID.
  4. My test database has one table fewer than the TPC-C database. In the TPC-C test, when an order is created, it is added to both the Order table and the NewOrder table; after the order is delivered, it is deleted from the NewOrder table. This speeds things up when a huge number of transactions per minute is processed, but since I have fewer users accessing the database, it is unnecessary. Instead, I added the boolean attribute “is_o_delivered” to the Order table, which stays False until the order is delivered (sketched below).
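As a rough sketch in Pony terms (only the is_o_delivered attribute comes from the description above; the other attribute names are illustrative):
from datetime import datetime
from pony.orm import Database, Required, Set

db = Database()

class Order(db.Entity):
    date = Required(datetime)
    order_lines = Set('OrderLine')
    # replaces the separate NewOrder table of the original TPC-C schema:
    # stays False until a Delivery transaction processes the order
    is_o_delivered = Required(bool, default=False)

class OrderLine(db.Entity):
    order = Required(Order)
    quantity = Required(int)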
Next, I will briefly describe what each transaction does.

Transactions

New Order

  1. Passing two arguments to the transaction: warehouse ID and customer ID
  2. Selecting the warehouse and the customer from the database using the passed IDs
  3. Selecting one randomly chosen district of the warehouse from the database
  4. Generating a random number that indicates the number of order lines.
  5. Creating an Order object
  6. Creating OrderLine objects in a loop, selecting one random item from the Item table in each iteration
  7. Changing the Stock of each item in the order
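A New Order transaction along these lines might look roughly as follows in Pony. This is only a sketch: entity and attribute names such as Warehouse.districts, Stock.quantity and OrderLine.quantity are assumptions, not necessarily what the test repository uses.
import random
from datetime import datetime
from pony.orm import db_session, select

@db_session
def new_order(warehouse_id, customer_id):
    warehouse = Warehouse[warehouse_id]
    customer = Customer[customer_id]
    district = random.choice(list(warehouse.districts))
    order = Order(customer=customer, district=district, date=datetime.now())
    for _ in range(random.randint(1, 10)):        # random number of order lines
        item = select(i for i in Item).random(1)[0]
        OrderLine(order=order, item=item, quantity=random.randint(1, 10))
        stock = Stock.get(warehouse=warehouse, item=item)
        stock.quantity -= 1                        # adjust stock for this item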

Payment

  1. Passing two arguments to the transaction: warehouse ID and customer ID
  2. Selecting the warehouse and the customer from the database by the passed IDs
  3. Selecting one randomly chosen district of the warehouse from the database
  4. Generating a random number that indicates the payment amount
  5. Increasing the balances of the Warehouse and the District by the payment amount
  6. Decreasing the balance of the customer by the payment amount
  7. Incrementing the customer's payment counter
  8. Increasing the customer's total of payment amounts by the payment amount
  9. Creating a History object
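For comparison, a Payment transaction could be sketched in SQLAlchemy like this (attribute names such as ytd, balance and payment_cnt are again assumptions made for illustration):
import random
from datetime import datetime

def payment(session, warehouse_id, customer_id):
    warehouse = session.query(Warehouse).get(warehouse_id)
    customer = session.query(Customer).get(customer_id)
    district = random.choice(warehouse.districts)
    amount = random.randint(1, 5000)

    warehouse.ytd += amount         # increase warehouse balance
    district.ytd += amount          # increase district balance
    customer.balance -= amount      # decrease customer balance
    customer.payment_cnt += 1       # increment the payment counter
    customer.ytd_payment += amount  # running sum of payment amounts

    session.add(History(customer=customer, district=district,
                        amount=amount, date=datetime.now()))
    session.commit()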

Order Status

  1. Passing customer ID as an argument of the transaction
  2. Selecting the customer by ID and the last order of this customer
  3. Taking the order status and the order lines from that order
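A sketch of Order Status in Pony (assuming an Order.order_lines collection and the is_o_delivered flag introduced earlier):
from pony.orm import db_session, desc, select

@db_session
def order_status(customer_id):
    customer = Customer[customer_id]
    # the most recent order of this customer
    last_order = (select(o for o in Order if o.customer == customer)
                  .order_by(desc(Order.id)).first())
    return last_order.is_o_delivered, list(last_order.order_lines)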

Delivery

  1. Passing warehouse ID as an argument of the transaction
  2. Selecting warehouse and all of its districts from the database
  3. For every district, selecting the oldest undelivered order
  4. For every such order, changing its delivery status to True
  5. For every customer of those orders, incrementing the order count
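Delivery is where the is_o_delivered flag replaces the NewOrder table of the original schema. A Pony sketch (the delivery_cnt attribute is an assumption):
from pony.orm import db_session, select

@db_session
def delivery(warehouse_id):
    warehouse = Warehouse[warehouse_id]
    for district in warehouse.districts:
        # the oldest order of this district that has not been delivered yet
        order = (select(o for o in Order
                        if o.district == district and not o.is_o_delivered)
                 .order_by(Order.id).first())
        if order is None:
            continue
        order.is_o_delivered = True
        order.customer.delivery_cnt += 1   # the customer's delivered-order count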

Stock Level

  1. Passing warehouse ID as an argument of the transaction
  2. Selecting warehouse from the database by ID
  3. Selecting the last 20 orders of this warehouse
  4. For every item in those orders, taking the stock level of the item
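A Stock Level sketch in SQLAlchemy (the join path from Order to District and the order_lines relationship are assumptions):
def stock_level(session, warehouse_id):
    warehouse = session.query(Warehouse).get(warehouse_id)
    # the last 20 orders placed in this warehouse's districts
    last_orders = (session.query(Order)
                   .join(District)
                   .filter(District.warehouse_id == warehouse.id)
                   .order_by(Order.id.desc())
                   .limit(20).all())
    levels = {}
    for order in last_orders:
        for line in order.order_lines:
            stock = (session.query(Stock)
                     .filter(Stock.warehouse_id == warehouse.id,
                             Stock.item_id == line.item_id).one())
            levels[line.item_id] = stock.quantity
    return levels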

Test results

Two ORMs are participating in the test:
  1. SQLAlchemy (blue line on graph) 
  2. PonyORM (orange line on graph)
Below are the results of running the test for 10 minutes with 2 parallel processes accessing the database. Processes are launched using the “multiprocessing” module.
X axis - time in minutes
Y axis - number of completed transactions
I use PostgreSQL as the DBMS.
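The launcher itself is straightforward. A minimal sketch of starting two such virtual users with multiprocessing (run_random_transaction is the weighted draw sketched earlier; the per-minute bookkeeping behind the graphs is omitted):
import multiprocessing
import time

def virtual_user(results):
    # one process = one virtual user sending transactions for 10 minutes
    deadline = time.time() + 10 * 60
    completed = 0
    while time.time() < deadline:
        run_random_transaction()
        completed += 1
    results.put(completed)

if __name__ == '__main__':
    results = multiprocessing.Queue()
    users = [multiprocessing.Process(target=virtual_user, args=(results,))
             for _ in range(2)]           # 2 parallel processes
    for p in users:
        p.start()
    for p in users:
        p.join()
    print('completed transactions:', sum(results.get() for _ in users))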

All transactions

First, I ran the test with all five transactions, as expected in the TPC-C test. As a result of this test, Pony was about twice as fast.
Average speed:
  • Pony -  2543 transactions/minute
  • SQLAlchemy - 1353.4 transactions/minute
After that, I decided to evaluate the performance of ORMs on each of the five transactions separately. Below are the results for each individual transaction.

New Order

Average speed:
  • Pony - 3349.2 transactions/minute
  • SQLAlchemy - 1415.3 transactions/minute

Payment

Average speed:
  • Pony - 7175.3 transactions/minute
  • SQLAlchemy -  4110.6 transactions/minute

Order Status 

Average speed:
  • Pony - 16645.6 transactions/minute
  • SQLAlchemy - 4820.8 transactions/minute

Delivery

Average speed:
  • SQLAlchemy - 716.9 transactions/minute
  • Pony - 323.5 transactions/minute

Stock Level

Average speed:
  • Pony - 677.3 transactions/minute
  • SQLAlchemy - 167.9 transactions/minute

Analysis of test results

After receiving the results, I analyzed why they are like this and came to the following conclusions:
In 4 out of 5 transactions, PonyORM turned out to be faster because, when generating SQL, it caches the result of translating a Python expression into SQL. Thanks to this, Pony does not translate the expression again when the query is repeated, while SQLAlchemy has to generate the SQL code every time a query is executed.

Example of this kind of query in Pony:
stocks = select(stock for stock in Stock
    if stock.warehouse == whouse
    and stock.item in items).order_by(Stock.id).for_update()
Generated SQL:
SELECT "stock"."id", "stock"."warehouse", "stock"."item", 
    "stock"."quantity", "stock"."ytd", "stock"."order_cnt",
    "stock"."remote_cnt", "stock"."data"
FROM "stock" "stock"
WHERE "stock"."warehouse" = %(p1)s
    AND "stock"."item" IN (%(p2)s, %(p3)s)
ORDER BY "stock"."id"
FOR UPDATE
{'p1':7, 'p2':7, 'p3':37}
SQLAlchemy:
stocks = session.query(Stock).filter(
    Stock.warehouse == whouse,
    Stock.item.in_(items)).order_by(text("id")).with_for_update()
Generated SQL:
SELECT stock.id AS stock_id, stock.warehouse_id AS stock_warehouse_id,
    stock.item_id AS stock_item_id, stock.quantity AS stock_quantity,
    stock.ytd AS stock_ytd, stock.order_cnt AS stock_order_cnt,
    stock.remote_cnt AS stock_remote_cnt, stock.data AS stock_data
FROM stock
WHERE stock.warehouse_id = %(warehouse_id_1)s AND stock.item_id IN 
    (%(item_id_1)s, %(item_id_2)s) ORDER BY id FOR UPDATE
{'warehouse_id_1': 7, 'item_id_1': 53, 'item_id_2': 54}
Apparently, SQLAlchemy performs Delivery-type transactions faster because it can combine several UPDATE operations applied to different objects into a single batched statement (one UPDATE executed with multiple parameter sets).
Example:
INFO:sqlalchemy.engine.base.Engine:UPDATE order_line SET delivery_d=%
    (delivery_d)s WHERE order_line.id = %(order_line_id)s
INFO:sqlalchemy.engine.base.Engine:(
    {'delivery_d': datetime.datetime(2020, 4, 6, 14, 33, 6, 922281),
     'order_line_id': 316},
    {'delivery_d': datetime.datetime(2020, 4, 6, 14, 33, 6, 922272),
     'order_line_id': 317},
    {'delivery_d': datetime.datetime(2020, 4, 6, 14, 33, 6, 922261), ...})
In situations like this, Pony sends a separate query for each update.

Conclusion

Based on the results of this test, I can say that Pony performs selects from the database much faster. On the other hand, in some cases SQLAlchemy can execute UPDATE-type queries at significantly higher speed.
In the future, I plan to test other ORMs (Peewee, Django) this way.

Links

Code of the test: repository
