Impact on Insert Performance — Primary Keys in MySQL

Written by KovidRathee | Published 2019/02/14
Tech Story Tags: mysql | innodb | database-administration | database | performance-testing

TLDRvia the TL;DR App

The importance of Primary Keys in MySQL

Note: I have used MySQL and it’s storage engine InnoDB interchangeably.

A couple of days ago, someone at work asked me whether inserts are faster in a table when there’s a primary key present in the table. My immediate answer was that they were faster. I had some idea about this from experience and previous reading but didn’t remember the reason why it was so.

For a business user unaware about InnoDB internals or InnoDB’s existence, it might seem perfectly intuitive to assume that inserts will be faster without a primary key because a primary key is an additional constraint on the table, i.e., MySQL has to check for uniqueness, nullability. When it comes to InnoDB, that is not the case. Also, a table without a primary key will occupy more space than a table with a primary key. See how.

Background

MySQL has various mutexes for preventing multiple threads accessing the same resource at the same time. There is a mutex called dict_sys which locks the data dictionary, i.e., the information_schema. Ovais Tariq has written a very insightful blogpost on this.

When a primary key is not defined on a table, MySQL generates row-ids for every row that is inserted in that table. This, because InnoDB is an index-organized storage engine, it mandates a primary key to store the contents of the table. If you don’t define a primary, MySQL will do this itself by generating a 6 byte auto increment primary key which is hidden. It is referred by the name DB_ROW_ID in the documentation. This only has to exist if a table doesn’t have a primary key.

Now, when this row-id is generated on every new insert, MySQL acquires dict_sys mutex — and it becomes a system-wide contention point because what it is acquiring a mutex on is the data dictionary — which might be in use by a couple of other DDL and DML operations. This would mean that insert performance on a table might be affected by another operation taking place on another table. This contention would otherwise not exist if a primary key is defined on the table.

To check how a similar insert workload behaves on a table with a defined primary key and another table without a defined primary key, I performed a simple mysqlslap insert performance test on my local. I have attached the scripts to perform these tests. Note: Doing this with _rand()_ is probably not ideal.

mysqlslap — is a sysbench like tool for MySQL that can be used for performance and load testing of individual tables — Read more about it here.

Result

The insert performance in this case was surprisingly not very different in both the cases — checked for 10 million records. This was with the concurrency of 4 threads

  1. With primary key, 1284.848 seconds
  2. Without primary key, 1282.350 seconds

In an isolated environment and with only 4 thread, there were not many dict_sys mutexes on information_schema, hence, there was no significant difference between the insert performance. Although, with the same table configuration, these two tables occupy significantly different amount of space on disk — which comes from the fact that MySQL had to create row ids for every record that was being inserted in the table without a defined primary key.

  1. With primary key, 900 MB
  2. Without primary key, 1.1 GB

I conducted five more tests with increasing concurrency and a consistent load of 100K records. See the difference in insert performance here.

Time taken is in seconds

With increasing threads, time taken to insert into tables without primary key increases quite a lot because of the contention point added because of dict_sys mutex.

Always use a primary key!

P.S. — From Jeremy Cole’s blog about this

Given how much other code within InnoDB is protected by dict_sys->mutex I think it’s fair to say any tables with an implicit clustered key (ROW_ID) could expect to experience random insert stalls during operations like dropping (unrelated) tables. Parallel insertion into multiple tables with implicit keys could be performance-constrained, as it will be serialized on both the shared mutex and cache contention for the shared counter variable.

References

  1. Ovais Tariq’s Original Blogpost about Scalability Issues
  2. Chapter 4 from High Performance MySQL — strongly recommended book for MySQL power users
  3. Tuning Primary Keys in InnoDB
  4. Wasting InnoDB Memory — not immediately related to this, but interesting nevertheless
  5. MySQL Server Team’s Blog about InnoDB’s Intrinsic Tables
  6. Community Discussion about using mysqlslap
  7. Importance of Primary Keys in MySQL Performance
  8. UUID vs incremental ID insert performance in MySQL — not immediately relevant
  9. Jeremy Cole’s blog on How InnoDB behaves without a Primary Key — must read
  10. Lack of Primary Key May Effectively Stop Slave — this needs a deeper dive
  11. Another blogpost about Replication Issues when Primary Keys are not present
  12. InnoDB Row Format Specification

Art

Annunciazione by Leornado da Vinci


Published by HackerNoon on 2019/02/14