Note: I have used MySQL and it’s storage engine InnoDB interchangeably.
A couple of days ago, someone at work asked me whether inserts are faster in a table when there’s a primary key
present in the table. My immediate answer was that they were faster. I had some idea about this from experience and previous reading but didn’t remember the reason why it was so.
For a business user unaware about InnoDB internals or InnoDB’s existence, it might seem perfectly intuitive to assume that inserts will be faster without a primary key
because a primary key
is an additional constraint on the table, i.e., MySQL has to check for uniqueness, nullability. When it comes to InnoDB, that is not the case. Also, a table without a primary key
will occupy more space than a table with a primary key
. See how.
MySQL has various mutexes
for preventing multiple threads accessing the same resource at the same time. There is a mutex called dict_sys
which locks the data dictionary, i.e., the information_schema
. Ovais Tariq has written a very insightful blogpost on this.
When a primary key
is not defined on a table, MySQL generates row-ids
for every row that is inserted in that table. This, because InnoDB is an index-organized storage engine, it mandates a primary key
to store the contents of the table. If you don’t define a primary, MySQL will do this itself by generating a 6 byte auto increment primary key
which is hidden. It is referred by the name DB_ROW_ID
in the documentation. This only has to exist if a table doesn’t have a primary key
.
Now, when this row-id
is generated on every new insert, MySQL acquires dict_sys
mutex — and it becomes a system-wide contention point because what it is acquiring a mutex on is the data dictionary — which might be in use by a couple of other DDL and DML operations. This would mean that insert performance on a table might be affected by another operation taking place on another table. This contention would otherwise not exist if a primary key
is defined on the table.
To check how a similar insert workload behaves on a table with a defined primary key
and another table without a defined primary key
, I performed a simple mysqlslap
insert performance test on my local
. I have attached the scripts to perform these tests. Note: Doing this with _rand()_
is probably not ideal.
mysqlslap
— is a sysbench
like tool for MySQL that can be used for performance and load testing of individual tables — Read more about it here.
The insert performance in this case was surprisingly not very different in both the cases — checked for 10 million records. This was with the concurrency of 4 threads
primary key
, 1284.848 secondsprimary key
, 1282.350 secondsIn an isolated environment and with only 4 thread, there were not many dict_sys
mutexes on information_schema
, hence, there was no significant difference between the insert performance. Although, with the same table configuration, these two tables occupy significantly different amount of space on disk — which comes from the fact that MySQL had to create row ids
for every record that was being inserted in the table without a defined primary key
.
primary key
, 900 MBprimary key
, 1.1 GBI conducted five more tests with increasing concurrency and a consistent load of 100K records. See the difference in insert performance here.
Time taken is in seconds
With increasing threads, time taken to insert into tables without primary key
increases quite a lot because of the contention point added because of dict_sys
mutex.
Always use a primary key
!
P.S. — From Jeremy Cole’s blog about this
Given how much other code within InnoDB is protected by dict_sys->mutex I think it’s fair to say any tables with an implicit clustered key (ROW_ID) could expect to experience random insert stalls during operations like dropping (unrelated) tables. Parallel insertion into multiple tables with implicit keys could be performance-constrained, as it will be serialized on both the shared mutex and cache contention for the shared counter variable.
References
Annunciazione by Leornado da Vinci