Imagine writing increasingly complex requirements. More than just the retrieval of one entity by primary key, especially if you run it in a loop or in an environment that is extremely intolerant of long-duration queries. In more complex cases, you also don’t have access to the production database and execute a real query on live data and production hardware.
How do you process rough estimates of SQL query execution?
Each database runs a query through the standard process (with some deviations depending on dB implementation):
Syntax parsing
Semantic parsing
Planning
Executing
The most time-consuming part is usually executed only because the database must go to disk, get some data and send it to the client. Other parts of the process go mostly to the CPU and use some cached metadata. The planning creates schemes, compares them, selects the best one, and sends it to the executing stage. The executing stage runs the plan - goes to disk, loads data, sorts or filters it, goes to disk again for another chunk of data, joins datasets, and so on until the plan is completed and the result is ready to be sent to a customer. These operations usually take up most of the total query time.
Let’s find out what happens to the sample query:
SELECT u.id, u.last_login, ud.phone
FROM user u
JOIN user_detail ud ON ud.user_id = u.id
WHERE u.firstname = 'John';
Let’s write some plans to run such a query.
For example:
for (User u: Users) {
if (u.firstname == 'John') {
for (UserDetail ud: UserDetails) {
if (ud.user_id = u.id) {
result.put(u.id, u.last_login, ud.phone);
}
}
}
}
return result;
user_detail
extension the following code should work faster.for (UserDetail ud: UserDetails) {
for (User u: Users) {
if (ud.user_id = u.id && u.firstname == 'John') {
result.put(u.id, u.last_login, ud.phone)
}
}
}
return result;
Indexes. They work as additional structures and in terms of code. The indexes usually look like pre-calculated maps, usually TreeMaps. Let’s for example take the first plan. Some user_id to UserDetails
map can be recalculated on each change of the table and that structure can speed up our select-request pretty much.
user_id
in UserDetailsfor (User u: Users) {
if (u.firstname == 'John') {
result.put(u.id, u.last_login, user_id_to_UserDetails[u.id].phone);
}
}
return result;
firstname
in User. Since firstname
is not unique value this index works as a MultiValuedMap
.for (User u: firstname_to_User['John']) {
for (UserDetail ud: UserDetails) {
if (ud.user_id = u.id) {
result.put(u.id, u.last_login, ud.phone);
}
}
}
return result;
for (User u: firstname_to_User['John']) {
result.put(u.id, u.last_login, user_id_to_UserDetails[u.id].phone);
}
return result;
So, we have plenty of plans. How do we choose the best one?
Initially, looks like the latest plan could be the best choice. But it is not true in every situation.
Let’s find out why.
The database has already completed several queries on the table and it knows the amount of data and some other things, like how and where the data is stored. The database also knows how and where additional structures (indexes) are stored.
After all, each plan has its own complexity. The complexity also depends on additional structures because some useful things can be precalculated. The final timing (or cost) of running depends on the speed of the hardware. This depends not only on the amount of data or requests that should be done to retrieve data but also on the nature of the requests. They can generate both sequential and random reads. And the nature of the queries can greatly affect the final timing.
To select the best plan, the database should have:
the plans themselves;
knowledge of pre-calculated things such as indexes or caches from earlier queries;
knowledge of the location of data and the additional structures (disc or memory cache);
knowledge of the disc and memory work (speed of sequential and arbitrary access, number of IO operations per second).
All this helps databases choose an optimal plan to run. Moreover, the best plan can be changed during the database’s lifetime as the table itself changes. At one point the table can fit in memory entirely, at the next - it won't. At one point the index can fit completely in memory, at the next - it won’t. In some cases, the data occupies less space than the indexes themselves.
The good news is that we needn’t go too far into the timing of our request. All we have to do is predict the most obvious scenarios for our plan.
Each table can be obtained from a disc with some filters in two ways:
To get the rows of the table by firstname ='John'
:
If there are no indexes by firstname
, the database:
loads the entire table (all records). The operation requires sequence reads on the disc system and the count of operations fully depends on the table size;
filters all entries by fields;
sends a result to the client.
If there is an index by firstname
, the database:
searches by index and receives positions in which there are entries with firstname=’John’. The operation requires a few random read requests to the disc and depends on index size and rows count but since it is only logarithmic complexity, it isn’t too many requests, usually just a few;
uploads records one by one directly to their positions;
sends a result to the client.
To access the timing of these approaches, we must know the specifications of the disk system and the size of tables and indexes. For example, let’s take some characteristics of the HDD disk up to 500MB/s for consecutive reading and up to 300 operations per second for random reading. Assume we have 1 million rows and each averages 50 bytes. This means that we have a table size of ~50 Mb and 6-7 Mb of index size (provided we have an average of 6 characters per first name). Assume we have ~300 different Johns in the table.
So, to calculate the timings of both approaches:
Then we need to do up to 3000 random reads from the disc to extract line by line all required rows. Some of them could be on the same page of memory so the actual timing could be lower. But for high assumption, it is 300 reads. 300 ops/sec / 300 ops = 1 sec. This means that the second approach takes up to one second to complete.
It highlights that a database doesn’t always need an index, in some cases a fullscan of the whole table is just more efficient. It also represents in general how databases make assumptions about implementation plans and how they choose the best of them.
Actually, databases have different tools to speed up queries. But in the end, all of them work over the disc and memory system trying to reduce operations over the disc system or move them to the memory.
For example, when query can use a few indexes over the same data the database can do it with a bitmap operation. It is an operation in which a database loads two or more indexes at the same time. It then performs logical operations over them - union
for operation OR
and intersection
for operation AND
. Then, the database can load the table using result operation over indexes.
So in that case the whole operation requires fewer and smaller disc reads, some operations on the CPU, and random reads from the disk to sample table rows.
Having knowledge about typical internal db operations helps developers not only estimate queries but also choose the correct set of indexes and write faster queries.
One important modern thing about assumptions is that today we use clouds everywhere and sometimes it is just difficult to find out about hardware in the depth of the cloud provider, and, after all, can be difficult to do an assumption. Nevertheless, there are laws of physics that we cannot circumvent. Each reading requires some time and each operation has a delay. And if you do not have terabytes of memory in your database setup, there is sufficient knowledge to make certain assumptions based on typical queries.