paint-brush
Comparing Diesel and rust-postgresby@sgrif
6,343 reads
6,343 reads

Comparing Diesel and rust-postgres

by Sean GriffinDecember 10th, 2016
Read on Terminal Reader
Read this story w/o Javascript
tldt arrow

Too Long; Didn't Read

Following the release of <a href="https://github.com/diesel-rs/diesel/releases/tag/v0.9.0" target="_blank">Diesel 0.9.0</a>, I wanted to revisit the performance differences between Diesel and rust-postgres to see if things had changed significantly <a href="https://gist.github.com/sgrif/da762c08f9a567c80d44" target="_blank">since I last measured both in 0.1.0.</a>

Companies Mentioned

Mention Thumbnail
Mention Thumbnail
featured image - Comparing Diesel and rust-postgres
Sean Griffin HackerNoon profile picture

Following the release of Diesel 0.9.0, I wanted to revisit the performance differences between Diesel and rust-postgres to see if things had changed significantly since I last measured both in 0.1.0.

TL;DR: Diesel consistently outperforms rust-postgres by 25–30% in idiomatic code. It is also more concise and catches errors at compile time to help you write working code faster. You can find the full benchmarking code here. The rust-postgres benchmark could be further optimized to find about a 10% performance increase. Doing this, however, results in code that is more brittle and harder to understand.

If you’re looking for something to do this holiday season, and you haven’t tried Diesel yet, check out our Getting Started guide. The rest of this article is going to examine the differences between the usage of the two.

The benchmark results are based purely upon the results of select statements, not the insertion of the data. The majority of applications backed by a SQL database are read heavy, not write-heavy. Additionally, rust-postgres relies heavily on dynamic dispatch for bind parameters. This means that for queries with an excessive amount of bind parameters such as batch inserts, its performance was reduced to a point that I didn’t feel was a fair comparison.

Setting up data

We’ll look at two benchmarks. The first one will set up with some number of rows into a single table with 3 columns, and then perform SELECT * FROM users or equivalent. The second will use two tables, with more data, and a join between the two. This is the setup for Diesel:

The definition of NewUser is at the top and looks like:

The #[derive(Insertable)] comes from the diesel_codegen crate, and generates the appropriate code to use this struct in an insert statement. We know the database schema from a line at the top that looks like infer_schema!("env:DATABASE_URL") which will load the database schema at compile time. Using a vector or slice of NewUser for a batch insert works automatically. We’re doing an assert_eq! on the result to double check that we actually inserted the number of rows we expected.

The setup for the rust-postgres benchmark is a bit more complex.

Of course this is always going to be a bit more verbose, since we have to construct the SQL manually. For batch inserts, this means lots of dynamic query construction of bits of SQL that are nothing more than ($1, $2) which is always difficult to read. This is to be expected, rust-postgres isn’t trying to be a query builder. However, the hoops we had to jump through with the bind parameters was surprising to me.

The rust-postgres crate defers the serialization of the bind parameters as long as possible. However, it also doesn’t provide any data structure with which to hold the bind parameters. This means that we’ll have to stick them into a list of trait objects, even when all the binds are of the same type. This also means that it’s relying on dynamic dispatch, which will severely hinder the ability for the compiler to optimize the serialization code.

The API provided by the rust-postgres crate seems like it’s probably fine if you’re building simple queries or if all your bind values are hard coded or simple borrows from something that’s already in scope. But once you’re doing anything dynamic, it’s quite hard to figure out exactly what to do. It took me quite a while to figure out the exact incantation I needed to construct the borrowed bind params there.

Finally, since we’re just operating on simple SQL strings, we also need a check to make sure we’re not trying to insert 0 rows. Since rust-postgres only deals with raw SQL, it would have no way of understanding our intent to handle that case automatically for us.

Simple Query Execution

For the first benchmark, we are going to execute SELECT * FROM users or equivalent, deserialize all of the rows into a Vec<User>, and then check that the length of that vector is what we expect. The struct definition for both benchmarks looks like this:

In the Diesel benchmark, that struct is annotated with #[derive(Queryable)], which generates the code to deserialize the row into the struct. The benchmark for Diesel looks like this:

One point of note is that the query this generates is slightly different that you would write yourself. The exact query is SELECT "users"."id", "users"."name", "users"."hair_color" FROM "users". We have to add a turbofish for ::<User> here, since we’re never using the vector and the compiler wouldn’t be able to figure out the type.

The rust-postgres benchmark is somewhat similar:

One point of note is that we’ve written the deserialization code inline, rather than extracting a function called User::from_row or similar. Creating that function is a common pattern in code using rust-postgres in the wild, but it’s ultimately quite fragile. Outside of this one query, we don’t know whether the columns have been renamed or not. As we’ll see later, a rename is often required. We also don’t know if this code appears inside of a join or not. If it’s on the right side of a left outer join, we’ll need to return Option<User> instead. Since a User::from_row function would be of limited reuse, I’ve opted not to extract one.

We could also gain some improvement here if we were to list out each of the columns explicitly, and fetch the result by index rather than by name. The goal of this benchmark, however, was to write code which from the user’s point of view takes the same amount of effort and does approximately the same thing. Fetching results by index is extremely brittle, and since the Row type knows nothing about the underlying SQL type of the data, it’s extremely easy to accidentally deserialize junk data into the wrong type. All code I’ve seen in the wild accesses fields by name, and that’s probably the right decision.

These were the results for the “simple query” benchmark.

There’s a good bit of variance due to the overhead of executing a query with postgres, but Diesel consistently came in at between 25% and 30% faster for every run.

Complex Queries

For the second benchmark, we’ll get a little more complex. For the data, we’ll alternate hair colors for users, and give every third user a post. This is the setup code for Diesel.

For NewPost, we’re working with mostly static data. #[derive(Insertable)] is designed to work with both owned and borrowed data, so you can use whatever is most convenient for that situation. In the case of NewUser where we were generating dynamic strings, it was much easier to put the result in an owned string.

The rust-postgres setup is similar to the first benchmark as well.

Collecting the bind parameters for the posts query was even more difficult than the first benchmark. If the data was just *a bit* more static, we would have been able to put it into a Vec<&ToSql> off the bat, as its author seemed to have intended. However, we have to add 1 to the counter to get the user_id, meaning that we have a dynamically constructed value. Since this time the types of the bind parameters are different, we have to box them up to get a single type for the vector. Even after that though, we still need the second iteration to go from Vec<Box<ToSql>> to Vec<&ToSql> which finally we can borrow to the specific &[&ToSql] that rust-postgres wants.

The query for the second benchmark is SELECT * FROM users LEFT OUTER JOIN posts ON posts.user_id = users.id WHERE hair_color = "black" ORDER BY name DESC or equivalent. The code for the Diesel version looks like this:

You’ll notice that we’re never specifying the ON clause for this query. The information on how to perform the join is generated by an annotation on User that looks like #[has_many(posts)]. We assume by default that the foreign key is in the form parent_id. If you don’t use this convention, or have a more complex join, it’s an easy default to override.

The rust-postgres code for this benchmark looks like this:

The query here is much longer. We need to list out every column manually, and alias them so we can be sure we’re fetching the right column later. The API provided by rust-postgres does not let you specify the table name for the column. In the case of conflict, doing row.get("id") will silently give you whichever id column came first, with no indication that you might be using the wrong data.

We’ve once again performed our deserialization inline. This example demonstrates my point from earlier. The common from_row abstraction that many codebases use is quite fragile. Unfortunately, the rust-postgres crate doesn’t provide the APIs needed to write generic deserialization code for these sort of cases.

One high point for rust-postgres here was that this was a much easier query to execute than the rest. All our bind parameters are static, so we were able to write it inline with the query. I believe this is the type of case that its author optimized for, and it works quite well when you stay within those boundaries.

The results of the second benchmark were:

Conclusion

Hopefully this helps to illustrate some of the key differences between Diesel and rust-postgres. Despite the focus on remaining lower level than Diesel, rust-postgres does not provide any improvement in performance over Diesel.

It should be noted that the rust-postgres benchmarks compiled in 2.76 seconds, while the Diesel benchmarks took 3.4. I expect that gap to increase on large codebases. The binary for the rust-postgres benchmarks was roughly 25% larger than the binary for the Diesel benchmarks, and I would expect Diesel’s binary size to remain smaller regardless of the size of the codebase.

I truly believe that the safety guarantees that Diesel provide increase developer productivity significantly. It took me many tries to get the rust-postgres benchmarks to actually work. The feedback cycle of waiting for the runtime errors to tell me what to do, assuming the message was helpful at all, was quite slow compared to the quick compile-time feedback given by Diesel.

If you haven’t already given Diesel a try, I hope you’ll take the opportunity to do so over the holiday break. You can learn more at http://diesel.rs

Thanks for reading!

Hacker Noon is how hackers start their afternoons. We’re a part of the @AMIfamily. We are now accepting submissions and happy to discuss advertising &sponsorship opportunities.

To learn more, read our about page, like/message us on Facebook, or simply, tweet/DM @HackerNoon.

If you enjoyed this story, we recommend reading our latest tech stories and trending tech stories. Until next time, don’t take the realities of the world for granted!