On one of our projects at work we needed to generate a big report, and ActiveRecord was too slow and too memory-hungry for it. In some edge cases it took more than an hour to generate a report with 100,000–200,000 rows. Memory usage went above 2 GB, the VM started swapping, and all instances running on that VM became slow.
First I tried to implement find_in_batches using limit() and offset(), but it didn't work well. Every subsequent batch became slower, because the database had to scan past all the previous batches just to return the N-th one; for the next batch it scanned N+1 batches' worth of rows, and so on. Better for memory, but worse for speed.
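To see why OFFSET-based batching degrades, here is a rough cost model (my own illustration, not code from the project): for each page the server must walk past `offset` rows before it can return the next `batch_size`, so the total number of rows visited grows quadratically with the number of batches.

```ruby
# Simulate total rows the server visits when paginating with
# LIMIT/OFFSET: each query pays for skipping `offset` rows plus
# returning the batch itself.
def offset_scan_cost(total_rows, batch_size)
  cost = 0
  (0...total_rows).step(batch_size) do |offset|
    cost += offset + [batch_size, total_rows - offset].min
  end
  cost
end

offset_scan_cost(100, 10)  # => 550 rows visited to read 100 rows
```

This is why keyset pagination (`where("id > ?", last_seen_id).order(:id).limit(batch_size)`), which is what ActiveRecord's own find_in_batches does under the hood, stays fast: each query seeks directly to the next range instead of rescanning from the start.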
Next I started to think about why exactly it took so much time. Most of the time was spent creating ActiveRecord objects and accessing attributes, and that was the biggest memory consumer too. I came up with the idea of making a lightweight version of the ActiveRecord object, with as few allocations as possible and better performance. I called it LightRecord.
LightRecord is an extension for ActiveRecord that creates read-only objects with as few allocations as possible. It simply takes the hash from the database driver (mysql2) and defines accessor methods for the fields. No type casting, no dirty attributes (ActiveModel::Dirty), no copying of data.
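The core idea can be sketched in a few lines of plain Ruby (this is only an illustration of the technique, not LightRecord's actual API): build a class whose readers return values straight out of the driver's row hash.

```ruby
# Minimal sketch: given the column names of a row hash (as mysql2
# returns with symbolize_keys: true), build a read-only class whose
# accessors read directly from the hash -- no casting, no copying.
def light_class_for(columns)
  Class.new do
    define_method(:initialize) { |row| @row = row }
    columns.each do |col|
      define_method(col) { @row[col] }
    end
  end
end

row    = { id: 1, title: "Report" }  # one row hash from the driver
klass  = light_class_for(row.keys)
record = klass.new(row)
record.title  # => "Report"
```

Because the hash from the driver is reused as-is and the accessors are defined once per result set, per-row work is just one small object allocation, instead of the attribute sets, type casts, and change tracking a full ActiveRecord model carries.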
Benchmark for generating a CSV report with 36,634 rows and 18 columns, on OS X 10.11.6, ruby 3.0.1, rails 5.0.0.1:
As you can see, there is also LightRecord with streaming. It uses the streaming feature of the mysql2 client, which fetches records one by one and lets you iterate over them as they arrive. The memory line on the chart grows steadily instead of jumping, because it doesn't load all the data at the beginning.
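For reference, streaming in mysql2 is enabled with the `stream: true` query option (`cache_rows: false` additionally lets each row be garbage-collected after it is processed). In the sketch below, a lazy enumerator stands in for the streamed result so it runs without a database; the connection details and table name are made up for the example.

```ruby
require "csv"

# With a real connection this would be:
#   client = Mysql2::Client.new(host: "localhost", username: "root",
#                               database: "app")
#   rows = client.query("SELECT * FROM reports",
#                       stream: true, cache_rows: false)
#
# Stand-in for the streamed result set: rows are produced lazily,
# one at a time, just like the streaming client delivers them.
rows = (1..3).lazy.map { |i| { "id" => i, "title" => "row #{i}" } }

csv = CSV.generate do |out|
  out << %w[id title]
  rows.each { |row| out << row.values }  # one row in memory at a time
end
```

Each row is written to the CSV and then dropped, which is why memory stays flat regardless of how many rows the report has.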
You can find a code example on the LightRecord GitHub page. We use it on one project in production, and I'm happy with the performance gain.