Hackernoon logoAdvanced and Efficient MongoDB Pagination by@maxvynohradov

Advanced and Efficient MongoDB Pagination

Max Vynohradov Hacker Noon profile picture

@maxvynohradovMax Vynohradov

Node.js, TypeScript, Serverless

Onсe upon a time, we had a complex project enough (ride-sharing and taxi application) with stack Node.js and MongoDB. We have chosen this stack because it was preferable by the customer, good known by our team, and at the same time looks like good a suite for project tasks.

Everything was great, the number of users became more than twelve thousand, the number of active drivers was close to three hundred drivers. In one year, the number of rides becomes more than two million.

But once we need to create an admin panel to control and monitor all processes (from the business point of view) in the main application. The huge percent of requirements was to have advanced lists of a variety of entities, with bind statistics over them. 

Because we use mongoose, as ODM, first of all, we take a look at its plugins. The most popular of them, that related to pagination are:

Another requirement was to have a possibility to choose a specific page on-demand, so the approach to use “previous-next”-like pagination, that based on cursors was prohibited immediately — its mongoose-paginate-v2 and mongoose-aggregate-paginate-v2 libraries.

The oldest, and probably the simplest in usage is mongoose-paginate — it uses simple search queries, limit, sort, and the skip operations. I guess it’s a good variant for simple pagination — just install a plugin, add few lines of code to your endpoint, and that’s all — work is done. It even can use “populate” of mongoose, — something that emulates joins from SQL world. Technically it just makes additional queries to the database, that probably not the way you want. Even more, when you just have a little bit more complicated query, with any data transformation, it will be totally unusable.

I know just one way to normally use it in such cases — first create MongoDB View — technically its pre-saved queries, that MongoDB represents as read-only collections. And just then run pagination using mongoose-paginate over this view. Not bad — you will hide complicated queries under view, but we have a better idea of how to solve this problem.

MongoDB Aggregation Framework is here!

You know, I guess, it was really day for the MongoDB community when Aggregation Framework was released. Probably it allows for most of the queries that you can imagine. So, we think about taking mongoose-aggregate-paginate into use.

But the next two things that disappointed us:

What does this plugin need? I mean — what task does it help to solve, that cannot be solved without this plugin, with the same effort. Looks like it just one more additional dependency in your project, because it doesn’t bring any profit, even don’t save your time…

Internal codebase, and general approach to making queries. This library makes TWO calls into a database and waits for responses via Promise.all. First — to fetch query result and second — to calculate the count of total records that query returns, without $filter and $limit stages. It needs this to calculate the number of total pages.

How we can avoid additional queries into the database? The worst thing here that we need to run all aggregation pipeline twice, that can be costly enough in terms of memory and CPU usage. Even more, if collection huge, and documents tend to be few megabytes, it can impact Disc I/O usage, that also a big problem.

The good news — Aggregation Framework has a specific stage in its arsenal, that can solve this problem. It’s $facet:

Processes multiple aggregation pipelines within a single stage on the same set of input documents. Each sub-pipeline has its field in the output document where its results are stored as an array of documents.

MongoDB documentation about $facet stage . 

Aggregation Pipeline for pagination will have the next shape:

{ 
  $facet: {
      <outputField1>: [ <stage1>, <stage2>, ... ],
      <outputField2>: [ <stage1>, <stage2>, ... ],
      ...

   }
}
  • Run all operations, that don’t directly affect on final pagination result, after all possible filters ($match stages). There are stages like $project or $lookup — that don’t change the number or order of result documents. Try to cut off as many documents as you can at once.
  • Try to make your models as more self-sufficient, to avoid additional $lookups. It’s normal to duplicate some data or make pre-computing fields.
  • If you have a really huge pipeline, that processes many data, your query will probably use more than 100MB. In this case, you need to use allowDiskUse flag.
  • Follow Aggregation Pipelines performance optimization guide. This advice helps you to make your queries more efficient.
  • And technically — you can make dynamic queries on your application code side — depends on conditions you can add, remove or modify specific stages. It can speed up your queries, and moreover, make your code more eloquent.

Coz NDA, I cannot show you real database schema and real queries. But let me show you a small example of such pagination. 

Imagine that you have two collections — Statistic and Drivers. Drivers collection is static enough in thinking of types and amount of fields in different documents. But Statistic is polymorphic, can be changed during time, as a result of business requirements updates.

Also, some drivers could have big statistic documents and history in general. So you cannot make Statistic as subdocument of Driver.

So code and MongoDB query will have the next shape:

const ridesInfoPaginationPipeline = (filter = {}, skip = 0, limit = 10, sort = {}) => [{
    $match: {
      ...filter,
      active: true,
    }
  },
  {
    $sort: {
      ...sort,
      createdAt: -1,
    }
  },
  {
    $lookup: {
      from: 'statistic',
      localField: '_id',
      foreignField: 'driverId',
      as: 'driver',
    },
  },
  {
    $unwind: {
      path: '$driver',
      preserveNullAndEmptyArrays: true,
    },
  },
  {
    $project: {
      driver: {
        $ifNull: [{
          $concat: ['$driver.firstName', ' ', '$driver.lastName']
        }, 'Technical']
      },
      entityId: 1,
      message: 1,
      meta: 1,
      createdAt: 1,
    },
  },
  {
    $facet: {
      total: [{
        $count: 'createdAt'
      }],
      data: [{
        $addFields: {
          _id: '$_id'
        }
      }],
    },
  },
  {
    $unwind: '$total'
  },
  {
    $project: {
      data: {
        $slice: ['$data', skip, {
          $ifNull: [limit, '$total.createdAt']
        }]
      },
      meta: {
        total: '$total.createdAt',
        limit: {
          $literal: limit
        },
        page: {
          $literal: ((skip / limit) + 1)
        },
        pages: {
          $ceil: {
            $divide: ['$total.createdAt', limit]
          }
        },
      },
    },
  },
];

const executePagination = async () => {
    return Statistic.aggregate(ridesInfoPaginationPipeline());
}

As you see, using Aggregation Framework and $facet stage we can:

  • make data transformation and complex queries;
  • fetch data from multiple collections;
  • get metadata of pagination (total, page, pages)in the one query without additional query execution.

Regarding the main drawbacks of such an approach, I guess just one is major — higher complicity of development and debug process, along with higher entry threshold. It includes performance troubleshooting, knowledge of a variety of stages, and data modeling approaches.

So, pagination, that based on MongoDB Aggregation Framework, is not pretending to be a silver bullet. But after many attempts and pitfalls — we found that this solution is covered all our cases, with no effects and no high coupling to a specific library.

Tags

Join Hacker Noon

Create your free account to unlock your custom reading experience.