Efficient Server Side Pagination using Mongodb

As a developer, you may have come across a situation where you are required to show some enormous amount of data to be displayed in the UI, in tabular format/grid without a drop in performance. If you are a developer who spends a decent amount of time coding, you might have guessed it already.

Server Side Pagination! yeah, you are right. It is the most optimum way to show large data.

Why is that?

Consider the following scenario. Say that your application sells products online and you have a product base with more than one million items, which are added to the db using a master form. It is obvious now the end-user should have provision to delete and edit the already added products.

To achieve this you should have to give them an interface, presumably a grid, where the user can browse the particular product he/she intends to edit/delete. Now think about a scenario where this interface is getting populated with all the 1 million products altogether. Messy. isn’t it? No one could ever agree with what you have given because it is messy, slow, and inconvenient. So in these situations intellectuals suggest you paginate the grids with proper page size and page numbers.

Yeah, you are right again. You just need client-side pagination to solve the said problem. Then why should we go for server-side pagination?

The reason is performance. If you are using client-side pagination, even if it is implemented using any third-party library(like jquery data table or kendo grid) or your custom logic, somewhere somehow the data has to be kept on the client-side. There are 5 problems with this approach.

High bandwidth is required since the data(1 million products we are talking about) has to travel through the internet from the database to the client-side.
In the case of relational databases, the data has to be mapped to a DTO before returning which costs a huge computation as well as memory consumption in a single go.
The high probability of the database server timeout the request since it is needed to fetch a huge data, maybe with filtering as well.
If anything goes wrong in between, all the computation so far spent is wasted. Think about a situation where a single product that is at the 99999th position has some missing value and could not be mapped in the DTO due to a mandatory validation, which throws an exception. All the work is so far done by the database engine, till the 99998th product goes in vain. All the effort put in by the mapping method so far is also wasted. The end-user gets nothing even after all the struggle by the system.
Say in a hypothetical situation, all went fine, still, the huge data has to be stored either in the DOM or the browser memory. I bet that much of the data processing on the client-side causes the browser to perform incorrigibly wrong

MongoDB implementation of server-side pagination

There are a lot of articles and tutorials which cover this particular topic available on the internet. But most or all of them miss a very basic thing required for the processing of server-side pagination, i.e:- they miss to return the Total Count of data that we fetch. Yeah a lot of them don’t cover that. So what is the problem with that?

Okay, let me explain. The things we need to send as parameters to the server from the front end are page size (how much data to be shown on a single page) and page number (starting from zero). Without the server sending the total count of data along with the first page’s result, we never can determine how many pages need to be shown in the UI. What does it mean? It means you never can let the user jump from page-1 to page-3 or page-3 to page-n. In this case, we can only give the user to go to the next page or the previous one. Hence, if the user needs to go to the 30th page, he/she has to go through 1 to 29 first. I don’t think it would be a good idea to give a demo of this to any client, especially to those who irk for everything.

We will use the same product example which I mentioned earlier. I will keep this simple as possible so that we all will be on the same page going forward. Step by step we will try to implement a MongoDB aggregation pipeline to return the products in a server-side manner with an optional filter of product category.

This is the basic structure of all the products which are in the Products collection. Assume that this collection has more than 1 million such products in different categories.

{
    "_id" : ObjectId("5fb152f2bcc0ee5eb068ccf5"),
    "Name" : "Product-1",
    "Price" : 25.3,
    "Category" : "Food"
}

{
    "_id" : ObjectId("5fb152f2bcc0ee5eb068ccf6"),
    "Name" : "Product-1",
    "Price" : 263.0,
    "Category" : "Non-Food"
}

{
    "_id" : ObjectId("5fb152f2bcc0ee5eb068ccf7"),
    "Name" : "Product-1",
    "Price" : 159.0,
    "Category" : "Stationary"
}

Let’s start from the basics. First, we will try to get all the products in a single go along with the optional filter parameter, which is the product category. For this, we only need to use the $match stage in the pipeline

//in real world these are parameters from the code
var category = null; //the optional filter for category
db.getCollection('Products').aggregate
([
    {
        $match:
        {
            $or:
            [
                { null:category },
                { "Category":category }
            ]
        }
    }
])

In the above code, the $match stage matches all the products under a specified category if any category is mentioned as the filter(if a category has value) and returns all the products under all the categories if no category is mentioned as the filter(if category is null)

In the $match stage, value at the left hand side of a match condition is always considered as property path of document. So if you change { null:category } to { category : null} our expectation wont meet

So we know how to get all the data in a single go even if it has any filter applied or not. Now we can do server-side pagination in the above code. For this, the aggregate pipeline gives us two stages:- $skip and $limit

//in real world these are parameters from the code
var category = null; //the optional filter for category
var pageSize = 20;
var pageNumber = 0;
db.getCollection('Products').aggregate
([
    {
        $match:
        {
            $or:
            [
                { null:category },
                { "Category":category }
            ]
        }
    },
    {
        $skip: pageSize * pageNumber
    },
    {
        $limit:pageSize
    }
])

In addition to the previous code, we are passing the output from the $match stage to the next stage in the pipeline, i.e the $skip stage. The skip stage will remove the first n products from all the products it received as input, where n is the value given in the $skip stage. The next stage in the pipeline is the $limit, which does exactly what it says, except the first m products others will be removed, where m is the value given in the $limit and the result will be passed to the next stage in the pipeline if any. In our case, there are no more stages in the pipeline and hence returned out.

Example cases

pageNumber=0 & pageSize=20 will return the first 20 products

pageNumber=1 & pageSize=20 will return the next 20 products after skipping the first 20(pageNumber*pageSize)

pageNumber=25 & pageSize=20 will return the next 20 products after skipping the first 500(pageNumber*pageSize)

But still, we could not able to return the total count of products along with the result which is needed for the grid libraries to render page numbers in the UI. From the below figure, it is plausible to give the end-user an idea of how many pages are available in total for him\her to surf through.

So we will now try to get the total count of products along with the products themselves. In earlier days, it was a practice by the developers to get the count and the data using two separate calls to the DB because of some limitations MongoDB had. This approach will end up in filtering and scanning through the huge data twice. Almost the same computation will have to be used by the database engine to get both the results. However, by the arrival of version 3.4, we can use the so sophisticated $facet stage in the pipeline.

Processes multiple aggregation pipelines within a single stage on the same set of input documents. Each sub-pipeline has its own field in the output document where its results are stored as an array of documents.

Basically what it says is that the input of a $facet stage can be passed through multiple separate pipelines and the results from each sub pipeline will be saved as a property of the $facet’s output document. In our case, we can use the $facet as the first stage with two sub pipelines inside. One for getting the required data and the other to get the total count. Let’s refactor the code we wrote so far.

//in real world these are parameters from the code
var category = null; //the optional filter for category
var pageSize = 20;
var pageNumber = 0;
db.getCollection('Products').aggregate
([
    {
        $facet:
        {
            "Products":
            [
                {
                    $match:
                    {
                        $or:
                        [
                            { null:category },
                            { "Category":category }
                        ]
                    }
                },
                {
                    $skip: pageSize * pageNumber
                },
                {
                    $limit:pageSize
                }
            ],
            "Count":
            [
                {
                    $group:
                    {
                        _id:null,
                        "Total":{ $sum:1 }
                    }
                }                
            ]
                
        }
    }
])

If we execute the above code, the result will have a single document with two properties:- Products which contains the resultant data, and Count which has the total count of actual data. In this case, the count will be the total number of products that are in the DB, which will be the same for every page we return. This count can be used by third-party libraries or by your custom code to determine the number of pages to be rendered in the UI.

Intermediate Result

{
    "Products" : [ 
        {
            "_id" : ObjectId("5fb152f2bcc0ee5eb068ccf5"),
            "Name" : "Product-1",
            "Price" : 25.3,
            "Category" : "Food"
        }, 
        {
            "_id" : ObjectId("5fb152f2bcc0ee5eb068ccf6"),
            "Name" : "Product-1",
            "Price" : 263.0,
            "Category" : "Non-Food"
        }, 
        {
            "_id" : ObjectId("5fb152f2bcc0ee5eb068ccf7"),
            "Name" : "Product-1",
            "Price" : 159.0,
            "Category" : "Stationary"
        }
    ],
    "Count" : [ 
        {
            "_id" : null,
            "Total" : 3.0
        }
    ]
}

A little cleanup is required in the above result which is to flatten out the products and move the total property inside the products. We can use $unwind, $addFields and $replaceRoot

//in real world these are parameters from the code
var category = null; //the optional filter for category
var pageSize = 20;
var pageNumber = 0;
db.getCollection('Products').aggregate
([
    {
        $facet:
        {
            "Products":
            [
                {
                    $match:
                    {
                        $or:
                        [
                            { null:category },
                            { "Category":category }
                        ]
                    }
                },
                {
                    $skip: pageSize * pageNumber
                },
                {
                    $limit:pageSize
                }
            ],
            "Count":
            [
                {
                    $group:
                    {
                        _id:null,
                        "Total":{ $sum:1 }
                    }
                }                
            ]
                
        }
    },
    {
        $unwind:"$Products"
    },
    {
        $addFields:
        {
            "Products.Total": { $arrayElemAt:["$Count.Total",0]}
        }
    },
    {
        $replaceRoot:
        {
            newRoot:"$Products"
        }
    }
])

Final Result

{
    "_id" : ObjectId("5fb152f2bcc0ee5eb068ccf5"),
    "Name" : "Product-1",
    "Price" : 25.3,
    "Category" : "Food",
    "Total" : 3.0
}
{
    "_id" : ObjectId("5fb152f2bcc0ee5eb068ccf6"),
    "Name" : "Product-1",
    "Price" : 263.0,
    "Category" : "Non-Food",
    "Total" : 3.0
}
{
    "_id" : ObjectId("5fb152f2bcc0ee5eb068ccf7"),
    "Name" : "Product-1",
    "Price" : 159.0,
    "Category" : "Stationary",
    "Total" : 3.0
}

Further performance enhancements

There are some pressure points you could see while implementing the above MongoDB query in a real-world application having humongous data. For instance, the $skip stage in the pipeline becomes so costly especially when end users try to navigate to farther pages. Either you could simply ignore it knowing how often the user might need to go to the 300th page or you could perfectly avoid the $skip stage by taking leverage on the natural ordering property of the _id field, which will be unique for each document by default. A nice implementation is found in this article by Arpit Bhayani. But please do note that, if you contemplate using this approach, end-users will lose their ability to jump from one page to some other pages which are not adjacent to the current one i.e.; the user has to go from 1 to 29 pages first if he\she needs to go to the 30th page. So take this approach only if you don't plan for that flexibility to be given.

In addition, mongodb provides a very efficient indexing mechanism that you can index the data in such a way that it can be used as an advantage for the $match and $sort stages. However, it is all depended on how you implement the pagination since the application is all yours.

Happy coding!

This article was originally published here.