Okay, let's talk about sharding. Sharding is distributing database queries across multiple servers. Before we get to that, let's understand storage engines. New in [MongoDB](https://hackernoon.com/tagged/mongodb) 3.0 is support for pluggable storage engines. A storage engine is the interface between the database itself and the persistent storage, which we'll call the disks (they might be solid state disks).
So, the database talks to the persistent storage through a storage engine. The storage engine may itself decide to use memory to optimize the process, because disk is very slow. Since the point of a database is to store things persistently, the storage engine is given control of the memory on the computer: it decides what to put in memory, what to take out of memory, what to persist to disk, and when. The database server defers the handling of both the server's memory and the disk itself to the storage engine. Because the storage engine is pluggable, we can choose among more than one, and different storage engines give different performance characteristics. MongoDB ships with two storage engines: MMAPv1 and WiredTiger.
What a storage engine is not:
If we have a bunch of MongoDB servers all running in a cluster, the storage engine doesn't affect the communication between those different MongoDB servers. Nor does it affect the API that the database presents to the programmer. What the storage engine does directly determine is the data file format and the format of the indexes.
MMAPv1, the original storage engine of MongoDB, is so called because it uses the mmap system call under the covers to implement storage management. Let's look at what the mmap system call does. The Linux man page describes it as mapping files or devices into memory: it causes the pages starting at a given address and continuing for at most length bytes to be mapped from the object described by a file descriptor at an offset. So, what does that practically mean?
Well, MongoDB needs a place to put documents, and it puts the documents inside files. To do that it initially allocates a large file; let's say it allocates a 100GB file on disk, so we wind up with a 100GB file on disk. That space may or may not be physically contiguous on the actual disk, because there are algorithms beneath that layer controlling the actual allocation of space, but from our point of view it's a 100GB contiguous file. When MongoDB calls the mmap system call, it can map this 100GB file into 100GB of virtual memory. To get such a large virtual address space, we need to be on a 64-bit machine. The mapping is page-sized: pages on an OS are either 4KB or 16KB, so there are a lot of them inside the 100GB of virtual memory. The operating system decides what can fit in physical memory. If the actual physical memory of the box is, say, 32GB, then when we go to access one of the pages in this address space, it may not be in memory at that moment; the OS decides which pages are kept in memory (picture those as the green ones). When we go to read a document, if it hits a page that's in memory, we get it; if it hits a page that's not in memory (a white one), the OS has to bring it in from disk.
The MMAPv1 storage engine provides collection-level locking. Each collection in MongoDB is its own file (which can be seen in ~\data\db). If multiple writes are fired at the same collection, one has to wait for another to finish: multiple readers are allowed, but only one write can happen at a time to a particular collection. Also notice that since the OS decides what is in memory and what is on disk, we cannot do much about it; the OS is smart enough to manage the memory.
The other storage engine shipped with MongoDB is WiredTiger. For a lot of workloads it's faster, and it provides document-level concurrency as well as compression of data and indexes. To use it, start the server with mongod --storageEngine wiredTiger.
Now let's move on to indexes. Say we have a compound index on the keys A, B, C. We can search rapidly from the left side of the index, i.e. by:

A, B, C
A, B
A

But searching from the right side is not supported by the index. The following will not be sped up:

C
B
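As a rough sketch (the collection name and values here are just placeholders), the prefix rule looks like this in the shell:

```
// A compound index on A, B, C supports queries on its left prefixes only
db.collectionName.createIndex({ A: 1, B: 1, C: 1 })

db.collectionName.find({ A: 5 })                // can use the index
db.collectionName.find({ A: 5, B: 2 })          // can use the index
db.collectionName.find({ A: 5, B: 2, C: 9 })    // can use the index

db.collectionName.find({ B: 2 })                // cannot use the index
db.collectionName.find({ C: 9 })                // cannot use the index
```

Running explain() on each of these would typically show an IXSCAN stage for the prefix queries and a COLLSCAN for the others (assuming no other indexes exist).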
Indexing is not free: whenever we change anything in a document that affects an index, the index has to be updated as well, first in memory and eventually on disk. So indexing slows down writes, even though it makes reads faster. Creating an index also takes time, even on fast computers, because the database has to scan the entire collection, build the new data structures, and write them all to disk.
To create a compound index, use the command db.collectionName.createIndex({A: 1, B: -1}), where 1 means ascending and -1 means descending. To drop the index, use db.collectionName.dropIndex({A: 1, B: -1}). A compound index cannot have array types in both of its columns. If we try to do this, MongoDB will tell us that it can't index parallel arrays; the reason is that there would be an explosion of index points, because it would have to create index points for the cartesian product of the items in the arrays, and it doesn't permit that. Multikey indexes are applied automatically when an indexed key holds an array inside a document. To get the currently available indexes, use the command db.collectionName.getIndexes(). Nested keys can also be indexed, e.g. db.collectionName.createIndex({'A.aa': 1}), where aa is a sub-key inside the key A.
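A minimal sketch of the multikey behaviour, assuming a collection named collectionName and made-up field values:

```
// Indexing a key that holds an array automatically creates a multikey index
db.collectionName.insertOne({ A: [1, 2, 3], B: 'x' })
db.collectionName.createIndex({ A: 1, B: 1 })   // works: only A is an array

// Inserting a document where both indexed keys are arrays is rejected
db.collectionName.insertOne({ A: [1, 2], B: ['x', 'y'] })
// => error: "cannot index parallel arrays"
```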
Let’s create a collection with duplicate entries:
db.collectionName.insertOne({thing: 'apple'}); db.collectionName.insertOne({thing: 'pear'}); db.collectionName.insertOne({thing: 'apple'});
To make sure that the entries are unique, we can apply a unique index. To create one, use db.collectionName.createIndex({thing: 1}, {unique: true}). This command will give an error because there are multiple apples in collectionName. To remove one of them, use the command db.collectionName.remove({thing: 'apple'}, {justOne: true}).
After this index is in place, any attempt at a duplicate insert will result in a duplicate key error. To see whether the index was indeed created, use the command db.collectionName.getIndexes(). One interesting thing to notice is that the index on _id is not shown as unique, although it actually is.
{a:1, b:5, c:2}
{a:8, b:15, c:7}
{a:4, b:7}
{a:3, b:10}
Let's assume that we wish to create an index on the above documents. Creating an index on a or b is not a problem. But what if we need a unique index on c? The unique constraint will not work for c, because the null value is duplicated across two documents. The solution in this case is the sparse option, which tells the database not to include documents that are missing the key. The command in question is db.collectionName.createIndex({c: 1}, {unique: true, sparse: true}). A sparse index also takes up less space.
Notice that even with a _sparse_ index, the database may still scan all documents, especially when sorting on the sparse key, because documents missing the key would otherwise be left out of the results. This can be seen in the winning plan section of _explain_'s result.
There are two index-creation options: foreground (the default) and background. Foreground creation is relatively fast, but it blocks all readers and writers on that database (other databases on the server can still be accessed). This should not be done in a production environment.
Background index creation is a bit slower, but it doesn't block readers and writers. With MongoDB 2.4 and later, you can create multiple background indexes in parallel, even on the same database. Beginning with MongoDB 2.6, creating an index in the background on the primary causes the index to be created in the background on the secondaries as well; the secondaries begin index creation when the primary completes building its index.
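A minimal sketch of a background build (the collection and key names are placeholders):

```
// Build the index without blocking readers and writers on the database
db.collectionName.createIndex({ A: 1 }, { background: true })
```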
There is another way to create an index efficiently in a production system: create it on a server that isn't currently serving queries. In a replica set of multiple database servers working in tandem, one member can be taken out while requests are routed to the remaining ones. The foreground index creation can then be done on the separated server, and once it completes successfully, the server is brought back into the replica set.
The explain command shows what actions the database takes in executing a query: which indexes it uses, how many documents are scanned, and so on. It returns an explainable object. It does almost everything the actual query would do, but it is not a full simulation of the query. The command can be used both from the shell and from a driver, and in multiple forms:

db.collectionName.explain().find()
db.collectionName.explain().update()
db.collectionName.explain().remove()
db.collectionName.explain().aggregate()

But the following are not allowed:

db.collectionName.explain().insert()
db.collectionName.remove({a:1, b:2}).explain()

To find out which functions can be used in conjunction with explain, use db.collectionName.explain().help(). In previous versions it was possible to append explain to a cursor (remember that find() returns a cursor), e.g. db.collectionName.find().explain(). This was changed so that explain comes first and returns an explainable object, because not all methods return a cursor; count(), for example, returns a scalar.
The explain command can run in 3 modes:

queryPlanner - tells mostly which indexes the database would use.
executionStats - includes the queryPlanner mode, and also shows what the result of using those indexes would be.
allPlansExecution - includes the queryPlanner and executionStats modes. It does what the query optimizer does periodically: it runs all the possible indexes that could be used for a particular shape of query in parallel, and then decides which one is fastest.

For example:
var exp = db.collectionName.explain("executionStats"); exp.find();
One thing to look at when studying the performance stats is the number of returned documents (nReturned) versus the number of documents examined (docsExamined). The following output says that the query scanned 999,999 documents while returning only 10,000, in 619 milliseconds:
> exp = db.example.explain("executionStats")
Explainable(test.example)
> exp.find( { a : 7 } )
{
  "queryPlanner" : {
    "plannerVersion" : 1,
    "namespace" : "test.example",
    "indexFilterSet" : false,
    "parsedQuery" : { "a" : { "$eq" : 7 } },
    "winningPlan" : {
      "stage" : "COLLSCAN",
      "filter" : { "a" : { "$eq" : 7 } },
      "direction" : "forward"
    },
    "rejectedPlans" : [ ]
  },
  "executionStats" : {
    "executionSuccess" : true,
    "nReturned" : 10000,
    "executionTimeMillis" : 619,
    "totalKeysExamined" : 0,
    "totalDocsExamined" : 999999,
    "executionStages" : {
      "stage" : "COLLSCAN",
      "filter" : { "a" : { "$eq" : 7 } },
      "nReturned" : 10000,
      "executionTimeMillisEstimate" : 520,
      "works" : 1000001,
      "advanced" : 10000,
      "needTime" : 990000,
      "needFetch" : 0,
      "saveState" : 7812,
      "restoreState" : 7812,
      "isEOF" : 1,
      "invalidates" : 0,
      "direction" : "forward",
      "docsExamined" : 999999
    }
  },
  "serverInfo" : {
    "host" : "cross-mb-air.local",
    "port" : 27017,
    "version" : "3.0.1",
    "gitVersion" : "534b5a3f9d10f00cd27737fbcd951032248b5952"
  },
  "ok" : 1
}
A covered query is a query that can be satisfied entirely by an index, so zero documents need to be inspected to satisfy it. This makes the query a lot faster. All the keys used in the filter and the projection need to be in the index (and _id must be excluded from the projection, unless it is part of the index).
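A rough sketch of a covered query, assuming an index on a and b and a placeholder collection name:

```
db.collectionName.createIndex({ a: 1, b: 1 })

// The filter and the projection only touch indexed fields, and _id is
// excluded, so the query can be answered from the index alone
db.collectionName.find({ a: 7 }, { _id: 0, a: 1, b: 1 })

// explain("executionStats") should report totalDocsExamined: 0 for this query
db.collectionName.explain("executionStats").find({ a: 7 }, { _id: 0, a: 1, b: 1 })
```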
Like with any other database, it's important to keep the working set in memory. The working set is the data that is frequently accessed by clients, and this is especially true for indexes: if an index lives on disk, fetching it from there for every query is very expensive. To estimate the amount of memory a MongoDB deployment needs, look at the index sizes reported by db.collectionName.stats(); this output changes from one storage engine to another. There is also a shortcut, db.collectionName.totalIndexSize(), which gives the total index size directly. With the WiredTiger storage engine, index sizes are much smaller because it supports a technique called prefix compression, but that compression comes at the cost of CPU. The moral of the story is that we always need to make sure the indexes fit into the working set, which in turn needs to fit in memory.
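A quick sketch of checking index sizes in the shell (the collection name is a placeholder):

```
// Total size of all indexes on the collection, in bytes
db.collectionName.totalIndexSize()

// Per-index sizes are available in the collection stats
db.collectionName.stats().indexSizes
```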
Index cardinality refers to the number of index points for each different type of index that MongoDB supports. For a regular index it is 1:1 with the documents; for a sparse index it can be less than the number of documents (documents missing the key are skipped); for a multikey index it can be much greater than the number of documents, since every array element gets its own index point.

As an example, say you update a document with a key called tags, and that update causes the document to be moved on disk (assume the MMAPv1 storage engine). If the document has 100 tags in it, and the tags array is indexed with a multikey index, then 100 index points need to be updated in the index to accommodate the move.
Geospatial indexes allow us to find things based on location. We have two options: 2d (a flat, cartesian plane) and 2dsphere (spherical). In the 2d case, we have a cartesian plane with x and y coordinates and a bunch of different objects on it. The document needs to store some sort of x, y location, which we index with createIndex({'location': '2d', 'type': 1}). The type key makes this a compound index, and the 1 specifies the direction of that key, i.e. ascending or descending (it's optional). To find nearby locations, use the command db.collectionName.find({location: {$near: [x, y]}}). Practically speaking, this is usually combined with a limit; to limit the results to 20, append .limit(20).
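Putting that together, a small sketch (the collection name places and the coordinates are made up):

```
// Index a flat x, y location, with an optional extra key in the compound index
db.places.createIndex({ location: '2d', type: 1 })

// The 20 documents whose location is nearest to (50, 50)
db.places.find({ location: { $near: [50, 50] } }).limit(20)
```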
Locations on the globe are represented using longitude and latitude. Latitude is how far north or south of the equator a point is; the equator is at zero degrees latitude, and latitude goes from -90 to 90 degrees. For this there is a special type of index called 2dsphere. To specify the location of things we use GeoJSON, where a location can be described as a point or a polygon. The type and coordinates fields are reserved words, as are the values for type (such as "Point"). For this kind of data, indexing is done using createIndex({'location': '2dsphere', 'type': 1}).

As an example, take a collection named "stores", and say we want to return the stores that are within 1,000,000 meters of the location latitude=39, longitude=-130, assuming the stores collection has a 2dsphere index on "loc" and each store record looks like this:

{"_id":{"$oid":"535471aaf28b4d8ee1e1c86f"},"store_id":8,"loc":{"type":"Point","coordinates":[-37.47891236119904,4.488667018711567]}}

The query will be db.stores.find({ loc: { $near: { $geometry: { type: "Point", coordinates: [-130, 39] }, $maxDistance: 1000000 } } }).
To create a text index on a key, use the command db.collectionName.ensureIndex({'textColumnName': 'text'}). After this index is applied, use the $text operator to search for a word, i.e. db.collectionName.find({$text: {$search: 'your text here'}}). There is a text score based on which the results are ranked; to see it, project it into a score key and sort on it like this: db.collectionName.find({$text: {$search: 'your text here'}}, {score: {$meta: 'textScore'}}).sort({score: {$meta: 'textScore'}}).
If we create a text index on the title field of the movies collection, and then perform the text search db.movies.find({ $text: { $search: "Big Lebowski" } }), the following documents will be returned, assuming they are in the movies collection:

{ "title" : "The Big Lebowski" , star: "Jeff Bridges" }
{ "title" : "Big" , star : "Tom Hanks" }
{ "title" : "Big Fish" , star: "Ewan McGregor" }

This is because there is a **logical OR** applied to the terms Big and Lebowski.
Say MongoDB gets a query, and assume the collection has the following indexes:

1. b, c
2. c, b
3. d, e
4. e, f
5. a, b, c
Based on the shape of the query, MongoDB decides which candidate indexes could be used to execute it. Let's say indexes 1, 2 & 5 are selected. The query is then run with each of these candidate plans in parallel, and the one that returns all the results fastest (or reaches a threshold of sorted results first) wins. The winning index is cached and used for similar queries in the future. This cached plan is evicted when a threshold of writes happens on the collection (1,000 writes as of this writing), when indexes are rebuilt, or when the mongod process is restarted.
Like anything worth doing well, designing indexes requires some forward thinking. The goal is selectivity: selectivity is the primary factor that determines how efficiently an index can be used. Ideally, the index enables us to select only those records required to complete the result set, without having to scan a substantially larger number of index keys (or documents) to complete the query. Selectivity determines how many records any subsequent operations must work with; fewer records means less execution time.
Think about which queries will be used most frequently by the application, then run the explain command on them and look specifically at the executionStats:

nReturned - the number of documents returned.
totalKeysExamined - is the number of keys examined much larger than the number of documents returned? If so, we need an index to bring it down.

Also look at queryPlanner and rejectedPlans. The winningPlan shows the keyPattern, which tells us which keys were used by the index. Whenever we see stage: SORT, it means that the sort key is not part of the index, or the database was not able to return the documents in the sort order specified by the query, so it had to perform an in-memory sort. If we add the key on which the sort happens to the index, we will see that the winningPlan's stage changes from SORT to FETCH. The order of keys in the index should be chosen based on the range of their data, e.g. a class key will have far fewer distinct values than a student key. There is a trade-off here: the executionTimeMillis will be much lower, but docsExamined and keysExamined will be somewhat larger. That trade-off is worth making.
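A hypothetical sketch of removing an in-memory SORT stage (the collection and field names grades, class_id and final_grade are made up for illustration):

```
// Without final_grade in the index, the winning plan needs a SORT stage
db.grades.createIndex({ class_id: 1 })
db.grades.explain("executionStats").find({ class_id: 431 }).sort({ final_grade: -1 })

// Including the sort key in the index lets the plan fetch documents already
// in order, so the SORT stage disappears
db.grades.createIndex({ class_id: 1, final_grade: 1 })
db.grades.explain("executionStats").find({ class_id: 431 }).sort({ final_grade: -1 })
```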
There is also a way to force a query to use a particular index, though this is not recommended as part of a deployment. The command in question is .hint(), which can be chained after find() or sort(). It takes either the actual index name or the shape of the index.
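For example (a sketch assuming an index on { a: 1, b: 1 } already exists):

```
// Force the planner to use the index, either by shape or by name
db.collectionName.find({ a: 1, b: 2 }).hint({ a: 1, b: 1 })
db.collectionName.find({ a: 1, b: 2 }).hint("a_1_b_1")
```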
In general, when building compound indexes for queries that combine equality tests, sorts, and range filters, the following rules of thumb should be kept in mind: put equality fields before range fields, sort fields before range fields, and equality fields before sort fields.
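A sketch of that ordering, with made-up field names (a is tested for equality, b is sorted on, c is a range):

```
// Equality field first, then the sort field, then the range field
db.collectionName.createIndex({ a: 1, b: 1, c: 1 })
db.collectionName.find({ a: 7, c: { $gt: 10, $lt: 100 } }).sort({ b: 1 })
```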
MongoDB automatically logs queries that take longer than 100 ms to execute. Beyond that, MongoDB has a more sophisticated profiling feature. The profiling data is written to the system.profile collection, and the entries can be seen with db.system.profile.find(). There are 3 profiling levels: 0 (profiling off), 1 (log slow queries), and 2 (log all queries).
To see what profiling level the database is running at, use db.getProfilingLevel(), and to see the full status use db.getProfilingStatus(). To change the profiling level, use the command db.setProfilingLevel(level, milliseconds), where level is the profiling level and milliseconds is the threshold above which queries are logged. To turn off logging, use db.setProfilingLevel(0).
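For instance, a quick sketch of enabling level 1 with a 100 ms threshold and checking it:

```
// Log only queries slower than 100 ms
db.setProfilingLevel(1, 100)

// Verify the current settings, e.g. { "was" : 1, "slowms" : 100 }
db.getProfilingStatus()
```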
The query to look in the system.profile collection for all queries that took longer than one second, ordered by timestamp descending, is db.system.profile.find({ millis: { $gt: 1000 } }).sort({ ts: -1 }).
For performance tuning we have seen the explain, hint, and profile options. But what if we want to look at a higher level inside a program and figure out where it's spending its time? For that we have mongotop, named after the Unix command top. To see the output in the shell, use the command mongotop seconds, where seconds is the interval after which the next log entry is printed, e.g. mongotop 3.
mongostat is similar to the iostat command from Unix. It prints database stats at 1-second intervals, showing what operations happened during that second. The output may vary depending on which storage engine is being used. Some of the columns:

getmore - how many getmore commands run per second; getmore is the way we fetch additional results from a cursor when a query has a large result set.
command - the number of commands run per second.
flushes - the number of times per second data is flushed out to disk.
mapped - the amount of mapped memory.
res - the resident memory.
faults - the number of page faults per second. This is an important number, because page faults mean more I/O, and more I/O means a slower database.
qr and qw - the queue lengths of sockets requesting or waiting for reads and writes.
ar and aw - the number of active readers and writers.
netIn and netOut - the amount of data sent into and out of the database during the time frame.
dirty - the amount of the WiredTiger storage engine's cache that has been written to and still needs to be flushed back to disk.
used - the total cache size currently in use.
Sharding is a technique for splitting up a large collection among multiple servers. When we shard, we deploy multiple mongod servers, and in front of them a router called mongos. The application talks to this router, and the router talks to the various mongod servers. The application and the mongos are usually co-located on the same server, and we can have multiple mongos processes running on the same machine. It's also recommended to run a set of multiple mongods (together called a replica set) for each shard instead of a single mongod. A replica set keeps the data in sync across several different instances, so that if one of them goes down we won't lose any data. Logically, each replica set can be seen as one shard. Sharding is transparent to the application; the way MongoDB distributes the data is determined by a shard key that we choose.

Assume that for the student collection we have stdt_id as the shard key (it could also be a compound key). mongos is a range-based system: based on the stdt_id that we send as the shard key, it routes the request to the right mongod instance.
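As a hedged sketch, enabling sharding for the student collection described above might look like this from a mongos shell (the database name school is made up):

```
// Enable sharding on the database, then shard the collection on its shard key
sh.enableSharding("school")
sh.shardCollection("school.student", { stdt_id: 1 })
```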
So, what do we need to really know as a developer?
insert - must include the shard key; if it's a multi-part shard key, the entire shard key must be included.
update, remove, find - if mongos is not given the shard key, it has to broadcast the request to all the shards that cover the collection.
update - if we don't specify the entire shard key, we have to make it a multi update so that mongos knows it needs to broadcast the request.

Originally published at xameeramir.github.io.