## Storage Engines: Introduction

Okay, let's talk about sharding. Sharding is distributing a collection's data, and the queries on it, across multiple servers. Before we do that, let's get the idea of storage engines. New in [MongoDB](https://hackernoon.com/tagged/mongodb) 3.0 is the pluggable storage engine. A storage engine is the interface between the database itself and the persistent storage, by which we mean the disks, which might be solid state disks.

So, the database talks to the persistent storage through a storage engine. Now, the storage engine itself may decide to use memory to optimize the process. In other words: disk is very slow, and since the idea of a database is to store stuff persistently, the storage engine takes control of the memory on the computer. It decides what to put in memory, what to take out of memory, what to persist to disk, and when. So the database server defers the handling of the memory on the server, as well as the disk itself, to the storage engine.

MongoDB provides a pluggable storage engine architecture, so we can use more than one engine, and with different storage engines we get different performance characteristics. MongoDB ships with two storage engines: MMAPv1 and WiredTiger.

What a storage engine is not:

- If we have a bunch of servers all running in a cluster, the storage engine doesn't affect the communication between those different servers.
- The storage engine doesn't affect the API that the database presents to the programmer.

The storage engine does, however, directly determine the data file format and the format of indexes.
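Before digging into each engine, it can help to confirm which one a running server is actually using. Here is a minimal check from the mongo shell, assuming a server on the default port; the collection name is hypothetical and the exact fields in the output vary by version:

```
// Ask the server which storage engine it was started with.
// On MongoDB 3.0+ this returns something like { "name" : "wiredTiger", ... }
db.serverStatus().storageEngine

// Collection stats also reflect the engine; under WiredTiger the output
// includes a "wiredTiger" section with cache and compression details.
// "users" is just a hypothetical collection name here.
db.users.stats()
```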
## Storage Engines: MMAPv1

We call it MMAPv1 — the original storage engine of MongoDB — because it uses the `mmap` system call under the covers in order to implement storage management. Let's look at what the `mmap` system call does. The Linux man page talks about mapping files or devices into memory: it causes the pages starting at an address, and continuing for at most `length` bytes, to be mapped from the object described by a file descriptor at an offset. So, what does that practically mean?

Well, MongoDB needs a place to put documents, and it puts the documents inside files. To do that, it initially allocates a large file; let's say it allocates a 100GB file on disk. That file may or may not be physically contiguous on the actual disk, because there are algorithms beneath that layer that control the actual allocation of space on the disk, but from our point of view it's a 100GB contiguous file. If MongoDB calls the `mmap` system call, it can map this 100GB file into 100GB of virtual memory. To get this much virtual memory, we need to be on an x64 machine. These mappings are all page-sized; pages on an OS are either 4KB or 16KB large, so there are a lot of them inside the 100GB of virtual memory. The operating system decides what can fit in physical memory. Say the actual physical memory of the box is 32GB: when we go to access one of the pages in this memory space, it may not be in memory at any given time; the operating system decides which of these pages will be in memory (picture those as the green pages, and the rest as white). So when we go to read a document, if it hits a page that's in memory, we get it. If it hits a page that's not in memory (a white one), the OS has to bring it in from disk.

The MMAPv1 storage engine provides collection-level concurrency (locking). Each collection inside MongoDB is its own file (which can be seen in `~/data/db`). If multiple writes are fired at the same collection, one has to wait for another to finish: it's multiple-reader, single-writer, and only one write can happen at a time to a particular collection.

MMAPv1 allows in-place updates. So, if a document is sitting in one of the in-memory (green) pages and we do an update to it, we'll try to update it right in place. If we can't update it there, we mark its old location as a hole, move it somewhere else where there is some space, and finally update it there. In order to make it more likely that we can update a document in place without having to move it, MongoDB uses power-of-2 sizes when it allocates the initial storage for a document. So, if we try to create a 3-byte document, we'll get 4 bytes; 8 bytes if we create a 7-byte document; 32 bytes when creating a 19-byte document. This way it's possible to grow a document a little bit in place, and the space that opens up when a document does move can be re-used more easily.

Also, notice that since the OS decides what is in memory and what is on disk, we cannot do much about it; the OS is smart enough for memory management.

## Storage Engines: WiredTiger

For a lot of workloads, WiredTiger is faster. It provides document-level concurrency of both indexes and data. We don't call it document-level locking because it's a lock-free implementation with an optimistic concurrency model: the storage engine assumes that two writes are not going to hit the same document, and if they are to the same document, one of them is unwound and has to try again, invisibly to the application. But we do get document-level concurrency versus the collection-level concurrency of the MMAP storage engine, and that's a huge win.

WiredTiger also has support for compression, which is possible because WiredTiger is managing its own storage. There are no in-place updates: if we update a document, the old version is marked as no longer used, a new space allocation is done on disk, and the document is written there. Eventually, the vacant space is reclaimed. To invoke the WiredTiger storage engine, we would start the server with `mongod --storageEngine wiredTiger`.

## Indexes

An index is an ordered set of things. An index in the database has ordered column(s) with their references (_ids or memory locations). Ordered things are faster to search. These are called B-tree indexes.

Indexes work from the leftmost side. Let's assume that an index is on 3 columns: A, B, C. We can search rapidly from the left side:

- A, B, C
- A, B
- A

But searching from the right side is not supported by the index. The following will not be sped up:

- C
- B
- etc.

Indexing is not free, because whenever we change anything in a document that affects an index, we have to update the index; it needs to be written to memory and finally to disk. So it's good to know that indexing slows down writes, while making reads faster. Creating an index also takes time, even on fast computers, because we have to scan the entire collection, create new data structures, and write them all to disk.

To create a compound index, use the command `db.collectionName.createIndex({A: 1, B: -1})`, where 1 means ascending and -1 means descending. To drop the index, use `db.collectionName.dropIndex({A: 1, B: -1})`. A sketch of the left-prefix rule in action follows.
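As a minimal sketch of the left-prefix rule, with hypothetical field and collection names:

```
// Hypothetical compound index on three fields
db.collectionName.createIndex({A: 1, B: 1, C: 1})

// These queries can use the index (they form a left prefix):
db.collectionName.find({A: 5})              // prefix: A
db.collectionName.find({A: 5, B: 7})        // prefix: A, B
db.collectionName.find({A: 5, B: 7, C: 9})  // full index

// These cannot be satisfied by the index (no left prefix):
db.collectionName.find({C: 9})
db.collectionName.find({B: 7, C: 9})

// explain() confirms it: an IXSCAN stage for the first group,
// a COLLSCAN for the second.
db.collectionName.explain().find({B: 7, C: 9})
```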
Compound indexes cannot have both of their columns be array types. If we try to do this, MongoDB will tell us that it can't index parallel arrays. The reason is the explosion of index points this would create: it would have to create index points for the cartesian product of the items in the arrays, and it doesn't permit that. Multikey indexes are what get applied when we have arrays inside documents on indexed keys.

To get the currently available indexes, use the command `db.collectionName.getIndexes()`. Nested keys can also be indexed, e.g. `db.collectionName.createIndex({'A.aa': 1})`, where `aa` is a sub-key inside key `A`.

## Index Creation Option, Unique

Let's create a collection with duplicate entries:

```
db.collectionName.insertOne({thing: 'apple'});
db.collectionName.insertOne({thing: 'pear'});
db.collectionName.insertOne({thing: 'apple'});
```

To make sure that the entries are unique, we can apply a unique index. To create a unique index, use `db.collectionName.createIndex({thing: 1}, {unique: true})`. This command will give an error because there are multiple apples in `collectionName`. To remove one, use the command `db.collectionName.remove({thing: 'apple'}, {justOne: true})`.

After this index is in place, any attempt at a duplicate insert will result in a duplicate key error. To see whether the index was indeed created, use the command `db.collectionName.getIndexes()`. One interesting thing to notice is that the index on `_id` is not shown as unique, although it is.

## Index Creation, Sparse

Let's assume that we wish to create an index on the following documents:

```
{a: 1, b: 5, c: 2}
{a: 8, b: 15, c: 7}
{a: 4, b: 7}
{a: 3, b: 10}
```

Creating an index on `a` or `b` will not be a problem. But what if we need to create a unique index on `c`? The unique constraint will not work for `c`, because the null value is duplicated for 2 documents. The solution in this case is to use the `sparse` option, which tells the database not to include in the index the documents that are missing the key. The command in question is `db.collectionName.createIndex({c: 1}, {unique: true, sparse: true})`. A sparse index lets us use less space as well.

Notice that even if we have a _sparse_ index, the database may still perform a scan of all documents, especially when doing a sort. This can be seen in the winning plan section of _explain_'s result.

## Index Creation, Background

There are two indexing options: foreground (the default) and background. Foreground index creation is relatively fast, but it blocks all readers and writers on that database (other databases can still be accessed). This is not supposed to be done in a production environment. Background index creation is a bit slower, but it doesn't block readers and writers. With MongoDB 2.4 and later, you can create multiple background indexes in parallel, even on the same database.

Beginning in MongoDB 2.6, creating an index in the background on the primary will cause the indexes to be created in the background on secondaries as well. The secondaries will begin index creation when the primary completes building its index.

There is another way to create an index very efficiently in a production system: create it on a server that is not being used to serve most queries. In a replica set of multiple database servers working in tandem, one can be taken out and the requests routed to the remaining ones. The foreground index creation can be done on the separated server, and after the creation completes successfully, the server can be brought back into the cluster.
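A small sketch of a background build, with a hypothetical collection and field; the shape of `currentOp()`'s output varies by server version:

```
// Build the index in the background so readers and writers aren't blocked
db.collectionName.createIndex({a: 1}, {background: true})

// From another shell, an in-progress build can be spotted in the
// output of db.currentOp()
db.currentOp()
```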
## Using Explain

The `explain` command shows what action the database takes in executing a query: what indexes it uses, how many documents are scanned. It returns an explainable object. It does almost everything the actual query would do, but it's not an entire simulation of the query. This command can be used both from the shell as well as from the driver. It can be used in multiple cases:

```
db.collectionName.explain().find()
db.collectionName.explain().update()
db.collectionName.explain().remove()
db.collectionName.explain().aggregate()
```

But the following are not allowed:

```
db.collectionName.explain().insert()
db.collectionName.remove({a: 1, b: 2}).explain()
```

To find out which functions can be used in conjunction with `explain`, use `db.collectionName.explain().help()`. In previous versions, it was possible to append this command to a cursor (remember that `find()` returns a cursor), e.g. `db.collectionName.find().explain()`. This was changed to be prepended, via an explainable object's functions, because not all methods return a cursor; e.g. `count()` returns a scalar.

## Explain: Verbosity

The `explain` command can run in 3 modes:

- `queryPlanner` - tells mostly what the database would use in terms of indexes
- `executionStats` - includes the `queryPlanner` mode, and also includes what the result of using the indexes will be
- `allPlansExecution` - includes the `queryPlanner` and `executionStats` modes. It does what the query optimizer does periodically: it runs all the possible indexes that could be used for a particular shape of query, runs them in parallel, and then makes a decision about which one is fastest.

```
var exp = db.collectionName.explain("executionStats");
exp.find();
```

One thing to understand when studying the performance stats is to compare the number of returned documents (`nReturned`) with the number of documents examined (`docsExamined`). The following output says that the query scanned 999,999 documents, returning 10,000, in 619 milliseconds:

```
> exp = db.example.explain("executionStats")
Explainable(test.example)
> exp.find( { a : 7 } )
{
  "queryPlanner" : {
    "plannerVersion" : 1,
    "namespace" : "test.example",
    "indexFilterSet" : false,
    "parsedQuery" : { "a" : { "$eq" : 7 } },
    "winningPlan" : {
      "stage" : "COLLSCAN",
      "filter" : { "a" : { "$eq" : 7 } },
      "direction" : "forward"
    },
    "rejectedPlans" : [ ]
  },
  "executionStats" : {
    "executionSuccess" : true,
    "nReturned" : 10000,
    "executionTimeMillis" : 619,
    "totalKeysExamined" : 0,
    "totalDocsExamined" : 999999,
    "executionStages" : {
      "stage" : "COLLSCAN",
      "filter" : { "a" : { "$eq" : 7 } },
      "nReturned" : 10000,
      "executionTimeMillisEstimate" : 520,
      "works" : 1000001,
      "advanced" : 10000,
      "needTime" : 990000,
      "needFetch" : 0,
      "saveState" : 7812,
      "restoreState" : 7812,
      "isEOF" : 1,
      "invalidates" : 0,
      "direction" : "forward",
      "docsExamined" : 999999
    }
  },
  "serverInfo" : {
    "host" : "cross-mb-air.local",
    "port" : 27017,
    "version" : "3.0.1",
    "gitVersion" : "534b5a3f9d10f00cd27737fbcd951032248b5952"
  },
  "ok" : 1
}
```

## Covered Queries

A covered query is a query that can be satisfied entirely by an index; hence zero documents need to be inspected to satisfy the query. This makes the query a lot faster. For a query to be covered, all the projected keys need to be indexed.
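Here is a minimal sketch of a covered query; the collection and field names are hypothetical:

```
// An index covering both the filter and the projection
db.users.createIndex({age: 1, name: 1})

// Covered: the filter and projection use only indexed fields, and _id
// is explicitly excluded (it isn't part of this index)
db.users.find({age: 30}, {_id: 0, age: 1, name: 1})

// In the explain output, totalDocsExamined should be 0
db.users.explain("executionStats").find({age: 30}, {_id: 0, age: 1, name: 1})
```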
## How Large is the Index?

Like in any other database, it's important to keep the indexes in the "working set" memory. The working set is the data that is frequently accessed by the client, and this is especially true for indexes: if an index lives on disk, fetching it from there for every query is very expensive. To estimate the amount of memory we'll need for a deployment, use the command `db.collectionName.stats()` and look at the index size in the output. This output changes from one storage engine to another. There's also a shortcut, `db.collectionName.totalIndexSize()`, which gives just the required size.

With the WiredTiger storage engine, the sizes of the indexes are much smaller, because it supports a technology called prefix compression. But this compression comes at the cost of CPU. The moral of the story is that we always need to make sure that the indexes fit into the working set, which in turn needs to fit in memory.

## Number of Index Entries

Index cardinality refers to the number of index points for each different type of index that MongoDB supports.

- Regular — for every single key that we put in the index, there's certainly going to be an index point. In addition, if a document has no key, there's going to be an index point under the null entry. We get 1:1 relative to the number of documents in the collection in terms of index cardinality, which makes the index a certain size: it's proportional to the collection size in terms of its pointers to documents.
- Sparse — when a document is missing the key being indexed, it's not in the index, because nulls aren't kept in a sparse index. So the number of index points could be less than or equal to the number of documents.
- Multikey — this is an index on array values. There will be multiple index points (one for each element of the array) for each document, so the count can be greater than the number of documents.

Let's say you update a document with a key called tags, and that update causes the document to be moved on disk (assume the MMAPv1 storage engine). If the document has 100 tags in it, and the tags array is indexed with a multikey index, then 100 index points need to be updated in the index to accommodate the move.

## Geospatial Indexes

These indexes allow us to find things based on location. We have 2 options: 2D and 3D (spherical). In 2D, we have a cartesian plane with x and y coordinates and a bunch of different objects.

The document needs to store the location as some sort of x, y pair, and the index is created with `createIndex({'location': '2d', 'type': 1})`. The `type` part makes it a compound index and specifies the direction of that field, i.e. ascending or descending (it's optional). To find near locations, use the command `db.collectionName.find({location: {$near: [x, y]}})`. Practically speaking, this is often used with a limit; to limit the results to 20, append `.limit(20)`.

## Geospatial Spherical

Locations on the globe are represented using longitude and latitude. Latitude is how far off the equator we are: the equator is at zero degrees latitude, and we can go from -90 to 90. A special type of index called 2dsphere handles this. To specify the location of things, we use GeoJSON. The locations can be described using a point or a polygon. The keys `type` and `coordinates` are reserved words, and the values for `type` are reserved words as well. For this type of data, the indexing is done using `createIndex({'location': '2dsphere', 'type': 1})`.

Say we want a collection named "stores" to return the stores that are within 1,000,000 meters of the location latitude=39, longitude=-130, assuming the stores collection has a 2dsphere index on `loc` and each store record looks like this:

```
{"_id": {"$oid": "535471aaf28b4d8ee1e1c86f"}, "store_id": 8, "loc": {"type": "Point", "coordinates": [-37.47891236119904, 4.488667018711567]}}
```

The query will be:

```
db.stores.find({
  loc: {
    $near: {
      $geometry: { type: "Point", coordinates: [-130, 39] },
      $maxDistance: 1000000
    }
  }
})
```
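To round the stores example out, here is a minimal sketch of the setup side; the inserted document is hypothetical, and note that GeoJSON coordinates are ordered [longitude, latitude]:

```
// Create the 2dsphere index that the $near query above relies on
db.stores.createIndex({loc: "2dsphere"})

// Insert a sample store; GeoJSON coordinates are [longitude, latitude]
db.stores.insertOne({
  store_id: 99,
  loc: { type: "Point", coordinates: [-130.1, 39.2] }
})
```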
## Text Indexes

To create a text-based index on a key, use the command `db.collectionName.ensureIndex({'textColumnName': 'text'})`. After this index is applied, use the `$text` operator to search for a word: `db.collectionName.find({$text: {$search: 'your text here'}})`. There is a text score based on which the results are ranked; to see it, project it into a score key like this: `db.collectionName.find({$text: {$search: 'your text here'}}, {score: {$meta: 'textScore'}}).sort({score: {$meta: 'textScore'}})`.

If we create a text index on the `title` field of the `movies` collection, and then perform the text search `db.movies.find({$text: {$search: "Big Lebowski"}})`, the following documents will be returned, assuming they are in the movies collection:

```
{ "title" : "The Big Lebowski", "star" : "Jeff Bridges" }
{ "title" : "Big", "star" : "Tom Hanks" }
{ "title" : "Big Fish", "star" : "Ewan McGregor" }
```

This is because there is a logical OR-ing of "Big" and "Lebowski".

## When is an Index Used?

Say MongoDB gets a query. Assume that the collection has the following indexes:

1. b, c
2. c, b
3. d, e
4. e, f
5. a, b, c

Based on the shape of the query, MongoDB decides which indexes are candidates for executing it. Let's say indexes 1, 2 & 5 are selected. Then the execution plans are run in parallel with these indexes, and the one that returns all results fastest, or reaches a threshold of results in sorted order first, wins. This winning index is cached and used for similar queries in the future. The cached plan is removed from the cache when a threshold of writes happens on the collection, which is 1,000 writes as of this writing. If indexes are rebuilt, the cache is also restructured, and if the `mongod` process is restarted, this cached plan information is removed as well.

## Designing / Using Indexes

Like anything worth doing well, designing indexes requires some forward thinking. The goals are:

- Efficiency — fast read / write operations
- Selectivity — minimize records scanned
- Other requirements — e.g., how are sorts handled?

Selectivity is the primary factor that determines how efficiently an index can be used. Ideally, the index enables us to select only those records required to complete the result set, without the need to scan a substantially larger number of index keys (or documents) in order to complete the query. Selectivity determines how many records any subsequent operations must work with; fewer records means less execution time.

Think about what queries will be used most frequently by the application. Use the `explain` command and specifically look at the `executionStats`:

- `nReturned`
- `totalKeysExamined` - if the number of keys examined is very much larger than the number of returned documents, we need an index to reduce that ratio.

Look at the `queryPlanner` and the `rejectedPlans`. Look at the `winningPlan`, which shows the `keyPattern`, which in turn shows which keys needed to be indexed. Whenever we see `stage: SORT`, it means that the key being sorted on is not part of the index, or the database was not able to sort documents in the order specified by the query, and needed to perform an in-memory sort. If we add the key on which the sort happens to the index, we will see that the `winningPlan`'s `stage` changes from `SORT` to `FETCH`.

The keys in the index should be chosen with the range of their data in mind; e.g., `class` will have a smaller range of values than `student`. Choosing this way requires a trade-off: although `executionTimeMillis` will be very little, `docsExamined` and `keysExamined` will be relatively large. But this trade-off is worth making.

There is also a way to force queries to use a particular index, but this is not recommended to be a part of a deployment. The command in question is `.hint()`, which can be chained after `find` or `sort` (for sorting, etc.). It requires the actual index name or the shape of the index.

In general, when building compound indexes, think about three kinds of fields:

- equality field: a field on which queries will perform an equality test
- sort field: a field on which queries will specify a sort
- range field: a field on which queries perform a range test

The following rules of thumb should be kept in mind (a sketch follows the list):

1. Equality fields before range fields
2. Sort fields before range fields
3. Equality fields before sort fields
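As a minimal sketch of those rules of thumb, using a hypothetical `orders` collection where queries test `status` for equality, sort by `order_date`, and apply a range to `amount`:

```
// Typical query shape: equality on status, sort on order_date, range on amount
db.orders.find({ status: "A", amount: { $gt: 100 } }).sort({ order_date: -1 })

// Rule-of-thumb index: the equality field first, then the sort field,
// then the range field
db.orders.createIndex({ status: 1, order_date: -1, amount: 1 })
```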
MongoDB automatically logs queries that take longer than 100 ms to execute.

## Profiling

MongoDB has a sophisticated profiling feature. The logging happens in the `system.profile` collection, and the logs can be seen with `db.system.profile.find()`. There are 3 logging levels:

- Level 0 — no logging
- Level 1 — logs slow queries
- Level 2 — logs all queries; this is more of a general debugging feature than a performance debugging option

To see what profiling level the database is running at, use `db.getProfilingLevel()`, and to see the full status, use `db.getProfilingStatus()`. To change the profiling level, use the command `db.setProfilingLevel(level, milliseconds)`, where `level` refers to the profiling level and `milliseconds` is the duration above which queries are logged. To turn off the logging, use `db.setProfilingLevel(0)`.

The query to look in the system profile collection for all queries that took longer than one second, ordered by timestamp descending, is `db.system.profile.find({millis: {$gt: 1000}}).sort({ts: -1})`.
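A small sketch of the whole workflow; the 50 ms threshold is just an example value:

```
// Log every operation slower than 50 ms (level 1 = slow operations only)
db.setProfilingLevel(1, 50)

// ... run the application workload for a while ...

// Inspect the most recent slow operations
db.system.profile.find().sort({ts: -1}).limit(5).pretty()

// Turn profiling back off
db.setProfilingLevel(0)
```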
## Mongotop

For performance tuning, we have seen the `explain`, `hint`, and profiling options. But what if we want to look at a high level inside a running server and figure out where it's spending its time? For that we have `mongotop`, named after the Unix `top` command. To see its output on the shell, use the command `mongotop seconds`, where `seconds` is the number of seconds after which the next log entry is printed, e.g. `mongotop 3`.

## Mongostat

`mongostat` is similar to the `iostat` command from Unix. It gives the database stats in intervals of 1 second, showing what operations happened during that second. The output may vary depending on which storage engine is being used.

- `getmore` concerns the number of requests per time interval to get additional data from a cursor; it's the way we get more results from a cursor when a query has a large result.
- `command` shows the number of commands running per second.
- `flushes` refers to the number of times data is flushed to disk per second.
- `mapped` is the amount of mapped memory.
- `res` refers to the resident memory.
- `faults` refers to the number of page faults we're causing every second. It's an important number, because page faults mean more I/O, and more I/O means a slower database.
- `qr` and `qw` are the queue lengths: the number of clients waiting for reads and writes respectively.
- `ar` and `aw` are the numbers of active readers and writers.
- `netIn` and `netOut` refer to the amounts of network traffic into and out of the database during the time frame.
- `dirty` (WiredTiger only) refers to the amount of cache the storage engine has written to that still needs to be written back to disk.
- `used` is the total cache size in use.

## Sharding

Sharding is a technique of splitting up a large collection amongst multiple servers. When we shard, we deploy multiple `mongod` servers, and in the front we place a `mongos`, which is a router. The application talks to this router, which then talks to the various `mongod` servers.

The application and the `mongos` are usually co-located on the same server, and we can have multiple `mongos` services running on the same machine. It's also recommended to keep a set of multiple `mongod` instances (together called a replica set) instead of one single `mongod` on each shard. A replica set keeps the data in sync across several different instances, so that if one of them goes down, we won't lose any data. Logically, each replica set can be seen as a shard. This is transparent to the application. The way MongoDB shards is that we choose a shard key.

Assume that for the `student` collection we have `stdt_id` as the shard key (it could also be a compound key). The `mongos` server runs a range-based system: based on the `stdt_id` that we send as the shard key, it routes the request to the right `mongod` instance.

So, what do we need to really know as a developer?

- An `insert` must include the shard key; if it's a multi-part shard key, we must include the entire shard key.
- We have to understand what the shard key is on the collection itself.
- For an `update`, `remove`, or `find`: if `mongos` is not given a shard key, it will have to broadcast the request to all the different shards that cover the collection.
- For an `update`: if we don't specify the entire shard key, we have to make it a multi-update so that it knows that it needs to broadcast it.

Originally published at xameeramir.github.io.