Near two months ago I started seriously. At Growthfunnel.io we use MongoDB, and we need to scale our system for a large volume of data(approximately 6TB+) and high throughput. Sharding, database clustering it was all new to me, so I started learning. The purpose of this article is sharing and validating my knowledge with the community. I’m not an expert on any of this. I’m just sharing what have I have learned. learning MongoDB This tutorial explains step by step how to create a MongoDB sharded cluster. We will deploy this demo on a single machine. Prerequisites: MongoDB — 3.6.2 OpenSSL NodeJS Bash Basic knowledge of mongodb sharding What is database sharding anyway? Sharding is a process of splitting data across multiple machines that separate large database into smaller, faster, easily managed parts called data shards. The word shard means a small part of a whole cluster. Our sharding architecture Our sharded cluster will run on a single machine, each component will start on separate process & port. This cluster partitioned into three shards, each shard contains two data members and one arbitrary member. Each shard replica has: 1 Primary member 1 Secondary member 1 Arbitrary member Prepare the environment 1. Install MongoDB from . official documentation 2. Configure hostname Append this line into file; will be our database hostname. 127.0.0.1 database.fluddi.com /etc/hosts database.fluddi.com echo '127.0.0.1 ' | sudo tee --append /etc/hosts database.fluddi.com 3. Make sure data directory & Log directory have read and write permissions Create data directory and log directory and own these directories to your user for reading & writing. For my case my user and usergroup name is . vagrant mkdir -p /data/mongodb /var/log/mongodb/test-clustersudo chown -R vagrant:vagrant /data/mongodbsudo chown -R vagrant:vagrant /var/log/mongodb/test-cluster 4. Clone repo [mongodb-sample-cluster](https://github.com/joynal/mongodb-sample-cluster) _mongodb-sample-cluster - MongoDB sharded cluster_github.com joynal/mongodb-sample-cluster This repo contains configuration files for the cluster. git clone https://github.com/joynal/mongodb-sample-cluster directory contains cluster components configurations; You can customize for your needs. Make sure that your data directory & log directory have read & write permission. By default data directory pointed on & log directory pointed on . confs /data/mongodb/ /var/log/mongodb/test-cluster/ Generate self signed SSL certificate 1. Generate certificate authority Let’s generate a self-signed certificate for the sharded cluster, this is only for this demonstration. For production use, your MongoDB deployment should use valid certificates generated and signed by a single certificate authority. You or your organization can generate and maintain an independent certificate authority, or use certificates generated by a third-party SSL vendor. sudo mkdir -p /opt/mongodb Now own this directory, use your user and usergroup name. sudo chown -R vagrant:vagrant /opt/mongodb OK, let’s create a certificate authority. Generate a private key for CA certificate and keep it very safe. cd /opt/mongodb openssl genrsa -out CA.key 4096 Now self-sign to this certificate. openssl req -new -x509 -days 1825 -key CA.key -out CA.crt This will prompt for certificate information. 2. Generate certificate for cluster members Generate private key & CSR. openssl genrsa -out certificate.key 4096 certificate certificate openssl req -new -key .key -out .csr This will prompt for information, make sure domain name support wildcard domain. Now self sign it. openssl x509 -req -days 1825 -in Output will be something like this: Signature ok Getting CA Private Key subject=/C=BD/ST=Dhaka/L=Dhaka/O=Fluddi/OU=database/CN=*.fluddi.com/emailAddress=support@fluddi.com Create file. .pem cat 3. Generate client certificates Each client certificate must have a unique & different SAN from cluster member certificate. Otherwise, MongoDB will consider it as a cluster member. Each certificate belongs to a MongoDB x.509 user, . more details OK, let’s generate two certificates by following the previous step, just make sure is different. OU For Web app openssl genrsa -out client.key 4096openssl req -new -key client.key -out client.csr Everything will be same as member certificate only will be different. OU Organizational Unit Name (eg, section) []:webapp For Database admin openssl genrsa -out admin-client.key 4096openssl req -new -key admin-client.key -out admin-client.csr Everything will be same as member certificate only will be different. OU Organizational Unit Name (eg, section) []:appadmin Now change all files permission to read-only. cd /opt/mongodb/chmod 400 * Create the config server replica set Create data directories, replace with your actual DB path. $DB_PATH mkdir -p $DB_PATH/config/rs0 $DB_PATH/config/rs1 $DB_PATH/config/rs2 Change directory to code repo, you just have cloned. mongodb-sample-cluster cd ~/mongodb-sample-cluster Start all member of config server replica set. mongod --config ./confs/config/r0.confmongod --config ./confs/config/r1.confmongod --config ./confs/config/r2.conf Connect to one of the config servers. mongo --port 57040 --ssl --host database.fluddi.com --sslPEMKeyFile /opt/mongodb/certificate.pem --sslCAFile /opt/mongodb/CA.pem Initiate the replica set. rs.initiate({_id: "cfg",configsvr: true,members: [{ _id : 0, host : "database.fluddi.com:57040" },{ _id : 1, host : "database.fluddi.com:57041" },{ _id : 2, host : "database.fluddi.com:57042" }]}) Create the shard replica sets Deploy shard 0 Create data directories for replica instances. mkdir -p /data/mongodb/shard0/rs0 /data/mongodb/shard0/rs1 /data/mongodb/shard0/rs2 Start each member of the shard replica set. mongod --config ./confs/shard0/r0.confmongod --config ./confs/shard0/r1.confmongod --config ./confs/shard0/r2.conf Connect to one member of the shard replica set. mongo --port 37017 --ssl --host database.fluddi.com --sslPEMKeyFile /opt/mongodb/certificate.pem --sslCAFile /opt/mongodb/CA.pem Initiate the replica set. rs.initiate({_id: "s0",members: [{ _id : 0, host : "database.fluddi.com:37017" },{ _id : 1, host : "database.fluddi.com:37018" },{ _id : 2, host : "database.fluddi.com:37019", arbiterOnly: true }]}) It will return something like: {"ok": 1} Deploy shard 1 Create data directories, replace with your actual db path. $DB_PATH mkdir -p $DB_PATH/shard1/rs0 $DB_PATH/shard1/rs1 $DB_PATH/shard1/rs2 Start each member of the shard replica set. mongod --config ./confs/shard1/r0.confmongod --config ./confs/shard1/r1.confmongod --config ./confs/shard1/r2.conf Connect to one member of the shard replica set. mongo --port 47017 --ssl --host database.fluddi.com --sslPEMKeyFile /opt/mongodb/certificate.pem --sslCAFile /opt/mongodb/CA.pem Initiate the replica set. rs.initiate({_id: "s1",members: [{ _id : 0, host : "database.fluddi.com:47017" },{ _id : 1, host : "database.fluddi.com:47018" },{ _id : 2, host : "database.fluddi.com:47019", arbiterOnly: true }]}) Deploy shard 2 Create data directories, replace with your actual db path. $DB_PATH mkdir -p $DB_PATH/shard2/rs0 $DB_PATH/shard2/rs1 $DB_PATH/shard2/rs2 Start each member of the shard replica set. mongod --config ./confs/shard2/r0.confmongod --config ./confs/shard2/r1.confmongod --config ./confs/shard2/r2.conf Connect to one member of the shard replica set. mongo --port 57017 --ssl --host database.fluddi.com --sslPEMKeyFile /opt/mongodb/certificate.pem --sslCAFile /opt/mongodb/CA.pem Initiate the replica set. rs.initiate({_id: "s2",members: [{ _id : 0, host : "database.fluddi.com:57017" },{ _id : 1, host : "database.fluddi.com:57018" },{ _id : 2, host : "database.fluddi.com:57019", arbiterOnly: true }]}) Connect a to the cluster mongos mongos --config ./confs/mongos/m1.conf View mongod & mongos processes. ps aux | grep mongo Now we are ready to add databases and shard collections. Add shards to the cluster Connect a mongo shell to the mongos. mongo --port 27018 --ssl --host database.fluddi.com --sslPEMKeyFile /opt/mongodb/certificate.pem --sslCAFile /opt/mongodb/CA.pem 1. Create a admin user db.getSiblingDB("admin").createUser({user: "admin",pwd: "grw@123",roles: [{ role: "userAdminAnyDatabase", db: "admin" },{ role : "clusterAdmin", db : "admin" }]}) Lets authenticate, db.getSiblingDB("admin").auth("admin", "grw@123") 2. Add shard members sh.addShard("s0/database.fluddi.com:37017")sh.addShard("s1/database.fluddi.com:47017")sh.addShard("s2/database.fluddi.com:57017") 3. Add x509 user for webapp and administration db.getSiblingDB("$external").runCommand({createUser: " ,CN=*.fluddi.com,OU=appadmin,O=Fluddi,L=Dhaka,ST=Dhaka,C=BD",roles: [{ role : "clusterAdmin", db : "admin" },{ role: "dbOwner", db: "fluddi" },],writeConcern: { w: "majority" , wtimeout: 5000 }}) emailAddress=support@fluddi.com db.getSiblingDB("$external").runCommand({createUser: " ,CN=*.fluddi.com,OU=webapp,O=Fluddi,L=Dhaka,ST=Dhaka,C=BD",roles: [{ role: "readWrite", db: "fluddi" },],writeConcern: { w: "majority" , wtimeout: 5000 }}) emailAddress=support@fluddi.com Enable sharding for a database Before shard a collection, you must enable sharding for the collection’s database. Enabling sharding for a database does not redistribute data but make it possible to shard the collections in that database. Once you enable sharding for a database, MongoDB assigns a primary shard for that database where MongoDB stores all data before sharding begins. Enable sharding on a database, in this demo I’m using the name fluddi sh.enableSharding("fluddi") Shard collection 1. Shard the collection You need to enable sharding on a per-collection basis. Determine what you will use for the shard key. Your selection of the shard key affects the efficiency of sharding. Now connect to mongos with a client certificate & authenticate. mongo --port 27018 --ssl --host database.fluddi.com --sslPEMKeyFile /opt/mongodb/admin-client.pem --sslCAFile /opt/mongodb/CA.pem Authenticate user, db.getSiblingDB("$external").auth( { mechanism: "MONGODB-X509", user: "emailAddress=support@fluddi.com,CN=*.fluddi.com,OU=appadmin,O=Fluddi,L=Dhaka,ST=Dhaka,C=BD" } ) Create a collection in database. fluddi use fluddidb.createCollection("visitors") Create an index of visitors collection. db.visitors.ensureIndex({"siteId": 1, "_id": 1}) Let’s shard the collection. I’m choosing a compound key. sh.shardCollection("fluddi.visitors", {"siteId": 1, "_id": 1}) Check the cluster status. sh.status() Sample output: 2. Modify chunk size Make chunk size smaller for demonstration purpose. Otherwise, you will need to generate a large volume of data. That is only for demonstration purpose, don’t do this in production. use configdb.settings.save( { _id:"chunksize", value: 8 } ) Now chunk size will be 8MB. Generate some dummy data (Optional) Go to code directory, you cloned. mongodb-sample-cluster Configure .env file, follow .env.example Install packages, use or npm i yarn Run , this will generate 50000 visitor records node index.js Now again connect to mongos with a client certificate & authenticate. mongo --port 27018 --ssl --host database.fluddi.com --sslPEMKeyFile /opt/mongodb/admin-client.pem --sslCAFile /opt/mongodb/CA.pem Authenticate user, db.getSiblingDB("$external").auth( { mechanism: "MONGODB-X509", user: "emailAddress=support@fluddi.com,CN=*.fluddi.com,OU=appadmin,O=Fluddi,L=Dhaka,ST=Dhaka,C=BD" } ) Check cluster status. sh.status() Sample output: Bind IP address If you deployed whole things on a remote machine and wanted to access the database from your computer or you want to connect a web app from another server, you need to bind IP address. Change to . Now mongos will listen on all the interfaces configured on your system. bindIp 0.0.0.0 Before you bind to other IP addresses, consider and other security measures listed in to prevent unauthorized access. enabling access control Security Checklist Terminology A group of mongod processes that maintain the same data set. Replica member accept writes. Pull and Replicates changes from Primary (via oplog). Replica set: Primary: Secondary: Thanks for reading. Hope you guys enjoyed this article and got the idea. On my I included a init script to automate whole sharding setup process, take a look at it. code repo , _mongodb-sample-cluster - MongoDB sharded cluster_github.com joynal/mongodb-sample-cluster Credits I borrowed some text from following links because it suits better than my version. http://searchcloudcomputing.techtarget.com/definition/sharding https://cloudmesh.github.io/introduction_to_cloud_computing/class/vc_sp15/mongodb_cluster.html