Near two months ago I started learning MongoDB seriously. At Growthfunnel.io we use MongoDB, and we need to scale our system for a large volume of data(approximately 6TB+) and high throughput. Sharding, database clustering it was all new to me, so I started learning. The purpose of this article is sharing and validating my knowledge with the community. I’m not an expert on any of this. I’m just sharing what have I have learned.
This tutorial explains step by step how to create a MongoDB sharded cluster. We will deploy this demo on a single machine.
Sharding is a process of splitting data across multiple machines that separate large database into smaller, faster, easily managed parts called data shards. The word shard means a small part of a whole cluster.
Our sharded cluster will run on a single machine, each component will start on separate process & port. This cluster partitioned into three shards, each shard contains two data members and one arbitrary member. Each shard replica has:
Append this line 127.0.0.1 database.fluddi.com
into /etc/hosts
file; database.fluddi.com
will be our database hostname.
echo '127.0.0.1 database.fluddi.com
' | sudo tee --append /etc/hosts
Create data directory and log directory and own these directories to your user for reading & writing. For my case my user and usergroup name is vagrant
.
mkdir -p /data/mongodb /var/log/mongodb/test-clustersudo chown -R vagrant:vagrant /data/mongodbsudo chown -R vagrant:vagrant /var/log/mongodb/test-cluster
[mongodb-sample-cluster](https://github.com/joynal/mongodb-sample-cluster)
repo
joynal/mongodb-sample-cluster_mongodb-sample-cluster - MongoDB sharded cluster_github.com
This repo contains configuration files for the cluster.
git clone https://github.com/joynal/mongodb-sample-cluster
confs
directory contains cluster components configurations; You can customize for your needs. Make sure that your data directory & log directory have read & write permission. By default data directory pointed on /data/mongodb/
& log directory pointed on /var/log/mongodb/test-cluster/
.
Let’s generate a self-signed certificate for the sharded cluster, this is only for this demonstration. For production use, your MongoDB deployment should use valid certificates generated and signed by a single certificate authority. You or your organization can generate and maintain an independent certificate authority, or use certificates generated by a third-party SSL vendor.
sudo mkdir -p /opt/mongodb
Now own this directory, use your user and usergroup name.
sudo chown -R vagrant:vagrant /opt/mongodb
OK, let’s create a certificate authority. Generate a private key for CA certificate and keep it very safe.
cd /opt/mongodbopenssl genrsa -out CA.key 4096
Now self-sign to this certificate.
openssl req -new -x509 -days 1825 -key CA.key -out CA.crt
This will prompt for certificate information.
Generate private key & CSR.
openssl genrsa -out certificate.key 4096openssl req -new -key
certificate.key -out
certificate.csr
This will prompt for information, make sure domain name support wildcard domain.
Now self sign it.
openssl x509 -req -days 1825 -in
Output will be something like this:
Signature oksubject=/C=BD/ST=Dhaka/L=Dhaka/O=Fluddi/OU=database/CN=*.fluddi.com/[email protected]Getting CA Private Key
Create .pem
file.
cat
Each client certificate must have a unique & different SAN from cluster member certificate. Otherwise, MongoDB will consider it as a cluster member. Each certificate belongs to a MongoDB x.509 user, more details.
OK, let’s generate two certificates by following the previous step, just make sure OU is different.
For Web app
openssl genrsa -out client.key 4096openssl req -new -key client.key -out client.csr
Everything will be same as member certificate only OU will be different.
Organizational Unit Name (eg, section) []:webapp
For Database admin
openssl genrsa -out admin-client.key 4096openssl req -new -key admin-client.key -out admin-client.csr
Everything will be same as member certificate only OU will be different.
Organizational Unit Name (eg, section) []:appadmin
Now change all files permission to read-only.
cd /opt/mongodb/chmod 400 *
Create data directories, replace $DB_PATH
with your actual DB path.
mkdir -p $DB_PATH/config/rs0 $DB_PATH/config/rs1 $DB_PATH/config/rs2
Change directory to mongodb-sample-cluster
code repo, you just have cloned.
cd ~/mongodb-sample-cluster
Start all member of config server replica set.
mongod --config ./confs/config/r0.confmongod --config ./confs/config/r1.confmongod --config ./confs/config/r2.conf
Connect to one of the config servers.
mongo --port 57040 --ssl --host database.fluddi.com --sslPEMKeyFile /opt/mongodb/certificate.pem --sslCAFile /opt/mongodb/CA.pem
Initiate the replica set.
rs.initiate({_id: "cfg",configsvr: true,members: [{ _id : 0, host : "database.fluddi.com:57040" },{ _id : 1, host : "database.fluddi.com:57041" },{ _id : 2, host : "database.fluddi.com:57042" }]})
Create data directories for replica instances.
mkdir -p /data/mongodb/shard0/rs0 /data/mongodb/shard0/rs1 /data/mongodb/shard0/rs2
Start each member of the shard replica set.
mongod --config ./confs/shard0/r0.confmongod --config ./confs/shard0/r1.confmongod --config ./confs/shard0/r2.conf
Connect to one member of the shard replica set.
mongo --port 37017 --ssl --host database.fluddi.com --sslPEMKeyFile /opt/mongodb/certificate.pem --sslCAFile /opt/mongodb/CA.pem
Initiate the replica set.
rs.initiate({_id: "s0",members: [{ _id : 0, host : "database.fluddi.com:37017" },{ _id : 1, host : "database.fluddi.com:37018" },{ _id : 2, host : "database.fluddi.com:37019", arbiterOnly: true }]})
It will return something like:
{"ok": 1}
Create data directories, replace $DB_PATH
with your actual db path.
mkdir -p $DB_PATH/shard1/rs0 $DB_PATH/shard1/rs1 $DB_PATH/shard1/rs2
Start each member of the shard replica set.
mongod --config ./confs/shard1/r0.confmongod --config ./confs/shard1/r1.confmongod --config ./confs/shard1/r2.conf
Connect to one member of the shard replica set.
mongo --port 47017 --ssl --host database.fluddi.com --sslPEMKeyFile /opt/mongodb/certificate.pem --sslCAFile /opt/mongodb/CA.pem
Initiate the replica set.
rs.initiate({_id: "s1",members: [{ _id : 0, host : "database.fluddi.com:47017" },{ _id : 1, host : "database.fluddi.com:47018" },{ _id : 2, host : "database.fluddi.com:47019", arbiterOnly: true }]})
Create data directories, replace $DB_PATH
with your actual db path.
mkdir -p $DB_PATH/shard2/rs0 $DB_PATH/shard2/rs1 $DB_PATH/shard2/rs2
Start each member of the shard replica set.
mongod --config ./confs/shard2/r0.confmongod --config ./confs/shard2/r1.confmongod --config ./confs/shard2/r2.conf
Connect to one member of the shard replica set.
mongo --port 57017 --ssl --host database.fluddi.com --sslPEMKeyFile /opt/mongodb/certificate.pem --sslCAFile /opt/mongodb/CA.pem
Initiate the replica set.
rs.initiate({_id: "s2",members: [{ _id : 0, host : "database.fluddi.com:57017" },{ _id : 1, host : "database.fluddi.com:57018" },{ _id : 2, host : "database.fluddi.com:57019", arbiterOnly: true }]})
mongos
to the clustermongos --config ./confs/mongos/m1.conf
View mongod & mongos processes.
ps aux | grep mongo
Now we are ready to add databases and shard collections.
Connect a mongo shell to the mongos.
mongo --port 27018 --ssl --host database.fluddi.com --sslPEMKeyFile /opt/mongodb/certificate.pem --sslCAFile /opt/mongodb/CA.pem
db.getSiblingDB("admin").createUser({user: "admin",pwd: "grw@123",roles: [{ role: "userAdminAnyDatabase", db: "admin" },{ role : "clusterAdmin", db : "admin" }]})
Lets authenticate,
db.getSiblingDB("admin").auth("admin", "grw@123")
sh.addShard("s0/database.fluddi.com:37017")sh.addShard("s1/database.fluddi.com:47017")sh.addShard("s2/database.fluddi.com:57017")
db.getSiblingDB("$external").runCommand({createUser: "[email protected],CN=*.fluddi.com,OU=appadmin,O=Fluddi,L=Dhaka,ST=Dhaka,C=BD",roles: [{ role : "clusterAdmin", db : "admin" },{ role: "dbOwner", db: "fluddi" },],writeConcern: { w: "majority" , wtimeout: 5000 }})
db.getSiblingDB("$external").runCommand({createUser: "[email protected],CN=*.fluddi.com,OU=webapp,O=Fluddi,L=Dhaka,ST=Dhaka,C=BD",roles: [{ role: "readWrite", db: "fluddi" },],writeConcern: { w: "majority" , wtimeout: 5000 }})
Before shard a collection, you must enable sharding for the collection’s database. Enabling sharding for a database does not redistribute data but make it possible to shard the collections in that database.
Once you enable sharding for a database, MongoDB assigns a primary shard for that database where MongoDB stores all data before sharding begins.
Enable sharding on a database, in this demo I’m using the namefluddi
sh.enableSharding("fluddi")
You need to enable sharding on a per-collection basis. Determine what you will use for the shard key. Your selection of the shard key affects the efficiency of sharding.
Now connect to mongos with a client certificate & authenticate.
mongo --port 27018 --ssl --host database.fluddi.com --sslPEMKeyFile /opt/mongodb/admin-client.pem --sslCAFile /opt/mongodb/CA.pem
Authenticate user,
db.getSiblingDB("$external").auth( { mechanism: "MONGODB-X509", user: "[email protected],CN=*.fluddi.com,OU=appadmin,O=Fluddi,L=Dhaka,ST=Dhaka,C=BD" } )
Create a collection in fluddi
database.
use fluddidb.createCollection("visitors")
Create an index of visitors collection.
db.visitors.ensureIndex({"siteId": 1, "_id": 1})
Let’s shard the collection. I’m choosing a compound key.
sh.shardCollection("fluddi.visitors", {"siteId": 1, "_id": 1})
Check the cluster status.
sh.status()
Sample output:
Make chunk size smaller for demonstration purpose. Otherwise, you will need to generate a large volume of data. That is only for demonstration purpose, don’t do this in production.
use configdb.settings.save( { _id:"chunksize", value: 8 } )
Now chunk size will be 8MB.
Go to mongodb-sample-cluster
code directory, you cloned.
npm i
or yarn
node index.js
, this will generate 50000 visitor recordsNow again connect to mongos with a client certificate & authenticate.
mongo --port 27018 --ssl --host database.fluddi.com --sslPEMKeyFile /opt/mongodb/admin-client.pem --sslCAFile /opt/mongodb/CA.pem
Authenticate user,
db.getSiblingDB("$external").auth( { mechanism: "MONGODB-X509", user: "[email protected],CN=*.fluddi.com,OU=appadmin,O=Fluddi,L=Dhaka,ST=Dhaka,C=BD" } )
Check cluster status.
sh.status()
Sample output:
If you deployed whole things on a remote machine and wanted to access the database from your computer or you want to connect a web app from another server, you need to bind IP address. Change bindIp
to 0.0.0.0
. Now mongos will listen on all the interfaces configured on your system.
Before you bind to other IP addresses, consider enabling access control and other security measures listed in Security Checklist to prevent unauthorized access.
Replica set: A group of mongod processes that maintain the same data set.Primary: Replica member accept writes.Secondary: Pull and Replicates changes from Primary (via oplog).
Thanks for reading. Hope you guys enjoyed this article and got the idea. On my code repo, I included a init script to automate whole sharding setup process, take a look at it.
joynal/mongodb-sample-cluster_mongodb-sample-cluster - MongoDB sharded cluster_github.com
I borrowed some text from following links because it suits better than my version.