Joynal Abedin

@joynaluu

Create a MongoDB sharded cluster with SSL enabled

Near two months ago I started learning MongoDB seriously. At Growthfunnel.io we use MongoDB, and we need to scale our system for a large volume of data(approximately 6TB+) and high throughput. Sharding, database clustering it was all new to me, so I started learning. The purpose of this article is sharing and validating my knowledge with the community. I’m not an expert on any of this. I’m just sharing what have I have learned.

This tutorial explains step by step how to create a MongoDB sharded cluster. We will deploy this demo on a single machine.

Prerequisites:

  • MongoDB — 3.6.2
  • OpenSSL
  • NodeJS
  • Bash
  • Basic knowledge of mongodb sharding

What is database sharding anyway?

Sharding is a process of splitting data across multiple machines that separate large database into smaller, faster, easily managed parts called data shards. The word shard means a small part of a whole cluster.

Our sharding architecture

Our sharded cluster will run on a single machine, each component will start on separate process & port. This cluster partitioned into three shards, each shard contains two data members and one arbitrary member. Each shard replica has:

  • 1 Primary member
  • 1 Secondary member
  • 1 Arbitrary member

Prepare the environment

1. Install MongoDB from official documentation.

2. Configure hostname

Append this line 127.0.0.1 database.fluddi.com into /etc/hosts file; database.fluddi.com will be our database hostname.

echo '127.0.0.1 database.fluddi.com' | sudo tee --append /etc/hosts

3. Make sure data directory & Log directory have read and write permissions

Create data directory and log directory and own these directories to your user for reading & writing. For my case my user and usergroup name is vagrant.

mkdir -p /data/mongodb /var/log/mongodb/test-cluster
sudo chown -R vagrant:vagrant /data/mongodb
sudo chown -R vagrant:vagrant /var/log/mongodb/test-cluster

4. Clone mongodb-sample-cluster repo

This repo contains configuration files for the cluster.

git clone https://github.com/joynal/mongodb-sample-cluster

confs directory contains cluster components configurations; You can customize for your needs. Make sure that your data directory & log directory have read & write permission. By default data directory pointed on /data/mongodb/ & log directory pointed on /var/log/mongodb/test-cluster/.

Generate self signed SSL certificate

1. Generate certificate authority

Let’s generate a self-signed certificate for the sharded cluster, this is only for this demonstration. For production use, your MongoDB deployment should use valid certificates generated and signed by a single certificate authority. You or your organization can generate and maintain an independent certificate authority, or use certificates generated by a third-party SSL vendor.

sudo mkdir -p /opt/mongodb

Now own this directory, use your user and usergroup name.

sudo chown -R vagrant:vagrant /opt/mongodb

OK, let’s create a certificate authority. Generate a private key for CA certificate and keep it very safe.

cd /opt/mongodb
openssl genrsa -out CA.key 4096

Now self-sign to this certificate.

openssl req -new -x509 -days 1825 -key CA.key -out CA.crt

This will prompt for certificate information.

2. Generate certificate for cluster members

Generate private key & CSR.

openssl genrsa -out certificate.key 4096
openssl req -new -key certificate.key -out certificate.csr

This will prompt for information, make sure domain name support wildcard domain.

Now self sign it.

openssl x509 -req -days 1825 -in certificate.csr -CA CA.crt -CAkey CA.key -set_serial 01 -out certificate.crt

Output will be something like this:

Signature ok
subject=/C=BD/ST=Dhaka/L=Dhaka/O=Fluddi/OU=database/CN=*.fluddi.com/emailAddress=support@fluddi.com
Getting CA Private Key

Create .pem file.

cat certificate.key certificate.crt > certificate.pem

3. Generate client certificates

Each client certificate must have a unique & different SAN from cluster member certificate. Otherwise, MongoDB will consider it as a cluster member. Each certificate belongs to a MongoDB x.509 user, more details.

OK, let’s generate two certificates by following the previous step, just make sure OU is different.

  • For Web app
openssl genrsa -out client.key 4096
openssl req -new -key client.key -out client.csr

Everything will be same as member certificate only OU will be different.

Organizational Unit Name (eg, section) []:webapp
  • For Database admin
openssl genrsa -out admin-client.key 4096
openssl req -new -key admin-client.key -out admin-client.csr

Everything will be same as member certificate only OU will be different.

Organizational Unit Name (eg, section) []:appadmin

Now change all files permission to read-only.

cd /opt/mongodb/
chmod 400 *

Create the config server replica set

Create data directories, replace $DB_PATH with your actual DB path.

mkdir -p $DB_PATH/config/rs0 $DB_PATH/config/rs1 $DB_PATH/config/rs2

Change directory to mongodb-sample-cluster code repo, you just have cloned.

cd ~/mongodb-sample-cluster

Start all member of config server replica set.

mongod --config ./confs/config/r0.conf
mongod --config ./confs/config/r1.conf
mongod --config ./confs/config/r2.conf

Connect to one of the config servers.

mongo --port 57040 --ssl --host database.fluddi.com --sslPEMKeyFile /opt/mongodb/certificate.pem --sslCAFile /opt/mongodb/CA.pem

Initiate the replica set.

rs.initiate({
_id: "cfg",
configsvr: true,
members: [
{ _id : 0, host : "database.fluddi.com:57040" },
{ _id : 1, host : "database.fluddi.com:57041" },
{ _id : 2, host : "database.fluddi.com:57042" }
]
})

Create the shard replica sets

Deploy shard 0

Create data directories for replica instances.

mkdir -p /data/mongodb/shard0/rs0 /data/mongodb/shard0/rs1 /data/mongodb/shard0/rs2

Start each member of the shard replica set.

mongod --config ./confs/shard0/r0.conf
mongod --config ./confs/shard0/r1.conf
mongod --config ./confs/shard0/r2.conf

Connect to one member of the shard replica set.

mongo --port 37017 --ssl --host database.fluddi.com --sslPEMKeyFile /opt/mongodb/certificate.pem --sslCAFile /opt/mongodb/CA.pem

Initiate the replica set.

rs.initiate({
_id: "s0",
members: [
{ _id : 0, host : "database.fluddi.com:37017" },
{ _id : 1, host : "database.fluddi.com:37018" },
{ _id : 2, host : "database.fluddi.com:37019", arbiterOnly: true }
]
})

It will return something like:

{"ok": 1}

Deploy shard 1

Create data directories, replace $DB_PATH with your actual db path.

mkdir -p $DB_PATH/shard1/rs0 $DB_PATH/shard1/rs1 $DB_PATH/shard1/rs2

Start each member of the shard replica set.

mongod --config ./confs/shard1/r0.conf
mongod --config ./confs/shard1/r1.conf
mongod --config ./confs/shard1/r2.conf

Connect to one member of the shard replica set.

mongo --port 47017 --ssl --host database.fluddi.com --sslPEMKeyFile /opt/mongodb/certificate.pem --sslCAFile /opt/mongodb/CA.pem

Initiate the replica set.

rs.initiate({
_id: "s1",
members: [
{ _id : 0, host : "database.fluddi.com:47017" },
{ _id : 1, host : "database.fluddi.com:47018" },
{ _id : 2, host : "database.fluddi.com:47019", arbiterOnly: true }
]
})

Deploy shard 2

Create data directories, replace $DB_PATH with your actual db path.

mkdir -p $DB_PATH/shard2/rs0 $DB_PATH/shard2/rs1 $DB_PATH/shard2/rs2

Start each member of the shard replica set.

mongod --config ./confs/shard2/r0.conf
mongod --config ./confs/shard2/r1.conf
mongod --config ./confs/shard2/r2.conf

Connect to one member of the shard replica set.

mongo --port 57017 --ssl --host database.fluddi.com --sslPEMKeyFile /opt/mongodb/certificate.pem --sslCAFile /opt/mongodb/CA.pem

Initiate the replica set.

rs.initiate({
_id: "s2",
members: [
{ _id : 0, host : "database.fluddi.com:57017" },
{ _id : 1, host : "database.fluddi.com:57018" },
{ _id : 2, host : "database.fluddi.com:57019", arbiterOnly: true }
]
})

Connect a mongos to the cluster

mongos --config ./confs/mongos/m1.conf

View mongod & mongos processes.

ps aux | grep mongo

Now we are ready to add databases and shard collections.

Add shards to the cluster

Connect a mongo shell to the mongos.

mongo --port 27018 --ssl --host database.fluddi.com --sslPEMKeyFile /opt/mongodb/certificate.pem --sslCAFile /opt/mongodb/CA.pem

1. Create a admin user

db.getSiblingDB("admin").createUser(
{
user: "admin",
pwd: "grw@123",
roles: [
{ role: "userAdminAnyDatabase", db: "admin" },
{ role : "clusterAdmin", db : "admin" }
]
}
)

Lets authenticate,

db.getSiblingDB("admin").auth("admin", "grw@123")

2. Add shard members

sh.addShard("s0/database.fluddi.com:37017")
sh.addShard("s1/database.fluddi.com:47017")
sh.addShard("s2/database.fluddi.com:57017")

3. Add x509 user for webapp and administration

db.getSiblingDB("$external").runCommand(
{
createUser: "emailAddress=support@fluddi.com,CN=*.fluddi.com,OU=appadmin,O=Fluddi,L=Dhaka,ST=Dhaka,C=BD",
roles: [
{ role : "clusterAdmin", db : "admin" },
{ role: "dbOwner", db: "fluddi" },
],
writeConcern: { w: "majority" , wtimeout: 5000 }
}
)
db.getSiblingDB("$external").runCommand(
{
createUser: "emailAddress=support@fluddi.com,CN=*.fluddi.com,OU=webapp,O=Fluddi,L=Dhaka,ST=Dhaka,C=BD",
roles: [
{ role: "readWrite", db: "fluddi" },
],
writeConcern: { w: "majority" , wtimeout: 5000 }
}
)

Enable sharding for a database

Before shard a collection, you must enable sharding for the collection’s database. Enabling sharding for a database does not redistribute data but make it possible to shard the collections in that database.

Once you enable sharding for a database, MongoDB assigns a primary shard for that database where MongoDB stores all data before sharding begins.

Enable sharding on a database, in this demo I’m using the namefluddi

sh.enableSharding("fluddi")

Shard collection

1. Shard the collection

You need to enable sharding on a per-collection basis. Determine what you will use for the shard key. Your selection of the shard key affects the efficiency of sharding.

Now connect to mongos with a client certificate & authenticate.

mongo --port 27018 --ssl --host database.fluddi.com --sslPEMKeyFile /opt/mongodb/admin-client.pem --sslCAFile /opt/mongodb/CA.pem

Authenticate user,

db.getSiblingDB("$external").auth(
{
mechanism: "MONGODB-X509",
user: "emailAddress=support@fluddi.com,CN=*.fluddi.com,OU=appadmin,O=Fluddi,L=Dhaka,ST=Dhaka,C=BD"
}
)

Create a collection in fluddi database.

use fluddi
db.createCollection("visitors")

Create an index of visitors collection.

db.visitors.ensureIndex({"siteId": 1, "_id": 1})

Let’s shard the collection. I’m choosing a compound key.

sh.shardCollection("fluddi.visitors", {"siteId": 1, "_id": 1})

Check the cluster status.

sh.status()

Sample output:

2. Modify chunk size

Make chunk size smaller for demonstration purpose. Otherwise, you will need to generate a large volume of data. That is only for demonstration purpose, don’t do this in production.

use config
db.settings.save( { _id:"chunksize", value: 8 } )

Now chunk size will be 8MB.

Generate some dummy data (Optional)

Go to mongodb-sample-cluster code directory, you cloned.

  • Configure .env file, follow .env.example
  • Install packages, use npm i or yarn
  • Run node index.js, this will generate 50000 visitor records

Now again connect to mongos with a client certificate & authenticate.

mongo --port 27018 --ssl --host database.fluddi.com --sslPEMKeyFile /opt/mongodb/admin-client.pem --sslCAFile /opt/mongodb/CA.pem

Authenticate user,

db.getSiblingDB("$external").auth(
{
mechanism: "MONGODB-X509",
user: "emailAddress=support@fluddi.com,CN=*.fluddi.com,OU=appadmin,O=Fluddi,L=Dhaka,ST=Dhaka,C=BD"
}
)

Check cluster status.

sh.status()

Sample output:

Bind IP address

If you deployed whole things on a remote machine and wanted to access the database from your computer or you want to connect a web app from another server, you need to bind IP address. Change bindIp to 0.0.0.0 . Now mongos will listen on all the interfaces configured on your system.

Before you bind to other IP addresses, consider enabling access control and other security measures listed in Security Checklist to prevent unauthorized access.

Terminology

Replica set: A group of mongod processes that maintain the same data set.
Primary: Replica member accept writes.
Secondary: Pull and Replicates changes from Primary (via oplog).

Thanks for reading. Hope you guys enjoyed this article and got the idea. On my code repo, I included a init script to automate whole sharding setup process, take a look at it.

Credits

I borrowed some text from following links because it suits better than my version.

  1. http://searchcloudcomputing.techtarget.com/definition/sharding
  2. https://cloudmesh.github.io/introduction_to_cloud_computing/class/vc_sp15/mongodb_cluster.html
Topics of interest

More Related Stories