What is Cassandra?

Apache Cassandra is a free and open-source, distributed, wide column store NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. (Source: Wikipedia)

Apache Cassandra is a high performance, extremely scalable, fault tolerant (i.e. no single point of failure), distributed post-relational database solution. Cassandra combines all the benefits of Google Bigtable and Amazon Dynamo to handle the types of database management needs that traditional RDBMS vendors cannot support. (Source: DataStax)

Why do we even need to self-manage and run Apache Cassandra if we have Amazon's managed DynamoDB?

Cassandra and DynamoDB both originate from the same paper: "Dynamo: Amazon's Highly Available Key-value Store". (By the way, it has been a very influential paper and set the foundations for several NoSQL databases.) Of course, this means that DynamoDB and Cassandra have a lot in common; they share the same DNA. However, both AWS DynamoDB and Apache Cassandra have evolved quite a lot since the paper was written back in 2007, and there are now some key differences to be aware of when choosing between the two. Both databases have their own advantages and disadvantages, so you can choose the one that best matches your requirements. Read about this in detail here.

We were initially using DynamoDB. Our primary reason to switch from DynamoDB to Cassandra was Total Cost of Ownership (TCO): we have been able to reduce the cost to almost half of what we were paying for DynamoDB. Other benefits: Cassandra is open source, it provides full active-active multi-region support, and it gave us significantly lower latency than DynamoDB. Read a detailed TCO comparison of DynamoDB and Cassandra here.

Bootstrap the cluster:

[Figure: Simple Multi-AZ Architecture for Cassandra]

We are currently using a 3-node cluster, and the host OS is Ubuntu running on AWS EC2.

Step 1: Launch 3 Ubuntu-based instances in 3 different AZs.
Step 2: Update, upgrade and restart the instances.
    $ sudo apt update
    $ sudo apt upgrade -y
You may need to reboot the instance.

Step 3: Add the Apache Cassandra repository to /etc/apt/sources.list.d/cassandra.sources.list.
    $ echo "deb http://www.apache.org/dist/cassandra/debian 311x main" | sudo tee -a /etc/apt/sources.list.d/cassandra.sources.list

Step 4: Add the Apache Cassandra repository keys:
    $ curl https://www.apache.org/dist/cassandra/KEYS | sudo apt-key add -

Step 5: Update the repositories:
    $ sudo apt update

Step 6: Install Cassandra:
    $ sudo apt install cassandra

Step 7: Stop the Cassandra service:
    $ sudo service cassandra stop

Steps 1–7 ensure that all the instances are up to date and have Cassandra installed. To create a cluster from these 3 nodes, or to add a new node to an existing cluster, follow steps 1–7 above and then the steps below:

Step 1: Go to the Cassandra conf directory.
    $ cd /etc/cassandra

Step 2: Take a backup of the main configuration file before you make any changes to it.
    $ sudo cp cassandra.yaml cassandra.yaml.bak

Step 3: Open cassandra.yaml in your favorite editor and set the parameters below:
    cluster_name: 'My Cluster'
    authenticator: PasswordAuthenticator (optional)
    seeds: "node_private_ip_address"
    listen_address: <node_private_ip_address>
    rpc_address: 0.0.0.0
    broadcast_rpc_address: <node_private_ip_address>
    endpoint_snitch: Ec2Snitch

Step 4: Save the cassandra.yaml file.

Step 5: Clear the default data from the Cassandra system tables so that the new values set in cassandra.yaml are picked up:
    $ sudo rm -rf /var/lib/cassandra/data/system/*

Step 6: Start the Cassandra service on that node.
    $ sudo service cassandra start

Step 7: Wait about 10 seconds and check the cluster status.
    $ sudo nodetool status
The output lists every node in the cluster along with its status. If a new node is still in the Joining state, you will see UJ at the beginning of that node's row.

Step 8: After all new nodes are running, run nodetool cleanup on each of the previously existing nodes to remove the keys that no longer belong to those nodes. Wait for cleanup to complete on one node before running nodetool cleanup on the next node. Cleanup can be safely postponed to low-usage hours.

Note: Do not use a new node as a seed node. Once a node is part of the cluster, it can be promoted to a seed node. A maximum of 3 seed nodes in a cluster should be fine. Do not make all nodes seed nodes. Read about initializing a multiple node cluster (single datacenter) here and about seed nodes here.

Some important points: For a Cassandra cluster running on AWS, we use Ec2Snitch for a single-region cluster and Ec2MultiRegionSnitch for a multi-region cluster, as the names suggest. Learn more about Cassandra snitch classes here.

This is a fairly simple cluster to get started with, and there is a lot of scope for improvement.

Thanks for reading. Happy Cloud Computing :)
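As a follow-up to Step 7: instead of waiting a fixed 10 seconds, you could poll nodetool status until every node reports UN (Up/Normal). The sketch below is only an illustration, not part of our actual setup. The function names count_un_nodes and wait_for_cluster are made up for this example, and it assumes that nodetool status prints the two-letter status code (UN, UJ, ...) as the first column of each node row.

```shell
#!/bin/sh
# Hypothetical helpers (names are assumptions, not part of Cassandra).

# Count rows whose first column is "UN" (Up/Normal) in `nodetool status`
# output read from stdin. Prints 0 when no such rows exist.
count_un_nodes() {
  awk '$1 == "UN" { n++ } END { print n + 0 }'
}

# Poll `nodetool status` until at least the expected number of nodes is UN.
# Arguments: expected node count, max attempts (default 30),
# seconds to sleep between attempts (default 10).
wait_for_cluster() {
  local expected attempts delay i up
  expected="$1"
  attempts="${2:-30}"
  delay="${3:-10}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    up="$(nodetool status 2>/dev/null | count_un_nodes)"
    if [ "$up" -ge "$expected" ]; then
      echo "cluster ready: $up node(s) Up/Normal"
      return 0
    fi
    echo "attempt $i/$attempts: $up of $expected node(s) Up/Normal"
    sleep "$delay"
    i=$((i + 1))
  done
  echo "timed out waiting for $expected node(s) to become Up/Normal" >&2
  return 1
}
```

For the 3-node cluster in this post, you would source the file and run wait_for_cluster 3 on any node after starting Cassandra. A node that is still in the UJ state is not counted, so the loop keeps waiting until it finishes joining.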