First Impression and A Little Story When Trying Aerospike in Kurio

Last week we developed a new feature in Kurio. This feature was a big one for me, because only two of us from the Backend Team, plus one from the Infra Team, were assigned to finish this project. So we needed to find a database that suited our cases, which are:

- Support for Golang (an official library, or at least easy to integrate), because our current project already runs on Golang.
- Fast reads and writes, whether it is an RDBMS or NoSQL. This means that while we write data, read performance is not affected.
- The database must be easy to scale.
- The data must be persistent and saved to disk.

Drawn as a picture, our system looks something like this:

[Image: Our System Schema]

After figuring out our cases, we listed a few database options.

Redis
We know Redis is very good, and fast because it keeps the data in memory. But we knew it was not suitable for our case: Redis only persists data to disk when something triggers it (for example, a snapshot). Our case needs the data stored and persisted on disk, plus fast reads.

MongoDB
MongoDB was our second option because our current system already uses MongoDB as its data store. But we need more performance.

MyRocks
Another option we started to consider was MyRocks. MyRocks was introduced by Facebook: it is MySQL with RocksDB as the storage engine. Because it is used by Facebook, we thought it was a better option. But later, after discussing it with our infrastructure team, we learned that MyRocks, like any other MySQL, cannot scale out. The only difference is the storage engine, so in terms of scaling out there is no big difference compared to a normal MySQL.

Aerospike
Later, I found many databases with great performance out there, such as Cassandra, Scylla, and others; I don't remember all of them. Then I found Aerospike. It looked like a rising-star database, and there were many benchmark articles about it on the Internet.
To be honest, Aerospike was something new to me, and also to the team in Kurio. But after reading the reviews and looking at Aerospike's features, which fit our cases quite well, we decided to try it. After discussing it with the team and the Infra Team, we chose Aerospike. Thanks to its scalability, the Infra Team does not need extra effort to maintain Aerospike when scaling out.

First Impression
These are a few features of Aerospike that amazed the team and also fit our case.

Redis-way
After learning the concepts and how data is saved in Aerospike, I learned that Aerospike has a concept similar to Redis: it uses the key:value model. In terms of retrieval performance, it is comparable to Redis. It is a key:value store, after all.

Secondary Index
Another thing I learned is that Aerospike supports secondary indexes. So even though Aerospike is a key:value store, we can also query using another index that we create.

Asynchronously Persisted
Unlike Redis, Aerospike persists the data to disk asynchronously. Where Redis persists data on a trigger or action, Aerospike can persist data to disk automatically, because Aerospike supports hybrid data storage that saves data to both memory and disk.

Data Model and Schema
In Aerospike, there are a few data-related terms we must know first:

- Namespace
- Set
- Record
- Bin

[Image: Aerospike Data Schema]

Namespaces
Namespaces are the top-level containers. A namespace contains one or more sets, records, bins, and indexes. Compared to an RDBMS, a namespace is similar to a database schema.

[Image: Namespace, from the Aerospike documentation]

Sets
A set is similar to a collection in MongoDB or a table in an RDBMS. It contains many records and bins.

[Image: Set in Aerospike]

Records
A record is similar to a row in an RDBMS. One record has one PK (key) and one or many bins, and one set/collection may contain many records.

[Image: Record in Aerospike]

Bins
[Image: Bin in Aerospike]

Bins in Aerospike are similar to columns in an RDBMS, and we can add an index to any bin, just as in any RDBMS. The difference is that bins are more flexible and dynamic. One record can have many bins, and a single bin can store any data type (Int, String, Bytes, etc.). So a bin is like a column, but more flexible.

[Image: Example of Bins]

This is already explained well in the official documentation here: https://www.aerospike.com/docs/architecture/data-model.html. So I will not say much more about these four here.

Querying and Indexing
After developing the feature (which uses Aerospike as the data store), we had to learn how to query Aerospike. Luckily, Aerospike already provides client libraries for many programming languages. You can see them on their official GitHub account here: http://github.com/aerospike. To help with debugging and inspecting data, they also created aql (Aerospike Query Language). It provides a SQL-like command line interface for database, UDF (User Defined Function), and index management.

With aql, we can query the Aerospike server like this:

    aql> SELECT * FROM test.user
    aql> SELECT * FROM test.user WHERE PK=2

You can read more about aql commands here: https://www.aerospike.com/docs/tools/aql

In our case, because our project uses Golang, we use the official client created by Aerospike: https://github.com/aerospike/aerospike-client-go

Indexing
As we know, Aerospike is a key:value data store, but it also supports secondary indexes. That means we can add an index on a value/bin, and then use that index to query by value. So it's not just key:value access (getting data by key): we can also get data by value, that is, by an indexed bin.

For example, let's say I have a user set with the bins user_id, name, and email. For this example, I will make user_id the PK. So in total, one record will have a minimum of 2 bins.

[Image: Example of User]

With this record, I can query or get the record directly by PK. Using aql, it's just like this command:

    aql> SELECT * FROM sample.user WHERE PK=12

In another case, let's say I want to query by email: I want to get the user whose email is ganteng@gmail.com. Using aql, it will look more like this:

    # Add an index on the email bin
    aql> CREATE INDEX email_user_idx ON sample.user (email) STRING
    # Query by email
    aql> SELECT * FROM sample.user WHERE email="ganteng@gmail.com"
    # Will display the result
    +----+--------------+-------------------+
    | PK | name         | email             |
    +----+--------------+-------------------+
    | 14 | Iman Ganteng | ganteng@gmail.com |
    +----+--------------+-------------------+

You can read more about querying and indexing in the official documentation.

Deploying To Production
Well, back to our story. If you want to know more about Aerospike, you can read the official documentation on their website. After finishing the feature and the environment, we tried to deploy it to production. We deployed it around midnight, between 11.00 PM and 11.59 PM, and left it running until the morning to gather data. But at 06.00 AM, our CPU usage spiked, and unfortunately we had to roll the service back to the stable version.

Detecting and Fixing Issues
After trying to release it to production, we hit a critical issue: when the request rate was high, our CPU usage was abnormally higher than with the old version. To be honest, this new version has more computation than the previous one, and we had not implemented an autoscaling mechanism yet. So we assumed the high CPU usage was caused by the functions we had added. But when we profiled our application, we found something unexpected: the profile showed that the Aerospike client library was slow and used quite a lot of CPU.

[Image: Profiling Golang using pprof, showing the CPU usage in the Aerospike client library]

Much of that usage came from syscalls.
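The CPU spike above was tracked down with pprof. Below is a minimal sketch of capturing a CPU profile from Go code. It shows the tooling in general and assumes nothing about the exact setup used in Kurio. The program wraps a workload with `runtime/pprof`'s `StartCPUProfile` and `StopCPUProfile` and writes the result to `cpu.prof`. The names `profileCPU` and `busyWork` are hypothetical; `busyWork` only stands in for the real request path that calls the Aerospike client.

For a long-running HTTP service, there is an alternative. Importing `net/http/pprof` for its side effects registers handlers under `/debug/pprof/`. Then `go tool pprof "http://localhost:6060/debug/pprof/profile?seconds=30"` fetches a 30-second CPU profile, assuming the service listens on that (hypothetical) address.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"runtime/pprof"
)

// sink keeps the result of busyWork alive so the loop is not optimized away.
var sink uint64

// busyWork is a hypothetical stand-in for the real request path
// (for example, a handler that reads records through the Aerospike client).
// It just burns CPU by hashing the loop counter (FNV-1a style).
func busyWork(n int) uint64 {
	var h uint64 = 14695981039346656037
	for i := 0; i < n; i++ {
		h ^= uint64(i)
		h *= 1099511628211
	}
	return h
}

// profileCPU records a CPU profile while work runs and returns the encoded
// profile (gzip-compressed protobuf, the format `go tool pprof` reads).
func profileCPU(work func()) ([]byte, error) {
	var buf bytes.Buffer
	if err := pprof.StartCPUProfile(&buf); err != nil {
		return nil, err // e.g. CPU profiling is already enabled in this process
	}
	work()
	pprof.StopCPUProfile() // flushes the complete profile into buf
	return buf.Bytes(), nil
}

func main() {
	data, err := profileCPU(func() { sink = busyWork(50_000_000) })
	if err != nil {
		fmt.Fprintln(os.Stderr, "profiling failed:", err)
		os.Exit(1)
	}
	if err := os.WriteFile("cpu.prof", data, 0o644); err != nil {
		fmt.Fprintln(os.Stderr, "write failed:", err)
		os.Exit(1)
	}
	fmt.Println("wrote cpu.prof; inspect it with: go tool pprof -top cpu.prof")
}
```

Running `go tool pprof -top cpu.prof` lists functions by CPU time. In a real service, CPU time spent inside the system calls for network reads and writes typically shows up under syscall-related frames. That is the kind of signal described above.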
But later, after reading the slide presentation by the CTO of Aerospike here: https://www.slideshare.net/brian-aerospike/go-meetup-nov142, and after looking at all the pprof images, we could see that this was caused by network I/O. So, to fix the issue, we implemented the autoscaling mechanism in our system.

Conclusion
Trying Aerospike was quite challenging, because it was new for us and only two people were doing this (three, with one extra member from the Infra Team). From my own perspective, Aerospike is worth trying for anyone looking for a data store for cases like ours: Redis-like, but persisted (hybrid: memory and disk), and with support for secondary indexes.

As for drawbacks, the one I found was in the Golang library itself, not in Aerospike. The library returns the data as map[string]interface{}. I wish someone out there would submit a PR to the repository so the client library could return only bytes when querying results, and we could handle the unmarshalling ourselves. LOL 😈

Well, maybe there are a few things that I missed, but I hope I wrote it well. By the way, I wrote this based on my own perspective, opinion, and experience from trying Aerospike directly. If you think this story is worth reading, share it with your circle so your friends can read it too. And if you have a question or a different view, or if I wrote something wrong, just leave a response below, or email me. Thank you