Abhishek Nandi

@Abhishek_Nandi

Making your Firebase data searchable with a little help from AWS

May 14th 2017

CloudSearch is a managed service provided by AWS. We used AWS Lambda and Step Functions to get the data out of Firebase and into CloudSearch.

Components used to make Firebase data searchable

Firebase does not provide a built-in way to search the data you store. You either run Elasticsearch with Firebase's open-source project Flashlight, or you rely on full-text comparison in your own code. In this post we look at how to simplify this with AWS CloudSearch.

For those not familiar with Firebase: it is a “realtime” NoSQL database, currently part of Google, and it is pretty good for prototyping apps. It has good SDKs for Android, iOS and the Web. The entire database is treated as one JSON document that can be easily manipulated from the web console. It provides automated backups on its highest tier and has built-in auth with security rules.

Amazon CloudSearch is a managed service provided by AWS that makes it simple and cost-effective to set up, manage, and scale a search solution for your applications. It supports 34 languages, along with features like autocomplete, highlighting, calculated fields and score-based sorting. For those dealing with geolocation data, CloudSearch supports that as well.

Listen to the full story at odiocast.com

We at @odiocast needed to make our content searchable within the app, and AWS CloudSearch turned out to be a perfect fit. I knew I would be charged for network egress from Google Cloud/Firebase (whatever you want to call it), but that charge is negligible compared to the cost of running an Elasticsearch instance, plus monitoring, auto scaling and upkeep. For a fraction of the cost and a few small scripts, I was able to make our content searchable.

Setting up CloudSearch

Head over to the AWS Console and look for CloudSearch. To get started, you need to create a CloudSearch domain. The setup wizard can analyze a data file to find the fields that need to be indexed, or you can configure them manually; I prefer to do it manually. After you complete the setup wizard, it takes some time for the index to be prepared.

Grab a coffee or come back to the console after about 5–15 minutes. You can use Amazon CloudSearch to index and search both structured data and plain text. Amazon CloudSearch features include:

  • Full text search with language-specific text processing
  • Boolean search
  • Prefix searches
  • Range searches
  • Term boosting
  • Faceting
  • Highlighting
  • Autocomplete suggestions

You can construct both simple and compound queries. I strongly suggest going through the CloudSearch documentation and resources.
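To make the compound-query idea concrete, here is a minimal sketch that builds a search request using CloudSearch's structured query parser, combining a boolean `and` with a nested `or`. The endpoint and the field names `title` and `genre` are hypothetical placeholders; the `/2013-01-01/search` path and the `q`, `q.parser` and `size` parameters come from the CloudSearch search API, but check the documentation against your own domain before relying on this.

```python
from typing import List
from urllib.parse import urlencode

# Hypothetical search endpoint; copy the real one from your domain's dashboard.
SEARCH_ENDPOINT = "https://search-example-domain.us-east-1.cloudsearch.amazonaws.com"


def quote_term(value: str) -> str:
    """Wrap a value in single quotes, escaping backslashes and quotes for the structured parser."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def build_compound_query(text: str, genres: List[str]) -> str:
    """Match `text` in the (hypothetical) title field AND any one of the given genres."""
    genre_clause = " ".join(f"genre:{quote_term(g)}" for g in genres)
    return f"(and title:{quote_term(text)} (or {genre_clause}))"


def build_search_url(query: str, size: int = 10) -> str:
    """URL-encode the query and tell CloudSearch to use the structured parser."""
    params = {"q": query, "q.parser": "structured", "size": size}
    return f"{SEARCH_ENDPOINT}/2013-01-01/search?{urlencode(params)}"


query = build_compound_query("firebase", ["tech", "startups"])
print(query)  # (and title:'firebase' (or genre:'tech' genre:'startups'))
print(build_search_url(query))
```

Building the query with helpers rather than string pasting keeps user input from breaking the quoting; `genres` is assumed to be non-empty here.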

Take it for a spin with ZERO code

With CloudSearch you can upload the documents to be searched from the dashboard and start searching them from the test search tab in the left navigation bar.
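The documents you upload are described as a batch: a JSON array of `add` and `delete` operations, where an `add` with an existing `id` replaces that document. Below is a minimal sketch that builds such a batch from Firebase-style records keyed by push ID. The record contents and the field names `title`, `author` and `created_at` are hypothetical, and the fields must match the index fields you configured on the domain.

```python
import json


def to_add_operation(doc_id: str, fields: dict) -> dict:
    """One 'add' operation; CloudSearch replaces any existing document with the same id."""
    return {"type": "add", "id": doc_id, "fields": fields}


def to_delete_operation(doc_id: str) -> dict:
    """One 'delete' operation, identified only by document id."""
    return {"type": "delete", "id": doc_id}


def build_batch(records: dict, deleted_ids=()) -> str:
    """Serialize records (keyed by Firebase push ID) into a CloudSearch JSON batch."""
    ops = [to_add_operation(doc_id, fields) for doc_id, fields in records.items()]
    ops += [to_delete_operation(doc_id) for doc_id in deleted_ids]
    return json.dumps(ops, indent=2)


# Hypothetical records, shaped like the children of a Firebase "stories" node.
stories = {
    "-KkA1b2c3": {"title": "Hello CloudSearch", "author": "abhishek", "created_at": 1494720000000},
}
print(build_batch(stories, deleted_ids=["-KjZ9y8x7"]))
```

The printed JSON can be saved to a file and uploaded from the dashboard. CloudSearch limits a batch to 5 MB, so a large initial sync would need to be split into several batches.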

Setting up the CloudSearch domain and configuring indexes was the easy part. Now let's look at how to get the data from Google's Firebase to AWS CloudSearch.

Sending data to CloudSearch

In our case the content is generated dynamically, so we need to keep updating the search domain with any new data. To keep things simple, I broke the task into the following steps:

  • Calculate the duration for which data needs to be fetched
  • Fetch user data and stories
  • Merge and format the data if required
  • Check if we need to add data to CloudSearch
  • Add data to CloudSearch
  • Cleanup

This can be done in several ways, but I decided to go with AWS Step Functions.

AWS Step Functions makes it easy to coordinate the components of distributed applications and microservices using visual workflows. Building applications from individual components that each perform a discrete function lets you scale and change applications quickly. Step Functions is a reliable way to coordinate components and step through the functions of your application.

If we treat each of the subtasks above as a state, the whole job becomes a finite state machine.
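To illustrate the state-machine view before any AWS service is involved, here is a small local sketch: each state is a function that transforms a payload plus a rule that picks the next state, and one state acts as a choice that skips the upload when nothing new was found. The state names, payload keys and in-memory "source" records are hypothetical stand-ins for the Firebase and CloudSearch calls.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A state is (work, choose_next): work transforms the payload, choose_next returns the
# name of the next state or None to stop. State names and payload keys are hypothetical.
State = Tuple[Callable[[dict], dict], Callable[[dict], Optional[str]]]


def goto(name: Optional[str]) -> Callable[[dict], Optional[str]]:
    return lambda _payload: name


def calculate_interval(p: dict) -> dict:
    return {**p, "since": p["now"] - p["interval_ms"]}


def fetch_data(p: dict) -> dict:
    # Stand-in for the Firebase fetch: keep only records updated inside the window.
    return {**p, "items": [r for r in p["source"] if r["updated"] >= p["since"]]}


def merge_and_format(p: dict) -> dict:
    docs = [{"type": "add", "id": r["id"], "fields": {"title": r["title"]}} for r in p["items"]]
    return {**p, "docs": docs}


def add_to_cloudsearch(p: dict) -> dict:
    # Stand-in for the upload: only records how many documents would be sent.
    return {**p, "uploaded": len(p["docs"])}


def cleanup(p: dict) -> dict:
    return {k: v for k, v in p.items() if k not in ("source", "items")}


STATES: Dict[str, State] = {
    "CalculateInterval": (calculate_interval, goto("FetchData")),
    "FetchData": (fetch_data, goto("MergeAndFormat")),
    "MergeAndFormat": (merge_and_format, goto("CheckIfNewDataFound")),
    # Choice state: no work of its own, it only decides where to go next.
    "CheckIfNewDataFound": (lambda p: p, lambda p: "AddToCloudSearch" if p["docs"] else "Cleanup"),
    "AddToCloudSearch": (add_to_cloudsearch, goto("Cleanup")),
    "Cleanup": (cleanup, goto(None)),
}


def run(states: Dict[str, State], start: str, payload: dict) -> Tuple[dict, List[str]]:
    """Feed each state's output into the next one and record the path taken."""
    path: List[str] = []
    current: Optional[str] = start
    while current is not None:
        work, choose_next = states[current]
        path.append(current)
        payload = work(payload)
        current = choose_next(payload)
    return payload, path


source = [{"id": "a", "title": "Old story", "updated": 100},
          {"id": "b", "title": "New story", "updated": 950}]
result, path = run(STATES, "CalculateInterval", {"now": 1000, "interval_ms": 100, "source": source})
print(path)
print(result["uploaded"])  # 1
```

Step Functions plays the role of `run` here: it executes each state, passes the output along, and evaluates the choice.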

I created five Lambda functions and deployed them using Apex, then wired them into a state machine using the Amazon States Language. In my case I need to fetch both user data and stories data, so I fetch them in parallel and then pass the results to the formatting function. AWS provides a graphical view of the state machine during and after an execution, highlighting every step.
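A Parallel state runs its branches concurrently and moves on only when all of them finish, handing the next state an array of branch results in branch order. A rough local analogue, assuming two hypothetical fetch functions that stand in for the Lambdas, is a thread pool whose `map` preserves submission order:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def fetch_users(window: dict) -> dict:
    # Hypothetical stand-in for the Lambda that reads users from Firebase.
    return {"users": {"u1": {"name": "Ada"}}}


def fetch_stories(window: dict) -> dict:
    # Hypothetical stand-in for the Lambda that reads stories from Firebase.
    return {"stories": {"s1": {"title": "Hello", "author": "u1"}}}


def run_parallel(branches: List[Callable[[dict], dict]], state_input: dict) -> List[dict]:
    """Run every branch on the same input; results come back in branch order."""
    with ThreadPoolExecutor(max_workers=len(branches)) as pool:
        return list(pool.map(lambda branch: branch(state_input), branches))


outputs = run_parallel([fetch_users, fetch_stories], {"since": 0, "until": 1000})
print(outputs)
```

Each branch receives the same input, which is why the interval only needs to be calculated once before the fork.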

The execution view is fairly self-explanatory: green boxes are steps that completed successfully, and the blue one is currently executing. “Check if new data found” is a choice state that decides whether to proceed.

In reality the Lambdas execute a lot faster; it is the UI that takes a second or two to update. The state machine spec is a plain JSON file and is quite readable.

Our State Machine spec which we currently use
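For a rough idea of what such a spec can look like, here is a hypothetical Amazon States Language definition for the flow described above. It is built as a Python dict and serialized with `json.dumps` so the output can be pasted into the console. The region, account ID and function names are placeholders, and the split into six Task states is an assumption that need not match the five Lambdas used in the post.

```python
import json

# Hypothetical region, account and function names; use the ARNs your deployment produced.
ARN = "arn:aws:lambda:us-east-1:123456789012:function:{}"


def task(name: str, **transition) -> dict:
    """A Task state invoking one Lambda, followed by either Next=... or End=True."""
    return {"Type": "Task", "Resource": ARN.format(name), **transition}


definition = {
    "Comment": "Sync new Firebase data into CloudSearch (illustrative sketch)",
    "StartAt": "CalculateInterval",
    "States": {
        "CalculateInterval": task("calculateInterval", Next="FetchData"),
        "FetchData": {
            "Type": "Parallel",
            "Branches": [
                {"StartAt": "FetchUsers", "States": {"FetchUsers": task("fetchUsers", End=True)}},
                {"StartAt": "FetchStories", "States": {"FetchStories": task("fetchStories", End=True)}},
            ],
            "Next": "MergeAndFormat",
        },
        "MergeAndFormat": task("mergeAndFormat", Next="CheckIfNewDataFound"),
        "CheckIfNewDataFound": {
            "Type": "Choice",
            "Choices": [{"Variable": "$.hasNewData", "BooleanEquals": True, "Next": "AddToCloudSearch"}],
            "Default": "NoNewData",
        },
        "AddToCloudSearch": task("addToCloudSearch", Next="Cleanup"),
        "Cleanup": task("cleanup", End=True),
        "NoNewData": {"Type": "Succeed"},
    },
}

print(json.dumps(definition, indent=2))
```

The Choice state assumes the merge step returns a boolean `hasNewData` field; that field name is also hypothetical.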

I added calculateInterval to keep some flexibility for syncing older data. To run the whole thing at a fixed interval, I use Amazon CloudWatch: I configure a scheduled rule that targets the state machine we just created and passes the required interval as input.
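Here is a sketch of what a calculateInterval-style step could do with that input, assuming records carry an epoch-millisecond timestamp (as Firebase server timestamps do). The input shape `{"intervalMinutes": 15, "overlapMinutes": 5}`, set as constant JSON input on the rule's target, is an assumption, as is the overlap, which is just one hypothetical way to re-sync a bit of older data. The rule itself could use a schedule expression such as `rate(15 minutes)`.

```python
import time
from typing import Optional


def handler(event: dict, context=None, now_ms: Optional[int] = None) -> dict:
    """Return the [since, until) window, in epoch milliseconds, that the fetch steps should read.

    now_ms is only there to make the function testable; Lambda calls handler(event, context).
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    interval_ms = int(event.get("intervalMinutes", 15)) * 60_000
    # Hypothetical overlap: re-read a little older data in case a previous run failed.
    overlap_ms = int(event.get("overlapMinutes", 0)) * 60_000
    return {"since": now_ms - interval_ms - overlap_ms, "until": now_ms}


print(handler({"intervalMinutes": 15, "overlapMinutes": 5}, now_ms=1_494_720_000_000))
```

Re-sending a document that was already indexed is harmless, because an `add` with an existing id simply replaces it.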

Amazon CloudWatch console

Finally, with some client-side code, we added search and autocomplete to our Android and iOS apps.
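The app code itself is Android and iOS, but the autocomplete request has the same shape from any client. Here is a minimal Python sketch of it, assuming a suggester has already been configured on the domain. The endpoint and the suggester name `title_suggester` are hypothetical; the `/2013-01-01/suggest` path and the `q`, `suggester` and `size` parameters come from the CloudSearch suggest API. How requests are authorized (domain access policy or signed requests) is outside this sketch.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; copy the real one from your domain's dashboard.
SEARCH_ENDPOINT = "https://search-example-domain.us-east-1.cloudsearch.amazonaws.com"


def build_suggest_url(prefix: str, suggester: str = "title_suggester", size: int = 5) -> str:
    """Autocomplete request: ask the named suggester for up to `size` matches for `prefix`."""
    params = {"q": prefix, "suggester": suggester, "size": size}
    return f"{SEARCH_ENDPOINT}/2013-01-01/suggest?{urlencode(params)}"


print(build_suggest_url("fire"))
```

In an app you would typically fire this request as the user types, after a short debounce, rather than on every keystroke.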

If you have any questions about AWS CloudSearch, Firebase, or Step Functions/AWS Lambda, you can reach me at abhishek@odiocast.com.

TL;DR

In case you are wondering how Step Functions works when Lambdas are stateless: behind the scenes, AWS keeps the state of each execution and passes the output of each Lambda as input to the next state, according to the state machine definition. If you run a parallel task like I do here, the result is a JSON array with one element per branch, in the order the branches are defined. Each element holds that branch's output, which can also carry the original input depending on how the state's ResultPath is configured. So if you run 10 Lambdas in parallel, you end up with a JSON array of size 10.
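Since the step after a parallel task receives that array, the merge step has to unpack it by position. A minimal sketch, assuming the users branch is defined first and that each story has a hypothetical `author` field holding a user ID:

```python
from typing import List


def merge_and_format(parallel_output: List[dict]) -> dict:
    """Join stories with their authors and emit CloudSearch 'add' operations."""
    users_branch, stories_branch = parallel_output  # order follows the branch definitions
    users = users_branch.get("users", {})
    docs = []
    for story_id, story in stories_branch.get("stories", {}).items():
        author = users.get(story.get("author"), {})
        docs.append({
            "type": "add",
            "id": story_id,
            "fields": {"title": story.get("title", ""), "author_name": author.get("name", "")},
        })
    # A flag a following choice state could inspect before uploading.
    return {"hasNewData": bool(docs), "docs": docs}


example = [
    {"users": {"u1": {"name": "Ada"}}},
    {"stories": {"s1": {"title": "Hello", "author": "u1"}}},
]
print(merge_and_format(example))
```

Unpacking into exactly two names fails loudly if a branch is added or removed, which is arguably better than silently merging the wrong data.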

Pricing

Do note that, at the time of writing, Step Functions gives you the first 4,000 state transitions free each month and charges a nominal $0.025 per 1,000 state transitions after that ($0.000025 per state transition). On top of this, you pay for whatever your Lambda functions cost. CloudSearch is priced per instance-hour, starting as low as $0.059 per hour for a search.m1.small instance. Finally, there is the cost of network egress from Firebase, which is $1 per GB.
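To see how these prices add up, here is a back-of-the-envelope estimate using the figures above. The workload (a run every 15 minutes, about 8 state transitions per run, 2 GB of egress) and the 30-day month are hypothetical assumptions, and Lambda charges are left out.

```python
HOURS_PER_MONTH = 24 * 30              # assumed 30-day month
FREE_TRANSITIONS = 4_000               # free state transitions per month
PRICE_PER_TRANSITION = 0.025 / 1_000   # $0.000025 per transition
CLOUDSEARCH_SMALL_HOURLY = 0.059       # search.m1.small, $ per hour
EGRESS_PER_GB = 1.0                    # Firebase egress, $ per GB


def monthly_estimate(runs_per_day: int, transitions_per_run: int, egress_gb: float) -> dict:
    """Rough monthly cost in dollars, excluding Lambda charges."""
    transitions = runs_per_day * 30 * transitions_per_run
    step_functions = max(0, transitions - FREE_TRANSITIONS) * PRICE_PER_TRANSITION
    cloudsearch = CLOUDSEARCH_SMALL_HOURLY * HOURS_PER_MONTH
    egress = egress_gb * EGRESS_PER_GB
    return {
        "transitions": transitions,
        "step_functions": round(step_functions, 2),
        "cloudsearch": round(cloudsearch, 2),
        "egress": round(egress, 2),
        "total_excl_lambda": round(step_functions + cloudsearch + egress, 2),
    }


# Hypothetical workload: every 15 minutes (96 runs/day), ~8 transitions per run, 2 GB egress.
print(monthly_estimate(runs_per_day=96, transitions_per_run=8, egress_gb=2))
```

Under these assumptions the always-on CloudSearch instance (about $42 a month) dominates, while Step Functions costs well under a dollar.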

Before building this solution, I realized I could have gone the other way and written Firebase Cloud Functions to send data from Firebase to AWS CloudSearch.

There are many ways to provide search functionality in your apps. You might even be tempted to use something like Algolia, Apache Solr, Elasticsearch, or even the oxygen library. The decision between building your own and using a managed service should be weighed carefully. If search is just one feature among many in your app, you should not spend days building, maintaining and monitoring it. Only if search is fundamental to your service, and either none of the existing tools supports your needs or the estimated cost of managed services is too high, should you build your own.

Hacker Noon is how hackers start their afternoons. We’re a part of the @AMI family. We are now accepting submissions and happy to discuss advertising & sponsorship opportunities.