
How we migrated our tasker search engine from MongoDB to Elasticsearch

by Júlio Turolla, September 7th, 2016



Scaling a home-services hiring platform.

At Parafuzo, one of our biggest challenges is selecting the best professional for each cleaning job. This is key to the success of our service, and there are many intricacies to consider: customer and tasker preferences, geolocation, previous feedback, weekly or monthly fixed jobs, and tasker abilities with certain job types.

Every few months we sit back and consider upgrades to our tasker (professional) search engine. Back in 2015, when we had an SMS-based job offering system, we realized there was an optimal lead time between sending the offers and the job date. If the notice was too short, we wouldn’t be able to fill all jobs automatically, and our ops team had to assign taskers manually to a number of them. If the offer went out too far ahead of the job, about 6 to 10 days, the no-show rate would skyrocket, especially for jobs on Tuesday that were accepted the previous Wednesday. Our engine took all of this into account.

Our previous setup

We used to have a very simple and unoptimized engine that took a few tens of seconds to generate a list of the 20 best pros for a job, considering all the rules above. First, we would run multiple queries to determine who to exclude from the list. Then a geolocation query centered on the job position would fetch the available taskers within a 20km radius, sorting them by average score and filtering on bad feedback, service requirements (does the tasker know how to iron clothes?), tasker availability, etc.

We would then run a weighted ranking over the tasker’s average score and location, so that we prefer better-scored taskers and really nearby taskers. If the job was within walking distance of the tasker’s home, the tasker would get the maximum distance score, but the average score would still be the more important deciding factor.
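To make the weighting idea concrete, here is a minimal Python sketch of such a ranking. It is not our production code: the 0.7/0.3 weights, the 1km "walking distance" threshold, the 0 to 5 score scale and the names `Tasker`, `distance_score` and `rank` are all assumptions picked for illustration. Only the 20km radius and the list of 20 pros come from the description above.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0
MAX_RADIUS_KM = 20.0    # search radius described in the article
WALKING_KM = 1.0        # assumption: what counts as "walking distance"
SCORE_WEIGHT = 0.7      # assumption: average score weighs more than distance
DISTANCE_WEIGHT = 0.3   # assumption


@dataclass
class Tasker:
    name: str
    avg_score: float  # assumption: scores are on a 0..5 scale
    lat: float
    lon: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distance_score(km):
    """1.0 within walking distance, falling linearly to 0.0 at the search radius."""
    if km <= WALKING_KM:
        return 1.0
    if km >= MAX_RADIUS_KM:
        return 0.0
    return (MAX_RADIUS_KM - km) / (MAX_RADIUS_KM - WALKING_KM)


def rank(taskers, job_lat, job_lon, limit=20):
    """Return up to `limit` taskers, best weighted total first."""
    def total(t):
        km = haversine_km(t.lat, t.lon, job_lat, job_lon)
        return SCORE_WEIGHT * (t.avg_score / 5.0) + DISTANCE_WEIGHT * distance_score(km)
    return sorted(taskers, key=total, reverse=True)[:limit]
```

With these assumed weights, a top-scored tasker a few kilometers away still outranks an average-scored tasker living next door to the job, which matches the priority described above.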

We then generated offers for the 5 best-positioned taskers. These offers were sent as notifications (first by SMS, then later in 2015 through the Parafuzo App), and the best pros had up to 4 hours to accept or reject the offer (as long as the offer was still available). As the job date approached, we would send offers more often and to more taskers, always preferring the best pros.

Enter Elasticsearch

When we decided to upgrade the engine again, we had a new issue to solve. As our platform grows, we have to provide better taskers with more value (meaning more jobs), and the cumbersome and slow MongoDB-based engine would not be able to meet our multiple requirements for matching the best pros to the best jobs.

As we always try to build cutting-edge tech, even with our small 4-person engineering team, we evaluated the options we had for enhancing this engine, and we found Elasticsearch to be the best fit for the job.

Elasticsearch, with its powerful query language, allows us to filter on specific criteria that must be fulfilled, and to sort and rank on dimensional criteria like score, tasker preference and distance. It does almost the same thing our engine did previously (filtering and sorting), but in a very specialized and efficient way.

Microservices Architecture

As part of our infrastructure, every change to our models is asynchronously posted to an SNS topic that feeds an SQS queue, which is consumed by Chiron, our small but powerful app that syncs our data across systems.

Chiron is a small service that consumes an SQS message like this:

And posts this resource to wherever we want, in this case, Elasticsearch.
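As a rough illustration of that step, the Python sketch below turns one SQS message body into an indexing action. It is not Chiron's actual code. The payload fields `resource`, `id` and `attributes` and the function name `to_index_action` are hypothetical. What is standard is the SNS envelope: unless raw message delivery is enabled, SNS delivers to SQS a JSON document with `"Type": "Notification"` and the original payload as a string under `"Message"`.

```python
import json


def to_index_action(sqs_body: str) -> dict:
    """Translate one SQS message body into a description of an index operation."""
    envelope = json.loads(sqs_body)

    # SNS wraps the published payload as a JSON string under "Message",
    # unless raw message delivery is enabled on the subscription.
    if envelope.get("Type") == "Notification":
        payload = json.loads(envelope["Message"])
    else:
        payload = envelope

    # Hypothetical payload shape: {"resource": ..., "id": ..., "attributes": {...}}
    return {
        "index": payload["resource"],
        "id": payload["id"],
        "document": payload["attributes"],
    }
```

A consumer could pass the resulting `index`, `id` and `document` to whichever Elasticsearch client it uses; keeping the translation free of network calls makes it easy to test on its own.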

So, instead of looking up the best tasker for a job, we ended up looking up the best job for a tasker. This way, we would have more control of which taskers would get which jobs, and we would be able to offer better, closer jobs, or even 2 or more jobs in a row for the taskers, one near the other, optimizing our marketplace.

We guarantee that our Elasticsearch index stays hot by syncing available jobs to it within a few seconds, and we replaced our multiple MongoDB queries with a single, really optimized query that does all the matching work for us.

Elasticsearch has information about all available jobs, like this:

And matches the best jobs for each tasker every time it’s needed. :)
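For readers who want a feel for what a "best jobs for a tasker" query can look like, here is a hedged Python sketch that builds an Elasticsearch request body. The field names (`status`, `service_type`, `customer_id`, `location`), the tasker keys and the function `build_job_query` are assumptions for illustration, not our real mapping or query. The query clauses themselves (`bool` with `filter` and `must_not`, `term`, `terms`, `geo_distance`, and a `_geo_distance` sort) are standard Elasticsearch query DSL, and `location` is assumed to be mapped as a `geo_point`.

```python
def build_job_query(tasker: dict, radius_km: int = 20, size: int = 5) -> dict:
    """Build a search body that finds open jobs near a tasker, closest first."""
    location = {"lat": tasker["lat"], "lon": tasker["lon"]}
    return {
        "size": size,
        "query": {
            "bool": {
                # Hard requirements: every filter clause must match.
                "filter": [
                    {"term": {"status": "available"}},
                    {"terms": {"service_type": tasker["abilities"]}},
                    {"geo_distance": {"distance": f"{radius_km}km", "location": location}},
                ],
                # Exclusions, e.g. customers who gave this tasker bad feedback.
                "must_not": [
                    {"terms": {"customer_id": tasker["blocked_customer_ids"]}},
                ],
            }
        },
        # Simplification: rank by distance only, nearest job first.
        "sort": [
            {"_geo_distance": {"location": location, "order": "asc", "unit": "km"}},
        ],
    }


# Hypothetical tasker record used to build one query.
example_tasker = {
    "lat": -23.55,
    "lon": -46.63,
    "abilities": ["cleaning", "ironing"],
    "blocked_customer_ids": ["c42"],
}
example_body = build_job_query(example_tasker)
```

This sketch sorts by distance alone. Ranking that also weighs score and preference, as described earlier, would need a scoring query on top of the same filters.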

In the next few months I’ll try to tell the story of how I built a home services marketplace in Brazil with an amazing team. As of July/2017 I left Parafuzo, and I feel I should share the knowledge I gathered with the community. Keep in touch via Twitter @jturolla.