we rebuilt the recommendations feature in Yubl in less than 2 weeks using a combination of Lambda, BigQuery and GrapheneDB The Road So Far part 1 : overview part 2 : testing and continuous delivery strategies part 3 : ops part 4 : building a scalable push notifications system part 5 : building a better recommendation system When I joined Yubl in April 2016, it had launched just 2 months earlier, after a long and chaotic development cycle that lasted more than 2 years — all the while there was a fully armed sales team before there was even a product! Some seriously bad decisions happened at Yubl.. and judging by this kind of decision making is far more common than we realised. Silicon Valley That said, many good things also happened at Yubl, and I had the pleasure to work with some of the best people I have met in my career. This post is about one of the ailing features we were able to quickly turn around with the power of AWS Lambda and using the right tool for the job. One of the devs made a valiant attempt to improve the feature by returning only users who have shared connections with you — either you both follow X or you are both followed by X. However, the implementation was a series of expensive (and complicated) MongoDB queries per user request. Ultimately it was an approach that would not scale with throughput nor complexity as it’s using the wrong tool for the job. Lambda + GrapheneDB = Efficient Graph Queries I had previously worked with Neo4j at Gamesys and used it to . analyze and model the complex in-game economy of a MMORPG A graph database like Neo4j is the perfect place to store our social graph, and allows us to efficiently perform the kind of graph queries we need in order to find users you should follow, eg. 2nd/3rd degree connections. offers , with built-in monitoring, dashboards, automated backup and scaling up. It was the perfect choice to get us going and start delivering value to our users quickly. GrapheneDB hosted Neo4j database as a service At this point in time we were already streaming all state changes in the system into Kinesis. To export all of our social graph into GrapheneDB and to keep it in sync with MongoDB we: ran a one-off task to export all the relationship data into GrapheneDB subscribed a Lambda function to the Kinesis stream to process any subsequent relationship changes and update the social graph (in GrapheneDB) in real time Relationship a Lambda function would process relationship changes and update the social graph in real time We then exposed the data via API Gateway and Lambda so that the client app and other internal services can use it to easily find suggested users for a user to follow. Future Plans Given the limitation that Neo4j requires all of your graph to be stored on one machine (and it has pretty taxing too) it was not the long term solution for us. hardware requirement Based on my estimates, the biggest instance available on GrapheneDB would suffice until we have more than 10M users. It was calculated based on the average no. of connections per user in our platform and using Twitter’s user stats as a guideline for where we might be at 10M users. We can push that ceiling much further by moving to a batch model and preprocess recommendations for each user to reduce the no. of live queries against a large graph. The recommendations can be restricted to active (eg. users that have logged in in the last X days) users only, and only when: the recommendations are stale, ie. not acted upon by the user for more than X days so they might not be what the user wants; or when the user’s extended social graph has changed, ie. followers/followees have new connections From what I was able to gather, all the big social networks use a batch model for scalability and cost reasons. As for a long term solutions, we hadn’t settled on anything. I looked at Facebook’s briefly but it’s far more sophisticated than we were ready for. There are other “fantasy” ideas like the Mosaic system described in this . It would have been a fantastic challenge had we got that far. Giraph paper Finding Trending Users Because we were still a small social network — with just over 800K installs, it’s not sufficient to make recommendations based on a user’s social graph alone as most users have a pretty small social graph. To bridge the gap we decided to also include trending users on the platform in your recommendations. Thankfully, all of our events (eg. X followed Y, X liked Y’s post, etc.) are streamed into Google BigQuery. We chose BigQuery because AWS Athena hadn’t been announced yet and RedShift is not the right model for making ad-hoc, live queries that need to respond quickly. Also, I had many years of experience using BigQuery at Gamesys so it was a no-brainer at the time. ps. if you’re curious about the difference between Athena and BigQuery, Lynn Langit gave a comprehensive comparison at Serverless Austin this year. To find trending users, we worked with the product team to create a formula to calculate a user’s “trendiness” based on no. of new followers in the last 24 hours. The follower count is weighted exponentially by how recently the user was followed. For instance, a follower that followed you in the past hour gives you a score of 1, but a follower that followed you 3 hours ago would only earn you a score of 0.1. We created a cron job with CloudWatch Events and Lambda to perform the aforementioned query against BigQuery every 3 hours. To save on cost, our query would only process events that were inserted in the last 24 hours. The result are then saved into a DynamoDB table, which is overwritten at the end of each run. Once again, we exposed the data via API Gateway and Lambda. Migration to new APIs Now, we have 2 new APIs to provide live suggestions based on a user’s social graph, and to find users who are currently trending on our platform. However, the client apps would need to be updated to take advantage of these new APIs. Instead of waiting for the client teams to catch up, we updated the legacy API’s suggestion endpoint to use results from both so we can provide value to our users earlier. “The lead time to someone saying thank you is the only reputation metric that matters.” — Dan North This is how it looks when we put everything together: One of the most satisfying aspect of this work was how quickly we were able to turn this feature around and deploy the new system into production. Everything came together in less than 2 weeks, which is largely because we were able to focus on our business needs and let services such as Lambda, BigQuery and GrapheneDB deal with the undifferentiated efforts. Like what you’re reading but want more help? I’m happy to offer my services as an and help you with your serverless project — architecture reviews, code reviews, building proof-of-concepts, or offer advice on leading practices and tools. independent consultant I’m based in and currently the only UK-based . I have nearly of with running production workloads in AWS at scale. I operate predominantly in the UK but I’m open to travelling for engagements that are longer than a week. To see how we might be able to work together, tell me more about the problems you are trying to solve . London, UK AWS Serverless Hero 10 years experience here I can also run an to help you get with your serverless architecture. You can find out more about the two-day workshop , which takes you from the basics of AWS Lambda all the way through to common operational patterns for log aggregation, distribution tracing and security best practices. in-house workshops production-ready here If you prefer to study at your own pace, then you can also find all the same content of the workshop as a I have produced for Manning. We will cover topics including: video course authentication authorization with API Gateway Cognito & & testing running functions locally & CI/CD log aggregation monitoring best practices distributed tracing with X-Ray tracking correlation IDs performance cost optimization & error handling config management canary deployment VPC security leading practices for Lambda, Kinesis, and API Gateway You can also get the face price with the code . Hur­ry though, this dis­count is only avail­able while we’re in Manning’s Ear­ly Access Pro­gram (MEAP). 40% off ytcui

Amazon

Apache

Facebook

Google

Mosaic

Twitter

The problems with DynamoDB Auto Scaling and how it might be improved

Beware of dilution of DynamoDB throughput due to excessive scaling

Ask me for help about serverless

Read My Stories

Too Long; Didn't Read

Yubl’s road to Serverless — Part 5, Building better recommendations with Lambda, BigQuery and…

Yubl’s road to Serverless — Part 5, Building better recommendations with Lambda, BigQuery and…

About Author

Comments

TOPICS

THIS ARTICLE WAS FEATURED IN

Related Stories

6 Tips To Scale an AppSync Project To 200+ Resolvers That Will Blow Your Mind

101 Stories To Learn About Cloud Infrastructure

10 Things in Engineering We Don't Spend Enough Time On

10 Things I Did To Increase CloudTrail Logs Security

10 reasons to give cloud computing a go

10 Lessons from 10 Years of AWS (part 1)

6 Tips To Scale an AppSync Project To 200+ Resolvers That Will Blow Your Mind

101 Stories To Learn About Cloud Infrastructure

10 Things in Engineering We Don't Spend Enough Time On

10 Things I Did To Increase CloudTrail Logs Security

10 reasons to give cloud computing a go

10 Lessons from 10 Years of AWS (part 1)

Light-Mode

Classic

Newspaper

Minty

Dark-Mode

Neon Noir

Minty

HN StartUps