is an API that enables developers to build news feeds and activity streams ( ). We are used by over 500 companies and power the feeds of more than 300 million end users. Companies such as Product Hunt, , Powerschool, , , Compass and Fabric (Google) rely on Stream to power their news feeds. In addition to the API, the founders of Stream also wrote the most widely used . Stream try the API Under Armour Bandsintown Dubsmash open source solution for building scalable feeds Here’s what Stream looks like today: Number of servers: 180 Feed updates per month: 34 billion Average API response time: 12ms Average real-time response time: 2ms Regions: 4 (US-East, EU-West, Tokyo and Singapore) 1B API requests per month (~20k/minute) Given that most of our customers are engineers, we often talk about our stack. Here’s a high level overview: is our primary programming language. Go We use a custom solution built on top of + for our primary database (we started out on but wanted more control over performance). stores configs, API keys, etc. RocksDB Raft Cassandra PostgreSQL and handle tracing, and provide detailed monitoring and we use the stack for centralized logging. OpenTracing Jaeger StatsD Grafana ELK is our language of choice for machine learning, devops and our website ( , & ). Python https://getstream.io Django DRF React Stream uses a combination of and . This results in fast read performance when a users open their feed, as well as fast propagation times when a famous user posts an update. fanout-on-write fanout-on-read and I ( ) are the developers who started Stream nearly 3 years ago. We founded the company in Amsterdam, participated in Techstars NYC 2015 and opened up our Boulder, Colorado office in 2016. It’s been quite a crazy ride in a fairly short amount of time! With over 15 developers and a host of supporting roles, including sales and marketing, the team feels enormous compared to the early days. Tommaso Barguli @tschellenbach The Challenge: News Feeds & Activity Streams The nature of follow relationships makes it hard to scale feeds. Most of you will remember Facebook’s long load times, Twitter’s fail whale or Tumblr’s year of technical debt. Feeds are hard to scale since there is no clear way to shard the data. Follow relationships connect everyone to everyone else. This makes it difficult to split data across multiple machines. If you want to learn more about this problem, check out these papers: Twitter’s tech back in the days with 150 million active users LinkedIn’s ranked feeds Yahoo/Princeton research paper At a very high level there are 3 different ways to scale your feeds: Fanout-on-write: basically precompute everything. It’s expensive, but its easy to shard the data. Fanout-on-read: hard to scale, but more affordable and new activities show up faster. Combination of the above two approaches: better performance and reduced latency, but increased code complexity. Stream uses a combination of fanout-on-write and fanout-on-read. This allows us to effectively support both customers with highly connected graphs, as well as customers with a more sparse dataset. This is important since the ways in which our customers use Stream are very different. Have a look at these screenshots from , Unsplash, and : Bandsintown Product Hunt Switching from to Python Go After years of optimizing our existing feed technology we decided to make a larger leap with 2.0 of Stream. While the first iteration of Stream was powered by Python and Cassandra, for Stream 2.0 of our infrastructure we . The main reason why we switched from Python to Go is performance. Certain features of Stream such as aggregation, ranking and serialization were very difficult to speed up using Python. switched to Go We’ve been using Go since March 2017 and it’s been a great experience so far. Go has greatly increased the productivity of our development team. Not only has it improved the speed at which we develop, it’s also 30x faster for many components of Stream. The performance of Go greatly influenced our architecture in a positive way. With Python we often found ourselves delegating logic to the database layer purely for performance reasons. The high performance of Go gave us more flexibility in terms of architecture. This led to a huge simplification of our infrastructure and a dramatic improvement of latency. For instance, we saw a 10 to 1 reduction in web-server count thanks to the lower memory and CPU usage for the same number of requests. Initially we struggled a bit with package management for Go. However, using together with the contributed to creating a great workflow. Dep VG package If you’ve never tried Go, you’ll want to try this online tour: https://tour.golang.org/welcome/1 Go as a language is heavily focused on performance. The built-in tool is amazing for finding performance issues. Uber’s library is great for visualizing data from PPROF and will be bundled in PPROF in Go 1.10. PPROF Go-Torch Switching from Cassandra to & Raft RocksDB 1.0 of Stream leveraged for storing the feed. Cassandra is a common choice for building feeds. Instagram, for instance started, out with but eventually to handle their rapid usage growth. Cassandra can handle write heavy workloads very efficiently. Cassandra Redis switched to Cassandra Cassandra is a great tool that allows you to scale write capacity simply by adding more nodes, though it is also very complex. This complexity made it hard to diagnose performance fluctuations. Even though we had years of experience with running Cassandra, it still felt like a bit of a black box. When building Stream 2.0 we decided to go for a different approach and build . Keevo is our in-house key-value store built upon RocksDB, gRPC and Raft. Keevo is a highly performant embeddable database library developed and maintained by Facebook’s data engineering team. RocksDB started as a fork of Google’s LevelDB that introduced several performance improvements for SSD. Nowadays RocksDB is a project on its own and is under active development. It is written in C++ and it’s fast. Have a look at how this . In terms of technology it’s much more simple than Cassandra. This translates into reduced maintenance overhead, improved performance and, most importantly, more consistent performance. It’s interesting to note that also uses RocksDB for . RocksDB benchmark handles 7 million QPS LinkedIn their feed Our infrastructure is hosted on and is designed to survive entire availability zone outages. Unlike Cassandra, Keevo clusters organizes nodes into leaders and followers . When a leader (master) node becomes unavailable the other nodes in the same deployment will start an election and pick a new leader. Electing a new leader is a fast operation and barely impacts live traffic. AWS To do this, Keevo implements the Raft consensus algorithm using . This ensures that every bit stored in Keevo is stored on 3 different servers and operations are always consistent. This site does a great job of visualizing how Raft works: Hashicorp’s Go implementation https://raft.github.io/ Not Quite Microservices By leveraging Go and RocksDB we’re able to achieve great feed performance. The average response time is around 12ms. The architecture lies somewhere between a monolith and a microservice. Stream runs on the following 7 services: Stream API Keevo Real time & Firehose Analytics Personalization & Machine learning Site & Dashboard Async Workers To see all our services divided into stacks, head over . here Personalization & Machine Learning Almost all large apps with feeds use machine learning and personalization. For instance, LinkedIn prioritizes the items in your feed. Instagram’s explore feed displays pictures outside of the people you follow that you might be interested in. Etsy uses a similar approach to optimize ecommerce conversion. Stream supports the following 5 use cases for personalization: Documentation for building personalized feeds. All of these personalization use cases rely on combining feeds with analytics and machine learning. For the machine learning side we generate the models using . The models are different for each of our enterprise customers. Typically we’ll use one of these amazing libraries: Python — LightFM How to implement recommender systems XGBoost Scikit learn , and Dask Numpy Pandas (For development) Jupyter Notebook Mesa Framework Analytics Analytics data is collected using a tiny Go-based server. In the background, it will spawn go-routines to rollup the data as needed. The resulting metrics are stored in . In the past, we looked at , which seems like a solid project. For now, we could get away with a simpler solution though. Elastic Druid Dashboard & Site The dashboard is powered by and . We also use React and Redux for all of our example applications: React Redux Cabin: A tutorial series on building a social network with React & Redux (Winds 2.0 is coming out very soon) Winds: Open source RSS & Podcast app The site, as well as the API for the site, is powered by , and . Stream is sponsoring Django Rest Framework since it’s a pretty great open source project. If you need to build an API quickly there is no better tool than DRF and Python. Python Django Django Rest Framework We use to resize the images on our site. For us, Imgix is cost efficient, fast and overall a great service. is a good open source alternative. Imgix Thumbor Real time Our real time infrastructure is based on Go, Redis and the excellent library. It implements the Bayeux protocol. In terms of architecture it’s very similar to the node based library. gorilla websocket Faye It was interesting to read the “ ” post on Hacker News. The author moves from Go to Node to improve performance. We actually did the exact opposite and moved from Node to Go for our real time system. The new Go-based infrastructure handles 8x the traffic per node. Ditching Go for Node.js Devops, Testing & Multiple Regions In terms of devops the provisioning and configuration of instances is fully automated using a combination of: CloudFormation Cloud-Init Puppet Boto & Fabric Because our infrastructure is defined in code it has become trivial to launch new regions. We heavily use . Every single piece of our stack is defined in a CloudFormation template. If needed we are able to spawn a new dedicated shard in a few minutes. In addition, AWS Parameter Store is used to hold application settings. Our largest deployment is in US-East, but we also have regions in Tokyo, Singapore and Dublin. CloudFormation A combination of Puppet and Cloud-init is used to configure our instances. We run our self-contained Go’s binaries directly on the EC2 instance without any additional containerization layer. Releasing new versions of our services is done by . Travis first runs our test suite. Once it passes, it publishes a new release binary to . Common tasks such as installing dependencies for the Go project, or building a binary are automated using plain old Makefiles. (We know, crazy old school, right?) Our binaries are compressed using UPX. Travis GitHub Tool highlight: Travis has come a long way over the past years. I used to prefer in some cases since it was easier to debug broken builds. With the addition of the aptly named “debug build” button, Travis is now the clear winner. It’s easy to use and free for open source, with no need to maintain anything. Travis Jenkins Next we use to do a rolling deploy to our AWS instances. If anything goes wrong during the deploy it will halt the deploy. We take stability very seriously: Fabric A high level of test coverage is required. Releases are created by Travis (making it hard to deploy without running tests). Code is reviewed by at least 2 team members. Our extensive QA integration test suite evaluates if all 7 components still work. We’ve written about our experience with . When things do break we do our best to be transparent about the ongoing issue: testing our Go codebase StatusPage for 24/7 phone support Twilio for notifying our team VictorOps #firefighting channel on Slack for customer support Intercom VictorOps is a recent addition to our support stack. It’s made it very easy to collaborate on ongoing issues. Tool Highlight: VictorOps The best part about is how they use a timeline to collaborate amongst team members. VictorOps is an elegant way to keep our team in the loop about outages. It also integrates well with . This setup enables us to quickly react to any problems that make it into production, work together and resolve them faster. VictorOps Slack The vast majority of our infrastructure runs on AWS: for load balancing ELB for reliable Postgres hosting RDS for Redis hosting ElastiCache for our DNS Route53 for our CDN Cloudfront for ELK and tracing with AWS ElasticSearch Jaeger The devops responsibilities are shared across our team. While we do have one dedicated devops engineer, all our developers have to understand and own the entire workflow. Monitoring Stream uses for tracing and for beautiful dashboards. The tracking for Grafana is done using . The end result is this beauty: OpenTracing Grafana StatsD We track our errors in and use the ELK stack to centralize our logs. Sentry Tool Highlight: OpenTracing + Jaegar One new addition to the stack is . In the past we used , which works like a charm for Python, but isn’t able to automatically measure tracing information for Go. OpenTracing with is a great solution that works very well for Stream. It also has, perhaps, the best logo for a tracing solution: OpenTracing New Relic Jaeger Closing Thoughts is an absolutely amazing language and has been a major win in terms of performance and developer productivity. For tracing we use OpenTracing and Jaeger. Our monitoring is running on StatsD and Graphite. Centralized logging is handled by the ELK stack. Go Stream’s main database is a custom solution built on top of and Raft. In the past we used Cassandra, which we found hard to maintain and which didn’t give us enough control over performance when compared to RocksDB. RocksDB We leverage external tools and solutions for everything that’s not a core competence. hosting is handled by ElastiCache, Postgres by RDS, email by , test builds by Travis and error reporting by Sentry. Redis Mailgun Thank you for reading about our stack! If you’re a user of Stream, please be sure to add to your stack on StackShare. If you’re a talented individual, ! And finally, if you haven’t tried out Stream yet, take a look at this quick . Stream come work with us tutorial for the API This post was originally written by Thierry Schellenbach, CEO at GetStream.io . The original post can be found at https://stackshare.io/stream/stream-and-go-news-feeds-for-over-300-million-end-users .