Quick: name one startup you know using a single-tenant architecture.

Got one? Yeah, me neither.

Multi-tenant architectures are the standard way to run a startup these days. Create a database, provision some servers, add a load balancer, top it off with some caching and call it a day.

But why? What is it about multi-tenant that's superior to single-tenant? Is it cost? Complexity? Security? Scale?

Recently I started working on a new product and the multi-tenant architecture felt ungodly complex. There had to be a better solution: a quicker way to get up and running that wouldn't create scaling issues for us later on.

Now, I'll be honest. I've been developing software and running websites for 15 years and I've never had to deal with "internet scale" concerns. Hell, I've barely even had to deal with caching.

I've been working on Kumu for the past five years and, outside of embeds, Kumu's traffic can be handled by a single large server. It's a powerful, sophisticated tool, and growth has been solid, but it's simply not the kind of stuff that goes viral.

We expect Compass to be a different story.

Compass helps you visualize your Slack team's communication. Unlike Kumu (where you start small, build up your data incrementally, and handle most calculations locally in your browser), Compass taps into a firehose of data the second you sign up, and most calculations are handled server-side.

Compass may never reach the scale we're anticipating, but if it does we want to be prepared. We have lives, and wives, and kids, and hobbies, and many other things we'd much rather be doing. Like sleeping. (Trying to, at least.) And if it goes viral we don't want to be woken up in the middle of the night to put out fires.

So instead of attempting to build a system that should scale, we're putting in some time upfront to design an any-scale system: a system that's built to remove scale from the equation (or at least make it somebody else's responsibility).

With that goal, we've been exploring two potential architectures on top of AWS:

- Multi-tenant SaaS (the typical model, supporting all teams on a pool of shared resources such as load balancers, gateways, subnets, Kinesis, Lambdas, Redshift, ECS, EC2, autoscaling groups, security groups, CloudFront, etc. Honestly, if I were to list every AWS resource required there would be dozens of them, hence the reason for exploring the simpler, single-tenant architecture.)
- Single-tenant (each team gets a dedicated EC2 instance, exposed directly to the internet, with everything running locally through Docker)

The remainder of this post explores the trade-offs we've considered between these two architectures. If there are any big ones we've missed, please mention them in the comments!

Cost

As a bootstrapped startup, cost is a big one. Get it wrong and it can cripple you before you get out of the gate. Get it right and you're off to sipping mai tais on the beach while you still look good in a bathing suit. (Or you could use that money productively to create jobs and grow the product. You're the boss.)

With multi-tenant architectures, the cost to run the system is fixed. You're paying a lot up front, but the good news is that each new customer spreads that cost a little thinner, driving down the cost per customer. Outside of customer support, adding new customers doesn't really cost you anything.

With single-tenant architectures, the marginal cost of adding new customers never goes down. It's fixed. Each new customer requires a new instance and each of those new instances has to be paid for. Worse than that, the cost per customer actually goes up! Larger teams need larger instances. While we might be able to support teams of 10 to 20 on a t2.nano, we'll need a much larger instance to support teams with hundreds of members.

So, since multi-tenant lowers cost per customer, it's the clear winner here, right?
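To put rough shapes on the two cost curves described above, here is a minimal sketch. Every number in it is made up for illustration: the fixed monthly cost of the shared stack, the team-size tiers, and the per-instance prices are all assumptions, not real AWS pricing and not Compass's actual bills.

```python
# Hypothetical figures for illustration only; none of these are real AWS prices.
MULTI_TENANT_FIXED_MONTHLY = 2000.0  # assumed flat monthly cost of the shared stack

# Assumed single-tenant instance tiers: (largest team size supported, monthly cost)
SINGLE_TENANT_TIERS = [
    (20, 5.0),
    (100, 20.0),
    (500, 80.0),
]


def multi_tenant_cost_per_customer(customers: int) -> float:
    """The fixed cost is shared, so cost per customer falls as customers are added."""
    if customers < 1:
        raise ValueError("need at least one customer")
    return MULTI_TENANT_FIXED_MONTHLY / customers


def single_tenant_instance_cost(team_size: int) -> float:
    """Each team pays for its own instance, and bigger teams need bigger instances."""
    for max_size, monthly_cost in SINGLE_TENANT_TIERS:
        if team_size <= max_size:
            return monthly_cost
    raise ValueError("no tier defined for a team this large")


def single_tenant_total(team_sizes: list[int]) -> float:
    """Total monthly cost is the sum of one instance per team."""
    return sum(single_tenant_instance_cost(size) for size in team_sizes)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, "customers on multi-tenant:", multi_tenant_cost_per_customer(n), "each")
    print("three single-tenant teams:", single_tenant_total([10, 50, 300]))
```

With these assumed numbers, the multi-tenant cost per customer keeps shrinking while the single-tenant cost per team only ever stays flat or steps up with team size.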
Well… no. Not really.

Besides the cost of the underlying infrastructure, there's also these things called humans. And they're expensive.

The single-tenant architecture is a simpler architecture with fewer moving pieces. Simpler systems can be supported by smaller teams. And those teams can be made up of developers instead of dedicated sysadmins, since it's the exact same system you're already running locally for development.

Which brings us to the next concern: parity.

Parity

Parity is the notion of similarity between environments. One of the major downsides of the multi-tenant architecture is the lack of parity between the environments we need to support:

- development (everything running locally on your own machine)
- staging (everything running in the cloud but not designed to scale)
- production (everything running in the cloud and designed to scale)
- enterprise (everything running on somebody else's machine)

Each one of those environments is complex on its own. Add them together and it's clear why sysadmins get paid the big bucks.

You could argue that you don't really need a staging environment. Because that's what tests are for, right?

You could also argue that worrying about an on-premise enterprise version at this stage is premature. And in many cases you'd be right. But in Compass's case, we're juggling messages that contain sensitive information. And as such, we've already had requests for an on-premise version. Since enterprise customers are typically your largest and most loyal customers, we'd be foolish not to factor them into our initial planning.

Single-tenant is the clear winner here since it gives you parity across all environments and an easy path to enterprise. As a small team with limited resources, we think that's pretty sweet.

Maintenance

Maintenance on multi-tenant systems can be scary. You push out a single update, and every customer is immediately on the new system. If you botch it you take down the entire system. Been there, done that.
Not fun when it happens. With multi-tenant, deploys are typically all or nothing.

With single-tenant, maintenance is incremental. If you botch it, you typically only take down a single team. Instead of deploying a single app update, you're deploying N app updates. Instead of migrating one database, you're migrating N databases. On the surface it sounds like this would create more work for you, but since the systems are isolated and identical most of that work can be automated. All you need is a bit of tooling to orchestrate the updates.

A beautiful side effect of single-tenant maintenance is that you get incremental rollouts and targeted beta releases for free. No need to mess around with load balancers or juggle internal feature flags.

Resilience

Both architectures offer their own form of resilience.

Multi-tenant creates resilience at the team level. Each team is serviced by multiple instances, spread across multiple regions/zones, and hosted behind load balancers. A team is unlikely to experience issues unless there is a system-wide outage.

Single-tenant creates resilience at the system level. Outside of DDoS attacks at the DNS level, there are very few ways to take down the entire system. A team may experience problems, but it's unlikely those problems will extend beyond that single instance.

To account for disasters, we can throw in EBS backups, health checks, and a recovery instance that can stuff Slack events into a queue until a new server is provisioned. Now we've got a simple, resilient system at both the team and system level.

Team and system resilience? Yesssss.

Plus, in general, a single angry customer is much easier to deal with than an angry mob. So chalk another one up for single-tenant here.

Recovery

On Kumu, every project is backed by a separate CouchDB database. Over the years we've found this isolation extremely valuable.

Sometimes we mess things up. Sometimes the customer messes things up. Regardless of who's to blame, disaster recovery is much easier when each customer's data is physically isolated, rather than simply being logically isolated within a single database. Database restorations become simple filesystem copies instead of fragile, complex database queries.

Security

As far as I'm concerned, simple is the best kind of secure. Complex systems often give the illusion of security that isn't truly there.

If everything is running locally on a single machine, and that machine is locked down with key-based SSH access, and the only other port that machine exposes is port 443, then I'm not losing sleep at night worrying about security breaches.

Yes, the machine is exposed directly to the internet. You can easily mess up either one. But as long as any part of the system is exposed, direct exposure isn't inherently less secure than indirect exposure.

If two systems have similar exposure and one is significantly simpler, I'll go with the simpler system every time. Less surface area. Less complexity. Easier to audit. Sold.

Conclusion

As with most things, there's no holy grail here. Both architectures have their tradeoffs and both are solid solutions for the right problems. I've always used multi-tenant architectures in the past but in this case it just doesn't feel like the right tool for the job.

At the end of the day it's our job as engineers to find that sweet spot at the intersection between priorities and constraints. For Compass, that sweet spot appears to be single-tenant. There's a strong argument to be made for multi-tenant too, but for now, a single instance per team looks like the quickest way to get up and running while minimizing scaling concerns.

It's also important to note that single-tenant wouldn't even be an option if we were hoping to allow cross-team analysis.

That said, here are the key advantages single-tenant provides for Compass:

- We can use the same Docker containers we're using in development to run things in production.
- We can roll updates out team-by-team as we build out new features and collect feedback during our beta phase.
- We can provision instances geographically close to the customer, improving responsiveness while avoiding the need for a CDN.
- We can ease privacy concerns by physically isolating each team's data.

And since we don't have a free plan, we don't mind spending a few bucks a month to support each team (especially if that reduces operational complexity and shortens time to launch).

So that's where we're at. At this point Compass is just a prototype but we're hoping to build out the backend over the next few weeks. If you run an active Slack team and you're interested in being a beta tester, let me know! You can reach me at ryan@kumu.io or @rymohr on Twitter.

Do you have experience running single-tenant architectures at scale? If so I'd love to hear about it in the comments below or the related post on HN!
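For anyone wondering what rolling updates out team-by-team could look like in practice, here is a minimal sketch. It is not Compass's actual tooling: `roll_out`, `RolloutResult`, and the `deploy` and `healthy` callbacks are all hypothetical names, and a real version would call out to whatever updates and health-checks a team's instance.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class RolloutResult:
    updated: List[str]  # teams that were updated and passed the health check
    failed: List[str]   # the team that stopped the rollout, if any
    skipped: List[str]  # teams left untouched on the old version


def roll_out(
    teams: Iterable[str],
    deploy: Callable[[str], None],   # hypothetical: updates one team's instance
    healthy: Callable[[str], bool],  # hypothetical: checks that instance afterwards
) -> RolloutResult:
    """Update one team at a time and halt at the first failure,
    so a botched release only ever reaches a single team."""
    ordered = list(teams)
    updated: List[str] = []
    for index, team in enumerate(ordered):
        try:
            deploy(team)
            ok = healthy(team)
        except Exception:
            ok = False
        if not ok:
            return RolloutResult(updated, [team], ordered[index + 1:])
        updated.append(team)
    return RolloutResult(updated, [], [])


if __name__ == "__main__":
    # Put beta testers first so they see new releases before everyone else.
    result = roll_out(
        ["beta-1", "beta-2", "acme"],
        deploy=lambda team: None,
        healthy=lambda team: team != "beta-2",  # simulate one bad update
    )
    print(result)
```

Ordering the list with beta teams first is one way to get targeted beta releases out of the same loop.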
