First, I think it's important to discuss the motivations for building Plural since they're ultimately what guided most of our technical decisions. Plural originally came out of my enthusiasm for Kubernetes, especially the realization that the unique combination of a rich, extendable API and strong community could ultimately provide a platform on which to build self-managing applications.
But when I began investigating the Kubernetes deployments of many popular open-source applications, it became clear there was a wide chasm between even a good Kubernetes deployment and a fully hosted offering from a cloud provider or a mature software vendor.
This meant that while Kubernetes has a lot of technical potential, it's not generally commercially viable until that user experience gap is closed. That said, I thought the gap was closable. Given how mature a lot of the tooling actually is, the big unsolved problem is delivering a workflow that consistently combines the standard deployment toolchain, which is what we're building at Plural.
It's worth listing exactly what those constraints are, at least as I've seen them consistently:
Applications need to be tailored to each specific cloud. This usually comes down to injecting credentials and setting up object storage/databases. Still, each cloud has its own services, APIs, and conventions, which quickly means you're navigating n sets of docs where n is the number of providers supported.
The big draw of running directly on Kubernetes is deep customizability, but that impedes a functional out-of-the-box experience, and most tools don't solve for both.
The application lifecycle needs to be solved, since an easy-to-install but unmanageable application is still a pile of tech debt. That requires really strong administration UX.
Additionally, we wanted a set of abstractions and principles that can scale to virtually any application, deploy to virtually any cloud, and be usable by virtually any developer.
It became clear that solving for cloud customizability and application configurability is as complex as a code management problem.
You can think of it as managing a dependency graph that links an application to the other applications and submodules needed to create the various cloud-specific resources it requires.
Take Apache Airflow as an example; it generally needs to deploy these things:
Virtually any deployment of Airflow will need to manage a sequenced installation of all those components, and any upgrade will also need some sort of version compatibility check to ensure all the components play nicely together.
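To make the dependency-graph framing concrete, here is a minimal sketch of sequenced installation plus a naive compatibility check. The component names, versions, and constraints are all hypothetical, chosen only for illustration, and this is not how Plural itself implements the logic; it only shows the general technique of topologically ordering a graph and checking versions before an upgrade.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical graph: each key depends on every name in its set.
DEPENDENCIES = {
    "airflow": {"postgres", "redis", "object-storage"},
    "postgres": {"bootstrap"},
    "redis": {"bootstrap"},
    "object-storage": {"bootstrap"},
    "bootstrap": set(),
}

# Hypothetical constraints: (component, dependency) -> required major version.
REQUIRED_MAJOR = {
    ("airflow", "postgres"): 14,
    ("airflow", "redis"): 7,
}


def install_order(dependencies):
    """Return components ordered so every dependency precedes its dependents."""
    # static_order() raises graphlib.CycleError if the graph has a cycle.
    return list(TopologicalSorter(dependencies).static_order())


def incompatibilities(installed_versions, required_major):
    """List human-readable problems where an installed major version mismatches."""
    problems = []
    for (component, dependency), major in sorted(required_major.items()):
        installed = installed_versions[dependency]
        if int(installed.split(".")[0]) != major:
            problems.append(
                f"{component} needs {dependency} {major}.x, found {installed}"
            )
    return problems


if __name__ == "__main__":
    print(install_order(DEPENDENCIES))
    print(incompatibilities({"postgres": "14.5", "redis": "6.2.0"}, REQUIRED_MAJOR))
```

A real package manager would need richer version ranges and per-cloud variants of each node, but the ordering and the pre-upgrade check follow the same shape.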
So how did we solve it? In general, we chose an architecture with 3 main components:
It's worth digging into some of the technical choices we made in each of these systems.
We made a somewhat unusual decision to use Elixir for our server-side code on both the API and admin console. I had previous experience building a very large Elixir codebase at Frame.io and learned to love the language, but there were some unique rationales that made it, or really the entire BEAM (Elixir's VM) ecosystem, a good fit for Plural as well:
Like the GraphQL decision, using Elixir does come with tradeoffs, the most significant of which is community. Elixir is a niche language, and you don't have as large an initial well of developers to source from. That said, existing Elixir devs love to continue working in Elixir and are often high quality.
There's also an interesting ramp-up process on the language as it is a significant paradigm shift from imperative and object-oriented languages to a fully functional language with strong immutability guarantees. Part of how I've navigated that in the past is being very active in pair-programming with new developers as part of the onboarding process in the codebase.
Finally, dynamic typing is a meaningful performance hit in comparison to static typing, as is the overhead imposed by immutability, especially for CPU-bound work, which our server side will do a fair amount of (especially JSON serialization). I do think some recent changes in the BEAM should improve the straight-line performance of Elixir/Erlang, but it's still worth noting.
When building APIs, there are two main patterns available: REST and GraphQL. I had built REST APIs in plenty of former roles, but there were two main reasons I actually preferred GraphQL.
First, it's much better supported in the browser currently. Apollo Client makes React development much easier than Redux, and also seems more performant, and I was anticipating a lot of complex UI to solve for a really challenging UX problem in making a wide slew of applications operable.
Secondly, I knew we were going to need a lot of real-time functionality in the various products, and there aren't many solid wire protocols to add on top of WebSockets, except for GraphQL subscriptions. Being able to have auto-typed, self-documenting WebSocket clients was a huge win, and I felt it was worth the slight novelty of GraphQL.
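As a rough illustration of what a subscription adds on top of a raw socket, the sketch below builds the frame a client would send to start one. It assumes the message shape of the `graphql-transport-ws` protocol; the transport Plural actually uses may differ (an Elixir server could, for instance, carry subscriptions over Phoenix channels with a different envelope), and the `buildLogs` field and its arguments are hypothetical rather than taken from Plural's schema.

```python
import json

# Hypothetical subscription document; the field names are illustrative only.
SUBSCRIPTION = """
subscription BuildLogs($buildId: ID!) {
  buildLogs(buildId: $buildId) {
    line
    insertedAt
  }
}
"""


def subscribe_message(operation_id, query, variables):
    """Serialize a `subscribe` frame in the graphql-transport-ws message shape."""
    return json.dumps(
        {
            "id": operation_id,  # lets the client match later frames to this operation
            "type": "subscribe",
            "payload": {"query": query, "variables": variables},
        }
    )


if __name__ == "__main__":
    print(subscribe_message("1", SUBSCRIPTION, {"buildId": "abc"}))
```

Because the subscription is written against the same typed schema as queries and mutations, client tooling can generate types for each pushed event instead of relying on hand-written socket message parsing.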
We have a significant portion of our product tied to a CLI distribution. The two common languages we could have used for building that were Python and Go. I think Go is the obvious winner here for the ability to build easily distributed cross-platform binaries alone, but it also lets us statically link against the source code of many of the tools we need within the Kubernetes ecosystem. We have also written a fair amount of Kubernetes operator code to manage the runtime of Plural applications, and it's good to choose a language that makes it easy for our dev teams to move between both of those codebases.
There's infinitely more granularity to all these decisions, but we thought people might find some of these insights helpful, or at least food for thought. If you are interested in learning more, everything we build is open source, so feel free to check it out at these links (give us a star if you like what we are doing):
Or try Plural out for yourself: https://app.plural.sh