There’s a new flavor of the day in software development, so-called “server-less”, or Function as a Service (FaaS) architecture — and for good reason. It represents the fantastic promise of indefinite scalability without having to manage servers. You can simply write code, ship it, and never worry about the context in which your code is executing or the resources it’s consuming.
If you’re new to “server-less” architecture, the term “server-less” itself is a bit of a misnomer — the servers haven’t gone anywhere. I’m sure we’ll find the term antiquated soon enough (after all, we drive automobiles, not horseless carriages), and FaaS is a much more apt descriptor: small functions that can be replicated across large compute clusters. It’s not that the servers don’t exist, it’s that your code is not executed on the same physical (or virtual) machine every time it runs — if a compute node goes down, a replica pops up to take its place.
We’re pretty excited to show you a little bit of the power of FaaS, where we’re using this new, scalable, distributed architecture to build StdLib: The Function as a Service Software Library. It’s been our goal to open the world of FaaS architecture to every developer — we truly believe this is the future of cloud-based software development. What we’d like to show you today is one place where FaaS really shines — massive parallelization. We’ve taken the opportunity to adapt a Node.js DNA sequence alignment tool to StdLib and are thrilled to share our results.
Function as a Service architecture makes massive parallelization easy — instead of a single server, your code gets executed on an arbitrary virtual context as part of a giant compute cluster. As multiple requests come in simultaneously, more virtual contexts are created to handle additional request load. We can leverage this behavior to create a dynamically allocated parallel compute cluster.
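As a minimal sketch of this fan-out-and-aggregate pattern (the `invokeWorker` function below is a hypothetical stand-in for an over-the-wire function invocation, not StdLib’s actual client API):

```javascript
// Hypothetical stand-in for an over-the-wire FaaS invocation; in a real
// deployment, each call would land on its own virtual compute context.
function invokeWorker(chunk) {
  return new Promise((resolve) => {
    // Simulate remote work: sum the chunk's values.
    setTimeout(() => resolve(chunk.reduce((a, b) => a + b, 0)), 10);
  });
}

async function parallelSum(numbers, workers) {
  // Split the input into one chunk per worker.
  const size = Math.ceil(numbers.length / workers);
  const chunks = [];
  for (let i = 0; i < numbers.length; i += size) {
    chunks.push(numbers.slice(i, i + size));
  }
  // Fire all invocations simultaneously; they run in parallel contexts.
  const partials = await Promise.all(chunks.map(invokeWorker));
  // Combine the partial results.
  return partials.reduce((a, b) => a + b, 0);
}
```

For example, `parallelSum([...Array(100).keys()], 4)` resolves to 4950, with each invocation handling only a quarter of the work.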
We Can Use the Behavior of Function as a Service Architecture for Massive Parallelization
The idea of creating dynamically allocated parallel compute clusters is a dream come true to anybody working with large datasets and MapReduce jobs. We’ve had a number of customers specifically ask us how StdLib fares at MapReduce and parallelization — so we took it upon ourselves to create an internal case study that pushes the limits of our platform. To do this, we adapted a Node.js-based DNA sequence alignment tool, NtSeq, to show off StdLib’s ability to handle parallelized workflows.
A few clarifications on how the algorithm works:
The mapping time complexity of O(n²) is a natural bottleneck and good candidate for parallelization as both the query sequence and genome sequence you’re searching through can be split into multiple chunks and aligned independently in parallel, with a final reduction step to aggregate results.
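As a sketch of the chunking idea (the function and parameter names here are my own, not from the NtSeq adaptation): to avoid losing alignments that straddle a chunk boundary, each chunk can overlap the next by the query length minus one.

```javascript
// Split a genome string into chunks that workers can align independently.
// Adjacent chunks overlap by (queryLength - 1) nucleotides, so an alignment
// that straddles a boundary still falls wholly inside one chunk.
function chunkGenome(genome, chunkSize, queryLength) {
  const overlap = queryLength - 1;
  const chunks = [];
  for (let start = 0; start < genome.length; start += chunkSize) {
    chunks.push({
      offset: start, // used to translate local hit positions back to global ones
      sequence: genome.slice(start, start + chunkSize + overlap)
    });
  }
  return chunks;
}
```

The final reduction step then only needs to merge each worker’s hits, shifting local positions by each chunk’s `offset`.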
If you’d like to follow along with this article, check out keithwhor/stdlib-sequence on GitHub where you can clone the repository and deploy to StdLib yourself.
We have four basic steps associated with sequence alignment and scoring: (1) Prepare, (2) Map, (3) Reduce, and (4) Sort.
We can build our entire StdLib service using only two functions: a search function, which performs steps (1), (3), and (4) itself and handles the delegation (function calling) for step (2), and a specialized map function that performs the actual O(n²) mapping computation in step (2). The setup looks a little something like this (simplified to three parallel function invocations):
Here’s what our search function looks like. I’ve outlined each step: Prepare, Map, Reduce and Sort.
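A simplified sketch of that four-step shape (this is not the actual service code; `scoreChunk` is a toy stand-in for a parallel StdLib map invocation, using exact-match counting rather than NtSeq scoring):

```javascript
// Toy stand-in for a remote "map" invocation: count exact occurrences
// of the query within one chunk of the genome.
function scoreChunk(chunk, query) {
  return new Promise((resolve) => {
    const results = [];
    for (let i = 0; i + query.length <= chunk.sequence.length; i++) {
      if (chunk.sequence.slice(i, i + query.length) === query) {
        results.push({ position: chunk.offset + i, score: query.length });
      }
    }
    resolve(results);
  });
}

async function search(genome, query, chunkSize) {
  // (1) Prepare: split the genome into overlapping chunks.
  const overlap = query.length - 1;
  const chunks = [];
  for (let start = 0; start < genome.length; start += chunkSize) {
    chunks.push({
      offset: start,
      sequence: genome.slice(start, start + chunkSize + overlap)
    });
  }
  // (2) Map: delegate each chunk to a parallel invocation.
  const mapped = await Promise.all(chunks.map((c) => scoreChunk(c, query)));
  // (3) Reduce: flatten partial results, de-duplicating hits in overlaps.
  const seen = new Set();
  const results = [];
  for (const hit of mapped.flat()) {
    if (!seen.has(hit.position)) {
      seen.add(hit.position);
      results.push(hit);
    }
  }
  // (4) Sort: best scores first, ties broken by position.
  return results.sort((a, b) => b.score - a.score || a.position - b.position);
}
```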
Our map function, by contrast, is a lot simpler: we rely on NtSeq to do the heavy lifting for us. This is the function that gets executed in parallel. The heavy computation happens behind the map.initialize() function call.
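NtSeq’s speed comes from encoding nucleotides, including degenerate IUPAC codes, as 4-bit masks, so two bases are compatible whenever their masks share a bit. A toy version of that idea (my own illustration, not NtSeq’s actual implementation):

```javascript
// 4-bit masks for a subset of IUPAC nucleotide codes: one bit each for A, T, G, C.
const MASK = {
  A: 0b0001, T: 0b0010, G: 0b0100, C: 0b1000,
  W: 0b0011, R: 0b0101, M: 0b1001, K: 0b0110,
  Y: 0b1010, S: 0b1100, N: 0b1111
};

// Two bases are compatible when their masks intersect.
function basesMatch(a, b) {
  return (MASK[a] & MASK[b]) !== 0;
}

// Score the query against every offset of the sequence; the offset with the
// most compatible positions wins (a stand-in for NtSeq's bitwise mapping).
function bestAlignment(sequence, query) {
  let best = { position: -1, matches: -1 };
  for (let pos = 0; pos + query.length <= sequence.length; pos++) {
    let matches = 0;
    for (let i = 0; i < query.length; i++) {
      if (basesMatch(sequence[pos + i], query[i])) matches++;
    }
    if (matches > best.matches) best = { position: pos, matches };
  }
  return best;
}
```

The double loop is where the O(n²) cost lives, which is exactly the work that gets split across parallel map invocations.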
To see whether parallelization actually delivers a meaningful speedup for large MapReduce jobs in practice, we tested our functions two ways: locally, using the StdLib command line tools, where they execute serially on a single thread, and on StdLib, where they execute in different contexts in parallel. The search query is a sequence of all “A”s of various lengths, mapped against the first 1,000,000 nucleotides of the E. coli K12 genome.
You can find this service at https://stdlib.com/services/keith/sequence, or run it over HTTPS using https://keith.stdlib.com/sequence?q=A&repeat=10000&count=0&stats. It’s also available on GitHub.
Local Execution vs StdLib Parallel (“Server-less”) Execution, Three (3) Trials Each
Local (Serial) vs StdLib (Parallel) execution for different search spaces. Local execution time is clearly linear in the nt² search space (i.e., O(n²) in sequence length), whereas StdLib execution time remains relatively stable and can be approximated as O(1) due to massive parallelization.
We ran our three trials of our search function for every set of inputs locally, and compared that to three execution trials on StdLib. The local machine is a 2.7 GHz Core i5 processor with 16GB of RAM. StdLib executes with 1.5GB of RAM when optimized. The first two tests (search space ≤ 1B nt²) have not been parallelized — keep in mind, based on the code above, additional workers are only added when the nucleotide search space exceeds 1,000,000,000 nt².
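A hypothetical version of that allocation rule (the exact formula is my own guess at its shape, based on the description above: one worker per 1,000,000,000 nt² of search space):

```javascript
// Hypothetical worker-allocation rule: one worker per 1e9 nt² of
// search space, where search space = query length × genome length.
const NT2_PER_WORKER = 1e9;

function workerCount(queryLength, genomeLength) {
  const searchSpace = queryLength * genomeLength;
  return Math.max(1, Math.ceil(searchSpace / NT2_PER_WORKER));
}
```

Under this rule, a 500 nt query against a 1,000,000 nt genome (5 × 10⁸ nt²) stays on a single worker, matching the unparallelized first two tests.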
Comparing these results against the number of cores (workers) used, we see a speedup pattern that roughly follows Amdahl’s Law, indicating that for a genome of length 1,000,000 we can extrapolate a maximum speedup of around 20–30x, regardless of the number of parallel workers.
Speedup due to parallelization via StdLib. It roughly follows Amdahl’s Law for the test data provided, though it should be noted that more cores here also corresponds to a larger search space.
The largest bottleneck for parallelization is over-the-wire request time. Since shipping the data to workers already costs O(n), tasks whose computation is itself only O(n) are ill-suited to these workflows: the transfer time swamps any parallel gains.
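A back-of-the-envelope cost model makes this concrete (all numbers below are illustrative, not measured): with k workers, total time is roughly transfer time plus compute time divided by k, so when compute is only O(n) the O(n) transfer term caps the speedup no matter how large k gets.

```javascript
// Illustrative cost model; the per-unit costs are made-up parameters.
// transferPerUnit: seconds to ship one unit of data over the wire (O(n) total).
// computePerUnit: seconds of computation per unit, split across k workers.
function parallelTime(n, k, { transferPerUnit, computePerUnit }) {
  return n * transferPerUnit + (n * computePerUnit) / k;
}

function speedup(n, k, costs) {
  const serial = n * costs.computePerUnit;
  return serial / parallelTime(n, k, costs);
}
```

When transfer and compute cost the same per unit, `speedup` stays below 1 for any k (parallelizing is a net loss); only when compute dominates transfer does fanning out pay off.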
Additionally, while in theory new virtual contexts can be created without limit, enabling unlimited parallelization, that is not what happens in practice. Resource management, which leads to request queueing on both the client and server side, can cause pauses and performance delays, introducing run-to-run variability in the range of 1–3x for massively parallel (>100 worker) jobs. This is offset by the fact that such large jobs typically deliver time savings far greater than 3x, as is the case here. We’re constantly working on improving this performance.
Based on the work and upgrades we’ve made to StdLib over the past few weeks, we’re confident in stating that StdLib is ready for MapReduce primetime. The benefits you’ll get over AWS Lambda alone are immediate:
If you have a large job or workload just itching to be parallelized and you want to know more, please contact me directly at keith [at] stdlib [dot] com.
Thanks for reading, and we hope you enjoy! We’ll be rolling out many more updates to the platform in the coming months. You can follow us on Twitter, @StdLibHQ, if you’d like to keep up to date.
Keith Horwood is part-hobbyist programmer, part-biochemist, part-systems engineer, and everything in between. Currently the founder and CEO of StdLib, he’s also the author of the Nodal API framework and he’s working to make building web-based software and APIs as easy as possible for developers, teams and companies around the world.
You can follow him on Twitter, @keithwhor.