3,188 reads

Using “Server-less” Architecture to Massively Parallelize DNA Sequence Alignment via StdLib and…

by Keith HorwoodJanuary 3rd, 2017

Too Long; Didn't Read

There’s a new flavor of the day in software development, so-called “server-less”, or Function as a Service (FaaS) architecture — and for good reason. It represents the fantastic promise of indefinite scalability without having to manage servers. You can simply write code, ship it, and never worry about the context in which your code is executing or the resources it’s consuming.

People Mentioned

Companies Mentioned

featured image - Using “Server-less” Architecture to Massively Parallelize DNA Sequence Alignment via StdLib and…

If you’re new to “server-less” architecture, the term “server-less” itself is a bit of a misnomer — the servers haven’t gone anywhere. I’m sure we’ll find the term antiquated soon enough (after all, we drive automobiles, not horseless carriages), and FaaS is a much more apt descriptor: small functions that can be replicated across large compute clusters. It’s not that the servers don’t exist, it’s that your code is not executed on the same physical (or virtual) machine every time it runs — if a compute node goes down, a replica pops up to take its place.

We’re pretty excited to show you a little bit of the power of FaaS, where we’re using this new, scalable, distributed architecture to build StdLib: The Function as a Service Software Library. It’s been our goal to open the world of FaaS architecture to every developer — we truly believe this is the future of cloud-based software development. What we would like to show you today is more about a place where FaaS really shines— massive parallelization. We’ve taken the opportunity to adapt a Node.js DNA sequence alignment tool to StdLib and are thrilled to share our results.

StdLib: Parallelization via FaaS

Function as a Service architecture makes massive parallelization easy — instead of a single server, your code gets executed on an arbitrary virtual context as part of a giant compute cluster. As multiple requests come in simultaneously, more virtual contexts are created to handle additional request load. We can leverage this behavior to create a dynamically allocated parallel compute cluster.

We Can Use the Behavior of Function as a Service Architecture for Massive Parallelization

The idea of creating dynamically allocated parallel compute clusters is a dream come true to anybody working with large datasets and MapReduce jobs. We’ve had a number of customers specifically ask us how StdLib fares at MapReduce and parallelization — so we took it upon ourselves to create an internal case study that pushes the limits of our platform. To do this, we adapted a Node.js-based DNA sequence alignment tool, NtSeq, to show off StdLib’s ability to handle parallelized workflows.

Sequence Alignment: Adapting NtSeq to StdLib

We chose to adapt the NtSeq Library due to my deep expertise with it (as the original author). NtSeq comes packaged with an exhaustive, ungapped, degenerate DNA sequence alignment tool that made the rounds on Hacker News when first launched in early 2015. The mapping algorithm it’s packaged with is called NBEAM (Nucleotide Bitwise Exhaustive Alignment Mapping), and it uses bit operations in to achieve “raw metal” performance in JavaScript — about 2 nanoseconds per nucleotide comparison at 2.4GHz, which works out to about 5 processor cycles (benchmarks).

To make some clarifications as to how the algorithm works;

Exhaustive — compares every nucleotide and scores every potential alignment position, thus has a time complexity O(n²) for the alignment mapping step
Ungapped — only the raw sequence data is compared (no “gapped” variations, where a nucleotide may have been added or removed between sequences)
Degenerate — also matches “wildcard” nucleotides (not only A, T, G and C — the nucleotide W matches {A, T}, for example), a complete list of which are available on the Nucleic Acid Notation page on Wikipedia

The mapping time complexity of O(n²) is a natural bottleneck and good candidate for parallelization as both the query sequence and genome sequence you’re searching through can be split into multiple chunks and aligned independently in parallel, with a final reduction step to aggregate results.

StdLib Service Setup: Two Functions, Search and Map

If you’d like to follow along with this article, check out keithwhor/stdlib-sequence on GitHub where you can clone the repository and deploy to StdLib yourself.

We have four basic steps associated with sequence alignment and scoring:

Prepare — Read sequences and prepare them for mapping (convert to binary arrays)
Map — Perform alignment (Parallelized)
Reduce — Aggregate results from Map into single result
Sort — Sort results to give best matches

We can build our entire StdLib service using only two functions. A search function, which performs steps (1), (3) and (4) on its own and also takes care of the delegation / function calling for step (2). There needs to be a specialized map function to take care of the actual O(n²) mapping computation in step (2). The setup looks a little something like this (simplified to three parallel function invocations):

Search (Reduce) Function

Here’s what our search function looks like. I’ve outlined each step: Prepare, Map, Reduce and Sort.

Map Function

Our map function, by contrast, is a lot simpler because we’re just relying on NtSeq to do the heavy lifting for us. This is the function that gets executed in parallel. There’s heavy computation going on in the background behind the map.initialize() function call.

Putting it All Together: Results

To visualize the effects of parallelization and whether or not we can actually see a reasonable speed up for large MapReduce jobs in practice, we’ve tested running our functions locally using the StdLib command line tools where they’ll be executed on the same thread (serially) as well as on StdLib, where they’ll be executed in different contexts (in parallel). The search query is a sequence of all “A”s of various lengths, mapped against the first 1,000,000 nucleotides of the E. coli K12 genome.

You can find this service at https://stdlib.com/services/keith/sequence, or run it over HTTPS using https://keith.stdlib.com/sequence?q=A&repeat=10000&count=0&stats. It’s also available on GitHub.

Local Execution vs StdLib Parallel (“Server-less”) Execution, Three (3) Trials Each

Local (Serial) vs StdLib (Parallel) execution for different search spaces. Local Execution has a clearly linear relationship with nt² input (O(n²)), whereas StdLib execution time remains relatively stable and can be approximated as O(1) due to massive parallelization.

We ran our three trials of our search function for every set of inputs locally, and compared that to three execution trials on StdLib. The local machine is a 2.7 GHz Core i5 processor with 16GB of RAM. StdLib executes with 1.5GB of RAM when optimized. The first two tests (search space ≤ 1B nt²) have not been parallelized — keep in mind, based on the code above, additional workers are only added when the nucleotide search space exceeds 1,000,000,000 nt².

When we use these results to compare them to number of cores (workers) used, we see a pattern of parallelization that seems to follow Amdahl’s Law, indicating that for a length 1,000,000 genome we can extrapolate a maximum of around a 20–30x speedup, regardless of the number of parallel workers.

Speedup due to Parallelization via StdLib. Roughly follows Amdahl’s Law with test data provided, but should be noted that more cores also maps to an increase in search space.

Limitations: Over-the-Wire Request Time

The largest bottleneck for parallelization is over-the-wire request time. Increases in latency due to data throughput mean any sort of tasks with an O(n) time complexity are ill-suited to these workflows.

Additionally, while, in theory, new virtual contexts can be created limitlessly leading to unlimited parallelization, this is not the end result in practice. Resource management leading to request queueing on both the client- and server-side can cause pauses or performance delays, leading to a variability for massively parallel (>100 workers) jobs that can be in the range of 1–3x. This is compensated for by the fact that these large jobs often come with time savings far greater than 3x, as is the case here. We’re constantly working on improving this performance.

Conclusion: StdLib as a MapReduce Powerhouse

Based on the work and upgrades we’ve made to StdLib over the past few weeks, we’re confident in stating that StdLib is ready for MapReduce primetime. The benefits you’ll get over AWS Lambda alone are immediate;

a much simpler developer workflow — deployments as simple as “lib up”
we support request and response workloads of up to 128MB, alleviating issues around large job batching (compared to AWS Lambda’s 5MB)
an active community of thousands of developers (join our Slack channel by clicking the header at stdlib.com)
an enthusiastic development team ready to respond to any and all questions
our developer tier is free to use, and as we roll out updates we will be providing platform credits to early adopters

If you have a large job or workload just itching to be parallelized and you want to know more, please contact me directly at keith [at] stdlib [dot] com.

Thanks for reading, and we hope you enjoy! We’ll be rolling out many more updates to the platform in the coming months. You can follow us on Twitter, @StdLibHQ, if you’d like to keep up to date.

Keith Horwood is part-hobbyist programmer, part-biochemist, part-systems engineer, and everything in between. Currently the founder and CEO of StdLib, he’s also the author of the Nodal API framework and he’s working to make building web-based software and APIs as easy as possible for developers, teams and companies around the world.

You can follow him on Twitter, @keithwhor.