Why didn’t Lambda ship Docker out of the box? Is it because of security concerns? Why even bother, let’s do it ourselves!
Lambda is pioneering the serverless market. Look at the chart below:
I know, it’s just search activity. But you can see how far ahead of its time it was in 2015.
Monopoly isn’t good for the end user. No competition = stagnating development. A year later Azure popped up with function bindings & Logic Apps. Google revealed simpler dependency management and a nice dashboard.
And recently I discovered Apache OpenWhisk. This is an open-source alternative that lets you run your own serverless platform. And unlike everything else, it can run functions from Docker images you supply.
Docker for Serverless? What’s the Point?
You may ask, “Why complicate? Why Docker? Serverless is about simple functions.” Sure, but the reality is it’s not always that simple.
The entire world shifted to containers, and now AWS tells you to pack all your code with dependencies into 1 zip. A step back IMHO. ¯\_(ツ)_/¯
With Docker you build functions in the same way as you did for years with microservices.
Ever wanted to use bcrypt, phantom.js, ffmpeg or any other binary in your functions? Launch an EC2 instance with a specific version of Amazon Linux, compile the binaries, squeeze everything under 50 MB and you’re good to go!
Small tip: if a binary is too big, you can download it at runtime to /tmp. Network speed is fantastic, but watch out for cold start overhead.
With Docker you would throw it in a Dockerfile and let the magic happen.
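A minimal sketch of that tip, assuming curl is available in the function's environment. The helper name fetch_once and the example URL are hypothetical, not part of any Lambda tooling; the point is only to skip the download when a warm container still has the file.

```shell
# fetch_once URL DEST: download URL to DEST unless DEST is already there.
# Hypothetical helper for illustration only.
fetch_once() {
  url="$1"
  dest="$2"
  if [ -x "$dest" ]; then
    return 0   # warm container: /tmp still holds the binary, skip the download
  fi
  # -f: fail on HTTP errors, -L: follow redirects, -o: write to DEST
  curl -sSfL -o "$dest" "$url" && chmod +x "$dest"
}

# Example call (the URL is a placeholder):
# fetch_once "https://example.com/ffmpeg" /tmp/ffmpeg && /tmp/ffmpeg -version
```

Only the first invocation after a cold start pays for the download; later invocations in the same container return immediately.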
Write in any language
As of June 2017, the officially supported languages are: Node.js, Python, Java (Groovy, Scala), C#.
Using an unofficial language is cumbersome, much like compiling binaries.
With Docker it’s as simple as choosing a base image for the language you want.
No vendor lock-in
It’s a plain Docker image with your code inside. Sure, there is some custom logic which relies on the FaaS API, but I feel better already.
With Docker it doesn’t matter where you docker pull, be it Google or OpenWhisk.
How Hard Is It?
Okay, so enough theory. I hope you share some of my excitement now.
To begin with, I need to know what kind of environment Lambda is running in. I want to look around and try things to see whether it’s possible to run the Docker daemon there.
You can run any command there. Seriously, check it out.
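For a first look around, something like the sketch below can help. It is an illustration using standard Unix tools only; the function name look_around is made up here, and which tools are actually present in the Lambda sandbox is an assumption.

```shell
# look_around: print a few facts about the current sandbox.
look_around() {
  echo "user:   $(id)"                      # uid=0 would mean root
  echo "kernel: $(uname -sr)"
  # df -P prints one line per filesystem; column 4 is the available space in KB
  echo "tmp:    $(df -Pk /tmp | awk 'NR==2 {print $4 " KB free"}')"
  if command -v docker >/dev/null 2>&1; then
    echo "docker: $(command -v docker)"
  else
    echo "docker: not found"
  fi
}

look_around
```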
So I started playing with absurd commands à la sudo apt-get install docker and so on. No surprise, it didn’t work out.
Then I tried to install Docker from static binaries. Downloading and unpacking went fine, until I tried to start it. Docker needs root access to start a daemon 😓. Of course, that’s not available in an environment as limited as Lambda.
So how much is it limited?
- 128 to 1536 MB of RAM (and a proportional amount of CPU)
- 0.1s to 300 seconds of execution time
- 512 MB of writeable disk space under /tmp
- 250 MB of deployed code with dependencies (but 50 MB zipped)
- NO ROOT ACCESS
No sudo. Is it still possible to run Docker?
My brain replied “NO”, but my heart was saying “GO RESEARCH, VLAD”.
So I did the research. What are the options?
- Use dockaless. This library wraps Docker Remote API to spawn arbitrary docker containers outside of Lambda
- Forget this stupid idea
The 1st option is not truly “serverless”: it requires some real servers running. So I was about to choose the 2nd, until one random day I found udocker.
Execute docker containers without root privileges.
This is insane.
Needless to say, this project sealed my success. These smart guys figured out you can unpack a Docker image and execute it in an emulated environment. Well done.
Straight to the Code
You can find all the code on GitHub. But first, some disclaimers:
- udocker uses PRoot as execution engine by default. It will not work in Lambda.
- Luckily, there is a development branch which allows you to choose one of 3 available execution engines.
- Lambda has 512 MB of disk space. No gigantic Docker images.
- You need to download udocker and docker image on every Lambda cold start (roughly every 4 hours, proof). It may take some seconds.
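Given the 512 MB ceiling, it can be worth checking how much room is left in /tmp before pulling anything. A small sketch using df; the helper names and the 100 MB threshold are illustrative assumptions, not values from Lambda or udocker.

```shell
# free_kb DIR: print the free space of the filesystem holding DIR, in kilobytes.
free_kb() {
  df -Pk "$1" | awk 'NR==2 {print $4}'
}

# require_free_mb DIR MB: succeed only if at least MB megabytes are free in DIR.
require_free_mb() {
  [ "$(free_kb "$1")" -ge $(( $2 * 1024 )) ]
}

# The 100 MB threshold is an arbitrary example; pick one that fits your image.
require_free_mb /tmp 100 || echo "less than 100 MB free in /tmp" >&2
```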
In Lambda, the only place you are allowed to write is /tmp. But udocker will attempt to write to the homedir by default. I need to change its mind.
Voilà. Your home is in /tmp now. Do whatever you want.
Next, let’s download the udocker Python script.
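A minimal sketch of that change, assuming it amounts to pointing the HOME environment variable at /tmp (by default udocker keeps its files in a .udocker directory under the home directory):

```shell
# Assumption: relocating HOME is enough, because udocker derives its
# default directory from the home directory.
export HOME=/tmp
echo "HOME is now $HOME"
```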
$ cd /tmp
$ curl https://raw.githubusercontent.com/indigo-dc/udocker/udocker-fr/udocker.py > udocker
$ python udocker version
If everything went well, you’ll see something like:
Info: creating repo: /tmp
Info: installing from tarball 1.1.0-RC2
Info: downloading: https://owncloud.indigo-datacl...
I’m only showing the commands I ran. How you run them is up to you. I used lambdash for the sake of simplicity. You may want to spawn a script from the Node.js/Java/Python code you write for your functions.
Setting up & Running Ubuntu
This will download the image from Docker Hub to your /tmp.
$ python udocker pull ubuntu:17.04
Next step is important. You need to create a container AND set up the execution engine mode. Since PRoot isn’t working in Lambda (bug), I tried the second option.
$ python udocker create --name=ubuntu ubuntu:17.04
$ python udocker setup --execmode=F1 ubuntu
Finally, hold your breath and run it.
$ python udocker run --nosysdirs ubuntu cat /etc/lsb-release
That’s all. This is a small proof of concept. But you get the idea. Go and experiment with images and tell me what you think in the comments below.
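The steps above can be collected into one script that skips the slow part on warm starts. This is a sketch under stated assumptions: the function names and the DRY_RUN switch are hypothetical additions for illustration, while the udocker commands and the download URL are the ones used in this post.

```shell
UDOCKER_URL="https://raw.githubusercontent.com/indigo-dc/udocker/udocker-fr/udocker.py"

# run CMD...: execute CMD, or only print it when DRY_RUN is set (hypothetical switch).
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# bootstrap_udocker [WORKDIR]: set up udocker on a cold start, then run the container.
# The body is a subshell, so cd and HOME do not leak into the caller.
bootstrap_udocker() (
  workdir="${1:-/tmp}"
  cd "$workdir" || exit 1
  export HOME="$workdir"
  # Cold start: the script is not there yet, so fetch it and prepare the container.
  if [ ! -f "$workdir/udocker" ]; then
    run curl -sSf -o udocker "$UDOCKER_URL" || exit 1
    run python udocker pull ubuntu:17.04 || exit 1
    run python udocker create --name=ubuntu ubuntu:17.04 || exit 1
    run python udocker setup --execmode=F1 ubuntu || exit 1
  fi
  run python udocker run --nosysdirs ubuntu cat /etc/lsb-release
)

# Preview the commands without touching the network:
# DRY_RUN=1 bootstrap_udocker /tmp
```

Checking only for the udocker file is a deliberately crude warm-start test; a failed pull would leave the script in place without a usable container, so a real function would want a stricter check.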
This Isn’t a Good Idea. But It Was Fun to Build
It’s a pain in the ass. Downloading udocker with the Docker image every 4 hours. Sacrificing startup time. Clogging your /tmp up to 100% and so on. (Not anymore, read UPD below).
But it’s possible if you have a specific use case. Who knows, maybe this post will help bring about native Docker support from AWS (#awswishlist I am looking at you).
I believe it will inspire the community to build the next crazy ideas. Like these folks inspired me:
P.S. Thanks for reading. Click 💚 and subscribe if you want more ;)
P.P.S. UPD: A month before this post, Germán Moltó developed a framework implementing similar ideas of running Docker in Lambda!
Meet SCAR: Serverless Container-aware ARchitectures!