Docker Compose + GPU + TensorFlow = ❤️

Written by deepsystems | Published 2017/08/22
Tech Story Tags: docker | deep-learning | tensorflow | docker-compose | gpu

Docker is awesome — more and more people are leveraging it for development and distribution. Instant environment setup, platform independent apps, ready-to-go solutions, better version control, simplified maintenance: Docker has a lot of benefits.

But when it comes to data science and deep learning, there is a certain hitch. You have to memorize all those docker flags to share ports and files between host and container, create unnecessary run.sh scripts and deal with CUDA versions and GPU sharing. If you have ever seen this error, you know the pain:

$ nvidia-smi
Failed to initialize NVML: Driver/library version mismatch

Our goal

The purpose of this short post is to introduce you to a sufficient set of Docker utilities and the GPU-ready boilerplate we often use at our company.

So, instead of this:

docker run \
  --rm \
  --device /dev/nvidia0:/dev/nvidia0 \
  --device /dev/nvidiactl:/dev/nvidiactl \
  --device /dev/nvidia-uvm:/dev/nvidia-uvm \

You will end up with this:

doc up

Cool, right?

What we actually want to achieve:

  • Manage our application state (run, stop, remove) using one command
  • Save all those run flags to a single configuration file we can commit to a git repo
  • Forget about GPU driver version mismatch and sharing
  • Use GPU-ready containers in production tools like Kubernetes or Rancher

So here is the list of tools we highly recommend for every deep learner:

1. CUDA

First, you will need the CUDA toolkit. It’s an absolute must-have if you plan to train models yourself. We recommend the runfile installer rather than the deb package, because it won’t mess up your dependencies in future updates.

(Optional) How to check if it works:

cd /usr/local/cuda/samples/1_Utilities/deviceQuery
make
./deviceQuery # Should print "Result = PASS"
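If you went the runfile route, the toolkit binaries and libraries are not necessarily on your search paths yet. Here is a minimal sketch of the usual post-install step, assuming the default install prefix /usr/local/cuda (CUDA_HOME is just a convenience variable we introduce here; adjust it if you installed elsewhere). You would typically put these lines in ~/.bashrc:

```shell
# Assumption: CUDA was installed to the default prefix /usr/local/cuda.
CUDA_HOME="${CUDA_HOME:-/usr/local/cuda}"

# Prepend the toolkit binaries (nvcc and friends) to PATH.
export PATH="$CUDA_HOME/bin${PATH:+:$PATH}"

# Let the dynamic linker find the CUDA runtime libraries.
export LD_LIBRARY_PATH="$CUDA_HOME/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```

After reloading your shell, nvcc --version should report the toolkit release.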

2. Docker

You don’t want to pollute your computer with tons of libraries and live in fear of broken-version hell. Also, you won’t have to build and install stuff yourself: usually, software is already built for you and packed into an image! Installing Docker is simple:

curl -sSL https://get.docker.com/ | sh

3. Nvidia Docker

A must-have utility from NVIDIA if you use Docker: it really simplifies using GPUs inside Docker containers.

Installation is really simple:

wget -P /tmp https://github.com/NVIDIA/nvidia-docker/releases/download/v1.0.1/nvidia-docker_1.0.1-1_amd64.deb
sudo dpkg -i /tmp/nvidia-docker*.deb

Now, instead of sharing nvidia devices every time like this:

docker run --rm --device /dev/nvidia0:/dev/nvidia0 --device /dev/nvidiactl:/dev/nvidiactl --device /dev/nvidia-uvm:/dev/nvidia-uvm 

you can use the nvidia-docker command:

nvidia-docker run --rm nvidia/cuda nvidia-smi

Also, you can stop worrying about driver version mismatch: the Docker plugin from Nvidia mounts the host’s driver libraries into the container, so they always match the driver on the host.
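If you are curious what nvidia-docker adds for you, you can ask the plugin directly. This is a rough sketch, assuming nvidia-docker-plugin 1.0 is running on its default address localhost:3476 and exposes the /docker/cli endpoint described in the nvidia-docker 1.0 wiki. NV_PLUGIN_URL and both helper names are ours, not part of the tool:

```shell
# Assumption: nvidia-docker-plugin 1.0 is running on its default address.
NV_PLUGIN_URL="${NV_PLUGIN_URL:-http://localhost:3476}"

# Hypothetical helper: fetch the extra `docker run` flags from the plugin.
nvidia_docker_flags() {
    curl -s "$NV_PLUGIN_URL/docker/cli"
}

# Hypothetical helper: read a space-separated flag list on stdin and print
# one flag per line, keeping only flags that start with the given prefix.
filter_flags() {
    tr ' ' '\n' | grep -e "^$1"
}

# Usage (needs the plugin running):
#   nvidia_docker_flags | filter_flags --device=
#   nvidia_docker_flags | filter_flags --volume=
```

The device and volume flags it prints are the ones you would otherwise have to type by hand.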

4. Docker Compose

A super useful utility that lets you store your docker run configuration in a file and manage application state more easily. Though it was designed to “compose” multiple Docker containers together, Docker Compose is still very useful when you only have one service. Pick a stable version from the releases page:

curl -L https://github.com/docker/compose/releases/download/1.15.0/docker-compose-`uname -s`-`uname -m` > /usr/local/bin/docker-compose
chmod +x /usr/local/bin/docker-compose

5. Nvidia Docker Compose

Unfortunately, Docker Compose doesn’t know that Nvidia Docker exists. Luckily, there is a solution: a tiny Python script that generates a configuration using the nvidia-docker driver. Install it using pip:

pip install nvidia-docker-compose

Now you can use the nvidia-docker-compose command instead of docker-compose.

Alternative

If you don’t want to use nvidia-docker-compose, you can pass the volume driver manually. Just add these options to your docker-compose.yml:

# Your nvidia driver version here
volumes:
  nvidia_driver_375.26:
    external: true
...
    volumes:
      - nvidia_driver_375.26:/usr/local/nvidia:ro
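The volume name has to match the driver installed on the host. Here is a small sketch of deriving it, assuming nvidia-smi is available on the host and that the volume follows the nvidia_driver_<version> naming shown above. Both function names are ours:

```shell
# Hypothetical helper: build the volume name for a given driver version.
nvidia_volume_name() {
    printf 'nvidia_driver_%s\n' "$1"
}

# Ask the driver for its version (first GPU only); requires nvidia-smi on the host.
host_driver_version() {
    nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1
}

# Usage on a GPU host:
#   nvidia_volume_name "$(host_driver_version)"
```

Because the volume is declared as external, Compose expects it to exist before you bring the service up, and will refuse to start otherwise.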

6. Bash aliases

But nvidia-docker-compose is 21 characters to type! That’s too much.

Luckily, we can use bash aliases. Open ~/.bashrc (sometimes ~/.bash_profile) in your favorite editor and add these lines:

alias doc='nvidia-docker-compose'
alias docl='doc logs -f --tail=100'

Update your settings by running source ~/.bashrc.
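One caveat: by default bash expands aliases only in interactive shells, so these shortcuts will not work inside scripts. Shell functions are one way around that. This is a minimal sketch, not a replacement for the aliases above. DOC_BIN is a variable we introduce here so the underlying command can be swapped (for example, back to plain docker-compose on a machine without a GPU):

```shell
# Drop any aliases with the same names so the function definitions parse cleanly.
unalias doc docl 2>/dev/null || true

# Run the compose wrapper; DOC_BIN is a hypothetical override point
# that defaults to nvidia-docker-compose.
doc() {
    "${DOC_BIN:-nvidia-docker-compose}" "$@"
}

# Follow logs starting from the last 100 lines; extra arguments
# (for example a service name) are passed through.
docl() {
    doc logs -f --tail=100 "$@"
}
```

With these in place, DOC_BIN=docker-compose doc up would run the plain Compose binary with the same shortcut.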

Start a TensorFlow service

Now we are ready to benefit from all the tools above. For example, let’s run a GPU-enabled TensorFlow Docker container.

In a project directory, create a file named docker-compose.yml with the following content:

version: '3'

services:
  tf:
    image: gcr.io/tensorflow/tensorflow:latest-gpu
    ports:
      - 8888:8888
    volumes:
      - .:/notebooks

Now we can start TensorFlow with Jupyter using a single command:

doc up

doc is an alias for nvidia-docker-compose: it generates a modified configuration file, nvidia-docker-compose.yml, with the correct volume driver and then runs docker-compose.
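Once the service is up, it is worth confirming that TensorFlow inside the container actually sees the GPU. This is a hedged sketch, assuming the service is named tf as in the file above, that the image ships TensorFlow 1.x (where device_lib.list_local_devices is available), and that nvidia-docker-compose forwards exec to docker-compose like its other subcommands. COMPOSE_BIN and the function name are ours:

```shell
# Hypothetical helper: list the devices TensorFlow can see inside the `tf` service.
# COMPOSE_BIN is an override point we introduce; it defaults to nvidia-docker-compose.
tf_list_devices() {
    "${COMPOSE_BIN:-nvidia-docker-compose}" exec tf python -c \
        'from tensorflow.python.client import device_lib; print([d.name for d in device_lib.list_local_devices()])'
}

# Usage (with the service running):
#   tf_list_devices
```

A GPU entry in the printed list means the device was passed through. If only CPU entries show up, the driver volume or the device nodes were probably not mounted.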

You can manage your service using the same command:

doc logs
doc stop
doc rm
# ...etc

Conclusion

But is it worth the effort? Let’s weigh the pros and cons here.

Pros

  • Forget about GPU device sharing
  • You don’t have to worry about Nvidia driver version anymore
  • We got rid of command flags in favour of clean and plain configuration
  • No more --name flag to manage container state
  • Well-known, documented, and widely used utilities
  • Your configuration is ready for orchestration tools like Rancher that understand docker-compose files (for Kubernetes, a converter such as Kompose is typically needed)

Cons

  • You have to install more tools

Is it production-ready?

Yep. In our movie recommendation service, Movix, we use a GPU-accelerated TensorFlow network to compute film selections in real time based on user input.

We have three computers with Nvidia Titan X GPUs in a Rancher cluster behind a proxy API. Configuration is stored in regular docker-compose.yml files, so it’s really easy to set up a development environment or deploy the application on a new server. So far it works perfectly.

Be prepared for the future of ML!

If you have any questions or comments, feel free to write here or on Twitter @deepsystemsru.


Published by HackerNoon on 2017/08/22