Today we’re going to build our own Deep Learning Dream Machine.
This machine will slice through neural networks like a hot laser through butter. Short of forking over $129,000 for Nvidia's DGX-1, the AI supercomputer in a box, you simply can't get better performance than what I'll show you right here.
Before we dig into building a DL beast, I want to give you the easiest upgrade path.
If you don’t want to build an entirely new machine, you still have one perfectly awesome option.
Simply upgrade your GPU (to either a Titan X or a GTX 1080) and get VMware Workstation, or another virtualization package that supports GPU acceleration. Alternatively, install Ubuntu on bare metal and, if you still need a Windows machine, run Windows in a VM so you max out your performance for deep learning.
Install Ubuntu and the DL frameworks using the tutorial at the end of the article and bam! You just bought yourself a deep learning superstar on the cheap!
All right, let’s get to it.
I'll mark dream machine parts and budget parts like so: MINO (Money Is No Object) for the dream picks and ADAD (A Dollar And a Dream) for the budget picks.
CPUs are no longer the center of the universe. AI applications have flipped the script. If you've ever built a custom rig for gaming, you probably pumped it up with the baddest Intel chips you could get your hands on.
But times change.
The most important component of any deep learning world destroyer is the GPU(s).
While AMD has made headway in cryptocoin mining over the last few years, it has yet to make its mark on AI. That will change soon, as AMD races to capture a piece of this exploding field, but for now Nvidia is king. And don't sleep on Intel either: they purchased Nervana Systems and plan to put out their own deep learning ASICs in 2017.
The king of DL GPUs
Let’s start with MINO. The ultimate GPU is the Titan X. It has no competition.
It's packed with 3,584 CUDA cores at 1,531 MHz and 12GB of GDDR5X, and it boasts a memory speed of 10 Gbps.
In DL, cores matter and so does more memory close to those cores.
DL is really nothing but a lot of linear algebra. Think of it as an insanely large Excel sheet. Crunching all those numbers would slaughter a standard 4 or 8 core Intel CPU.
Moving data in and out of memory is a massive bottleneck, so more memory on the card makes all the difference, which is why the Titan X is the king of the world.
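To make that concrete, here's a toy sketch in plain Python of the single operation deep learning repeats billions of times during training: a matrix multiply. The sizes and numbers here are made up purely for illustration; a real network has millions of weights, which is exactly why thousands of GPU cores beat a handful of CPU cores.

```python
# Toy illustration: deep learning boils down to repeated matrix multiplies.
# A GPU runs thousands of these multiply-accumulate steps in parallel;
# a 4 or 8 core CPU has to grind through them nearly serially.

def matmul(a, b):
    """Multiply matrix a (m x n) by matrix b (n x p) the naive way."""
    n = len(b)
    p = len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(p)]
            for i in range(len(a))]

# A tiny "layer": 2 inputs flowing into 3 neurons (weights are invented).
inputs = [[1.0, 2.0]]            # 1 x 2
weights = [[0.5, -1.0, 0.25],    # 2 x 3
           [1.5,  0.0, -0.5]]

activations = matmul(inputs, weights)
print(activations)  # [[3.5, -1.0, -0.75]]
```

Every one of those multiply-and-sum steps is independent of the others, which is why the work parallelizes so well across CUDA cores.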
You can get the Titan X directly from Nvidia for $1,200 MSRP. Unfortunately, you're limited to two. But this is a Dream Machine and we're buying four. That's right, quad SLI!
For the other two you'll pay a slight premium from a third-party seller. Feel free to get two from Nvidia and two from Amazon. That brings you to about $5,300, by far the bulk of the cost of this workstation.
Now if you’re just planning to run Minecraft, it’ll still look blocky but if you want to train a model to beat cancer, these are your cards. :)
Gaming hardware benchmark sites will tell you that anything more than two cards is well past the point of diminishing returns, but that's just for gaming! When it comes to AI you'll want to hurl as many cards at it as you can. Of course, AI has its point of diminishing returns too, but it's closer to dozens or hundreds of cards (depending on the algorithm), not four. So stack up, my friend.
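As a rough mental model of that scaling, here's a toy sketch. The 5% per-card overhead figure is a number I invented purely for illustration; real multi-GPU scaling depends heavily on the algorithm, the framework, and how often the cards have to synchronize.

```python
# Toy scaling model: each extra card adds compute but also a little
# communication overhead. The 5% overhead is an assumption for
# illustration only; real numbers vary wildly by workload.

def speedup(n_gpus, overhead=0.05):
    """Idealized speedup for n_gpus with a fixed per-card sync cost."""
    return n_gpus / (1 + overhead * (n_gpus - 1))

for n in (1, 2, 4, 8):
    print(n, "GPUs ->", round(speedup(n), 2), "x")
```

The point of the sketch: each card still helps well past four, even though each one helps a little less than the last.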
Please note you will NOT need an SLI bridge, unless you’re also planning to use this machine for gaming. That’s strictly for graphics rendering and we’re doing very little graphics here, other than plotting a few graphs in matplotlib.
Budget-Friendly Alternative GPUs
Your ADAD card is the GeForce GTX 1080 Founders Edition. The 1080 packs 2560 CUDA cores, a lot less than the Titan X, but it rings in at half the price, with an MSRP of $699.
It also boasts less RAM, at 8GB versus 12.
EVGA has always served me well so grab four of them for your machine. At $2796 vs $5300, that’s a lot of savings for nearly equivalent performance.
The second best choice for ADAD is the GeForce GTX 1070. It packs 1,920 CUDA cores, so it's still a great choice. It comes in at around $499 MSRP, but superclocked EVGA 1070s will run you only $389, which brings the total for four to a more budget-friendly $1,556. Very doable.
Of course if you don’t have as much money to spend you can always get two or three cards. Even one will get you moving in the right direction.
Let’s do the math on best bang for the buck with two or three cards:
The sweet spot is three GTX 1080s. Compared with three Titan X cards, you're only down 3,072 cores at a little over half the price. Full disclosure: that's how I built my workstation.
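Here's that bang-for-the-buck math spelled out, using the MSRPs and the superclocked EVGA 1070 price quoted above:

```python
# Cores per dollar for the cards discussed above, at the article's prices.
cards = {
    "Titan X":  {"cores": 3584, "price": 1200},
    "GTX 1080": {"cores": 2560, "price": 699},
    "GTX 1070": {"cores": 1920, "price": 389},  # superclocked EVGA price
}

for name, c in cards.items():
    print(f"{name}: {c['cores'] / c['price']:.2f} CUDA cores per dollar")

# Three GTX 1080s: 7,680 cores for $2,097, versus 10,752 cores for
# $3,600 with three Titan X cards. You give up 3,072 cores and save $1,503.
```

By raw cores per dollar the 1070 actually wins, but the 1080's extra cores and memory bandwidth per card are what make it the sweet spot.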
SSD and Spinning Drive
You'll want an SSD, especially if you're building convolutional neural nets and working with lots of image data. The Samsung 850 EVO 1TB is the best of the best right now. Even better, SSD prices have plummeted in the last year, so it won't break the bank. The 850 1TB currently comes in at about $319.
The ADAD version of the 850 is the 250GB version. It’s very easy on the wallet at $98.
You’ll also want a spindle drive for storing downloads. Datasets can be massive in DL. A 4 TB Seagate Barracuda will do the trick.
Because we want to stuff four GPUs into this box your motherboard options narrow to a very small set of choices. To support four cards at full bus speeds we want the MSI Extreme Gaming X99A SLI Plus.
You can also go with the ASUS X99 Deluxe II.
If you go with fewer than four cards you have many more options. When it comes to motherboards, I favor stability. I learned this the hard way building cryptocoin mining rigs. If you run your GPUs constantly they'll burn your machine to the ground in no time. Gigabyte makes an excellent line of very durable motherboards. The X99 Ultra Gaming is absolutely rock solid and comes in at $237.
The Cooler Master Cosmos II is the ultimate full-tower case. Its sleek, stylish racecar design of brushed aluminum and steel makes for one beautiful machine.
If you want a mid-tower case, you can’t go wrong with the Cooler Master Maker 5T.
I never favor getting a cheap-ass case for any machine. As soon as you have to open it to troubleshoot it, your mistake becomes glaringly clear. Tool-less cases are ideal. But there are plenty of decent budget cases out there so do your homework.
Your deep learning machine doesn’t need much CPU power. Most apps are single threaded as they load the data into the GPUs where they do multicore work, so don’t bother spending a lot of capital here.
That said, you might as well get the fastest clock speed for your processor, which is 4GHz on the i7-6700K. You can snag it bundled with a fan. Frankly, it's ridiculous overkill here, but prices have dropped drastically and I was looking for single-threaded performance. This is the CPU to beat.
If you want to go quieter then you can go with watercooling but you won’t be running the CPU that hard. Most of the fan noise will come from the GPUs.
There’s no great ADAD alternative here. The i5 at 3.5GHz with a water cooler runs about the same cost as the 4GHz so why bother?
The EVGA Modular 1600W Supernova G2 power supply is your best bet for a quad SLI setup. It will run you about $305.
The Titan Xs pull about 250 watts each, which brings you to 1,000W easy. That doesn't leave much overhead for the CPU, memory, and the rest of the system, so go with the biggest supply to leave some headroom.
If you're rocking fewer cards, then go with the 1300W version, which drops the price to a more manageable $184.
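The back-of-the-envelope power budget looks like this. The GPU figure is Nvidia's 250W number from above; the other wattages are ballpark estimates I'm assuming for illustration (the 91W is the i7-6700K's rated TDP):

```python
# Rough power budget for the quad Titan X build. GPU draw comes from the
# 250W figure above; the non-GPU numbers are ballpark estimates.
gpus        = 4 * 250   # 1000W for four Titan X cards
cpu         = 91        # i7-6700K rated TDP
drives_fans = 50        # SSD, spindle drive, case fans (estimate)
memory_misc = 40        # RAM, motherboard, USB, etc. (estimate)

total = gpus + cpu + drives_fans + memory_misc
print(f"Estimated draw: {total}W")
print(f"Headroom on a 1600W PSU: {1600 - total}W")
```

On a 1300W supply that same build leaves barely 100W of slack, which is why four cards push you to the 1600W unit.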
Now that we’re done with the hardware, let’s get to the software setup.
You have three options:
If you want to go with the Docker option, you’ll want to start with the official Nvidia-Docker project as a foundation. However to really get all of the frameworks, libraries and languages you’ll have to do a lot of installation on top of this image.
You can go with an all-in-one deep learning container, like this one on GitHub.
I wanted to love the all-in-one Docker image, but it has a few issues, no surprise considering the complexity of the setup.
I found the answer to one issue (libopenjpeg2 is now libopenjpeg5 on Ubuntu 16.04 LTS) but I got tired of troubleshooting a second one. I’m still waiting on fixes. If you’re the type of person who likes fixing Dockerfiles and submitting fixes on GitHub, I encourage you to support the all-in-one project.
A second major challenge is that it’s a very, very big image, so it won’t fit on Dockerhub due to timeouts. That means you’ll have to build it yourself and that can take several hours of compiling and pulling layers and debugging, which is about as much time as you need to do it bare metal.
Lastly, it doesn’t include everything I wanted, including Anaconda Python.
In the end I decided to use the all-in-one bare metal tutorial as a guide, while updating it and adding my own special sauce.
As I noted in the TL;DR section at the beginning of the doc, you can absolutely upgrade a current gaming machine, add VMware Workstation Pro, which supports GPU passthrough, and have a nice way to get started on a shoestring. This is a strong budget-friendly strategy. It also has several advantages, in that you can easily backup the virtual machine, snapshot and roll it back. It doesn’t start as fast as a Docker container, but VM tech is very mature at this point and that gives you a lot of tools and best practices.
This is the option I ended up going with on my machine. It’s a little old school, but as a long time sys-admin it made the most sense to me, as it gave me the ultimate level of control.
A few things of note about the software for deep learning before we get started.
You’ll find that the vast majority of AI research is done in Python. That’s because it’s an easy language to learn and setup. I’m not sure that Python will end up as the primary language once AI moves into production but for now Python is the way to go. A number of the major frameworks run on top of it and its scientific libraries are second to none.
The R language gets a lot of love too, as well as Scala, so we will add those to the equation.
Here are a list of the major packages we’ll set up in this tutorial:
Drivers and APIs
There are a whole host of libraries that pretty much any scientific computing system needs to run effectively, so let's install the most common ones right off the bat.
Pip = an installer and packaging system for Python
Pandas = high-performance data analysis
Scikit-learn = a popular and powerful machine learning library
NumPy = numerical Python
Matplotlib = visualization library
SciPy = math and scientific computing
IPython = interactive Python
Scrapy = web crawling framework
NLTK = natural language toolkit
Pattern = a web mining library
Seaborn = statistical visualization
OpenCV = a computer vision library
Rpy2 = an R interface
PyGraphviz = graph visualization
OpenBLAS = linear algebra

Linux Workstation Setup
For cutting-edge work, you’ll want to get the latest version of Ubuntu LTS, which is 16.04 at the time of writing. I’m looking forward to the days when more of the tutorials cover Red Hat and Red Hat derivatives like CentOS and Scientific Linux but as of now Ubuntu is where it’s at for deep learning. I may follow up with an RH centric build as well.
Get Ubuntu burned to a USB stick via Rufus.
Get it installed in UEFI mode.
Your first boot will go to a black screen. That’s because the open source drivers are not up to date with the latest and greatest chipsets. To fix that you’ll need to do the following:
As the machine boots, get to a TTY:
Ctrl + Alt + F1
Get the latest Nvidia drivers and reboot:
1. Log into your root account in the TTY.
2. Run sudo apt-get purge nvidia-*
3. Run sudo add-apt-repository ppa:graphics-drivers/ppa and then sudo apt-get update
4. Run sudo apt-get install nvidia-375
5. Reboot and your graphics issue should be fixed.

Update the machine
Open a terminal and type the following:
sudo apt-get update -y
sudo apt-get upgrade -y
sudo apt-get install -y build-essential cmake g++ gfortran git pkg-config python-dev software-properties-common wget
sudo apt-get autoremove
sudo rm -rf /var/lib/apt/lists/*
Download CUDA 8 from Nvidia. Go to the downloads directory and install CUDA:
sudo dpkg -i cuda-repo-ubuntu1604-8-0-local.deb
sudo apt-get update -y
sudo apt-get install -y cuda
Add CUDA to the environment variables:
echo 'export PATH=/usr/local/cuda/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc
Check to make sure the correct version of CUDA is installed:

nvcc --version
Restart your computer:
sudo shutdown -r now
Check your CUDA Installation
First install the CUDA samples:
/usr/local/cuda/bin/cuda-install-samples-*.sh ~/cuda-samples
cd ~/cuda-samples/NVIDIA*Samples
make -j $(($(nproc) + 1))
Note that the -j flag in the make command sets the number of parallel build jobs: $(nproc) is your CPU core count, and the +1 keeps the cores saturated while jobs wait on disk, so the compile moves a lot faster on a multicore machine. It has nothing to do with how many GPUs you have.
Run deviceQuery and ensure that it detects your graphics card and that the tests pass:

~/cuda-samples/NVIDIA*Samples/bin/x86_64/linux/release/deviceQuery
cuDNN is a GPU accelerated library for DNNs. Unfortunately, you can’t just grab it from a repo. You’ll need to register with Nvidia to get access to it, which you can do right here. It can take a few hours or a few days to get approved for access. Grab version 4 and version 5. I installed 5 in this tutorial.
You will want to wait until you get this installed before moving on, as other frameworks depend on it and may fail to install.
Extract and copy the files:
cd ~/Downloads/
tar xvf cudnn*.tgz
cd cuda
sudo cp */*.h /usr/local/cuda/include/
sudo cp */libcudnn* /usr/local/cuda/lib64/
sudo chmod a+r /usr/local/cuda/lib64/libcudnn*
Do a check by typing:

nvidia-smi
That should output some GPU stats.
sudo apt-get install -y python-pip python-dev
sudo apt-get update && sudo apt-get install -y python-numpy python-scipy python-nose python-h5py python-skimage python-matplotlib python-pandas python-sklearn python-sympy libfreetype6-dev libpng12-dev libopenjpeg5
sudo apt-get clean && sudo apt-get autoremove
sudo rm -rf /var/lib/apt/lists/*
Now install the rest of the libraries with Pip
pip install seaborn rpy2 opencv-python pygraphviz pattern nltk scrapy
pip install tensorflow-gpu
That’s it. Awesome!
$ python
...
>>> import tensorflow as tf
>>> hello = tf.constant('Hello, TensorFlow!')
>>> sess = tf.Session()
>>> print(sess.run(hello))
Hello, TensorFlow!
>>> a = tf.constant(10)
>>> b = tf.constant(32)
>>> print(sess.run(a + b))
42
>>>
sudo apt-get install -y libblas-test libopenblas-base libopenblas-dev
Jupyter is an awesome code-sharing format that lets you easily share "notebooks" with code and tutorials. I will detail using it in the next post.
pip install -U ipython[all] jupyter
Install the pre-requisites and install Theano.
sudo apt-get install -y python-numpy python-scipy python-dev python-pip python-nose g++ python-pygments python-sphinx
sudo pip install Theano
Yes, that's a capital T in Theano.
Test your Theano installation. There should be no warnings/errors when the import command is executed.
python
>>> import theano
>>> exit()
nosetests theano
Keras is an incredibly popular high-level abstraction wrapper that can surf on top of Theano and TensorFlow. Its installation and usage are so dead simple it's not even funny.
sudo pip install keras
Lasagne is another widely used high-level wrapper that's a bit more flexible than Keras, in that you can easily color outside the lines. Think of Keras as deep learning on rails and Lasagne as the next step in your evolution. The instructions for the Lasagne install come from here.
pip install -r https://raw.githubusercontent.com/Lasagne/Lasagne/v0.1/requirements.txt
Installing MXNet on Ubuntu
From the website:
MXNet currently supports Python, R, Julia, and Scala. For users of Python and R on Ubuntu operating systems, MXNet provides a set of Git Bash scripts that installs all of the required MXNet dependencies and the MXNet library.
The simple installation scripts set up MXNet for Python and R on computers running Ubuntu 12 or later. The scripts install MXNet in your home folder ~/mxnet.
Install MXNet for Python
Clone the MXNet repository. In terminal, run the commands WITHOUT “sudo”:
git clone https://github.com/dmlc/mxnet.git ~/mxnet --recursive
We’re building with GPUs, so add configurations to config.mk file:
cd ~/mxnet
cp make/config.mk .
echo "USE_CUDA=1" >>config.mk
echo "USE_CUDA_PATH=/usr/local/cuda" >>config.mk
echo "USE_CUDNN=1" >>config.mk
Install MXNet for Python with all dependencies:
cd ~/mxnet/setup-utils
bash install-mxnet-ubuntu-python.sh
Add it to your path:
Install MXNet for R
We’ll need R so let’s do that now. The installation script to install MXNet for R can be found here. The steps below call that script after setting up the R language.
First add the R repo:
echo "deb http://cran.rstudio.com/bin/linux/ubuntu xenial/" | sudo tee -a /etc/apt/sources.list
Add R to the Ubuntu Keyring:
gpg --keyserver keyserver.ubuntu.com --recv-key E084DAB9
gpg -a --export E084DAB9 | sudo apt-key add -
sudo apt-get install r-base r-base-dev
Install R-Studio (altering the command for the correct version number):
sudo apt-get install -y gdebi-core
wget https://download1.rstudio.org/rstudio-0.99.896-amd64.deb
sudo gdebi -n rstudio-0.99.896-amd64.deb
rm rstudio-0.99.896-amd64.deb
Now install MXNet for R:
cd ~/mxnet/setup-utils
bash install-mxnet-ubuntu-r.sh
These instructions come from the Caffe website. I found them to be a little flaky depending on how the wind was blowing that day, but your mileage may vary. Frankly, I don’t use Caffe all that much and many of the beginner tutorials out there won’t focus on it, so if this part screws up for you, just skip it for now and come back to it.
Install the prerequisites:
sudo apt-get install -y libprotobuf-dev libleveldb-dev libsnappy-dev libopencv-dev libhdf5-serial-dev protobuf-compiler sudo apt-get install -y --no-install-recommends libboost-all-dev sudo apt-get install -y libgflags-dev libgoogle-glog-dev liblmdb-dev
Clone the Caffe repo:
mkdir -p ~/git
cd ~/git
git clone https://github.com/BVLC/caffe.git
cd caffe
cp Makefile.config.example Makefile.config
To use cuDNN set the flag USE_CUDNN := 1 in the Makefile:
sed -i 's/# USE_CUDNN := 1/USE_CUDNN := 1/' Makefile.config
Modify the BLAS parameters value to open:
sed -i 's/BLAS := atlas/BLAS := open/' Makefile.config
Install the requirements, then build Caffe, build the tests, run the tests, and ensure that all tests pass. Note that all this takes some time. As before, -j $(($(nproc) + 1)) sets the number of parallel build jobs to your CPU core count plus one; it speeds up the build and has nothing to do with GPU count.
sudo pip install -r python/requirements.txt
make all -j $(($(nproc) + 1))
make test -j $(($(nproc) + 1))
make runtest -j $(($(nproc) + 1))
Build PyCaffe, the Python interface to Caffe:
make pycaffe -j $(($(nproc) + 1))
Add Caffe to your environment variable:
echo "export CAFFE_ROOT=$(pwd)" >> ~/.bashrc
echo 'export PYTHONPATH=$CAFFE_ROOT/python:$PYTHONPATH' >> ~/.bashrc
source ~/.bashrc
Test to ensure that your Caffe installation is successful. There should be no warnings/errors when the import command is executed.
ipython
>>> import caffe
>>> exit()
Here are the Torch install instructions from the Torch website. I've had some struggles installing this framework, but the following usually works for most people.
git clone https://github.com/torch/distro.git ~/git/torch --recursive
cd ~/git/torch
bash install-deps
./install.sh
sudo apt-get -y install scala
Download Anaconda for Python 3.6 right here. There's a 2.7.x version as well.
sudo bash Anaconda3-4.3.0-Linux-x86_64.sh
Do NOT add it to your bashrc or when you reboot Python will default to Anaconda. It is set to “no” by default in the script but you might be tempted to do it as I was at first. Don’t. You’ll want to keep the default pointed to Ubuntu’s Python as a number of things are dependent on it.
Besides, Anaconda lets you create environments that make it easy to move back and forth between Python versions.
Let’s create two Anaconda environments:
conda create -n py2 python=2.7
conda create -n py3 python=3.6
Activate the Python 3 environment:
source activate py3
Now let’s install all the packages for Anaconda:
conda install pip pandas scikit-learn scipy numpy matplotlib ipython-notebook seaborn opencv scrapy nltk pattern
Now we install pygraphviz and the R bridge with pip which aren’t in Conda:
pip install pygraphviz rpy2
sudo shutdown -r now
Install Tensorflow, Theano, and Keras for Anaconda
You’ll install these libraries for both the Python 2 and 3 versions of Anaconda. You may get better performance using the Anaconda backed libraries, as they contain performance optimizations.
Let’s do Python 3 first:
source activate py3
pip install tensorflow Theano keras
Now deactivate the py3 environment and activate the py2 one:

source deactivate
source activate py2
Install for py2:
pip install tensorflow Theano keras
Deactivate the environment:

source deactivate
Now you're back in the standard Ubuntu shell, with the built-in Python 2.7.x and all the frameworks we installed for Ubuntu's stock Python.
There you have it. You’ve purchased a top notch machine or a budget-friendly alternative. You’ve also got it setup with the latest and greatest software for deep learning.
Now get ready to do some heavy number crunching. Dig up a tutorial and get to work! Be on the lookout for the next article in my series, which dives into my approach to the Kaggle Data Science Bowl 2017, a race to beat lung cancer for a chance at prizes totaling one million dollars.
Again, be sure to check out the other articles in this series if you missed them:
Learning AI if You Suck at Math — Part 1 — This article guides you through the essential books to read if you were never a math fan but you’re learning it as an adult.
Learning AI if You Suck at Math — Part 2 — Practical Projects — This article guides you through getting started with your first projects.
Learning AI if You Suck at Math — Part 3 — Building an AI Dream Machine — This article guides you through getting a powerful deep learning machine setup and installed with all the latest and greatest frameworks.
Learning AI if You Suck at Math — Part 4 — Tensors Illustrated (with Cats!) — This one answers the ancient mystery: What the hell is a tensor?
Learning AI if You Suck at Math — Part 5 — Deep Learning and Convolutional Neural Nets in Plain English — Here we create our first Python program and explore the inner workings of neural networks!
Learning AI if You Suck at Math — Part 6 — Math Notation Made Easy — Still struggling to understand those funny little symbols? Let’s change that now!
Learning AI if You Suck at Math — Part 7 — The Magic of Natural Language Processing — Understand how Google and Siri understand what you’re mumbling.
If you love my work please do me the honor of visiting my Patreon page because that’s how we change the future together. Help me disconnect from the Matrix and I’ll repay your generosity a hundred fold by focusing all my time and energy on writing, research and delivering amazing content for you and world.
If you enjoyed this tutorial, I'd love it if you could clap it up to recommend it to others. After that, please feel free to email the article off to a friend! Thanks much.
A bit about me: I’m an author, engineer and serial entrepreneur. During the last two decades, I’ve covered a broad range of tech from Linux to virtualization and containers.
You can check out my latest novel, an epic Chinese sci-fi civil war saga where China throws off the chains of communism and becomes the world’s first direct democracy, running a highly advanced, artificially intelligent decentralized app platform with no leaders.
You can get a FREE copy of my first novel, The Scorpion Game, when you join my Readers Group. Readers have called it "the first serious competition to Neuromancer" and "Detective noir meets Johnny Mnemonic."

Lastly, you can join my private Facebook group, the Nanopunk Posthuman Assassins, where we discuss all things tech, sci-fi, fantasy and more.
I occasionally make coin from the links in my articles but I only recommend things that I OWN, USE and LOVE. Check my full policy here.
Thanks for reading