Bringing the Udacity Self-Driving Car Nanodegree to Google Cloud Platform. The step-by-step guide.
Countdown to the future?
Back in June I became a student of the Udacity Self-Driving Car Nanodegree program. It's a nine-month-long curriculum that teaches you everything you need to know to work on self-driving cars.
Our first project was to detect lane lines in a video feed, and most of the students from my batch are now deep into the deep learning classes.
These classes rely on Jupyter notebooks running TensorFlow programs, and I learned the hard way that the GeForce GT 750M in my MacBook Pro and its 384 CUDA cores were not going to cut it.
Time to move on to something beefier.
All students get AWS credits as part of the Udacity program. It's the de facto platform for the course, and all our guides and documentation are based on it.
However, I'm a long-time fan of GCP. I'm already using it for some of my personal projects and I'm constantly looking for new excuses to use it.
This time I wanted to take advantage of some nifty features of the platform, namely Preemptible VMs and sustained-use discounts.
I thought I might end up with a pretty cheap deep learning box and also (mostly?) it seemed like I would be in for a good learning experience and some fun.
After some research I quickly discovered that Preemptible VMs can't have GPUs attached, and that my predicted monthly usage was definitely not going to trigger the sustained-use discount.
Oh well. Looks like it's only going to be for the learning and the fun then.
Before we dive in, a quick note about pricing. I settled on the setup below for the needs of this course:
Total per hour: $1.20/hour
Total for average CarND student usage (15h/week): **$77**/month
Total for a full month (sustained-use discount applied): $784.40
I haven't done any benchmarking, nor did I try to optimise this setup in any way. It might be overly powerful for the purpose of this Nanodegree, or on the contrary not powerful enough, so feel free to tweak it if you have a better idea of what you are doing.
Let's get started.
Your first step will be to launch an instance with a GPU on Google Compute Engine. There are two versions of this guide: one for web UI lovers and another for people who don't like to leave their terminal.
In both cases this guide assumes you already have a Google account. If you don't, please create one here first.
Head to https://console.cloud.google.com.
If it's your first time using Google Cloud you will be greeted with the following page, which invites you to create your first project:
Google Cloud Console - homepage
Good news: if it's your first time using Google Cloud you are also eligible for $300 in credits! In order to get this credit, click on the big blue button "Sign up for free trial" in the top bar.
Click on "Create" and you will get to the page shown in the image below.
If you are already a GCP user, click on the project switcher in the top menu bar and click on the "+" button in the top right of the modal that shows up.
Choose a name for your project, agree to the terms of service (if you do!) and click on "Create" again.
If you are already using Google Cloud Platform you most likely already have a billing account and so you can select it on the project creation form.
Google Cloud Console - new project
If you are a new user, you will have to create a billing account once your project is created (this can sometimes take a little while, be patient!). This is as simple as giving your credit card details and address when you are prompted for them.
Use the project switcher in the menu to go to your project's dashboard.
Before you create your first instance you need to request a quota increase in order to attach a GPU to your machine. Click on the side menu (a.k.a. the infamous hamburger menu) and head to "IAM & Admin" > "Quotas".
Editing quotas - step 1
Display all the quotas using the "Quota type" filter above the table. Select "All quotas". Then use the "Region" filter to select your region.
Editing quotas - step 2
Look for "Google Compute Engine API - NVIDIA Tesla K80 GPUs" in the list, select it and click on "Edit quotas" at the top.
In the menu that opens on the right, enter your personal details if they are not already pre-filled and click on "Next". Fill in the form asking how many GPUs you want and the reason for requesting a quota increase with these details:
Editing quotas - step 3
After a while (it should be quite short) Google will approve your request. You are now ready to move on to the next step!
Editing quotas - result
Click on the side menu again and go to "Compute Engine" > "VM Instances".
You'll be greeted with a notice inviting you to create your first instance. Click on "Create".
You have already chosen the specs for your server so this form will be pretty straightforward:

- Name: drive01, but it's completely up to you!
- Zone: europe-west1-b. You're free to choose whichever zone you want, but it needs to be a zone with GPU support.
- Machine type: n1-standard-8, 8 vCPUs, 30GB RAM.
Create instance - step 1
Next you need to select an OS and add an SSD Persistent Disk to your instance. Click on "Change" in the boot disk section.
Create instance - step 2.a
Use these settings:

- OS image: Ubuntu 16.04 LTS.
- Boot disk type: SSD Persistent Disk.
- Size: 50GB.
Create instance - step 2.b
You are almost done.
Click on "Management, disks, networking, SSH keys" at the bottom of the form.
Create instance - step 3.a
In the "Networking" section, add the following network tag: jupyter. This will be useful later on when you configure the firewall.
Create instance - step 3.b
The last step is to attach your SSH key to the instance. Go to the "SSH Keys" section.
Copy and paste your SSH public key. It needs to have this format:
<protocol> <key-blob> <username>@example.com
This is very important, as <username> will end up being the username you use to access your server.
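If you don't have a key pair yet, here is one way to generate it (the file name my-ssh-key is just an example, and the -C comment is what sets the username at the end of the public key):

$ ssh-keygen -t rsa -f ~/.ssh/my-ssh-key -C [USERNAME]
$ cat ~/.ssh/my-ssh-key.pub

The second command prints the public key in the expected format so you can paste it into the form.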
Create instance - step 3.c
Click on "Create" at the bottom of the page.
Create instance - success
Note the instance's external IP for later.
In order to access your Jupyter notebooks you need to add a firewall rule to allow incoming traffic on port 8888.
Using the side menu, go to "VPC network" > "Firewall rules".
Firewall rules - step 1
Click on "Create firewall rule".
Firewall rules - step 2
Put these values in the form:

- Name: default-allow-jupyter.
- Network: default.
- Priority: 65534.
- Direction of traffic: Ingress. We are only interested in allowing incoming traffic.
- Action on match: Allow. We want to allow traffic that matches this specific rule.
- Targets: Specified target tags. We don't want to apply this rule to all instances in the project, only to a select few.
- Target tags: jupyter. Only the instances with the jupyter tag will have this firewall rule applied.
- Source filter: IP ranges.
- Source IP ranges: 0.0.0.0/0. Traffic coming from any network will be allowed in. If you want to make this more restrictive you can put your own IP address here.
- Protocols and ports: tcp:8888. Only allow TCP traffic on port 8888.
Firewall rules - step 3.a
Firewall rules - step 3.b
You can skip this section if you already started your server using the web-based console.
Follow the guide available on the Google Cloud Platform documentation site to install the latest version of the Cloud SDK for your OS.
Once the SDK is installed, create a new project for this guide:

$ gcloud projects create "drive-cli" --name "Drive CLI"
If your project ID is already taken feel free to append some random number to it.
Go to the following page to enable billing for your newly created project:
https://console.developers.google.com/billing/linkedaccount?project=[YOUR_PROJECT_ID]
Note: I haven't found a way to do this from the CLI, but if someone reading this knows one, please let me know in the comments.
Before you can start an instance with an attached GPU you need to ensure that you have enough GPUs available in your selected region. Failing that, you will need to request a quota increase here:
https://console.cloud.google.com/iam-admin/quotas?project=[PROJECT_ID]
Then you can follow the web version of this guide to request a quota increase.
In order to create an instance with one or more attached GPUs you need to have at least version 144.0.0 of the gcloud command-line tool and the beta components installed.
$ gcloud version
Google Cloud SDK 165.0.0
beta 2017.03.24
bq 2.0.25
core 2017.07.28
gcloud
gsutil 4.27
If you don't meet these requirements, run the following command:
$ gcloud components update && gcloud components install beta
Create a file with the format below anywhere on your local machine:
[USERNAME]:[PUBLIC KEY FILE CONTENT]
For instance (the key was shortened for practical reasons):
steve:ssh-rsa AAAAB3NzaC1y[...]c2EYSYVVw== steve@drive01
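If your public key already lives at ~/.ssh/my-ssh-key.pub, one way to build this file is the one-liner below (steve is a placeholder, use the username you want on the server):

$ echo "steve:$(cat ~/.ssh/my-ssh-key.pub)" > ~/drive_public_keys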
Time to create your instance:
$ gcloud beta compute instances create drive01 \
    --machine-type n1-standard-8 --zone europe-west1-b \
    --accelerator type=nvidia-tesla-k80,count=1 \
    --boot-disk-size 50GB --boot-disk-type pd-ssd \
    --image-family ubuntu-1604-lts --image-project ubuntu-os-cloud \
    --maintenance-policy TERMINATE --restart-on-failure \
    --metadata-from-file ssh-keys=~/drive_public_keys \
    --tags jupyter --project "[YOUR_PROJECT_ID]"
The output should be similar to this if the command runs successfully:
Created [https://www.googleapis.com/compute/beta/projects/drive-test/zones/europe-west1-b/instances/drive01].
NAME     ZONE            MACHINE_TYPE   PREEMPTIBLE  INTERNAL_IP  EXTERNAL_IP    STATUS
drive01  europe-west1-b  n1-standard-8               10.132.0.2   146.148.9.152  RUNNING
Write down the external IP.
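If you forget to do so, you can always look it up again with gcloud's --format flag (adjust the instance name and zone if you picked different ones):

$ gcloud compute instances describe drive01 --zone europe-west1-b \
    --format='get(networkInterfaces[0].accessConfigs[0].natIP)'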
$ gcloud beta compute firewall-rules create "default-allow-jupyter" \
    --network "default" --allow tcp:8888 \
    --direction "ingress" --priority 65534 \
    --source-ranges 0.0.0.0/0 \
    --target-tags "jupyter" --project "[YOUR_PROJECT_ID]"
Here is what the output should be:
Creating firewall...done.
NAME                   NETWORK  DIRECTION  PRIORITY  ALLOW     DENY
default-allow-jupyter  default  INGRESS    65534     tcp:8888
Now that your instance is launched and the firewall is correctly set up, it's time to configure the server.
Start by SSHing into your server using this command:
$ ssh -i ~/.ssh/my-ssh-key [USERNAME]@[EXTERNAL_IP_ADDRESS]
- [EXTERNAL_IP_ADDRESS] is the address you wrote down a couple of steps earlier.
- [USERNAME] is the username that was in your SSH key.
Since it's your first time connecting to this server you will get the following prompt. Type "yes".
The authenticity of host '146.148.9.152 (146.148.9.152)' can't be established.
ECDSA key fingerprint is SHA256:xll+CHEOChQygmnv0OjdFIiihAcx69slTjQYdymMLT8.
Are you sure you want to continue connecting (yes/no)? yes
Let's first update the system software and install some much-needed dependencies:
$ sudo apt-get update
$ sudo apt-get upgrade
$ sudo apt-get install -y build-essential
Luckily Ubuntu ships with both Python 2 and Python 3 pre-installed so you can move directly to the next step: installing Miniconda.
$ cd /tmp
$ curl -O https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
$ md5sum Miniconda3-latest-Linux-x86_64.sh
The output from the last command should be something like:
c1c15d3baba15bf50293ae963abef853 Miniconda3-latest-Linux-x86_64.sh
This is the MD5 sum of your downloaded file. You can compare it against the MD5 hashes found here to verify the integrity of your installer.
You can now run the install script:
$ bash Miniconda3-latest-Linux-x86_64.sh
This will give you the following output:
Welcome to Miniconda3 4.3.21 (by Continuum Analytics, Inc.)
In order to continue the installation process, please review the license
agreement.
Please, press ENTER to continue
>>>
Press [ENTER] to continue, and then press [ENTER] again to read through the licence. Once you have reached the end you will be prompted to accept it:
Do you approve the license terms? [yes|no]
>>>
Type "yes" if you agree.
The installer will now ask you to choose the location of your installation. Press [ENTER] to use the default location, or type a different location if you want to customise it:
Miniconda3 will now be installed into this location:
/home/steve/miniconda3

[/home/steve/miniconda3] >>>
The installation process will begin. Once it's finished, the installer will ask you whether you want it to prepend the install location to your PATH. This is needed to use the conda command in your shell, so type "yes" to agree.
...
installation finished.
Do you wish the installer to prepend the Miniconda3 install location
to PATH in your /home/steve/.bashrc ? [yes|no]
[no] >>>
Here is the final output from the installation process:
Prepending PATH=/home/steve/miniconda3/bin to PATH in /home/steve/.bashrc
A backup will be made to: /home/steve/.bashrc-miniconda3.bak

For this change to become active, you have to open a new terminal.

Thank you for installing Miniconda3!

Share your notebooks and packages on Anaconda Cloud!
Sign up for free: https://anaconda.org
Source your .bashrc file in order to complete the installation:
$ source ~/.bashrc
You can verify that Miniconda was successfully installed by typing the following command in your shell:
$ conda list
# packages in environment at /home/steve/miniconda3:
#
asn1crypto                0.22.0                   py36_0
cffi                      1.10.0                   py36_0
conda                     4.3.21                   py36_0
conda-env                 2.6.0                         0
...
In order to reap all the benefits of the Tesla K80 mounted in your server you need to install the NVIDIA CUDA Toolkit.
First, check that your GPU is properly installed:
$ lspci | grep -i nvidia
00:04.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
At the time of writing, the most recent version of the CUDA Toolkit supported by TensorFlow is 8.0.61-1, so I'll use this version throughout this guide. If you read this guide in the future, feel free to swap it with the newest version found on https://developer.nvidia.com/cuda-downloads.
$ cd /tmp
$ curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
Compare the MD5 sum of this file against the one published here.
$ md5sum cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
1f4dffe1f79061827c807e0266568731 cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
If they match, you can proceed to the next step:
$ sudo dpkg -i cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
$ sudo apt-get update
$ sudo apt-get install -y cuda-8-0
Run this command to add some environment variables to your .bashrc file:
$ cat <<EOF >> ~/.bashrc
export CUDA_HOME=/usr/local/cuda-8.0
export LD_LIBRARY_PATH=\${CUDA_HOME}/lib64
export PATH=\${CUDA_HOME}/bin:\${PATH}
EOF
Source your .bashrc file again:
$ source ~/.bashrc
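To quickly confirm the toolkit is now on your PATH, you can ask the CUDA compiler for its version; it should report release 8.0:

$ nvcc --version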
NVIDIA provides some sample programs that will allow us to test the installation. Copy them into your home directory and build one:
$ cuda-install-samples-8.0.sh ~
$ cd ~/NVIDIA_CUDA-8.0_Samples/1_Utilities/deviceQuery
$ make
If all goes well you can now run the deviceQuery utility to verify your CUDA installation.
$ ./deviceQuery
CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "Tesla K80"
  CUDA Driver Version / Runtime Version          8.0 / 8.0
  CUDA Capability Major/Minor version number:    3.7
  Total amount of global memory:                 11440 MBytes (11995578368 bytes)
  (13) Multiprocessors, (192) CUDA Cores/MP:     2496 CUDA Cores
...
You can also use the nvidia-smi utility to verify that the driver is running properly:
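$ nvidia-smi

If the driver is loaded correctly, the output should include the Tesla K80 along with the driver version and current memory usage.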
cuDNN is a GPU-accelerated library of primitives for deep neural networks provided by NVIDIA. It is required by TensorFlow when you install the version with GPU support.
TensorFlow 1.2.1 (the version we are using in this guide) only supports cuDNN 5.1.
In order to download the library you will have to register for an NVIDIA developer account here.
Once your account is created you can download cuDNN here (you will have to log in).
Agree to the terms and download the "cuDNN v5.1 for Linux" archive on your local computer. Be sure to use the version for CUDA 8.0.
NVIDIA - cuDNN v5.1
Run these commands to go into the directory where the archive was downloaded (in my case it's ~/Downloads) and upload it to your server:

$ cd ~/Downloads/
$ scp -i ~/.ssh/my-ssh-key cudnn-8.0-linux-x64-v5.1.tgz [USERNAME]@[EXTERNAL_IP_ADDRESS]:/tmp
Once it's successfully uploaded, uncompress and copy the cuDNN library to the CUDA toolkit directory:
$ cd /tmp
$ tar -xzvf cudnn-8.0-linux-x64-v5.1.tgz
$ sudo cp cuda/include/cudnn.h /usr/local/cuda-8.0/include
$ sudo cp cuda/lib64/libcudnn* /usr/local/cuda-8.0/lib64
$ sudo chmod a+r /usr/local/cuda-8.0/include/cudnn.h /usr/local/cuda-8.0/lib64/libcudnn*
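As a quick sanity check, you can read the cuDNN version straight from the header you just copied; it should report major version 5 and minor version 1:

$ grep -A 2 'define CUDNN_MAJOR' /usr/local/cuda-8.0/include/cudnn.h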
Being able to push and pull your repos on Github directly from the server is way more convenient than having to synchronise files on your laptop first. In order to enable that workflow you will add an SSH key specific to this server to your Github account.
First, generate an SSH key pair on your server:
$ ssh-keygen -o -a 100 -t ed25519 -f ~/.ssh/id_ed25519 -C steve@drive01
Enter a passphrase and choose a location for your key when you are prompted.
Read the content of the public key and copy it to your clipboard:
$ cat ~/.ssh/id_ed25519.pub
Head to your Github SSH and GPG keys settings, click on "New SSH key", give a name to your key in the "Title" section and paste the key itself in the "Key" section.
Github - adding an SSH key
Once the key is added, you can go back to your server.
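Before cloning anything, you can check that Github accepts the new key; instead of a shell, Github should greet you with a short message containing your username:

$ ssh -T git@github.com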
Create a new folder named carnd in your home directory and clone the CarND-Term1-Starter-Kit repository:
$ mkdir ~/carnd
$ cd ~/carnd
$ git clone git@github.com:udacity/CarND-Term1-Starter-Kit.git
$ cd CarND-Term1-Starter-Kit
There is one minor change we need to make in the environment-gpu.yml file provided by Udacity before we create the conda environment.
Using vim, nano or whichever text editor you are most familiar with, change this:
# environment-gpu.yml
name: carnd-term1
channels:
    - https://conda.anaconda.org/menpo
    - conda-forge
dependencies:
    - python==3.5.2
    - numpy
    - matplotlib
    - jupyter
    - opencv3
    - pillow
    - scikit-learn
    - scikit-image
    - scipy
    - h5py
    - eventlet
    - flask-socketio
    - seaborn
    - pandas
    - ffmpeg
    - imageio=2.1.2
    - pyqt=4.11.4
    - pip:
        - moviepy
        - https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-0.12.1-cp35-cp35m-linux_x86_64.whl
        - keras==1.2.1
into this:
# environment-gpu.yml
name: carnd-term1
channels:
    - https://conda.anaconda.org/menpo
    - conda-forge
dependencies:
    - python==3.5.2
    - numpy
    - matplotlib
    - jupyter
    - opencv3
    - pillow
    - scikit-learn
    - scikit-image
    - scipy
    - h5py
    - eventlet
    - flask-socketio
    - seaborn
    - pandas
    - ffmpeg
    - imageio=2.1.2
    - pyqt=4.11.4
    - pip:
        - moviepy
        - tensorflow-gpu==1.2.1
        - keras==1.2.1
This change ensures you grab TensorFlow 1.2.1 with GPU support instead of the outdated 0.12.1 wheel that the original file points to.
You are now ready to create the conda environment:
$ conda env create -f environment-gpu.yml
This command will pull all the specified dependencies. It may take a little while.
If it successfully creates the environment you should see this output in your console:
...
Successfully installed backports.weakref-1.0rc1 decorator-4.0.11 keras-1.2.1 markdown-2.6.8 moviepy-0.2.3.2 protobuf-3.3.0 tensorflow-gpu-1.2.1 theano-0.9.0 tqdm-4.11.2
#
# To activate this environment, use:
# > source activate carnd-term1
#
# To deactivate this environment, use:
# > source deactivate carnd-term1
#
Time to try your new install!
Activate the environment:
$ source activate carnd-term1
Run this short TensorFlow program in a Python shell:
(carnd-term1) $ python
>>> import tensorflow as tf
>>> hello = tf.constant('Hello, TensorFlow!')
>>> sess = tf.Session()
>>> print(sess.run(hello))
b'Hello, TensorFlow!'
If you get this output then congratulations, your server is ready for the Udacity Self-Driving Car Nanodegree projects.
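If you want to double-check that TensorFlow actually sees the GPU, you can also list the devices it detects. One small caveat: device_lib is technically an internal TensorFlow module, but it works in 1.2.1, and the output should include a /gpu:0 entry:

(carnd-term1) $ python -c "from tensorflow.python.client import device_lib; print([d.name for d in device_lib.list_local_devices()])"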
If you get an error message, see this section on the TensorFlow website for common installation problems or leave a comment here.
Now that your server is ready, it's time to put it to good use.
I'll run through how to use your server using the LeNet lab as an example, but these steps apply to any other Jupyter-based lab in the course.
In order to access your Jupyter notebook you need to edit the Jupyter config so that the server binds on all interfaces rather than localhost.
$ jupyter notebook --generate-config
This command will generate a config file at ~/.jupyter/jupyter_notebook_config.py.
Using a text editor, replace this line:
#c.NotebookApp.ip = 'localhost'
with this:
c.NotebookApp.ip = '*'
Next time you launch a Jupyter notebook the internal server will bind on all interfaces instead of localhost, allowing you to access the notebook in your browser.
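If you would rather not expose port 8888 to the whole internet, an alternative is to skip the firewall rule and this config change entirely and forward the port over SSH instead. The notebook then stays bound to localhost on the server, and you access it at http://localhost:8888 on your laptop:

$ ssh -i ~/.ssh/my-ssh-key -L 8888:localhost:8888 [USERNAME]@[EXTERNAL_IP_ADDRESS]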
First, let's retrieve the content of the lab by cloning the repository from Github.
Run these commands on your server:
$ cd ~/carnd
$ git clone https://github.com/udacity/CarND-LeNet-Lab.git
$ cd CarND-LeNet-Lab/
If your conda environment is not active already, activate it now:
$ source activate carnd-term1
Launch the notebook:
(carnd-term1) $ jupyter notebook
On your local machine, open your browser and head to:
http://[EXTERNAL_IP_ADDRESS]:8888
If it's not open already, click on "LeNet-Lab-Solution.ipynb" to launch the LeNet lab solution notebook.
Run each cell in the notebook. The network with the hyperparameters given in the solution notebook should train in a minute or so on this machine.
Tada! You have a fully functioning GPU instance.
Important: You are billed for each minute your server is on, so don't forget to stop it once you are done using it. Note that you will still be paying a small amount for storage (~$8/month for 50GB) until you terminate the instance. Check this guide to learn how to stop or delete an instance.
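If you prefer the CLI, stopping and restarting the instance are one-liners (assuming the same instance name and zone as earlier in this guide; note that the ephemeral external IP will most likely change on restart):

$ gcloud compute instances stop drive01 --zone europe-west1-b
$ gcloud compute instances start drive01 --zone europe-west1-b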
Hopefully this guide convinced you that setting up a GPU-backed instance on Google Cloud Platform is as easy as on AWS.
In order to keep this guide as short and digestible as possible (I really tried!) I glossed over some very important topics.
I hope you enjoyed it. If you have any questions or comments, please feel free to get in touch using the comment section below or by emailing me directly.