This setup wouldn’t have been possible without Lee Wei Yeong’s Linux know-how and patient guidance. I also appreciate Wing Tai for letting me use his eGPU setups before Thunderbolt 3 eGPUs were a thing.
At work, colleagues and I remotely access a shared workstation that houses a few graphics cards for computational work like training deep neural networks for our research.
Images and videos are relatively large objects, making remote access feel cumbersome for some workflows.
X11 forwarding, remote desktops, apps with browser interfaces, and repeatedly transferring images over WinSCP were not satisfying.
Thus, I’ve set up an external graphics card with a laptop to avoid remote access for certain tasks. (But I’d still push large jobs to the shared workstation once I’ve figured out, on my laptop, the tasks to be pushed.)
An external graphics card + laptop combo lets me …
… work on the same laptop everywhere — in office, and on commutes. Plug in external graphics card as needed in the office.
… carry a lighter load. Carry graphics card & laptop, versus, carry desktop with (perhaps several) graphics cards, to deployment sites.
… avoid separating/syncing content (e.g. code, datasets) between an office desktop & laptop had I opted for such a setup.
The sections that follow detail my experience in setting up an external Nvidia TITAN X card with a laptop on Ubuntu 16.04. The core ideas in these sections also apply to externally connecting other Nvidia graphics cards to an Ubuntu machine.
**Ingredient List:**
- Lenovo X1 Carbon 5th gen (Supports Thunderbolt 3)
- Akitio Node (External GPU case to mount TITAN X in. Supports Thunderbolt 3)
- Ubuntu 16.04 (64-bit)
- Nvidia GeForce GTX TITAN X graphics card
- NVIDIA-Linux-x86_64-384.90.run
- cuda_9.0.176_384.81_linux.run (Optional. Sometimes used in code that pushes computation to Nvidia graphics cards.)
Should you be eyeing different hardware, remember to check for compatibility. For instance, does the laptop have a port that is compatible with what the external GPU case offers? The Akitio Node is connected to my Lenovo X1 Carbon via a Thunderbolt 3 connection.
I began with a fresh Ubuntu 16.04 installation.
If you aren’t starting with a fresh Ubuntu install: Be on the lookout for carelessness. An example is having multiple display managers installed and not stopping all of them before driver installation. Although I’ve included reminders in this recipe for those not using a fresh Ubuntu installation, it’s a good idea to be on the lookout. You might like to back up before starting.
Let’s begin.
Connect the TITAN X to the Akitio Node. Secure the TITAN X in place with the provided screws to avoid the card’s connection loosening when the case is moved around. Remember to connect the TITAN X to the Akitio Node’s power supply unit (PSU).
Power up the Akitio Node. The casing’s fan should be gently turning.
Turn on Lenovo X1. Get it to Ubuntu’s login screen before connecting it to the Akitio Node (via Thunderbolt 3).
Check whether TITAN X can be recognized:
Open terminal (default shortcut ctrl+alt+t). Newcomers to Ubuntu: enter the following commands one at a time in the terminal. Omit ‘$’.
$ sudo sh -c 'echo 1 > /sys/bus/thunderbolt/devices/0-3/authorized'
$ lspci | grep -i nvidia
If the Nvidia GPU can be read, the console should print something like the following screenshot. “VGA compatible controller: NVIDIA Corporation Device …” is the TITAN X.
Using different hardware? Your eGPU may not be “device 0-3”. Look in the directories under /sys/bus/thunderbolt/devices/*whatever* for vendor_name and device_name to ascertain your eGPU’s path.
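To save some clicking around, here is a minimal sketch that prints the vendor_name and device_name of every entry under /sys/bus/thunderbolt/devices. The function name `list_thunderbolt_devices` and its optional directory argument are my own additions for illustration, not part of any tool.

```shell
#!/bin/sh
# List Thunderbolt devices together with their vendor and device names.
# The optional first argument (an assumption of this sketch) overrides
# the sysfs directory that is searched.
list_thunderbolt_devices() {
    base="${1:-/sys/bus/thunderbolt/devices}"
    for dev in "$base"/*; do
        # Skip entries that do not expose both name files.
        [ -r "$dev/vendor_name" ] && [ -r "$dev/device_name" ] || continue
        printf '%s: %s %s\n' \
            "$dev" \
            "$(cat "$dev/vendor_name")" \
            "$(cat "$dev/device_name")"
    done
}
```

Calling `list_thunderbolt_devices` with no arguments prints one line per device. The path at the start of the line for your eGPU case is the one to use in place of 0-3.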
Upgrade kernel:
$ sudo apt-get update
$ sudo apt-get upgrade
Mine was upgraded to 4.13.0-43-generic. Running
$ uname -r
lets you see your kernel version.
Install packages needed for compiling the Nvidia driver later:
$ sudo apt-get install gcc build-essential
$ sudo apt-get install linux-headers-$(uname -r) linux-image-$(uname -r) linux-image-extra-$(uname -r)
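The driver installer compiles a kernel module, so it needs the headers for the kernel that is currently running. A quick sanity check, assuming the usual layout where /lib/modules/&lt;release&gt;/build points at the headers, could look like this. The function name `headers_installed` and its two optional arguments are hypothetical.

```shell
#!/bin/sh
# Succeed if kernel headers for the given release appear to be installed.
# $1: kernel release (defaults to the running kernel, as in `uname -r`)
# $2: filesystem root prefix (an assumption of this sketch; defaults to empty)
headers_installed() {
    release="${1:-$(uname -r)}"
    root="${2:-}"
    # On Ubuntu this path is normally a symlink into /usr/src/linux-headers-*.
    [ -e "$root/lib/modules/$release/build" ]
}
```

For example, `headers_installed && echo "headers found" || echo "headers missing"` reports on the running kernel.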
Not on a freshly installed Ubuntu? Uninstall existing Nvidia drivers. If apt is managing your installed Nvidia driver, run
$ sudo apt-get purge 'nvidia*'
Forcibly installing over an existing Nvidia driver may cause issues.
Download NVIDIA-Linux-x86_64-384.90.run .
Blacklist Nouveau:
Nouveau is an open source Nvidia driver that Ubuntu 16.04 ships with by default. Blacklisting Nouveau is required for the guided Nvidia installation later. Here’s how to blacklist it.
$ sudo touch /etc/modprobe.d/blacklist-nouveau.conf
Write these 2 lines into blacklist-nouveau.conf:
blacklist nouveau
options nouveau modeset=0
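If you’d rather not open an editor, the two lines can also be generated and piped into the file. This is only a sketch: `nouveau_blacklist` is a name I made up, and the pipe through `sudo tee` is one common way to write a root-owned file.

```shell
#!/bin/sh
# Print the two lines that belong in blacklist-nouveau.conf.
nouveau_blacklist() {
    printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0'
}

# Usage (needs root to write under /etc/modprobe.d):
#   nouveau_blacklist | sudo tee /etc/modprobe.d/blacklist-nouveau.conf
```

`tee` echoes what it wrote, which makes typos easy to spot before rebooting.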
Now we’ll update the configuration files for the boot process and reboot:
$ sudo update-initramfs -u
$ sudo update-grub
$ reboot
After rebooting, check if nouveau is indeed blacklisted. Running the following command should print nothing.
$ lsmod | grep -i nouveau
_lsmod_ shows loaded modules. If Nouveau shows up at this stage, check for typos in the blacklisting steps above. (Accidentally spelt it “noveau” once haha) Remember to “update-initramfs -u”, “update-grub”, and reboot after correcting typos.
Log out.
At the login screen, press ctrl+alt+F2 to get to a terminal (a black screen asking for you to log in). If ctrl+alt+F2 doesn’t work, try ctrl+alt+F3, try ctrl+alt+F4, … until you arrive at a black screen.
Log in by entering your username first, and then your password.
We’ll forgo the fancy graphical desktop temporarily and use this terminal.
Stop the display manager that’s rendering the graphical desktop:
$ sudo service lightdm stop
$ service --status-all | grep -i lightdm
The last command should print “[-] lightdm”. “[-]” means not running.
Running
$ service --status-all
shows the status of all services. “[+]” means running. For fun: try going back to the graphical login screen (probably ctrl+alt+F1), you shouldn’t be able to see one.
Not on Ubuntu or did not begin with a freshly installed Ubuntu? Your display manager may not be lightdm. Possibilities: sddm, gdm, gdm3, etc.
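On Debian-style systems such as Ubuntu, the default display manager is usually recorded in /etc/X11/default-display-manager. Here is a small sketch that reads it. The function name `default_display_manager` is hypothetical, and other distributions may not have this file at all.

```shell
#!/bin/sh
# Print the name of the default display manager (e.g. lightdm).
# $1: file to read (an assumption of this sketch; defaults to the
#     Debian/Ubuntu location)
default_display_manager() {
    file="${1:-/etc/X11/default-display-manager}"
    [ -r "$file" ] || return 1
    # The file holds a full path such as /usr/sbin/lightdm; keep the last part.
    basename "$(cat "$file")"
}
```

The printed name is what you’d pass to `sudo service <name> stop`. If several display managers are installed, still check each of them with `service --status-all`.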
According to Nvidia, OpenGL applications must be stopped too. If you’re on a freshly installed Ubuntu like me, nothing to do here. Moving on…
Drop runlevel:
$ sudo init 3
Let the OS recognize the TITAN X (same as the steps in Preliminaries section):
$ sudo sh -c 'echo 1 > /sys/bus/thunderbolt/devices/0-3/authorized'
$ lspci | grep -i nvidia
NVIDIA-Linux-x86_64-384.90.run needs one small change before it’ll compile. We’ll now proceed to add 1 line to a source code file in there.
Extract the .run file:
$ cd /path/to/download/folder
$ sh ./NVIDIA-Linux-x86_64-384.90.run -x
$ cd NVIDIA-Linux-x86_64-384.90
Add “#include <linux/sched/task_stack.h>” to NVIDIA-Linux-x86_64-384.90/kernel/nvidia-uvm/uvm8_va_block.c . This line tells the C compiler to include a Linux header. Compilation undertaken by the installer (later) will fail in the absence of this line.
Here’s a screenshot with the added line highlighted:
Add “#include <linux/sched/task_stack.h>” to NVIDIA-Linux-x86_64-384.90/kernel/nvidia-uvm/uvm8_va_block.c
“vim” and “nano” are editors that can be used in the terminal. It’s ok to take longer than you thought at this stage especially if you’re new to this.
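For those who prefer not to edit by hand, the line can also be added with a short script. This sketch makes an assumption about placement: it puts the new line directly before the first existing `#include` in the file, which may differ from where it sits in my screenshot. The function name `add_task_stack_include` is made up for illustration.

```shell
#!/bin/sh
# Insert '#include <linux/sched/task_stack.h>' before the first #include
# line of the given file. Does nothing if the line is already present.
# Note: if the file has no #include line at all, nothing is inserted.
add_task_stack_include() {
    file="$1"
    line='#include <linux/sched/task_stack.h>'
    # Already patched? Then leave the file alone.
    if grep -qF "$line" "$file"; then
        return 0
    fi
    awk -v inc="$line" '
        !done && /^#include/ { print inc; done = 1 }
        { print }
    ' "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}
```

From inside the extracted folder, `add_task_stack_include kernel/nvidia-uvm/uvm8_va_block.c` would apply it. Open the file afterwards to confirm the line is where you expect.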
When you’re done, let’s return to the root of this archive, and launch the installer.
$ cd /path/to/NVIDIA-Linux-x86_64-384.90
$ sudo ./nvidia-installer --no-opengl-files
--no-opengl-files is recommended by Nvidia when the GPU is solely intended for computation, like in my use-case.
Caution: To those who wish to install without _--no-opengl-files_, I have not tried this.
The installer will prompt you with options. Here are my choices:
- pre-install script fail? Answer: Continue.
- DKMS? Answer: No.
- 32-bit? Answer: Yes. Answering no shouldn’t cause the installation to break.
- configure X? Answer: No. Do not answer yes if GPU is solely for computation.
When the installation is done, let’s check if it went well:
$ sudo modprobe nvidia
$ nvidia-smi
nvidia-smi should print something like:
Output of nvidia-smi
Reboot:
$ reboot
For the Akitio Node, every time you log in, you’ll have to do 2 things before the GPU can be used:
(i) $ sudo sh -c 'echo 1 > /sys/bus/thunderbolt/devices/0-3/authorized'
(ii) $ sudo modprobe nvidia
Psst.. create a script to do these automatically upon logging in.
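Here’s a rough sketch of what such a script might look like. The function names `authorize_egpu` and `egpu_up` and the file name `egpu-up.sh` are hypothetical, and the default device path assumes your eGPU is at 0-3 like mine. It is meant to be run as root.

```shell
#!/bin/sh
# Sketch of a helper for the two post-login steps. Run as root, e.g.
#   sudo sh egpu-up.sh        (egpu-up.sh is a hypothetical file name)

# Step (i): write 1 to the Thunderbolt device's "authorized" file.
# $1: device directory (defaults to the 0-3 path used in this article)
authorize_egpu() {
    device_dir="${1:-/sys/bus/thunderbolt/devices/0-3}"
    if [ ! -e "$device_dir/authorized" ]; then
        echo "No Thunderbolt device at $device_dir; is the eGPU plugged in?" >&2
        return 1
    fi
    echo 1 > "$device_dir/authorized"
}

# Steps (i) and (ii): authorize the eGPU, then load the Nvidia module.
egpu_up() {
    authorize_egpu "$@" && modprobe nvidia
}
```

Add a final line that calls `egpu_up` to the file, then hook it into whatever runs at login on your system. The missing-device check keeps it from failing noisily on commutes when the Akitio Node isn’t attached.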
Congrats. You’ve successfully installed an Nvidia driver. Ubuntu can now read your TITAN X.
CUDA may come in handy for tasking computations on GPUs.
Download cuda_9.0.176_384.81_linux.run .
Let the OS recognize the TITAN X:
$ sudo sh -c 'echo 1 > /sys/bus/thunderbolt/devices/0-3/authorized'
Blacklist Nouveau if you haven’t done so. Details in earlier part of this article.
Stop lightdm. Details in earlier part.
$ sudo service lightdm stop
Drop runlevel. Details in earlier part.
$ sudo init 3
$ cd /path/to/download/folder
$ chmod +x cuda_9.0.176_384.81_linux.run
$ sudo ./cuda_9.0.176_384.81_linux.run
The installer will prompt you with options. Here are my choices:
- license agreement? Press ‘q’ once to scroll to end quickly. Enter “accept” to accept.
- Install Nvidia driver? No. Because we already did this earlier.
- Install cuda 9.0 toolkit? Yes.
- Install toolkit location? Default.
- Install symbolic link? Yes.
- Install samples? Yes.
- Sample location? Default.
Nearing the end of this installation, you might be shown a message saying “WARNING Incomplete Installation … A driver of version of at least 384.00 is required for CUDA 9.0 functionality to work.” This is not a problem. We’re fine.
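If you’d like to reassure yourself that the driver installed earlier meets the 384.00 requirement, a version comparison along these lines should do. The function name `version_at_least` is my own, and it relies on `sort -V` from GNU coreutils, which Ubuntu ships.

```shell
#!/bin/sh
# Succeed if version $1 is greater than or equal to version $2.
# Relies on GNU sort's -V (version sort) option.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Usage, assuming nvidia-smi is on the PATH:
#   installed=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
#   version_at_least "$installed" 384.00 && echo "driver is new enough for CUDA 9.0"
```

With the 384.90 driver from this article, the check passes.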
Check installation:
$ cd /usr/local/cuda-9.0/samples/1_Utilities/deviceQuery
$ sudo make all
$ ./deviceQuery
deviceQuery should print something like:
output of deviceQuery
Hurray
Food for thought: Some workflows involving a remotely accessed machine and relatively large objects can still feel natural, for example quantitatively benchmarking a trained neural net. An eGPU + laptop setup might not improve your workflow.