Nambi Sankaran

@snambi

Setting Up Your Machine for ML

Coming from a software engineering background, I thought the first thing I should do before getting into machine learning was to set up my computer. Naively, I thought it might take just a few hours. I was totally wrong.

I wanted to set up PyTorch with the right versions of Anaconda and the CUDA SDK. It should have been a fairly straightforward exercise. Fortunately, all the machines I have come with NVIDIA GPUs, so I thought it was just a matter of installing the right software and I would be done in no time.

Uh.. uh.. Not so fast.

MacBook Pro

First, I tried my MacBook Pro, which has an NVIDIA GeForce 750M. After installing everything, the conda env creation failed:

conda env create
Solving environment: failed
ResolvePackageNotFound:
- cuda90

After several attempts I realized the Mac is probably the hardest environment to set up.

Ubuntu 16.04

Next, I moved to my Linux desktop, which runs Ubuntu 16.04. I was not even able to install the NVIDIA drivers on this machine. It is an old installation with a lot of software on it, and anything I install may break something else, so I let it go.

Windows 10

Then I moved to my Windows 10 desktop. This machine is quite new, only a little over 6 months old, and I rarely use it. So I installed the NVIDIA drivers, the CUDA SDK, Anaconda and git. Then I ran the following code to test whether PyTorch works with CUDA.

>>> import torch
>>> torch.cuda.is_available()
True
>>> x = torch.randn( 4, 5 )
>>> print( x )
tensor([[-0.1285, -0.2369, -0.3363, 0.1386, 0.1244],
[ 0.1818, -2.0207, 1.9165, -1.4153, 0.3645],
[-0.5384, 0.4833, -1.0172, -0.2509, -1.3831],
[ 1.9053, -1.7572, -0.0098, -0.0333, -1.8762]])
>>> device = torch.device("cuda")
>>> y = torch.ones_like( x , device=device )
C:\Users\nsankaran\AppData\Local\conda\conda\envs\torch2\lib\site-packages\torch\cuda\__init__.py:116: UserWarning:
Found GPU0 NVS 510 which is of cuda capability 3.0.
PyTorch no longer supports this GPU because it is too old.
warnings.warn(old_gpu_warn % (d, name, major, capability[1]))
THCudaCheck FAIL file=c:\programdata\miniconda3\conda-bld\pytorch_1524546371102\work\aten\src\thc\generic/THCTensorMath.cu line=15 error=48 : no kernel image is available for execution on the device
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: cuda runtime error (48) : no kernel image is available for execution on the device at c:\programdata\miniconda3\conda-bld\pytorch_1524546371102\work\aten\src\thc\generic/THCTensorMath.cu:15

Some of the tensor operations failed to execute on the GPU. The warning in the output explains why: this PyTorch build no longer supports the machine's GPU, an NVS 510 with CUDA capability 3.0. Windows 10 seemed very promising until the last step.
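A failure like the one above can be caught before any tensor is moved to the GPU by checking the device's compute capability first. The following is only a sketch: `is_supported` and `pick_device` are hypothetical helper names, and the `(3, 5)` minimum is an assumption for illustration, since the real cutoff depends on the PyTorch build. It uses `torch.cuda.get_device_capability` and `torch.cuda.get_device_name`, and falls back to the CPU when PyTorch or a usable GPU is missing.

```python
# Hypothetical helper: decide whether a GPU's CUDA compute capability is
# new enough for the installed PyTorch build. The (3, 5) default is an
# assumption for illustration; the real cutoff depends on the build.
def is_supported(capability, minimum=(3, 5)):
    # Tuples compare element by element, so (3, 0) < (3, 5) < (6, 1).
    return tuple(capability) >= tuple(minimum)


def pick_device(minimum=(3, 5)):
    """Return "cuda" if a usable GPU is visible, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch is not installed at all
    if not torch.cuda.is_available():
        return "cpu"
    capability = torch.cuda.get_device_capability(0)  # a (major, minor) tuple
    if not is_supported(capability, minimum):
        name = torch.cuda.get_device_name(0)
        print("Skipping %s: compute capability %d.%d is too old"
              % (name, capability[0], capability[1]))
        return "cpu"
    return "cuda"


if __name__ == "__main__":
    print("using device:", pick_device())
```

With the NVS 510 from the session above, which reports capability 3.0, this sketch would choose the CPU instead of failing on the first GPU operation.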

At this point, I had already spent 15+ hours and there was no clear and easy way to install the software I needed. Setting up my machine was turning out to be pretty hard.

Paperspace (Cloud GPU)

In the meantime, I created an account with paperspace.com and tried to create a GPU-enabled VM. But Paperspace didn't work. I wanted to use the fastai template or the Linux template. Neither was available.

[Image: Paperspace GPU request dialog]

I requested the GPU several times, but nothing happened. So I assumed that Paperspace was not ready yet. At this point I was quite hopeless.

Ubuntu 16.04 (again)

I still had another laptop with an NVIDIA GPU running Ubuntu 16.04. I thought, why not try once more?

By this time I knew exactly what needed to be done. So I installed the software in a systematic way:

  • Install the NVIDIA GPU driver
  • Install the CUDA SDK for Ubuntu 16.04 (instructions)
  • Install cuDNN (instructions)
  • Install Anaconda for Python 3.6 (instructions)
  • Install PyTorch for Linux, Conda, Python 3.6 and CUDA 9.1. PyTorch generated a simple one-line install command, which I ran in a fresh conda environment:
conda create --name torch
source activate torch
conda install pytorch torchvision cuda91 -c pytorch
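Once the steps above are done, it can help to print what PyTorch itself reports about each layer of the stack before running a real test. This is a minimal sketch rather than part of the original setup: `stack_report` is a hypothetical helper name, and it relies on `torch.__version__`, `torch.version.cuda`, `torch.backends.cudnn.version()` and `torch.cuda.get_device_name`. If PyTorch is not importable, every field stays at its default.

```python
def stack_report():
    """Collect version info for the PyTorch / CUDA / cuDNN stack."""
    report = {
        "torch": None,            # PyTorch version string
        "cuda_build": None,       # CUDA version PyTorch was built against
        "cudnn": None,            # cuDNN version number, if available
        "cuda_available": False,  # can PyTorch actually reach a GPU?
        "gpu": None,              # name of the first GPU, if any
    }
    try:
        import torch
    except ImportError:
        return report  # PyTorch is not installed in this environment

    report["torch"] = torch.__version__
    report["cuda_build"] = torch.version.cuda
    report["cudnn"] = torch.backends.cudnn.version()
    report["cuda_available"] = bool(torch.cuda.is_available())
    if report["cuda_available"]:
        report["gpu"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    for key, value in stack_report().items():
        print("%-15s %s" % (key, value))
```

A `cuda_build` of `None` would suggest a CPU-only PyTorch package was installed, while a version there combined with `cuda_available` being `False` points at the driver or the GPU instead.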

All these steps went through fine. Now it was time to test. I used the following code.

import torch

if torch.cuda.is_available():
    print("cuda is available")
    x = torch.randn(4, 5)                    # created on the CPU
    print("x=\n", x)
    device = torch.device("cuda")
    y = torch.ones_like(x, device=device)    # same shape as x, on the GPU
    z = y.to(device)                         # y is already on the GPU
    a = y + z                                # the addition runs on the GPU
    print("a=\n", a)
    # copy the result back to the CPU as float64
    print("a.to(\"cpu\")\n", a.to("cpu", torch.double))
else:
    print("cuda is not available")

Here are the results:

cuda is available
x=
tensor([[-1.1355, -1.0563, 1.1290, 0.5870, -1.4088],
[ 0.0311, 1.4285, -0.7489, -0.3020, -1.1053],
[-0.6777, -0.3300, -0.1201, -0.0400, -0.7543],
[-0.3388, 0.8496, -0.1594, -0.3230, 1.0891]])
a=
tensor([[ 2., 2., 2., 2., 2.],
[ 2., 2., 2., 2., 2.],
[ 2., 2., 2., 2., 2.],
[ 2., 2., 2., 2., 2.]], device='cuda:0')
a.to("cpu")
tensor([[ 2., 2., 2., 2., 2.],
[ 2., 2., 2., 2., 2.],
[ 2., 2., 2., 2., 2.],
[ 2., 2., 2., 2., 2.]], dtype=torch.float64)

Finally, it worked on the fourth attempt.

My Recommendation

  • Use a newer Windows 10 or Ubuntu 16.04 installation
  • Install the software in the order provided
  • Hope everything works
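Since the install order matters, a quick check of which command-line tools are already on the PATH can show how far a machine has got. The sketch below is an assumption-laden illustration, not a definitive test: `check_tools` is a hypothetical helper, and it assumes the usual mapping of `nvidia-smi` to the driver, `nvcc` to the CUDA toolkit and `conda` to Anaconda. cuDNN has no command-line tool, so it is not covered, and a tool missing from the PATH does not prove that the layer is absent.

```python
import shutil

# Tools listed in the order the stack was installed. The tool-to-layer
# mapping is the usual one, but install locations vary between systems.
TOOLS = [
    ("NVIDIA driver", "nvidia-smi"),
    ("CUDA toolkit", "nvcc"),
    ("Anaconda", "conda"),
]


def check_tools(tools=TOOLS, which=shutil.which):
    """Return (layer, command, path-or-None) tuples in install order."""
    return [(layer, cmd, which(cmd)) for layer, cmd in tools]


if __name__ == "__main__":
    for layer, cmd, path in check_tools():
        print("%-14s %-11s %s" % (layer, cmd, path or "NOT FOUND on PATH"))
```

The first row that reports a missing tool is a reasonable place to resume the installation.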

It is awesome to have the Linux laptop set up with ML stuff.
