Coming from a software engineering background, I thought the first thing I should do before getting into machine learning was to set up my computer. Naively, I thought it would take just a few hours. I was totally wrong.
I wanted to set up PyTorch with the right versions of Anaconda and the CUDA SDK. It should have been a fairly straightforward exercise. Fortunately, all the machines I have come with NVIDIA GPUs, so I thought it was just a matter of installing the right software and I would be done in no time.
Uh.. uh.. Not so fast.
First, I tried my MacBook Pro, which has an NVIDIA GeForce 750M. After installing everything, the conda env creation failed:
conda env create
Solving environment: failed
ResolvePackageNotFound:
After several attempts I realized the Mac is probably the hardest environment to set up.
Next I moved to my Linux desktop, which runs Ubuntu 16.04. This machine was not even able to install the NVIDIA drivers. It is an old installation with a lot of software on it, and anything I install may break something else, so I let it go.
Then I moved to my Windows 10 desktop. This machine is quite new, only about six months old, and I rarely use it. I installed the NVIDIA drivers, the CUDA SDK, Anaconda and git. Then I ran the following code to test whether PyTorch works with CUDA.
>>> import torch
>>> torch.cuda.is_available()
True
>>> x = torch.randn( 4, 5 )
>>> print( x )
tensor([[-0.1285, -0.2369, -0.3363,  0.1386,  0.1244],
        [ 0.1818, -2.0207,  1.9165, -1.4153,  0.3645],
        [-0.5384,  0.4833, -1.0172, -0.2509, -1.3831],
        [ 1.9053, -1.7572, -0.0098, -0.0333, -1.8762]])
>>> device = torch.device("cuda")
>>> y = torch.ones_like( x , device=device )
C:\Users\nsankaran\AppData\Local\conda\conda\envs\torch2\lib\site-packages\torch\cuda\__init__.py:116: UserWarning:
Found GPU0 NVS 510 which is of cuda capability 3.0.
PyTorch no longer supports this GPU because it is too old.
warnings.warn(old_gpu_warn % (d, name, major, capability[1]))
THCudaCheck FAIL file=c:\programdata\miniconda3\conda-bld\pytorch_1524546371102\work\aten\src\thc\generic/THCTensorMath.cu line=15 error=48 : no kernel image is available for execution on the device
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: cuda runtime error (48) : no kernel image is available for execution on the device at c:\programdata\miniconda3\conda-bld\pytorch_1524546371102\work\aten\src\thc\generic/THCTensorMath.cu:15
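Some of the tensor operations failed to execute on the GPU. At first this looked like an issue with PyTorch, but the warning points at the real cause: the NVS 510 in this machine has CUDA compute capability 3.0, which this PyTorch build no longer supports.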
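A check along these lines could flag an unsupported GPU before any tensor is moved to it. This is only a sketch: the helper names `is_supported` and `pick_device` are made up for illustration, and the default minimum of `(3, 5)` is an assumption, since the real cutoff depends on the PyTorch build you install. The calls `torch.cuda.is_available()` and `torch.cuda.get_device_capability()` are part of PyTorch.

```python
def is_supported(capability, minimum=(3, 5)):
    """Return True if a (major, minor) compute capability meets the minimum.

    The default minimum is an assumption for illustration only; check the
    release notes of your PyTorch build for the real value.
    """
    return tuple(capability) >= tuple(minimum)


def pick_device(minimum=(3, 5)):
    """Return "cuda" only if a GPU is present and new enough, else "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed at all, so there is nothing to run on a GPU.
        return "cpu"
    if not torch.cuda.is_available():
        return "cpu"
    # get_device_capability returns a (major, minor) tuple for the given GPU.
    capability = torch.cuda.get_device_capability(0)
    return "cuda" if is_supported(capability, minimum) else "cpu"


if __name__ == "__main__":
    print("selected device:", pick_device())
```

With the NVS 510 above, which reports capability 3.0, `is_supported((3, 0))` is false under the assumed minimum, so `pick_device()` would fall back to `"cpu"` instead of failing at the first GPU operation.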
Windows 10 seemed very promising until the last step.
At this point, I had already spent 15+ hours, and there was still no clear and easy way to install the software I needed. Setting up my machine was turning out to be much harder than I expected.
In the meantime, I created an account with paperspace.com and tried to create a GPU-enabled VM. But Paperspace didn’t work. I wanted to use either the fastai template or the Linux template. Neither was available.
I requested the GPU several times, but nothing happened. So I assumed that Paperspace was not ready yet. At this point I was quite hopeless.
I still had another laptop with an NVIDIA GPU, running Ubuntu 16.04. I thought, why not try once more?
By this time I knew exactly what needed to be done. So I started installing the software in a systematic way:
conda create --name torch
source activate torch
conda install pytorch torchvision cuda91 -c pytorch
All these steps went through fine. Now it was time to test. I used the following code.
import torch

if torch.cuda.is_available():
    print("cuda is available")
    x = torch.randn( 4, 5 )
    print( "x=\n", x )
    device = torch.device("cuda")
    y = torch.ones_like( x, device=device )
    z = y.to( device )
    a = y + z
    print( "a=\n", a )
    print( "a.to(\"cpu\")\n", a.to("cpu", torch.double) )
else:
    print( "cuda is not available")
Here are the results,
cuda is available
x=
tensor([[-1.1355, -1.0563,  1.1290,  0.5870, -1.4088],
        [ 0.0311,  1.4285, -0.7489, -0.3020, -1.1053],
        [-0.6777, -0.3300, -0.1201, -0.0400, -0.7543],
        [-0.3388,  0.8496, -0.1594, -0.3230,  1.0891]])
a=
tensor([[ 2.,  2.,  2.,  2.,  2.],
        [ 2.,  2.,  2.,  2.,  2.],
        [ 2.,  2.,  2.,  2.,  2.],
        [ 2.,  2.,  2.,  2.,  2.]], device='cuda:0')
a.to("cpu")
tensor([[ 2.,  2.,  2.,  2.,  2.],
        [ 2.,  2.,  2.,  2.,  2.],
        [ 2.,  2.,  2.,  2.,  2.],
        [ 2.,  2.,  2.,  2.,  2.]], dtype=torch.float64)
Finally, it worked on the fourth attempt.
It is awesome to have the Linux laptop set up for ML work.
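One lesson from the Windows attempt is that `torch.cuda.is_available()` returning `True` does not guarantee that GPU operations will run. A small smoke test that actually executes an operation on the GPU, and catches the `RuntimeError` seen earlier, may be a more reliable check. The function name `cuda_really_works` is hypothetical, and this is a minimal sketch rather than a thorough diagnostic.

```python
def cuda_really_works():
    """Return True only if a small tensor operation succeeds on the GPU."""
    try:
        import torch
    except ImportError:
        # No PyTorch installed, so CUDA cannot be used from it.
        return False
    if not torch.cuda.is_available():
        return False
    try:
        # Allocate on the GPU and run one addition, as in the test above.
        y = torch.ones(4, 5, device="cuda")
        a = y + y
        # Copy the result back and confirm every element equals 2.
        return bool((a.cpu() == 2).all())
    except RuntimeError:
        # Covers failures such as "no kernel image is available
        # for execution on the device".
        return False


if __name__ == "__main__":
    if cuda_really_works():
        print("cuda works")
    else:
        print("cuda is not usable, falling back to cpu")
```

Running this right after installation should give a quick yes or no before any real training code is involved.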