What I am about to say may seem obvious, but a LOT of people out there are using VMWare vSphere to virtualize all kinds of workloads. Of course, that means I get a LOT of questions about the integration of the Canonical Distribution of Kubernetes with vSphere.
Until very recently, to be fair, it was not so easy. You could do it, but you had to spend time on manual tweaks here and there, such as adjusting the hostname on each VM… Most of the bumps in the road came down to one simple thing: VMWare does not support cloud-init, the de facto standard for bootstrapping VMs on pretty much every other cloud solution.
The team has spent a fair amount of time improving the UX of Juju for vSphere, and I am pleased to say that it now works pretty well, including activating GPUs (what else?)!!!
Let’s see what the UX looks like now!
To reproduce this post, you’ll need:
and, for the files, a clone of the repo:
git clone https://github.com/madeden/blogposts
cd blogposts/k8s-vsphere
I am no expert in VMWare, so I didn’t change anything in the default setup:
That’s it, really the default setup for everything else: I didn’t touch networking or storage. If you have a specific setup, I’d be happy to talk and review production systems to validate the integration.
Once you have vSphere installed, you need to let Juju know about it:
juju add-cloud vsphere
Cloud Types
  maas
  manual
  openstack
  oracle
  vsphere
Select cloud type: vsphere
Enter the vCenter address or URL: 192.168.1.164
Enter datacenter name: Region1
Enter another datacenter? (Y/n): n
Cloud "vSphere-test" successfully added
You may bootstrap with 'juju bootstrap vsphere'
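If you would rather script this step than answer the prompts, Juju 2.x can also read a cloud definition from a YAML file. The sketch below writes one from the values used above. The helper name write_vsphere_cloud and the file name vsphere-cloud.yaml are hypothetical, and the YAML layout follows the vSphere example in the Juju documentation as I understand it, so double-check it (and the add-cloud syntax, via juju help add-cloud) against your Juju version.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a Juju cloud definition for a vSphere endpoint.
# Arguments: <cloud-name> <vcenter-address> <datacenter> <output-file>
write_vsphere_cloud() {
  local name="$1" endpoint="$2" datacenter="$3" out="$4"
  # Unquoted EOF so that the ${...} placeholders are expanded.
  cat > "$out" <<EOF
clouds:
  ${name}:
    type: vsphere
    auth-types: [userpass]
    endpoint: ${endpoint}
    regions:
      ${datacenter}: {}
EOF
}

# Example (file name is an assumption):
#   write_vsphere_cloud vsphere 192.168.1.164 Region1 vsphere-cloud.yaml
#   juju add-cloud vsphere vsphere-cloud.yaml
```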
Now you need to configure the credentials for this cloud:
juju add-credential vsphere
Enter credential name: canonical
Using auth-type "userpass".
Enter user: [email protected]
Enter password:
Credentials added for cloud vsphere.
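The same goes for credentials: juju add-credential can read them from a file passed with -f. Below is a sketch of a helper that writes such a file. The helper name, the VSPHERE_PASSWORD environment variable and the file name are my own assumptions, and a password containing YAML special characters would need quoting.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a Juju credentials file for a vSphere cloud.
# Arguments: <cloud-name> <credential-name> <user> <output-file>
# The password comes from the VSPHERE_PASSWORD environment variable (an
# assumed name), so it never appears on the command line.
write_vsphere_credentials() {
  local cloud="$1" cred_name="$2" user="$3" out="$4"
  if [ -z "${VSPHERE_PASSWORD:-}" ]; then
    echo "VSPHERE_PASSWORD is not set" >&2
    return 1
  fi
  # umask 077 in a subshell: a newly created file is readable by its owner only.
  (
    umask 077
    cat > "$out" <<EOF
credentials:
  ${cloud}:
    ${cred_name}:
      auth-type: userpass
      user: ${user}
      password: ${VSPHERE_PASSWORD}
EOF
  )
}

# Example (file name is an assumption):
#   write_vsphere_credentials vsphere canonical <user> vsphere-creds.yaml
#   juju add-credential vsphere -f vsphere-creds.yaml
```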
A classic by now, the bootstrap command for Juju:
juju bootstrap vsphere/Region1 --bootstrap-constraints "cores=2 mem=4G root-disk=32G"
Creating Juju controller "vsphere-Region1" on vsphere/Region1
Looking for packaged Juju agent version 2.2-rc1 for amd64
No packaged binary found, preparing local Juju agent binary
Launching controller instance(s) on vsphere/Region1...
juju-9c9d0a-0 (arch=amd64 mem=4G cores=2)
dk: 97.76% (26.2MiB/s)ases/xenial/release-20170330/ubuntu-16.04-server-cloudimg-amd64.ova
Fetching Juju GUI 2.6.0
Waiting for address
Attempting to connect to 192.168.1.165:22
Attempting to connect to fe80::250:56ff:fe87:d44c:22
Bootstrap agent now started
Contacting Juju controller at 192.168.1.165 to verify accessibility...
Bootstrap complete, "vsphere-Region1" controller now available.
Controller machines are in the "controller" model.
Initial model "default" added.
I prepared a small bundle in the src folder, which you can install with:
juju deploy src/k8s-vsphere.yaml
Then you can wait for the model to converge to a stable state:
watch -c juju status --color
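If you want to wait from a script rather than watch the screen, a small retry loop does the job. This is a sketch: wait_until is a hypothetical helper, and model_settled is a rough heuristic of my own that greps the tabular juju status output for statuses that usually mean the model is still converging. It is not an official readiness check.

```shell
#!/usr/bin/env bash
# Hypothetical helper: retry a command until it succeeds or a timeout expires.
# Arguments: <timeout-seconds> <interval-seconds> <command> [args...]
wait_until() {
  local timeout="$1" interval="$2"
  shift 2
  local deadline=$(( $(date +%s) + timeout ))
  until "$@"; do
    if [ "$(date +%s)" -ge "$deadline" ]; then
      echo "timed out after ${timeout}s waiting for: $*" >&2
      return 1
    fi
    sleep "$interval"
  done
}

# Heuristic (an assumption, not an official Juju readiness API): treat the
# model as settled when the tabular `juju status` output contains none of
# the statuses that usually mean units or machines are still converging.
model_settled() {
  local out
  out=$(juju status) || return 1
  ! grep -Eq 'waiting|maintenance|blocked|error|executing|allocating|pending' <<<"$out"
}

# Example: wait up to one hour, polling every 30 seconds.
#   wait_until 3600 30 model_settled && echo "model settled"
```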
In vSphere, this will translate into something like:
vSphere UI after bootstrap and deployment
Then you can download the credentials and query the cluster:
mkdir -p ~/.kube
juju scp kubernetes-master/0:config ~/.kube/config
kubectl get nodes --show-labels
NAME            STATUS    AGE       VERSION   LABELS
juju-428e55-1   Ready     1h        v1.6.2    beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=juju-428e55-1
juju-428e55-2   Ready     1h        v1.6.2    beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=juju-428e55-2
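To check the same thing from a script, you can count the nodes reported as Ready. The hypothetical count_ready_nodes helper below reads kubectl get nodes --no-headers output on stdin. It only counts an exact Ready status, so a node shown as Ready,SchedulingDisabled would not be counted.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count nodes whose STATUS column is exactly "Ready".
# Reads `kubectl get nodes --no-headers` output on stdin.
count_ready_nodes() {
  awk '$2 == "Ready" { n++ } END { print n + 0 }'
}

# Example: the two-worker cluster above should report 2.
#   kubectl get nodes --no-headers | count_ready_nodes
```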
OK!! You now have a Kubernetes cluster up & running on VMWare vSphere. Wasn’t too complicated, was it? Should we say it was boring?
OK, now for the cool stuff. Using the same guide as before, add GPUs (through PCI passthrough) to the VMs running the Kubernetes workers.
You’ll first need to stop them, then add the PCI device, and restart them.
At this point, Juju should pick up and discover the nVidia board and install the CUDA drivers all by itself. For some reason it did not, and we are investigating.
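Before installing anything, it is worth confirming that the guest actually sees the passed-through board. The hypothetical has_nvidia_device helper below reads lspci output and looks for the NVIDIA vendor name or the NVIDIA PCI vendor ID 10de. It is a simple heuristic, not a driver check, and the unit name in the example assumes the bundle’s worker application is called kubernetes-worker.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if lspci output (on stdin) lists an NVIDIA
# device, matching either the vendor name or the PCI vendor ID 10de.
has_nvidia_device() {
  grep -Eqi 'nvidia|\[10de:'
}

# Example (unit name assumed from the bundle):
#   juju ssh kubernetes-worker/0 lspci -nn | has_nvidia_device && echo "GPU visible"
```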
But we won’t let a small glitch stop us. Let’s install the drivers manually, which also gives me the opportunity to answer questions I got about managing CDK now that the control plane has been fully snapped.
Google has this simple script to install the drivers:
#!/bin/bash
echo "Checking for CUDA and installing."
# Check for CUDA and try to install.
if ! dpkg-query -W cuda; then
  curl -O http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
  dpkg -i ./cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
  apt-get update
  apt-get install cuda -y
fi
Just run it as root on the 2 workers (for example, with “juju scp” and “juju ssh”).
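For example, assuming the script above is saved locally as install-cuda.sh and the bundle’s worker application is named kubernetes-worker (so the units are kubernetes-worker/0 and kubernetes-worker/1), a loop like this copies and runs it on each unit. The helper name is hypothetical, and setting JUJU=echo turns it into a dry run that only prints the commands.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy an install script to each unit and run it as root.
# Arguments: <local-script> <unit> [<unit>...]
# JUJU defaults to the real juju client; JUJU=echo gives a dry run.
install_cuda_on_units() {
  local script="$1" unit
  shift
  for unit in "$@"; do
    # The script lands in the ubuntu user's home directory on the unit.
    "${JUJU:-juju}" scp "$script" "${unit}:install-cuda.sh" || return 1
    "${JUJU:-juju}" ssh "$unit" sudo bash install-cuda.sh || return 1
  done
}

# Example:
#   install_cuda_on_units ./install-cuda.sh kubernetes-worker/0 kubernetes-worker/1
```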
Now, on each worker, you need to activate a couple of flags. There is a new procedure for this, as GPUs are now “Accelerators” in K8s:
sudo snap set kubelet experimental-nvidia-gpus=1
sudo snap set kubelet feature-gates="Accelerators=true"
sudo systemctl restart snap.kubelet.daemon.service
Note: If you read my previous posts, you may remember that we were editing files with sed, awk and other nice text-editing foo. Now it is a set command for the snap. It DOES matter. Suddenly, as the admin, you’re no longer in charge of making your changes idempotent, because you delegate that to snapd. It is a game changer and makes things a lot more trivial than they used to be. And that is without even mentioning the new upgrade path, made soooo simple.
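Because snap set takes care of storing the values, the worker step is also easy to drive from your workstation. The sketch below uses a hypothetical helper name and the same JUJU=echo dry-run convention; it applies both kubelet settings in a single snap set call (snap set accepts several key=value pairs) and restarts the kubelet on each unit. Unit names are again assumed from the bundle.

```shell
#!/usr/bin/env bash
# Hypothetical helper: enable GPU support in the kubelet snap on each unit.
# snap set takes several key=value pairs at once and snapd stores the
# values, so re-running this leaves the same configuration in place.
enable_gpu_on_workers() {
  local unit
  for unit in "$@"; do
    "${JUJU:-juju}" ssh "$unit" \
      "sudo snap set kubelet experimental-nvidia-gpus=1 feature-gates='Accelerators=true' && sudo systemctl restart snap.kubelet.daemon.service" \
      || return 1
  done
}

# Example (JUJU=echo prints the commands instead of running them):
#   JUJU=echo enable_gpu_on_workers kubernetes-worker/0 kubernetes-worker/1
```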
Now, on the master:
sudo snap set kube-controller-manager feature-gates="Accelerators=true"
sudo snap set kube-scheduler feature-gates="Accelerators=true"
sudo snap set kube-apiserver feature-gates="Accelerators=true"
sudo systemctl restart snap.kube-apiserver.daemon.service
sudo systemctl restart snap.kube-scheduler.daemon.service
sudo systemctl restart snap.kube-controller-manager.daemon.service
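The master step can be scripted the same way over juju ssh. This sketch assumes the master unit is kubernetes-master/0 (the unit used in the juju scp command earlier) and uses a hypothetical helper name. It sets the feature gate on the three control-plane snaps, then restarts them, and the && chain stops at the first failing command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: enable the Accelerators feature gate on the master.
# Arguments: [<unit>] (defaults to kubernetes-master/0, an assumption)
enable_gpu_on_master() {
  local unit="${1:-kubernetes-master/0}"
  local services="kube-apiserver kube-scheduler kube-controller-manager"
  local svc cmd=""
  # Set the feature gate on every control-plane snap first...
  for svc in $services; do
    cmd+="sudo snap set ${svc} feature-gates='Accelerators=true' && "
  done
  # ...then restart the corresponding systemd units.
  for svc in $services; do
    cmd+="sudo systemctl restart snap.${svc}.daemon.service && "
  done
  cmd+="echo done"
  "${JUJU:-juju}" ssh "$unit" "$cmd"
}

# Example (JUJU=echo prints the remote command instead of running it):
#   JUJU=echo enable_gpu_on_master
```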
OK, you’re good to go: GPUs are now activated in K8s.
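To confirm that a worker now advertises its GPU, look at the node capacity. With the alpha GPU support in Kubernetes 1.6, the resource is named alpha.kubernetes.io/nvidia-gpu. The hypothetical gpu_capacity helper below extracts its first value (from the Capacity section) out of kubectl describe node output, and prints 0 when the resource is absent.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first alpha.kubernetes.io/nvidia-gpu value
# found in `kubectl describe node` output (the Capacity section comes first),
# or 0 if the resource is not advertised at all.
gpu_capacity() {
  awk '/alpha\.kubernetes\.io\/nvidia-gpu:/ { print $2; found = 1; exit }
       END { if (!found) print 0 }'
}

# Example:
#   kubectl describe node juju-428e55-1 | gpu_capacity
```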
A classic to start with:
kubectl create -f src/nvidia-smi.yaml
There you go, it just works :)
nVidia P5000, passthrough in vSphere
I noticed you guys LOVE cryptocurrencies, so I wrote a new chart for Minergate, which you’ll find in https://github.com/madeden/charts
It’s not the fastest miner ever, but it’s OK, and it can do CUDA mining out of the box, making it very cool for testing.
Have a look at the config file, create an account on https://minergate.com and start playing:
helm init
helm install path/to/charts/minergate --name minergate
You can configure the following values:
The logs should look like:
$ kubectl logs worker-ob-1-0-2045746350-h2dd7
Starting mining -qcn with 2 CPUs and 1 GPUs
[2017-05-17 15:43:38.333] [ info] Pool parameters query...
[2017-05-17 15:43:46.592] [ info] Loading miners...
[2017-05-17 15:43:46.593] [ info] Miners loaded successfully
[2017-05-17 15:43:46.594] [ info] CUDA: Initializing CUDA miner...
[2017-05-17 15:43:53.212] [ info] CUDA: Device name: Quadro P5000
[2017-05-17 15:43:53.212] [ info] CUDA: Total memory: 17063477248
[2017-05-17 15:43:53.213] [ info] CUDA: Free memory: 16942891008
[2017-05-17 15:43:53.213] [ info] CUDA: MP count: 20
[2017-05-17 15:43:53.213] [ info] CUDA: MP threads count: 2048
[2017-05-17 15:43:53.213] [ info] CUDA: CUDA version: 6.1
[2017-05-17 15:43:53.213] [ info] CUDA: CUDA cores: 0
[2017-05-17 15:43:53.213] [ info] CUDA: Threads per block: 1024
[2017-05-17 15:43:53.213] [ info] CUDA: Dim size: 1024 | 1024 | 64
[2017-05-17 15:43:53.213] [ info] CUDA: Grid size: 2147483647 | 65535 | 65535
[2017-05-17 15:43:53.213] [ info] CUDA: Calculated threads per block: 8
[2017-05-17 15:43:53.213] [ info] CUDA: Calculated blocks: 0
[2017-05-17 15:43:53.213] [ info] CUDA: Total threads: 0
[2017-05-17 15:43:53.214] [ info] CUDA: CUDA miner successfully initialized
[2017-05-17 15:43:53.215] [error] PoolClient: ERROR: Trying to connect while connection is in progress
[2017-05-17 15:43:53.215] [ info] Stratum client stopped
[2017-05-17 15:43:53.216] [ info] Stratum client stopped
[2017-05-17 15:43:53.325] [ info] Successfully connected to pool: stratum+tcp://qcn.pool.minergate.com:45570. session_id="02ec69f8-1f5f-440f-9e9f-3b85a6febf84"
[2017-05-17 15:43:53.325] [ info] New Job: job_id="25f7c926-e632-4ab3-b822-9d7abc0ed9c8" blob="0100aedff1c8057c6ea6d76a9920d1bf61c14b410a50d1ef216a293b0ea5e107b6c9d615d8abc3000000007cc70a356438f7d1cc3133d226a8b2b7c4adaf21286b3cf3ade0cee62acbf1cc01" target="e4a63d00"
[2017-05-17 15:43:53.325] [ info] New difficulty: 1063
[2017-05-17 15:43:56.570] [ info] QCN hashrate: 67.8947 H/s
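If you only care about the hash rate, a one-line filter is enough. The hypothetical latest_hashrate helper below assumes the log format shown above (a “hashrate: <number> H/s” fragment) and prints the most recent value.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the most recent hash rate from miner logs on
# stdin, assuming lines such as "QCN hashrate: 67.8947 H/s" as shown above.
latest_hashrate() {
  grep -o 'hashrate: [0-9.]* H/s' | tail -n 1 | awk '{ print $2 }'
}

# Example (replace the pod name with your own):
#   kubectl logs worker-ob-1-0-2045746350-h2dd7 | latest_hashrate
```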
Enjoy! Of course, this is Helm, so you are limited only by Kubernetes, not by running on VMWare or any other substrate.
More seriously, any GPU workload you have: Deep Learning, physics computation, cracking passwords, transcoding videos… all that will be drastically improved with such a setup.
That’s right, we have Kubernetes AND GPUs
The Juju experience on VMWare has drastically improved over the last few weeks. It is now particularly easy to operate big software on vSphere.
The Canonical Distribution of Kubernetes is one example, but Spicule, a long-time partner of Canonical, also uses Juju for its Big Data consulting and Pentaho integration work, and can now leverage VMWare as a target as well.
It is also good to know that MAAS can integrate VMWare as a “bare metal layer”, so you can essentially register VMs from VMWare in MAAS and use MAAS to start or stop them.
We’re about to complete our tour of activating nVidia GPUs on all clouds, bare metal and so on. Next stop: Microsoft Azure, and the loop will be closed.
If you have any questions, I am @SaMnCo_23 on Twitter, #SaMnCo on Freenode and GitHub. Feel free to ping me!
And of course, if you liked this, found it useful, or just want to help, click the little heart! Thanks for reading!