What I am about to say may seem obvious, but a LOT of people out there are using VMware vSphere to virtualize all kinds of workloads. Of course, that means I get a LOT of questions about the integration of the Canonical Distribution of Kubernetes with vSphere.

Until very recently, to be fair, it was not so easy. You could do it, but you had to spend time on manual tweaks here and there, such as adjusting hostnames on each VM. Most of the road bumps were due to a simple thing: VMware does not support cloud-init, the de facto standard to bootstrap VMs in pretty much every other cloudy solution.

The team has spent a fair amount of time improving the UX of Juju for vSphere, and I am pleased to say that it now works pretty well, including activating GPUs (what else?)!!!

Let’s see what the UX looks like now!

Requirements

To reproduce this post, you’ll need:

- A basic understanding of the Canonical toolbox: Ubuntu and Juju;
- A basic understanding of Kubernetes;
- A VMware vSphere cluster that can access the Internet (at least through a proxy) and has at least one public (routable) network for the VMs, with working DNS for all nodes created (otherwise you’ll have some editing of /etc/hosts to do);
- The will to live on the edge: Juju 2.2rc1.

For the files, clone the repo:

    git clone https://github.com/madeden/blogposts
    cd blogposts/k8s-vsphere

vSphere setup

I am no expert in VMware, so I didn’t change anything in the default setup:

- Installed ESXi 6.5 from the latest ISO on 3 Dell T630 with 12 cores / 32GB RAM each;
- Installed the vCenter Appliance on the first host;
- For each host, activated GPU passthrough using this guide;
- Created a datacenter in the vCenter, which I called “Region1”.

That’s it, really: the default setup for everything else. I didn’t touch networking or storage. If you have a specific setup, I’d be happy to talk and review production systems to validate the integration.
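One of the requirements above is easy to check up front: name resolution for the nodes. Here is a minimal sketch, assuming `getent` is available on the machine you run it from. The helper name `unresolved_hosts` is made up for this post and is not part of Juju or vSphere, and the host names in the usage comment are only placeholders in the style Juju uses for its VMs.

```shell
#!/usr/bin/env bash
# unresolved_hosts is a hypothetical helper, not part of any tool used in this post.
# It prints every host name from its arguments that does NOT resolve.
unresolved_hosts() {
  local host
  for host in "$@"; do
    # getent consults /etc/hosts and DNS, in the order set by nsswitch.conf
    if ! getent hosts "$host" > /dev/null; then
      echo "$host"
    fi
  done
}

# Usage (placeholder names, replace with your own VMs):
# unresolved_hosts juju-428e55-1 juju-428e55-2
```

An empty output means every name resolved. Anything printed is a candidate for a DNS record or an /etc/hosts entry.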
Juju experience

Connecting to vSphere

Once you have vSphere installed, you need to let Juju know about it:

    $ juju add-cloud vsphere
    Cloud Types
      maas
      manual
      openstack
      oracle
      vsphere

    Select cloud type: vsphere
    Enter the vCenter address or URL: 192.168.1.164
    Enter datacenter name: Region1
    Enter another datacenter? (Y/n): n
    Cloud "vSphere-test" successfully added
    You may bootstrap with 'juju bootstrap vsphere'

Now you need to configure the credentials for this cloud:

    $ juju add-credential vsphere
    Enter credential name: canonical
    Using auth-type "userpass".
    Enter user: administrator@vsphere.local
    Enter password:
    Credentials added for cloud vsphere.

Bootstrapping

A classic by now, the bootstrap code for Juju:

    $ juju bootstrap vsphere/Region1 --bootstrap-constraints "cores=2 mem=4G root-disk=32G"
    Creating Juju controller "vsphere-Region1" on vsphere/Region1
    Looking for packaged Juju agent version 2.2-rc1 for amd64
    No packaged binary found, preparing local Juju agent binary
    Launching controller instance(s) on vsphere/Region1...
    juju-9c9d0a-0 (arch=amd64 mem=4G cores=2)dk: 97.76% (26.2MiB/s)ases/xenial/release-20170330/ubuntu-16.04-server-cloudimg-amd64.ova
    Fetching Juju GUI 2.6.0
    Waiting for address
    Attempting to connect to 192.168.1.165:22
    Attempting to connect to fe80::250:56ff:fe87:d44c:22
    Bootstrap agent now started
    Contacting Juju controller at 192.168.1.165 to verify accessibility...
    Bootstrap complete, "vsphere-Region1" controller now available.
    Controller machines are in the "controller" model.
    Initial model "default" added.
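If you want to double-check the bootstrap before deploying anything, something like the following should do. It is only a sketch: `check_controller` is a wrapper name I made up, and all it does is call two stock Juju commands against the "controller" model mentioned in the bootstrap output.

```shell
#!/usr/bin/env bash
# check_controller is a hypothetical wrapper; it only calls stock Juju commands.
check_controller() {
  # List the controllers known to this client; the new one should appear.
  juju controllers || return 1
  # Show the machine hosting the controller, which lives in the "controller" model.
  juju status -m controller
}

# Usage, once the bootstrap has finished:
# check_controller
```

Stopping at the first failing command keeps the output short when the client cannot reach the controller at all.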
I prepared a small bundle in the src folder, which you can install with:

    juju deploy src/k8s-vsphere.yaml

Then you can wait for the model to converge to a stable state:

    watch -c juju status --color

In vSphere, this will translate into something like:

[Figure: vSphere UI after bootstrap and deployment]

Then you can download the credentials and query the cluster:

    $ juju scp kubernetes-master/0:config ~/.kube/config
    $ kubectl get nodes --show-labels
    NAME            STATUS    AGE       VERSION   LABELS
    juju-428e55-1   Ready     1h        v1.6.2    beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=juju-428e55-1
    juju-428e55-2   Ready     1h        v1.6.2    beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=juju-428e55-2

OK!! You now have a Kubernetes cluster up & running on VMware vSphere. Wasn’t too complicated, was it? Should we say it was boring?

Adding GPUs

On vSphere

OK, so the cool stuff now. Using the same guide as before, add GPUs to the VMs running the Kubernetes workers. You’ll first need to stop them, then add the PCI device, and restart them.

In Kubernetes

At this point, Juju should discover the NVIDIA board and install the CUDA drivers all by itself. For some reason it did not, and we are investigating. But we don’t stop at a small glitch. Let’s install the drivers manually, which also gives me the occasion to answer questions I got about managing CDK now that the control plane has been fully snapped.

Google has this simple script to install the drivers:

    #!/bin/bash
    echo "Checking for CUDA and installing."
    # Check for CUDA and try to install.
    if ! dpkg-query -W cuda; then
      curl -O http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
      dpkg -i ./cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
      apt-get update
      apt-get install cuda -y
    fi

Just run it on the 2 workers (you may want to use “juju scp” and “juju ssh” for that).

Now on each worker, you need to activate a couple of flags.
There is a new procedure to do so, as GPUs are now “Accelerators” in K8s:

    sudo snap set kubelet experimental-nvidia-gpus=1
    sudo snap set kubelet feature-gates="Accelerators=true"
    sudo systemctl restart snap.kubelet.daemon.service

Note: If you read my previous posts, maybe you remember we were editing files with sed, awk and other nice text-editing foo. Now it is a “set” command for the snap. It DOES matter. Suddenly, as the admin, you are no longer in charge of managing the idempotency of your code, as you delegate that to snapd. It is a game changer and makes things a lot more trivial than they used to be. And that is even without mentioning the new upgrade path, made soooo simple.

Now on the master:

    sudo snap set kube-controller-manager feature-gates="Accelerators=true"
    sudo snap set kube-scheduler feature-gates="Accelerators=true"
    sudo snap set kube-apiserver feature-gates="Accelerators=true"
    sudo systemctl restart snap.kube-apiserver.daemon.service
    sudo systemctl restart snap.kube-scheduler.daemon.service
    sudo systemctl restart snap.kube-controller-manager.daemon.service

OK, you’re good to go: you now have GPUs activated in K8s.

Testing the results

A classic to start with:

    kubectl create -f src/nvidia-smi.yaml

There you go, it just works :)

[Figure: NVIDIA P5000, passthrough in vSphere]

More cryptocurrencies?

I noticed you guys LOVE cryptocurrencies, so I wrote a new chart for Minergate, which you’ll find in https://github.com/madeden/charts

It’s not the fastest miner ever, but it’s OK, and it can do CUDA mining out of the box, making it very cool for testing. Have a look at the config file, create an account on https://minergate.com and start playing:

    helm init
    helm install path/to/charts/minergate --name minergate

You can configure the following values (defaults in parentheses):

- clusterName: (ob) just a way to run the same chart several times and still identify workers easily
- pool: (minergate) also a differentiator
- coin: (-qcn) the cryptocurrency you want to mine, with a "-" before it
- nodes: (2) how many nodes are in the cluster to welcome miners
- workersPerNode: (1) how many workers you want to deploy per node
- cpusPerWorker: (1) how many CPU cores you want to allocate per miner
- gpuComplexity: (4) if using GPU, how much stress to put on the GPU; use 0 if you want to do CPU mining only
- username: (samnco@gmail.com) your Minergate ID

The logs should look like:

    $ kubectl logs worker-ob-1-0-2045746350-h2dd7
    Starting mining -qcn with 2 CPUs and 1 GPUs
    [2017-05-17 15:43:38.333] [ info] Pool parameters query...
    [2017-05-17 15:43:46.592] [ info] Loading miners...
    [2017-05-17 15:43:46.593] [ info] Miners loaded successfully
    [2017-05-17 15:43:46.594] [ info] CUDA: Initializing CUDA miner...
    [2017-05-17 15:43:53.212] [ info] CUDA: Device name: Quadro P5000
    [2017-05-17 15:43:53.212] [ info] CUDA: Total memory: 17063477248
    [2017-05-17 15:43:53.213] [ info] CUDA: Free memory: 16942891008
    [2017-05-17 15:43:53.213] [ info] CUDA: MP count: 20
    [2017-05-17 15:43:53.213] [ info] CUDA: MP threads count: 2048
    [2017-05-17 15:43:53.213] [ info] CUDA: CUDA version: 6.1
    [2017-05-17 15:43:53.213] [ info] CUDA: CUDA cores: 0
    [2017-05-17 15:43:53.213] [ info] CUDA: Threads per block: 1024
    [2017-05-17 15:43:53.213] [ info] CUDA: Dim size: 1024 | 1024 | 64
    [2017-05-17 15:43:53.213] [ info] CUDA: Grid size: 2147483647 | 65535 | 65535
    [2017-05-17 15:43:53.213] [ info] CUDA: Calculated threads per block: 8
    [2017-05-17 15:43:53.213] [ info] CUDA: Calculated blocks: 0
    [2017-05-17 15:43:53.213] [ info] CUDA: Total threads: 0
    [2017-05-17 15:43:53.214] [ info] CUDA: CUDA miner successfully initialized
    [2017-05-17 15:43:53.215] [error] PoolClient: ERROR: Trying to connect while connection is in progress
    [2017-05-17 15:43:53.215] [ info] Stratum client stopped
    [2017-05-17 15:43:53.216] [ info] Stratum client stopped
    [2017-05-17 15:43:53.325] [ info] Successfully connected to pool: stratum+tcp://qcn.pool.minergate.com:45570. session_id="02ec69f8-1f5f-440f-9e9f-3b85a6febf84"
    [2017-05-17 15:43:53.325] [ info] New Job: job_id="25f7c926-e632-4ab3-b822-9d7abc0ed9c8" blob="0100aedff1c8057c6ea6d76a9920d1bf61c14b410a50d1ef216a293b0ea5e107b6c9d615d8abc3000000007cc70a356438f7d1cc3133d226a8b2b7c4adaf21286b3cf3ade0cee62acbf1cc01" target="e4a63d00"
    [2017-05-17 15:43:53.325] [ info] New difficulty: 1063
    [2017-05-17 15:43:56.570] [ info] QCN hashrate: 67.8947 H/s

Enjoy!

Of course this is Helm, so you are limited only by Kubernetes, not by being on VMware or any other substrate. More seriously, any GPU workload you have (Deep Learning, physics computation, cracking passwords, transcoding videos…) will be drastically improved with such a setup.

That’s right, we have Kubernetes AND GPUs.

Conclusion

The Juju experience on VMware has drastically improved over the last few weeks. It is now particularly easy to operate big software on vSphere.

The Canonical Distribution of Kubernetes is one example, but Spicule, a long-time partner of Canonical, does Big Data consulting and integration with Pentaho with it and can now leverage VMware as a target as well.

It is also good to know that MAAS can integrate VMware as a “bare metal layer”, so you can essentially register VMs from VMware in MAAS, and use it to start or stop them.

We’re about to complete our tour of activating NVIDIA GPUs on all clouds, bare metal and so on. Next stop: Microsoft Azure, and the loop will be closed.

Any questions? I am @SaMnCo_23 on Twitter, #SaMnCo on Freenode and GitHub. Feel free to ping me!

And of course, if you liked this, found it useful, or just want to help, click the little heart! Thanks for reading!