GPU Computing (general-purpose computing on graphics processing units) enables many modern machine learning algorithms that were previously impractical due to slow runtime. By taking advantage of the parallel computing capabilities of GPUs, a significant decrease in computational time can be achieved relative to traditional CPU computing.
CUDA, developed and provided free of charge by NVIDIA, is the parallel computing runtime and developer API that powers most leading machine learning frameworks.
By using the CUDA platform and APIs, the parallel computation pipeline originally developed for computer graphics rendering can now be used by software developers for general-purpose computing on NVIDIA GPUs. With their parallel architecture and massive memory bandwidth, GPUs can complete computationally intensive tasks orders of magnitude faster than conventional CPUs. For example, training a deep neural network that would have taken years on a CPU can now be completed in hours or days thanks to the advantages of GPU computation.
But why a GPU? Because GPUs have thousands of computational cores, while CPUs have only a handful, typically 2, 4, 8, 16, or 32. The level of parallelization of the code is limited by the number of cores on one's machine. Additionally, many GPUs include cores specialized for the kinds of mathematical operations used in machine learning, such as NVIDIA's Tensor Cores, which perform fused matrix multiply-accumulate operations.
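One way to see how much parallel hardware a particular card offers is to ask it directly. The short sketch below, which is not part of the original article, uses the CUDA runtime API to print the streaming multiprocessor count and memory of each installed NVIDIA GPU; the exact numbers reported will of course depend on your hardware.

```cuda
// Query every CUDA-capable device and report its parallel resources.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int d = 0; d < count; ++d) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, d);
        printf("GPU %d: %s\n", d, prop.name);
        printf("  Streaming multiprocessors: %d\n", prop.multiProcessorCount);
        printf("  Max threads per block:     %d\n", prop.maxThreadsPerBlock);
        printf("  Global memory:             %.1f GB\n",
               prop.totalGlobalMem / 1073741824.0);
    }
    return 0;
}
```

Each streaming multiprocessor contains many CUDA cores, so even a modest GPU exposes thousands of hardware lanes to run threads on.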
In order to execute code on NVIDIA GPUs, traditional serial code can be rewritten as parallelized code using CUDA C/C++, Fortran, or other interfaces, which is then compiled to take advantage of the specific compute capabilities of the available GPU hardware. The dataset is distributed across different workers (processors), a task is assigned to each worker, and the results are collected at the end of the computational pipeline. Many popular deep learning frameworks such as TensorFlow, PyTorch, MXNet, and Chainer include CUDA support, allowing the user to take advantage of GPU computation without writing a single line of CUDA code.
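As a concrete illustration of that pipeline, here is a minimal CUDA C++ sketch, assuming a simple element-wise vector addition as the "task": the inputs are copied to the device, one thread (worker) handles each element, and the results are collected back on the host. It is an illustrative example rather than code from the article.

```cuda
// Distribute data to the GPU, run one thread per element, collect the results.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) c[i] = a[i] + b[i];                  // each thread owns one element
}

int main() {
    const int n = 1 << 16;
    size_t bytes = n * sizeof(float);

    // Host (CPU) data
    float *hA = new float[n], *hB = new float[n], *hC = new float[n];
    for (int i = 0; i < n; ++i) { hA[i] = float(i); hB[i] = 2.0f * i; }

    // Distribute the dataset: copy the inputs into device memory
    float *dA, *dB, *dC;
    cudaMalloc(&dA, bytes); cudaMalloc(&dB, bytes); cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    // Assign the task to each worker: launch one thread per element
    int threads = 256;
    vecAdd<<<(n + threads - 1) / threads, threads>>>(dA, dB, dC, n);

    // Collect the results at the end of the pipeline
    cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);
    printf("c[10] = %f (expected %f)\n", hC[10], hA[10] + hB[10]);

    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    delete[] hA; delete[] hB; delete[] hC;
    return 0;
}
```

Frameworks such as TensorFlow and PyTorch generate or call kernels like this on your behalf, which is why no hand-written CUDA is needed to benefit from the GPU.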
However, not all tasks can be accelerated by GPUs; a task has to be parallelizable in order to benefit from running on a GPU. Problems that cannot be broken up this way are called inherently serial problems. Thankfully, many of the computations important in machine learning algorithms such as artificial neural networks can be parallelized. For example, in convolutional neural networks one of the slow computational steps is the process where a sliding window has to "slide" across the image and apply the convolution kernel to each patch. In this case, each window position is computed independently of every other. This allows us to easily parallelize the code, so that each processor can perform its calculation without waiting on the others. This is only one example of why GPU computing is so popular in the machine learning field, where huge datasets and massive computation render traditional CPU computation impractical.
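The sketch below shows one way this sliding-window independence maps onto CUDA threads: each thread computes a single output pixel, so no window position ever waits on another. The 3x3 filter, 512x512 image, and the box-blur values are illustrative assumptions, not details from the article.

```cuda
// Each thread computes one output pixel of a 3x3 convolution independently.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void conv2d3x3(const float *in, const float *k, float *out,
                          int width, int height) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < 1 || y < 1 || x >= width - 1 || y >= height - 1) return;

    float sum = 0.0f;
    for (int dy = -1; dy <= 1; ++dy)        // slide the 3x3 window over the
        for (int dx = -1; dx <= 1; ++dx)    // neighborhood centered at (x, y)
            sum += in[(y + dy) * width + (x + dx)] * k[(dy + 1) * 3 + (dx + 1)];
    out[y * width + x] = sum;               // no thread depends on another's result
}

int main() {
    const int w = 512, h = 512;
    float *in, *k, *out;
    cudaMallocManaged(&in, w * h * sizeof(float));
    cudaMallocManaged(&out, w * h * sizeof(float));
    cudaMallocManaged(&k, 9 * sizeof(float));
    for (int i = 0; i < w * h; ++i) in[i] = 1.0f;
    for (int i = 0; i < 9; ++i) k[i] = 1.0f / 9.0f;   // simple box-blur filter

    dim3 block(16, 16);
    dim3 grid((w + block.x - 1) / block.x, (h + block.y - 1) / block.y);
    conv2d3x3<<<grid, block>>>(in, k, out, w, h);
    cudaDeviceSynchronize();

    printf("out[center] = %f (expect 1.0 for a box blur over ones)\n",
           out[(h / 2) * w + w / 2]);
    cudaFree(in); cudaFree(k); cudaFree(out);
    return 0;
}
```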
The Modzy platform provides a CUDA-capable runtime with support for NVIDIA GPUs. All of our models can be run on one GPU, or across multiple GPUs, achieving superior runtime performance compared to CPUs.
If you are using GPU-powered models from the Modzy library, you already benefit from these runtime improvements. If you are creating new models to deploy to the Modzy platform, you will be able to leverage the included CUDA and GPU support to enhance their performance.