Let’s take a high-level look at the evolution of computational hardware, focusing on its applications to machine learning (ML) and using cryptocurrency mining as an analogy.
I posit that the machine learning industry is undergoing the same hardware progression that cryptocurrency mining went through years ago.
Machine learning algorithms often consist of matrix (and tensor) operations. These calculations benefit greatly from parallel computing, which is why model training is often performed on graphics cards rather than only on the CPU.
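To see why matrix operations parallelize so well, here is a minimal sketch in plain Python. Each output row of a matrix product depends only on one row of the left matrix and the columns of the right one, so the rows can be handed to separate workers. The function names (`dot`, `matmul_row`, `parallel_matmul`) are made up for this illustration, and the thread pool is only there to show the independence of the work items; it will not give a real speedup in CPython, and it is not how a GPU library actually works.

```python
from concurrent.futures import ThreadPoolExecutor


def dot(row, col):
    """Dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(row, col))


def matmul_row(args):
    """Compute one output row; needs nothing from any other output row."""
    row, columns = args
    return [dot(row, col) for col in columns]


def parallel_matmul(a, b, workers=4):
    """Multiply matrices a and b, dispatching each output row as its own task."""
    columns = list(zip(*b))  # transpose b so its columns are easy to iterate
    tasks = [(row, columns) for row in a]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so rows come back in the right place
        return list(pool.map(matmul_row, tasks))


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(parallel_matmul(a, b))  # [[19, 22], [43, 50]]
```

A GPU takes the same idea much further: instead of a handful of workers, thousands of lightweight threads each compute a small piece of the output at the same time.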
The natural progression of computational hardware goes:
- Central Processing Unit (CPU)
- Graphics Processing Unit (GPU)
- Field Programmable Gate Array (FPGA)
- Application-Specific Integrated Circuit (ASIC)
Each step in this progression of technologies produces tremendous performance advantages.
Performance can be measured in a number of ways:
- computational capacity (or throughput)
- energy-efficiency (computations per Joule)
- cost-efficiency (throughput per dollar)
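The three measures above can be written down as simple ratios. The sketch below is only an illustration: the `Device` class and the numbers plugged into it are placeholders invented for the example, not measurements of any real product.

```python
from dataclasses import dataclass


@dataclass
class Device:
    """Hypothetical device description; every figure is a placeholder."""
    name: str
    ops_per_second: float  # computational capacity (throughput)
    power_watts: float     # power draw under load
    price_dollars: float   # purchase price

    def ops_per_joule(self) -> float:
        # 1 watt is 1 joule per second, so (ops/s) / W gives ops per joule
        return self.ops_per_second / self.power_watts

    def throughput_per_dollar(self) -> float:
        # ops per second obtained for each dollar of purchase price
        return self.ops_per_second / self.price_dollars


# Made-up numbers, chosen only to keep the arithmetic easy to follow
example = Device("example-device", ops_per_second=1e12,
                 power_watts=250.0, price_dollars=1000.0)

print(f"throughput:       {example.ops_per_second:.1e} ops/s")
print(f"energy-efficiency: {example.ops_per_joule():.1e} ops/J")
print(f"cost-efficiency:   {example.throughput_per_dollar():.1e} ops/s per dollar")
```

Which ratio matters most depends on the setting: a data center may care about computations per Joule, while a hobbyist may care more about throughput per dollar.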
Orders of Magnitude
For comparison, let’s consider the task of mining cryptocurrencies, which demands substantial computing power in exchange for financial gain. Since the introduction of Bitcoin in 2009, the crypto-mining industry evolved from using CPUs, to GPUs, to FPGAs, and finally to ASIC systems.
Each step in the hardware evolution provided roughly an order of magnitude of performance improvement. Below is an approximation of performance relative to a single-core CPU representing 1 computational unit:
- Single-core CPU: 1
- Multi-core CPU: 10
- GPU: 100
- FPGA: 1 000
- ASIC: 10 000 ~ 1 000 000
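To make these ratios concrete, here is a small sketch that converts them into running time for a fixed job. The 10 000-hour baseline is a hypothetical figure picked for illustration, and the calculation assumes the workload scales perfectly with the relative performance listed above, which real workloads rarely do.

```python
# Approximate relative performance from the list above (single-core CPU = 1)
RELATIVE_PERFORMANCE = {
    "Single-core CPU": 1,
    "Multi-core CPU": 10,
    "GPU": 100,
    "FPGA": 1_000,
    "ASIC (low end)": 10_000,
    "ASIC (high end)": 1_000_000,
}


def hours_needed(baseline_hours: float, relative_performance: float) -> float:
    """Running time on a device, assuming perfect scaling from the baseline."""
    return baseline_hours / relative_performance


# Hypothetical job that would take 10 000 hours on a single-core CPU
baseline_hours = 10_000.0

for device, factor in RELATIVE_PERFORMANCE.items():
    print(f"{device:>16}: {hours_needed(baseline_hours, factor):>10.2f} hours")
```

Under these assumptions, a job that occupies a single CPU core for more than a year finishes in about four days on a GPU and in an hour or less on an ASIC.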
General-Purpose Computing (CPU & GPU)
Prior to 2001, general-purpose computing would be done on the CPU, whereas GPUs traditionally handled only computation for rendering graphics.
General-purpose computing on graphics cards became practical when computer scientists developed matrix multiplication and factorization techniques that ran faster and more efficiently on GPUs.
However, GPUs are notoriously power-hungry. Nvidia rates its Titan X graphics card at 250W and recommends a system power supply of 600W. At $0.12/kWh, a system drawing 600W around the clock costs roughly $50 per month in electricity! Nvidia will likely continue to address these concerns in future products.
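The electricity estimate is easy to check. The sketch below assumes the system draws the full 600W continuously, 24 hours a day for a 30-day month; those usage assumptions, and the function name, are mine rather than anything Nvidia publishes.

```python
def monthly_electricity_cost(power_watts: float,
                             price_per_kwh: float,
                             hours_per_day: float = 24.0,
                             days_per_month: float = 30.0) -> float:
    """Cost in dollars of running a constant load for one month."""
    # Convert watts to kilowatts, then multiply by hours to get kWh
    energy_kwh = power_watts / 1000.0 * hours_per_day * days_per_month
    return energy_kwh * price_per_kwh


# 600 W system at $0.12 per kWh, running around the clock (assumption)
cost = monthly_electricity_cost(600.0, 0.12)
print(f"${cost:.2f} per month")
```

That is 432 kWh per month, or a little under $52. A machine that only trains models part of the day, or that rarely reaches its peak draw, would cost proportionally less.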
Specialized Hardware: FPGA
In the case of cryptocurrency, FPGA boards marked the transition to mining with specialized hardware.
A series of FPGA-based mining systems provided the next order-of-magnitude increase in throughput performance, as well as energy-efficiency (as the cost of electricity created a break-even favoring low-power systems).
Efforts are underway to implement machine learning models using FPGAs. For instance, Altera showcases an implementation of the AlexNet convolutional neural network used to classify images.
In late 2012, Microsoft started exploring FPGA-based processors for their Bing search engine.
Currently, FPGAs only match GPUs on throughput. However, they consume less energy for the same workload, which makes them a better fit for low-power environments (such as self-driving cars).
Cryptocurrency mining continued its evolution toward specialized hardware, and ASICs quickly became the only competitive option.
The same trend has already started in machine learning.
Google's Tensor Processing Unit (TPU) servers power its RankBrain search system, Street View, and even the AlphaGo system that beat world champion Lee Sedol.
Google has been using TPUs since 2015 and has found them to deliver an order of magnitude better-optimized performance per watt for machine learning.
This is roughly equivalent to fast-forwarding technology about seven years into the future (three generations of Moore’s Law).
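The seven-year figure can be sanity-checked with a little arithmetic. The sketch below assumes the common reading of Moore's Law as a doubling roughly every two years; that doubling period is my assumption, since the claim above does not state one.

```python
import math


def equivalent_years(improvement_factor: float,
                     years_per_doubling: float = 2.0) -> tuple[float, float]:
    """Return (doublings, years) needed to reach a given improvement factor."""
    # A factor of f takes log2(f) successive doublings
    doublings = math.log2(improvement_factor)
    return doublings, doublings * years_per_doubling


# "An order of magnitude" means a factor of 10
doublings, years = equivalent_years(10.0)
print(f"{doublings:.2f} doublings, about {years:.1f} years")
```

A tenfold gain works out to a little over three doublings, which at two years each lands just under seven years, in line with the "three generations" figure.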
Future and Next Steps
It appears that demand for deep learning and statistical inference is driving the hardware industry towards ML-specialized hardware.
Currently, Google leads with ASICs, their top competitors run FPGAs, and the rest of us are heating our homes with GPUs.
When will ML-specialized ASIC technology become commercially available?
Imagine specialized ASIC chips thousands of times more powerful than today’s top ML hardware. What new AI applications will become feasible? What will become possible when their energy efficiency makes them viable for embedded devices such as smartphones, IoT, and wearables?
As AI applications expand, the demand for ML-specialized devices is driving hardware into the next phases of evolution. It will be fascinating to experience the impact of these technologies applied in healthcare, medicine, transportation, and robotics. Many exciting steps in the evolution of machine learning still remain.
- Non-specialized hardware comparison
- Mining hardware comparison
- CNN Implementation on Altera FPGA Using OpenCL
- Microsoft Working on Re-configurable Processors to Accelerate Bing
- Google supercharges machine learning tasks with TPU custom chip
- Google Turning Its Lucrative Web Search Over to AI Machines
- High-Performance Hardware for Machine Learning