Let’s go through a high-level exploration of the evolution of computational hardware technologies with a focus on applications to machine (ML), and using cryptocurrency mining as an analogy. learning I posit that the machine learning industry is . undergoing the same progression of hardware as cryptocurrency did years ago Machine learning algorithms often consist of matrix (and ) operations. These calculations benefit greatly from parallel computing, which leads to model-training performed on graphics cards (rather than only on the CPU). tensor The natural progression of computational hardware goes: Central Processing Unit (CPU) Graphics Processing Unit (GPU) Field Programmable Gate Array (FGPA) Application-Specific Integrated Circuit (ASIC) Each step in this progression of technologies produces tremendous performance advantages. can be measured in a number of ways: Performance computational capacity (or ) throughput (computations per ) energy-efficiency Joule cost-efficiency (throughput per dollar) Orders of Magnitude For comparison, let’s consider the task of cryptocurrencies, which demands substantial computing power in exchange for financial gain. Since the introduction of Bitcoin in 2009, the crypto-mining industry evolved from using CPUs, to GPUs, to FPGAs, and finally to ASIC systems. mining Each step in the hardware evolution provided in performance improvement. Below is an approximation of performance relative to a single-core CPU representing 1 orders of magnitude computational unit: Single-core CPU: 1 Multi-core CPU: 10 GPU: 100 FPGA: 1 000 ASIC: 10 000 ~ 1 000 000 [ , ] These numbers are based on the performance factors (such as throughput, efficiency) observed through the cryptocurrency-mining evolution. 1 2 General-Purpose Computing (CPU & GPU) Prior to , general-purpose computing would be done on the CPU, whereas GPUs traditionally handled only computation for rendering graphics. 2001 Doing general-purpose computing on graphics cards became practical when computer scientists developed matrix multiplication and techniques that were faster and more efficient. factorization Since then, there have been notable efforts to create programming languages that allow general-purpose computing on GPUs, including and . CUDA OpenCL NVIDIA Titan X Graphics Card However, GPUs are notoriously power-hungry. Nvidia rates their graphics card at 250W, and recommends a system power supply of 600W. At $0.12 cents/kWh, 600W translates to $50 in monthly electricity consumption! Nvidia will likely continue to address these concerns in . Titan X future products Specialized Hardware: FPGA (FPGA) are integrated circuits whose can be programmed and reconfigured using a ( ). Field-programmable gate arrays logic blocks hardware description language HDL In the case of cryptocurrency, FPGA boards marked the transition to mining with hardware. specialized A series of FPGA-based mining systems provided the next order-of-magnitude increase in throughput performance, as well as energy-efficiency (as the cost of electricity created a break-even favoring low-power systems). Efforts are underway to implement machine learning models using FPGAs. For instance, Altera showcases an implementation of the used to classify images. AlexNet convolutional neural network In late 2012, Microsoft started exploring FPGA-based processors for their Bing search engine. Currently FPGAs only GPUs on throughput performance, however they consume less energy for the same workload, thereby making them more feasible in low-power environments (such as self-driving cars). match Purpose-Built ASICs Cryptocurrency mining continued its evolution to specialized hardware and ASICs quickly became the only competitive option. The same trend has already started in machine learning. TPU servers, AlphaGo with Lee Sedol In May 2016, engineers at Google announced that they created an ML-specialized ASIC called . technology a Tensor Processing Unit (TPU) The TPU-servers power their RankBrain search system, StreetView, and even the AlphaGo system that beat world champion, Lee Sedol. Google has been using TPUs since 2015, …and have found them to deliver an order of magnitude better-optimized performance per watt for machine learning. This is roughly equivalent to fast-forwarding technology about seven years into the future (three generations of Moore’s Law). Future and Next Steps It appears that demand for deep learning and statistical inference is driving the hardware industry towards ML-specialized hardware. Currently, Google leads with ASICs, their top competitors run FPGAs, and the rest of us are heating our homes with GPUs. When will ML-specialized ASIC technology become commercially available? Will the industry adopt an open framework such as as a basis for heterogeneous computing? Progress is already being made by popular ML libraries such as and . OpenCL TensorFlow Caffe Will this exponential evolution continue or plateau at some ? The in this hardware evolution include new materials, biological computing, or quantum computing. physical barrier next steps Imagine specialized ASIC chips thousands of times more powerful than today’s top ML hardware. What new AI applications will become feasible? What will become possible when their energy efficiency makes them viable for embedded devices such as smartphones, IoT, and wearables? As AI applications expand, the demand for ML-specialized devices is driving hardware into the next phases of evolution. It will be fascinating to experience the impact of these technologies applied in healthcare, medicine, transportation, robotics. Many exciting steps in the evolution of machine learning still remain. References Non-specialized hardware comparison https://en.bitcoin.it/wiki/Non-specialized_hardware_comparison Mining hardware comparison https://en.bitcoin.it/wiki/Mining_hardware_comparison CNN Implementation on Altera FPGA Using OpenCL https://www.altera.com/solutions/technology/machine-learning/overview.highResolutionDisplay.html Microsoft Working on Re-configurable Processors to Accelerate Bing http://www.datacenterknowledge.com/archives/2014/06/27/programmable-fpga-chips-coming-to-microsoft-data-centers/ Google supercharges machine learning tasks with TPU custom chip https://cloudplatform.googleblog.com/2016/05/Google-supercharges-machine-learning-tasks-with-custom-chip.html Google Turning Its Lucrative Web Search Over to AI Machines http://www.bloomberg.com/news/articles/2015-10-26/google-turning-its-lucrative-web-search-over-to-ai-machines High-Performance Hardware for Machine Learning https://media.nips.cc/Conferences/2015/tutorialslides/Dally-NIPS-Tutorial-2015.pdf