Originally posted by Chris Kachris on the InAccel blog.

Emerging cloud applications like machine learning, AI, and big data analytics require high-performance computing systems that can sustain the increased amount of data processing without consuming excessive power. Towards this end, many cloud operators have started adopting heterogeneous infrastructures, deploying hardware accelerators like FPGAs to increase the performance of computationally intensive tasks. However, most hardware accelerators lack programming efficiency, as they are programmed using not-so-widely-used languages like OpenCL, VHDL, and HLS.

According to a 2016 survey from Databricks, 91% of data scientists care mostly about the performance of their applications and 76% care about ease of programming. Therefore, the most efficient way for data scientists to utilize hardware accelerators like FPGAs is through a library of IP cores that speed up the most computationally intensive parts of the algorithm. Ideally, what most data scientists want is better performance, lower TCO, and no need to change their code.

InAccel, a world leader in application acceleration, has released the new version of its Accelerated ML suite, which allows data scientists to speed up machine learning applications without changing their code. InAccel offers a novel suite on AWS F1 instances that speeds up Apache Spark MLlib applications in the cloud with zero code changes. The platform is fully scalable and supports the main new features of Apache Spark, like pipelines and data frames. For data scientists who prefer to work with common programming languages like C/C++, Java, Python, and Scala, InAccel offers all the required APIs on AWS, making the use of FPGAs in the cloud as simple as calling a function.
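To make the zero-code-changes claim concrete, here is a minimal sketch of ordinary Spark MLlib training code in Scala, the kind of unmodified program the suite is meant to accelerate. This is an illustrative sketch, not InAccel's own example: the S3 path, object name, and parameter values are hypothetical, and only the plain spark.ml API appears.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.ml.classification.LogisticRegression

// A minimal, self-contained Spark MLlib job. Nothing here is
// accelerator-specific: per the post, the same code runs on a
// plain CPU cluster or on the FPGA-accelerated suite.
object MnistLogisticRegression {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("mnist-logistic-regression")
      .getOrCreate()

    // Load MNIST in LIBSVM format (the path is hypothetical).
    // MNIST's 784 features fit the suite's stated limit.
    val data = spark.read.format("libsvm").load("s3://my-bucket/mnist.libsvm")

    // Standard spark.ml estimator with 100 iterations, matching
    // the evaluation described below.
    val lr = new LogisticRegression()
      .setMaxIter(100)
      .setRegParam(0.01)

    val model = lr.fit(data)
    println(s"Trained on ${model.numFeatures} features, ${model.numClasses} classes")

    spark.stop()
  }
}
```

Per the post, the speedup comes from what runs underneath the MLlib API, so application code like the above stays untouched.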
Currently, InAccel offers two widely used machine learning training algorithms: Logistic Regression (BGD) and K-means clustering. Both algorithms were evaluated using the MNIST dataset (24 GB). The performance evaluation over 100 iterations showed that the InAccel Accelerated ML suite for Apache Spark achieves over 3x speedup on the machine learning stage and up to 2.5x overall (including Spark initialization, data extraction, etc.). While data loading and extraction run much faster on r5, thanks to its 48 cores, the FPGA-accelerated cores achieve up to 3x speedup over the multi-core CPU on the machine learning stage itself, which is the most computationally intensive part.

Data scientists can enjoy not only a higher speedup; they can also reduce the TCO and keep their company's CFO happy. In the performance evaluation, we compared the Accelerated ML suite against r5.12xlarge instances, which cost the same as an f1.2xlarge plus the price of the InAccel IP cores ($3/hour). Using the InAccel Accelerated ML suite for Apache Spark, not only can you achieve up to 3x speedup 🚀, but you also reduce the cost by 2.5x 📉🤑. If your application used to run for 2.5 hours and cost $7.50 (2.5 hours at $3/hour), you can now get the same results in 1 hour (2.5x speedup) for a total cost of $3.

TL;DR: you get 3x speedup and 2x lower TCO without changing your code.

Is there any pitfall? Currently, the Accelerated ML suite supports up to 784 features for Logistic Regression and K-means, and up to 32 classes/centroids; the FPGA architecture is optimized for these limits for now. Soon, we are going to release a new, fully customizable framework that supports a higher number of features and classes.

Interested in learning more? Send us a message; we are a friendly bunch. 🤖