Introducing gpu.js: GPU Accelerated JavaScript

by Abhishek SoniJuly 19th, 2017

It’s kind of fun to do the impossible.

So, here’s the problem: You are a chef, and you have been asked to prepare milkshakes for your best friend’s baby shower. Because of the huge income chefs usually have, you are a proud owner of a Cylindrical Automated Transformer (CAT) that you can use to make milkshakes and whatnot.

Option 1: You could make the milkshakes yourself, and it will take you 2 minutes to do it.

Option 2: You could use the CAT, which takes about 20 minutes, no matter the request.

You choose: Option 1 (good choice)

Impressed by your speed in the kitchen, the Queen of Wales asks you to cook an eight-course dinner, with appetizers, and dessert, and milkshakes, at her son’s wedding.

If you chose to have a go at it all by yourself now, not only would you fail to complete the job, you would be barred from Wales and the Kingdom for the rest of your life. However, if you are a wee bit smart at math, you’d choose the CAT, finish the job in 20 minutes (right!?!), and gain an all-access pass to the fanciest hotels in the United Kingdom.

Machine learning is that eight-course dinner, at a catered wedding of the Prince of Wales with 200,000 guests. Would you want to cook all that yourself (CPU) or use a service? (Hint: gpu.js is that service.)

In machine learning, the GPU can help you cut the time to 1/100th of the original, and maybe more. (Keep going! Results will show up.)

Introducing gpu.js!

If you are baffled and want to dive straight into the pool of brackets, feel free to skip to the next section.

gpu.js is a GPGPU (General-Purpose computing on Graphics Processing Units) library that lets you hand over hefty calculations to the GPU for super-fast operation and output. It currently runs in the browser and on Node.js, using the WebGL APIs in the browser and a single-threaded fallback on Node.js. OpenCL is on the roadmap. (🎉)


You might ask: “But why? Aren’t Intel’s i7, or even i9 fast enough? They seem to work fine for me. I don’t need this.”

Before you get bogged down by that, check out the results:

MacBook Pro Retina 2015, Google Chrome

22.97 times faster!?!


Right, right! That’s a powerful machine, so here are the results on a system with an integrated graphics card (Intel HD 3000) and no dedicated GPU:

Intel HD 3000, Google Chrome

All in all, the thing that separates gpu.js from the lot is that it doesn’t chain you to a specific way of using the library. It does what the tagline says: it lets you accelerate your hefty JavaScript.

Let’s have some code now: we are going to perform matrix multiplication and benchmark the CPU’s performance against the GPU’s. Size of the matrices: 512 × 512.

Code

You are right. It’s hosted on GitHub: gpu.js-demo

The source files, namely gpu.min.js and gpu-core.min.js, can be downloaded from our website (gpu.rocks) or GitHub (gpu.js).

Note: I am assuming you have already initialized a prototype HTML/JS/CSS project (index.html, index.js, style.css)

Step 1. Import gpu lib files

In your index.html, import the files, and you are good to go:
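A minimal index.html sketch (assuming you downloaded gpu.min.js right next to it; gpu-core.min.js can be dropped in the same way):

```html
<!DOCTYPE html>
<html>
  <head>
    <link rel="stylesheet" href="style.css">
  </head>
  <body>
    <!-- Load the library before your own script so the global GPU function exists. -->
    <script src="gpu.min.js"></script>
    <script src="index.js"></script>
  </body>
</html>
```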

Step 2. Matrix Multiplication on CPU

To multiply two matrices, we need to make sure that the number of columns in the first matrix is equal to the number of rows in the second matrix.

Matrix A: 512 × 512 (m × n)

Matrix B: 512 × 512 (n × r)

Result: 512 × 512 (m × r)

Here’s a generic Matrix Multiplication algorithm that runs on the CPU:
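The original snippet was an embedded gist; here is a sketch of the same idea (the function name is illustrative):

```javascript
// Multiplies two square matrices on the CPU with three nested loops.
function matMultCPU(A, B) {
  var size = A.length;
  var C = [];
  for (var row = 0; row < size; row++) {
    C.push([]);
    for (var col = 0; col < size; col++) {
      var sum = 0;
      for (var k = 0; k < size; k++) {
        sum += A[row][k] * B[k][col];
      }
      C[row].push(sum);
    }
  }
  return C;
}
```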

The next step is where the magic begins. (WebGL magic 😆)

Step 3. Setup GPU

The lib files export a global function named GPU that you can use to create a new gpu instance:

const gpu = new GPU({mode: 'webgl'});

A few options can be sent to the constructor, the complete list of which can be found on GitHub and in the automatically generated JSDocs.

The mode option specifies where the function will run. There are three options:

  1. “gpu”
  2. “webgl”
  3. “cpu”

gpu and webgl are aliases for the time being. We are aiming to incorporate OpenCL in v2, at which point gpu will solely mean using the GPU via the OpenCL API on the server.

Currently, both webgl and gpu use the WebGL APIs to defer work to the GPU.
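For example (a sketch; the variable names are just illustrative), the three mode values look like this in use:

```javascript
// 'gpu' and 'webgl' are currently aliases that target WebGL,
// while 'cpu' runs kernels as ordinary JavaScript (handy for comparisons).
const viaGpu   = new GPU({mode: 'gpu'});
const viaWebgl = new GPU({mode: 'webgl'});
const viaCpu   = new GPU({mode: 'cpu'});
```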

Step 4. Matrix Multiplication on GPU

The gpu variable we just initialized has several different methods attached to it, all of which have varying use-cases.

We’ll use the createKernel method which, essentially, creates a “kernel” (an abstract term for what is, in fact, a function) that you can call from JS. Behind the scenes, your code is compiled to GLSL shaders using an AST and a Jison-based parser. That ensures that the code written inside the kernel will be executed on the GPU.

You pass a JS function as an argument to createKernel, and inside it you have access to the thread dimensions. (As a mnemonic, you can think of the thread dimensions as the lengths of the for-loops we used in CPU mode.)

.setDimensions sets the dimensions of the output, i.e. the bounds of that implicit loop. (See the API page for the complete reference.)
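Putting that together, a GPU version of the multiplication could look roughly like this (a sketch based on the calls described above; 512 matches our matrix size):

```javascript
// Each GPU thread computes one cell of the result matrix.
// this.thread.x / this.thread.y are the cell's column and row,
// and setDimensions sets the 512 x 512 output size.
const matMult = gpu.createKernel(function(A, B) {
  var sum = 0;
  for (var i = 0; i < 512; i++) {
    sum += A[this.thread.y][i] * B[i][this.thread.x];
  }
  return sum;
}).setDimensions([512, 512]);
```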

There is an inherent problem in the way most GPU-backed software works: the transfer penalty. The GPU is like its own computer, a black box that we send commands to from the CPU. We can transfer things to it and read data back from it, but all of that comes with a penalty. The transfer overhead especially becomes a bottleneck if your use case involves performing several mathematical operations on the GPU, since the net fine keeps increasing with every operation.

You can, however, leave values on the GPU. They exist on the GPU as textures. (You can think of textures as data containers, but for the GPU.) By setting the outputToTexture flag to true, you can make sure you incur no transfer penalty, thereby eliciting an important speed gain.

.setOutputToTexture is where the REAL STUFF happens!
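In code, the kernel sketch from above just gains one more call in the chain:

```javascript
// Same kernel as before, but the 512 x 512 result now stays on the GPU
// as a texture instead of being read back into a JavaScript array,
// so chained GPU operations skip the transfer penalty.
const matMult = gpu.createKernel(function(A, B) {
  var sum = 0;
  for (var i = 0; i < 512; i++) {
    sum += A[this.thread.y][i] * B[i][this.thread.x];
  }
  return sum;
})
  .setDimensions([512, 512])
  .setOutputToTexture(true);
```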


And, as important as they are, A and B are the matrices which we’ll create in the next step.

Step 5. Initialize Matrices

The matrix-generation code is taken from the demo you saw on our website. If you don’t get it, no problem. Here’s what it does: it pushes 512 × 512 elements into a flat JavaScript array (1D) and then splits them into 512 parts, so in the end we have a 2D array of size 512 × 512 (every array element holds 512 child elements).
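A sketch of that generator (the function name is illustrative; Math.random() fills in the values):

```javascript
// Builds a 512 x 512 matrix: push 512 * 512 values into a flat array,
// then slice it into 512 rows of 512 elements each.
function generateMatrix(size) {
  var flat = [];
  for (var i = 0; i < size * size; i++) {
    flat.push(Math.random());
  }
  var matrix = [];
  for (var row = 0; row < size; row++) {
    matrix.push(flat.slice(row * size, (row + 1) * size));
  }
  return matrix;
}
```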

All done! Let’s hit the gas.

Step 6. Run it and Benchmark

For simplicity, we are going to use the Web Performance API to benchmark this, but you can use benchmark.js as well. (Our website uses that.)

First, we generate the matrices by calling the function above, and then we run the matMult methods for both CPU and GPU.
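A sketch of that harness, assuming the matMultCPU and matMult names from the earlier sketches:

```javascript
// Generate the inputs, then time each implementation with performance.now().
var A = generateMatrix(512);
var B = generateMatrix(512);

var cpuStart = performance.now();
matMultCPU(A, B);
console.log('CPU: ' + (performance.now() - cpuStart).toFixed(2) + ' ms');

var gpuStart = performance.now();
matMult(A, B);
console.log('GPU: ' + (performance.now() - gpuStart).toFixed(2) + ' ms');
```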

BAM! Open index.html in the browser and see for yourself. Here is what I got: (Support for Safari is coming soon!)

Chrome:

Firefox (What’s up, Chrome?):

And that’s just 512 × 512. If you change that number to 1024, you’ll notice that the GPU is a powerful beast and can run your code much, much faster than the CPU.

We would like to let the community know that JavaScript has been gifted a rocket. 🚀 What will you do with it?

All the code is on GitHub and the team will be thrilled to have more users and contributors. gpu.js — Go have some epiphanies. 🎉