Deploying machine learning models outside of a Python environment used to be difficult. When the target platform is the browser, the de facto standard for serving predictions has been an API call to a server-side inference engine. For many reasons, server-side inference APIs are a sub-optimal solution, and machine learning models are increasingly being deployed natively. TensorFlow has done a good job of supporting this movement by providing cross-platform APIs, but many of us do not want to be married to a single ecosystem.

In comes the Open Neural Network Exchange (ONNX) project which, since being picked up by Microsoft, has seen massive development effort and is approaching a stable state. It's now easier than ever to deploy machine-learning models, trained using your machine-learning framework of choice, on your platform of choice, with hardware acceleration out of the box.

In April this year, `onnxruntime-web` was introduced (see this Pull Request). `onnxruntime-web` uses WebAssembly to compile `onnxruntime`, the inference engine, to wasm format - it's about time WebAssembly started to flex its muscles. Especially when paired with WebGL, we suddenly have GPU-powered machine learning in the browser - pretty cool.

In this tutorial we will dive into `onnxruntime-web` by deploying a pre-trained PyTorch model to the browser. We will be using AlexNet as our deployment target. AlexNet was trained as an image classifier on the ImageNet dataset, so we will be building an image classifier - nothing better than re-inventing the wheel. At the end of this tutorial, we will have built a bundled JavaScript web app that can be run as a stand-alone static web page, or integrated into your framework of choice.

→ Jump to code: onnxruntime-web-tutorial

## Prerequisite

You will need a trained machine-learning model exported as an ONNX binary protobuf file. There are many ways to achieve this using a number of different deep-learning frameworks. For the sake of this tutorial, I will be using the exported AlexNet model from the example in the PyTorch documentation; the Python code snippet below will help you generate your own model. You can also follow the PyTorch documentation to export your own model. If you're coming from TensorFlow, this tutorial will help you with exporting your model to ONNX. Lastly, ONNX doesn't just pride itself on cross-platform deployment, but also on allowing exports from all major deep-learning frameworks. Those of you using another deep-learning framework should be able to find support for exporting to ONNX in the docs of your framework.

```python
import torch
import torchvision

dummy_input = torch.randn(1, 3, 224, 224)
model = torchvision.models.alexnet(pretrained=True)

input_names = ["input1"]
output_names = ["output1"]

torch.onnx.export(
    model,
    dummy_input,
    "alexnet.onnx",
    verbose=True,
    input_names=input_names,
    output_names=output_names
)
```

Running this script creates a file, `alexnet.onnx`, a binary protobuf file which contains both the network structure and the parameters of the model you exported (in this case, AlexNet).

## ONNX Runtime Web

ONNX Runtime Web is a JavaScript library for running ONNX models in the browser and on Node.js. It has adopted WebAssembly and WebGL technologies to provide an optimized ONNX model inference runtime for both CPUs and GPUs.

The official package is hosted on npm under the name `onnxruntime-web`. When using a bundler or working server-side, this package can be installed using `npm install`. However, it's also possible to deliver the code via a CDN using a script tag. The bundling process is a bit involved, so we will start with the script-tag approach and come back to the npm package later.
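ONNX Runtime Web picks a backend for you, but you can also request one explicitly through the session options used when creating an inference session (which we will do in the next section). A minimal sketch, assuming the `'webgl'` and `'wasm'` execution-provider names documented at the time of writing - verify them against the version you install:

```javascript
// Sketch: explicitly requesting a backend for inference.
// Execution-provider names ('webgl', 'wasm') are taken from the onnxruntime-web
// docs at the time of writing; check them against your installed version.
async function createSession() {
  return ort.InferenceSession.create('./alexnet.onnx', {
    executionProviders: ['webgl', 'wasm'], // prefer WebGL (GPU), fall back to WebAssembly (CPU)
  });
}
```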
## Inference Session

Let's start with the core application logic: model inference. `onnxruntime` exposes a runtime object called an `InferenceSession` with a `.run()` method which is used to initiate the forward pass with the desired inputs. Both the `InferenceSession` constructor and the accompanying `.run()` method return a `Promise`, so we will run the entire process inside an `async` context. Before implementing any browser elements, we will check that our model runs with a dummy input tensor, remembering the input and output names and sizes that we defined earlier when exporting the model.

```javascript
async function run() {
  try {
    // create a new session and load the AlexNet model.
    const session = await ort.InferenceSession.create('./alexnet.onnx');

    // prepare dummy input data
    const dims = [1, 3, 224, 224];
    const size = dims[0] * dims[1] * dims[2] * dims[3];
    const inputData = Float32Array.from({ length: size }, () => Math.random());

    // prepare feeds. use model input names as keys.
    const feeds = { input1: new ort.Tensor('float32', inputData, dims) };

    // feed inputs and run
    const results = await session.run(feeds);
    console.log(results.output1.data);
  } catch (e) {
    console.log(e);
  }
}

run();
```

We then implement a simple HTML template, `index.html`, which loads both the pre-compiled `onnxruntime-web` package and `main.js`, containing our code.

```html
<!DOCTYPE html>
<html>
  <header>
    <title>ONNX Runtime Web - Tutorial</title>
  </header>
  <body>
    <script src="https://cdn.jsdelivr.net/npm/onnxruntime-web/dist/ort.min.js"></script>
    <script src="main.js"></script>
  </body>
</html>
```

To run this, we can use light-server. If you haven't started an npm project by now, please do so by running `npm init` in your current working directory. Once you've completed the setup, install light-server (`npm install light-server`) and serve the static HTML page using `npx light-server -s . -p 8080`.

You're now running a machine learning model natively in the browser! To check that everything is running fine, go to your web console and make sure that the output tensor is logged (AlexNet is bulky, so it's normal that inference takes a few seconds).
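If you don't remember (or didn't control) the names your model was exported with, the loaded session exposes them. A small sketch using the `inputNames`/`outputNames` properties of the onnxruntime-web session - double-check them against the version you install:

```javascript
// Sketch: inspecting the input/output names of a loaded model.
// inputNames/outputNames are properties of the onnxruntime-web InferenceSession;
// verify against your installed version.
async function inspectModel() {
  const session = await ort.InferenceSession.create('./alexnet.onnx');
  console.log('inputs:', session.inputNames);   // expected: ["input1"]
  console.log('outputs:', session.outputNames); // expected: ["output1"]
}
```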
## Bundled deployment

Next, we will use webpack to bundle our dependencies, as would be the case if we wanted to deploy the model in a JavaScript app powered by frameworks like React or Vue. Usually bundling is a relatively simple procedure, but `onnxruntime-web` requires a slightly more involved webpack configuration - this is because WebAssembly is used to provide the natively assembled runtime.

Browser support is the classic pitfall, especially when working with cutting-edge web technology. If your intended users are not using one of the four major browsers (Chrome, Edge, Firefox, Safari), you might want to hold off on integrating WebAssembly components. More information on WebAssembly support and its roadmap can be found here.

The following steps are based on the examples provided by the official ONNX Runtime documentation. We're assuming you've already started an npm project.

1. Install the dependencies.

```
npm install onnxruntime-web && npm install -D webpack webpack-cli copy-webpack-plugin
```

2. Instead of loading the `onnxruntime-web` module via a CDN, update `main.js` to `require` the package at the top of the script.

```javascript
const ort = require('onnxruntime-web');
```

3. Save the webpack configuration file below, defined by the ONNX Runtime team, as `webpack.config.js`.

```javascript
// Copyright (c) Microsoft Corporation.
// Licensed under the MIT license.

const path = require('path');
const CopyPlugin = require("copy-webpack-plugin");

module.exports = () => {
  return {
    target: ['web'],
    entry: path.resolve(__dirname, 'main.js'),
    output: {
      path: path.resolve(__dirname, 'dist'),
      filename: 'bundle.min.js',
      library: {
        type: 'umd'
      }
    },
    plugins: [new CopyPlugin({
      // Use copy plugin to copy *.wasm to output folder.
      patterns: [{ from: 'node_modules/onnxruntime-web/dist/*.wasm', to: '[name][ext]' }]
    })],
    mode: 'production'
  }
};
```

4. Run `npx webpack` to compile the bundle.

5. Finally, before reloading the server, we need to update `index.html`:

- Remove the script tag that loads the pre-compiled `ort.min.js` package from the CDN.
- Load the code from `bundle.min.js` (which contains all our dependencies, bundled and minified by webpack) instead of `main.js`.

`index.html` should now look something like this.

```html
<!DOCTYPE html>
<html>
  <header>
    <title>ONNX Runtime Web Tutorial</title>
  </header>
  <body>
    <script src="bundle.min.js"></script>
  </body>
</html>
```

To make building and launching the live server easier, you can define `build` and `serve` scripts in `package.json`:

```json
"scripts": {
  "build": "npx webpack",
  "serve": "npm run build && npx light-server -s . -p 8080"
}
```

## Image Classifier

Let's put this model to work and implement the image classification pipeline. We will need some utility functions to load, resize, and display the image - the `canvas` object is perfect for this. In addition, image classification systems typically have lots of magic built into the pre-processing pipeline; this is trivial to implement in Python using frameworks like numpy, but unfortunately that is not the case in JavaScript. It follows that we will have to implement our pre-processing from scratch to transform the image data into the correct input format.

### 1. DOM Elements

We will need some HTML elements to interact with and display the data.

A file input, for loading a file from disk:

```html
<label for="file-in"><h2>What am I?</h2></label>
<input type="file" id="file-in" name="file-in">
```

Image displays; we want to display both the input image and the re-scaled image:

```html
<img id="input-image" class="input-image"></img>
<img id="scaled-image" class="scaled-image"></img>
```

A classification target, to display our inference results:

```html
<h3 id="target"></h3>
```

### 2. Image load and display

We want to load an image from file and display it. Back in `main.js`, we get the file input element from the DOM and use `FileReader` to read the data into memory. Following this, the image data is passed to `handleImage`, which draws the image using the `canvas` 2D context.

```javascript
const canvas = document.createElement("canvas"),
  ctx = canvas.getContext("2d");

document.getElementById("file-in").onchange = function (evt) {
  let target = evt.target || window.event.src,
    files = target.files;

  if (FileReader && files && files.length) {
    var fileReader = new FileReader();
    fileReader.onload = () => onLoadImage(fileReader);
    fileReader.readAsDataURL(files[0]);
  }
}

function onLoadImage(fileReader) {
  var img = document.getElementById("input-image");
  img.onload = () => handleImage(img);
  img.src = fileReader.result;
}

function handleImage(img) {
  ctx.drawImage(img, 0, 0)
}
```
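`FileReader` works fine here; if you prefer, an equivalent and slightly shorter route is `URL.createObjectURL`. A sketch of that alternative (not what the tutorial repository uses), which also skips non-image files:

```javascript
// Alternative sketch: load the selected file via an object URL instead of
// FileReader. Not what the tutorial repository does - just an option.
document.getElementById("file-in").onchange = (evt) => {
  const file = evt.target.files && evt.target.files[0];
  if (!file || !file.type.startsWith("image/")) return; // ignore non-image files
  const img = document.getElementById("input-image");
  img.onload = () => {
    URL.revokeObjectURL(img.src); // release the blob once the image has loaded
    handleImage(img);
  };
  img.src = URL.createObjectURL(file);
};
```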
### 3. Preprocess and convert the image to a tensor

Now that we can load and display an image, we want to move on to extracting and processing the data. Remember that our model takes in a tensor of shape `[1, 3, 224, 224]`; this means we will have to resize the image (to support arbitrary input images) and perhaps also transpose the dimensions, depending on how we extract the image data.

To resize and extract the image data, we will use the `canvas` context again. Let's define a function `processImage` that does this. `processImage` has the necessary elements in scope to immediately draw the scaled image, so we will also do that here.

```javascript
function processImage(img, width) {
  const canvas = document.createElement("canvas"),
    ctx = canvas.getContext("2d")

  // resize image
  canvas.width = width;
  canvas.height = canvas.width * (img.height / img.width);

  // draw scaled image
  ctx.drawImage(img, 0, 0, canvas.width, canvas.height);
  document.getElementById("scaled-image").src = canvas.toDataURL();

  // return data
  return ctx.getImageData(0, 0, width, width).data;
}
```

We can now add a line to the `handleImage` function which calls `processImage`.

```javascript
const resizedImageData = processImage(img, targetWidth);
```

Finally, let's implement a function called `imageDataToTensor` which applies the transforms needed to get the image data ready to be used as input to the model. `imageDataToTensor` should apply three transforms:

- Filter out the alpha channel; our input tensor should contain only the 3 channels corresponding to RGB.
- `ctx.getImageData` returns data in the shape `[224, 224, 3]`, so we need to transpose the data to the shape `[3, 224, 224]`.
- `ctx.getImageData` returns a `Uint8ClampedArray` with integer values ranging from 0 to 255; we need to convert the values to `float32` and store them in a `Float32Array` to construct our tensor input.

```javascript
function imageDataToTensor(data, dims) {
  // 1a. Extract the R, G, and B channels from the data
  const [R, G, B] = [[], [], []]
  for (let i = 0; i < data.length; i += 4) {
    R.push(data[i]);
    G.push(data[i + 1]);
    B.push(data[i + 2]);
    // 2. skip data[i + 3] thus filtering out the alpha channel
  }
  // 1b. concatenate RGB ~= transpose [224, 224, 3] -> [3, 224, 224]
  const transposedData = R.concat(G).concat(B);

  // 3. convert to float32
  let i, l = transposedData.length; // length, we need this for the loop
  const float32Data = new Float32Array(3 * 224 * 224); // create the Float32Array for output
  for (i = 0; i < l; i++) {
    float32Data[i] = transposedData[i] / 255.0; // convert to float
  }

  const inputTensor = new ort.Tensor("float32", float32Data, dims);
  return inputTensor;
}
```

### 4. Display classification result

Almost there; let's wrap up some loose ends to get the full inference pipeline up and running.

First, stitch together the image processing and inference pipeline in `handleImage`.

```javascript
function handleImage(img, targetWidth) {
  ctx.drawImage(img, 0, 0);
  const resizedImageData = processImage(img, targetWidth);
  const inputTensor = imageDataToTensor(resizedImageData, DIMS);
  run(inputTensor);
}
```

The output of the model is a list of activation values corresponding to the probability that a certain class is identified in the image. We get the most likely classification result by taking the index of the maximum value in the output data; this is done using an `argMax` function.

```javascript
function argMax(arr) {
  let max = arr[0];
  let maxIndex = 0;

  for (var i = 1; i < arr.length; i++) {
    if (arr[i] > max) {
      maxIndex = i;
      max = arr[i];
    }
  }

  return [max, maxIndex];
}
```
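The snippets above reference `DIMS` and a `targetWidth` that we haven't defined explicitly. A sketch of the constants I assume near the top of `main.js`, together with one way of wiring the width through `onLoadImage` (the constant names and the wiring are my assumptions, not something prescribed by onnxruntime-web):

```javascript
// Assumed constants (not shown elsewhere in the tutorial): the model's
// input shape and the width we rescale images to before inference.
const DIMS = [1, 3, 224, 224];
const TARGET_WIDTH = DIMS[2]; // 224

// One way to pass the width through: hand it to handleImage once the image loads.
function onLoadImage(fileReader) {
  const img = document.getElementById("input-image");
  img.onload = () => handleImage(img, TARGET_WIDTH);
  img.src = fileReader.result;
}
```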
Finally, `run()` needs to be refactored to accept a tensor input. We also need to use the max index to actually retrieve the result from a list of ImageNet classes. I've pre-converted this list to JSON and we will load it into our script using `require` - you can find the JSON file in the code repository linked at the start and end of this tutorial.

```javascript
const classes = require("./imagenet_classes.json").data;

async function run(inputTensor) {
  try {
    const session = await ort.InferenceSession.create('./alexnet.onnx');
    const feeds = { input1: inputTensor };
    const results = await session.run(feeds);
    const [maxValue, maxIndex] = argMax(results.output1.data);
    target.innerHTML = `${classes[maxIndex]}`;
  } catch (e) {
    console.error(e); // non-fatal error handling
  }
}
```

That's it! All that's left is to re-build our bundle, serve the app, and start classifying some images. As you test the app, you will notice that prediction quality is not as good as it could be. This is primarily because the current image-processing pipeline is still rather rudimentary and can be improved in a number of ways; for example, we could implement better resizing, center-cropping, and/or normalization. Maybe food for a next tutorial - or I'll just leave it up to you to explore!

## Conclusion

That's it, we've built a web app with a machine-learning model running natively in the browser! You can find the full code (including styles and layout) in this code repository on GitHub. I appreciate any and all feedback, so feel free to share any Issues or Stars.

Thank you for reading!

Also published on: https://rekoil.io/blog/onnxruntime-web-tutorial