The Handtrack.js library allows you to track a user’s hand (bounding box) from an image, in any orientation, in 3 lines of code.
Here’s an example interface built using Handtrack.js to track hands from a webcam feed. Try the demo here.
A while ago, I was really blown away by results from an experiment using the TensorFlow Object Detection API to track hands in an image. I made the trained model and source code available, and since then it has been used to prototype some rather interesting use cases (a tool to help kids spell, extensions to predict sign language, hand ping pong, etc.). However, while many individuals wanted to experiment with the trained model, a large number still had issues setting up Tensorflow (installation, TF version issues, exporting graphs, etc.). Luckily, Tensorflow.js addresses several of these installation and distribution issues, as it is optimized to run in the standardized environment of browsers. To this end, I created Handtrack.js as a library that allows developers to quickly prototype hand/gesture interactions powered by a trained hand detection model.
Runtime: 22 FPS on a MacBook Pro 2018 (2.2 GHz, Chrome browser); 13 FPS on a MacBook Pro 2014 (2.2 GHz).
The goal of the library is to abstract away the steps associated with loading the model files, provide helpful functions, and allow a user to detect hands in an image without any ML experience. You do not need to train a model (you can if you want). You do not need to export any frozen graphs or saved models. You can just get started by including handtrack.js in your web application (details below) and calling the library methods.
Interactive demo built using Handtrack.js here, and the source code on GitHub is here. Love tinkering in Codepen? Here’s a handtrack.js example pen you can modify.
GitHub repository: victordibia/handtrack.js. A library for prototyping realtime hand detection (bounding box), directly in the browser.
You can use handtrack.js simply by including the library URL in a script tag or by importing it from npm using build tools.
The Handtrack.js minified js file is currently hosted using jsdelivr, a free open source cdn that lets you include any npm package in your web application.
<script src="https://cdn.jsdelivr.net/npm/handtrackjs/dist/handtrack.min.js"> </script>
Once the above script tag has been added to your HTML page, you can reference handtrack.js using the handTrack variable as follows.
const img = document.getElementById('img');
handTrack.load().then(model => {
  model.detect(img).then(predictions => {
    console.log('Predictions: ', predictions); // bbox predictions
  });
});
The snippet above prints out bounding box predictions for an image passed in via the img tag. By submitting frames from a video or camera feed, you can then “track” hands in each frame (you will need to keep state of each hand as frames progress).
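The note about keeping state across frames can be sketched roughly as follows. This is only an illustration: `createHandTracker` is a hypothetical helper (not part of Handtrack.js), it matches each new detection to the nearest previously seen hand by the distance between box centers, and the 80 pixel `maxDistance` default is an arbitrary assumption.

```javascript
// Hypothetical helper, not part of Handtrack.js: carries hand ids across frames.
// Each prediction is assumed to look like { bbox: [x, y, width, height], ... }.
function createHandTracker(maxDistance = 80) {
  let nextId = 0;   // id handed to the next unmatched detection
  let tracked = []; // hands seen in the previous frame: { id, center }

  return function update(predictions) {
    const available = tracked.slice();
    tracked = predictions.map((prediction) => {
      const [x, y, width, height] = prediction.bbox;
      const center = { x: x + width / 2, y: y + height / 2 };

      // Find the closest previous hand that is still unclaimed and within range.
      let bestIndex = -1;
      let bestDistance = maxDistance;
      available.forEach((hand, index) => {
        const distance = Math.hypot(hand.center.x - center.x, hand.center.y - center.y);
        if (distance <= bestDistance) {
          bestIndex = index;
          bestDistance = distance;
        }
      });

      if (bestIndex === -1) {
        return { id: nextId++, center }; // a hand not seen in the previous frame
      }
      const [matched] = available.splice(bestIndex, 1);
      return { id: matched.id, center }; // same hand, updated position
    });
    return tracked;
  };
}
```

Calling the returned `update` function with the predictions from each frame yields a list of `{ id, center }` entries, so a hand that moves a little between frames keeps its id.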
Demo interface using handtrack.js to track hands in an image. You can use the `renderPredictions()` method to draw detected bounding boxes and source image in a canvas object.
You can install handtrack.js as an npm package using the following:
npm install --save handtrackjs
An example of how you can import and use it in a React app is given below.
import * as handTrack from 'handtrackjs';
const img = document.getElementById('img');
// Load the model.
handTrack.load().then(model => {
  // detect objects in the image.
  console.log("model loaded");
  model.detect(img).then(predictions => {
    console.log('Predictions: ', predictions);
  });
});
You can vary the confidence threshold (predictions below this value are discarded). Note: the model tends to work best in well-lit conditions. The reader is encouraged to experiment with the confidence threshold to accommodate various lighting conditions. For example, a dimly lit scene will work better with a lower confidence threshold.
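To make the effect of the threshold concrete, here is a minimal sketch that applies it by hand to a list of predictions. The `filterByScore` helper and the sample scores are made up for illustration; in the library itself the threshold is the `scoreThreshold` model parameter.

```javascript
// Hypothetical helper: keeps only predictions at or above a confidence threshold.
function filterByScore(predictions, scoreThreshold) {
  return predictions.filter((prediction) => prediction.score >= scoreThreshold);
}

// Made-up predictions in the format returned by detect().
const samplePredictions = [
  { bbox: [10, 20, 100, 120], class: 'hand', score: 0.91 },
  { bbox: [300, 40, 90, 110], class: 'hand', score: 0.62 },
];

// A lower threshold, as might suit a dimly lit scene, keeps both boxes.
const lowLightResult = filterByScore(samplePredictions, 0.6);
// A stricter threshold keeps only the confident detection.
const wellLitResult = filterByScore(samplePredictions, 0.79);

console.log(lowLightResult.length, wellLitResult.length); // prints: 2 1
```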
If you are interested in prototyping gesture based (body as input) interactive experiences, Handtrack.js can be useful. The user does not need to attach any additional sensors or hardware but can immediately take advantage of engagement benefits that result from gesture based/body-as-input interactions.
A simple body-as-input interaction prototyped using Handtrack.js where the user paints on a canvas using the tracked location of their hand. In this interaction, the maxNumBoxes model parameter is set to 1 to ensure only one hand is tracked.
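A painting interaction like this one could be approximated with a few lines of canvas code. The sketch below is my own illustration rather than the demo's source: `paintHand` is a hypothetical helper that draws a dot at the center of the first detected bounding box on a 2D canvas context, and the default radius is arbitrary.

```javascript
// Hypothetical painting helper, not part of Handtrack.js.
// Draws a dot at the center of the first detected hand and returns that point,
// or null when no hand was detected in the frame.
function paintHand(context, predictions, radius = 6) {
  if (predictions.length === 0) return null;

  const [x, y, width, height] = predictions[0].bbox;
  const point = { x: x + width / 2, y: y + height / 2 };

  // Standard CanvasRenderingContext2D calls: a filled circle at the hand center.
  context.beginPath();
  context.arc(point.x, point.y, radius, 0, 2 * Math.PI);
  context.fill();
  return point;
}
```

Calling `paintHand(canvas.getContext('2d'), predictions)` once per frame, without clearing the canvas in between, leaves a trail of dots that follows the hand.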
Some (not all) relevant scenarios are listed below:
Body as input in the browser. Results from Handtrack.js (applied to a webcam feed) are mapped to the controls of a pong game. Try it here. Modify it here on Codepen.
Body as input on a large display. Results from Handtrack.js (applied to webcam feed) can be mapped to the controls of a game.
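Mapping detection results to game controls mostly comes down to rescaling coordinates. As a rough sketch (the function name and the parameters `videoWidth`, `gameWidth` and `paddleWidth` are assumptions for illustration, not taken from the pong demo), the horizontal center of the hand can be converted into a paddle position and clamped so the paddle stays inside the game area:

```javascript
// Hypothetical mapping from a hand bounding box to a paddle position.
// bbox: [x, y, width, height] in video pixels.
// videoWidth: width of the frame the detection ran on.
// gameWidth, paddleWidth: dimensions of the game area, in the game's own units.
function paddleLeftEdge(bbox, videoWidth, gameWidth, paddleWidth) {
  const handCenterX = bbox[0] + bbox[2] / 2;
  const ratio = handCenterX / videoWidth; // 0 at the left edge, 1 at the right edge
  const desired = ratio * gameWidth - paddleWidth / 2; // center the paddle under the hand
  // Clamp so the paddle never leaves the game area.
  return Math.min(Math.max(desired, 0), gameWidth - paddleWidth);
}
```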
Several methods are provided. The two main methods are load(), which loads a hand detection model, and detect(), which returns predictions.
load() accepts optional model parameters that allow you to control the performance of the model. This method loads a pretrained hand detection model in the web model format (also hosted via jsdelivr).
detect() accepts an input source parameter (an HTML img, video or canvas object) and returns bounding box predictions on the location of hands in the image.
const modelParams = {
  flipHorizontal: true,   // flip e.g for video
  imageScaleFactor: 0.7,  // reduce input image size.
  maxNumBoxes: 20,        // maximum number of boxes to detect
  iouThreshold: 0.5,      // ioU threshold for non-max suppression
  scoreThreshold: 0.79,   // confidence threshold for predictions.
};
const img = document.getElementById('img');
handTrack.load(modelParams).then(model => {
  model.detect(img).then(predictions => {
    console.log('Predictions: ', predictions);
  });
});
Prediction results are of the form:
[
  { bbox: [x, y, width, height], class: "hand", score: 0.8380282521247864 },
  { bbox: [x, y, width, height], class: "hand", score: 0.74644153267145157 }
]
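Since each bbox is given as [x, y, width, height], a little arithmetic is needed before using it with code that expects corner coordinates. The two helpers below are hypothetical conveniences (not library methods) that show how the prediction format shown above might be consumed:

```javascript
// Hypothetical helpers for working with the prediction format shown above.

// Converts [x, y, width, height] into explicit edge coordinates.
function toCorners(bbox) {
  const [x, y, width, height] = bbox;
  return { left: x, top: y, right: x + width, bottom: y + height };
}

// Returns the prediction with the highest score, or null for an empty list.
function bestPrediction(predictions) {
  if (predictions.length === 0) return null;
  return predictions.reduce((best, current) => (current.score > best.score ? current : best));
}
```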
Other helper methods are also provided:
model.getFPS(): get FPS, calculated as the number of detections per second.
model.renderPredictions(predictions, canvas, context, mediasource): draw bounding boxes (and the input mediasource image) on the specified canvas.
model.getModelParameters(): returns model parameters.
model.setModelParameters(modelParams): updates model parameters.
dispose(): delete model instance.
startVideo(video): start camera video stream on the given video element. Returns a promise that can be used to validate if the user provided video permission.
stopVideo(video): stop video stream.
Underneath, Handtrack.js uses the Tensorflow.js library, a flexible and intuitive API for building and training models from scratch in the browser. It provides a low-level JavaScript linear algebra library and a high-level layers API.
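Returning to the helper methods for a moment, here is a rough sketch of how detect() and renderPredictions() might be combined into a per-frame loop. `startDetectionLoop` and its `schedule` argument are hypothetical names of my own; passing the scheduler in (rather than calling requestAnimationFrame directly) is just a choice that keeps the function easy to exercise outside a browser.

```javascript
// Hypothetical loop built from the detect() and renderPredictions() methods
// described above. `schedule` is whatever should trigger the next frame,
// e.g. (step) => requestAnimationFrame(step) in a browser.
function startDetectionLoop(model, video, canvas, schedule) {
  const context = canvas.getContext('2d');
  let running = true;

  function step() {
    if (!running) return;
    model.detect(video).then((predictions) => {
      if (!running) return; // stopped while the detection was in flight
      model.renderPredictions(predictions, canvas, context, video);
      schedule(step);
    });
  }

  step();
  return function stop() {
    running = false;
  };
}

// Browser usage sketch (assumes `video` and `canvas` elements already exist,
// and that the promise from startVideo resolves to a truthy value when
// camera permission was granted):
//
// handTrack.load().then((model) => {
//   handTrack.startVideo(video).then((status) => {
//     if (status) {
//       startDetectionLoop(model, video, canvas, (step) => requestAnimationFrame(step));
//     }
//   });
// });
```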
Steps in creating a Tensorflow.js-based JavaScript library.
The data used in this project is primarily from the Egohands dataset. This consists of 4800 images of the human hand with bounding box annotations in various settings (indoor, outdoor), captured using a Google glass device.
A model is trained to detect hands using the Tensorflow Object Detection API. For this project, a Single Shot MultiBox Detector (SSD) was used with the MobileNetV2 Architecture. Results from the trained model were then exported as a savedmodel. Additional details on how the model was trained can be found here and on the Tensorflow Object Detection API github repo.
Tensorflow.js provides a model conversion tool that allows you to convert a savedmodel trained in Tensorflow python to the Tensorflow.js webmodel format that can be loaded in the browser. This process is mainly about mapping operations in Tensorflow python to their equivalent implementations in Tensorflow.js. It makes sense to inspect the saved model graph to understand what is being exported. Finally, I followed the suggestion by the authors of the Tensorflow coco-ssd example [2] in removing the post-processing part of the object detection model graph during conversion. This optimization roughly doubled the speed of the detection/prediction operation in the browser.
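Once post-processing is removed from the graph, steps such as score filtering and non-max suppression have to happen in JavaScript. The sketch below shows what greedy non-max suppression looks like in plain JavaScript, to illustrate the idea only. It is not the library's code (which may rely on Tensorflow.js operations instead); the function names are my own, and the defaults simply mirror the maxNumBoxes, iouThreshold and scoreThreshold values from the model parameters example.

```javascript
// Illustrative sketch only, not the library's implementation.

// Intersection over union of two [x, y, width, height] boxes.
function intersectionOverUnion(a, b) {
  const overlapWidth = Math.max(0, Math.min(a[0] + a[2], b[0] + b[2]) - Math.max(a[0], b[0]));
  const overlapHeight = Math.max(0, Math.min(a[1] + a[3], b[1] + b[3]) - Math.max(a[1], b[1]));
  const intersection = overlapWidth * overlapHeight;
  const union = a[2] * a[3] + b[2] * b[3] - intersection;
  return union > 0 ? intersection / union : 0;
}

// Greedy non-max suppression over predictions of the form { bbox, score }.
function nonMaxSuppression(
  predictions,
  { maxNumBoxes = 20, iouThreshold = 0.5, scoreThreshold = 0.79 } = {}
) {
  // Drop low-confidence boxes, then visit the rest from most to least confident.
  const candidates = predictions
    .filter((prediction) => prediction.score >= scoreThreshold)
    .sort((a, b) => b.score - a.score);

  const kept = [];
  for (const candidate of candidates) {
    if (kept.length >= maxNumBoxes) break;
    // Keep a box only if it does not overlap too much with an already kept one.
    const overlapsKept = kept.some(
      (keptBox) => intersectionOverUnion(keptBox.bbox, candidate.bbox) > iouThreshold
    );
    if (!overlapsKept) kept.push(candidate);
  }
  return kept;
}
```

With two heavily overlapping boxes on the same hand, only the higher-scoring one survives, which is why a single hand normally produces a single bounding box.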
The library was modeled after the tensorflowjs coco-ssd example (but not written in typescript). It consists of a main class with methods to load the model and detect hands in an image, plus a set of other helpful functions, e.g. startVideo(), stopVideo(), getFPS(), renderPredictions(), getModelParameters(), setModelParameters(), etc. A full description of the methods is on Github.
The source file is then bundled using rollup.js, and published (with the webmodel files) on npm. This is particularly valuable as jsdelivr automatically provides a cdn for npm packages. (Hosting the file on other CDNs might be faster, and the reader is encouraged to try out other methods.) At the moment handtrackjs is bundled with tensorflowjs (v0.13.5), mainly because, at the time of writing this library, there were version issues where tfjs (v0.15) had datatype errors loading image/video tags as tensors. As new versions fix this issue, it will be updated.
Web Workers are a simple means for web content to run scripts in background threads. The worker thread can perform tasks without interfering with the user interface. In addition, they can perform I/O using [XMLHttpRequest](https://developer.mozilla.org/en-US/docs/Web/API/XMLHttpRequest) (although the responseXML and channel attributes are always null). Once created, a worker can send messages to the JavaScript code that created it by posting messages to an event handler specified by that code (and vice versa). This article provides a detailed introduction to using web workers.
I really look forward to how others who use or extend this project solve some of these limitations.
Handtrack.js represents really early steps with respect to the overall potential in enabling new forms of human-computer interaction with AI in the browser. Already, there have been excellent ideas such as posenet for human pose detection and handsfree.js for facial expression detection in the browser.
Above all, the reader is invited to imagine. Imagine interesting use cases where knowing the location of a user’s hand can make for more engaging interactions.
In the meantime, I will be spending more time on the following
If you would like to discuss this in more detail, feel free to reach out on Twitter, Github or Linkedin. Many thanks to Kesa Oluwafunmilola who helped with proof reading this article.
[2] Tensorflow.js Coco-ssd example. This library uses code and guidance from the Tensorflow.js coco-ssd example, which provides a library for object detection trained on the MSCOCO dataset. The optimization suggested in the repo (stripping out a post-processing layer) was really helpful (2x speedup).