Edge AI, also referred to as on-device AI, commonly refers to the components required to run an AI algorithm locally on a hardware device. Of late it usually means running deep learning algorithms on-device, and most articles tend to focus on only one component, i.e. inference. This series of articles will shed some light on the other components and challenges of Edge AI.

This series is divided as follows:

Part 1: Why Run AI Algorithms on Edge
Part 2: Edge AI Components - Sensor Data Capture
Part 3: Edge AI Components - Pre-processing
Part 4: Edge AI Components - Inference
Part 5: Edge AI Components - Performance Evaluation
Part 6: Edge AI Components - Real Time Metrics
Part 7: Edge AI Components - Scheduling & System Architecture
Part 8: Edge AI Components - Bridging the gap between Edge AI & Cloud AI

Experimental Setup

Edge devices are very diverse in their cost and capabilities. To make the discussion more concrete, here is the experimental setup used in this series:

* Qualcomm Snapdragon 855 Development Kit [4] as the edge device.
* Object detection as the deep learning model to be run on the edge device. There are a lot of good articles describing the state of the art in object detection [survey paper]. We will use the Mobilenet SSD model for object detection in this series.
* Tensorflowjs to quickly run the object detection model in a nodejs environment.

Why run AI algorithms on Edge

Why can't we rely on the cloud to run AI algorithms? After all, scaling resources to run an AI/deep learning model to match your performance needs is easier on the cloud. So why should one worry about running them on an edge device with compute and power constraints? To answer this question, let's consider two scenarios:

a) Cloud based architecture, where inference happens on the cloud.
b) Edge based architecture, where inference happens locally on the device.

(To keep the comparison as fair as possible, in both cases a nodejs webserver along with tensorflowjs (CPU only) is used; the only difference is that in case a) the webserver runs on an EC2 instance and in case b) the webserver runs locally on the edge device. The goal here is NOT to have an optimized implementation for a particular platform (cloud or edge), but rather to have a framework for a fair comparison.)

Cloud based architecture

Here is how a cloud based setup would look; it involves the steps detailed below.

Cloud only architecture for inference. (Image references at end.)

Step 1: Request with input image

There are two possible options here:

* We can send the raw image (RGB or YUV) from the edge device as it is captured from the camera. Raw images are always bigger and take longer to send to the cloud.
* We can encode the raw image to JPEG/PNG or some other lossy format before sending, then decode it back to a raw image on the cloud before running inference. This approach involves an additional step to decode the compressed image, as most deep learning models are trained with raw images.

We will cover some more ground on different raw image formats in future articles in this series. To keep the setup simple, the first approach [RGB image] is used. Also, HTTP is used as the communication protocol to POST an image to a REST endpoint (http://<ip-address>:<port>/detect).

Step 2: Run inference on cloud

* tensorflowjs is used to run inference on an EC2 (t2.micro) instance; only a single nodejs worker instance is used (no load balancing, no fail over etc.).
* The Mobilenet version used is hosted here.
* Apache Bench (ab) is used to collect latency numbers for the HTTP requests. In order to use ab, the RGB image is base64 encoded and POSTed to the endpoint, and express-fileupload is used to handle the POSTed image on the server.
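To make this concrete, here is a minimal sketch of what such a server could look like. It is not the exact code behind the numbers below: it assumes a base64 encoded raw 300x300 RGB frame uploaded as the "image" field of a multipart form, and it uses the coco-ssd packaging of Mobilenet SSD for tensorflowjs as a stand-in for the hosted model. The field name, resolution and port are illustrative.

// server.js - minimal sketch of the /detect endpoint (assumptions noted above).
const express = require('express');
const fileUpload = require('express-fileupload');
const tf = require('@tensorflow/tfjs');                  // CPU-only backend
const cocoSsd = require('@tensorflow-models/coco-ssd');  // Mobilenet SSD build for tfjs

const WIDTH = 300, HEIGHT = 300;                         // assumed frame size
const app = express();
app.use(fileUpload());

let model;
app.post('/detect', async (req, res) => {
  // express-fileupload exposes the uploaded field as a Buffer holding the base64 text.
  const raw = Buffer.from(req.files.image.data.toString(), 'base64');
  // Wrap the raw RGB bytes into an [H, W, 3] int32 tensor for the model.
  const input = tf.tensor3d(new Uint8Array(raw), [HEIGHT, WIDTH, 3], 'int32');
  const detections = await model.detect(input);          // [{bbox, class, score}, ...]
  input.dispose();
  res.json(detections);
});

cocoSsd.load().then((m) => {                             // load the model once at startup
  model = m;
  app.listen(8080, () => console.log('detector listening on :8080'));
});

The same script runs on the EC2 instance in case a) and on the development kit in case b); only the host running it changes.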
Total latency (RGB) = HTTP request + inference time + HTTP response

ab -k -c 1 -n 250 -g out_aws.tsv -p post_data.txt -T "multipart/form-data; boundary=1234567890" http://<ip-address>:<port>/detect

This is ApacheBench, Version 2.3 <$Revision: 1843412 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking <ip-address> (be patient)
Completed 100 requests
Completed 200 requests
Finished 250 requests

Server Software:
Server Hostname:        <ip-address>
Server Port:            <port>

Document Path:          /detect
Document Length:        22610 bytes

Concurrency Level:      1
Time taken for tests:   170.875 seconds
Complete requests:      250
Failed requests:        0
Keep-Alive requests:    250
Total transferred:      5705000 bytes
Total body sent:        50267500
HTML transferred:       5652500 bytes
Requests per second:    1.46 [#/sec] (mean)
Time per request:       683.499 [ms] (mean)
Time per request:       683.499 [ms] (mean, across all concurrent requests)
Transfer rate:          32.60 [Kbytes/sec] received
                        287.28 kb/s sent
                        319.89 kb/s total

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   5.0      0      79
Processing:   530  683 258.2    606    2751
Waiting:      437  513 212.9    448    2512
Total:        530  683 260.7    606    2771

Percentage of the requests served within a certain time (ms)
  50%    606
  66%    614
  75%    638
  80%    678
  90%    812
  95%   1084
  98%   1625
  99%   1720
 100%   2771 (longest request)

Histogram of end to end inference latencies for the cloud based architecture (bucket size of 1s). It shows the inference latencies for requests generated by Apache Bench (ab) in a given second.

End to end inference latencies for the cloud based architecture, sorted by response time (ms). This article [5] explains the difference between the two plots.

As we can see, the 95th percentile request latency is around 1084ms.

Edge based architecture

The web server (which runs tensorflowjs) now runs locally on the edge device (Qualcomm Snapdragon 855 Development Kit [4]). We repeat the same steps using Apache Bench (with HTTP requests going to localhost this time instead of a remote server), and the results are as follows.

Histogram of end to end inference latencies for the edge based architecture (bucket size of 1s). It shows the inference latencies for requests generated by Apache Bench (ab) in a given second.

End to end inference latencies for the edge based architecture, sorted by response time (ms). This article [5] explains the difference between the two plots.

As we can see, the 95th percentile request latency is around 357ms.

Optimization Opportunities

As you can see, the latency numbers are fairly high; the numbers obtained here are more like upper bound latencies. There are many optimization opportunities, some of which are detailed below.

Cloud based architecture:

* Have multiple nodejs worker instances and load balance between them (see the sketch after this list). Have multiple deployments (us-east, us-west etc.) and route each request to the closest deployment.
* Batch multiple input images and run batched inference on the cloud.
* Have a GPU based EC2 instance and use tensorflow-node-gpu to accelerate inference.
* Use a different communication protocol like MQTT, geared more towards IoT / cloud connectivity, to avoid the overheads of HTTP.
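As an illustration of the first point, here is a minimal sketch of spreading the single worker setup across multiple nodejs workers on one machine using Node's built-in cluster module. The startServer import is a hypothetical export of the server sketched earlier; a real deployment would add cross-machine load balancing and fail over on top of this.

// cluster.js - fork one inference worker per CPU core; the cluster module
// distributes incoming connections between the workers.
const cluster = require('cluster');
const os = require('os');
const { startServer } = require('./server');  // hypothetical export of the earlier sketch

if (cluster.isPrimary) {
  const numWorkers = os.cpus().length;         // one worker per core
  for (let i = 0; i < numWorkers; i++) cluster.fork();
  cluster.on('exit', (worker) => {
    console.log(`worker ${worker.process.pid} died, restarting`);
    cluster.fork();                            // very basic fail over
  });
} else {
  startServer();                               // each worker listens on the same port
}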
Edge based architecture:

* Have an optimized implementation for your edge device. In this case, for the Qualcomm Snapdragon 855 Development Kit [4], inference would be accelerated on the GPU / DSP or its NPU.
* Most likely the on-device implementation would depend on native libraries through vendor frameworks like SNPE or tensorflow-lite.
* Optimize the data path from image capture at the camera to feeding the deep learning model that runs inference.

Conclusion

We looked in detail at one of the factors in deciding whether you need an edge based solution. As we saw, if your application is tolerant of cloud latencies then cloud based inference is the quickest way to get going. However, if your application is latency sensitive then you can consider edge based solutions. Be sure to benchmark your particular use case to pick one over the other. In addition to latency, these are some of the other reasons to consider edge based solutions:

* You already have an existing deployment of edge devices and want to leverage it to save on cloud compute costs.
* Privacy: you don't want data to ever leave the edge device.
* For devices which are not fully connected / have poor connectivity to the cloud, edge based solutions become inevitable.

Future articles in the series will cover the different components involved in an edge based solution. Stay tuned!

References

[1] Car overlay image from Open Images Dataset V5: https://storage.googleapis.com/openimages/web/visualizer/index.html?set=train&type=detection&c=%2Fm%2F0k4j&id=101c3faac77e2e29
[2] Original car image: https://c2.staticflickr.com/7/6021/6005573548_11b7b17c9b_o.jpg
[3] Pixel phone image: https://pixabay.com/illustrations/google-pixel-3-google-cell-phone-3738925/
[4] Snapdragon 855 Development Kit by Intrinsyc: https://www.intrinsyc.com/snapdragon-embedded-development-kits/snapdragon-855-hdk/
[5] Apache Bench and Gnuplot: you're probably doing it wrong: http://www.bradlanders.com/2013/04/15/apache-bench-and-gnuplot-youre-probably-doing-it-wrong/

Machine Learning, Edge AI, Deep Learning, Inference, AI On Device