PolyThrottle: Energy-efficient Neural Network Inference on Edge Devices: Predictor Analysis

Written by bayesianinference | Published 2024/04/02
Tech Story Tags: neural-networks | polythrottle | neural-network-inference | edge-devices | on-device-hardware | fine-tuning | nvidia-triton | efficientnet

TL;DR: This paper investigates how the configuration of on-device hardware affects energy consumption for neural network inference with regular fine-tuning.

This paper is available on arXiv under the CC BY-NC-ND 4.0 DEED license.

Authors:

(1) Minghao Yan, University of Wisconsin-Madison;

(2) Hongyi Wang, Carnegie Mellon University;

(3) Shivaram Venkataraman, University of Wisconsin-Madison.


D PREDICTOR ANALYSIS

We vary the latency SLO to assess how the predictor schedules fine-tuning requests. We replay a 60-second stream in which the latency SLO is set to 250ms for the first half (30 seconds) and then relaxed to 700ms for the remainder. As shown in Figure 14, under the stringent SLO the predictor deduces that it cannot schedule fine-tuning requests without violating the latency target, so none are scheduled. Conversely, once the SLO is relaxed, the predictor determines that fine-tuning is feasible and schedules the requests sequentially, starting each one only after the preceding request has completed and issued its completion signal.
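The admission logic described above can be sketched as a simple loop: compare the predicted inference latency while a fine-tuning job co-runs against the current SLO, and release queued fine-tuning requests one at a time. The sketch below is illustrative only; the latency constants, request durations, and the Predictor class are hypothetical stand-ins for this example, not PolyThrottle's actual implementation.

```python
from dataclasses import dataclass, field
from collections import deque

# Hypothetical latency figures (assumed for illustration, not from the paper):
# predicted inference latency while a fine-tuning request shares the device,
# and how long each fine-tuning request occupies the device.
LATENCY_WITH_FINETUNE_MS = 520.0
FINETUNE_DURATION_S = 5.0

@dataclass
class Predictor:
    """Minimal sketch of an SLO-aware scheduler: it admits a fine-tuning
    request only if the predicted inference latency, with fine-tuning
    co-running, still satisfies the current latency SLO."""
    pending: deque = field(default_factory=deque)
    active_until: float = 0.0  # time at which the running request completes

    def can_schedule(self, slo_ms: float) -> bool:
        # Infeasible whenever co-running fine-tuning would violate the SLO.
        return LATENCY_WITH_FINETUNE_MS <= slo_ms

    def step(self, now: float, slo_ms: float) -> None:
        # Requests run strictly sequentially: elapsed time stands in for the
        # completion signal the preceding request would issue.
        if now < self.active_until or not self.pending:
            return
        if self.can_schedule(slo_ms):
            req = self.pending.popleft()
            self.active_until = now + FINETUNE_DURATION_S
            print(f"t={now:5.1f}s  SLO={slo_ms:.0f}ms  scheduled {req}")

def replay():
    pred = Predictor(pending=deque(f"finetune-{i}" for i in range(4)))
    for t in range(60):               # replay a 60-second stream
        slo = 250.0 if t < 30 else 700.0  # SLO relaxed halfway through
        pred.step(float(t), slo)

replay()
```

Running the replay prints no schedules during the first 30 seconds (250ms SLO) and then releases the queued requests back to back once the SLO rises to 700ms, mirroring the behavior reported for Figure 14.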

