PolyThrottle: Energy-efficient Neural Network Inference on Edge Devices: Arithmetic Intensity

Too Long; Didn't Read

This paper investigates how the configuration of on-device hardware affects energy consumption for neural network inference with regular fine-tuning.

This paper is available on arXiv under the CC BY-NC-ND 4.0 DEED license.

Authors:

(1) Minghao Yan, University of Wisconsin-Madison;

(2) Hongyi Wang, Carnegie Mellon University;

(3) Shivaram Venkataraman, myan@cs.wisc.edu.

C ARITHMETIC INTENSITY

The arithmetic intensity of a 2D convolution layer can be computed by the following equation:

The notation used in Equation 1 is defined in Table 8.
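As a rough illustration, the sketch below assumes the common roofline-style definition of arithmetic intensity: FLOPs divided by the bytes moved to and from memory. It may differ in detail from Equation 1. The class `Conv2dShape`, its field names, the convention of 2 FLOPs per multiply-accumulate, and the 4-byte element size are all assumptions made for this example, not notation taken from Table 8.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Conv2dShape:
    """Hypothetical container for the dimensions of one 2D convolution layer."""
    batch: int
    c_in: int
    c_out: int
    h_in: int
    w_in: int
    k_h: int
    k_w: int
    stride: int = 1
    padding: int = 0

    @property
    def h_out(self) -> int:
        # Standard output-size formula for a convolution without dilation.
        return (self.h_in + 2 * self.padding - self.k_h) // self.stride + 1

    @property
    def w_out(self) -> int:
        return (self.w_in + 2 * self.padding - self.k_w) // self.stride + 1


def conv2d_flops(s: Conv2dShape) -> int:
    # Assumption: each multiply-accumulate counts as 2 FLOPs.
    macs = s.batch * s.c_out * s.h_out * s.w_out * s.c_in * s.k_h * s.k_w
    return 2 * macs


def conv2d_bytes(s: Conv2dShape, bytes_per_element: int = 4) -> int:
    # Assumption: input, weights, and output each cross the memory bus once.
    input_elems = s.batch * s.c_in * s.h_in * s.w_in
    weight_elems = s.c_out * s.c_in * s.k_h * s.k_w
    output_elems = s.batch * s.c_out * s.h_out * s.w_out
    return bytes_per_element * (input_elems + weight_elems + output_elems)


def arithmetic_intensity(s: Conv2dShape, bytes_per_element: int = 4) -> float:
    # FLOPs per byte of memory traffic.
    return conv2d_flops(s) / conv2d_bytes(s, bytes_per_element)


# Example: a small 3x3 convolution with "same" padding.
layer = Conv2dShape(batch=1, c_in=3, c_out=8, h_in=8, w_in=8,
                    k_h=3, k_w=3, stride=1, padding=1)
print(conv2d_flops(layer), arithmetic_intensity(layer))
```

Under this definition, a layer with low arithmetic intensity tends to be limited by memory bandwidth, and a layer with high arithmetic intensity tends to be limited by compute throughput.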

The FLOPs term captures the total computation of each workload, while the arithmetic intensity term captures the extent to which compute throughput and memory bandwidth each limit the final performance. Combining these two features with an intercept term, which captures the fixed overhead of neural network inference, yields a model that predicts inference latency, provided the hardware operating frequency is stable.
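One way to picture such a model is a least-squares fit with three coefficients: one for FLOPs, one for arithmetic intensity, and one intercept. The sketch below assumes a linear form, which the text suggests but does not spell out. The function names are hypothetical, and the measurements are synthetic values invented for the example, not results from the paper.

```python
def solve_linear(a, b):
    """Solve a small dense system a @ x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("features are collinear; cannot fit the model")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_latency_model(gflops, intensity, latency):
    """Fit latency ~ w_f * gflops + w_a * intensity + b via the normal equations."""
    rows = [(f, a, 1.0) for f, a in zip(gflops, intensity)]  # 1.0 is the intercept column
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, latency)) for i in range(3)]
    return solve_linear(xtx, xty)


def predict_latency(weights, gflops, intensity):
    w_f, w_a, b = weights
    return w_f * gflops + w_a * intensity + b


# Synthetic, made-up measurements taken at one fixed operating frequency.
gflops = [1.0, 2.0, 3.0, 4.0, 5.0]
intensity = [10.0, 5.0, 8.0, 12.0, 6.0]
latency = [2.0 * f + 0.5 * a + 3.0 for f, a in zip(gflops, intensity)]

weights = fit_latency_model(gflops, intensity, latency)
print(weights, predict_latency(weights, 6.0, 9.0))
```

FLOPs are scaled to GFLOPs here so that the three feature columns have comparable magnitudes, which keeps the normal equations well conditioned. Because the model assumes a stable operating frequency, a separate fit would be needed for each frequency setting.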
