
PolyThrottle: Energy-efficient Neural Network Inference on Edge Devices: Experimental Results

by Bayesian Inference, April 2nd, 2024

Too Long; Didn't Read

This paper investigates how the configuration of on-device hardware affects energy consumption for neural network inference with regular fine-tuning.

This paper is available on arxiv under CC BY-NC-ND 4.0 DEED license.

Authors:

(1) Minghao Yan, University of Wisconsin-Madison;

(2) Hongyi Wang, Carnegie Mellon University;

(3) Shivaram Venkataraman, [email protected].

B EXPERIMENTAL RESULTS

In this section, we present additional results on the tradeoff between memory frequency and maximum GPU frequency. They show that the same model can have different energy consumption patterns on different devices. Moreover, even for a fixed model and device, the batch size can substantially change the optimization landscape. These interacting factors make energy optimization complex and call for an adaptive framework that accounts for them. Figures 6 to 13 show the energy consumption patterns of EfficientNet and Bert on Jetson TX2 and Jetson Orin under various batch sizes. Table 7 shows the optimal CPU frequency and the corresponding reduction in energy consumption for image preprocessing.
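The per-query energy metric plotted in the figures, and the search over (GPU frequency, memory frequency) pairs that these results motivate, can be sketched in a few lines. This is a minimal sketch under stated assumptions, not PolyThrottle's implementation. It assumes per-query energy equals average power multiplied by batch latency and divided by batch size. It also assumes a hypothetical `measure` callback that runs one batch at a given configuration and reports power and latency. The `toy_measure` model and its constants are invented for illustration and are not measurements from this paper. Exhaustive grid search is used only as the simplest baseline.

```python
from itertools import product
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical measurement callback (an assumption, not a real API):
# (gpu_mhz, mem_mhz, batch_size) -> (average_power_watts, batch_latency_seconds)
Measure = Callable[[int, int, int], Tuple[float, float]]


def per_query_energy(power_w: float, latency_s: float, batch_size: int) -> float:
    """Joules per query: energy spent on one batch divided by the batch size."""
    return power_w * latency_s / batch_size


def best_config(gpu_freqs: Iterable[int], mem_freqs: Iterable[int],
                batch_size: int, measure: Measure) -> Tuple[int, int, float]:
    """Exhaustively try every (GPU, memory) frequency pair and return the one
    with the lowest per-query energy, as (gpu_mhz, mem_mhz, joules_per_query)."""
    best: Optional[Tuple[int, int, float]] = None
    for gpu, mem in product(gpu_freqs, mem_freqs):
        power, latency = measure(gpu, mem, batch_size)
        energy = per_query_energy(power, latency, batch_size)
        if best is None or energy < best[2]:
            best = (gpu, mem, energy)
    if best is None:
        raise ValueError("frequency grid is empty")
    return best


def toy_measure(gpu_mhz: int, mem_mhz: int, batch_size: int) -> Tuple[float, float]:
    """Hypothetical device model with made-up constants, for illustration only:
    power grows linearly with both clocks, and latency has a compute-bound term
    and a memory-bound term that both scale with batch size."""
    power = 5.0 + 0.01 * gpu_mhz + 0.005 * mem_mhz
    latency = batch_size * (100.0 / gpu_mhz + 50.0 / mem_mhz)
    return power, latency


if __name__ == "__main__":
    for bs in (1, 8):
        gpu, mem, joules = best_config([500, 1000], [800, 1600], bs, toy_measure)
        print(f"batch={bs}: GPU {gpu} MHz, memory {mem} MHz, {joules:.4f} J/query")
```

On a real device, `measure` would have to set the clocks and read a power sensor, and the grid would be much larger. This is why a search that adapts to the model, device, and batch size is preferable to re-profiling every pair.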


Figure 6. This figure shows the per-query energy cost of Bert at FP16 on Jetson TX2 as we vary GPU frequency and memory frequency, with batch size fixed at 1.


Figure 7. This figure shows the per-query energy cost of Bert at FP32 on Jetson TX2 as we vary GPU frequency and memory frequency, with batch size fixed at 1.


Figure 8. This figure shows the per-query energy cost of Bert at FP16 on Jetson TX2 as we vary GPU frequency and memory frequency, with batch size fixed at 8.


Figure 9. This figure shows the per-query energy cost of EfficientNet B4 at FP16 on Jetson TX2 as we vary GPU frequency and memory frequency, with batch size fixed at 16.


Figure 10. This figure shows the per-query energy cost of EfficientNet B7 at FP16 on Jetson TX2 as we vary GPU frequency and memory frequency, with batch size fixed at 16.


Figure 11. This figure shows the per-query energy cost of EfficientNet B7 at FP16 on Jetson Orin as we vary GPU frequency and memory frequency, with batch size fixed at 8.


Figure 12. This figure shows the per-query energy cost of EfficientNet B7 at FP16 on Jetson Orin as we vary GPU frequency and memory frequency, with batch size fixed at 1.


Figure 13. This figure shows the per-query energy cost of EfficientNet B4 at FP16 on Jetson Orin as we vary GPU frequency and memory frequency, with batch size fixed at 8.