
Model Quantization in Deep Neural Networks


Too Long; Didn't Read

Quantization is the process of converting values from a continuous range to a smaller set of discrete values. In deep neural networks, it is commonly used to speed up inference and shrink models so they run on a wider range of devices, by mapping high-precision formats such as float32 to lower-precision formats such as int8. Quantization can be uniform (a linear mapping) or non-uniform (a non-linear mapping). In symmetric quantization, zero in the input maps to zero in the output; asymmetric quantization shifts this mapping by a zero point. The scale factor and zero point are the key quantization parameters and are determined through calibration. The two main quantization modes are Post Training Quantization (PTQ) and Quantization Aware Training (QAT). QAT generally preserves model accuracy better because the model is fine-tuned after quantization; it relies on fake quantizers to keep the quantization step compatible with the differentiability that fine-tuning requires.
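To make the scale factor and zero point concrete, here is a minimal NumPy sketch of asymmetric uniform quantization to 8 bits. It is illustrative only: the function names, the unsigned [0, 255] target range, and the min/max "calibration" are assumptions for the example, not code from the article.

```python
import numpy as np

def compute_qparams(x_min, x_max, num_bits=8):
    """Derive scale and zero point from calibrated min/max of the float tensor."""
    qmin, qmax = 0, 2 ** num_bits - 1            # unsigned 8-bit range: [0, 255]
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)  # make sure 0.0 is representable
    scale = (x_max - x_min) / (qmax - qmin)
    zero_point = int(round(qmin - x_min / scale))
    return scale, zero_point

def quantize(x, scale, zero_point, num_bits=8):
    """Map floats to integers: q = round(x / scale) + zero_point, then clip."""
    qmin, qmax = 0, 2 ** num_bits - 1
    q = np.round(x / scale) + zero_point
    return np.clip(q, qmin, qmax).astype(np.uint8)

def dequantize(q, scale, zero_point):
    """Approximate the original floats: x ≈ scale * (q - zero_point)."""
    return scale * (q.astype(np.float32) - zero_point)

# Example: calibrate on a small float32 tensor's own min/max, then round-trip it.
x = np.array([-0.75, -0.1, 0.0, 0.42, 1.3], dtype=np.float32)
scale, zp = compute_qparams(x.min(), x.max())
q = quantize(x, scale, zp)
x_hat = dequantize(q, scale, zp)
print(q, x_hat)  # x_hat differs from x only by the quantization error
```

Setting the zero point this way is what makes the mapping asymmetric: the float range does not need to be centered on zero, and zero still maps exactly to an integer so that padding and ReLU outputs stay exact.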
Shrinivasan Sankar (@aibites)

I am an AI Research Engineer. I was formerly a researcher at Oxford VGG before founding the AI Bites YouTube channel.


