Too Long; Didn't Read
The best models push the limits of the compute we can scale to, but the models that learn best are generally not the fastest to run. Model compression lets us ship these models on-device with low latency and low power consumption. The road to truly intelligent, adaptive systems might rely on what I, inspired by the old gods of creativity and genius, call model decompression. The principle: when you detect your model's performance drifting, add some capacity back to the compressed model and learn on the fly.
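Here is a minimal sketch of that loop, assuming a layer that was magnitude-pruned with a binary mask and that the dense weights were retained alongside it. Everything here (`PrunedLinear`, `regrow`, `maybe_decompress`, the drift tolerance) is a hypothetical illustration of the idea, not an established API.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PrunedLinear(nn.Module):
    """A linear layer compressed by magnitude pruning with a binary mask.

    Hypothetical sketch: assumes the dense weights are kept around so that
    masked-out entries can be restored ("decompressed") later.
    """

    def __init__(self, in_features, out_features, sparsity=0.9):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        # Zero out the smallest-magnitude weights; keep the rest.
        k = int(sparsity * self.linear.weight.numel())
        threshold = self.linear.weight.abs().flatten().kthvalue(k).values
        self.register_buffer(
            "mask", (self.linear.weight.abs() > threshold).float()
        )

    def forward(self, x):
        return F.linear(x, self.linear.weight * self.mask, self.linear.bias)

    def regrow(self, fraction=0.05):
        # "Decompression": re-enable a random slice of the pruned weights
        # so the layer regains capacity for on-the-fly learning.
        pruned = (self.mask == 0).nonzero()
        n = int(fraction * len(pruned))
        idx = pruned[torch.randperm(len(pruned))[:n]]
        self.mask[idx[:, 0], idx[:, 1]] = 1.0

def maybe_decompress(layer, recent_loss, baseline_loss, tolerance=1.2):
    # Hypothetical drift check: if the rolling loss degrades past a
    # tolerance band, add capacity back and let training continue.
    if recent_loss > tolerance * baseline_loss:
        layer.regrow(fraction=0.05)
```

In this framing the regrown weights come back with their original trained values, so the model recovers capacity without starting from scratch; a real system would also need to decide which weights to restore (random, gradient-based, or magnitude-based) rather than sampling them uniformly as above.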