
Who Could Have Guessed LLMs are Great at Compressing Images and Audio: Reports From New Research

by Mike Young, September 21st, 2023

Too Long; Didn't Read

Google DeepMind research shows that large language models like GPT-3 are excellent general-purpose compressors. This means they can compress many types of data like text, images, and audio down to very small sizes, similar to specialized compression algorithms like gzip and PNG. Compressing data means we can store and transmit it using less memory, disk space and bandwidth.

A new paper from researchers at Google DeepMind demonstrates that large language models like GPT-3 are not just adept at generating human-like text - they are also excellent general-purpose compressors. This means they can compress many types of data, such as text, images, and audio, down to very small sizes, matching or even beating specialized compression algorithms like gzip and PNG.


Why Should We Care About Compression?

Data compression is a fundamental capability in computing and AI. Compressing data means we can store and transmit it using less memory, disk space, and bandwidth. This saves costs and allows systems to scale.


But more importantly, good compression also indicates a deep understanding of the structure and patterns in data. To compress well, an algorithm needs to spot redundancies and exploit statistical regularities. So, compression capability acts as a benchmark for how much knowledge an AI system has learned.
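To see why, here is a minimal Python illustration (not from the paper; it uses the standard-library zlib compressor and made-up inputs): data with repeated structure compresses far better than data with none.

```python
import os
import zlib

structured = b"the cat sat on the mat. " * 400   # highly redundant input
random_bytes = os.urandom(len(structured))       # input with no exploitable structure

for name, data in [("structured", structured), ("random", random_bytes)]:
    compressed = zlib.compress(data, 9)
    print(f"{name}: {len(data)} -> {len(compressed)} bytes "
          f"({100 * len(compressed) / len(data):.1f}% of original)")
```

The structured input shrinks to a tiny fraction of its size, while the random input barely shrinks at all - which is exactly why compression ratios serve as a reasonable proxy for how much structure a model has captured.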


The fact that huge natural language models can compress varied data types so efficiently has major implications:


  • It demonstrates they have learned general abilities beyond just processing language.
  • Their skill at compression reflects an understanding of images, audio, video, and more.
  • There is potential to apply them to practical compression tasks.

How Was the Research Conducted?

The DeepMind researchers tested the compression capabilities of different-sized language models on three 1GB datasets, each treated as a raw byte stream (sketched in code after the list):


  • Text - The first 1 billion bytes of Wikipedia.

  • Images - 1 million 32x64px patches extracted from ImageNet.

  • Audio - Speech samples from the LibriSpeech dataset.
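All three datasets are handled the same way: each file is reduced to a byte stream and split into fixed-length chunks that the models compress one at a time. The sketch below assumes a chunk length of 2048 bytes (roughly the context size reported in the paper) and a made-up helper name; it illustrates the setup rather than reproducing the authors' code.

```python
from pathlib import Path

CHUNK_SIZE = 2048  # assumed fixed chunk length; the paper evaluates models on short byte sequences

def byte_chunks(path: str, chunk_size: int = CHUNK_SIZE):
    """Yield fixed-length byte chunks from any file: Wikipedia text,
    flattened ImageNet patches, or LibriSpeech audio alike."""
    data = Path(path).read_bytes()
    for start in range(0, len(data) - chunk_size + 1, chunk_size):
        yield data[start:start + chunk_size]

# Hypothetical usage: every modality becomes a plain byte stream before
# compression, which is what lets a text-trained model handle all of them.
# for chunk in byte_chunks("wikipedia_first_1gb.bin"):
#     compress(chunk)
```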


They compared the models against standard compression algorithms such as PNG, JPEG, and FLAC, which are specialized for particular domains (PNG and JPEG for images, FLAC for audio).


The language models are turned into compressors using arithmetic coding - a technique that converts any predictive model into a compressor. The more accurately a model can predict the next byte in a file, the fewer bits are needed to encode it, and the better it compresses the data.
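Arithmetic coding itself is not shown in the article, so here is a minimal Python sketch of the prediction-equals-compression idea. It computes the code length an arithmetic coder would achieve (to within a couple of bits) when driven by a predictive model; a simple Laplace-smoothed byte-frequency model stands in for the language model, and the function names are illustrative, not from the paper.

```python
import math
from collections import Counter

def ideal_compressed_bits(data: bytes, predict) -> float:
    """Code length an arithmetic coder would achieve (to within ~2 bits)
    when driven by `predict`, which maps a context to a probability
    distribution over the next byte value (0-255)."""
    total_bits = 0.0
    for i in range(len(data)):
        probs = predict(data[:i])                  # model's distribution for position i
        total_bits += -math.log2(probs[data[i]])   # bits needed to encode the actual byte
    return total_bits

def order0_predict(context: bytes):
    """Stand-in for the language model: a Laplace-smoothed byte-frequency
    model built from the bytes seen so far."""
    counts = Counter(context)
    total = len(context) + 256                     # add-one smoothing over all 256 byte values
    return [(counts.get(b, 0) + 1) / total for b in range(256)]

data = b"abracadabra abracadabra abracadabra"
bits = ideal_compressed_bits(data, order0_predict)
print(f"{len(data)} bytes -> about {bits / 8:.1f} bytes with an order-0 frequency model")
```

Conceptually, swapping the frequency model for a neural network's next-byte distribution is what the paper evaluates: the compressed size is essentially the model's summed negative log-probability over the data, plus a small constant overhead.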


They tested three main types of language models:


  • Smaller Transformer models trained from scratch on Wikipedia text.
  • Larger foundation models like Chinchilla-70B, pre-trained on huge text datasets.
  • As a baseline, general-purpose compressors like gzip and LZMA.

Key Technical Findings

The experiments yielded several insightful results:


  • Despite being trained only on text, the foundation models compressed all modalities better than methods specialized for each domain. For example, Chinchilla-70B compressed ImageNet patches to 43.4% of their raw size, versus 58.5% for PNG.
  • Confirmed scaling laws: Bigger models compressed better, but only up to a point. Once the model's own size is counted against the compressed output, further scaling hurts (illustrated with a short calculation after this list).
  • Model size and training data size are linked: more data supports bigger models, but the model must be sized to match the dataset being compressed.
  • Tokenization like BPE, while useful for language tasks, generally decreased compression performance slightly. This is because it makes the prediction task harder.
  • Longer contexts improved compression, as models could exploit more sequential dependencies.
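The scaling point above is easiest to see with a short calculation. The paper distinguishes a raw compression rate (compressed size over raw size) from an adjusted rate that also charges for storing the model's parameters; the function and numbers below are illustrative placeholders, not results from the paper.

```python
def compression_rates(raw_bytes: int, compressed_bytes: int, model_params: int,
                      bytes_per_param: int = 2) -> tuple[float, float]:
    """Raw rate ignores the model; adjusted rate also charges for shipping its
    weights, which is the sense in which 'the model itself takes up too much space'."""
    raw_rate = compressed_bytes / raw_bytes
    adjusted_rate = (compressed_bytes + model_params * bytes_per_param) / raw_bytes
    return raw_rate, adjusted_rate

# Hypothetical numbers purely for illustration: a 1 GB dataset compressed to
# 400 MB by a 70B-parameter model stored in 16-bit weights.
raw, adjusted = compression_rates(raw_bytes=10**9,
                                  compressed_bytes=4 * 10**8,
                                  model_params=7 * 10**10)
print(f"raw rate: {raw:.2f}, adjusted rate: {adjusted:.2f}")
```

With a 1 GB dataset, charging for a 70B-parameter model pushes the adjusted rate far above 1 - worse than not compressing at all - which is why the authors stress that model size must be matched to the amount of data being compressed.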

Key Implications

These findings have significant implications:


  • They demonstrate language models have learned very general capabilities beyond just text. Their versatility likely stems from pretraining on vast datasets.

  • The models' strong compression across modalities reflects an understanding of images, audio, and more at a deep statistical level.

  • There are inherent tradeoffs between model scale, datasets, and compression performance. Bigger datasets allow bigger models, but the size must match.

  • The results provide a new perspective on model scaling laws: unlike log loss, compression accounts for the model's own size, so scaling eventually stops paying off.

  • The equivalence between prediction and compression means these models could have practical applications for compressing images, video, and more. However, model size may be prohibitive compared to current methods.

  • The compression viewpoint offers new insights into model generalization, failure modes, tokenization, and other aspects of deep learning.


In summary, this research shows large language models have become adept general-purpose learners. Their exceptional compression capabilities demonstrate an expansive understanding of patterns in textual, visual, and audio data. There is still progress to be made, but these models show increasing competence as general systems for automating prediction and compression across modalities.


Also published here.