To fine-tune a LLaMA 65-billion-parameter model, we need about 780 GB of GPU memory. That is roughly sixteen A40 GPUs (48 GB each). The solution lies with QLoRA, where Q stands for Quantisation.
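One way to arrive at a figure in that ballpark is a back-of-envelope estimate for full mixed-precision fine-tuning with Adam: roughly 2 bytes per parameter for fp16 weights, 2 for fp16 gradients, and 8 for the two fp32 optimizer moments. This per-parameter breakdown is an assumption for illustration, not a derivation given in the original:

```python
# Back-of-envelope memory estimate for full fine-tuning of a 65B model.
# Assumed per-parameter costs (mixed-precision training with Adam):
BYTES_FP16_WEIGHTS = 2  # model weights stored in fp16
BYTES_FP16_GRADS = 2    # gradients stored in fp16
BYTES_ADAM_STATES = 8   # two fp32 Adam moments (4 + 4 bytes)

params = 65e9  # 65 billion parameters
bytes_per_param = BYTES_FP16_WEIGHTS + BYTES_FP16_GRADS + BYTES_ADAM_STATES
total_gb = params * bytes_per_param / 1e9

a40_vram_gb = 48  # an NVIDIA A40 has 48 GB of VRAM
print(f"~{total_gb:.0f} GB of GPU memory")        # ~780 GB
print(f"~{total_gb / a40_vram_gb:.1f} A40 GPUs")  # ~16.3 A40 GPUs
```

Note this counts only weights, gradients, and optimizer state; activations add more on top, which is why such estimates are lower bounds.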
Shrinivasan Sankar
@aibites
I am an AI Research Engineer. I was formerly a researcher at Oxford VGG before founding the AI Bites YouTube channel.