Get ready for an AI earthquake! A team of UCLA researchers (@zxche, @Yihe__Deng, @HuizhuoY, @Kaixuan_Ji_19, @QuanquanGu) have dropped some major keys to AGI. It's not only the code to seriously human-sounding AI: they've also gone and open-sourced the whole thing. Now you can develop better LLMs without needing to feed them tons of new, human-annotated data.

First, let's focus on the game-changer here: a self-teaching language model. This method lets a language model teach itself, becoming better and better without massive amounts of new, externally curated data.

Introducing SPIN: Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models.

I went full deep-dive mode – read their paper ("Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models"), scoured the insights on forums like HackerNews, X, and Reddit with Google Gemini Ultra and GPT-4 Turbo – and the core concept of SPIN knocked my tech-loving metaphorical socks off:

The 'Conversation Partner' Trick

Imagine starting with a language model that has mastered basic skills (let's say conversational etiquette). With SPIN, the model generates internal 'conversations,' building a dataset from what it already knows. Instant knowledge expansion!

Step two involves unleashing a new model and giving it one task: spot the difference between machine-generated chats and genuine human communication. This forces the original model to up its game, getting more and more human-like with every response to avoid detection.

Here's where things get interesting. They started with zephyr-7b-sft-full (already fine-tuned on the UltraChat corpus). SPIN applied an iterative training system to this base model, improving it round after round without relying on tons of new externally created data.

SPIN vs. Traditional AI Training (DPO): A New Champion?

We usually think machine learning, particularly for these huge language models, requires boatloads of carefully curated and labeled data.
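The two-step trick above can be sketched in a few lines of code. This is a conceptual sketch only: `spin_iteration`, `generate`, and `train_on_pairs` are hypothetical placeholders I made up, not functions from the SPIN codebase; the shape of the loop is what matters.

```python
# Toy sketch of one SPIN round. `generate` and `train_on_pairs` are
# hypothetical stand-ins for a real LLM's sampling and fine-tuning steps.

def spin_iteration(model, sft_dataset, generate, train_on_pairs):
    """One round of Self-Play Fine-Tuning.

    sft_dataset: list of (prompt, human_response) pairs from the original
    SFT corpus -- SPIN reuses this data instead of collecting new labels.
    """
    # Step 1: the current model plays "opponent" and answers every prompt,
    # producing the synthetic side of the dataset.
    synthetic = [generate(model, prompt) for prompt, _ in sft_dataset]

    # Step 2: the next model plays "main player": it is trained to tell the
    # human response apart from (i.e. prefer it over) the synthetic one.
    pairs = [
        (prompt, human, machine)
        for (prompt, human), machine in zip(sft_dataset, synthetic)
    ]
    return train_on_pairs(model, pairs)
```

Running this loop repeatedly, with each round's output model becoming the next round's opponent, is the whole iterative scheme.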
Direct Preference Optimization (DPO) methods involve humans painstakingly rating AI responses against each other for training. Not only is this labor-intensive, but it also balloons costs as a dataset grows.

DPO is a training method where a model is fine-tuned using a dataset of preferences, often involving human judgments that decide which of the model-generated responses are preferred. This method requires collecting new data where each piece is labeled based on these preferences, which can be resource-intensive. In contrast, SPIN utilizes iterative self-play, significantly reducing the need for new data.

By the first iteration, SPIN's performance already exceeds that of DPO in most cases, highlighting its efficiency and effectiveness in leveraging existing data to enhance model performance.

SPIN showcases its strength by achieving on-par performance with models trained on more extensive datasets. The process of iterative training methodically enhances the model's performance across multiple iterations, showing substantial improvements, especially on challenging benchmarks like TruthfulQA and GSM8k.

So, SPIN outperforms conventional training methods, including DPO, by efficiently leveraging synthetic datasets generated through self-play, without the need for additional human-annotated data.

What are SPIN's Strengths and Costs?

SPIN throws a curveball with its self-play dynamic. Think of it like a language model sparring with itself in a linguistic boxing ring, with each round teaching it new tricks. SPIN's data efficiency bypasses the need for new human-annotated datasets. But more importantly, it accelerates the improvement loop, making the model increasingly adept at generating human-like text.

Not only does SPIN seem to match models trained on larger external datasets, but its iterative power means consistent gains as it essentially studies its own output. Mindblowing, right?
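For the curious, the per-example objective behind this comparison can be sketched numerically. As I read the SPIN paper, it has the same logistic shape as the DPO loss, except that the 'rejected' response comes from the previous iteration's model rather than from human raters. The function name and argument names below are my own:

```python
import math

def spin_loss(logp_new_human, logp_old_human,
              logp_new_synth, logp_old_synth, lam=1.0):
    """Per-example SPIN objective (my reading of the paper).

    Same logistic shape as DPO, but the 'rejected' response was generated
    by the previous-iteration model (the opponent), so no new human
    preference labels are collected.
    """
    # Margin: how much more the new model favours the human response over
    # the synthetic one, relative to the previous-iteration model.
    margin = lam * ((logp_new_human - logp_old_human)
                    - (logp_new_synth - logp_old_synth))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))
```

When the new model assigns the human response a higher relative likelihood than the synthetic one, the margin is positive and the loss falls below log 2; each iteration pushes that margin up, which is exactly the "avoid detection" pressure described above.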
Okay, Let's Talk About the Elephant in the Room – COST

Nous Research co-founder @Teknium1 has a point. These big ol' language models don't get smarter for free. Iteratively re-training with SPIN involves the expensive process of Supervised Fine-Tuning (SFT) each time. However, he also mentions that "I think its worth it!"

Could the long-term benefits of quicker evolution and potentially less dependency on human-annotated data outweigh the initial investment? That's the exciting question!

BOOM! It's Open-Source AI Time

Just yesterday, Quanquan Gu, associate professor of computer science at UCLA and director of AI research at ByteDance, announced that anyone can now use the SPIN model and dataset. This doesn't just mean code and datasets, but pre-trained models to kickstart your own AI journeys.

You know how some LLM outputs feel robotic, right? Well, SPIN is different. By generating text that feels human, it hints at the foundational elements of reasoning that future AI could build on. The way it writes feels so natural, it's like a peek into how future AI might be able to reason for itself. This isn't just about making chatbots sound nicer. It's about creating a kind of digital thinking that works like ours. That kind of AI would be so much more flexible and capable of real understanding.

While SPIN is a big leap forward in making language models sound more natural, it's easy to get excited and overestimate what it means. The text it produces is impressive (you can take a look at the dataset), but it's important to remember that AI doesn't yet have the capacity for true independent reasoning. While SPIN isn't true AGI, the way it mimics human-like writing demonstrates impressive advances in how AI could process and use language in the future.

Even so, it does suggest amazing possibilities for how AI and language might develop (if you remember that we are at the beginning of the hockey stick, the future is not far from today...).

The ripple effects will be huge, and here's your access pass:

Code: Available on GitHub: https://github.com/uclaml/SPIN

Data: Hosted on Hugging Face, the dataset is readily accessible for those eager to apply SPIN methodologies: https://huggingface.co/collections/UCLA-AGI/datasets-spin-65c3624e98d4b589bbc76f3a…

Models: Pre-trained models are also available, offering a head start for experimenting with SPIN-enhanced language models: https://huggingface.co/collections/UCLA-AGI/zephyr-7b-sft-full-spin-65c361dfca65637272a02c40…

Project Page: For comprehensive insights and further information, the project page is an invaluable resource: https://uclaml.github.io/SPIN/

To sum up, SPIN's iterative, self-improving methodology is a significant advancement towards creating LLMs that can engage in genuinely human-like communication.

Originally shared on my X account.