Why Self-Distillation Can Make AI Reasoning Worse

Written by aimodels44 | Published 2026/04/08
Tech Story Tags: data-science | performance | scalability | design | self-distillation | ai-reasoning | reasoning-diversity | model-distillation

TL;DR: Self-distillation can weaken AI reasoning by removing the diverse solution paths that smaller models need for complex problem-solving.

Overview

  • Self-distillation—where a language model learns from its own outputs—sometimes makes reasoning worse instead of better
  • The paper investigates why this counterintuitive degradation happens
  • The core issue involves how models lose access to diverse reasoning paths during the distillation process
  • Training on filtered, high-confidence outputs removes valuable intermediate steps that help with complex problems
  • The research reveals a fundamental tension between simplifying a model and preserving its reasoning abilities

Plain English Explanation

Imagine you're trying to teach someone to solve a difficult math problem. One approach is to show them your own solution and have them memorize it. That sounds reasonable, but there's a catch: if you only show them your final answer without the full working, they might memorize a pattern that works for similar problems but falls apart when something changes slightly.

Self-distillation works like this: a model generates outputs, and those outputs then become training data, either for the same model or for a smaller student. The intuition seems sound, since the student learns from the teacher. But the paper reveals something surprising: trained this way, the student often gets worse at reasoning tasks.

The key problem is loss of reasoning diversity. When the large model generates an answer, it typically shows one path to the solution. But there might be many valid ways to reach the correct answer. By training only on one path (or a few cherry-picked paths), the smaller model misses out on understanding the different approaches. This matters enormously for reasoning because problems often require flexible thinking.

Think of it like learning to navigate a city. If someone only shows you one route from point A to point B, you learn that specific path. But if you explore multiple routes, you develop a better mental map of the city and can adapt when roads are closed. Reasoning in language models works similarly—exposure to varied solution methods builds deeper understanding.
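The data-construction choice behind all of this can be sketched in a few lines. This is purely illustrative: the function name is ours, strings stand in for sampled reasoning chains, and "confidence" stands in for whatever score the pipeline assigns each sample.

```python
# Illustrative sketch only, not the paper's pipeline.

def build_distill_set(samples_per_problem, keep_diverse=False):
    """samples_per_problem maps a problem to [(solution_text, confidence), ...].
    Returns (problem, solution) pairs to train the student on."""
    dataset = []
    for problem, samples in samples_per_problem.items():
        if keep_diverse:
            # Diversity-preserving variant: keep every distinct solution path.
            seen = set()
            for text, _conf in samples:
                if text not in seen:
                    seen.add(text)
                    dataset.append((problem, text))
        else:
            # Naive self-distillation: keep only the highest-confidence sample.
            best_text, _ = max(samples, key=lambda s: s[1])
            dataset.append((problem, best_text))
    return dataset
```

With three samples for one arithmetic problem, two of which follow the same path, the naive variant keeps a single pair while the diversity-preserving variant keeps both distinct paths. That lost second path is exactly the "alternate route" the city analogy describes.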

Key Findings

  • Models trained through self-distillation perform significantly worse on complex reasoning tasks compared to the original teacher model
  • The degradation correlates strongly with reduced diversity in the reasoning paths present in training data
  • When training data includes multiple valid solution paths to the same problem, performance degradation decreases substantially
  • Filtering outputs by confidence (keeping only high-confidence predictions) amplifies the reasoning problem by removing alternative approaches
  • The effect varies by task complexity—simple tasks show less degradation than problems requiring multi-step reasoning
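The "reduced diversity" these findings refer to can be made concrete with a toy metric. As a sketch (ours, not the paper's): treat each retained solution as a path label and measure the entropy of the path distribution per problem. Confidence filtering that collapses this value to zero is precisely the failure mode described above.

```python
import math
from collections import Counter

def path_entropy(solutions):
    """Shannon entropy (in bits) over the distinct solution paths kept for
    one problem. 0.0 means every retained sample follows the same path."""
    counts = Counter(solutions)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

For example, a training set where every sample follows one path scores 0.0 bits, while two equally common paths score 1.0 bit; tracking how this number falls after filtering gives a cheap proxy for the diversity the findings correlate with degradation.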

Technical Explanation

The researchers examine the distillation process by analyzing what information the student model actually receives. The experimental design compares three scenarios: training a student model on diverse outputs from the teacher, training on single best outputs, and training on filtered high-confidence outputs.
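The three scenarios can be sketched as three ways of constructing a training set from the same pool of teacher samples. The exact construction and the 0.8 cutoff below are our assumptions for illustration, not details from the paper.

```python
def make_training_sets(samples):
    """samples: list of (solution_text, confidence) for one problem.
    Returns the three training-set variants the comparison describes:
    diverse outputs, single best output, and high-confidence-filtered outputs."""
    diverse = sorted({text for text, _conf in samples})
    single_best = [max(samples, key=lambda s: s[1])[0]]
    threshold = 0.8  # illustrative confidence cutoff, chosen arbitrarily
    filtered = sorted({text for text, conf in samples if conf >= threshold})
    return diverse, single_best, filtered
```

Note how the filtered variant can silently converge on the single-best variant: if only one path clears the threshold, confidence filtering has removed the alternatives just as surely as picking one output.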

The core finding involves reasoning path diversity. When a teacher model solves a problem, it can follow multiple valid logical chains to the correct answer. Standard self-distillation typically captures only one path, usually the first one generated or the one with the highest confidence. The student model then learns to replicate that single approach.

For complex reasoning tasks, this creates a representation bottleneck. The student model doesn't learn abstract reasoning principles; instead, it memorizes specific solution patterns. When faced with a novel problem that requires combining reasoning steps differently, the student model fails because it lacks exposure to the variety of approaches needed.

The paper demonstrates this through experiments measuring both task accuracy and the diversity of reasoning steps in training data. Models trained on curated datasets containing multiple solution paths for each problem maintain reasoning capability far better than those trained on single outputs. This suggests that improved reasoning in smaller models requires explicit attention to preserving solution diversity.

The implications extend to how we think about model training generally. Simply copying from a larger model isn't sufficient for complex capabilities. The teaching process must preserve the structural variety that enables flexible problem-solving.

Critical Analysis

The research addresses an important blind spot in distillation methodology, but several limitations deserve consideration. The experiments focus primarily on mathematical and logical reasoning tasks. It remains unclear whether the findings generalize equally to other reasoning domains like scientific explanation or creative problem-solving.

The paper could strengthen its analysis by examining whether the diversity problem is fundamental to distillation or stems from specific implementation choices. For instance, do different filtering criteria produce different results? What if you deliberately select diverse outputs rather than high-confidence ones? The research touches on this but doesn't exhaustively explore the parameter space.
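One way to do what this paragraph gestures at, deliberately selecting diverse outputs rather than high-confidence ones, is greedy farthest-point selection over a simple lexical distance. This is our illustration of the idea, not a method from the paper; token-set Jaccard distance is a crude stand-in for a real measure of reasoning-path dissimilarity.

```python
def jaccard_distance(a, b):
    """1 minus the Jaccard similarity of the two strings' token sets."""
    sa, sb = set(a.split()), set(b.split())
    return 1.0 - len(sa & sb) / len(sa | sb)

def select_diverse(candidates, k):
    """Greedy farthest-point selection: start from the first candidate, then
    repeatedly add whichever remaining solution is most dissimilar (by its
    minimum distance) to everything already chosen."""
    chosen = [candidates[0]]
    while len(chosen) < min(k, len(candidates)):
        best = max((c for c in candidates if c not in chosen),
                   key=lambda c: min(jaccard_distance(c, s) for s in chosen))
        chosen.append(best)
    return chosen
```

Given candidates that mostly share tokens plus one outlier, the outlier is picked second, ahead of near-duplicates of the first choice, which is the behavior a diversity-first filter would want.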

There's also a practical question about scalability. Maintaining multiple solution paths for each training example increases data requirements substantially. The paper acknowledges this tradeoff but doesn't fully explore whether practitioners can achieve reasonable results with moderate diversity rather than maximal diversity.

The research assumes that diversity in the training data naturally translates to reasoning flexibility in the learned model. While intuitive, this assumption could use deeper investigation. Do all reasoning paths contribute equally to model capability, or do certain paths teach more fundamental principles than others?

Additionally, understanding how distilled reasoning models represent knowledge internally could clarify whether the problem is truly about missing diversity or about how knowledge gets compressed during training. This representational angle might reveal more targeted solutions.

Conclusion

Self-distillation degrades reasoning capability because it contracts the space of valid solution approaches into a narrow channel. Models learn patterns rather than principles, and when faced with new problems, they lack the flexible reasoning toolkit that the larger model possessed.

The core takeaway is straightforward but consequential: preserving reasoning diversity during training matters more than previously recognized. For anyone working on model compression or student-teacher training, this suggests that simple approaches—filtering by confidence or selecting single outputs—carry hidden costs for complex tasks.

The research opens practical questions about how to maintain diversity efficiently and which reasoning paths matter most. As models become smaller for deployment in resource-constrained environments, understanding this tradeoff becomes increasingly important. The field needs methods that capture the essential flexibility of reasoning without requiring prohibitive amounts of training data.


This is a Plain English Papers summary of a research paper called Why Does Self-Distillation (Sometimes) Degrade the Reasoning Capability of LLMs?. If you like this kind of analysis, join AIModels.fyi or follow us on Twitter.


Published by HackerNoon on 2026/04/08