Nemotron-Cascade-2-30B-A3B Brings Olympiad-Level Reasoning to Sparse AI

Model overview

`Nemotron-Cascade-2-30B-A3B` is a 30 billion parameter mixture-of-experts model with only 3 billion parameters active at inference time. Built by NVIDIA and post-trained from the `Nemotron-3-Nano-30B-A3B-Base`, it achieves gold medal performance on both the 2025 International Mathematical Olympiad and the International Olympiad in Informatics. The model operates in two distinct modes: a thinking mode for reasoning-heavy tasks and an instruct mode for direct question-answering without extended reasoning. Compared to the smaller Nemotron-Cascade-8B, this model provides substantially higher capacity while maintaining computational efficiency through its sparse architecture.

Model inputs and outputs

The model accepts text inputs through a ChatML-formatted chat template supporting both single-turn and multi-turn conversations. It generates text outputs with optional reasoning content enclosed in `` and `` tags. Tool responses can be integrated into conversations through `` and `` wrapper tags.

Inputs

- User queries in natural language across mathematical, coding, and reasoning domains

- System prompts that define model behavior and constraints

- Tool specifications when function calling is needed

- Previous conversation history for multi-turn interactions

Outputs

- Reasoning tokens enclosed in thinking tags for step-by-step problem solving

- Direct responses in instruct mode for immediate answers

- Tool calls formatted in XML for executing functions

- Text completions following ChatML format standards

Capabilities

The model excels at mathematical reasoning with a score of 35 points on the 2025 IMO and 92.4 on AIME 2025. It handles complex coding challenges, achieving 439.3 on the 2025 IOI and solving 10 out of 12 ICPC World Finals problems. The model performs at 87.2 on LiveCodeBench and maintains strong alignment capabilities with an 83.5 average on ArenaHard v2. It can engage in creative writing, process long contexts up to 1 million tokens, and handle tool-integrated reasoning when properly prompted.

What can I use it for?

Competitive programming preparation and automated problem solving benefit from its strong performance on olympiad-level challenges. Educational institutions can deploy it for tutoring students in mathematics and computer science. Software development teams can use it for code generation and bug fixing tasks. Researchers exploring sparse mixture-of-experts architectures can examine its design through the [technical report on arxiv 2603.19220](https://arxiv.org/abs/2603.19220). The reasoning capabilities make it suitable for scientific problem solving, and its tool integration support enables agentic workflows for complex task automation.

Things to try

Test the thinking mode on challenging mathematics problems to observe how the model develops solutions step-by-step before providing final answers. Compare results between thinking and instruct modes on the same queries to understand the performance trade-off between reasoning depth and response speed. Experiment with tool calling by providing Python execution functions to see how the model breaks down complex tasks into executable steps. Use multi-turn conversations with thinking mode enabled in early exchanges to build context, then switch to instruct mode for follow-up questions to reduce context length overhead. Set temperature to 1.0 and top_p to 0.95 as recommended for optimal performance across different problem types.

This is a simplified guide to an AI model called Nemotron-Cascade-2-30B-A3B maintained by nvidia. If you like these kinds of analysis, join AIModels.fyi or follow us on Twitter.