Model overview
GLM-5 is a large language model developed by zai-org and designed for complex systems engineering and long-horizon agentic tasks. The model scales to 744 billion total parameters with 40 billion active parameters, a significant expansion from its predecessor GLM-4.5, which had 355 billion total parameters with 32 billion active. Training data grew from 23 trillion to 28.5 trillion tokens. A key architectural change is the integration of DeepSeek Sparse Attention, which reduces deployment costs while preserving long-context capability. The model also relies on a novel asynchronous reinforcement learning infrastructure called slime that improves post-training efficiency and enables more granular optimization iterations.
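The public materials do not spell out DeepSeek Sparse Attention's exact formulation, but the general idea behind sparse attention, namely that each query scores only a selected subset of keys instead of the full sequence, can be sketched as a toy top-k variant. This is illustrative only; the production mechanism uses a learned key-selection scheme, not raw score top-k:

```python
import numpy as np

def topk_sparse_attention(q, K, V, k=4):
    """Toy sparse attention: one query attends only to its top-k keys.

    Illustrative sketch only. Restricting the softmax and value mixing to
    k << seq_len keys is what cuts compute and memory at long context.
    """
    scores = K @ q / np.sqrt(q.shape[-1])   # similarity to every key
    idx = np.argpartition(scores, -k)[-k:]  # indices of the k best keys
    sel = scores[idx]
    w = np.exp(sel - sel.max())
    w /= w.sum()                            # softmax over selected keys only
    return w @ V[idx]                       # weighted sum of top-k values

rng = np.random.default_rng(0)
seq_len, d = 16, 8
q = rng.standard_normal(d)
K = rng.standard_normal((seq_len, d))
V = rng.standard_normal((seq_len, d))
out = topk_sparse_attention(q, K, V, k=4)
```

With `k` fixed, per-query cost of the softmax and value aggregation stays constant as the context grows, which is the deployment-cost argument for sparse attention in general.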
Model inputs and outputs
GLM-5 processes text inputs and generates text outputs, functioning as a general-purpose language model for reasoning, coding, and agent-based tasks. The model supports extended context windows and can handle complex multi-step problems, tool usage, and real-world applications like software engineering and system interactions.
Inputs
- Natural language text: Questions, instructions, and prompts in various languages
- System prompts: Guidance for specific task types and behaviors
- Context and documents: Long-form text for analysis and reasoning
- Tool specifications: Descriptions of available functions for agent tasks
Outputs
- Generated text responses: Model outputs ranging from short answers to extended reasoning
- Code generation: Programming solutions across multiple languages
- Tool calls: Structured function calls for agentic workflows
- Reasoning traces: Intermediate thinking steps for complex problems
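The input and output types above map naturally onto an OpenAI-compatible chat request, which is the schema that common inference servers expose. A minimal sketch, where the model identifier and the `run_shell` tool are assumptions for illustration rather than names confirmed by the release:

```python
# Sketch of a chat request exercising the input types above: a system
# prompt, natural language text, and a tool specification. The model name
# and tool are hypothetical; check your deployment for the real identifiers.
tools = [{
    "type": "function",
    "function": {
        "name": "run_shell",  # hypothetical tool for agent tasks
        "description": "Execute a shell command and return its output.",
        "parameters": {
            "type": "object",
            "properties": {"command": {"type": "string"}},
            "required": ["command"],
        },
    },
}]

request = {
    "model": "zai-org/GLM-5",  # assumed identifier
    "messages": [
        {"role": "system", "content": "You are a careful sysadmin agent."},
        {"role": "user", "content": "How much disk space is free on /?"},
    ],
    "tools": tools,
    "tool_choice": "auto",
}
```

Given such a request, the server would return either generated text or a structured tool call, matching the output types listed above.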
Capabilities
The model demonstrates best-in-class performance on reasoning tasks, achieving 92.7% on AIME 2026 I and 96.9% on HMMT November 2025. In coding applications, it reaches 77.8% on SWE-bench Verified and 73.3% on SWE-bench Multilingual, showing strong performance across diverse programming scenarios. For agentic tasks, the model excels in browser interaction (75.9% on BrowseComp with context management), terminal operations (56.2% on Terminal-Bench 2.0), and cybersecurity tasks (43.2% on CyberGym). Tool usage capabilities enable it to function in complex environments requiring multiple function calls and sequential decision-making.
What can I use it for?
GLM-5 serves applications requiring advanced reasoning and autonomous behavior. Software engineering teams can use it for code review, bug detection, and feature implementation, as reflected in its SWE-bench results. Research and educational contexts benefit from its mathematical reasoning on complex problems. Business automation becomes possible through its browser interaction and terminal operation capabilities, enabling automated testing, system administration, and data processing workflows. Creative and technical writing tasks leverage its language generation quality. Organizations can deploy the model locally using popular inference frameworks to maintain data privacy while accessing these capabilities.
Things to try
Experiment with the model's thinking mode by setting generation length to 131,072 tokens to observe detailed reasoning processes on complex mathematical or logical problems. Test tool-calling capabilities by providing structured function definitions and observing how the model chains multiple operations to solve multi-step tasks. Compare outputs between reasoning-intensive benchmarks like AIME and practical coding tasks to understand performance variations across domains. Try using the model for long-horizon planning tasks that benefit from its extended context window and reinforcement-learned behaviors. Test deployment on various hardware configurations using the provided SGLang or vLLM implementations to optimize for your infrastructure constraints.
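The tool-chaining experiment above boils down to a small agent loop: execute whatever tool calls the model emits, append the results, and let the model continue. A minimal sketch, assuming the OpenAI-compatible message shapes that servers like vLLM and SGLang expose (field names are assumptions to verify against your deployment), shown here against a hand-written fake response rather than a live model:

```python
import json

def handle_tool_calls(response_message, tools_impl, messages):
    """One agent-loop step: run each tool call the model emitted and
    append the results so the model can chain further operations.
    Message field names follow the common OpenAI-compatible schema."""
    messages.append(response_message)
    for call in response_message.get("tool_calls", []):
        fn = tools_impl[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])
        result = fn(**args)
        messages.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": str(result),
        })
    return messages

# Exercise the loop with a hypothetical calculator tool and a fake
# assistant message standing in for a real model response.
tools_impl = {"add": lambda a, b: a + b}
fake_response = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_1",
        "function": {"name": "add", "arguments": '{"a": 2, "b": 3}'},
    }],
}
messages = handle_tool_calls(fake_response, tools_impl, [])
# messages now ends with a "tool" message carrying the result "5"
```

In a real session you would send the updated `messages` back to the server and repeat until the model stops requesting tools, which is the sequential decision-making loop the agentic benchmarks measure.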
This is a simplified guide to an AI model called GLM-5 maintained by zai-org. If you like this kind of analysis, join AIModels.fyi or follow us on Twitter.
