This is a simplified guide to an AI model called GLM-4.7, maintained by zai-org. If you like this kind of analysis, join AIModels.fyi or follow us on Twitter.
Model overview
GLM-4.7 is a large language model developed by zai-org that excels at coding, reasoning, and tool use. The model brings substantial improvements over its predecessor, achieving 73.8% on SWE-bench (up 5.8 points), 66.7% on SWE-bench Multilingual (up 12.9 points), and 42.8% on the Humanity's Last Exam benchmark with tools (up 12.4 points). This model represents a significant step forward in agentic capabilities, particularly for developers and teams building AI-powered coding systems. Compared to GLM-4.5, GLM-4.7 delivers marked gains in multilingual coding, terminal-based tasks, UI generation quality, and complex reasoning.
Model inputs and outputs
GLM-4.7 accepts text prompts and tool calls, processing requests for coding assistance, reasoning tasks, web browsing, and general conversation. The model supports context windows up to 131,072 tokens by default, enabling it to handle extensive code repositories and complex multi-turn interactions. Output quality varies by task type: coding tasks benefit from extended reasoning with thinking preserved across turns, while simpler requests can use lightweight modes for faster responses.
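To make the request shape concrete, here is a minimal sketch using an OpenAI-compatible Python client. The base URL and the `glm-4.7` model identifier are assumptions for illustration; check the provider's documentation for the exact values.

```python
# Minimal sketch of a single-turn request against an OpenAI-compatible endpoint.
# The base_url and model name below are placeholders, not confirmed values.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.z.ai/api/paas/v4",  # assumed endpoint; check provider docs
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="glm-4.7",  # assumed model identifier
    messages=[
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Refactor this function to use a list comprehension: ..."},
    ],
    max_tokens=1024,
)

print(response.choices[0].message.content)
```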
Inputs
- Text prompts for coding, reasoning, creative writing, and conversation
- Tool calls for web browsing, code execution, and external API integration
- Multi-turn conversation context with automatic thinking preservation for agent tasks
- Structured function calling for integration with external systems
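As a concrete illustration of the structured function calling mentioned above, the sketch below registers a single tool using the OpenAI-style `tools` schema. The endpoint, the model name, and the `run_tests` tool are all illustrative assumptions, not part of any published API.

```python
# Sketch of structured function calling, assuming the OpenAI-style "tools" schema
# is supported; the tool name and fields here are illustrative only.
from openai import OpenAI

client = OpenAI(base_url="https://api.z.ai/api/paas/v4", api_key="YOUR_API_KEY")

tools = [
    {
        "type": "function",
        "function": {
            "name": "run_tests",  # hypothetical tool for a coding agent
            "description": "Run the project's test suite and return the results.",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {"type": "string", "description": "Test file or directory"},
                },
                "required": ["path"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="glm-4.7",  # assumed model identifier
    messages=[{"role": "user", "content": "The auth tests are failing; investigate."}],
    tools=tools,
)

# If the model decides to call the tool, the call arrives as structured JSON.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```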
Outputs
- Code generation with support for multiple programming languages and frameworks
- Reasoning explanations with intermediate thinking steps for complex problems
- Web browsing results with improved tool use and context management
- UI components and webpage generation with enhanced visual quality
- Mathematical solutions and step-by-step problem breakdowns
Capabilities
The model excels at core coding tasks with terminal-based interactions, supporting agentic workflows in frameworks like Claude Code and Cline. It produces cleaner, more modern webpages and better-looking slide decks with accurate layout and sizing. Tool use has improved significantly on benchmarks like τ²-Bench, as has web browsing performance on BrowseComp. The model introduces interleaved thinking, where it reasons before every response and tool call, plus preserved thinking for coding agents, which reuses reasoning across multi-turn conversations rather than re-deriving information. Turn-level thinking allows control over reasoning per request: it can be disabled for lightweight tasks to reduce latency and cost, and enabled for complex problems.
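One way to exercise turn-level thinking is through a per-request parameter. The sketch below assumes a `thinking` extension field on an OpenAI-compatible endpoint; the exact field name and values are assumptions, so consult the platform's API reference before relying on them.

```python
# Sketch of turn-level thinking control. The "thinking" extension field is an
# assumption and may be named differently in the actual API.
from openai import OpenAI

client = OpenAI(base_url="https://api.z.ai/api/paas/v4", api_key="YOUR_API_KEY")

def ask(prompt: str, think: bool):
    return client.chat.completions.create(
        model="glm-4.7",  # assumed model identifier
        messages=[{"role": "user", "content": prompt}],
        extra_body={"thinking": {"type": "enabled" if think else "disabled"}},
    )

# Lightweight request: disable reasoning to cut latency and cost.
quick = ask("Rename this variable to snake_case: userName", think=False)

# Complex request: enable reasoning for multi-step problems.
deep = ask("Find the race condition in this job queue implementation: ...", think=True)
```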
What can I use it for?
Development teams can use GLM-4.7 as a coding partner for software engineering tasks, including repository analysis, bug fixes, and feature implementation. The model works well for agentic automation, where it can browse the web, execute code, and manage complex workflows across multiple turns. Content creators can leverage its improved UI generation for building webpages and presentation slides. For research and mathematics, the model handles complex reasoning tasks and proof generation. Businesses can integrate it into customer service systems via web browsing capabilities and deploy it locally using vLLM or SGLang for private inference. The API is available through the Z.ai API Platform.
Things to try
Enable preserved thinking mode when working on long-horizon coding tasks to see how the model reuses reasoning across turns rather than starting fresh. Compare responses with thinking enabled versus disabled to understand the latency and quality tradeoffs for your specific use case. Try the model on your own codebase to evaluate its repository understanding and refactoring suggestions. Experiment with turn-level thinking control to optimize cost and speed for different request types. Test its web browsing and tool calling capabilities on structured tasks requiring external data retrieval. Deploy locally using Docker with vLLM or SGLang to evaluate inference performance on your hardware, leveraging FP8 quantization for improved throughput.
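For the local deployment experiment, a minimal sketch with vLLM's Python API might look like the following. The `zai-org/GLM-4.7-FP8` repository id and the parallelism settings are assumptions; substitute the actual published weights and adjust to your hardware, or drop the FP8 checkpoint in favor of full-precision weights if throughput is not a concern.

```python
# Sketch of local inference with vLLM's Python API. The repository id below is
# an assumption; vLLM typically detects FP8 quantization from the checkpoint itself.
from vllm import LLM, SamplingParams

llm = LLM(
    model="zai-org/GLM-4.7-FP8",   # assumed FP8 checkpoint name
    tensor_parallel_size=8,        # spread the model across 8 GPUs; adjust to your hardware
    max_model_len=131072,          # matches the default context window
)

params = SamplingParams(temperature=0.7, max_tokens=512)
outputs = llm.generate(["Write a Python function that merges two sorted lists."], params)
print(outputs[0].outputs[0].text)
```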
