GLM-5: China’s First Public AI Company Ships a Frontier Model

New model release!

Large Language Models
Author

Maxime Labonne

Published

February 12, 2026

On February 11th, 2026, just days before the Lunar New Year, Z.ai officially released GLM-5, its new frontier large language model.

First, congrats to the Z.ai team on a strong release. GLM-5 is the new #1 open-weight model on Artificial Analysis and hit #1 among open models on LMArena’s Text Arena (score 1452, #11 overall). It scores 77.8% on SWE-bench Verified, 92.7% on AIME 2026, 86.0% on GPQA-Diamond, and leads open-source models on BrowseComp, Vending Bench 2, and MCP-Atlas.

What GLM-5 Actually Is

GLM-5 is a 744B-parameter Mixture-of-Experts model with 40B active parameters per token. That’s roughly a 2x scale-up from GLM-4.5 (355B total, 32B active). Pre-training data went from 23T to 28.5T tokens. The model also switched its attention mechanism to DeepSeek Sparse Attention (DSA) for efficient long-context handling, supporting a 200K-token context window. It’s released under the MIT license on Hugging Face, available via the Z.ai API, and already on OpenRouter.
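If you just want to try it, OpenRouter is the lowest-friction path. Here’s a minimal sketch using the OpenAI-compatible client; the model slug `z-ai/glm-5` is my assumption based on OpenRouter’s usual vendor/model naming, so check the model page for the exact identifier.

```python
# Minimal sketch: calling GLM-5 through OpenRouter's OpenAI-compatible API.
# The slug "z-ai/glm-5" is an assumed identifier, not confirmed by the release.
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

response = client.chat.completions.create(
    model="z-ai/glm-5",  # assumed slug, verify on the OpenRouter model page
    messages=[
        {"role": "user", "content": "Explain DeepSeek Sparse Attention in two sentences."}
    ],
)
print(response.choices[0].message.content)
```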

According to Reuters, GLM-5 was trained entirely on Huawei Ascend chips using the MindSpore framework, with zero dependency on NVIDIA hardware. This is as much a geopolitical statement as it is a technical one. Zhipu has been on the U.S. Entity List since January 2025, which bans access to H100/H200 GPUs. The fact that they can produce a frontier-class model under these constraints tells you something important about the viability of China’s domestic compute stack at scale.

Benchmarks

Among self-reported benchmarks, the SWE-bench number is the headline. At 77.8%, GLM-5 beats Gemini 3 Pro (76.2%) and GPT-5.2 (75.4%) but still trails Claude Opus 4.5 (80.9%). Note that the comparison omits the more recent Opus 4.6 and GPT-5.3-Codex.

On their internal CC-Bench-V2 suite, GLM-5 hits a 98% frontend build success rate and 74.8% end-to-end correctness, which represents a 26% improvement over GLM-4.7 on frontend tasks. They’re framing this as the shift from “vibe coding” to “agentic engineering,” and the Vending Bench 2 result (where the model runs a simulated vending machine business over a full year) supports that framing. Long-horizon planning seems significantly improved compared to GLM-4.7.

One interesting data point from Artificial Analysis: GLM-5 scored -1 on the AA-Omniscience Index, a 35-point improvement over its predecessor. In other words, GLM-5 leads the industry in “knowing when to say I don’t know” rather than hallucinating. Hallucination talk feels very 2023, but calibrated refusals are a concrete win for production deployment.

The Pony Alpha Saga

The backstory here is too good to skip. On February 6th, OpenRouter quietly launched “Pony Alpha” as a stealth model with no attribution, zero cost, and a 200K context window. It processed over 40 billion tokens on its first day. The community immediately started speculating. Was it DeepSeek V4? Grok 4.2? A Claude variant?

The evidence quickly pointed to Zhipu. The model self-identified as GLM under certain prompts. The output style matched the GLM series. And the timing aligned perfectly with Zhipu’s pre-announced GLM-5 release window around Spring Festival. Some people even caught the zodiac connection: 2026 is the Year of the Horse.

OpenRouter has a history of these stealth drops. Quasar Alpha turned out to be GPT-4.1. Sherlock Alpha was Grok 4.1 Fast. Pony Alpha was GLM-5 getting a live stress test with real users before the official launch. This is a good move since you get genuine usage data and community feedback without the hype cycle distorting everything.

Pricing and Accessibility

The official GLM-5 API is priced at $1.00 per million input tokens and $3.20 per million output tokens. That’s 5x cheaper on input and nearly 8x cheaper on output than Claude Opus 4.6 ($5/$25). Still, it’s pricier than previous GLM versions and other Chinese MoEs.
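To make the gap concrete, here’s the arithmetic for a hypothetical monthly workload of 10M input and 2M output tokens (the mix is arbitrary; the per-token prices are the ones listed above):

```python
# Back-of-envelope cost comparison using the listed per-million-token prices.
# The 10M-input / 2M-output monthly mix is an arbitrary illustration.
PRICES = {                       # (input, output) in $ per million tokens
    "GLM-5": (1.00, 3.20),
    "Claude Opus 4.6": (5.00, 25.00),
}

input_mtok, output_mtok = 10, 2  # millions of tokens per month (assumed)

for model, (p_in, p_out) in PRICES.items():
    cost = input_mtok * p_in + output_mtok * p_out
    print(f"{model}: ${cost:,.2f}/month")
# GLM-5: $16.40/month
# Claude Opus 4.6: $100.00/month, roughly 6x more on this mix
```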

Z.ai also offers a GLM Coding Plan, which is their answer to Anthropic’s Claude Code. They actually hiked the price by 30% this week to capitalize on demand, and their Hong Kong-listed stock surged 34% on the day of release.

With 744B parameters, GLM-5 is mostly an API model. If you want to deploy it yourself, you’ll need at least 8x H200s (or H20s) for FP8 inference, an infrastructure budget that makes little sense for most teams and companies. I saw an HN thread debating whether you could run it on 2x M4 Ultra Macs with 512GB of unified memory each. The answer is “technically yes, but practically painful,” as the sketch below suggests. For 99% of users, this is an API model.
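The hardware requirement is easy to sanity-check: at FP8, weights take roughly one byte per parameter, so 744B parameters means about 744 GB before KV cache and activations. A back-of-envelope sketch, where the 15% overhead margin is my guess rather than a measured number:

```python
# Rough memory check for FP8 inference. Real deployments also need room for
# KV cache, activations, and framework overhead; the 15% margin is a guess.
TOTAL_PARAMS_B = 744    # billions of parameters
BYTES_PER_PARAM = 1     # FP8 = 1 byte per weight
OVERHEAD = 1.15         # assumed margin for KV cache + activations

needed_gb = TOTAL_PARAMS_B * BYTES_PER_PARAM * OVERHEAD  # ~856 GB

setups = {
    "8x H200": 8 * 141,      # 141 GB HBM3e each = 1128 GB
    "2x M4 Ultra": 2 * 512,  # 512 GB unified memory each = 1024 GB
}

for name, capacity_gb in setups.items():
    verdict = "fits" if capacity_gb >= needed_gb else "does not fit"
    print(f"{name}: {capacity_gb} GB -> {verdict} (~{needed_gb:.0f} GB needed)")
```

Both setups clear the capacity bar, which is why the answer is “technically yes.” The “practically painful” part is bandwidth: the Macs’ unified memory is far slower than HBM3e, so tokens per second, not capacity, is the real constraint.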

The Bigger Picture

GLM-5 drops at an inflection point for Zhipu and for the Chinese AI ecosystem more broadly. A few things worth noting:

Zhipu just became the world’s first publicly traded foundation model company. Their Hong Kong IPO on January 8th raised $558 million at a $7.1 billion valuation. Meanwhile, OpenAI and Anthropic are still private. That’s a structural difference in how frontier AI companies are funded and governed.

DeepSeek leads the architecture game. GLM-5 adopted DeepSeek Sparse Attention, and Zhipu was already drawing on DeepSeek’s training recipes. That’s a strong signal that DeepSeek retains a significant technical lead in model architecture.

Hardware independence is becoming real. Zhipu trained on Huawei Ascend, with inference also running on chips from Moore Threads, Cambricon, and Kunlunxin. This is no longer a research demo, but a production frontier model running on a fully domestic stack.

What’s Missing

I just wanted to note a few issues with GLM-5:

GLM-5 is text-only, with no native multimodal support. Kimi K2.5 from Moonshot AI already ships the multimodal capabilities that GLM-5 lacks, and the gap matters increasingly as the industry moves toward unified architectures.

Early adopters report that while benchmarks are strong, “situational awareness” lags behind Claude. The vibe test results are mixed: GLM-5 is a strong executor but perhaps not as thoughtful a collaborator.

Several people flagged questions about benchmark methodology, and some of the numbers in the official docs raised eyebrows. It’s a good sign that the community now reads leaderboard claims with a more technical lens, but more independent testing is needed.

What’s Next

GLM-5 is the strongest open-weight model released to date for coding and agentic tasks, and it does this on domestic Chinese hardware under U.S. sanctions. Those are two separate statements, and both are significant.

Quick links: