
Quick Answer
The GLM 5.2 benchmarks Z.ai withheld at launch on June 13 are now public, and they are the headline reason this model is suddenly the most discussed open-weight option in coding agent stacks. GLM 5.2 scores 62.1 on SWE-bench Pro (vs GPT-5.5 at 58.6 and the previous GLM 5.1 at 58.4), 81.0 on Terminal-Bench 2.1, 76.8 on MCP-Atlas, and 91.2 on GPQA-Diamond. On June 17, 2026, Z.ai also dropped the full MIT-licensed weights for both zai-org/GLM-5.2 and the zai-org/GLM-5.2-FP8 quantized build on HuggingFace, several days earlier than the "the following week" guidance from the original launch announcement. For teams shipping with Claude Code, the Claude Agent SDK, Cursor, or the Vercel AI Gateway, this is the first time a credibly open-weight model has led an Anthropic and OpenAI flagship on a real-world SWE-bench Pro head-to-head.
What changed in the last 24 hours
Z.ai promised three things at the June 13 launch of GLM 5.2 and held back on two of them.
- The model and the GLM Coding Plan at $10 to $80 per month, shipped on June 13.
- The MIT-licensed open weights "the following week", originally expected around June 20.
- Independent benchmarks once the model was through external evaluation.
Both held-back items arrived early. The MIT weights uploaded to HuggingFace on June 17, roughly 22 hours before this article was written, with 24 community quantization variants already published for llama.cpp, LM Studio, Jan, and Ollama. The benchmark scorecard now lives in the model card and on the Z.ai long-horizon blog post, with independent confirmation already filtering through llm-stats.com and a wave of analyst comparison posts.
The timing is the second story. Two business days earlier, on June 15, Anthropic paused the planned Claude Agent SDK credit changes without committing to a new effective date, and the same day Claude 4 retired. Fable 5 and Mythos 5 remain suspended under a US export-control directive, with no restoration timeline. Engineering leaders are actively looking for a fallback backend they control, and GLM 5.2 just became the first one with frontier-grade benchmark proof.
The benchmark scorecard
Z.ai published the following on the GLM 5.2 model card and the long-horizon blog. These are the highest open-weight scores on each benchmark at the time of writing.
| Benchmark | GLM 5.2 | GPT-5.5 | GLM 5.1 |
|---|---|---|---|
| SWE-bench Pro | 62.1 | 58.6 | 58.4 |
| Terminal-Bench 2.1 | 81.0 to 82.7 | not disclosed | 76.4 |
| HLE (no tools) | 40.5 | not disclosed | 36.2 |
| HLE (with tools) | 54.7 | not disclosed | 48.1 |
| AIME 2026 | 99.2 | not disclosed | 96.7 |
| GPQA-Diamond | 91.2 | not disclosed | 87.8 |
| MCP-Atlas | 76.8 | not disclosed | 71.3 |
| Tool-Decathlon | 48.2 | not disclosed | 43.7 |
| DeepSWE | 46.2 | not disclosed | 41.0 |
| NL2Repo | 48.9 | not disclosed | 44.2 |
Sources: the GLM 5.2 HuggingFace model card, the GLM 5.2 long-horizon blog, and aggregated independent runs on llm-stats.com.
A 62.1 on SWE-bench Pro is the first time an MIT-licensed open-weight model has led both an OpenAI and an Anthropic flagship on the agent-style SWE benchmark that buyers actually care about. The previous open-source ceiling for SWE-bench Pro was 58.4 (GLM 5.1). The number that matters most for teams replacing Claude Code is the combination of SWE-bench Pro 62.1 and Terminal-Bench 2.1 81.0, because they cover code reasoning under realistic repository conditions and shell tool use under realistic terminal conditions respectively.
What "MIT weights on HuggingFace" actually unlocks
Three different deployment modes are now usable from day one.
Hosted on the GLM Coding Plan. This is the simplest path. Set OPENAI_BASE_URL=https://api.z.ai/api/coding/paas/v4 and use zai-org/glm-5.2. Costs $10 to $80 per month flat for 1M context.
Hosted on the Vercel AI Gateway at model id zai/glm-5.2. No platform fee over the underlying provider rate, day-one support in the Vercel AI SDK, and instant interop with the eight coding agents (Claude Code, Cline, OpenCode, Roo Code, Crush, Goose, OpenClaw, Kilo Code) that already wire up to the gateway. Covered in detail in our GLM 5.2 as a Claude Code alternative writeup from June 17.
Self-hosted on your own GPU node. This is the new option as of June 17. The 753B-parameter MoE checkpoint is approximately 744 GB on disk in FP8 and 1,488 GB in BF16. vLLM 0.23+ and SGLang 0.5.13+ both ship working serving configs, and SGLang's RadixAttention gives roughly 3x the requests per second of vLLM at the same hardware when you have a large reused system prompt (typical for coding agents). KTransformers and Ollama exist for smaller setups. The 24 published community quantizations bring entry hardware down to 2 to 4 H100 80GB cards with FP4 or INT4 quant builds.
The economics question is simpler than the hype suggests. A multi-GPU node costs the same per hour whether it is busy or idle. Self-hosting beats hosted GLM 5.2 only above roughly 4 to 6 million coding tokens per day per node, depending on quantization. For sub-million-token-per-day usage, the $10 to $80 per month GLM Coding Plan or per-token Vercel AI Gateway pricing wins. For enterprises with a vendor-risk mandate (post the Fable 5 export-control suspension, that means almost every regulated buyer), the self-host path is now a real escrow option, not a hypothetical.
What this means for teams shipping with Claude Code or replacing it
GLM 5.2 is now the only open-weight model that ranks above the Anthropic and OpenAI flagships on SWE-bench Pro. That has three direct consequences for anyone building production coding agents.
First, the "no real open-weight competitor" objection is gone. Until June 17 it was defensible to argue that open-weight models could not catch frontier proprietary models on agentic code benchmarks. The 62.1 versus 58.6 SWE-bench Pro gap closes that argument for SWE-bench Pro at least. Internal evals will vary, but the prior that proprietary is automatically better no longer holds without verification.
Second, vendor-risk discovery calls are about to change. Procurement teams will start asking whether your stack can fall back to a self-hosted open-weight model in the event of another suspension like Fable 5 and Mythos 5. Saying yes used to mean DeepSeek V4 Pro or Kimi K2.7 Code at a noticeable quality gap. As of June 17, saying yes means GLM 5.2 with the same SWE-bench Pro lead as GPT-5.5. The defensive answer just got easier.
Third, the Vercel AI Gateway and the GLM Coding Plan are still the right starting point for most builders. Self-hosting only pays off above real volume. The right pattern for the next 90 days for most teams is to build on the Vercel AI Gateway with zai/glm-5.2 (or your existing Claude or OpenAI backend), keep a self-host runbook ready, and only flip to self-host when token volume and procurement risk both demand it. The same conclusion holds for teams that originally subscribed to the Claude Agent SDK during the planned credits program and are now sitting on subscription quota with no new effective date.
Building with Totalum on top of GLM 5.2
Totalum builds production-grade Next.js + TotalumSDK applications with auth, payments, database, file storage, deployment, custom domains, and an LLM-agnostic agent layer baked in. Because the LLM backend in a Totalum app is configurable, switching between Claude, GPT-5.5, GLM 5.2 hosted, or GLM 5.2 self-hosted is three environment variables, not a rewrite. The day GLM 5.2 weights dropped, Totalum apps stopped needing a "what if our model is suspended" disaster plan and started having one out of the box.
That property mattered during the Fable 5 suspension. It matters more now that the leading open-weight model is also competitive on the actual benchmarks. The same property is what makes Totalum useful for software agencies who want a model-portable foundation under each client app and for SaaS vendors embedding Totalum at runtime via API or MCP.
Comparison: hosted GLM 5.2 vs self-hosted GLM 5.2 vs Claude Code
| Dimension | Claude Code on Claude plan | GLM 5.2 on Vercel AI Gateway | GLM 5.2 self-hosted |
|---|---|---|---|
| Setup time | minutes | minutes (1 env var) | days (GPU procurement) |
| Per-token cost | high (premium provider rate) | medium (no gateway markup, model rate) | low at high volume, high at low volume |
| Context window | 1M (Claude 4.6 / 4.7) | 1M | 1M |
| Vendor-risk exposure | high (single vendor + suspension precedent) | medium (gateway + provider) | low (your own infrastructure) |
| MCP support | first-party | via Vercel AI SDK | via SGLang or vLLM tool runtime |
| Best for | teams already on the Claude plan | most builders right now | enterprises with vendor-risk mandate or 5M+ tokens per day |
Sources
- Z.ai GLM 5.2 model card: huggingface.co/zai-org/GLM-5.2
- GLM 5.2 long-horizon blog: huggingface.co/blog/zai-org/glm-52-blog
- Independent benchmark aggregation: llm-stats.com/models/glm-5.2
- TechTimes coverage of the open-weight drop and data-residency caveats: techtimes.com/articles/318543/20260617/glm-52-open-weights-live-top-coding-benchmark-api-use-carries-china-data-risk.htm
- Vercel AI Gateway availability of zai/glm-5.2: vercel.com/changelog/glm-5-2-now-available-on-ai-gateway
- Anthropic Agent SDK credits pause notice: support.claude.com/en/articles/15036540
- Anthropic statement on Fable 5 and Mythos 5 export-control suspension: anthropic.com/news/fable-mythos-access
FAQ
Q: What is the GLM 5.2 SWE-bench Pro score and why does it matter?
GLM 5.2 scores 62.1 on SWE-bench Pro. The prior open-weight ceiling was 58.4 (GLM 5.1), and GPT-5.5 scores 58.6 on the same benchmark. SWE-bench Pro measures agentic resolution of real-world software engineering issues, so the 62.1 lead is the first credible benchmark showing an MIT-licensed open-weight model ahead of both an Anthropic and an OpenAI flagship on the metric most procurement teams trust for coding agents.
Q: Where are the GLM 5.2 weights hosted?
HuggingFace at zai-org/GLM-5.2 (BF16, full 753B-parameter MoE checkpoint) and zai-org/GLM-5.2-FP8 (FP8 quantized). ModelScope mirrors both. Twenty-four community quantizations (FP4, INT4, GGUF, etc.) are also published for llama.cpp, LM Studio, Jan, and Ollama. All released under the MIT license.
Q: Can GLM 5.2 replace Claude Code or the Claude Agent SDK in production?
Yes for most cases, with caveats. Coding agent compatibility is already live for Claude Code (via community config), Cline, OpenCode, Roo Code, Crush, Goose, OpenClaw, and Kilo Code through the Vercel AI Gateway model id zai/glm-5.2. The benchmark gap on SWE-bench Pro favors GLM 5.2. The caveats are tool integration depth (still maturing on Claude Code), production telemetry (Anthropic still ahead on dashboards), and your own regression eval results.
Q: How much hardware do I need to self-host GLM 5.2?
Approximately 744 GB of weight memory in FP8 and 1,488 GB in BF16, plus KV cache and overhead. A practical FP8 deployment is a single multi-GPU node with 8 H100 80GB (640 GB) or 8 H200 141GB cards. Smaller quantizations (FP4, INT4) can run on 2 to 4 H100 80GB cards with quality tradeoffs. vLLM 0.23+ and SGLang 0.5.13+ both ship working serving configurations.
Q: When does self-hosting GLM 5.2 actually beat the GLM Coding Plan or the Vercel AI Gateway?
Above roughly 4 to 6 million coding tokens per day per node. Below that, the $10 to $80 per month GLM Coding Plan and the per-token Vercel AI Gateway rate win on raw economics, because GPU rental is fixed-cost. Above that, plus any vendor-risk premium your organization assigns to a single foreign API, the self-host path pulls ahead.
Q: Is GLM 5.2 actually MIT licensed?
Yes. The model card and the Z.ai blog both confirm a pure MIT open-source license on both the weights and the inference code samples. There is no separate commercial-use restriction, no field-of-use restriction, and no copyleft clause. This is the most permissive license currently attached to a 753B-parameter coding model.
Q: What is the data-residency risk if I use the GLM Coding Plan API?
Z.ai is a China-based provider, so API traffic is subject to Chinese data residency considerations. The self-hosted MIT-weight path removes that concern, since you control where the model runs and where prompts go. The hosted API path on Z.ai infrastructure does not. The Vercel AI Gateway routing for zai/glm-5.2 is something to confirm directly with Vercel for your specific deployment.
Q: Does Totalum support swapping the underlying coding model?
Yes. Totalum apps are LLM-agnostic at the agent layer, so switching from Claude to GPT-5.5 to GLM 5.2 (hosted or self-hosted) is a configuration change, not a rewrite. The Fable 5 and Mythos 5 suspension and the Claude 4 retirement during June 2026 made portability a procurement requirement rather than a nice-to-have, and Totalum is built that way by default.
Building a production app and want it to be portable across Claude, GPT-5.5, and GLM 5.2 from day one? Register at totalum.app and ship in minutes.
Running a software agency or shipping a SaaS where vendor-risk matters? Book a 20-minute call and we will map your model-portability plan around the new GLM 5.2 weights drop.
Francesc, Co-founder at Totalum | June 18, 2026