
GLM 5.2 as a Claude Code Alternative: 1M Context, $10 Plan, Vercel AI Gateway (June 2026)

Francesc | June 17, 2026 | 10 min read

[Image: GLM 5.2 card next to a paused Claude Code card with a swap arrow between them, illustrating GLM 5.2 as a Claude Code alternative on Vercel AI Gateway in June 2026]

Quick Answer

GLM 5.2 is the new flagship coding model from Z.ai (formerly Zhipu AI). It shipped on June 13, 2026 with a 1M-token usable context, MIT-licensed open weights, and a $10 to $80 per month Coding Plan, then landed on the Vercel AI Gateway on June 16, 2026. That was one day after Anthropic paused the Claude Agent SDK credits program on June 15, 2026, the same day Claude 4 models were retired. For developers shopping for a cheap, swappable backend for Claude Code, GLM 5.2 is now the most realistic candidate. This post explains what changed, the honest tradeoffs, and how to swap your agent's backend to GLM 5.2 in roughly three lines of configuration.

GLM 5.2 lands on Vercel AI Gateway right after the Agent SDK credits pause

The timing matters because the Claude Code billing story changed twice in one week.

On June 9, 2026, Anthropic announced the Claude Agent SDK credit program: separate monthly credits for Pro, Max, Team and Enterprise plans, so that Agent SDK usage would no longer count against your subscription's normal limits, starting June 15. On the morning of June 15, the program was paused. The help-center article was rewritten to open with "We're pausing the changes to Claude Agent SDK usage described below. For now, nothing has changed." No new effective date was published. As of June 17, 2026, the support article reads "Updated yesterday", and the program is still paused with no replacement announced.

Independently, on June 13, Z.ai shipped GLM 5.2 with a 1M-token context, two thinking-effort levels (High and Max), and day-one configuration support for at least eight coding agents including Claude Code, Cline, OpenCode, Roo Code, Crush, Goose, OpenClaw and Kilo Code. Three days later, Vercel published GLM 5.2 on the AI Gateway with the model id zai/glm-5.2, no Vercel markup over provider pricing, and direct AI SDK access via streamText.

For a developer running Claude Code who now has zero clarity on when the Agent SDK credit changes will take effect, GLM 5.2 became a practical answer almost overnight. The same agent UI you already use can route to a 1M-context, MIT-licensed model for a flat $10 per month at the entry tier. That is the news.

GLM 5.2 versus Claude Code: what is actually different in June 2026

The honest comparison, with everything we can verify today:

Attribute	Claude Code on the Claude plan	GLM 5.2 via GLM Coding Plan	GLM 5.2 via Vercel AI Gateway
Context window	200K (Sonnet 4.6 / Opus 4.6 / Opus 4.8)	1M tokens, `glm-5.2[1m]` endpoint	1M tokens
Output cap	Typically 64K	Up to 131,072 tokens	Up to 131,072 tokens
License	Proprietary	MIT, open weights (rolling)	Routed model, MIT upstream
Entry price	Claude Pro $20/mo, Agent SDK credits paused	GLM Coding Lite ~$10/mo, ~400 prompts/wk	Pay-per-token, no Vercel markup
Mid tier	Claude Max 5x $100/mo, Agent SDK credits paused	GLM Coding Pro ~$30/mo, ~2,000 prompts/wk	Pay-per-token
Top tier	Claude Max 20x $200/mo, Agent SDK credits paused	GLM Coding Max ~$80/mo, ~8,000 prompts/wk	Pay-per-token
Published benchmarks at launch	Anthropic publishes SWE-bench, etc.	None at launch on June 13	None
Day-one agent support	Native (it is Anthropic's own agent)	Claude Code, Cline, OpenCode, Roo Code, Crush, Goose, OpenClaw, Kilo Code	Vercel AI SDK, custom routing
Models available June 17	Sonnet 4.6, Opus 4.6 / 4.8 (Claude 4 retired June 15)	`glm-5.2`, `glm-5.2[1m]`, two thinking modes	`zai/glm-5.2`
Hosting	Anthropic infra (US, AWS Bedrock, Vercel AI Gateway)	Z.ai (China), self-host via MIT weights	Vercel global edge

Two things to take away from that table. First, GLM 5.2's 1M context lets you keep a mid-sized repo in working memory without the chunking, retrieval and summarization scaffolding that Claude Code users build to stay under 200K. Second, GLM 5.2's pricing is decoupled from the Anthropic credit story. You pay $10 to $80 per month flat for the GLM Coding Plan and you are done. The Vercel AI Gateway option is pure pay-per-token with no markup, which is the right choice when usage is bursty.
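To make the flat-plan versus pay-per-token choice concrete, here is a minimal sketch of the arithmetic. It uses the tier prices and approximate weekly prompt quotas from the table above. The 52/12 weeks-per-month conversion and the `gatewayUsdPerPrompt` input are our own assumptions for illustration; actual gateway cost depends on token counts per prompt, which this post does not quote.

```typescript
// Tier figures as quoted in this post; prompt quotas are approximate in the source.
interface Tier {
  name: string;
  usdPerMonth: number;
  promptsPerWeek: number;
}

const lite: Tier = { name: "Lite", usdPerMonth: 10, promptsPerWeek: 400 };
const pro: Tier = { name: "Pro", usdPerMonth: 30, promptsPerWeek: 2000 };
const max: Tier = { name: "Max", usdPerMonth: 80, promptsPerWeek: 8000 };
const tiers: Tier[] = [lite, pro, max];

// Assumption: an average month has 52 / 12 weeks.
const WEEKS_PER_MONTH = 52 / 12;

// Effective price of one prompt if the weekly quota is fully used.
function usdPerPrompt(tier: Tier): number {
  return tier.usdPerMonth / (tier.promptsPerWeek * WEEKS_PER_MONTH);
}

// Monthly prompt volume above which the flat plan is cheaper than
// pay-per-token, given a hypothetical average gateway cost per prompt.
function breakEvenPromptsPerMonth(tier: Tier, gatewayUsdPerPrompt: number): number {
  return tier.usdPerMonth / gatewayUsdPerPrompt;
}

for (const tier of tiers) {
  console.log(`${tier.name}: ~$${usdPerPrompt(tier).toFixed(4)} per prompt at full quota`);
}
```

The per-prompt figure only holds if you actually use the full quota. For bursty usage, compare your real monthly prompt count against `breakEvenPromptsPerMonth` with a per-prompt cost measured from your own gateway bill.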

The caveat the table understates: Z.ai did not publish benchmark scores at launch. GLM 5.1 self-reported reaching around 94.6% of Claude Opus 4.6's score on a coding benchmark, a figure that was never independently confirmed. Developer reports from the first 48 hours suggest GLM 5.2 is strong on UI and design code and weaker on multi-file architectural reasoning. Treat it as "good enough for a lot of tasks, not yet a one-for-one Opus 4.8 replacement."

How to swap Claude Code's backend to GLM 5.2 in three lines

If you already run Claude Code as your terminal agent, the swap is a three-line change to your environment. The GLM Coding Plan exposes an Anthropic-compatible endpoint, and Claude Code's environment variable interface accepts that override.

Before you edit anything, sign up for a GLM Coding Plan tier at z.ai/subscribe. The Lite tier at roughly $10 per month is enough to evaluate. Once you have your API key, set three environment variables:

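# Point Claude Code at Z.ai's Anthropic-compatible endpoint.
export ANTHROPIC_BASE_URL="https://api.z.ai/api/anthropic"
# Placeholder: replace with your GLM Coding Plan API key.
export ANTHROPIC_AUTH_TOKEN="YOUR_ZAI_API_KEY"
export ANTHROPIC_MODEL="glm-5.2"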

Then run claude exactly as you do today. Every Claude Code prompt now hits GLM 5.2 through the Z.ai endpoint. To flip back to Anthropic, unset those three vars or comment them out. That is the entire migration cost on the agent side.
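If you launch Claude Code from a script rather than an interactive shell, the same swap can be expressed as a pair of pure functions over an environment object. This is a minimal sketch, not an official integration: the function names `withGlmBackend` and `withAnthropicBackend` are ours, and the only values taken from this post are the three variable names, the Z.ai base URL and the model name.

```typescript
type Env = Record<string, string | undefined>;

// The three variables named in this post.
const GLM_KEYS = ["ANTHROPIC_BASE_URL", "ANTHROPIC_AUTH_TOKEN", "ANTHROPIC_MODEL"] as const;

// Returns a copy of `base` with the overrides that point Claude Code at Z.ai.
// The input object is not mutated.
function withGlmBackend(base: Env, apiKey: string, model = "glm-5.2"): Env {
  if (apiKey.trim() === "") {
    throw new Error("A Z.ai API key is required");
  }
  return {
    ...base,
    ANTHROPIC_BASE_URL: "https://api.z.ai/api/anthropic",
    ANTHROPIC_AUTH_TOKEN: apiKey,
    ANTHROPIC_MODEL: model,
  };
}

// Returns a copy of `base` with the three overrides removed,
// the scripted equivalent of `unset` in the shell.
function withAnthropicBackend(base: Env): Env {
  const copy: Env = { ...base };
  for (const key of GLM_KEYS) {
    delete copy[key];
  }
  return copy;
}
```

In a Node script, the returned object can be passed as the `env` option when spawning the `claude` process, which keeps the override scoped to that one child process instead of your whole shell session.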

For Vercel AI Gateway, the equivalent in your AI SDK app is even shorter:

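import { streamText } from 'ai';
// A plain string model id is routed through the Vercel AI Gateway.
const prompt = 'Explain what this repository does.'; // example prompt
const result = streamText({ model: 'zai/glm-5.2', prompt });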

You pay Vercel at provider rates with no platform fee, and you get the model behind your existing Gateway routing, observability and rate limits. The combination of "flat $10 to $80 per month coding plan for daily use" plus "AI Gateway with pay-per-token for production traffic" is, as far as we can tell, the cheapest serious agent setup available in June 2026.

How GLM 5.2 fits into the broader vendor risk story

Two recent posts from this radar are directly relevant.

The Claude Agent SDK credits pause shows that a major billing change can be announced, baked into a quarter's budget, and reversed on the day it ships. Anthropic explicitly said "we will share an update before anything takes effect" but did not commit to a date. If your only fallback is "use less Claude Code", a second-source backend like GLM 5.2 is a real form of insurance.

The Claude Fable 5 and Mythos 5 suspension shows that a frontier model can be pulled by a regulator without warning. After the Fable 5 incident, Anthropic committed to a follow-up within 24 hours; as of June 17 that follow-up is more than 110 hours overdue. GLM 5.2's MIT license is meaningful here: even if Z.ai's hosted endpoint became unavailable, the weights are downloadable. Self-hosting a 744B mixture-of-experts model with 40B active parameters is not trivial, but it is possible. That option does not exist for Sonnet, Opus, GPT-5.5, Gemini 3.1, or Grok V8.

The honest tradeoff: GLM 5.2 is a Chinese model, hosted in China by default, and is subject to its own export-control and compliance considerations. If your data residency or procurement policy already excludes Chinese-origin models, the Vercel AI Gateway path does not change that. If your policy allows it, you now have a cheap, swappable, 1M-context backend with an open license. We expect more developers to keep one Anthropic key and one GLM key in their environment, swapping per task, the same way teams kept OpenAI and Anthropic keys side by side through 2024 and 2025.
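Swapping per task can be as simple as a small routing function. The sketch below encodes one possible policy based on the strengths reported in this post: Claude for multi-file architectural work, GLM 5.2 for everything that is long-context or UI-heavy. The `TaskKind` categories, the default for uncategorized tasks, and the `CLAUDE_MODEL` placeholder are our assumptions; only `zai/glm-5.2` and the 200K figure come from the text above.

```typescript
type TaskKind = "architecture" | "long-context-refactor" | "ui" | "other";

interface Task {
  kind: TaskKind;
  estimatedInputTokens: number;
}

// Gateway model id quoted in this post.
const GLM_MODEL = "zai/glm-5.2";
// Placeholder, not a real id: substitute the Claude model id your provider lists.
const CLAUDE_MODEL = "claude-model-id-placeholder";
// Claude-side context limit from the comparison table.
const CLAUDE_CONTEXT_TOKENS = 200000;

function pickModel(task: Task): string {
  // Anything that cannot fit in 200K has to go to the 1M-context model.
  if (task.estimatedInputTokens > CLAUDE_CONTEXT_TOKENS) {
    return GLM_MODEL;
  }
  // Multi-file architectural reasoning is where this post reports Claude leading.
  if (task.kind === "architecture") {
    return CLAUDE_MODEL;
  }
  // Assumption: everything else defaults to the cheaper backend.
  return GLM_MODEL;
}

console.log(pickModel({ kind: "ui", estimatedInputTokens: 30000 }));
```

If your data residency policy excludes Chinese-origin models, the same function is the natural place to enforce that rule, for example by never returning `GLM_MODEL` for tasks that touch restricted data.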

What Totalum does about model choice

Totalum is the AI app builder for humans and for agents. We use Claude Code, Claude Opus and Claude Sonnet under the hood for the orchestration that turns your prompt into a production Next.js app with auth, payments, database, file storage, AI integrations, deployment and custom domains. We run Code with Claude in Tokyo on a cron through Routines.

What you do not have to worry about, as a Totalum user, is whether the Agent SDK credits pause changes your monthly bill, or whether GLM 5.2 is now a better backend, or whether the next vendor pricing surprise is around the corner. The platform absorbs that risk so you can focus on shipping product. If you are building a SaaS that needs its own embedded builder and wants control over which model serves users, we expose the same agent stack via API and MCP. Talk to us about embedding it.

FAQ

Q: When did GLM 5.2 launch and when did it land on the Vercel AI Gateway?

Z.ai launched GLM 5.2 on June 13, 2026. Vercel added it to the AI Gateway with the model id zai/glm-5.2 on June 16, 2026.

Q: Is GLM 5.2 actually a Claude Code alternative or just a model used by Claude Code?

Both. GLM 5.2 is a foundation model. You can run it inside Claude Code by overriding three environment variables that point Claude Code at Z.ai's Anthropic-compatible endpoint. You can also run it inside Cline, OpenCode, Roo Code, Crush, Goose, OpenClaw and Kilo Code. Or you can call it directly via the Vercel AI Gateway with the AI SDK.

Q: How much does GLM 5.2 cost compared to Claude Code on a Claude plan?

The GLM Coding Plan is flat $10 to $80 per month depending on tier. Lite (~$10) gives roughly 400 prompts per week. Pro (~$30) gives roughly 2,000. Max (~$80) gives roughly 8,000. Claude Pro is $20 per month and Claude Max 5x is $100 per month, but the separate Agent SDK credits announced for those plans are currently paused with no new effective date.

Q: Does GLM 5.2 really have a 1M-token context window?

Yes. The glm-5.2[1m] endpoint advertises 1,000,000 input tokens and up to 131,072 output tokens. That is a 5x jump from GLM 5.1's roughly 200K context. It is wide enough to hold a mid-sized repo in working memory without retrieval scaffolding.
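Whether your own repo counts as "mid-sized" is easy to estimate. The sketch below is a rough check under a loud assumption: about 4 characters per token, a common rule of thumb that real tokenizers will deviate from, especially on code. The 20,000-token reserve for instructions and conversation history and the example repo size are also ours; the 1M and 200K limits are the figures quoted in this post.

```typescript
// Input limits quoted in this post.
const GLM_INPUT_TOKENS = 1000000;
const CLAUDE_INPUT_TOKENS = 200000;

// Assumption: ~4 characters per token. Real tokenizers vary.
const CHARS_PER_TOKEN = 4;

function estimateTokens(totalChars: number): number {
  return Math.ceil(totalChars / CHARS_PER_TOKEN);
}

// True if the repo text plus a reserve for instructions and
// conversation history fits in the given context window.
function fitsInContext(
  repoChars: number,
  contextTokens: number,
  reservedTokens = 20000,
): boolean {
  return estimateTokens(repoChars) + reservedTokens <= contextTokens;
}

// Hypothetical repo with 2.4 million characters of source text,
// which is roughly 600K tokens under the assumption above.
const repoChars = 2400000;
console.log(fitsInContext(repoChars, CLAUDE_INPUT_TOKENS)); // false
console.log(fitsInContext(repoChars, GLM_INPUT_TOKENS)); // true
```

Fitting in the window is a necessary condition, not a quality guarantee. How well any model reasons over a nearly full 1M-token context is a separate question that the missing benchmarks leave open.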

Q: Has Z.ai published benchmark scores for GLM 5.2?

Not at launch. The company explicitly shipped GLM 5.2 without SWE-bench, LiveCodeBench or HumanEval results. Early developer reports describe it as strong on UI and design code, weaker on multi-file architectural reasoning. Treat it as promising and unverified for the next few weeks.

Q: What is the license and can I self-host GLM 5.2?

GLM 5.2 is MIT-licensed with open weights rolling out. The architecture is a 744B mixture-of-experts with 40B active parameters per token. Self-hosting is technically possible but requires serious GPU capacity. For most developers, the Vercel AI Gateway or the Z.ai endpoint is the practical option.
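To put "serious GPU capacity" in numbers, here is a back-of-the-envelope sketch of the memory needed for the weights alone. The parameter counts are the ones quoted above. The precisions (BF16, FP8, 4-bit) and the 80 GB device size are our assumptions for illustration; this post does not state which formats the released weights support, and the estimate ignores KV cache and activations.

```typescript
// Parameter counts quoted in this post.
const TOTAL_PARAMS = 744e9;
const ACTIVE_PARAMS = 40e9;

// Assumed storage precisions, in bytes per parameter.
const BYTES_PER_PARAM = { bf16: 2, fp8: 1, int4: 0.5 } as const;
type Precision = keyof typeof BYTES_PER_PARAM;

// Memory for the weights alone, in decimal gigabytes. In a typical
// mixture-of-experts deployment every expert stays resident, so the
// total parameter count drives memory, not the active count.
function weightGigabytes(precision: Precision): number {
  return (TOTAL_PARAMS * BYTES_PER_PARAM[precision]) / 1e9;
}

// Minimum device count just to hold the weights (no KV cache, no activations).
function minDevices(precision: Precision, gbPerDevice: number): number {
  return Math.ceil(weightGigabytes(precision) / gbPerDevice);
}

// Share of parameters used per token, which drives compute rather than memory.
const activeFraction = ACTIVE_PARAMS / TOTAL_PARAMS;
console.log(`active per token: ${(activeFraction * 100).toFixed(1)}%`);

for (const p of Object.keys(BYTES_PER_PARAM) as Precision[]) {
  console.log(`${p}: ${weightGigabytes(p)} GB, at least ${minDevices(p, 80)} x 80 GB devices`);
}
```

Under these assumptions the weights alone need roughly 1.5 TB at BF16 and about 372 GB at 4-bit, before any memory for a long context. That is why the hosted endpoints remain the practical option for most teams.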

Q: Should I stop using Claude Code and switch to GLM 5.2?

No, not as a binary choice. The pragmatic move in June 2026 is to keep your Claude key and add a GLM key. Use Claude Opus 4.8 or Sonnet 4.6 for the multi-file architectural tasks where Claude still leads. Use GLM 5.2 for the long-context refactors, the daily UI work, and as your insurance against the next pricing or availability surprise from any single vendor.

Q: How does Totalum handle this in practice?

Totalum uses Claude under the hood today because it is the strongest agentic coding stack for our generation pipeline. Our customers do not pick the model, they pick the product they want to build. If you are an agency or a SaaS embedder who needs control over the model serving end users, we expose the builder via API and MCP and we can route to alternative backends on request. Book a call to discuss embedding.

Build with Totalum

If you are a developer, founder, or technical builder who is tired of changing your stack every time a vendor changes its pricing, register at totalum.app and ship a production app today. The platform handles auth, payments, database, file storage, AI integrations, deployment, and custom domains so you can focus on the product, not the next vendor announcement.

If you are an IT agency or a SaaS team that wants to embed an AI builder into your own product or offer it to clients, book a 30-minute call. We expose the full stack via API and MCP, and we can talk through how to route between Claude, GLM, and future backends without forcing your customers to care.

Francesc

Writes for the Totalum blog about AI app building, no-code development, and product engineering.