Project Polaris in 2026: Microsoft's New GitHub Copilot Coding Model (and How to Ship What It Writes)

Francesc · 13 min read

Project Polaris is Microsoft's first in-house AI coding model, unveiled today at Build 2026 in San Francisco. It is the new default reasoning engine inside GitHub Copilot, set to replace GPT-4 Turbo for Copilot subscribers starting August 2026, with an optional three-month fallback to GPT-4 Turbo for teams that need more time. Microsoft says Polaris uses a mixture-of-experts architecture with language-specialized sub-modules and runs inference with chain-of-thought and tree-of-thought search on Azure's custom Maia accelerators. The pitch is direct: Microsoft is going after Claude Code's lead in agentic coding by changing the model behind the IDE most developers already pay for.

This post sorts what is real today from what is "coming soon", lines Polaris up against Claude Code and OpenAI Codex on the dimensions that matter for shipping software, and shows the build pattern we use at Totalum so the code your coding agent writes ends up inside a real, production-grade Next.js app you own.

Quick Answer

  • Project Polaris is Microsoft's first in-house AI coding model, announced June 2, 2026 at Build.
  • It is the new reasoning engine inside GitHub Copilot, replacing GPT-4 Turbo as the default starting August 2026, with an optional three-month GPT-4 Turbo fallback through November 2026.
  • Architecture: mixture-of-experts with per-language sub-modules, chain-of-thought plus tree-of-thought search, running on Azure Maia accelerators.
  • Microsoft's claims: Polaris beats GPT-4 Turbo on HumanEval and MBPP, especially in Rust, Haskell, and Go, and its output is covered by a new Code Content Guarantee that indemnifies customers against IP claims.
  • Polaris is a model layer, not a standalone agent. Claude Code and OpenAI Codex are agents. The right comparison is "GitHub Copilot powered by Polaris" vs Claude Code vs Codex.

What Microsoft actually shipped on June 2

Microsoft Build 2026 ran in San Francisco on June 2-3 with Satya Nadella opening the developer keynote at 9:30 a.m. PT. The thesis of the keynote, repeated across multiple recaps, is that Windows is no longer a platform for human users only. Agents are first-class citizens in the runtime, in the developer tooling, and in the Windows distribution model. Inside that thesis, Microsoft announced five concrete launches that matter for anyone writing or shipping code:

  1. Project Polaris. Microsoft's first in-house coding model. Default model in GitHub Copilot from August 2026. Mixture-of-experts with language-specific submodules. Chain-of-thought plus tree-of-thought reasoning at inference. Trained on permissibly-licensed code with a new Code Content Guarantee that indemnifies customers against IP claims on Polaris output.
  2. GitHub Copilot Workspace exits beta to GA. Two new run modes: Fleet (autonomous CLI orchestration of multiple agents against a repo) and Autopilot (scheduled background runs). Ships with Copilot Extensions for Jira, Datadog, and ServiceNow.
  3. Windows Agent Framework v1.0 open-sourced under MIT. YAML-defined agents that move from a developer laptop to a Windows 365 Cloud PC to an Azure Arc edge device without rewriting. Cross-agent communication bus on gRPC, declarative agent manifest, and a memory service.
  4. Windows Agent Runtime Insider preview in June 2026. OS-level APIs that let agents see the taskbar, file system, and task scheduler. The initial preview is text-only, with multi-modal screen interaction planned for later previews.
  5. Windows Agent Store with an 85/15 revenue split and a Microsoft security review for every listed agent. Adobe and Zoom were named as launch partners.

Two more announcements sit in the same agentic-coding orbit. Azure Agent Mesh targets a Q4 2026 GA for federated agent execution across on-prem, Windows 365, and Azure Arc. WSL 3 ships paravirtualized GPU and NPU access, so a Linux developer environment on Copilot+ PCs can run frontier inference workloads at near-native performance.

Project Polaris is the headline because it changes which model writes the code inside the IDE that, by Microsoft's own framing today, has been losing measurable adoption to Claude Code on agentic refactors. Everything else is platform-shaped. Polaris is product-shaped, and it lands inside every Copilot seat by default.

The architecture in plain terms

Microsoft has not published a parameter count or a side-by-side numerical benchmark for Polaris as of the keynote. What it has confirmed is the shape of the model and how it serves inference:

  • Mixture-of-experts (MoE) with sub-modules tuned per programming language and framework. The router picks an expert based on the file context. Microsoft cites Rust, Haskell, and Go as standout gains, which fits the pattern where MoE designs pay off most on long-tail languages with sparse training data.
  • Chain-of-thought and tree-of-thought at inference. Polaris reasons step by step internally and, for harder problems, branches into multiple candidate traces and picks the best one. This is closer to how OpenAI's o-series and Claude Opus 4.7 and 4.8 work than to how GPT-4 Turbo works, which is part of the point.
  • Maia accelerators on Azure. Microsoft's custom inference silicon, which it says is what makes Polaris's reasoning passes cost-effective at Copilot scale.
  • Code Content Guarantee. Polaris's training corpus is described as "permissibly-licensed code only." Microsoft is extending IP indemnification to customers whose Copilot output is challenged. This is mostly a procurement and legal signal aimed at large enterprises that froze GitHub Copilot rollouts over training-data questions.
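To make "the router picks an expert based on the file context" concrete, here is a toy sketch of top-1 expert routing. It is an illustration under loud assumptions, not Polaris's router: Microsoft has not published how routing works, production mixture-of-experts models typically route with a learned gate rather than file extensions, and every name below (`Expert`, `byExtension`, `routeFile`, the `generalist` fallback) is hypothetical.

```typescript
// Toy top-1 expert routing keyed on file context.
// All names here are hypothetical; this is not Polaris's router.

type Expert = {
  id: string;
  // Returns a raw affinity score for a file path (higher is a better match).
  score: (filePath: string) => number;
};

// Builds an expert that scores 1 for its file extensions and 0 otherwise.
function byExtension(id: string, extensions: string[]): Expert {
  return {
    id,
    score: (filePath) =>
      extensions.some((ext) => filePath.slice(-ext.length) === ext) ? 1 : 0,
  };
}

const languageExperts: Expert[] = [
  byExtension("rust", [".rs"]),
  byExtension("haskell", [".hs"]),
  byExtension("go", [".go"]),
];

const GENERALIST_ID = "generalist";

// Softmax turns raw scores into routing weights that sum to 1.
function softmax(scores: number[]): number[] {
  const maxScore = Math.max(...scores);
  const exps = scores.map((s) => Math.exp(s - maxScore));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / total);
}

type Route = { expertId: string; weight: number };

// Top-1 routing: send the file to the single highest-weighted expert,
// or to a generalist fallback when no specialist matches at all.
function routeFile(filePath: string): Route {
  const raw = languageExperts.map((expert) => expert.score(filePath));
  const weights = softmax(raw);
  let best: Route = { expertId: GENERALIST_ID, weight: 0 };
  languageExperts.forEach((expert, i) => {
    const rawScore = raw[i] ?? 0;
    const weight = weights[i] ?? 0;
    if (rawScore > 0 && weight > best.weight) {
      best = { expertId: expert.id, weight };
    }
  });
  return best.weight > 0 ? best : { expertId: GENERALIST_ID, weight: 1 };
}
```

Calling `routeFile("src/main.rs")` selects the `rust` expert, while a file with no matching specialist, such as `README.md`, falls through to the generalist. The design point is only that a small gate decides which specialized parameters do the work for a given input.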

The architectural decisions tell you who Polaris is for. MoE plus reasoning at inference is what you build when your users live in multi-file refactors, framework boundaries, and codebases that span more languages and frameworks than any single specialist module covers well. That is also where Claude Code's reputation has been built over the last six months, and Microsoft is naming it explicitly as the target.
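The "branches into multiple candidate traces and picks the best one" behaviour can be sketched as a small beam search. This is a generic illustration of tree-of-thought style search on a toy arithmetic puzzle, not Microsoft's implementation; `searchTraces`, `Trace`, the `+3` / `*2` moves, and the scoring function are all assumptions made for the example.

```typescript
// Generic beam search over candidate reasoning traces (illustrative only).
function searchTraces<T>(
  root: T,
  expand: (state: T) => T[], // proposes next steps from a partial trace
  score: (state: T) => number, // higher means more promising
  beamWidth: number, // how many traces survive each round
  maxDepth: number // how many rounds of branching to run
): T {
  let frontier: T[] = [root];
  let best = root;
  for (let depth = 0; depth < maxDepth; depth++) {
    const candidates: T[] = [];
    frontier.forEach((state) => {
      expand(state).forEach((next) => candidates.push(next));
    });
    if (candidates.length === 0) break;
    // Keep only the most promising traces and prune the rest.
    candidates.sort((a, b) => score(b) - score(a));
    frontier = candidates.slice(0, beamWidth);
    const leader = frontier[0];
    if (leader !== undefined && score(leader) > score(best)) best = leader;
  }
  return best;
}

// Toy problem: starting from 1, reach TARGET using "+3" and "*2" moves.
type Trace = { value: number; steps: string[] };
const TARGET = 22;

const expandTrace = (t: Trace): Trace[] => [
  { value: t.value + 3, steps: t.steps.concat("+3") },
  { value: t.value * 2, steps: t.steps.concat("*2") },
];

// Closer to the target is better, so the score is the negated distance.
const scoreTrace = (t: Trace): number => -Math.abs(TARGET - t.value);

const bestTrace = searchTraces<Trace>(
  { value: 1, steps: [] },
  expandTrace,
  scoreTrace,
  2,
  5
);
```

With a beam width of 2 and five rounds, `bestTrace` reaches 22 via `+3, *2, *2, +3, +3`. In a coding model the states would be partial solutions and the score something like a learned verifier or test results. Pruning is a heuristic, so a narrow beam can discard a trace that would have turned out better later.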

Project Polaris vs Claude Code vs OpenAI Codex

The most common reader question after today's keynote is, "should I move from Claude Code to GitHub Copilot when Polaris lands?" The honest answer is that the comparison is layered. Polaris is a model. Claude Code and Codex are agents. Below is a like-for-like view across the stack that builders actually evaluate.

| Dimension | GitHub Copilot + Polaris (Aug 2026) | Claude Code (Anthropic) | OpenAI Codex (CLI + ChatGPT app) |
| --- | --- | --- | --- |
| Layer | Model inside an IDE assistant | Standalone CLI agent + IDE plugin | Standalone CLI + hosted Codex agent |
| Default surface | VS Code, Visual Studio, JetBrains, GitHub.com | Terminal, IDE plugins, claude.ai | Terminal, VS Code, ChatGPT iOS / Android / Mac / Windows |
| Default model | Polaris (MoE, CoT + ToT) | Claude Opus 4.8 / Sonnet 4.6 | GPT-5.1 family + Codex-tuned variants |
| Multi-file agent loop | Copilot Workspace Fleet + Autopilot (GA today) | Native (workspace + repo plans) | Native (codex CLI + hosted Codex) |
| Computer Use | Not announced for Polaris | Limited via tool-use | GA on Mac (May 2026) and Windows (May 2026) |
| Pricing of agent | Inside Copilot seat (Business / Enterprise) | Anthropic Pro, Max, or API per token | ChatGPT Plus / Pro / Business / API per token |
| Indemnification | Code Content Guarantee (new) | Enterprise indemnification clauses | Enterprise indemnification clauses |
| Strongest claim | Latency + cost at Copilot scale, low-resource languages | Multi-file agentic refactor, planning quality | Long-running autonomous runs and Computer Use |

The framing every team should hold: Polaris is the bet that, if Microsoft can match Claude Code on agentic quality at GitHub Copilot's bundled price, the default IDE assistant wins by inertia. Claude Code's defense is the agent loop itself, where it has spent two model generations building muscle. Codex's defense is everything outside the IDE, including Computer Use on Mac and now Windows. For builders, the right move is to stop picking a single agent and start treating the agent as swappable around a stable, ownable production target.

What changes for builders today, and what does not

Polaris is not deployable until August 2026. Until then, GitHub Copilot is still GPT-4 Turbo by default with Claude 3.7 and other models available as opt-ins in select tiers. Two practical implications for the next sixty days:

  • If you live in Copilot, you will get Polaris automatically when it rolls out. There is no opt-in flow announced. Teams that need GPT-4 Turbo for parity (test fixtures, deterministic prompts, agreed-on outputs in CI) have until November 2026 on the optional fallback.
  • If you live in Claude Code or Codex, nothing changes today. Anthropic's Claude Opus 4.8 still ships the strongest published numbers on agentic benchmarks like SWE-Bench Verified, and the Codex Windows GA is fresh from May 2026. We covered the Codex Windows shift in "Codex Computer Use on Windows in 2026" and the broader Claude Opus 4.8 update in "Claude Opus 4.8 in 2026".
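The fallback window is easy to sanity-check with month arithmetic. The sketch below works at month granularity only, because Microsoft announced months rather than exact dates; `YearMonth`, `monthsBetween`, and `fallbackMonthsLeft` are hypothetical helper names, not part of any Copilot tooling.

```typescript
// Month-granularity date math for the announced Polaris timeline.
// Exact cutover days were not announced, so no day-level dates appear here.
type YearMonth = { year: number; month: number }; // month is 1-12

function monthsBetween(from: YearMonth, to: YearMonth): number {
  return (to.year - from.year) * 12 + (to.month - from.month);
}

// Announced milestones, taken from the keynote coverage.
const POLARIS_DEFAULT: YearMonth = { year: 2026, month: 8 };
const FALLBACK_END: YearMonth = { year: 2026, month: 11 };

// Months remaining before the optional GPT-4 Turbo fallback closes.
function fallbackMonthsLeft(today: YearMonth): number {
  return Math.max(0, monthsBetween(today, FALLBACK_END));
}

// Length of the fallback window itself: August to November is 3 months.
const fallbackWindowMonths = monthsBetween(POLARIS_DEFAULT, FALLBACK_END);
```

A team planning from June 2026 has `fallbackMonthsLeft({ year: 2026, month: 6 })`, which is 5 months, to re-baseline any CI checks that depend on GPT-4 Turbo output.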

Two consequences of Polaris are worth flagging now:

  1. Benchmark fights. Microsoft has not published numerical scores against Claude Opus 4.8 on SWE-Bench Verified or against the Codex-tuned GPT-5.1 variants. Until those land, treat Polaris as "Microsoft's first credible internal coding model," not "the new best." Expect a comparison cycle through summer.
  2. The agent layer keeps fragmenting. Copilot Workspace Fleet, Claude Code, the Codex CLI, Cursor's auto-review run mode, and the new Windows Agent Runtime all want to own the same workflow on the developer's machine. We outline how to pick today in "Best AI Coding Agents in 2026" and cover the head-to-head between the two leaders in "Claude Code vs Codex in 2026".

Where Totalum fits in this picture

Totalum is the most powerful AI app builder for humans and for agents. It produces a real, production-grade Next.js plus TotalumSDK project from a prompt, with auth, payments, database, file storage, AI integrations, deployment, and a custom domain wired up automatically. It is a peer to Lovable, Bolt, Replit, and v0, not a deploy layer below them.

The relevance to Polaris is the agent integration story. The Totalum API and MCP let any coding agent, including GitHub Copilot Workspace, Claude Code, OpenAI Codex, Cursor, and Antigravity, drive Totalum to build a complete project. The agent is the orchestrator. Totalum is the builder that materializes the application. That separation matters more, not less, when the model under your IDE changes. If Polaris turns out to be better than Claude Opus 4.8 in three months, you swap the agent and keep the application. If it is worse, same thing. The production target does not have to migrate.
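The "swap the agent, keep the application" idea can be sketched as two small interfaces. Everything here is hypothetical and in-memory: `CodingAgent`, `ProductionTarget`, and `shipWith` are illustrative names, not the Totalum API, MCP tool names, or any vendor SDK.

```typescript
// Hypothetical shapes for illustration only. None of these names come from
// GitHub Copilot, Claude Code, Codex, or the Totalum API.
interface CodingAgent {
  agentId: string;
  // Turns a product idea into a build spec the target understands.
  draftSpec(idea: string): string;
}

interface ProductionTarget {
  // Materializes (or updates) a project from a spec and returns its id.
  applySpec(projectId: string, spec: string): string;
}

// The orchestration code depends only on the two interfaces,
// so either side can be replaced without touching the other.
function shipWith(
  agent: CodingAgent,
  target: ProductionTarget,
  projectId: string,
  idea: string
): { agentId: string; projectId: string } {
  const spec = agent.draftSpec(idea);
  return { agentId: agent.agentId, projectId: target.applySpec(projectId, spec) };
}

// In-memory stand-ins for two different agents and one stable target.
const agentA: CodingAgent = {
  agentId: "agent-a",
  draftSpec: (idea) => "spec-a:" + idea,
};
const agentB: CodingAgent = {
  agentId: "agent-b",
  draftSpec: (idea) => "spec-b:" + idea,
};

const appliedSpecs: string[] = [];
const inMemoryTarget: ProductionTarget = {
  applySpec(projectId, spec) {
    appliedSpecs.push(spec);
    return projectId;
  },
};

// Same project, same target, different agent on each run.
const firstRun = shipWith(agentA, inMemoryTarget, "invoice-app", "add invoicing");
const secondRun = shipWith(agentB, inMemoryTarget, "invoice-app", "add invoicing");
```

Both runs land in the same `invoice-app` project even though a different agent drafted each spec. A real integration would be asynchronous and would go through whatever API or MCP tools the target actually exposes.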

For founders and small teams, the simplest path is still the web UI on totalum.app. Describe the idea, Totalum builds the project, you own the code. For agencies and SaaS embedding cases, the API and MCP path lets you put a full AI app builder behind your product or behind your client work. We compared this with the most common coding-agent stacks in "Best AI Coding Agents in 2026" and against direct AI-app-builder peers in "Lovable vs Totalum".

What we still do not know about Polaris

The keynote and the public recaps so far leave several practical questions open. We will update this post as Microsoft publishes specifics:

  • Independent benchmarks. No third-party numbers on SWE-Bench Verified, LiveCodeBench, RepoQA, or HumanEvalPlus yet. Microsoft's claim of HumanEval and MBPP gains over GPT-4 Turbo is internal.
  • Pricing model. Polaris was announced as bundled inside existing Copilot seats. Whether higher-reasoning calls will cost extra or be metered, the way Copilot Spaces and Copilot Workspace are, has not been announced.
  • Region availability. GA in August 2026 was announced as global for Copilot subscribers, but EU data-residency specifics for Polaris inference (vs OpenAI's existing Azure regions) have not been clarified.
  • Polaris outside Copilot. Whether Polaris will be exposed as a standalone model in Azure AI Foundry, the way OpenAI's models are, has not been confirmed.
  • Agentic SDK. Microsoft positioned Project Polaris as a model and Copilot Workspace as the agent. Whether Polaris will get its own agent SDK, parallel to the Anthropic Claude Agent SDK or the OpenAI Agents SDK, was not announced.

How we recommend planning the next 90 days

For most engineering teams, the right plan looks like this:

  1. Do not migrate yet. Polaris is not in production. Treat today as the announcement, not the launch.
  2. Re-baseline your Copilot vs Claude Code vs Codex split. If you pay for two of those today, set a budget for evaluating Polaris when it ships in August. Internal evals (golden tests, code review benchmarks) matter more than the leaderboard war.
  3. Make the application target swappable. Pick one production target the agent can drive. We are obviously biased here because Totalum's reason to exist is to be that target. The principle holds either way: agents are getting swappable faster than apps are.
  4. Watch the agent runtime layer. Windows Agent Runtime, Cursor's run modes, and the Claude Code workspace model are converging on the same shape. The team that picks the runtime carefully now will pay less migration tax in 2027.
  5. Track the IP story. The Code Content Guarantee is the kind of contractual signal procurement teams use to unfreeze rollouts. If GitHub Copilot has been stuck inside your company on legal grounds, Polaris's training story plus the indemnity may be what unfreezes it.
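For step 2, an internal eval can be as small as a list of golden cases and a pass-rate function. The sketch below is a minimal, synchronous version with stubbed generators; `GoldenCase`, `GenerateFn`, `passRate`, and `rankAgents` are hypothetical names, and a real harness would call each agent asynchronously and run actual tests instead of string checks.

```typescript
// Minimal golden-test harness for comparing coding agents (illustrative).
type GoldenCase = {
  caseId: string;
  prompt: string;
  // Returns true when the generated output is acceptable.
  accept: (output: string) => boolean;
};

// Stand-in for "send this prompt to an agent and get code back".
type GenerateFn = (prompt: string) => string;

// Fraction of golden cases whose output passes its check.
function passRate(generate: GenerateFn, cases: GoldenCase[]): number {
  if (cases.length === 0) return 0;
  const passed = cases.filter((c) => c.accept(generate(c.prompt))).length;
  return passed / cases.length;
}

// Scores every agent on the same cases and sorts best-first.
function rankAgents(
  agents: { label: string; generate: GenerateFn }[],
  cases: GoldenCase[]
): { label: string; rate: number }[] {
  return agents
    .map((a) => ({ label: a.label, rate: passRate(a.generate, cases) }))
    .sort((x, y) => y.rate - x.rate);
}

const goldenCases: GoldenCase[] = [
  {
    caseId: "exports-add",
    prompt: "Write an exported add function",
    accept: (out) => out.indexOf("export function add") !== -1,
  },
  {
    caseId: "no-any-types",
    prompt: "Write an exported add function",
    accept: (out) => out.indexOf(": any") === -1,
  },
];

// Stubbed outputs standing in for two agents under evaluation.
const incumbent: GenerateFn = () =>
  "export function add(a: any, b: any) { return a + b; }";
const candidate: GenerateFn = () =>
  "export function add(a: number, b: number): number { return a + b; }";

const ranking = rankAgents(
  [
    { label: "incumbent", generate: incumbent },
    { label: "candidate", generate: candidate },
  ],
  goldenCases
);
```

Here the stubbed `candidate` passes both cases and the `incumbent` passes one, so `ranking` lists the candidate first. The same cases can be rerun unchanged when Polaris ships, which is what makes the comparison independent of any vendor leaderboard.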

FAQ

What is Project Polaris?

Project Polaris is Microsoft's first in-house AI coding model, announced at Microsoft Build 2026 on June 2. It is a mixture-of-experts model with language-specialized sub-modules and chain-of-thought plus tree-of-thought reasoning at inference. It will replace GPT-4 Turbo as the default reasoning engine inside GitHub Copilot in August 2026.

When does Polaris ship to GitHub Copilot users?

General availability is August 2026 for all paid Copilot subscribers, with an optional three-month fallback to GPT-4 Turbo through November 2026. Microsoft did not announce a public preview window before GA.

Does Polaris replace Claude Code or OpenAI Codex?

No. Polaris is a model, not an agent. It replaces the GPT-4 Turbo model inside GitHub Copilot. Claude Code and OpenAI Codex are standalone agents with their own CLIs, workspaces, and (in Codex's case) Computer Use surfaces. The real comparison is GitHub Copilot with Polaris vs Claude Code vs Codex at the agent layer.

Is Polaris a frontier model?

Microsoft has not positioned Polaris as a generalist frontier model. It is a domain-specialized coding model. Microsoft AI's frontier work is happening in the MAI series (MAI-1-Reasoning, MAI-Image-2.5, MAI-Voice-2, MAI-Transcribe-1.5), which Microsoft has discussed separately.

Will Polaris support multi-file refactors at the level of Claude Code?

Microsoft says yes, framed around chain-of-thought and tree-of-thought reasoning over multi-file context inside Copilot Workspace. There are no published numbers on agentic benchmarks like SWE-Bench Verified yet, so independent evaluation is the next milestone to watch.

What is the Code Content Guarantee?

A new Microsoft commitment that Polaris was trained on permissibly-licensed code only and that customers using Polaris output through GitHub Copilot are indemnified by Microsoft against intellectual-property claims on the model's output. It is aimed at enterprises that paused or limited GitHub Copilot rollouts over training-data concerns.

Can I use Polaris with Totalum?

Totalum integrates with any AI coding agent through MCP and through the Totalum REST API. When GitHub Copilot Workspace ships Polaris in August, the same integration pattern applies: your Copilot agent can drive Totalum to build a full Next.js application end to end. You can start today with Claude Code, OpenAI Codex, Cursor, or the Totalum web UI, and swap the agent without touching the application target. See "Best MCP Servers in 2026" for the broader agent integration map.

Ready to ship what your agent writes?

Polaris will change which model writes the code inside GitHub Copilot in August. It will not change the part that matters most for shipping: turning that code into a real, production-grade app with auth, database, payments, and a custom domain. Totalum is built to be exactly that app builder for humans and for agents.

Start free at totalum.app, connect your AI coding agent of choice through MCP, and ship a real application in a weekend instead of a quarter. If you are a software agency or a SaaS team that wants to embed an AI app builder behind your own brand, we can show you the API, MCP, and whitelabel paths directly.

Sources: Microsoft Build 2026 keynote livestream recap (ChatForest), Microsoft Build 2026: Project Polaris Replacing GPT-4 in GitHub Copilot (AItoolsRecap), Microsoft Targets Claude Code with Project Polaris (AI Weekly), Microsoft Build 2026: Homegrown AI Models to Power GitHub Copilot (Windows News), Microsoft Build 2026 Windows Agent Framework, WSL 3, Azure Agent Mesh (AItoolsRecap).

Francesc

Writes for the Totalum blog about AI app building, no-code development, and product engineering.
