What Just Shipped

Anthropic released Claude Sonnet 4.6 today, February 17, 2026. It is now the default model for Free and Pro users on claude.ai, replacing Sonnet 4.5 in the lineup. The model is also available immediately through the Anthropic API, Amazon Bedrock, and Google Vertex AI.

The positioning is straightforward: Opus-level performance at Sonnet pricing. That is not marketing hyperbole. The benchmarks back it up in ways that matter for working developers, and the pricing stays exactly where Sonnet 4.5 was. If you are building with Claude Code — or controlling it from your phone with CodeSail — this is a significant upgrade that costs you nothing extra.

Sonnet 4.6 is not a minor point release. It doubles the context window to 1 million tokens, introduces adaptive thinking that decides when to engage extended reasoning on its own, and adds context compaction for long-running sessions. These are the kinds of changes that fundamentally alter how you work with an AI coding assistant.

The Numbers That Matter

Benchmarks only tell part of the story, but when a mid-tier model starts matching the flagship, the numbers are worth paying attention to. Here is where Sonnet 4.6 lands on the evaluations that matter most for developers:

Benchmark Sonnet 4.6 Opus 4.6 GPT-5.2 Sonnet 4.5
SWE-bench 79.6% 80.8% 80.0% ~70%
OSWorld 72.5% ~73% ~58%
ARC-AGI-2 60.4% 68.8% ~42%
User preference vs Sonnet 4.5 70% preferred Baseline
User preference vs Opus 4.5 59% preferred

The SWE-bench score of 79.6% is the headline number. It nearly matches Opus 4.6 at 80.8% and sits just behind GPT-5.2's 80.0%. For a model at one-fifth the cost of Opus, this is a remarkable result. SWE-bench tests real-world software engineering tasks (resolving GitHub issues, fixing bugs, implementing features), so the gain should carry over directly to Claude Code performance.

OSWorld measures computer use and agent capabilities, and at 72.5% Sonnet 4.6 is essentially tied with Opus 4.6 (~73%). On ARC-AGI-2, Sonnet 4.6 scores 60.4% against Opus 4.6's 68.8%. That is the clearest remaining gap between the two models, and it reflects the harder abstract reasoning tasks where the flagship still has an edge.

The user preference numbers are equally telling. In head-to-head comparisons, users preferred Sonnet 4.6 over Sonnet 4.5 70% of the time, and over the previous flagship Opus 4.5 59% of the time. Anthropic also notes that Sonnet 4.6 is "significantly less prone to overengineering" — a common complaint with earlier models that would sometimes rewrite entire files when a two-line fix was all that was needed.

What's Actually New

Beyond the raw benchmark improvements, Sonnet 4.6 introduces several architectural changes that directly impact daily development workflows.

1 Million Token Context Window

The context window has been doubled from 500K to 1 million tokens. To put that in perspective, 1 million tokens is roughly 750,000 words, or the equivalent of loading an entire medium-sized codebase into a single conversation. This means Claude Code can now hold significantly more of your project in memory at once, leading to better cross-file understanding, more accurate refactoring across multiple modules, and fewer instances where the model "forgets" context from earlier in the conversation.

For developers working on large monorepos or complex full-stack applications, this is a meaningful improvement. The model can now reason about your backend API, your frontend components, your database schema, and your test suite simultaneously without running out of room.
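To get a feel for whether your own project fits, a back-of-the-envelope estimate is enough. The sketch below uses the ratio quoted above (1 million tokens to roughly 750,000 words, so about 0.75 words per token). The function names and the `reserve_tokens` default are hypothetical, and source code tokenizes differently from prose, so treat the result as a rough guide rather than a real tokenizer count.

```python
import math

# Ratio taken from the article: 1,000,000 tokens ~ 750,000 words.
WORDS_PER_TOKEN = 0.75
CONTEXT_WINDOW_TOKENS = 1_000_000


def estimate_tokens(text: str) -> int:
    """Rough token estimate from a whitespace word count (not a real tokenizer)."""
    words = len(text.split())
    return math.ceil(words / WORDS_PER_TOKEN)


def fits_in_context(files: dict[str, str],
                    reserve_tokens: int = 50_000) -> tuple[bool, int]:
    """Return (fits, estimated_total_tokens).

    reserve_tokens is an assumed allowance for the prompt and the reply.
    """
    total = sum(estimate_tokens(body) for body in files.values())
    return total + reserve_tokens <= CONTEXT_WINDOW_TOKENS, total
```

Pass it a mapping of file paths to file contents. If the estimate lands anywhere near the limit, assume the project does not fit comfortably.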

Adaptive Thinking

Sonnet 4.6 introduces adaptive thinking, which means the model now decides on its own when to engage extended reasoning. Rather than requiring you to explicitly request deep thinking (or having it default to quick responses), the model evaluates each prompt and allocates more computation to harder problems. A simple "fix this typo" gets a fast response. A "refactor this authentication system to support OAuth2 and SAML" gets the full chain-of-thought treatment.

This is a subtle but important change. It means you no longer need to think about whether to toggle extended thinking on or off. The model does the right thing automatically, which makes Claude Code sessions feel more fluid and responsive.

Context Compaction

Long Claude Code sessions have always had a ceiling: eventually the conversation fills the context window and the model starts losing track of earlier decisions. Sonnet 4.6 addresses this with context compaction, an automatic summarization system that condenses older parts of the conversation while preserving the essential decisions and context.

In practice, this means your Claude Code sessions can run significantly longer without degradation. The model maintains awareness of what it has already done, which files it has modified, and what architectural decisions were made earlier — even in sessions that span hundreds of messages.
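The general idea behind compaction can be sketched in a few lines. This is not Anthropic's implementation, which is not public. It is a minimal illustration of the pattern, assuming a crude four-characters-per-token estimate and a caller-supplied `summarize` function; every name here is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Message:
    role: str  # "user", "assistant", or "summary"
    text: str


def rough_tokens(text: str) -> int:
    # Crude stand-in for a tokenizer: about 4 characters per token.
    return max(1, len(text) // 4)


def compact(history: list[Message],
            budget_tokens: int,
            keep_recent: int,
            summarize: Callable[[list[Message]], str]) -> list[Message]:
    """Fold older messages into one summary once the history exceeds the budget."""
    total = sum(rough_tokens(m.text) for m in history)
    split = len(history) - keep_recent
    if total <= budget_tokens or split <= 0:
        return history  # under budget, or nothing old enough to fold
    older, recent = history[:split], history[split:]
    # The most recent messages are kept verbatim; everything older
    # is replaced by a single summary message.
    return [Message("summary", summarize(older))] + recent
```

In a real system the `summarize` step would itself be a model call that preserves decisions, modified files, and open tasks. Here it is left as a parameter so the control flow stays visible.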

Improved Computer Use and Agent Planning

Computer use capabilities have been upgraded, with better accuracy in navigating UIs, filling forms, and interacting with desktop applications. Agent planning is also improved, meaning multi-step tasks are broken down and executed more reliably. The model is better at deciding when to use which tools and in what order.

More Polished Design Outputs

Anthropic specifically calls out improvements in design-related outputs. When you ask Claude Code to generate UI components, CSS layouts, or design tokens, the results are more polished and production-ready. This pairs well with the newly announced Claude Code Figma integration that also shipped today.

Pricing: The Real Story

Sonnet 4.6 maintains the same pricing as Sonnet 4.5:

  • Input: $3 per million tokens
  • Output: $15 per million tokens
  • Extended context (200K–1M): $6 / $22.50 per million tokens (input/output)

For comparison, Opus 4.6 costs $15 / $75 per million tokens, five times as much on both input and output. Sonnet 4.6 therefore delivers about 98.5% of Opus 4.6's SWE-bench score (79.6 vs 80.8) at 20% of the cost.
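If you want to plug in your own usage, the arithmetic is simple enough to script. The sketch below uses only the prices quoted in this article. One assumption is labeled in the code: a request whose input exceeds 200K tokens is billed entirely at the extended-context rate. Check the current pricing page before relying on that. The helper names are hypothetical.

```python
from dataclasses import dataclass

MTOK = 1_000_000


@dataclass(frozen=True)
class Rates:
    input_per_mtok: float   # USD per million input tokens
    output_per_mtok: float  # USD per million output tokens


# Prices as quoted in this article.
SONNET_46 = Rates(3.00, 15.00)
SONNET_46_EXTENDED = Rates(6.00, 22.50)
OPUS_46 = Rates(15.00, 75.00)
EXTENDED_THRESHOLD = 200_000  # input tokens


def cost(rates: Rates, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request at the given rates."""
    return (input_tokens * rates.input_per_mtok
            + output_tokens * rates.output_per_mtok) / MTOK


def sonnet_cost(input_tokens: int, output_tokens: int) -> float:
    # Assumption: once the input exceeds 200K tokens, the whole request
    # is billed at the extended-context rate.
    extended = input_tokens > EXTENDED_THRESHOLD
    rates = SONNET_46_EXTENDED if extended else SONNET_46
    return cost(rates, input_tokens, output_tokens)


def swe_bench_ratio(sonnet_score: float = 79.6, opus_score: float = 80.8) -> float:
    """Sonnet's SWE-bench score as a fraction of Opus's."""
    return sonnet_score / opus_score
```

For a request with 100,000 input tokens and 10,000 output tokens, `sonnet_cost` comes to $0.45 against $2.25 at the Opus rates, which is the five-to-one ratio described above.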

The value proposition is stark. Unless you are doing work that specifically requires the last few percentage points of abstract reasoning (ARC-AGI-2 type tasks) or multi-agent orchestration where Opus 4.6's Agent Teams capability is needed, Sonnet 4.6 is the rational default choice. You get near-flagship performance at mid-tier pricing, with the added benefit of a larger context window and adaptive thinking built in.

For Claude Code users specifically, this translates to longer sessions and more accurate code generation at the same API price as Sonnet 4.5, a combination that rarely shows up in the same release.

What This Means for Claude Code Users

Claude Code benefits directly and immediately from every improvement in Sonnet 4.6. The model is the engine that powers your coding sessions, so a better model means better sessions across the board.

Better Coding Performance

The jump from ~70% to 79.6% on SWE-bench is not just a number. It means fewer failed attempts at implementing features, more accurate bug fixes on the first try, and better understanding of complex codebases. If you have ever had Claude Code generate a solution that was almost right but required manual cleanup, you will notice the difference. Sonnet 4.6 is measurably better at getting things right the first time.

Larger Codebase Understanding

Doubling the context window to 1 million tokens means Claude Code can ingest more of your project at once. When you ask it to refactor a module, it can now hold the entire dependency chain in memory. When you ask it to add a feature, it can reference your existing patterns across more files. This is especially impactful for full-stack developers working across frontend, backend, and infrastructure code.

Longer Sessions Without Losing Context

Context compaction is perhaps the most practical improvement for daily Claude Code usage. Sessions that previously hit the context ceiling after 50-100 messages can now continue productively for much longer. The model automatically summarizes older context, so you do not lose awareness of earlier decisions even in marathon coding sessions.

Automatic for CodeSail Users

If you use CodeSail to control Claude Code from your iPhone, you get all of these improvements automatically. There is nothing to update or configure. The next time you open a Claude Code session through CodeSail, it will be powered by Sonnet 4.6 (or whichever model you have selected). Your session monitoring, permission approvals, file browsing, and diff reviews all work exactly the same — just with a smarter model underneath.

The best upgrade is the one that requires zero effort on your part. Sonnet 4.6 makes your existing Claude Code workflow better without changing any of your habits.

Sonnet 4.6 vs Opus 4.6: Which Should You Use?

With Sonnet 4.6 closing so much of the gap, the question of when to use Opus becomes more nuanced. Here is a practical decision framework:

Use Sonnet 4.6 When:

  • Everyday coding and debugging. For writing features, fixing bugs, writing tests, and general development work, Sonnet 4.6 is now good enough that you will rarely notice a difference from Opus.
  • Quick iterations. When you are going back and forth rapidly with Claude Code, the lower cost of Sonnet means you can iterate more freely without watching your API bill.
  • Large codebase navigation. With its 1M token context window and context compaction, Sonnet 4.6 can handle extended sessions exploring and modifying large projects.
  • Cost-sensitive workflows. At one-fifth the price of Opus, Sonnet 4.6 is the clear choice when you are optimizing for value. Most teams should default to Sonnet and only escalate to Opus when needed.

Use Opus 4.6 When:

  • Complex multi-agent workflows. Opus 4.6 supports Agent Teams, which allows multiple Claude instances to collaborate on a task. If your workflow involves orchestrating multiple agents, Opus is required.
  • Research-heavy tasks. The 68.8% ARC-AGI-2 score versus Sonnet's 60.4% shows Opus still has a meaningful edge in abstract reasoning and novel problem-solving. For tasks that require genuine creativity or tackling problems the model has never seen before, Opus is worth the premium.
  • Mission-critical code. When the cost of an error is high — security-sensitive code, financial calculations, infrastructure automation — the extra few percentage points of accuracy can justify the price difference.
  • Maximum benchmark performance. If you need the absolute best results on SWE-bench type tasks (80.8% vs 79.6%), Opus remains the frontier model.
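If your team wants to write this framework down as a default policy, it reduces to a few checks. The sketch below is one hypothetical encoding of the two lists above. The `Task` fields are illustrative, and the returned strings are display names rather than API model identifiers.

```python
from dataclasses import dataclass


@dataclass
class Task:
    needs_agent_teams: bool = False         # multi-agent orchestration
    abstract_reasoning_heavy: bool = False  # research-style, novel problems
    mission_critical: bool = False          # security, finance, infrastructure


def pick_model(task: Task) -> str:
    """Default to Sonnet 4.6 and escalate to Opus 4.6 only when a rule fires."""
    if task.needs_agent_teams:
        return "Opus 4.6"  # the article describes Agent Teams as Opus-only
    if task.abstract_reasoning_heavy or task.mission_critical:
        return "Opus 4.6"  # pay the premium where errors or novelty cost more
    return "Sonnet 4.6"
```

The important part is the last line: Sonnet is the default, and each escalation to Opus has to be justified by a specific property of the task.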

In CodeSail, you benefit either way. Whether your Claude Code session is powered by Sonnet 4.6 or Opus 4.6, your mobile workflow — session monitoring, file browsing, diff reviews, permission management — remains identical. The model choice is a server-side decision that does not affect how you interact with the app.

The Bigger Picture

Sonnet 4.6 does not exist in isolation. It is part of a rapid sequence of releases from Anthropic that has reshaped the Claude ecosystem in a matter of weeks:

  • January 26: Claude Apps and Connectors launched, allowing Claude to integrate with external tools and services.
  • February 5: Opus 4.6 released as the new frontier model, setting new records on SWE-bench and ARC-AGI-2.
  • February 17: Sonnet 4.6 released alongside the Claude Code Figma integration, bringing Opus-level coding performance to the mid-tier price point.

The pattern is clear: Anthropic is shipping fast, and the gap between "frontier" and "affordable" models is shrinking with each release. Sonnet 4.5 was already good. Sonnet 4.6 is now competitive with models that cost five times more. If this trajectory continues, the practical distinction between model tiers may effectively disappear for most development tasks within the next few releases.

For developers, this is unambiguously good news. Better models at lower prices mean more people can build with AI-assisted coding, longer sessions become affordable, and the barrier to entry drops. Whether you are a solo developer running Claude Code locally or a team deploying it across dozens of engineers, Sonnet 4.6 makes the economics work better.

If you have not tried the new Figma integration yet, read our breakdown of how Claude Code can now design directly in Figma. And if you are new to managing Claude Code sessions from your phone, our getting started guide will have you up and running in under five minutes.
