Claude Opus 4.5, released in November 2025, is Anthropic's most advanced AI model, excelling at coding, agentic workflows, computer use, and complex reasoning.
Key Features
Claude Opus 4.5 introduces an "effort" parameter for balancing response thoroughness against token use. It also adds enhanced computer use, including a zoom action for inspecting fine screen detail, and automatically preserves thinking blocks across multi-turn conversations. Built-in context management summarizes earlier interactions so that long-running tasks do not hit context limits.
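As a sketch of how the effort parameter might be wired into a request: the endpoint and the core fields (`model`, `max_tokens`, `messages`) are the standard Anthropic Messages API, but the exact name and placement of the effort field, and the model ID string, are assumptions here; check the current API reference before relying on them.

```python
import json

# Standard Anthropic Messages API endpoint.
API_URL = "https://api.anthropic.com/v1/messages"

def build_request(prompt: str, effort: str = "medium") -> dict:
    """Build a request payload; effort trades thoroughness for token use."""
    return {
        "model": "claude-opus-4-5",   # model ID string is an assumption
        "max_tokens": 1024,
        "effort": effort,             # assumed top-level field; verify in the API docs
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_request("Summarize this diff.", effort="high")
print(json.dumps(payload, indent=2))
```

Keeping effort as a per-request knob (rather than a model-wide setting) lets an application spend more tokens only on the calls that need deep reasoning.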
Performance Strengths
The model leads coding benchmarks, scoring highest on SWE-bench Multilingual across most languages and outperforming human candidates on a two-hour technical exam. It can run autonomous agents that self-improve, orchestrate multiple tools, and stay on task across 30-minute sessions. Opus 4.5 also shows robust safety behavior, resisting prompt injections better than its predecessors.
Pricing and Availability
Priced at $5 per million input tokens and $25 per million output tokens, Opus 4.5 is roughly 67% cheaper than prior Opus models. It is available via the API and through partner platforms such as Notion, putting flagship capabilities within reach of more enterprises and developers.
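The pricing above can be checked with a quick cost calculation. The $5/$25 rates are from the text; the prior-generation Opus rates of $15/$75 are an assumption consistent with the quoted 67% reduction.

```python
# USD per million tokens.
PRICE_IN, PRICE_OUT = 5.00, 25.00     # Opus 4.5 (stated in the text)
OLD_IN, OLD_OUT = 15.00, 75.00        # prior Opus (assumed, implied by the 67% cut)

def cost(tokens_in: int, tokens_out: int, p_in: float, p_out: float) -> float:
    """Total USD cost for a request mix at the given per-million-token rates."""
    return tokens_in / 1e6 * p_in + tokens_out / 1e6 * p_out

# Example workload: 2M input tokens, 500K output tokens.
new = cost(2_000_000, 500_000, PRICE_IN, PRICE_OUT)
old = cost(2_000_000, 500_000, OLD_IN, OLD_OUT)
print(f"new=${new:.2f} old=${old:.2f} savings={1 - new / old:.0%}")
# -> new=$22.50 old=$67.50 savings=67%
```

Because both rates dropped by the same factor, the savings percentage is the same for any input/output mix.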
Introducing Claude Opus 4.5: the best model in the world for coding, agents, and computer use.
Opus 4.5 is a step forward in what AI systems can do, and a preview of larger changes to how work gets done.
📊 Model Comparison Table
| Benchmark | Opus 4.5 | Sonnet 4.5 | Opus 4.1 | Gemini 3 Pro | GPT-5.1 |
|---|---|---|---|---|---|
| Agentic coding (SWE-bench Verified) | 80.9% | 77.2% | 74.5% | 76.2% | 76.3% |
| Agentic terminal coding (Terminal-Bench 2.0) | 59.3% | 50.0% | 46.5% | 54.2% | 47.6%\* |
| Agentic tool use (τ²-bench, Retail) | 88.9% | 86.2% | 86.8% | 85.3% | — |
| Agentic tool use (τ²-bench, Telecom) | 98.2% | 98.0% | 71.5% | 98.0% | — |
| Scaled tool use (MCP Atlas) | 62.3% | 43.8% | 40.9% | — | — |
| Computer use (OSWorld) | 66.3% | 61.4% | 44.4% | — | — |
| Novel problem solving (ARC-AGI-2 Verified) | 37.6% | 13.6% | — | 31.1% | 17.6% |
| Graduate-level reasoning (GPQA Diamond) | 87.0% | 83.4% | 81.0% | 91.9% | 88.1% |
| Visual reasoning (MMMU, validation) | 80.7% | 77.8% | 77.1% | — | 85.4% |
| Multilingual Q&A (MMMLU) | 90.8% | 89.1% | 89.5% | 91.8% | 91.0% |

\*The source image shows two values for GPT-5.1 on Terminal-Bench 2.0 (47.6% and 58.1%).
