As 2025 comes to a close, the AI world is doing the opposite of slowing down. In just a few weeks, we’ve seen three major model launches from different labs:
- Mistral 3
- DeepSeek 3.2
- Claude Opus 4.5
All three are strong. None are obviously “bad.” That alone is a big shift from just a couple of years ago, when only a handful of labs could credibly claim frontier-level models.
But the interesting story isn’t just that everything is good now.
The real story is this:
AI is entering a phase where differentiation comes from specialization and control over platforms, not just raw model quality.
We can see this in three places:
- How Mistral, DeepSeek, and Anthropic are carving out different strengths.
- How “scaling laws” are quietly becoming “experimentation laws.”
- How Amazon’s move against ChatGPT’s shopping agent signals an emerging platform war around agents.
Let’s unpack each.
1. Mistral vs. DeepSeek vs. Claude: When Everyone Is Good, What Makes You Different?
On paper, the new Mistral and DeepSeek releases look like they’re playing the same game: open models, strong benchmarks, competitive quality.
Under the hood, they’re leaning into very different philosophies.
DeepSeek 3.2: Reasoning and Sparse Attention for Agents
DeepSeek has become synonymous with novel attention mechanisms and high-efficiency large models. The 3.2 release extends that trend with:
- Sparse attention techniques that help big models run more efficiently (a toy sketch of the idea follows this list).
- A strong emphasis on reasoning-first performance, especially around:
  - Tool use
  - Multi-step “agentic” workflows
  - Math and code-heavy tasks
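DeepSeek’s production design is far more involved than this, but a toy top-k version in NumPy shows the core idea: each query attends to only a handful of keys, so almost all attention weights are exactly zero.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def topk_sparse_attention(Q, K, V, k=16):
    """Each query attends only to its k highest-scoring keys,
    so the attention weights (and, in real kernels, the compute)
    are sparse."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])            # (n_q, n_k)
    # Per-row threshold = k-th largest score; mask the rest to -inf.
    thresh = np.partition(scores, -k, axis=-1)[:, -k:].min(axis=-1, keepdims=True)
    masked = np.where(scores >= thresh, scores, -np.inf)
    return softmax(masked) @ V                         # off-top-k weights are 0

# Note: this toy still computes the full score matrix; production sparse
# attention avoids that with indexing and custom kernels.
n, d = 512, 64
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, n, d))
out = topk_sparse_attention(Q, K, V)
print(out.shape)  # (512, 64)
```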
If you squint, DeepSeek is trying to be “the reasoning lab”:
If your workload is complex multi-step thinking with tools, we want to be your default.
Mistral 3: Simple Transformer, Strong Multimodality, Open Weights
Mistral takes almost the opposite architectural route.
- No flashy linear attention.
- No wild new topology.
- Just a dense, relatively “plain” transformer, tuned very well.
The innovation is in how they’ve packaged the lineup:
- Multimodal by default across the range, including small models.
- You can run something like Mistral 3B locally and still get solid vision + text capabilities (sketched below).
- That makes small, on-device, multimodal workflows actually practical.
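To make that concrete, a local vision + text call could look like the sketch below. Everything here is an assumption: the model id is a placeholder, and the image-text-to-text pipeline requires a recent transformers release plus a checkpoint published in that format.

```python
# Hypothetical sketch: the model id is a placeholder, not a published
# checkpoint name. Assumes a recent `transformers` release with the
# image-text-to-text pipeline and a checkpoint shipped in that format.
from transformers import pipeline

pipe = pipeline(
    "image-text-to-text",
    model="mistralai/<small-multimodal-checkpoint>",  # placeholder
    device_map="auto",  # CPU or a single consumer GPU
)

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "invoice.png"},
        {"type": "text", "text": "What is the total amount due?"},
    ],
}]
print(pipe(text=messages, max_new_tokens=64))
```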
The message from Mistral is:
You don’t need a giant proprietary model to do serious multimodal work. You can self-host it, and it’s Apache 2.0 again, not a bespoke “research-only” license.
Claude Opus 4.5: From Assistant to Digital Worker
Anthropic’s Claude Opus 4.5 sits more on the closed, frontier side of the spectrum. Its differentiation isn’t just capabilities, but how it behaves as a collaborator.
A few emerging themes:
- Strong focus on software engineering, deep code understanding, and long-context reasoning.
- A growing sense of “personality continuity”:
  - Users report the model doing natural “callbacks” to earlier parts of the conversation.
  - It feels less like a stateless chat and more like an ongoing working relationship.
- Framed by Anthropic as more of a “digital worker” than a simple assistant (a minimal code sketch follows this list):
  - Read the 200-page spec.
  - Propose changes.
  - Keep state across a long chain of tasks.
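As a concrete sketch of that workflow, here is a minimal “digital worker” loop using Anthropic’s Python SDK. The model id is illustrative, spec.md stands in for the 200-page spec, and the point is simply that re-sending the growing conversation is what carries state from task to task.

```python
# Minimal "digital worker" sketch with the Anthropic SDK.
# Assumptions: ANTHROPIC_API_KEY is set, the model id is illustrative,
# and spec.md stands in for the "200-page spec".
import anthropic

client = anthropic.Anthropic()
spec = open("spec.md").read()

tasks = [
    "Summarize this spec's key constraints:\n\n" + spec,
    "Propose concrete changes to section 4.",
    "Draft a migration plan based on your proposed changes.",
]

history = []
for task in tasks:
    history.append({"role": "user", "content": task})
    reply = client.messages.create(
        model="claude-opus-4-5",  # illustrative model id
        max_tokens=2048,
        messages=history,
    )
    # Re-sending the growing history is what carries state across tasks.
    history.append({"role": "assistant", "content": reply.content[0].text})
    print(f"--- {task[:60]}\n{reply.content[0].text[:300]}")
```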
If DeepSeek is leaning into reasoning, and Mistral into open multimodal foundations, Claude is leaning into:
“Give us your workflows and we’ll embed a digital engineer into them.”
The Big Shift: Differentiation by Domain, Not Just Quality
A few years ago, the question was: “Which model is the best overall?”
Now the better question is:
“Best for what?”
- Best for local multimodal tinkering? Mistral is making a strong case.
- Best for tool-heavy reasoning and math/code? DeepSeek is aiming at that.
- Best for enterprise-grade digital teammates? Claude wants that slot.
This is how the “no moat” moment is resolving:
When everyone can make a good general model, you specialize by domain and workflow, not just by raw benchmark scores.
2. Are Scaling Laws Still a Thing? Or Are We Just Scaling Experimentation?
A recent blog post from VC Tomasz Tunguz reignited debate about scaling laws. His claim, paraphrased: Gemini 3 shows that the old scaling laws are still working; with enough compute, we still get big capability jumps.
There’s probably some truth there, but the nuance matters.
Scaling Laws, the Myth Version
The “myth” version of scaling laws goes something like:
“Make the model bigger. Feed it more data. Profit.”
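To be fair, the term comes from real empirical work: Kaplan et al. (2020) and the Chinchilla paper (Hoffmann et al., 2022) fit held-out loss as a power law in parameter count N and training tokens D, roughly

$$L(N, D) \approx E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}$$

where E is the irreducible loss and α, β are small positive exponents. The myth isn’t the curve itself; it’s treating N and D as the only levers.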
If that were the full story, only the labs with the most GPUs (or TPUs) would ever meaningfully advance the frontier. Google, with deep TPU integration, is the clearest example: it has “the most computers that ever computed” and the tightest hardware–software stack.
But that’s not quite what seems to be happening.
What’s Really Scaling: Our Ability to Experiment
With Gemini 3, Google didn’t massively increase parameters relative to Gemini 1.5. The improvements likely came from:
- Better training methods
- Smarter data curation and filtering
- Different mixtures of synthetic vs. human data
- Improved training schedules and hyperparameters
In other words, the action is shifting from:
“Make it bigger” → “Train it smarter.”
The catch?
Training smarter still requires a lot of room to experiment. When:
- One full-scale training run costs millions of dollars, and
- takes weeks or months,
…you can’t explore the space of training strategies very thoroughly. There’s a huge hyperparameter and design space we’ve barely touched, simply because it’s too expensive to try things.
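Some toy arithmetic makes the constraint concrete (all dollar figures below are made up for illustration):

```python
# Illustrative, made-up numbers: how many training-strategy
# experiments can one budget buy at different run scales?
compute_budget = 500e6  # say, $500M of training compute per year

for scale, cost_per_run in [
    ("full frontier run", 100e6),   # one production-scale training run
    ("mid-size proxy run",  5e6),   # scaled-down run to test an idea
    ("small proxy run",   100e3),   # quick ablation
]:
    runs = int(compute_budget / cost_per_run)
    print(f"{scale:>18}: ~{runs:,} runs")

# full frontier run: ~5 runs
# mid-size proxy run: ~100 runs
#   small proxy run: ~5,000 runs
```

The search over data mixes, curricula, and optimizers mostly happens at proxy scale, and more total compute means more (and bigger) proxies. That’s the “experimentation law” reading.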
That leads to a more realistic interpretation:
Scaling laws are quietly turning into experimentation laws.
The more compute you have, the more experiments you can run on:
- architecture
- training data
- curricula
- optimization tricks
…and that’s what gives you better models.
From this angle, Google’s big advantage isn’t just size—it’s iteration speed at massive scale. As hardware gets faster, what really scales is how quickly we can search for better training strategies.
3. Agents vs Platforms: Amazon, ChatGPT, and the New Walled Gardens
While models are getting better, a different battle is playing out at the application layer: agents.
OpenAI’s Shopping Research agent is a clear example of the agent vision:
“Tell the agent what you need. It goes out into the world, compares products, and comes back with recommendations.”
If you think “online shopping,” you think Amazon. But Amazon recently took a decisive step:
It began blocking ChatGPT’s shopping agent from accessing product detail pages, review data, and deals.
Why Would Amazon Block It?
You don’t need a conspiracy theory to answer this. A few obvious reasons:
- Control over the funnel: Amazon doesn’t want a third-party agent sitting between users and its marketplace.
- Protection of ad and search economics: product discovery is where Amazon makes a lot of money.
- They’re building their own AI layers: with things like Alexa+ and Rufus, Amazon wants its own assistants to be the way you shop.
In effect, Amazon is saying:
“If you want to shop here, you’ll use our agent, not someone else’s.”
The Deeper Problem: Agents Need an Open Internet, but the Internet Is Not Open
Large-language-model agents rely on a simple assumption:
“They can go out and interact with whatever site or platform is needed on your behalf.”
But the reality is:
- Cloudflare has started blocking AI agents by default.
- Amazon is blocking shopping agents.
- Many platforms are exploring paywalls or tollbooths for automated access (a quick robots.txt check, sketched below, shows the crudest layer of this).
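That crudest layer is visible from the outside with Python’s standard library. The user-agent strings and product URL below are illustrative, and real enforcement also happens at the network layer (WAF rules, bot fingerprinting), where robots.txt tells you nothing.

```python
# Check whether a site's robots.txt admits a given crawler/agent
# user-agent. User-agent strings and the product URL are illustrative;
# platforms also block at the network layer, which this can't see.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser("https://www.amazon.com/robots.txt")
rp.read()

for agent in ["GPTBot", "Mozilla/5.0"]:
    allowed = rp.can_fetch(agent, "https://www.amazon.com/dp/EXAMPLE")
    print(f"{agent}: {'allowed' if allowed else 'blocked'} by robots.txt")
```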
So before we hit technical limits on what agents can do, we’re hitting business limits on where they’re allowed to go.
It raises an uncomfortable question:
Can we really have a “universal agent” if every major platform wants to be its own closed ecosystem?
Likely Outcome: Agents Become the New Apps
The original dream:
- One personal agent
- Talks to every service
- Does everything for you across the web
The likely reality:
- You’ll have a personal meta-agent, but it will:
  - Call Amazon’s agent for shopping
  - Call your bank’s agent for finance
  - Call your airline’s agent for travel
Behind the scenes, this will look less like a single unified agent and more like:
“A multi-agent OS for your life, glued together by your personal orchestrator.”
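Concretely, that glue layer might look like the toy router below. Every agent, keyword, and routing rule here is hypothetical; a real orchestrator would use an LLM for intent routing and each platform’s own agent protocol.

```python
# Hypothetical sketch of a personal meta-agent that routes requests
# to platform-owned agents. All names and the routing rule are toys.
from typing import Callable

def amazon_agent(request: str) -> str:
    return f"[Amazon agent] shopping results for: {request}"

def bank_agent(request: str) -> str:
    return f"[Bank agent] finance answer for: {request}"

def airline_agent(request: str) -> str:
    return f"[Airline agent] travel options for: {request}"

ROUTES: dict[str, Callable[[str], str]] = {
    "shopping": amazon_agent,
    "finance": bank_agent,
    "travel": airline_agent,
}

def meta_agent(request: str) -> str:
    """Keyword routing stands in for an LLM-based intent classifier."""
    keywords = {
        "shopping": ("buy", "order", "price"),
        "finance": ("pay", "balance", "transfer"),
        "travel": ("flight", "hotel", "trip"),
    }
    for domain, words in keywords.items():
        if any(w in request.lower() for w in words):
            return ROUTES[domain](request)
    return "[Meta-agent] no platform agent matched; answering directly."

print(meta_agent("Find the best price on noise-cancelling headphones"))
print(meta_agent("Book a flight to Lisbon in March"))
```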
In other words, we may not be escaping the “app world” so much as rebuilding it with agents instead of apps.
The Big Picture: What Phase Are We Entering?
If you zoom out, these threads are connected:
- Models are converging on “good enough,” so labs specialize by domain and workflow.
- Scaling is shifting from “make it bigger” to “let us run more experiments on architectures, data, and training.”
- Agents are bumping into platform economics and control, not just technical feasibility.
Put together, it suggests we’re entering a new phase:
The Open Frontier Phase → the Specialization and Platform Phase.
- Labs will succeed by owning specific domains and developer workflows.
- The biggest performance jumps may come from training strategy innovation, not parameter count.
- Agent ecosystems will reflect platform power struggles as much as technical imagination.
The excitement isn’t going away. But the rules of the game are changing—from who can train the biggest model to who can:
- Specialize intelligently
- Experiment fast
- Control key platforms
- And still give users something that feels like a single, coherent AI experience.
That’s the next frontier.
