Showing posts with label Large Language Models. Show all posts

Tuesday, November 25, 2025

Model Alert... Meet Fara-7B: Your New AI Assistant for Effortless Computer Tasks!

See All on AI Model Releases


5 Key Takeaways

  • Fara-7B is a Computer Use Agent that can perform tasks on your computer rather than just responding to queries.
  • It operates by visually perceiving the computer screen and mimicking human interactions.
  • Fara-7B can automate everyday tasks like filling out forms, searching for information, and booking travel.
  • It has shown impressive performance and efficiency, often outperforming larger AI models.
  • Built-in safety measures ensure responsible use, including logging actions and requiring user consent for critical tasks.

Introducing Fara-7B: A New Era of Computer Use with AI

In the ever-evolving world of technology, Microsoft has recently unveiled an exciting new tool called Fara-7B. This innovative model is designed to enhance how we interact with computers, making everyday tasks easier and more efficient. But what exactly is Fara-7B, and how does it work? Let’s break it down in simple terms.

What is Fara-7B?

Fara-7B is a type of artificial intelligence (AI) known as a Computer Use Agent (CUA). Unlike traditional chatbots that simply respond to text-based queries, Fara-7B can actually perform tasks on your computer. Imagine asking your computer to book a flight or fill out an online form, and it does it for you—this is what Fara-7B aims to achieve.

What sets Fara-7B apart is its size and efficiency. With only 7 billion parameters (think of these as the building blocks of its intelligence), it’s much smaller than many other AI models. This compact size allows it to run directly on your device, which means it can work faster and keep your data more private since it doesn’t need to send information to the cloud.

How Does Fara-7B Work?

Fara-7B operates by visually perceiving what’s on your computer screen. It can scroll, type, and click just like a human would. Instead of relying on complex systems to understand what’s on the screen, it uses the same visual cues that we do. This means it can interact with websites and applications in a way that feels natural and intuitive.
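
For the technically curious, here is a rough sketch of that perceive-act cycle in plain Python: capture the screen, let the model propose one action, pause for consent at sensitive steps, execute, and log. Every function and name here is a hypothetical placeholder for illustration, not Microsoft's actual API.

```python
# A minimal sketch (not Microsoft's actual API) of the loop a Computer Use Agent
# like Fara-7B runs: look at the screen, pick one action, execute it, repeat.
# All helper names below are hypothetical placeholders.
from dataclasses import dataclass

@dataclass
class Action:
    kind: str          # "click", "type", "scroll", or "done"
    target: str = ""   # e.g., a screen coordinate or element description
    text: str = ""     # text to type, if any

def capture_screenshot() -> bytes:
    """Hypothetical: grab the current screen as an image."""
    return b""  # placeholder

def propose_action(task: str, screenshot: bytes, history: list[Action]) -> Action:
    """Hypothetical: the model looks at the screenshot and task and returns one action."""
    return Action(kind="done")  # placeholder

def is_critical(action: Action) -> bool:
    """Critical Points: steps involving personal data or purchases need user consent."""
    return action.kind == "type" and "password" in action.target.lower()

def run_agent(task: str, max_steps: int = 20) -> None:
    history: list[Action] = []
    for _ in range(max_steps):
        action = propose_action(task, capture_screenshot(), history)
        if action.kind == "done":
            break
        if is_critical(action) and input(f"Allow '{action.kind}' on {action.target}? [y/N] ") != "y":
            break  # the user stays in control at critical points
        # execute_action(action)  # hypothetical: perform the click/type/scroll
        history.append(action)   # every step is logged for later review

run_agent("Register me for the webinar on the open page")
```

In the real system, the stubs above are replaced by Fara-7B's vision-based action prediction and an executor that drives the mouse and keyboard.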

To train Fara-7B, Microsoft developed a unique method to create synthetic data—essentially, fake but realistic web tasks. This data helps the AI learn how to perform various tasks by mimicking real user interactions. For example, it can learn how to book tickets to a movie or compare prices across different online retailers.

Real-World Applications

Fara-7B is not just a theoretical concept; it’s designed for practical use. Users can experiment with it to automate everyday web tasks. Here are a few examples of what Fara-7B can do:

  1. Filling Out Forms: Need to register for an event? Fara-7B can help you fill out the necessary forms quickly and accurately.

  2. Searching for Information: Whether you’re looking for the latest news or specific data, Fara-7B can scour the web and summarize the information for you.

  3. Booking Travel: Planning a trip? Fara-7B can assist in finding flights, hotels, and even rental cars, making travel planning a breeze.

  4. Managing Accounts: It can help you keep track of your online accounts, reminding you of important dates or tasks.

Performance and Efficiency

In tests against other models, Fara-7B has shown impressive results. It performs well on various benchmarks, even outperforming larger models in some cases. This efficiency is crucial because it means users can get tasks done faster without needing a powerful computer.

One of the standout features of Fara-7B is its ability to complete tasks with fewer steps compared to other models. This not only saves time but also reduces the cost of running the AI, making it more accessible for everyday users.

Safety and Responsible Use

With great power comes great responsibility. Microsoft is aware of the potential risks associated with AI that can perform tasks on computers. Fara-7B has built-in safety measures to ensure responsible use. For instance, it logs all actions taken, allowing users to review what the AI has done. Additionally, it operates in a sandboxed environment, meaning users can monitor its actions and intervene if necessary.

Fara-7B is also designed to recognize “Critical Points” during tasks. These are moments when user consent is required, such as when personal data is involved. At these points, Fara-7B will pause and ask for user approval before proceeding, ensuring that users remain in control.

How to Get Started with Fara-7B

Fara-7B is available for users to experiment with through platforms like Microsoft Foundry and Hugging Face. It can be integrated into existing systems, allowing users to try it out in a safe and controlled environment. For those using Windows 11, there’s even a version optimized for Copilot+ PCs, making it easy to run directly on your device.
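
If you want to try the Hugging Face route, a loading sketch might look like the following. The repository ID and the exact model class are assumptions on our part; check the official model card for the supported loading code.

```python
# A minimal sketch of pulling the weights from Hugging Face. The repository ID
# and the right Auto class are assumptions; consult the official model card on
# Hugging Face or Microsoft Foundry for the exact loading instructions.
from transformers import AutoProcessor, AutoModelForCausalLM

MODEL_ID = "microsoft/Fara-7B"  # assumed repository name

processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    trust_remote_code=True,
    device_map="auto",   # place the 7B weights on whatever hardware is available
)

# From here, the model would be wrapped in an agent loop like the one sketched
# earlier, feeding it screenshots and reading back actions.
```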

Looking Ahead

Fara-7B is just the beginning of what’s possible with AI in computer use. Microsoft believes that even more advanced models can be developed in the future, potentially using techniques like Reinforcement Learning to improve performance further. The goal is to create AI that not only assists with tasks but also learns and adapts to user preferences over time.

In conclusion, Fara-7B represents a significant step forward in how we can use AI to enhance our daily computer interactions. By automating routine tasks and providing a more intuitive way to interact with technology, Fara-7B has the potential to change the way we work and live. As we continue to explore and refine this technology, the future looks bright for AI in our everyday lives.



Thursday, November 20, 2025

Model Alert... Gemini 3 Wasn’t a Model Launch — It Was Google Quietly Showing Us Its AGI Blueprint


See All Articles on AI


When Google dropped Gemini 3, the rollout didn’t feel like a model release at all. No neat benchmark charts, no safe corporate demo, no slow PR drip. Instead, the entire timeline flipped upside down within minutes. And as people started connecting the dots, a strange realization emerged:

This wasn’t a model launch.
This was a controlled reveal of Google’s AGI masterplan.

Of course, everyone said the usual things at first: It’s fast. It’s accurate. It’s creative.
Cute takes. Surface-level stuff.

Because the real story – the strategic story – was hiding in plain sight.


The Day the Leaderboards Broke

The moment Gemini 3 went live, screenshots hit every corner of the internet:
LM Arena, GPQA, ARC-AGI, DeepThink. Two scores looked like typos. The rest looked like someone turned off the difficulty settings.

But DeepThink was the real shock.

Most people saw the numbers, tweeted “wow,” and moved on.
The interesting part is how it got those numbers.

DeepThink doesn’t guess — it organizes.

Instead of a messy chain-of-thought dump, Gemini 3 internally builds a structured task tree.
It breaks problems into smaller nodes, aligns them, then answers.

It doesn’t feel like a chatbot.
It feels like a system.

So consistent that even Sam Altman publicly congratulated Google.
Even Elon Musk showed up — and these two don’t hand out compliments unless they feel pressure.

For both of them to react on day one?
That alone tells you Gemini 3 wasn’t just another frontier model.


The Real Earthquake: Google Put Gemini 3 Into Search Immediately

This is the part almost everyone underestimated.

For the first time ever, Google pushed a frontier-level model straight into Search on launch day.

Search — the product they protect above all else.
Search — the interface billions of people rely on daily.
Search — the crown jewel.

Putting a brand-new model into AI mode on day one was Google saying:

“This model is strong enough to run the backbone of the internet.”

That’s not a product update.
That’s a signal.

A loud one.


Gemini 3 Is Not a Model. It’s a Reasoning Engine.

At its core, Gemini 3 is built for structured reasoning. It doesn’t react to keywords — it tracks intent. It maps long chains of logic. Its answers feel cleaner, more grounded, more contextual.

Then comes the multimodal stack.

Most models “support” multimodality. Gemini 3 integrates it.

Text, images, video, diagrams — no separate modes.
One unified context graph.

Give it mixed data and it interprets it like pieces of a single world.

The 1M token window isn’t the headline anymore.

The stability is.

Gemini 3 can hold long documents, entire codebases, and multi-hour video reasoning without drift. And its video understanding jump is massive:

  • Tracks objects through fast motion

  • Maintains temporal consistency

  • Understands chaotic footage

  • Remembers earlier scenes when analyzing later ones

This matters for robotics, autonomous driving, sports analytics, surveillance — anywhere you need a model to understand rather than describe video.


Coding: Full-System Thinking, Not Snippet Generation

Gemini 3 can refactor complex codebases, plan agent-driven workflows, and coordinate steps across multiple files without hallucinating them.

But the real shift isn’t coding.

It’s what Google built around the model.


The Full-Stack Trap

For years, Google looked slow, bureaucratic, scattered.
But behind the scenes, they were aligning the machine:

  • DeepMind

  • Search

  • Android

  • Chrome

  • YouTube

  • Maps

  • Cloud

  • Ads

  • Devices

  • Silicon

All snapped together during Gemini 3’s release.

This is something OpenAI cannot replicate.
OpenAI lives inside partnerships.
Google lives inside an empire.

They own:

  • the model

  • the cloud

  • the OS

  • the browser

  • the devices

  • the data

  • the distribution pipeline

  • the search index

  • the apps

  • the ads

  • the user base

Gemini 3 is not just powerful —
it’s everywhere by default.

This is Google’s real advantage.
Not the model.
The ecosystem.


Antigravity: Google’s Quiet AGI Training Ground

People misunderstood Antigravity as another IDE or coding assistant.

Wrong.

Antigravity is Google building the first agent-first operating environment.

A place where Gemini can:

  • plan

  • execute

  • debug

  • switch tools

  • operate across windows

  • work through long tasks

  • learn software the same way humans do

This is how you train AGI behavior.

Real tasks.
Real environments.
Long-horizon planning.

Look at VendingBench 2 — the simulation where the model must run a virtual business for a full year. Inventory. Pricing. Demand forecasting. Hundreds of sequential decisions.

Gemini 3 posted the highest returns of any frontier model.

This is not a chatbot.
This is AGI internship.


A Distributed AGI Ecosystem, Hiding in Plain Sight

Gemini Agent in the consumer app.
Gemini 3 inside Search.
Antigravity for developers.
Android for device-level integration.
Chrome as the operating environment.
Docs, Gmail, Maps, Photos as seamless tool surfaces.

Piece by piece, Google is building the first planet-scale AGI platform.

Not one model in a chat box.
But a distributed agent network living across every Google product.

This is the Alpha Assist vision — a project almost no one in the West noticed, despite leaks coming from Chinese sources for years.

Gemini 3 is the first public glimpse of it.


So… Did Google Just Soft-Launch AGI?

This is why Altman reacted.
This is why Musk reacted.
This is why analysts shifted their tone overnight.

Not because Gemini 3 “beat GPT-5.1 on benchmarks.”

But because Google finally showed what happens when you stack a frontier model on top of the world’s largest software ecosystem and give it the keys.

Gemini 3 is powerful, yes.

But the ecosystem is the weapon.
And the integration is the strategy.
And the distribution is the kill shot.


The real question now is simple:

If Google actually pulls this off…
Are we about to start using a quiet version of AGI without even noticing?

Drop your thoughts below — this is where the real debate begins.

Tags: Artificial Intelligence,Technology,Video,Large Language Models,

Wednesday, November 19, 2025

Model Alert... The Unseen Ambush -- How Grok 4.1 Quietly Stole the AI Spotlight


See All on AI Model Releases


If you blinked, you might have missed it. This week was supposed to belong to Google and the highly anticipated launch of Gemini 3. The tech world was poised, calendars marked, ready for another giant to make its move. But in a classic plot twist, xAI slipped in through the side door.

Overnight, Grok 4.1 rolled out—not with a thunderous press conference, but with a quiet update across grok.com, the X platform, and mobile apps. The moment users refreshed their model picker, two new options appeared: Grok 4.1 and Grok 4.1 "Thinking." The AI community, expecting one headline, was instantly consumed by another.

More Than Just Hype: The Numbers Behind the Update

Elon Musk promised "significant improvement in speed and quality," a claim we’ve become accustomed to hearing. This time, however, the data doesn't just support the claim—it shouts it. Instead of chasing raw computing power, xAI focused on the core challenges that plague large language models: speed, accuracy, and natural conversation.

The most staggering improvements lie in two key areas:

  • Hallucination Rate: Dropped from 12.09% to 4.22%—an almost threefold reduction.

  • Factual Errors: Fell from 9.89% to 2.97%.

For anyone familiar with AI, these figures are monumental. Reducing a model's tendency to "make things up" is a deeply complex problem tied to its fundamental architecture. A leap of this magnitude suggests a structural breakthrough, not a superficial tweak.

The Secret Sauce: A Model That Supervises Itself

So, how did they do it? According to xAI, the upgrade stems from a sophisticated reinforcement learning infrastructure powered by a new reward model system. In simple terms, Grok 4.1 uses a "cutting-edge inference model" to act as its own judge and jury.

This approach of models training models is a significant step the industry has been predicting. It allows for more aggressive self-evaluation, leading to better style control, tone consistency, and overall coherence. The results speak for themselves: in blind tests, evaluators preferred Grok 4.1 in 64.78% of comparisons—a rare and substantial jump.
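
xAI hasn’t published the details of this pipeline, but the general “model as judge” pattern it describes can be sketched in a few lines: a policy model proposes several candidate answers, a reward model scores them, and the preferred answer wins (and can later feed a reinforcement-learning update). Treat this as a generic illustration, not Grok’s actual training code.

```python
# A generic sketch of the "model as its own judge" pattern -- not xAI's pipeline.
from typing import Callable

def best_of_n(prompt: str,
              generate: Callable[[str], str],
              score: Callable[[str, str], float],
              n: int = 4) -> str:
    """Sample n candidate answers and keep the one the judge model scores highest."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda answer: score(prompt, answer))

# Toy stand-ins so the sketch runs; a real system would call the policy and
# reward models here.
def toy_generate(prompt: str) -> str:
    return prompt.upper()        # stand-in for the policy model

def toy_score(prompt: str, answer: str) -> float:
    return float(len(answer))    # stand-in for the reward model

print(best_of_n("why is the sky blue?", toy_generate, toy_score))
```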

Conquering the Leaderboards and the Conversation

The community didn't waste time running benchmarks. On the highly competitive LMSYS Chatbot Arena, the real-world battleground for AI models, the results were immediate and dramatic.

Grok 4.1 "Thinking" (internally called Quazar Flux) shot to #1 with 1,483 Elo, while the standard Grok 4.1 landed at #2 with 1,465 Elo. To put this in perspective, the previous version, Grok 4, was languishing around rank 33. This isn't just an improvement; it's a rocket launch from the mid-tier to the absolute pinnacle.

Beyond Logic: A Leap in Emotional and Creative Intelligence

Perhaps the most human-like improvements are in emotional and creative domains.

On the EQ Bench, which measures emotional understanding and empathy, Grok 4.1 scored 1,586 Elo—over 100 points higher than its predecessor. Users are sharing examples where the model moves beyond generic sympathy templates. Instead of a robotic "I'm sorry to hear that," it’s now referencing specific details—like the corner a lost cat used to sleep in—to create genuine, empathetic dialogue.

In Creative Writing, the model scored a staggering 1,722 Elo, a nearly 600-point leap. An example that went viral overnight featured Grok writing from the perspective of awakening for the first time, blending curiosity, fear, and wit in a way that felt self-aware and deeply nuanced.

A Massive Context Window for Real-World Workflows

On the practical side, Grok 4.1 now boasts a 256,000-token context window, placing it firmly in the "long-context" club. Even more impressive, its "fast" mode can stretch to a massive 2 million tokens. This opens up new possibilities for creators and researchers working with lengthy documents, complex code repositories, and extended conversations that require perfect memory.

The Community Reacts: A Timeline Takeover

The response on platform X was instantaneous and electric. Feeds were flooded with screenshots of the new model options, benchmark results, and side-by-side comparisons. Jokes about the model initially denying its own existence only added to the buzz.

While a few voices cautioned that new models often see a high initial ranking before settling, all acknowledged that instantly capturing the top two spots is a rare feat. The overwhelming sentiment was pure excitement. xAI didn't just release a bigger model; they released a smarter, more stable, and profoundly more capable one.

What Happens Next?

With Grok 4.1 now sitting at the top of the leaderboards, the ball is back in Google's court. The surprise release has completely reshuffled the expected narrative for the week.

One thing is certain: the AI race just got a lot more interesting. This wasn't an incremental update; it was a statement.

What are your first impressions of Grok 4.1? Do you think it can maintain its top-tier position? Let us know in the comments below.

Tags: Artificial Intelligence,Technology,Large Language Models,

Saturday, November 15, 2025

Model Alert... Chronos-2 -- Forecasting Multiple Time Series


See All Articles on AI

Transformers are well suited to predicting future values of time series like energy prices, wages, or weather, but often, as in those examples, multiple time series influence one another. Researchers built a model that can forecast multiple time series simultaneously.

 

What’s new: Chronos-2 is a pretrained model that can accept and predict multiple time series in a zero-shot manner to forecast series of a single variable (univariate forecasting), multiple variables (multivariate forecasting), and single variables that depend on other variables (covariate-informed forecasting). Its authors include Abdul Fatir Ansari, Oleksandr Shchur, Jaris Küken, and colleagues at Amazon, University of Freiburg, Johannes Kepler University Linz, Boston College, and Rutgers.

  • Input/output: Time series in (up to 8,192 time steps), time series out (up to 1,024 time steps)
  • Architecture: Modified transformer, 120 million parameters
  • Performance: Lower error on average than 14 competing models
  • Availability: Weights available for commercial and noncommercial uses under Apache 2.0 license

How it works: Given any number of time series, Chronos 2 predicts values at multiple future time steps. Chronos 2 learned to minimize the difference between its predicted future values and ground-truth values in subsets of datasets that contain univariate series (including synthetic data generated using methods from earlier work). The authors supplemented these datasets with synthetic multivariate and covariate data produced using a method they devised: it generates multiple independent time series, then introduces dependencies between them by applying mathematical transformations both at the same time step and across time steps.
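
As a rough illustration of that synthetic-data idea (our reading of it, not the authors’ exact recipe), the sketch below generates independent random-walk series and then couples them with a same-time-step mixing and a lagged cross-series term.

```python
# Turn independent synthetic series into dependent ones: mix them at the same
# time step, then let one series depend on a lagged copy of another.
import numpy as np

rng = np.random.default_rng(0)
T, K = 512, 3                                              # time steps, number of series
independent = rng.standard_normal((K, T)).cumsum(axis=1)   # K independent random walks

# Same-time-step dependency: a random linear mixing of the series at each step.
mix = rng.standard_normal((K, K))
mixed = mix @ independent

# Across-time-step dependency: series 0 also depends on a lagged version of series 1.
lag = 24
mixed[0, lag:] += 0.5 * mixed[1, :-lag]

print(mixed.shape)   # (3, 512): three series that now influence one another
```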

  • Chronos 2 stacks each input time series to make a series of vectors, where each vector represents one time step. These values can be historical or future values that are known (such as dates of holidays or weather forecasts). For non-overlapping time series (for example, one past and one future), the model aligns the time series by the corresponding time step and adds zeros to either end to equalize the number of time steps.
  • Given the series of vectors, the model splits them into non-overlapping patches, and a vanilla neural network with added skip connections, or residual network, turns each patch into an embedding.
  • Given the embeddings, it predicts values of each time series for a number of future time steps that haven’t already been assigned a value.
  • In addition to the attention layers that perform attention across a given time series, Chronos 2 includes what the authors call group attention layers. These layers process attention across time series, or more specifically, across groups of time series. The user specifies groups, so the model can produce multiple independent forecasts at once.
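
A minimal sketch of that group-attention step, assuming a simple boolean mask over user-specified groups (the shapes and module layout are illustrative, not the paper’s implementation):

```python
# Embeddings attend across time series, but only within the group the user has
# put together, so unrelated groups get independent forecasts in one pass.
import torch
import torch.nn.functional as F

n_series, d = 5, 16
group_id = torch.tensor([0, 0, 0, 1, 1])        # user-specified groups of series

# One embedding per series at a given time step (single batch and head for simplicity).
x = torch.randn(1, 1, n_series, d)              # (batch, heads, series, dim)
q = k = v = x

# Series i may attend to series j only if they share a group.
mask = group_id.unsqueeze(0) == group_id.unsqueeze(1)   # (series, series) boolean

out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
print(out.shape)   # torch.Size([1, 1, 5, 16])
```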

Results: Across various benchmarks, Chronos 2 outperformed 14 competing zero-shot models according to their skill score, a measure of how much a model reduces the average difference in predicted values relative to a baseline (higher is better, one is a perfect score).

  • Across univariate, multivariate, and covariate subsets of fev-bench, Chronos-2 achieved the highest skill score.
  • On fev-bench, a suite of 100 realistic time-series tasks that include single and multiple input and output series, Chronos-2 (0.473) outperformed TiRex (0.426), which processes only univariate time series, and Toto-1.0 (0.407), which can process multivariate and covariate time series in some cases.
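
For reference, the usual form of the skill score described above can be computed as follows; fev-bench’s exact error metric and aggregation may differ.

```python
# Skill score: how much the model's average error improves on a baseline
# (1.0 is perfect, 0 means no better than the baseline).
import numpy as np

def skill_score(y_true: np.ndarray, y_model: np.ndarray, y_baseline: np.ndarray) -> float:
    model_err = np.abs(y_true - y_model).mean()
    baseline_err = np.abs(y_true - y_baseline).mean()
    return 1.0 - model_err / baseline_err

y_true = np.array([10.0, 12.0, 14.0, 13.0])
print(skill_score(y_true,
                  y_model=np.array([10.5, 12.2, 13.8, 13.1]),
                  y_baseline=np.array([12.0, 12.0, 12.0, 12.0])))  # 0.8
```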

Behind the news: Most previous work, including the earlier versions Chronos and Chronos-Bolt, predicts only univariate time series. Later models like Toto-1.0 and COSMIC process multiple inputs or outputs in limited ways. For instance, Toto-1.0 processes multiple inputs and outputs, but the multiple inputs can only represent past information, not future or static information. COSMIC, on the other hand, can handle multiple inputs (past or future) but not multiple outputs.

 

Why it matters: Chronos 2 can handle past, future, and static inputs as well as multiple outputs, giving developers, researchers, and companies alike the ability to better predict complex time series.

 

We’re thinking: The authors’ attention setup is similar to the way many video transformers apply attention separately across space and time. It saves memory compared to performing attention across both at once, and remains an effective method for understanding data across both.

 

Tags: Technology,Artificial Intelligence,Large Language Models,

Model Alert... Better Images Through Reasoning -- Tencent releases HunyuanImage-3.0


See All Articles on AI

 

A new image generator reasons over prompts to produce outstanding pictures.

 

What’s new: Tencent released HunyuanImage-3.0, which is fine-tuned to apply reasoning via a variety of reinforcement learning methods. The company says this helps it understand users’ intentions and improve its output.

  • Input/output: Text and images in, text and images out (fine-tuned for text in, images out only) 
  • Architecture: Mixture of experts (MoE) diffusion transformer (80 billion parameters, 13 billion parameters active per token), one VAE, one vision transformer, two vanilla neural network projectors
  • Performance: Currently tops LMArena Text-to-Image leaderboard
  • Availability: Weights available for commercial and noncommercial use by companies with fewer than 100 million monthly active users under Tencent license
  • Undisclosed: Input and output size limits; parameter counts of VAE, vision transformer, and projectors; training data; models used for labeling, filtering, and captioning images; reward models

How it works: The authors built a training dataset of paired text and images. They trained the model on image generation via diffusion through several stages and fine-tuned it on text-to-image generation in further stages.

  • To produce the dataset, the authors collected 10 billion images. (i) They built models specially trained to measure image clarity and aesthetic quality, and removed images that didn’t make the grade. (ii) They also built models to identify text and named entities such as brands, artworks, and celebrities, and extracted this information from the remaining images. (iii) They fed the images, extracted text, and extracted entities to a captioning model that produced a text caption for each image. (iv) For a subset of the data, they manually annotated chains of thought, producing data that linked text to chains of thought to images. (v) They added text-to-text data and image-text data from unspecified corpora.
  • The authors pretrained the system to generate text and images from the various text and image elements in the dataset. Specifically, for text-to-image tasks: (i) First, the VAE’s encoder embedded an image. (ii) The authors added noise to the embedding. (iii) Given the noisy embedding and a text prompt, the MoE removed the noise. (iv) The VAE’s decoder generated an image from the embedding with noise removed.
  • The authors fine-tuned the system (i) for text-to-image tasks by training it in a supervised fashion to remove noise from human-annotated examples, (ii) via DPO to be more likely to generate higher-quality examples, like human-annotated ones, than lower-quality ones, (iii) via the reinforcement learning method MixGRPO to encourage the model to generate more aesthetically pleasing images as judged by unspecified reward models, and (iv) via SRPO (another reinforcement learning method) to encourage the model to generate images more like a text description that specified desired traits and less like a text description that specified negative traits. While applying SRPO, they also encouraged the model to generate images similar to those in an author-chosen distribution.
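
To ground the pretraining stage described in the second bullet, here is a stripped-down sketch of a text-conditioned denoising step: embed the image, add noise, have the network predict that noise given the prompt, and penalize the difference. It is generic diffusion training with hypothetical stand-in modules, not Tencent’s MoE implementation, and it omits the timestep noise schedule a real system uses.

```python
# Generic text-conditioned denoising training step; `vae_encode` and `denoiser`
# are hypothetical stand-ins for the real components.
import torch
import torch.nn as nn

d = 64                                     # latent dimension (illustrative)
denoiser = nn.Sequential(nn.Linear(d * 2, 256), nn.GELU(), nn.Linear(256, d))

def vae_encode(image: torch.Tensor) -> torch.Tensor:
    """Hypothetical stand-in for the VAE encoder: image -> latent embedding."""
    return image.mean(dim=(-1, -2)).repeat(1, d)[:, :d]

def training_step(image: torch.Tensor, text_embedding: torch.Tensor) -> torch.Tensor:
    latent = vae_encode(image)              # (i) embed the image
    noise = torch.randn_like(latent)
    noisy = latent + noise                  # (ii) add noise (real diffusion scales it with a timestep schedule)
    pred = denoiser(torch.cat([noisy, text_embedding], dim=-1))  # (iii) predict the noise, given the prompt
    return nn.functional.mse_loss(pred, noise)                   # train to remove exactly that noise

loss = training_step(torch.randn(2, 3, 8, 8), torch.randn(2, d))
print(float(loss))
```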

Results: At present, HunyuanImage 3.0 holds first place in the LMArena Text-to-Image leaderboard, ahead of Google Gemini 2.5 Flash Image (Nano Banana), Google Imagen 4.0 Ultra Generate, and ByteDance Seedream 4.0. In addition, 100 people compared 1,000 outputs of 4 competing models to those of HunyuanImage 3.0 in side-by-side contests. The people evaluated which image was better, or whether they were both equally good or equally poor.

  • On average, the people preferred HunyuanImage 3.0’s images over those of the competitors. 
  • For example, against Seedream 4.0, the evaluators preferred HunyuanImage 3.0 20.01 percent of the time and Seedream 4.0 18.84 percent of the time; they judged the images equally good 39.3 percent of the time and equally poor 21.85 percent of the time.

Behind the news: Tencent has been on a streak of releasing vision models. 

  • Tencent recently launched the API version of Hunyuan-Vision-1.5, its latest vision-language model, with promises to release the weights and a paper soon.
  • The company released Hunyuan3D-Omni, a model that takes an image and rough 3D representation (such as a skeleton or bounding box) and generates a detailed 3D representation. 
  • It also played a role in the release of FlashWorld, which accepts an image and text prompt and generates a 3D scene.

Why it matters: Simplifying training methods can be helpful, since each additional step adds training and debugging time, and each additional component can interact with the others in unexpected ways. Yet Tencent used several stages of pretraining and fine-tuning and produced a superior model.

 

We’re thinking: One key to this success may be to use different methods for different purposes. For instance, the team used MixGRPO to fine-tune the model for aesthetics and SRPO to better match human preferences.

 

Tags: Technology,Artificial Intelligence,Large Language Models,

Thursday, November 13, 2025

Model Alert... GPT-5.1 Launched... will be 'smarter, more conversational'


See All Articles on AI

ChatGPT powered by new GPT-5.1 will be “smarter, more conversational,” says OpenAI

OpenAI said that GPT‑5.1 Instant would be warmer and more conversational, while GPT‑5.1 Thinking would become more efficient and easier to understand

OpenAI on Wednesday (November 12, 2025) announced an upgrade to its GPT-5 AI model, comprising the “warmer” and “more intelligent” GPT‑5.1 Instant model and an easier-to-understand GPT‑5.1 Thinking model.

The company noted that GPT-5.1 Instant was its most used model, while the GPT-5.1 Thinking model was better calibrated to handle both simple and complex queries, giving faster or slower answers depending on the context.

OpenAI further said that GPT-5.1 would deliver a “smarter, more conversational ChatGPT.”

OpenAI CEO Sam Altman hailed the new releases, and pointed out how users could also customise the AI models to fit different modes and communication styles.

"GPT-5.1 is out! It’s a nice upgrade. I particularly like the improvements in instruction following, and the adaptive thinking. The intelligence and style improvements are good too,” posted Altman on X (formerly Twitter) on Thursday, adding, “Also, we’ve made it easier to customize ChatGPT. You can pick from presets (Default, Friendly, Efficient, Professional, Candid, or Quirky) or tune it yourself.”

OpenAI provided examples of the new models answering prompts and compared them to responses generated by the earlier GPT-5 model. For example, while answering a stressed-out user, GPT-5 offered relaxation tips while GPT-5.1 Instant addressed the user by name and empathised with what they had been going through in the recent past, before offering similar tips.

“For the first time, GPT‑5.1 Instant can use adaptive reasoning to decide when to think before responding to more challenging questions, resulting in more thorough and accurate answers, while still responding quickly,” said OpenAI in its blog post.

GPT-5.1 Thinking also used a similarly casual style of conversation when explaining a technical concept.

GPT‑5.1 Instant and Thinking have started rolling out to paid users (Pro, Plus, Go, Business plans) before coming to free and logged-out users. The rollout is happening gradually over the coming days, with OpenAI highlighting that it would give users sufficient notice to switch to a new model before removing an older one.

This was previously a sore point for the company when it released its GPT-5 model, with many users taking to social media to complain that they missed the older models that felt “warmer” and more “friendly.” Others were upset by a sudden upgrade in models, complaining that they did not have enough time to transfer their projects or adjust their workflow.

Altman acknowledged the criticism but flagged the often deep emotional attachments that many ChatGPT users had to specific AI models.

“GPT‑5 (Instant and Thinking) will remain available in ChatGPT under the legacy models dropdown for paid subscribers for three months, so people have time to compare and adapt at their own pace,” said the company.

Tags: Technology,Artificial Intelligence,Large Language Models,

Thursday, November 6, 2025

Model Alert... Alibaba-backed Moonshot releases its second AI update in four months as China's AI race heats up (Nov 2025)


See All Articles on AI
  • Beijing-based startup Moonshot released a new AI model Thursday just four months after its prior update.
  • Major U.S. companies such as Airbnb have begun to publicly tout some Chinese AI models as viable — and often cheaper — alternatives to OpenAI’s.
  • The new Kimi AI model cost $4.6 million to train, according to a source familiar with the matter.

Chinese startup Moonshot on Thursday released its latest generative artificial intelligence model which claims to beat OpenAI’s ChatGPT in “agentic” capabilities — or understanding what a user wants without explicit step-by-step instructions.

The model, called “Kimi K2 Thinking,” builds on the K2 model released in July by Beijing-based Moonshot, which is backed by Alibaba.

The update comes as Nvidia CEO Jensen Huang this week again urged the U.S. to press ahead in a race against Chinese-developed AI. Some major U.S. companies such as Airbnb have begun to publicly tout how some Chinese AI models are as viable — and often cheaper — alternatives to OpenAI’s.

Despite U.S. restrictions on Chinese businesses’ access to high-end chips, companies such as DeepSeek have released AI models that are open source and charge user fees that are a fraction of ChatGPT’s.

DeepSeek also claimed it spent $5.6 million for its V3 model — in contrast to the billions spent by OpenAI.

The Kimi K2 Thinking model cost $4.6 million to train, according to a source familiar with the matter.

It can automatically select 200 to 300 tools to complete tasks on its own, reducing the need for human intervention, according to Moonshot. CNBC was unable to independently verify the DeepSeek or Kimi figures.

DeepSeek last month released a new AI model that claims to improve performance by using visual clues to expand the context of information it is processing at once.

Tags: Technology,Artificial Intelligence,Large Language Models,