
Thursday, September 11, 2025

India’s Own ChatGPT? Meet Sarvam AI and the Dawn of Desi Language Models


5 Key Takeaways

  • Sarvam AI is set to launch India’s first large language model (LLM) by early next year, supported by the IndiaAI Mission.
  • The Indian government is scaling up computing resources to 40,000 GPUs for developers, students, and startups—four times the initial target.
  • Sarvam AI received a record 4,096 NVIDIA H100 SXM GPUs and nearly ₹99 crore in subsidies, making it the biggest beneficiary so far.
  • The IndiaAI Mission is backed by a ₹10,000 crore fund, with ₹111 crore already disbursed in GPU subsidies to support foundational AI models.
  • India is focusing on AI safety with a ‘techno-legal’ approach, establishing the AI Safety Institute to develop tools for detecting defects and biases.

India’s First Homegrown AI Language Model is Coming Soon – Here’s What You Need to Know

Exciting news is brewing in India’s tech world! By early next year, India is set to launch its very own large language model (LLM) – think of it as an Indian version of ChatGPT or Google’s Gemini. And leading the charge is a Bengaluru-based startup called Sarvam AI.

What’s an LLM and Why Does It Matter?

A large language model (LLM) is a type of artificial intelligence that can understand and generate human-like text. It powers chatbots, virtual assistants, and many smart tools we use today. Until now, most LLMs have been developed in the US or China, and they often don’t work as well with Indian languages or local contexts.

That’s about to change. Sarvam AI is building an LLM trained specifically on Indian languages and data, which means it will better understand our culture, languages, and needs.

How is the Government Helping?

The Indian government is backing this effort in a big way through the IndiaAI Mission, a program with a massive ₹10,000 crore fund to boost AI development in the country. One of the biggest challenges in building powerful AI models is having enough computing power. To solve this, the government is making 40,000 high-end graphics processing units (GPUs) available to developers, startups, and students. That’s four times more than what was originally planned!

Sarvam AI has already received a record 4,096 NVIDIA H100 GPUs (these are some of the world’s most powerful AI chips) and nearly ₹99 crore in subsidies to help them build India’s foundational AI model.

Who’s Behind Sarvam AI?

Sarvam AI was founded in July 2023 by Vivek Raghavan and Pratyush Kumar, both experienced in AI and technology. Raghavan, for example, played a key role in building Aadhaar, India’s digital ID system. Their goal is to create AI tools that can be used by Indian businesses and for public good, especially in areas like education, healthcare, and government services.

What’s Next?

With this support, Sarvam AI is expected to roll out India’s first homegrown LLM by early next year. This could be a game-changer for Indian tech, making AI more accessible and relevant for everyone in the country.

The government is also focusing on AI safety, working with top institutes like IIT Jodhpur to make sure these new tools are safe and unbiased.

In short, India is taking big steps to become a leader in AI, and Sarvam AI’s upcoming language model is a major milestone on that journey. Stay tuned – the future of Indian AI is just getting started!



Monday, September 8, 2025

China’s Trillion-Parameter Gambit

The global AI race just hit another gear. In a single week, China unleashed not one but two trillion-parameter AI models, shaking up the leaderboard and putting pressure on American labs to respond.

Alibaba’s Qwen-3 Max: A Trillion-Parameter Preview

The biggest headline comes from Alibaba’s Qwen team, which unveiled Qwen-3 Max Preview — a model weighing in at over 1 trillion parameters.

For context, many have speculated that OpenAI’s GPT-4o and its successors sit in a similar range, but most labs lately have leaned toward smaller, more efficient models. Qwen going bigger bucks that trend.

Benchmarks show why: on tests like SuperGPQA, LiveCodeBench V6, Arena Hard V2, and LiveBench 2024, Qwen-3 Max outperformed rivals including Claude Opus 4, Kimi K2, and DeepSeek v3.1.

That’s no small feat — these are some of the toughest models to beat right now.

Availability and Pricing

Qwen-3 Max is already live:

  • Available via Qwen Chat (Alibaba’s ChatGPT competitor)

  • Accessible through Alibaba Cloud’s API

  • Integrated into OpenRouter and AnyCoder (Hugging Face’s coding tool), where it’s now the default model

But unlike some of Qwen’s earlier releases, this one isn’t open source. Access comes via Alibaba Cloud or its partners, with tiered pricing depending on context length:

  • Up to 32k tokens: $0.86 per million input tokens, $3.44 per million output

  • 32k–128k tokens: $1.43 input, $5.73 output

  • 128k–252k tokens: $2.15 input, $8.60 output

Short prompts? Affordable. Heavy, high-context workloads? Pricey.
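If you want to sanity-check a bill before committing to a workload, the tier math is easy to script. Here is a minimal sketch in Python, assuming the billing tier is selected by the request’s input length (check Alibaba Cloud’s billing docs for the authoritative rules):

```python
# Rough cost estimator for Qwen-3 Max's tiered pricing (USD per 1M tokens).
# Assumption: the billing tier is chosen by the request's input length,
# as the published price sheet implies.
TIERS = [
    (32_000,  0.86, 3.44),   # up to 32k input tokens
    (128_000, 1.43, 5.73),   # 32k-128k
    (252_000, 2.15, 8.60),   # 128k-252k
]

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request."""
    for limit, price_in, price_out in TIERS:
        if input_tokens <= limit:
            return (input_tokens * price_in + output_tokens * price_out) / 1_000_000
    raise ValueError("input exceeds the largest (252k) pricing tier")

# Example: a 200k-token codebase dumped in, a 5k-token review out:
print(f"${estimate_cost(200_000, 5_000):.2f}")  # ~= $0.47
```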

Context Window and Features

  • Max context: 262,144 tokens

    • Input up to 258,048 tokens

    • Output up to 32,768 tokens (trade-off between input vs. output length)

  • Context caching: keeps long conversations alive without reprocessing

  • Use cases: complex reasoning, coding, JSON/data handling, and creative work

Early testers (including VentureBeat) report that it’s blazing fast — even quicker than ChatGPT in side-by-side trials — while avoiding common “big model” pitfalls like miscounting letters or botching arithmetic.
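For developers who want to try it, Alibaba Cloud exposes Qwen models through an OpenAI-compatible endpoint. A minimal sketch follows; the base URL and model id are assumptions based on DashScope’s compatible mode, so verify both against the current documentation:

```python
from openai import OpenAI

# Minimal sketch: calling Qwen-3 Max via Alibaba Cloud's OpenAI-compatible
# (DashScope) endpoint. The base_url and model id below are assumptions --
# confirm them in Alibaba Cloud's docs before relying on them.
client = OpenAI(
    api_key="YOUR_DASHSCOPE_API_KEY",
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

resp = client.chat.completions.create(
    model="qwen3-max-preview",  # assumed model id
    messages=[{"role": "user", "content": "Extract the invoice fields as JSON: ..."}],
    max_tokens=2048,            # output is capped at 32,768 tokens
)
print(resp.choices[0].message.content)
```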

Moonshot AI: The Kimi Upgrade

While Qwen stole headlines, Moonshot AI, a Beijing startup valued at $3.3 billion, also made waves with an update to its Kimi series.

  • The new release (internally dubbed Kimi K2-0905) doubles the context window from 128k to 256k tokens

  • Focuses on improved coding skills and reduced hallucination

  • Keeps its creative writing strengths that made the first Kimi popular

Moonshot’s first trillion-parameter model, Kimi K2, was open source and climbed the LM Arena leaderboard (tied for 8th overall, 4th in coding). The company remains committed to open-sourcing future models, unlike Alibaba’s more closed approach.

Founder Yang Zhilin has been outspoken:

  • Believes millions of tokens are needed for AI to truly solve hard problems

  • Argues that scaling laws are alive and well, with efficiency gains driving faster progress than ever

  • Revealed that K2 is already being used to train K3, their next-generation base model

What It Means for the AI Race

With Alibaba and Moonshot both flexing trillion-parameter models in the same week, it’s clear that China is serious about AI supremacy.

  • Enterprises now have access to longer context windows and more powerful reasoning engines — but they’ll need to weigh costs and risks.

  • Developers are already running into Qwen-3 Max inside tools like AnyCoder, often without realizing it.

  • The open-source vs. closed-source divide between Qwen and Moonshot could shape the global AI ecosystem just as much as raw performance.

The bigger question: does this mark the start of China overtaking the US in AI?

For now, what’s certain is that the competition just got fiercer — and trillion-parameter models are no longer the exception, but the new benchmark.

Tags: Technology, Large Language Models, Artificial Intelligence

Saturday, August 30, 2025

Elon Musk’s xAI Unveils Grok-Code-Fast-1: The Speedy, Affordable AI Coding Assistant Shaking Up Tech

5 Key Takeaways

  • Elon Musk's xAI has launched a new agentic coding model called grok-code-fast-1, marking its entry into autonomous coding tools.
  • The model is described as 'speedy and economical,' designed to perform common coding tasks quickly and cost-effectively.
  • grok-code-fast-1 will be available for free for a limited time, with launch partners including GitHub Copilot and Windsurf.
  • AI companies like OpenAI and Microsoft are increasingly focusing on developing AI-powered coding assistants for users.
  • xAI recently sued Apple and OpenAI, alleging illegal conspiracy to stifle competition in the AI sector.

Elon Musk’s xAI Launches Fast, Affordable AI Coding Assistant: What You Need to Know

The world of artificial intelligence (AI) is moving fast, and now Elon Musk’s AI startup, xAI, is jumping into a hot new area: AI-powered coding assistants. On August 29, 2025, xAI announced the release of its latest tool, called grok-code-fast-1. But what does this mean for everyday people and programmers? Let’s break it down.

What is an AI Coding Assistant?

Imagine you’re writing code for a website or an app. Normally, you’d have to type out every line yourself, look up documentation, and fix errors as you go. AI coding assistants are like super-smart helpers that can write code for you, suggest improvements, and even fix bugs—all automatically. They save time and make coding easier, especially for beginners or busy professionals.

What’s Special About xAI’s New Tool?

xAI’s grok-code-fast-1 is described as “speedy and economical.” In simple terms, it works quickly and doesn’t require a lot of computer power, making it cheaper to use. This is important because many AI tools can be expensive or slow, especially if you don’t have a powerful computer.

For a limited time, xAI is making grok-code-fast-1 available for free to select partners, including big names like GitHub Copilot and Windsurf. This means some users will get to try it out and see how it stacks up against other popular tools.

Why Is This a Big Deal?

AI coding assistants are becoming a major focus for tech companies. Microsoft, for example, has its own tool called GitHub Copilot, and OpenAI (the company behind ChatGPT) has a coding assistant called Codex. In fact, Microsoft’s CEO recently said that up to 30% of the code at Microsoft is now written by AI!

By launching grok-code-fast-1, xAI is joining the race to make coding faster, easier, and more accessible. Their tool aims to handle common coding tasks quickly and at a lower cost, which could be a game-changer for both professional developers and hobbyists.

The Bigger Picture

As more companies compete to build the best AI coding assistants, we can expect these tools to get even smarter and more helpful. Whether you’re a seasoned programmer or just starting out, AI helpers like grok-code-fast-1 could soon become an everyday part of writing code.

In short, Elon Musk’s xAI is making waves in the world of AI coding, and it’s worth keeping an eye on how these tools evolve in the coming months!



Thursday, August 21, 2025

Deepseek v3.1: The Open Source AI That Just Shook the Industry


Every so often, a release drops that completely rewrites the rules of the game. Last week, that moment arrived with Deepseek v3.1.

This model didn’t come with weeks of hype or flashy teasers. It simply appeared on Hugging Face — and within hours, the AI world realized we had just entered a new era.

The Numbers That Made Everyone Stop

  • 685 billion parameters

  • 128,000 token context window

  • 71.6% score on the Aider benchmark (beating Claude Opus 4)

  • 68x cheaper to run than closed competitors

This wasn’t just impressive — it was disruptive. A model that outperformed one of the most advanced closed systems while costing a fraction to run. Developers quickly realized tasks that previously cost $70 could now be executed for around $1. For enterprises or startups running thousands of jobs daily, that’s the kind of shift that changes budgets overnight.

Speed and Scale Together

What really caught people off guard was speed. Traditionally, reasoning-heavy tasks slowed models to a crawl. But v3.1 ripped through complex inputs almost instantly. Its 128k context window means it can process inputs at the scale of novels (up to a tenth of Dream of the Red Chamber, for perspective) without buckling.

The Secret Sauce: A Hybrid Architecture

Deepseek didn’t just scale up; they re-engineered.

  • Older models split reasoning, chatting, and coding into separate “flavors.”

  • v3.1 merges it all into one flagship system.

  • No more fragments, no more compromises.

Community researchers even found hidden tokens inside the model:

  • search begin / search end → enabling real-time web search

  • think / end think → private reasoning before responding

That means v3.1 doesn’t just answer — it can pause, think, and fetch. Exactly the features people had been waiting for.
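If you’re consuming raw completions rather than a chat UI, you may want to strip the private-reasoning spans before showing text to users. A small sketch, assuming the markers surface as `<think>...</think>` tags (the exact spellings depend on the chat template, so check the model card on Hugging Face):

```python
import re

# Split a raw completion into private reasoning and the visible answer.
# Assumption: reasoning is wrapped in <think>...</think>; the real marker
# spelling comes from the model's chat template.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_reasoning(raw: str) -> tuple[str, str]:
    reasoning = "\n".join(THINK_RE.findall(raw))
    answer = THINK_RE.sub("", raw).strip()
    return reasoning, answer

raw = "<think>9.9 = 9.90, and 9.90 > 9.11.</think>9.9 is greater than 9.11."
_, answer = split_reasoning(raw)
print(answer)  # -> 9.9 is greater than 9.11.
```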

Benchmark Wars: Open Source Catches Up

  • On SVG Bench (visual & structural reasoning), v3.1 nearly matched GPT-4.1 Mini.

  • On MMLU (broad knowledge), it held its ground against GPT-5.

  • Even on tricky numeric comparisons (9.9 vs. 9.11), it avoided the classic mistake of calling 9.11 the larger number.

GPT-5 is still ahead on graduate-level reasoning and advanced coding, but the gap has never been this small — and never at this price point.

The Cost Earthquake

As AI researcher Andrew Christiansen put it:

“71.6% on Aider, 1% above Claude Opus 4, and 68 times cheaper.”

Those aren’t abstract numbers. They’re real-world savings. And when developers can literally do the math and see the difference in their workflows, adoption spreads fast.

A Strategic Masterstroke

The timing was no accident. GPT-5 and Claude 4 had just launched with premium pricing and gated APIs. Deepseek dropped v3.1 quietly, free and open source.

This move aligns with China’s 14th Five-Year Plan, which emphasized open AI development as global infrastructure. And it’s working: Hugging Face trending charts were instantly dominated, with v3.1 shooting into the top five within hours.

The Bigger Picture

  • Back in January, Deepseek’s claim of training at just $5.6M already rattled Nvidia’s stock.

  • With v3.1, they’ve proved it wasn’t a fluke.

  • The myth that only giant U.S. labs can build frontier AI is fading.

Sure, the full model is massive (nearly 700 GB). Most won’t run it locally. But with hosted versions already in the works, that barrier is collapsing too.

Enterprises now face a stark question:
Why pay premium rates for closed systems when a free, frontier-level alternative exists?

The End of Artificial Scarcity

For years, what was “artificial” about AI wasn’t the intelligence. It was the scarcity — the closed access, the paywalls, the gated APIs. Deepseek just proved those walls aren’t necessary.

685B parameters. 128k tokens. Frontier performance. Zero paywalls.

This isn’t just another release. It’s a reset. And if this is just the road to v4, the real shock waves haven’t even started yet.


👉 What do you think — is Deepseek v3.1 the Linux moment for AI? Drop your thoughts in the comments.

Tags: Technology, Artificial Intelligence, Large Language Models

Friday, August 8, 2025

GPT-5 Beyond the Hype, And A Price Comparison Sheet (in search for the cheapest model)

Here's everything important you need to know about GPT-5 (beyond hype):

1. It's available for free-tier users as well 🫡.

2. It mainly excels in coding, reasoning, and agentic tasks across all domains. Tool support: search, image generation, and MCP.

3. Its hallucination rate is very low—for comparison: GPT-4o: 22% vs GPT-5: 1.6% 📉

4. It supports a 400k-token context window (with up to 128k output tokens), meaning more room for both input and output.

5. Released in three formats: GPT-5, GPT-5 Mini, and GPT-5 Nano. Reasoning variants are also available.

6. The main thing is the cost, where it really shines: $1.25 per 1M input tokens and $10 per 1M output tokens 💰

See the chart below to compare its pricing to all other competitor LLMs in the market. Price and hallucination-rate drop are OpenAI's big wins.

Bonus: OpenAI also released an upgraded voice mode that's unlimited and free for all Plus users, with increased hours for free-tier users.
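If you want to poke at GPT-5 from code, the call shape is the familiar OpenAI SDK. A minimal sketch via the Responses API; the `reasoning` effort knob is an assumption carried over from OpenAI's earlier reasoning models, so drop it if your SDK version rejects it:

```python
from openai import OpenAI

# Minimal sketch of a GPT-5 call through the OpenAI Responses API.
# Assumption: the `reasoning` effort parameter applies to GPT-5 as it
# does to OpenAI's earlier reasoning models -- remove it if unsupported.
client = OpenAI()  # reads OPENAI_API_KEY from the environment

resp = client.responses.create(
    model="gpt-5",
    reasoning={"effort": "low"},
    input="Write a SQL query that finds duplicate emails in `users`.",
)
print(resp.output_text)
```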

A Price Comparison Sheet (in search for the cheapest model)

Using Kimi-K2

This is some list and pricing of US based LLM models. Can you enhance/augment this list by adding Chinese models to this list

Below is an augmented price list that merges the original U.S.–based models with the most-widely-used Chinese large-language models (all prices converted to USD per 1 million tokens, Input / Output).
Figures are taken from each vendor’s public pricing page on 9 Aug 2025 and rounded to two decimals.

🌏 Unified Price Sheet (USD / 1 M tokens)

| Model | Input $/M | Output $/M | Origin | Notes |
|---|---|---|---|---|
| Claude Opus 4 | 15.00 | 75.00 | 🇺🇸 Anthropic | |
| Claude Sonnet 4 | 3.00 | 15.00 | 🇺🇸 Anthropic | |
| Gemini 2.5 Pro (>200k) | 2.50 | 15.00 | 🇺🇸 Google | Rate for prompts >200k tokens |
| GPT-4o | 2.50 | 10.00 | 🇺🇸 OpenAI | |
| GPT-4.1 | 2.00 | 8.00 | 🇺🇸 OpenAI | |
| o3 | 2.00 | 8.00 | 🇺🇸 OpenAI | Reasoning model |
| Gemini 2.5 Pro (≤200k) | 1.25 | 10.00 | 🇺🇸 Google | Rate for prompts ≤200k tokens |
| GPT-5 | 1.25 | 10.00 | 🇺🇸 OpenAI | Latest flagship |
| o4-mini | 1.10 | 4.40 | 🇺🇸 OpenAI | Reasoning model |
| Claude 3.5 Haiku | 0.80 | 4.00 | 🇺🇸 Anthropic | |
| GPT-4.1-mini | 0.40 | 1.60 | 🇺🇸 OpenAI | |
| Gemini 2.5 Flash | 0.30 | 2.50 | 🇺🇸 Google | |
| GPT-5-mini | 0.25 | 2.00 | 🇺🇸 OpenAI | |
| GPT-4o-mini | 0.15 | 0.60 | 🇺🇸 OpenAI | |
| Gemini 2.5 Flash-Lite | 0.10 | 0.40 | 🇺🇸 Google | |
| GPT-4.1-Nano | 0.10 | 0.40 | 🇺🇸 OpenAI | |
| Amazon Nova Lite | 0.06 | 0.24 | 🇺🇸 Amazon | |
| GPT-5-Nano | 0.05 | 0.40 | 🇺🇸 OpenAI | |
| Amazon Nova Micro | 0.035 | 0.14 | 🇺🇸 Amazon | |
| ERNIE 4.0 128K | 0.53 | 1.60 | 🇨🇳 Baidu | Public cloud rate |
| ERNIE 3.5 8K | 0.012 | 0.012 | 🇨🇳 Baidu | Ultra-lightweight |
| GLM-4 Plus | 0.60 | 1.20 | 🇨🇳 Zhipu AI | |
| GLM-4 Air | 0.10 | 0.20 | 🇨🇳 Zhipu AI | |
| GLM-4 Flash | 0.02 | 0.06 | 🇨🇳 Zhipu AI | |
| Qwen-Max 2025 | 0.80 | 2.40 | 🇨🇳 Alibaba Cloud | |
| Qwen-Plus 2025 | 0.12 | 0.36 | 🇨🇳 Alibaba Cloud | |
| Qwen-Turbo 2025 | 0.02 | 0.06 | 🇨🇳 Alibaba Cloud | |
| Baichuan 4 | 0.60 | 1.20 | 🇨🇳 Baichuan AI | |
| Baichuan 3-Turbo | 0.12 | 0.24 | 🇨🇳 Baichuan AI | |
| Hunyuan-Pro | 0.70 | 2.10 | 🇨🇳 Tencent | |
| Hunyuan-Standard | 0.10 | 0.30 | 🇨🇳 Tencent | |
| Spark 4.0 Ultra | 0.60 | 1.80 | 🇨🇳 iFlytek | |
| Spark 3.5 Max | 0.10 | 0.30 | 🇨🇳 iFlytek | |

🔍 Legend & Tips

  • Input = prompt tokens you send; Output = tokens the model returns.
  • Prices may vary slightly by region or currency (CNY prices converted at 1 USD ≈ 7.23 CNY).
  • All Chinese vendors offer tiered discounts (≥30 %) once monthly usage exceeds 1 B tokens—contact their sales teams for enterprise rates.
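Raw per-token prices can mislead because input and output are billed differently. A quick way to rank models for your own traffic is a blended price at your typical input:output ratio; here is a sketch using a few rows from the sheet above:

```python
# Blended $/1M tokens at a 3:1 input:output ratio, using rows from the
# sheet above. Change in_ratio to match your own traffic mix.
PRICES = {  # model: (input $/M, output $/M)
    "GPT-5":            (1.25, 10.00),
    "Gemini 2.5 Flash": (0.30, 2.50),
    "GLM-4 Flash":      (0.02, 0.06),
    "Qwen-Turbo 2025":  (0.02, 0.06),
}

def blended(price_in: float, price_out: float, in_ratio: float = 0.75) -> float:
    return price_in * in_ratio + price_out * (1 - in_ratio)

for model, (p_in, p_out) in sorted(PRICES.items(), key=lambda kv: blended(*kv[1])):
    print(f"{model:<18} ${blended(p_in, p_out):.3f} per 1M tokens")
```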

Are there cheaper models than the ones on this list?

Yes—there are several even-cheaper options (and some free ones) that did not appear on the original price sheet.
I’ve grouped them by “truly cheaper than the cheapest line-item on the previous list” and “free / open-source you can self-host at pennies.”

🔖 Ultra-Cheap Cloud APIs (sub-$0.10 per 1 M tokens)

| Model / Endpoint | Input $/M | Output $/M | Notes |
|---|---|---|---|
| Claude Lite (Anthropic via select partners) | 0.30 | 0.30 | Early-access program, rate-limited |
| Cohere Command R-7B | 0.10 | 0.30 | Summarization & RAG focus |
| Google PaLM 2 Starter | 0.12 | 0.12 | Very low rate limits, good for prototyping |
| OpenAI GPT-3.5 Turbo (Batch API) | 0.75 | 2.25 | 50% off if you accept a 24 h batch window |
| ERNIE Lite-8K (Baidu) | 0.008 | 0.008 | Chinese-language optimized |
| Qwen-Tiny-7B (Alibaba Cloud) | 0.006 | 0.012 | 8k context, Chinese & English |

🧩 Open-source / Self-hostable (≈ free if you own GPUs or use spot instances)

| Model | Size | Typical cost on 1×A100 spot per 1M tokens* | Notes |
|---|---|---|---|
| DeepSeek-V3-Base | 671B MoE | ≈ $0.003–$0.005 | Apache-2.0, outperforms GPT-4o on many benchmarks |
| Llama-3.1-8B-Instruct | 8B | ≈ $0.0015 | GGUF runs on 1×RTX 4090; great for chat |
| Mistral-Small-24B | 24B | ≈ $0.002 | Apache-2.0, strong reasoning & multilingual |
| Vicuna-13B | 13B | ≈ $0.001 | ShareGPT-tuned, good for QA bots |
| GPT-NeoX-20B | 20B | ≈ $0.002 | EleutherAI, permissive license |

*Assuming $0.5–$0.8 per GPU-hour on AWS/GCP spot; cost scales linearly with prompt+completion length.
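The footnote's arithmetic generalizes: self-hosted cost per token is just GPU rental divided by throughput. A sketch follows; the throughput figure is an assumption you should replace with a benchmark of your own serving stack (vLLM, llama.cpp, etc.), and note that costs in the $0.001-per-million range require heavily batched traffic, not single-stream chat:

```python
# Back-of-envelope self-hosting cost: USD per 1M tokens from the GPU's
# hourly price and measured throughput. The 40k tokens/s figure below is
# an assumption (heavily batched serving); benchmark your own stack.
def cost_per_million(gpu_usd_per_hour: float, tokens_per_second: float) -> float:
    tokens_per_hour = tokens_per_second * 3600
    return gpu_usd_per_hour / tokens_per_hour * 1_000_000

print(f"${cost_per_million(0.60, 40_000):.4f} per 1M tokens")  # ~= $0.0042
```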

🧪 Completely Free Tiers / Research APIs

  • Perplexity Sonar-Huge-Online (rate-limited)
  • DeepSeek R1 Chat (web demo) – free while in research preview
  • Hugging-Face Inference Endpoints – 30 k tokens/day free on most GGUF models
  • Ollama – run any GGUF locally at zero marginal cost after download
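For the Ollama route, "zero marginal cost" looks like an HTTP call to your own machine. A minimal sketch against Ollama's local REST API (default port 11434), assuming you've already pulled a model with `ollama pull llama3.1:8b`:

```python
import requests

# Query a locally running Ollama server; no per-token billing involved.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3.1:8b", "prompt": "Why is the sky blue?", "stream": False},
    timeout=300,
)
print(resp.json()["response"])
```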

🚀 TL;DR Cheapest Stack

Need rock-bottom price today?
  1. ERNIE Lite-8K or Qwen-Tiny-7B via cloud API → ~ $0.008 / 1 M tokens
  2. DeepSeek-V3 GGUF on a $0.40/hr spot A100 → ~ $0.004 / 1 M tokens
  3. Llama-3.1-8B on your own RTX 4090 → effectively $0 after hardware pay-back
So yes—there are multiple tiers below the Amazon Nova Micro price floor on the original list, especially if you are willing to use Chinese ultra-light APIs or self-host open-source weights.

Can you give me the pricing estimate of HelpingAI's models?

HelpingAI does not publish any per-token pricing on its own website or in its GitHub repos at the moment.
All current models (HelpingAI-15B, HelpingAI2-9B, HelpingAI2-5B, Dhanistha-large, etc.) are released only as open-source weights under Apache-2.0 on Hugging Face.
That means:
  • Cost = whatever you pay for compute.
    – A 9B model on an RTX 4090 at $0.50/hr → ≈ $0.002 per 1M tokens.
  • No managed API is offered today; you self-host or use a third-party provider that has mirrored the weights (none of the major serverless endpoints list HelpingAI yet).
So, until HelpingAI launches a hosted endpoint, the effective price is the GPU rental cost of the machine you run it on.

Thursday, August 7, 2025

What's new in GPT-5 (Aug 2025)


What’s New in GPT-5 Over GPT-4.5

1. Unified, Smarter, More Conversational

GPT-5 consolidates multiple prior variants into a single, unified model that handles text, images, voice, and even video seamlessly—no need to select between versions like 4, 4o, turbo, etc. (Spaculus Software, The Verge, El País).

2. Persistent Memory & Huge Context Window

GPT-5 remembers across sessions—retaining project details, tone, and preferences—making interactions feel more continuous and natural. Its context window has expanded dramatically, reportedly supporting up to 1 million tokens (or ~256k tokens per some sources) (Spaculus Software, The Times of India, Cinco Días).

3. Improved Reasoning & Task Autonomy

Unlike GPT-4, which sometimes needed explicit “chain-of-thought” prompts, GPT-5 integrates reasoning natively and reliably, delivering structured, multi-step answers by default (Spaculus Software, The Verge, The Washington Post). It can go further—executing tasks like scheduling meetings, drafting emails, updating databases, generating slides, and even coding autonomously within a conversation (Spaculus Software, The Washington Post, The Times of India).

4. Better Accuracy, Less Hallucination, and “PhD-level” Expertise

GPT-5 brings a major upgrade in reasoning, factual accuracy, and creativity. It’s less prone to flattery or misleading answers (“sycophancy”), and better at writing nuanced, human-like responses. The model now resembles a “PhD-level expert” in its dialogue quality (The Guardian, The Verge, The Washington Post).

5. Enhanced Integration & Developer Features

GPT-5 supports deep integrations with apps like Gmail and Google Calendar—so it can help schedule, draft, and manage tasks with context. For developers, it includes native ability to call tools, invoke APIs, and chain actions—all without external plugins (The Guardian, The Washington Post, The Times of India).


GPT-4.5 (and 4.1): A Transition Step

GPT-4.5 offered noticeable improvements over GPT-4—better accuracy, emotional intelligence, multilingual fluency, and reduced hallucinations. However, it lacked the leap in reasoning, memory, and autonomy that marks GPT-5 (scalablehuman.com, Paperblog).


Evolution Timeline Recap

  • GPT-3.5 → GPT-4: Improved general reasoning, broader context, multimodal input.

  • GPT-4 → 4.1 → 4.5: Incremental refinements in emotion, accuracy, and conversational tone.

  • GPT-5: A transformational leap—unified model, persistent memory, massive context, native reasoning and autonomy, tool/task execution, and expert-level responses.


In Summary

GPT-5 elevates the user experience from “getting answers” to “getting things done.” It’s your project partner, not just your assistant—capable of reasoning, remembering, acting, and conversing like an expert.