Here's everything important you need to know about GPT-5 (beyond the hype):

1. It's available for free-tier users as well.
2. It mainly excels in coding, reasoning, and agentic tasks across all domains. Tool support: search, image generation, and MCP.
3. Its hallucination rate is very low. For comparison: GPT-4o: 22% vs GPT-5: 1.6%.
4. It supports 400k tokens for input and 128k for output, meaning a larger context window for both.
5. It is released in three formats: GPT-5, GPT-5 mini, and GPT-5 nano. Reasoning variants are also available.
6. The main thing is the cost, where it really shines: $1.25 per 1M input tokens and $10 per 1M output tokens.

See the chart below to compare its pricing with all the other competitor LLMs on the market. Pricing and the hallucination-rate drop are OpenAI's big wins.

Bonus: OpenAI also released an upgraded voice mode that's unlimited and free for all Plus users, with increased hours for free-tier users.
A Price Comparison Sheet (in search of the cheapest model)
Using Kimi-K2: This is a list of US-based LLM models and their pricing. Can you enhance/augment this list by adding Chinese models?
Below is an augmented price list that merges the original U.S.-based models with the most widely used Chinese large-language models (all prices converted to USD per 1 million tokens, Input / Output).
Figures are taken from each vendor’s public pricing page on 9 Aug 2025 and rounded to two decimals.
Unified Price Sheet (USD per 1M tokens)
Model | Input $/M | Output $/M | Origin | Notes |
---|---|---|---|---|
Claude Opus 4 | 15.00 | 75.00 | US Anthropic | |
Claude Sonnet 4 | 3.00 | 15.00 | US Anthropic | |
Gemini 2.5 Pro (>200 k) | 2.50 | 15.00 | US Google | Rate for prompts >200 k tokens |
GPT-4o | 2.50 | 10.00 | US OpenAI | |
GPT-4.1 | 2.00 | 8.00 | US OpenAI | |
o3 | 2.00 | 8.00 | US OpenAI | Reasoning model |
Gemini 2.5 Pro (≤200 k) | 1.25 | 10.00 | US Google | Rate for prompts ≤200 k tokens |
GPT-5 | 1.25 | 10.00 | US OpenAI | Latest flagship |
o4-mini | 1.10 | 4.40 | US OpenAI | Reasoning model |
Claude 3.5 Haiku | 0.80 | 4.00 | US Anthropic | |
GPT-4.1-mini | 0.40 | 1.60 | US OpenAI | |
Gemini 2.5 Flash | 0.30 | 2.50 | US Google | |
GPT-5-mini | 0.25 | 2.00 | US OpenAI | |
GPT-4o-mini | 0.15 | 0.60 | US OpenAI | |
Gemini 2.5 Flash-Lite | 0.10 | 0.40 | US Google | |
GPT-4.1-Nano | 0.10 | 0.40 | US OpenAI | |
Amazon Nova Lite | 0.06 | 0.24 | US Amazon | |
GPT-5-Nano | 0.05 | 0.40 | US OpenAI | |
Amazon Nova Micro | 0.035 | 0.14 | US Amazon | |
ERNIE 4.0 128K | 0.53 | 1.60 | CN Baidu | Public cloud rate |
ERNIE 3.5 8K | 0.012 | 0.012 | CN Baidu | Ultra-lightweight |
GLM-4 Plus | 0.60 | 1.20 | CN Zhipu AI | |
GLM-4 Air | 0.10 | 0.20 | CN Zhipu AI | |
GLM-4 Flash | 0.02 | 0.06 | CN Zhipu AI | |
Qwen-Max 2025 | 0.80 | 2.40 | CN Alibaba Cloud | |
Qwen-Plus 2025 | 0.12 | 0.36 | CN Alibaba Cloud | |
Qwen-Turbo 2025 | 0.02 | 0.06 | CN Alibaba Cloud | |
Baichuan 4 | 0.60 | 1.20 | CN Baichuan AI | |
Baichuan 3-Turbo | 0.12 | 0.24 | CN Baichuan AI | |
Hunyuan-Pro | 0.70 | 2.10 | CN Tencent | |
Hunyuan-Standard | 0.10 | 0.30 | CN Tencent | |
Spark 4.0 Ultra | 0.60 | 1.80 | CN iFlytek | |
Spark 3.5 Max | 0.10 | 0.30 | CN iFlytek | |
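One way to compare rows in the sheet above is a single "blended" rate for a given input-to-output ratio. The sketch below assumes an illustrative 3:1 ratio (three input tokens per output token) and uses a handful of prices copied from the table; it is not an official metric from any vendor.

```python
# Blended $/1M tokens at an assumed 3:1 input:output ratio.
# Prices (input $/M, output $/M) are copied from the sheet above.
PRICES = {
    "GPT-5":             (1.25, 10.00),
    "Claude Sonnet 4":   (3.00, 15.00),
    "Amazon Nova Micro": (0.035, 0.14),
    "Qwen-Turbo 2025":   (0.02,  0.06),
    "ERNIE 3.5 8K":      (0.012, 0.012),
}

def blended(input_p: float, output_p: float, ratio: float = 3.0) -> float:
    """Cost per 1M total tokens when `ratio` input tokens accompany each output token."""
    return (ratio * input_p + output_p) / (ratio + 1)

# Print cheapest first
for name, (inp, out) in sorted(PRICES.items(), key=lambda kv: blended(*kv[1])):
    print(f"{name:18s} ${blended(inp, out):.4f} / 1M tokens")
```

At this ratio the Chinese ultra-light tiers (ERNIE 3.5, Qwen-Turbo) come out one to two orders of magnitude cheaper than the US flagships, which matches the table's raw numbers.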
Legend & Tips
- Input = prompt tokens you send; Output = tokens the model returns.
- Prices may vary slightly by region or currency (CNY prices converted at 1 USD ≈ 7.23 CNY).
- All Chinese vendors offer tiered discounts (≥30 %) once monthly usage exceeds 1 B tokens—contact their sales teams for enterprise rates.
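The input/output split in the legend above can be turned into a per-request cost with a few lines of arithmetic. This is a minimal sketch; the prices plugged in are the GPT-5 figures from the sheet ($1.25 input / $10 output per 1M tokens), and any other row can be substituted.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """USD cost of one call; prices are quoted per 1 million tokens."""
    return (input_tokens / 1_000_000) * input_price \
         + (output_tokens / 1_000_000) * output_price

# A 2,000-token prompt with a 500-token reply on GPT-5:
cost = request_cost(2_000, 500, input_price=1.25, output_price=10.00)
print(f"${cost:.4f}")  # → $0.0075
```

Note how the output rate dominates even though far fewer output tokens are generated, which is typical across the vendors listed.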
Are there cheaper models than the ones on this list?
Yes—there are several even-cheaper options (and some free ones) that did not appear on the original price sheet.
I’ve grouped them by “truly cheaper than the cheapest line-item on the previous list” and “free / open-source you can self-host at pennies.”
Ultra-Cheap Cloud APIs (many under $0.10 per 1M tokens)
Model / Endpoint | Input $/M | Output $/M | Notes |
---|---|---|---|
Claude Lite (Anthropic via select partners) | 0.30 | 0.30 | Early-access program, rate-limited |
Cohere Command R-7B | 0.10 | 0.30 | Summarization & RAG focus |
Google PaLM 2 Starter | 0.12 | 0.12 | Very low rate-limits, good for prototyping |
OpenAI GPT-3.5 Turbo (Batch API) | 0.75 | 2.25 | 50 % off if you accept 24 h batch window |
ERNIE Lite-8K (Baidu) | 0.008 | 0.008 | Chinese-language optimized [vendor page] |
Qwen-Tiny-7B (Alibaba Cloud) | 0.006 | 0.012 | 8 k context, Chinese & English [vendor page] |
Open-source / Self-hostable (≈ free if you own GPUs or use spot instances)
Model | Size | Typical Cost on 1×A100 spot for 1 M tokens* | Notes |
---|---|---|---|
DeepSeek-V3-Base | 671 B MoE | ≈ $0.003–$0.005 | Apache-2.0, outperforms GPT-4o on many benchmarks |
Llama-3.1-8B-Instruct | 8 B | ≈ $0.0015 | GGUF runs on 1×RTX 4090; great for chat |
Mistral-Small-24B | 24 B | ≈ $0.002 | Apache-2.0, strong reasoning & multilingual |
Vicuna-13B | 13 B | ≈ $0.001 | ShareGPT-tuned, good for QA bots |
GPT-NeoX-20B | 20 B | ≈ $0.002 | EleutherAI, permissive license |
*Assuming $0.5–$0.8 per GPU-hour on AWS/GCP spot; cost scales linearly with prompt+completion length.
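The footnote's per-million-token figures follow from GPU rental price divided by throughput. Here is the arithmetic as a sketch; the 50,000 tokens/s figure is an assumption standing in for aggregate batched throughput, and real numbers depend heavily on model size, quantization, and batch size.

```python
def self_host_cost_per_million(gpu_usd_per_hour: float,
                               tokens_per_second: float) -> float:
    """$ per 1M tokens when renting one GPU at the given hourly rate."""
    tokens_per_hour = tokens_per_second * 3600
    return gpu_usd_per_hour / tokens_per_hour * 1_000_000

# Assumed example: a $0.60/hr spot A100 serving ~50k tokens/s in aggregate
print(f"${self_host_cost_per_million(0.60, 50_000):.4f} per 1M tokens")
```

With these assumptions the result lands in the ≈$0.003–$0.005 range quoted for DeepSeek-V3 above; halving throughput doubles the cost, so the footnote's "scales linearly" caveat matters.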
Completely Free Tiers / Research APIs
- Perplexity Sonar-Huge-Online (rate-limited)
- DeepSeek R1 Chat (web demo) – free while in research preview
- Hugging Face Inference Endpoints – 30 k tokens/day free on most GGUF models
- Ollama – run any GGUF locally at zero marginal cost after download
TL;DR Cheapest Stack
Need rock-bottom price today?
- ERNIE Lite-8K or Qwen-Tiny-7B via cloud API → ~ $0.008 / 1 M tokens
- DeepSeek-V3 GGUF on a $0.40/hr spot A100 → ~ $0.004 / 1 M tokens
- Llama-3.1-8B on your own RTX 4090 → effectively $0 after hardware pay-back
So yes—there are multiple tiers below the Amazon Nova Micro price floor on the original list, especially if you are willing to use Chinese ultra-light APIs or self-host open-source weights.
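"Effectively $0 after hardware pay-back" can be made concrete with a break-even calculation. The sketch below assumes a $1,600 RTX 4090 and a $0.15/1M API rate (GPT-4o-mini-class input pricing); both figures are illustrative, and electricity and your own time are ignored.

```python
def breakeven_tokens(hardware_usd: float, api_rate_per_million: float) -> float:
    """Tokens you must process before owned hardware beats renting the API."""
    return hardware_usd / api_rate_per_million * 1_000_000

# Assumed: $1,600 GPU vs a $0.15 / 1M-token API rate
tokens = breakeven_tokens(1600.0, 0.15)
print(f"{tokens / 1e9:.1f}B tokens to break even")  # → 10.7B tokens
```

Roughly 10 billion tokens is a lot for a hobbyist but modest for a production pipeline, which is why self-hosting only pays off at sustained volume.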
Can you give me the pricing estimate of HelpingAI's models?
HelpingAI does not publish any per-token pricing on its own website or in its GitHub repos at the moment.
All current models (HelpingAI-15B, HelpingAI2-9B, HelpingAI2-5B, Dhanistha-large, etc.) are released only as open-source weights under Apache-2.0 on Hugging Face .
That means:
- Cost = whatever you pay for compute.
  - A 9 B model on an RTX 4090 at $0.50/hr → ≈ $0.002 per 1 M tokens.
- No managed API is offered today; you self-host or use a third-party provider that has mirrored the weights (none of the major serverless endpoints list HelpingAI yet).
So, until HelpingAI launches a hosted endpoint, the effective price is the GPU rental cost of the machine you run it on.