Friday, August 8, 2025

GPT-5 Beyond the Hype, And A Price Comparison Sheet (in search of the cheapest model)

Here's everything important you need to know about GPT-5 (beyond hype):

1. It's available to free-tier users as well.

2. It mainly excels at coding, reasoning, and agentic tasks across all domains. Tool support: search, image generation, and MCP.

3. Its hallucination rate is very low. For comparison: GPT-4o: 22% vs. GPT-5: 1.6%.

4. It supports 400k tokens for input and 128k for output, meaning a larger context window on both sides.

5. Released in three formats: GPT-5, GPT-5 Mini, and GPT-5 Nano. Reasoning variants are also available.

6. Cost is where it really shines: $1.25 per 1M input tokens and $10 per 1M output tokens.

See the chart below to compare its pricing with competing LLMs on the market. Price and the hallucination-rate drop are OpenAI's big wins.
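To make the per-token rates concrete, here is a minimal sketch of a cost calculator. The function name and defaults are my own; the default rates are GPT-5's list prices from above.

```python
# Hypothetical helper (not an OpenAI API): estimate the USD cost of one
# request from token counts and per-1M-token rates.
def request_cost(input_tokens: int, output_tokens: int,
                 in_rate: float = 1.25, out_rate: float = 10.0) -> float:
    """Cost in USD, given rates expressed as $ per 1M tokens."""
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# A typical chat turn: 3,000 prompt tokens in, 800 tokens out.
print(round(request_cost(3_000, 800), 5))  # 0.01175
```

Note that output tokens dominate the bill at an 8x rate difference, so long completions cost far more than long prompts.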

๐—•๐—ผ๐—ป๐˜‚๐˜€: OpenAI also released an upgraded voice mode that's unlimited free for all Plus users, and increased hours for free tier users.

A Price Comparison Sheet (in search of the cheapest model)

Using Kimi-K2

Here is a list, with pricing, of US-based LLM models. Can you enhance/augment this list by adding Chinese models?

Below is an augmented price list that merges the original U.S.-based models with the most widely used Chinese large language models (all prices converted to USD per 1 million tokens, Input / Output).
Figures are taken from each vendor's public pricing page on 9 Aug 2025 and rounded to two decimals.

๐ŸŒ Unified Price Sheet (USD / 1 M tokens)

ModelInput $/MOutput $/MOriginNotes
Claude Opus 415.0075.00๐Ÿ‡บ๐Ÿ‡ธ Anthropic
Claude Sonnet 43.0015.00๐Ÿ‡บ๐Ÿ‡ธ Anthropic
Gemini 2.5 Pro (>200 k)2.5015.00๐Ÿ‡บ๐Ÿ‡ธ GoogleRate for prompts >200 k tokens
GPT-4o2.5010.00๐Ÿ‡บ๐Ÿ‡ธ OpenAI
GPT-4.12.008.00๐Ÿ‡บ๐Ÿ‡ธ OpenAI
o32.008.00๐Ÿ‡บ๐Ÿ‡ธ OpenAIReasoning model
Gemini 2.5 Pro (<200 k)1.2510.00๐Ÿ‡บ๐Ÿ‡ธ GoogleRate for prompts ≤200 k tokens
GPT-51.2510.00๐Ÿ‡บ๐Ÿ‡ธ OpenAILatest flagship
GPT-4o-mini1.104.40๐Ÿ‡บ๐Ÿ‡ธ OpenAI
Claude 3.5 Haiku0.804.00๐Ÿ‡บ๐Ÿ‡ธ Anthropic
GPT-4.1-mini0.401.60๐Ÿ‡บ๐Ÿ‡ธ OpenAI
Gemini 2.5 Flash0.302.50๐Ÿ‡บ๐Ÿ‡ธ Google
GPT-5-mini0.252.00๐Ÿ‡บ๐Ÿ‡ธ OpenAI
GPT-4o-mini0.150.60๐Ÿ‡บ๐Ÿ‡ธ OpenAI
Gemini 2.5 Flash-Lite0.100.40๐Ÿ‡บ๐Ÿ‡ธ Google
GPT-4.1-Nano0.100.40๐Ÿ‡บ๐Ÿ‡ธ OpenAI
Amazon Nova Lite0.060.24๐Ÿ‡บ๐Ÿ‡ธ Amazon
GPT-5-Nano0.050.40๐Ÿ‡บ๐Ÿ‡ธ OpenAI
Amazon Nova Micro0.0350.14๐Ÿ‡บ๐Ÿ‡ธ Amazon
ERNIE 4.0 128K0.531.60๐Ÿ‡จ๐Ÿ‡ณ BaiduPublic cloud rate
ERNIE 3.5 8K0.0120.012๐Ÿ‡จ๐Ÿ‡ณ BaiduUltra-lightweight
GLM-4 Plus0.601.20๐Ÿ‡จ๐Ÿ‡ณ Zhipu AI
GLM-4 Air0.100.20๐Ÿ‡จ๐Ÿ‡ณ Zhipu AI
GLM-4 Flash0.020.06๐Ÿ‡จ๐Ÿ‡ณ Zhipu AI
Qwen-Max 20250.802.40๐Ÿ‡จ๐Ÿ‡ณ Alibaba Cloud
Qwen-Plus 20250.120.36๐Ÿ‡จ๐Ÿ‡ณ Alibaba Cloud
Qwen-Turbo 20250.020.06๐Ÿ‡จ๐Ÿ‡ณ Alibaba Cloud
Baichuan 40.601.20๐Ÿ‡จ๐Ÿ‡ณ Baichuan AI
Baichuan 3-Turbo0.120.24๐Ÿ‡จ๐Ÿ‡ณ Baichuan AI
Hunyuan-Pro0.702.10๐Ÿ‡จ๐Ÿ‡ณ Tencent
Hunyuan-Standard0.100.30๐Ÿ‡จ๐Ÿ‡ณ Tencent
Spark 4.0 Ultra0.601.80๐Ÿ‡จ๐Ÿ‡ณ iFlytek
Spark 3.5 Max0.100.30๐Ÿ‡จ๐Ÿ‡ณ iFlytek

๐Ÿ” Legend & Tips

  • Input = prompt tokens you send; Output = tokens the model returns.
  • Prices may vary slightly by region or currency (CNY prices converted at 1 USD ≈ 7.23 CNY).
  • All Chinese vendors offer tiered discounts (≥30 %) once monthly usage exceeds 1 B tokens—contact their sales teams for enterprise rates.
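Because input and output rates differ so much, ranking models requires a blended price. Here is a minimal sketch that ranks a few rows from the sheet above, assuming a 3:1 input:output token ratio (the ratio is my assumption, not vendor data).

```python
# Prices (input $/M, output $/M) copied from a few rows of the sheet above.
prices = {
    "GPT-5": (1.25, 10.00),
    "GPT-5-mini": (0.25, 2.00),
    "Qwen-Turbo 2025": (0.02, 0.06),
    "GLM-4 Flash": (0.02, 0.06),
    "Amazon Nova Micro": (0.035, 0.14),
    "ERNIE 3.5 8K": (0.012, 0.012),
}

def blended(in_rate: float, out_rate: float, in_share: float = 0.75) -> float:
    """Weighted $/M for a workload whose tokens are `in_share` input."""
    return in_rate * in_share + out_rate * (1 - in_share)

ranked = sorted(prices, key=lambda m: blended(*prices[m]))
print(ranked[0])  # ERNIE 3.5 8K is cheapest under this ratio
```

Change `in_share` to match your own workload; a summarization job (long input, short output) favors models with cheap input tokens, while a generation job favors cheap output tokens.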

Are there cheaper models than those on this list?

Yes—there are several even-cheaper options (and some free ones) that did not appear on the original price sheet.
I’ve grouped them by “truly cheaper than the cheapest line-item on the previous list” and “free / open-source you can self-host at pennies.”

Ultra-Cheap Cloud APIs

| Model / Endpoint | Input $/M | Output $/M | Notes |
|---|---|---|---|
| Claude Lite (Anthropic via select partners) | 0.30 | 0.30 | Early-access program, rate-limited |
| Cohere Command R-7B | 0.10 | 0.30 | Summarization & RAG focus |
| Google PaLM 2 Starter | 0.12 | 0.12 | Very low rate limits, good for prototyping |
| OpenAI GPT-3.5 Turbo (Batch API) | 0.75 | 2.25 | 50% off if you accept a 24 h batch window |
| ERNIE Lite-8K (Baidu) | 0.008 | 0.008 | Chinese-language optimized [vendor page] |
| Qwen-Tiny-7B (Alibaba Cloud) | 0.006 | 0.012 | 8 k context, Chinese & English [vendor page] |

Open-source / Self-hostable (≈ free if you own GPUs or use spot instances)

| Model | Size | Typical cost on 1×A100 spot per 1 M tokens* | Notes |
|---|---|---|---|
| DeepSeek-V3-Base | 671 B MoE | ≈ $0.003–$0.005 | Apache-2.0, outperforms GPT-4o on many benchmarks |
| Llama-3.1-8B-Instruct | 8 B | ≈ $0.0015 | GGUF runs on 1×RTX 4090; great for chat |
| Mistral-Small-24B | 24 B | ≈ $0.002 | Apache-2.0, strong reasoning & multilingual |
| Vicuna-13B | 13 B | ≈ $0.001 | ShareGPT-tuned, good for QA bots |
| GPT-NeoX-20B | 20 B | ≈ $0.002 | EleutherAI, permissive license |
*Assuming $0.5–$0.8 per GPU-hour on AWS/GCP spot; cost scales linearly with prompt+completion length.
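The footnote's arithmetic is easy to reproduce. Here is a minimal sketch; the GPU price and throughput are illustrative inputs you must measure for your own setup, not benchmarks, and note that hitting the ≈$0.003–$0.005 figure implies very high aggregate throughput via heavy batching.

```python
# Back-of-envelope self-hosting cost: rent one GPU, divide its hourly
# price by the tokens it can serve per hour.
def self_host_cost_per_m(gpu_usd_per_hr: float, tokens_per_sec: float) -> float:
    """USD per 1M tokens at a sustained aggregate throughput."""
    tokens_per_hr = tokens_per_sec * 3600
    return gpu_usd_per_hr / tokens_per_hr * 1e6

# e.g. a $0.65/hr spot A100 sustaining ~50k tok/s aggregate (batched)
print(round(self_host_cost_per_m(0.65, 50_000), 4))  # 0.0036
```

The cost scales linearly in both directions: half the throughput doubles the per-token cost, which is why batching many concurrent requests is what makes self-hosting cheap.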

Completely Free Tiers / Research APIs

  • Perplexity Sonar-Huge-Online (rate-limited)
  • DeepSeek R1 Chat (web demo) – free while in research preview
  • Hugging Face Inference Endpoints – 30 k tokens/day free on most GGUF models
  • Ollama – run any GGUF locally at zero marginal cost after download

TL;DR Cheapest Stack

Need rock-bottom price today?
  1. ERNIE Lite-8K or Qwen-Tiny-7B via cloud API → ~ $0.008 / 1 M tokens
  2. DeepSeek-V3 GGUF on a $0.40/hr spot A100 → ~ $0.004 / 1 M tokens
  3. Llama-3.1-8B on your own RTX 4090 → effectively $0 after hardware pay-back
So yes—there are multiple tiers below the Amazon Nova Micro price floor on the original list, especially if you are willing to use Chinese ultra-light APIs or self-host open-source weights.

Can you give me the pricing estimate of HelpingAI's models?

HelpingAI does not publish any per-token pricing on its own website or in its GitHub repos at the moment.
All current models (HelpingAI-15B, HelpingAI2-9B, HelpingAI2-5B, Dhanistha-large, etc.) are released only as open-source weights under Apache-2.0 on Hugging Face.
That means:
  • Cost = whatever you pay for compute.
    – A 9 B model on an RTX 4090 at $0.50/hr → ≈ $0.002 per 1 M tokens.
  • No managed API is offered today; you self-host or use a third-party provider that has mirrored the weights (none of the major serverless endpoints list HelpingAI yet).
So, until HelpingAI launches a hosted endpoint, the effective price is the GPU rental cost of the machine you run it on.
