Monday, December 8, 2025

Where We Stand on AGI: Latest Developments, Numbers, and Open Questions


See All Articles on AI

Executive summary (one line)

Top models have made rapid, measurable gains (e.g., GPT‑5 reported around 50–70% on several AGI-oriented benchmarks), but persistent, hard-to-solve gaps — especially durable continual learning, robust multimodal world models, and reliable truthfulness — mean credible AGI timelines still range from a few years (for narrow definitions) to several decades (for robust human‑level generality). Numbers below are reported by labs and studies; where results come from internal tests or single groups I flag them as provisional.

Quick snapshot of major recent headlines

  • OpenAI released GPT‑5 (announced Aug 7, 2025) — presented as a notable step up in reasoning, coding and multimodal support (press release and model paper reported improvements).
  • Benchmarks and expert studies place current top models roughly “halfway” to some formal AGI definitions: a ten‑ability AGI framework reported GPT‑4 at 27% and GPT‑5 at 57% toward its chosen AGI threshold (framework authors’ reported scores).
  • Some industry/academic reports and panels (for example, an MIT/Arm deep dive) warn AGI‑like systems might appear as early as 2026; other expert surveys keep median predictions later (many 50%‑probability dates clustered around 2040–2060).
  • Policy and geopolitics matter: RAND (modeling reported Dec 1, 2025) frames the US–China AGI race as a prisoner’s dilemma — incentives favor speed absent stronger international coordination and verification.

Methods and definitions (short)

What “AGI score” means here: this article cites several benchmarking frameworks that combine multiple task categories (reasoning, planning, perception, memory, tool use). Each framework weights abilities differently and maps aggregate performance to a 0–100% scale relative to an internal "AGI threshold" chosen by its authors. These mappings are normative, not universally agreed, so percentages should be read as framework‑specific progress indicators, not absolute measures of human‑level general intelligence. A toy example of this kind of weighted aggregation appears below.
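To make the aggregation concrete, here is a minimal sketch of how such a framework-style score could be computed. The ability names, weights, and per-ability scores are hypothetical illustrations, not any actual framework's values:

```python
# Minimal sketch of a framework-style "AGI score": a weighted average of
# per-ability scores mapped against an author-chosen threshold.
# Ability names, weights, and scores are hypothetical, not a real framework's.

ABILITY_WEIGHTS = {            # must sum to 1.0
    "reasoning": 0.25,
    "planning": 0.15,
    "perception": 0.20,
    "memory": 0.20,
    "tool_use": 0.20,
}

def agi_score(ability_scores: dict[str, float]) -> float:
    """Weighted aggregate on a 0-100% scale, relative to the framework's threshold."""
    return sum(ABILITY_WEIGHTS[a] * s for a, s in ability_scores.items())

# Hypothetical per-ability scores (0-100, where 100 = the framework's AGI threshold)
model = {"reasoning": 70, "planning": 55, "perception": 45, "memory": 40, "tool_use": 60}
print(f"Framework-specific AGI score: {agi_score(model):.0f}%")  # ~55%
```

Changing the weights changes the headline percentage, which is exactly why scores from different frameworks are not comparable.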

Provenance notes: I flag results as (a) published/peer‑reviewed, (b) public benchmark results, or (c) reported/internal tests by labs. Where items are internal or single‑lab reports they are provisional and should be independently verified before being used as firm evidence.

Benchmarks and headline numbers (compact table)

| Benchmark | What it measures | Model / Score | Human baseline / Notes | Source type |
| --- | --- | --- | --- | --- |
| Ten‑ability AGI framework | Aggregate across ~10 cognitive abilities | GPT‑4: 27% · GPT‑5: 57% | Framework‑specific AGI threshold (authors' mapping) | Reported framework scores (authors) |
| SPACE (visual reasoning subset) | Visual reasoning tasks (subset) | GPT‑4o: 43.8% · GPT‑5 (Aug 2025): 70.8% | Human average: 88.9% | Internal/public benchmark reports (reported) |
| MindCube | Spatial / working‑memory tests | GPT‑4o: 38.8% · GPT‑5: 59.7% | Still below typical human average | Benchmark reports (reported) |
| SimpleQA | Hallucination / factual accuracy | GPT‑5: hallucinations in >30% of questions (reported) | Some other models (e.g., Anthropic Claude variants) report lower hallucination rates | Reported / model vendor comparisons |
| METR endurance test | Sustained autonomous task performance | GPT‑5.1‑Codex‑Max: ~2 hours 42 minutes · GPT‑4: a few minutes | Measures autonomous chaining and robustness over time | Internal lab test (provisional) |
| IMO 2025 (DeepMind Gemini, "Deep Think" mode) | Formal math problem solving under contest constraints | Solved 5 of 6 problems within 4.5 hours (gold‑level performance reported) | Shows strong formal reasoning in a constrained task | Reported by DeepMind (lab result) |

Where models still struggle (the real bottlenecks)

  • Continual learning / long‑term memory: Most models remain effectively "frozen" after training; reliably updating and storing durable knowledge over weeks or months remains unsolved and is widely cited as a high‑uncertainty obstacle (one commonly discussed mitigation, rehearsal via a replay buffer, is sketched after this list).
  • Multimodal perception (vision & world models): Text and math abilities have improved faster than visual induction and physical‑world modeling; visual working memory and physical plausibility judgments still lag humans.
  • Hallucinations and reliable retrieval: High‑confidence errors persist (SimpleQA >30% hallucination reported for GPT‑5 in one test); different model families show substantial variance.
  • Low‑latency tool use & situated action: Language is fast; perception‑action loops and real‑world tool use (robotics) remain harder and slower.
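As flagged in the first bullet above, continual learning is an open problem. One commonly discussed mitigation is rehearsal: a replay buffer mixes a sample of past data into each update so new learning is less likely to overwrite old knowledge. The sketch below assumes nothing about any particular lab's method and uses standard reservoir sampling:

```python
# Sketch of a replay buffer for rehearsal-based continual learning.
# Reservoir sampling keeps a uniform random sample of everything seen so far,
# which can be replayed alongside new data during training. Pure illustration;
# real continual-learning systems are far more involved, and the problem is open.
import random

class ReplayBuffer:
    def __init__(self, capacity=10_000):
        self.capacity, self.items, self.seen = capacity, [], 0

    def add(self, example):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(example)
        else:
            j = random.randrange(self.seen)  # replace with prob capacity/seen
            if j < self.capacity:
                self.items[j] = example

    def sample(self, k):
        return random.sample(self.items, min(k, len(self.items)))

buffer = ReplayBuffer()
for step in range(100_000):                # stream of new experiences
    buffer.add(("example", step))
old_batch = buffer.sample(32)              # replayed alongside fresh data
print(len(old_batch), "replayed examples")
```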

How researchers think we’ll get from here to AGI

Two broad routes dominate discussion:

  1. Scale current methods: Proponents argue more parameters, compute and better data will continue yielding returns. Historical training‑compute growth averaged ~4–5×/year (with earlier bursts up to ~9×/year until mid‑2020).
  2. New architectures / breakthroughs: Others (e.g., prominent ML researchers) argue scaling alone won’t close key gaps and that innovations (robust world models, persistent memory systems, tighter robotics integration) are needed.

Compute projections vary: one analysis (Epoch AI) suggested training budgets up to ~2×10^29 FLOPs could be feasible by 2030 under optimistic assumptions; other reports place upper bounds near ~3×10^31 FLOPs depending on power and chip production assumptions.
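For intuition, a back-of-envelope projection shows how quickly training compute grows under the rates cited above. The 2025 baseline of ~5×10^26 FLOPs is an illustrative assumption, not a figure from the cited analyses:

```python
# Back-of-envelope projection of frontier training compute under sustained
# exponential growth. Baseline and growth rates are illustrative assumptions.

def projected_flops(baseline: float, growth_per_year: float, years: int) -> float:
    return baseline * growth_per_year ** years

baseline_2025 = 5e26          # assumed frontier training run in 2025, FLOPs
for growth in (4.0, 5.0):     # ~4-5x/year historical average cited above
    flops_2030 = projected_flops(baseline_2025, growth, years=5)
    print(f"{growth:.0f}x/year -> 2030 run: {flops_2030:.1e} FLOPs")
# 4x/year -> ~5.1e29; 5x/year -> ~1.6e30 -- between Epoch AI's ~2e29
# feasibility estimate and the ~3e31 upper bounds mentioned above.
```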

Timelines: why predictions disagree

Different metrics, definitions and confidence levels drive wide disagreement. Aggregated expert surveys show medians often in the 2040–2060 range, while some narrow frameworks and industry estimates give earlier dates (one internal framework estimated 50% by end‑2028 and 80% by end‑2030 under its assumptions). A minority of experts and some industry reports have suggested AGI‑like capabilities could appear as early as 2026. When using these numbers, note the underlying definition of AGI, which benchmark(s) are weighted most heavily, and whether the estimate is conditional on continued scaling or a specific breakthrough.

Risks, governance and geopolitics

  • Geopolitics: RAND models (Dec 1, 2025 reporting) show a prisoner’s dilemma: nations face incentives to accelerate unless international verification and shared risk assessments improve.
  • Security risks: Reports warn of misuse (e.g., advances in bio‑expertise outputs), espionage, and supply‑chain chokepoints (chip export controls and debates around GPU access matter for pace of progress).
  • Safety strategies: Proposals range from technical assurance and transparency to verification regimes and deterrence ideas; all face verification and observability challenges.
  • Ethics and law: Active debates continue over openness, liability, and model access control (paywalls vs open releases).

Bottom line for students (and what to watch)

Progress is real and measurable: top models now match or beat humans on many narrow tasks, have larger context windows, and can sustain autonomous code writing for hours in some internal tests. But key human‑like capacities — durable continual learning, reliable multimodal world models, and trustworthy factuality — remain outstanding. Timelines hinge on whether these gaps are closed by continued scaling, a single breakthrough (e.g., workable continual learning), or new architectures. Policy and safety research must accelerate in parallel.

Watch these signals: AGI‑score framework updates, SPACE / IntPhys / MindCube / SimpleQA benchmark results, compute growth analyses (e.g., Epoch AI), major model releases (GPT‑5 and successors), METR endurance reports, and policy studies like RAND’s — and when possible, prioritize independently reproducible benchmark results over single‑lab internal tests.

References and sources (brief)

  • OpenAI GPT‑5 announcement — Aug 7, 2025 (model release/press materials; reported performance claims).
  • Ten‑ability AGI framework — authors’ reported scores for GPT‑4 (27%) and GPT‑5 (57%) (framework paper/report; framework‑specific mapping to AGI threshold).
  • SPACE visual reasoning subset results — reported GPT‑4o 43.8%, GPT‑5 (Aug 2025) 70.8%, human avg 88.9% (benchmark report / lab release; flagged as reported/internal where applicable).
  • MindCube spatial/working‑memory benchmark — reported GPT‑4o 38.8%, GPT‑5 59.7% (benchmark report).
  • SimpleQA factuality/hallucination comparison — GPT‑5 reported >30% hallucination rate; other models (Anthropic Claude variants) report lower rates (vendor/benchmark reports).
  • METR endurance test — reported GPT‑5.1‑Codex‑Max sustained autonomous performance ~2 hours 42 minutes vs GPT‑4 few minutes (internal lab test; provisional).
  • DeepMind Gemini ("Deep Think" mode) — reported solving 5 of 6 IMO 2025 problems within 4.5 hours (DeepMind report; task‑constrained result).
  • Epoch AI compute projection — suggested ~2×10^29 FLOPs feasible by 2030 under some assumptions; other reports give upper bounds up to ~3×10^31 FLOPs (compute projection studies).
  • RAND modeling of US–China race — reported Dec 1, 2025 (prisoner’s dilemma framing; policy analysis report).
  • Expert surveys and timeline aggregates — multiple surveys report medians often in 2040–2060 with notable variance (survey meta‑analyses / aggregated studies).

Notes: Where a result was described in the original draft as coming from “internal tests” or a single lab, I preserved the claim but flagged it above as provisional and recommended independent verification. For any use beyond classroom discussion, consult the original reports and benchmark datasets to confirm methodology, sample sizes, dates and reproducibility.

Tags: Artificial Intelligence,Technology,

Latest Updates on Development at Gurgaon/Gurugram Railway Station and Gurugram Metro

See All Articles

Executive summary

  • New elevated Gurugram Metro corridor (HUDA/Millennium City Centre → Cyber City via Old Gurugram) approved; ~28.5 km including a 1.85 km Dwarka Expressway spur, 27 stations (including 1 depot).
  • Construction activity underway (casting yard, piling); Phase 1 civil contract reportedly awarded to a Dilip Buildcon Ltd + RBL JV with work targeted to start Sep 2025 and complete civil works within 30 months.
  • A separate 1.8 km, Rs 450 crore spur is proposed to link Sector 5 directly to Gurugram Railway Station (RITES DPR completed; to be considered by the GMRL board, Haryana government and Union ministries).
  • Several cost, ridership and date figures are reported differently across outlets—these are presented below with source‑style attribution (reported/one report/another report) rather than single definitive values.

Key numbers at a glance

| Item | Figure(s) / note |
| --- | --- |
| Total corridor length | ~28.5 km (main line ~26.65 km + Dwarka Expressway spur ~1.85 km) |
| Stations | 27 approved (includes 1 depot) |
| Signalling | CBTC (Communication‑Based Train Control) reported |
| Estimated construction cost | Reported: Rs 6,800 crore (one report); alternate breakdown: Centre Rs 896.19 crore + Haryana Rs 4,556.53 crore = Rs 5,452.72 crore (another report) |
| Phase 1 reported cost | Reported: Rs 1,286 crore (tender estimate in one report); civil contract value reported as INR 1,503 crore (award to JV) |
| Ridership projection | Reported: ~5.4 lakh by 2026 and ~7.26 lakh by 2031; corridor said to benefit >2.5 million people (reported figures) |
| Sector 5 → Railway Station spur | Length 1.8 km; estimated cost Rs 450 crore; RITES DPR completed (reported) |
| Phase 1 construction timing | Tender float reported end‑July 2025 (one report); contract start Sep 2025 (reported); Phase 1 civil works targeted within 30 months from start (reported) |

Approved stations (27)

The approved corridor will have 27 stations. Important named stations (as reported):

  • Millennium City Centre (HUDA City Centre) — interchange with Delhi Metro Yellow Line
  • Sector 45
  • Cyber Park (Sector 46)
  • Sector 47
  • Subhash Chowk (proposed interchange with future Bhondsi line)
  • Sector 48
  • Sector 72A
  • Hero Honda Chowk
  • Udyog Vihar Phase 6
  • Sector 10 (near Bus Terminal)
  • Sector 37
  • Basai Village (junction for Dwarka Expressway spur)
  • Sector 9
  • Sector 7
  • Sector 4 (near Gurgaon Railway Station)
  • Sector 5
  • Ashok Vihar
  • Sector 3
  • Bajghera Road
  • Palam Vihar Extension
  • Palam Vihar
  • Sector 23A
  • Sector 22
  • Udyog Vihar Phase 4 & 5
  • Cyber City (interchange with Rapid Metro)
  • Dwarka Expressway (Sector 101) — on the 1.85 km spur

(26 named stations; the approved count of 27 includes the depot.)

Phases, contractors and timelines (reported)

| Phase | Scope (reported) | Schedule / status |
| --- | --- | --- |
| Ph 1 | Millennium City Centre (HUDA City Centre) → Sector 9; various reports: 15 km / 15.2 km; one report: 15 stations; another: 14 stations | Tender expected end‑July 2025 (one report). Civil contract reportedly awarded to a JV of Dilip Buildcon Ltd (DBL) & RBL (lowest bidder). Construction anticipated to begin Sep 2025; reported contract value INR 1,503 crore; 30‑month completion target for the contract (reported) |
| Ph 2 | Sector 9 → Cyber City; length reported ~13–16 km depending on source; may include Sector 5 spur to Railway Station | Tenders and geotechnical surveys being floated (reported) |
| Ph 3 | Metro depot development in Sector 33 (reported) | Depot works to follow main civil packages (reported) |

Important reported dates & progress notes

  • Bhoomi pujan / foundation stone: reported on 3 September 2025 (one report) and 5 September 2025 (another report); leaders reported present included Haryana Chief Minister Nayab Singh Saini and Union Minister Manohar Lal Khattar (reported).
  • Site works: small works and piling have started; contractor began piling near the Sector 31 traffic signal and a casting yard beside Hero Honda Chowk was being prepared (reported).
  • Planned schedule in one report: site work end‑2025; major construction 2026–2027; testing 2028; public operations around 2029. Other reporting aligns Phase 1 structural completion within 30 months from contract start (reported).

Sector 5 → Gurugram Railway Station spur (detailed)

  • Length: 1.8 km (reported). Estimated cost: Rs 450 crore (reported).
  • Status: RITES DPR/study reportedly completed; proposal to be sent to GMRL board, then Haryana government and Union Ministry for approval; planned inclusion in Phase 2 tender once approved (reported).
  • Land needs for Railway Station entry/exit: total ~1,069 sq m (419 sq m government land; 446 sq m private land; 204 sq m railway land). Sector 5 station area needs ~605 sq m more (reported).
  • Earlier option: skywalk with escalators; state decided dedicated metro spur preferable to boost ridership and connectivity (reported).
  • Separately, an HMRTC DPR for a Bhondsi–Gurugram railway station line is being prepared; with this spur the Bhondsi line may terminate at Sector 5 (reported).

Construction setup & technology

On‑site preparations include a three‑hectare casting yard producing precast U‑girders, pier caps and viaduct segments. Load testing on piles is underway; full piling will follow clearance. The contractor has planned facilities for over 1,200 workers in two shifts to enable near‑round‑the‑clock construction. Signalling is reported to use CBTC for safety and precise train control.

Land acquisition policy (reported)

Haryana approved a new land acquisition policy for the Gurugram Metro expansion (reported highlights): GMRL authorised to buy private land by negotiated settlement (direct purchase); an 11‑member Land Acquisition Committee chaired by the Gurugram Deputy Commissioner will oversee negotiations; purchase price to be consolidated and include compensation and rehabilitation as per RFCTLARR Act schedules; no extra R&R beyond negotiated price for titleholders; non‑titleholders handled separately. The process involves site inspection → committee review → public notice → negotiation & purchase → transfer, etc. (reported).

Other network expansions mentioned (reported)

  • Ballabgarh → Palwal: techno‑feasibility underway; ~25 km, ~10 stations (reported).
  • Vatika Chowk → Pachgaon: DPR underway; ~30 km (reported).
  • Sector 45 (Gurgaon) → Bata Chowk (Faridabad): ~31 km corridor planned (reported).
  • Millennium City Centre → Gurgaon Railway Station: alternate route ~11.15 km passing Rajiv Chowk, Subhash Chowk and New Colony Mor (reported).
  • Sector 56 → Pachgaon: new 35.2 km line with 28 stations; estimated cost ~Rs 8,500 crore; Haryana to fund most and seek a 10% central grant; will link to RRTS at Kherki Daula and Pachgaon (reported).

Gurugram Railway Station redevelopment (EPC project)

Reported EPC redevelopment elements: Main Station Building (height 41.3 m), air concourse, through roof, foot over bridge, multi‑level car parking (24.8 m), platform refurbishment, sewage & water treatment, rainwater harvesting and structural glazing. Project targets IGBC Platinum rating and includes sustainable passenger‑experience upgrades (reported).

Real‑world effects and practical notes

Officials expect the corridor to better connect New and Old Gurugram, improve mobility and stimulate economic and real estate activity in Old Gurugram (reported). Reported property price ranges: Sector 4 buying Rs 4,600–7,600 / sq ft (rent Rs 9–15 / sq ft); Palam Vihar buying Rs 7,700–12,000 / sq ft (rent Rs 13–21 / sq ft); Sector 9 buying Rs 5,993–6,351 / sq ft; Ashok Vihar buying Rs 6,027–6,112 / sq ft (reported).

Interoperability: reports say travel will be possible with Delhi Metro cards; interchange with Delhi Metro Yellow Line at HUDA City Centre and Rapid Metro at Cyber City. One article described Gurugram Metro as "first privately owned and operated metro project in India" — this is an attributed claim in media reports and should be treated as reported (unverified here).

What’s confirmed vs reported (guidance)

  • Confirmed (strongly reported across sources): project approved; GMRL as executing agency; 27 stations approved; construction site works (casting yard, piling) underway; RITES DPR prepared for Sector 5 spur (reported consistently).
  • Reported / unverified / conflicting across outlets: exact total project cost (Rs 6,800 crore vs Rs 5,452.72 crore breakdown), Phase 1 cost/tender figures (Rs 1,286 crore vs INR 1,503 crore contract), bhoomi pujan date (3 Sept vs 5 Sept 2025), ridership projections and the "first privately owned" ownership claim. Treat these as provisional until official GMRL/state/central releases confirm.

Glossary (brief)

  • CBTC — Communication‑Based Train Control (modern signalling system).
  • DPR — Detailed Project Report.
  • RITES — Government consultancy agency (conducts DPRs/studies).
  • EPC — Engineering, Procurement & Construction (contract type).
  • RFCTLARR — Right to Fair Compensation and Transparency in Land Acquisition, Rehabilitation and Resettlement Act.
  • HMRTC — Haryana Mass Rapid Transport Corporation (planning body referenced in reports).
  • RRTS — Regional Rapid Transit System (interchange links reported at Kherki Daula, Pachgaon).

Summary (plainly)

In short: the Gurugram Metro corridor (~28.5 km, 27 stations) is approved and early construction activity has begun. Key immediate items: Phase 1 civil contract reportedly awarded with a Sep 2025 start target; a separate 1.8 km, Rs 450 crore Sector 5 → Gurugram Railway Station spur has a RITES DPR and awaits board/government approvals; multiple cost, date and ridership figures are reported differently by outlets and should be treated as provisional until official releases clarify them. I can convert this into a one‑page timeline diagram or a compact station map summary on request.

Tags: Gurugram,Railways,

What’s New in Quantum Computing? A Friendly Update for Students (2024–2025)

See All Articles

Quantum computing — quick refresher

A quantum computer is a machine that uses quantum mechanics (things like superposition and entanglement) to represent and process information. Instead of ordinary bits (0 or 1), quantum bits or qubits can be in combinations of 0 and 1 at once, and qubits can become entangled so their states link together. If large-scale quantum computers become real, they could solve some problems far faster than classical machines (Shor’s algorithm for factoring is the classic example). Right now the field is still early: experiments have run on small numbers of qubits and work continues across hardware, software, theory, and applications. Many governments and military agencies fund quantum research because of potential civilian uses and security implications (like cryptanalysis). Note: without genuine quantum resources such as entanglement, experts generally think you can’t get an exponential advantage over classical computers.
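For students who want to see superposition and entanglement concretely, here is a minimal state-vector simulation (ordinary classical code, not a quantum computer) that prepares a two-qubit Bell state with numpy:

```python
# Minimal state-vector sketch showing superposition and entanglement
# for two qubits, using only numpy.
import numpy as np

ket0 = np.array([1, 0], dtype=complex)                        # qubit in state |0>
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)   # Hadamard gate
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)                # controlled-NOT gate

# Put qubit A in superposition, then entangle it with qubit B -> Bell state
state = CNOT @ np.kron(H @ ket0, ket0)
probs = np.abs(state) ** 2
for bits, p in zip(["00", "01", "10", "11"], probs):
    print(f"P({bits}) = {p:.2f}")   # 0.50, 0.00, 0.00, 0.50
```

The 50/50 split between 00 and 11, with 01 and 10 never occurring, is the entanglement: measuring one qubit instantly fixes the other.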

Hardware highlights — chips, qubits and interconnects

A lot of the recent work focuses on making qubits more reliable, connecting them, and scaling up:

  • New device records and materials: Researchers reported a terahertz device that sets a performance record and “opens new quantum horizons” — improvements like this can enable better control or readout of quantum states. Other work demonstrated control of triple quantum dots in a zinc oxide (ZnO) semiconductor, expanding the set of materials and device types being explored.

  • Majorana and other processor work: Microsoft unveiled a Majorana-based processor dubbed Majorana 1. It’s being talked about as a potentially transformative step — Majorana fermions are special quasiparticles that could help reduce certain types of errors.

  • Photonics and interconnects: Photonic approaches are getting attention. There’s progress on efficient quantum process tomography (techniques to characterize quantum operations) aimed at scalable optical quantum computing. MIT researchers developed a photon-shuttling “interconnect” that enables direct communication among multiple quantum processors and facilitates remote entanglement — a key step toward distributed quantum computing. MIT also reported a fast coupling between artificial atoms and photons that could enable readout and processing of quantum information in a few nanoseconds.

  • 3D chips and superconducting semiconductors: MIT teams reported new 3D chips that could make electronics faster and more energy-efficient, and work toward superconducting semiconductors that might one day replace components in quantum and high-performance computing.

  • Corral technique and fragile states: A “corral” measurement technique was used to observe fragile quantum states in magnet–superconductor hybrid materials from a distance, which helps study sensitive quantum behavior without destroying it.

Photonics, twisted light, and strong-field quantum effects
Photonics (using light) keeps feeding progress: researchers advanced quantum signaling using “twisted light” (light carrying orbital angular momentum), and synchrotron radiation sources are being framed as toolboxes for quantum technologies. Studies using bright squeezed vacuum uncovered hidden quantum effects in strong-field physics, pointing to new regimes where quantum light matters for experiments and devices.

Networks and communications — building quantum links

Quantum networks are moving from theory to real deployments:

  • IonQ expanded into the EU by helping establish Slovakia’s first national quantum communication network.

  • New partnerships: New Zealand partnered with Korea on quantum communication projects, and many countries are deepening quantum ties (for example the UK and Germany committed £14 million to joint efforts).

  • Practical hacks and products: There was a surprising demonstration where a shop-bought cable helped power two quantum networks — this highlights how some quantum testbeds can use surprisingly simple hardware in creative ways.

  • Commercial products: Autocrypt announced a post-quantum PKI product for automotive OEMs (press release dated December 8, 2025), aiming to prepare vehicle systems for future cryptographic threats from quantum computers.

Industry, funding and national strategies
Quantum is attracting money and national initiatives:

  • Investments and deals: Horizon Quantum raised a $110 million PIPE with IonQ among lead investors, intended to support a SPAC merger. Niobium raised more than $23 million to advance next-generation FHE hardware. Delft Circuits appointed Martin Danoesastro as CEO and extended funding. ParityQC won a contract from DLR (German Aerospace Center) to integrate quantum computing into mobility solutions. SEALSQ made a strategic investment in EeroQ.

  • National plans and events: The “Quantum World Tour” and many international events are promoting national visions (e.g., Brazil, Saudi Arabia, Malta, Australia). The UK launched five research hubs with £100 million funding, including one in Oxford. Many countries (China, India, New Zealand, UK, Germany, etc.) are building quantum roadmaps, aiming to develop startups and scientific leadership.

Companies and software direction

  • Quantum Source outlined engineering pathways to fault-tolerant quantum computing and promoted scalable photon–atom technology as a practical route.
  • Microsoft researchers emphasized geometric error-correcting codes as steps toward useful applications.
  • Startups and platforms like qBraid are making it easier for nontechnical users to access quantum devices through cloud interfaces.

  • Coverage and market watch: Reports examined “What is the price of a quantum computer in 2025?” and mapped the global quantum landscape, helping businesses and researchers plan strategies.

Theory, algorithms and cryptography

  • New algorithms and codes: There was news about a new quantum algorithm that speeds up solving a broad class of problems, and three-way entanglement results hint at better quantum error-correcting codes.

  • Noise and error correction: Symmetry-based simplifications of quantum noise analysis were reported, which may pave the way for better error correction. Efficient process tomography work supports scalable verification of photonic quantum processors.

  • Security implications: As quantum power grows, cryptographers are working out new rules for quantum-era encryption. Coverage warned of a "Quantum Apocalypse" — the idea that powerful quantum machines could threaten present-day encryption (Shor's algorithm is a core reason; a toy sketch of its number-theoretic core follows this list). In response, companies and services (for example, Apple updating iMessage) are working on quantum-resistant encryption strategies. There was also a cautionary note: a new attack recently invalidated a candidate encryption algorithm, a reminder that both quantum and classical cryptography evolve quickly.
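The toy sketch below shows the number theory that makes Shor's algorithm threatening: factoring N reduces to finding the period r of a^x mod N. Here the period is brute-forced classically for a tiny N; a quantum computer's advantage is finding r efficiently for numbers thousands of bits long:

```python
# Toy illustration of the reduction at the heart of Shor's algorithm:
# factor N by finding the period r of a^x mod N, then taking gcds.
# We brute-force r classically here; the quantum speedup is in finding r.
from math import gcd

def find_period(a: int, N: int) -> int:
    x, r = a % N, 1
    while x != 1:
        x = (x * a) % N
        r += 1
    return r

N, a = 15, 7                       # tiny demo values, gcd(a, N) == 1
r = find_period(a, N)              # r = 4
if r % 2 == 0:
    p = gcd(a ** (r // 2) - 1, N)  # 3
    q = gcd(a ** (r // 2) + 1, N)  # 5
    print(f"{N} = {p} x {q}")      # 15 = 3 x 5
```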

Science community and recognition

  • MIT's Quantum Initiative is growing, and MIT researchers won recognition (Lincoln Laboratory technologies won seven R&D 100 Awards in 2025). MIT published many quantum-related stories in 2025, from quantum modeling for materials to device advances. Daniel Kleppner, a highly influential atomic physicist linked to quantum advances, died at 92 (July 15, 2025).

Big picture and timescales

Different players give different timelines. Microsoft has suggested powerful quantum machines could arrive "in years not decades," while others urge cautious, stepwise progress. The field is broad: hardware (Majorana, superconductors, photonics), networks, error correction, algorithms, and national strategies are all moving in parallel. Phys.org, Quantum Insider, Wired, BBC, MIT News and other outlets have tracked this progress.

If you’re a student curious about quantum computing: focus on basic quantum concepts (superposition, entanglement), get comfortable with linear algebra, and follow hardware (superconducting qubits, ion traps, photonics) and software (error correction, algorithms). The field is fast-moving, international, and full of interdisciplinary opportunities — from building new chips and networks to designing the cryptography of the future.

Tags: Technology,Quantum Computing,

Sunday, December 7, 2025

Model Alert... World Labs launched Marble -- Generated, Editable Virtual Spaces

See All on AI Model Releases

Generated, Editable Virtual Spaces

 

Models that generate 3D spaces typically render them on the fly as users move through them, without producing a persistent world that can be explored later. A new model produces 3D worlds that can be exported and modified.

 

What’s new: World Labs launched Marble, which generates persistent, editable, reusable 3D spaces from text, images, and other inputs. The company also debuted Chisel, an integrated editor that lets users modify Marble’s output via text prompts and craft environments from scratch.

  • Input/output: Text, images, panoramas, videos, 3D layouts of boxes and planes in; Gaussian splats, meshes, or videos out.
  • Features: Expand spaces, combine spaces, alter visual style, edit spaces via text prompts or visual inputs, download generated spaces
  • Availability: Subscription tiers include Free (4 outputs based on text, images, or panoramas), $20 per month (12 outputs based on multiple images, videos, or 3D layouts), $35 per month (25 outputs with expansion and commercial rights), and $95 per month (75 outputs, all features)

How it works: Marble accepts several media types and exports 3D spaces in a variety of formats.

  • The model can generate a 3D space from a single text prompt or image. For more control, it accepts multiple images with text prompts (like front, back, left, or right) that specify which image should map to what areas. Users can also input short videos, 360-degree panoramas, or 3D models and connect outputs to build complex spaces.
  • The Chisel editor can create and edit 3D spaces directly. Geometric shapes like planes or blocks can be used to build structural elements like walls or furniture and styled via text prompts or images.
  • Generated spaces can be extended or connected by clicking on the relevant area.
  • Model outputs can be Gaussian splats (high-quality representations composed of semi-transparent particles that can be rendered in web browsers), collider meshes (simplified 3D geometries that define object boundaries for physics simulations), and high-quality meshes (detailed geometries suitable for editing). Video output can include controllable camera paths and effects like smoke or flowing water.
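To make "Gaussian splats" concrete: each splat is a small 3D Gaussian with a position, shape, opacity, and color, and a scene is a large cloud of them blended at render time. The field names below are illustrative; real splat formats differ in detail:

```python
# Minimal sketch of the data carried by a single Gaussian splat.
# Field names are illustrative; real formats (e.g., .ply splat files)
# store more detail, such as spherical-harmonic color coefficients.
from dataclasses import dataclass

@dataclass
class GaussianSplat:
    position: tuple[float, float, float]          # center of the Gaussian in space
    scale: tuple[float, float, float]             # extent along each local axis
    rotation: tuple[float, float, float, float]   # orientation quaternion (w, x, y, z)
    opacity: float                                # 0 = transparent, 1 = opaque
    color: tuple[float, float, float]             # RGB

# A scene is just a large collection of these, blended during rendering
scene = [GaussianSplat((0, 0, 0), (0.1, 0.1, 0.1), (1, 0, 0, 0), 0.8, (0.9, 0.2, 0.2))]
print(len(scene), "splat(s)")
```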

Performance: Early users report generating game-like environments and photorealistic recreations of real-world locations.

  • Marble generates more complete 3D structures than depth maps or point clouds, which represent surfaces but not object geometries, World Labs said.
  • Its mesh outputs integrate with tools commonly used in game development, visual effects, and 3D modeling.

Behind the news: Earlier generative models can produce 3D spaces on the fly, but typically such spaces can’t be saved or revisited interactively. For instance, in October, World Labs introduced RTFM, which generates spaces in real time as users navigate through them. Competing systems from startups like Decart and Odyssey are available as demos, and Google’s Genie 3 remains a research preview. Marble stands out by generating spaces that can be saved and edited.

 

Why it matters: World Labs founder and Stanford professor Fei-Fei Li argues that spatial intelligence — understanding how physical objects occupy and move through space — is a key aspect of intelligence that language models can’t fully address. With Marble, World Labs aspires to catalyze development in spatial AI just as ChatGPT and subsequent large language models ignited progress in text processing.

 

We’re thinking: Virtual spaces produced by Marble are geometrically consistent, which may prove valuable in gaming, robotics, and virtual reality. However, the objects within them are static. Virtual worlds that include motion will bring AI even closer to understanding physics.

 

Tags: AI Model Alert,Artificial Intelligence,Technology,

Model Alert... Open 3D Generation Pipeline -- Meta’s Segment Anything Model (SAM) image-segmentation model

See All on AI Model Releases

Open 3D Generation Pipeline

 

Meta’s Segment Anything Model (SAM) has evolved into an open-weights suite for generating 3D objects. SAM 3 segments images, SAM 3D turns the segments into 3D objects, and SAM 3D Body produces 3D figures of any people among the segments. You can experiment with all three.

 

SAM 3: SAM 3 now segments images and videos based on input text. It retains the ability to segment objects based on input geometry (bounding boxes or points that are labeled to include or exclude the objects at those locations), like the previous version. 

  • Input/output: Images, video, text, geometry in; segmented images or video out
  • Performance: In Meta’s tests, SAM 3 outperformed almost all competitors on a variety of benchmarks that test image and video segmentation. For instance, on LVIS (segmenting objects from text), SAM 3 (48.5 percent average precision) outperformed DINO-X (38.5 percent average precision). It fell behind APE-D (53.0 percent average precision), which was trained on LVIS’ training set. (A toy computation of average precision follows this list.)
  • Availability: Weights and fine-tuning code freely available for noncommercial and commercial uses in countries that don’t violate U.S., EU, UK, and UN trade restrictions under Meta license 
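As referenced above, average precision (AP) summarizes ranked detection quality as the area under the precision-recall curve. A toy computation with made-up detections:

```python
# Minimal sketch of average precision (AP): area under the precision-recall
# curve for detections ranked by confidence. Scores and labels are toy values.
import numpy as np

def average_precision(scores, labels):
    order = np.argsort(scores)[::-1]            # rank detections by confidence
    tp = np.array(labels)[order]                # 1 = correct detection
    cum_tp = np.cumsum(tp)
    precision = cum_tp / (np.arange(len(tp)) + 1)
    recall = cum_tp / tp.sum()
    d_recall = np.diff(np.concatenate(([0.0], recall)))
    return float(np.sum(precision * d_recall))  # area under the P-R curve

scores = [0.9, 0.8, 0.7, 0.6, 0.5]              # detection confidences (toy)
labels = [1, 0, 1, 1, 0]                        # ground-truth correctness (toy)
print(f"AP = {average_precision(scores, labels):.2f}")  # 0.81
```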

SAM 3D: This model generates 3D objects from images based on segmentation masks. By individually predicting each object in an image, it can represent the entire scene. It can also take in point clouds to improve its output.

  • Input/output: Image, mask, point cloud in; 3D object (mesh, Gaussian splat) out
  • Performance: Judging both objects and scenes generated from photos, humans preferred SAM 3D’s outputs over those by other models. For instance, when generating objects from the LVIS dataset, people preferred SAM 3D nearly 80 percent of the time, Hunyuan3d 2.0 about 12 percent of the time, and other models 8 percent of the time.
  • Availability: Weights and inference code freely available for noncommercial and commercial uses in countries that don’t violate U.S., EU, UK, and UN trade restrictions under Meta license

SAM 3D Body: Meta released an additional model that produces 3D human figures from images. Input bounding boxes or masks can also determine which figures to produce, and an optional transformer decoder can refine the positions and shapes of human hands.

  • Input/output: Image, bounding boxes, masks in; 3D objects (mesh, Gaussian splat) out
  • Performance: In Meta’s tests, SAM 3D Body achieved the best performance across a number of datasets compared to other models that take images or videos and generate 3D human figures. For example, on the EMDB dataset of people in the wild, SAM 3D Body achieved 62.9 Mean Per Joint Position Error (MPJPE, a measure of how far predicted joint positions deviate from ground truth; lower is better; see the sketch after this list), compared to the next-best model, Neural Localizer Fields, at 68.4 MPJPE. On FreiHAND (a test of hand correctness), SAM 3D Body achieved similar or slightly worse performance than models that specialize in estimating hand poses. (The authors claim the other models were trained on FreiHAND’s training set.)
  • Availability: Weights, inference code, and training data freely available in countries that don’t violate U.S., EU, UK, and UN trade restrictions under Meta license
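A minimal MPJPE computation, as referenced in the performance bullet above; the joint count and noise level are arbitrary toy values:

```python
# Minimal sketch of MPJPE (Mean Per Joint Position Error): the average
# Euclidean distance between predicted and ground-truth 3D joints,
# conventionally reported in millimeters.
import numpy as np

def mpjpe(pred: np.ndarray, gt: np.ndarray) -> float:
    """pred, gt: (num_joints, 3) arrays of 3D joint positions."""
    return float(np.linalg.norm(pred - gt, axis=1).mean())

# Hypothetical toy data: 24 joints with Gaussian noise added per axis
rng = np.random.default_rng(0)
gt = rng.uniform(-1000, 1000, size=(24, 3))    # ground truth, mm
pred = gt + rng.normal(0, 40, size=(24, 3))    # noisy prediction
print(f"MPJPE: {mpjpe(pred, gt):.1f} mm")
```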

Why it matters: This SAM series offers a unified pipeline for making 3D models from images. Each model advances the state of the art, enabling more accurate image segmentation from text, 3D objects that human judges preferred, and 3D human figures that also appealed to human judges. These models are already driving innovations in Meta’s user experience. For instance, SAM 3 and SAM 3D enable users of Facebook Marketplace to see what furniture or other home decor would look like in a particular space.

 

We’re thinking:  At the highest level, all three models learned from a similar data pipeline: Find examples the model currently performs poorly on, use humans to annotate them, and train on the annotations. According to Meta’s publications, this process greatly reduced the time and money required to annotate quality datasets.

 

Tags: Technology,Artificial Intelligence,AI Model Alert,

Model Alert... Ernie -- Baidu’s Multimodal Bids

See All on AI Model Releases

Baidu’s Multimodal Bids

 

Baidu debuted two models: a lightweight, open-weights, vision-language model and a giant, proprietary, multimodal model built to take on U.S. competitors.

 

Ernie-4.5-VL-28B-A3B-Thinking: Baidu’s new open-weights model is based on the earlier Ernie-4.5-21B-A3B-Thinking, a text-only MoE reasoning model, plus a 7 billion-parameter vision encoder to process images. It outperforms comparable and larger models on visual reasoning tasks. It can extract on-screen text and analyze videos across time, and it can call tools to zoom in on image details and search for related images.

  • Input/output: Text, image, video in (up to 128,000 tokens); text out
  • Architecture: Mixture-of-experts (MoE) transformer (28 billion parameters total, 3 billion active per token): a 21 billion-parameter language decoder plus the 7 billion-parameter vision encoder (a toy sketch of MoE routing follows this list).
  • Training: The authors used vision-language reasoning examples during mid-training, an emerging phase that typically uses mid-size datasets to sharpen distinct skills or impart specific domains prior to fine-tuning. In addition, they fine-tune via reinforcement learning (RL) with multimodal data. Because MoE architectures can become unstable during RL, the team used a combination of GSPO and IcePop to stabilize the fine-tuning.
  • Features: Tool use, reasoning
  • Performance: Ernie-4.5-VL-28B-A3B-Thinking competes with larger proprietary models on document understanding tasks despite activating only 3 billion parameters, Baidu said. For instance, on ChartQA (chart interpretation), Ernie-4.5-VL-28B-A3B-Thinking reached 87.1 percent accuracy, outperforming Gemini 2.5 Pro (76.3 percent) and GPT-5 set to high reasoning (78.2 percent). On OCRBench (text recognition in images), it achieved 858, ahead of GPT-5 set to high reasoning (810) but trailing Gemini 2.5 Pro (866).
  • Availability: Weights free for noncommercial and commercial uses under Apache 2.0 license via HuggingFace. API $0.14/$0.56 per million input/output tokens via Baidu Qianfan.
  • Undisclosed: Output size limit, training data, reward models
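The routing sketch referenced in the architecture bullet: in an MoE layer, a learned router sends each token to only its top-k experts, which is how a model can have 28 billion total but only 3 billion active parameters. Sizes below are toy values, and this is a generic sketch, not Baidu's implementation:

```python
# Generic sketch of top-k mixture-of-experts routing (toy sizes, not Ernie's).
# Each token is processed by only top_k of num_experts feed-forward experts,
# so only a fraction of the layer's parameters is active per token.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    def __init__(self, dim=64, num_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)   # learned routing scores
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )
        self.top_k = top_k

    def forward(self, x):                            # x: (tokens, dim)
        weights = F.softmax(self.router(x), dim=-1)
        topw, topi = weights.topk(self.top_k, dim=-1)
        topw = topw / topw.sum(dim=-1, keepdim=True) # renormalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):               # each token visits only k experts
            for e, expert in enumerate(self.experts):
                mask = topi[:, slot] == e
                if mask.any():
                    out[mask] += topw[mask, slot, None] * expert(x[mask])
        return out

tokens = torch.randn(10, 64)
print(TinyMoE()(tokens).shape)                       # torch.Size([10, 64])
```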

Ernie-5.0: Baidu describes Ernie-5.0’s approach as natively multimodal, meaning it was trained on text, images, audio, and video together rather than fusing different media encoders after training or routing inputs to specialized models. It performs comparably to the similarly multimodal Google Gemini 2.5 or OpenAI GPT-5, according to Baidu.

  • Input/output: Text, image, audio, and video in (up to 128,000 tokens); text, image, audio, video out (up to 64,000 tokens)
  • Architecture: Mixture-of-experts (MoE) transformer (2.4 trillion parameters total, less than 72 billion active per token)
  • Features: Vision-language-audio understanding, reasoning, agentic planning, tool use
  • Performance: In Baidu’s tests of multimodal reasoning, document understanding, and visual question-answering, the company reports that Ernie-5.0 matched or exceeded OpenAI GPT-5 set to high reasoning and Google Gemini 2.5 Pro. For instance, on OCRBench (text recognition in images), DocVQA (document comprehension), and ChartQA (chart interpretation), Ernie-5.0 achieved top scores. On MM-AU (multimodal audio understanding) and TUT2017 (acoustic scene classification), it demonstrated competitive performance, Baidu said, without publishing specific metrics.
  • Availability: Free web interface, API $0.85/$3.40 per million input/output tokens via Baidu Qianfan
  • Undisclosed: Training data, training methods

Yes, but: Shortly after Ernie-5.0’s launch, a developer reported that the model repeatedly called tools even after being instructed not to. Baidu acknowledged the issue and said it was fixing it.

 

Why it matters: Ernie-4.5-VL-28B-A3B-Thinking offers top visual reasoning at a fraction of the cost of competing models, plus more flexibility for fine-tuning and other commercial customizations. However, the long-awaited Ernie-5.0 appears to fall short of expectations. It matches top models on some visual tasks but trails the forefront (including Qwen3-Max and Kimi-K2-Thinking) on leaderboards like LM Arena. Pretraining on text, images, video, and audio together is a relatively fresh approach that could simplify current systems that piece together different encoders and decoders for different media types.

 

We’re thinking: Ernie-5.0 may outperform Gemini 2.5 and GPT-5, but Google and OpenAI have already moved on to Gemini 3 and GPT-5.1!