Tuesday, June 16, 2026

Attempting to run a local LLM via LM Studio

See All on GenAI    « Previously

Explain these logs... (logs attached at the end of the post)




The key error is this line:

ggml_vulkan: device Vulkan0 does not support 16-bit storage.
...
llama_model_load: error loading model: Unsupported device

Everything else is LM Studio repeatedly trying to load different GGUF models and failing for the same reason.

What's happening

LM Studio is using the Vulkan backend (GPU acceleration via llama.cpp/ggml).

When it starts loading a model, it detects your Vulkan-capable GPU:

MESA-INTEL: warning: Haswell Vulkan support is incomplete

This is already a warning sign. It means you're using an Intel Haswell-generation GPU (around 2013–2015), and Mesa's Vulkan implementation for it is incomplete.

Then:

ggml_vulkan: device Vulkan0 does not support 16-bit storage.

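Modern llama.cpp Vulkan kernels require 16-bit storage support, which Vulkan exposes through the extension:

VK_KHR_16bit_storage

(promoted to core in Vulkan 1.1, where the storageBuffer16BitAccess feature reports whether it is available).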

Your GPU/driver does not provide it.

As soon as ggml checks the GPU capabilities, model initialization aborts:

error loading model: Unsupported device
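If vulkaninfo (from the vulkan-tools package) is installed, the same capability can be checked outside LM Studio. This is a rough sketch, assuming vulkaninfo prints the storageBuffer16BitAccess feature as a "name = true/false" line. The helper name check_16bit_storage is made up for this example, and with several Vulkan devices it reports "supported" if any one of them has the feature.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads vulkaninfo text on stdin and reports whether
# any listed device advertises storageBuffer16BitAccess = true.
check_16bit_storage() {
  if grep -Eq 'storageBuffer16BitAccess[[:space:]]*=[[:space:]]*true'; then
    echo "supported"
  else
    echo "not supported"
  fi
}

# Usage (assumes vulkaninfo is on PATH):
#   vulkaninfo 2>/dev/null | check_16bit_storage
```

On the machine described in this post, the expectation is "not supported", matching the ggml_vulkan message.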

Why every model fails

You tried multiple models:

  • LFM2.5-1.2B-Instruct-Q6_K

  • gemma-3-270m-it-qat-Q4_0

  • LFM2-350M-Q4_0

All fail identically.

That tells us:

✅ Models are not corrupted

✅ Quantization is not the issue

✅ Context size is not the issue

✅ RAM is probably not the issue

❌ GPU backend initialization is failing before model loading completes
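One way to confirm that every attempt fails the same way is to tally the failures per model file in a saved copy of the developer log. The sketch below is only an illustration: the function name count_load_failures and the log file path are assumptions, and it matches the "load_model: failed to load model" lines as they appear in the logs at the end of this post.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count failed load attempts per .gguf file
# in a saved LM Studio developer log.
count_load_failures() {
  grep -E 'load_model: failed to load model' "$1" \
    | grep -oE '[^/]+\.gguf' \
    | LC_ALL=C sort | uniq -c \
    | awk '{ print $2 ": " $1 }'   # print "file: count"
}

# Usage (the path is an assumption; save the log wherever you like):
#   count_load_failures ~/lmstudio-dev.log
```

If every model shows up with a non-zero count and the error text is identical, the problem sits in the backend rather than in any one model file.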


The important clues

Clue 1

MESA-INTEL: warning: Haswell Vulkan support is incomplete

Haswell-generation Intel GPUs include:

  • HD 4200

  • HD 4400

  • HD 4600

  • Iris 5100

  • Iris Pro 5200

These GPUs have only incomplete Vulkan support in Mesa: originally through the ANV driver, and since Mesa 22.3 through the separate legacy hasvk driver.


Clue 2

device Vulkan0 does not support 16-bit storage

The llama.cpp Vulkan backend requires 16-bit storage, which the Haswell Vulkan driver does not expose.

Therefore LM Studio cannot use Vulkan on this hardware.


The strange token warning

This warning is unrelated:

control-looking token: 212 '</s>' was not control-type

This comes from the Gemma GGUF metadata.

It's usually harmless and not the reason for failure.


Most likely setup

You're probably running:

  • Linux

  • Intel Haswell integrated graphics

  • Mesa Vulkan driver

  • LM Studio attempting GPU acceleration

The GPU backend is unsupported.


How to verify

Run:

Bash
lspci | grep VGA

and

Bash
vulkaninfo | grep "deviceName"

or:

Bash
glxinfo | grep "OpenGL renderer"

You'll likely see something like:

Intel HD Graphics 4600

or another Haswell GPU.


Fixes

Option 1: Disable GPU acceleration (recommended)

In LM Studio:

  1. Settings

  2. Runtime / Hardware

  3. Disable Vulkan GPU acceleration

  4. Force CPU inference

or set GPU layers to:

0

Then reload the model.

For small models like:

  • 270M

  • 350M

  • 1.2B

CPU inference should work fine.


Option 2: Update Mesa

Check:

Bash
glxinfo | grep "OpenGL version"

or:

Bash
vulkaninfo

A newer Mesa may expose more features, but Haswell hardware itself generally lacks the required capability, so don't expect this to solve it.
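To compare the installed Mesa release against newer ones, the version number can be pulled out of the glxinfo output. This is a small sketch that assumes the usual "... Mesa X.Y.Z" form of the OpenGL version string; the helper name mesa_version is invented for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads glxinfo text on stdin and prints the first
# Mesa version number it finds (for example 23.2.1).
mesa_version() {
  grep -oE 'Mesa [0-9]+\.[0-9]+(\.[0-9]+)?' | head -n 1 | awk '{ print $2 }'
}

# Usage (assumes glxinfo from mesa-utils is installed):
#   glxinfo | mesa_version
```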


Option 3: Use OpenCL/CPU backend

If LM Studio offers:

  • CPU

  • OpenBLAS

  • CLBlast/OpenCL

use those instead of Vulkan.


Option 4: Use a newer GPU

Any reasonably modern GPU supports the needed Vulkan features:

  • Intel Xe

  • Intel Arc

  • AMD RDNA

  • NVIDIA GTX 10xx and newer


Bottom line

Your models are failing because LM Studio is trying to use the Vulkan backend on an older Intel Haswell GPU, and that GPU does not support Vulkan 16-bit storage, which the llama.cpp Vulkan implementation requires:

ggml_vulkan: device Vulkan0 does not support 16-bit storage

The practical fix is to disable GPU/Vulkan acceleration and run the models on CPU, or use newer graphics hardware.




(base) jain@jain-ThinkPad-T440s:~$ lspci | grep VGA
00:02.0 VGA compatible controller: Intel Corporation Haswell-ULT Integrated Graphics Controller (rev 0b)




That confirms it.

Your machine has:

Intel Corporation Haswell-ULT Integrated Graphics Controller

On a ThinkPad T440s this is typically one of:

  • Intel HD Graphics 4400

  • Intel HD Graphics 5000

These are the integrated GPUs of 4th-generation Intel Core processors (2013-era Haswell).

Why LM Studio fails

The log showed:

ggml_vulkan: device Vulkan0 does not support 16-bit storage

and

error loading model: Unsupported device

The Vulkan backend used by LM Studio (via llama.cpp/ggml) expects GPU features that Haswell iGPUs do not provide. Even though Mesa exposes Vulkan support, it is incomplete:

MESA-INTEL: warning: Haswell Vulkan support is incomplete

So LM Studio sees a Vulkan device, tries to use it, then aborts when the required feature checks fail.


What you should do

Disable GPU acceleration completely

In LM Studio:

  1. Open Settings

  2. Go to Runtime or Hardware Acceleration

  3. Select CPU instead of Vulkan

  4. Set GPU Offload Layers to 0

  5. Reload the model

For your hardware this is the correct configuration.


Can the T440s still run local models?

Yes, but on CPU.

The T440s usually has:

  • 2-core/4-thread Haswell-U CPU

  • Up to 12 GB RAM (4 GB soldered plus one SO-DIMM slot)

Reasonable models:

Model               Status
Gemma 3 270M        Excellent
LFM2 350M           Excellent
TinyLlama 1.1B      Good
Qwen 2.5 0.5B       Good
Qwen 2.5 1.5B Q4    Usable
Phi-3 Mini Q4       Slow but usable
7B models           Very slow
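Which of these models is realistic depends on the actual thread count and installed RAM, which are easy to read on Linux. A minimal sketch, assuming the standard /proc/meminfo layout; the helper name total_ram_gib is made up for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads /proc/meminfo-style text on stdin and prints
# total RAM in GiB with one decimal place (MemTotal is reported in kB).
total_ram_gib() {
  awk '/^MemTotal:/ { printf "%.1f\n", $2 / 1024 / 1024 }'
}

# Usage on Linux:
#   nproc                           # logical CPUs (threads)
#   total_ram_gib < /proc/meminfo   # installed RAM in GiB
```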

Verify LM Studio is using CPU

After changing settings, load a model and check the logs.

You should not see:

ggml_vulkan

or

Vulkan0

Instead you'll see messages mentioning CPU buffers, threads, and model tensors loading successfully.
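The same check can be scripted against a saved copy of the log. This is a sketch only: the helper name detect_vulkan_in_log and the log path are assumptions, and it looks for nothing more than the two strings mentioned above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints "vulkan" if the saved log mentions the Vulkan
# backend (ggml_vulkan or Vulkan0), otherwise "no-vulkan".
detect_vulkan_in_log() {
  if grep -Eq 'ggml_vulkan|Vulkan0' "$1"; then
    echo "vulkan"
  else
    echo "no-vulkan"
  fi
}

# Usage (the path is an assumption):
#   detect_vulkan_in_log ~/lmstudio-dev.log
```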


If LM Studio still insists on Vulkan

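Environment-variable overrides are sometimes suggested, for example:

Bash
LIBGGML_VULKAN=0 lmstudio

or

Bash
GGML_VULKAN_DISABLE=1 lmstudio

Neither variable is a documented LM Studio or llama.cpp option as far as I know, so treat both as experiments rather than guaranteed switches. The launcher name also depends on how LM Studio is installed. Selecting a CPU runtime inside LM Studio remains the dependable route.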

If you're using the AppImage, tell me:

Bash
which lmstudio

or how you installed it (AppImage, .deb, Flatpak, etc.), and I can give the exact command to force CPU mode.




Complete Logs


Developer Logs
2026-06-16 18:43:30 [DEBUG]
 LlamaV4::load called with model path: /home/jain/.lmstudio/models/lmstudio-community/LFM2.5-1.2B-Instruct-GGUF/LFM2.5-1.2B-Instruct-Q6_K.gguf
LlamaV4::load config: n_parallel=4 n_ctx=8192 kv_unified=true
2026-06-16 18:43:30 [DEBUG]
 MESA-INTEL: warning: Haswell Vulkan support is incomplete
2026-06-16 18:43:30 [DEBUG]
 0.00.109.877 I srv    load_model: loading model '/home/jain/.lmstudio/models/lmstudio-community/LFM2.5-1.2B-Instruct-GGUF/LFM2.5-1.2B-Instruct-Q6_K.gguf'
2026-06-16 18:43:31 [DEBUG]
 ggml_vulkan: device Vulkan0 does not support 16-bit storage.
2026-06-16 18:43:31 [DEBUG]
 0.00.349.937 E llama_model_load: error loading model: Unsupported device
0.00.349.964 E llama_model_load_from_file_impl: failed to load model
0.00.349.965 E common_init_from_params: failed to load model '/home/jain/.lmstudio/models/lmstudio-community/LFM2.5-1.2B-Instruct-GGUF/LFM2.5-1.2B-Instruct-Q6_K.gguf'
0.00.349.968 E srv    load_model: failed to load model, '/home/jain/.lmstudio/models/lmstudio-community/LFM2.5-1.2B-Instruct-GGUF/LFM2.5-1.2B-Instruct-Q6_K.gguf': error loading model: Unsupported device
2026-06-16 18:43:31 [DEBUG]
 [LLMProcess] Failed to load model _0x3f9935 [Error]: Failed to load model.
    at _0x3f7ad2.loadModel (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:562652)
    at process.processTicksAndRejections (node:internal/process/task_queues:104:5)
    at async _0x3f7ad2.handleMessage (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:554788) {
  cause: 'Failed to load model',
  suggestion: undefined,
  errorData: undefined,
  data: undefined,
  displayData: undefined,
  title: 'Failed to load model.'
}
2026-06-16 18:43:49 [DEBUG]
 LlamaV4::load called with model path: /home/jain/.lmstudio/models/lmstudio-community/LFM2.5-1.2B-Instruct-GGUF/LFM2.5-1.2B-Instruct-Q6_K.gguf
LlamaV4::load config: n_parallel=4 n_ctx=8192 kv_unified=true
2026-06-16 18:43:49 [DEBUG]
 MESA-INTEL: warning: Haswell Vulkan support is incomplete
2026-06-16 18:43:49 [DEBUG]
 0.00.160.092
2026-06-16 18:43:49 [DEBUG]
 I srv    load_model: loading model '/home/jain/.lmstudio/models/lmstudio-community/LFM2.5-1.2B-Instruct-GGUF/LFM2.5-1.2B-Instruct-Q6_K.gguf'
2026-06-16 18:43:49 [DEBUG]
 ggml_vulkan: device Vulkan0 does not support 16-bit storage.
2026-06-16 18:43:50 [DEBUG]
 0.00.405.523 E llama_model_load: error loading model: Unsupported device
0.00.405.543 E llama_model_load_from_file_impl: failed to load model
0.00.405.545 E common_init_from_params: failed to load model '/home/jain/.lmstudio/models/lmstudio-community/LFM2.5-1.2B-Instruct-GGUF/LFM2.5-1.2B-Instruct-Q6_K.gguf'
0.00.405.547 E srv    load_model: failed to load model, '/home/jain/.lmstudio/models/lmstudio-community/LFM2.5-1.2B-Instruct-GGUF/LFM2.5-1.2B-Instruct-Q6_K.gguf': error loading model: Unsupported device
2026-06-16 18:43:50 [DEBUG]
 [LLMProcess] Failed to load model _0x3f9935 [Error]: Failed to load model.
    at _0x3f7ad2.loadModel (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:562652)
    at process.processTicksAndRejections (node:internal/process/task_queues:104:5)
    at async _0x3f7ad2.handleMessage (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:554788) {
  cause: 'Failed to load model',
  suggestion: undefined,
  errorData: undefined,
  data: undefined,
  displayData: undefined,
  title: 'Failed to load model.'
}
2026-06-16 18:44:03 [DEBUG]
 LlamaV4::load called with model path: /home/jain/.lmstudio/models/lmstudio-community/LFM2.5-1.2B-Instruct-GGUF/LFM2.5-1.2B-Instruct-Q6_K.gguf
LlamaV4::load config: n_parallel=4 n_ctx=8192 kv_unified=true
2026-06-16 18:44:03 [DEBUG]
 MESA-INTEL: warning: Haswell Vulkan support is incomplete
2026-06-16 18:44:03 [DEBUG]
 0.00.071.932 I srv    load_model: loading model '/home/jain/.lmstudio/models/lmstudio-community/LFM2.5-1.2B-Instruct-GGUF/LFM2.5-1.2B-Instruct-Q6_K.gguf'
2026-06-16 18:44:03 [DEBUG]
 ggml_vulkan: device Vulkan0 does not support 16-bit storage.
2026-06-16 18:44:03 [DEBUG]
 0.00.269.473 E llama_model_load: error loading model: Unsupported device
0.00.269.499 E llama_model_load_from_file_impl: failed to load model
0.00.269.500 E common_init_from_params: failed to load model '/home/jain/.lmstudio/models/lmstudio-community/LFM2.5-1.2B-Instruct-GGUF/LFM2.5-1.2B-Instruct-Q6_K.gguf'
0.00.269.502 E srv    load_model: failed to load model, '/home/jain/.lmstudio/models/lmstudio-community/LFM2.5-1.2B-Instruct-GGUF/LFM2.5-1.2B-Instruct-Q6_K.gguf': error loading model: Unsupported device
2026-06-16 18:44:03 [DEBUG]
 [LLMProcess] Failed to load model _0x3f9935 [Error]: Failed to load model.
    at _0x3f7ad2.loadModel (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:562652)
    at process.processTicksAndRejections (node:internal/process/task_queues:104:5)
    at async _0x3f7ad2.handleMessage (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:554788) {
  cause: 'Failed to load model',
  suggestion: undefined,
  errorData: undefined,
  data: undefined,
  displayData: undefined,
  title: 'Failed to load model.'
}
2026-06-16 18:47:29 [DEBUG]
 LlamaV4::load called with model path: /home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf
LlamaV4::load config: n_parallel=4 n_ctx=8192 kv_unified=true
2026-06-16 18:47:29 [DEBUG]
 MESA-INTEL: warning: Haswell Vulkan support is incomplete
2026-06-16 18:47:29 [DEBUG]
 Applying legacy swa_full=true default for arch gemma3
0.00.110.637 I srv    load_model: loading model '/home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf'
2026-06-16 18:47:30 [DEBUG]
 0.00.668.125 W load: control-looking token:    212 '' was not control-type; this is probably a bug in the model. its type will be overridden
2026-06-16 18:47:30 [DEBUG]
 ggml_vulkan: device Vulkan0 does not support 16-bit storage.
2026-06-16 18:47:30 [DEBUG]
 0.00.836.575 E llama_model_load: error loading model: Unsupported device
0.00.836.602 E llama_model_load_from_file_impl: failed to load model
0.00.836.603 E common_init_from_params: failed to load model '/home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf'
0.00.836.606 E srv    load_model: failed to load model, '/home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf': error loading model: Unsupported device
2026-06-16 18:47:30 [DEBUG]
 [LLMProcess] Failed to load model _0x3f9935 [Error]: Failed to load model.
    at _0x3f7ad2.loadModel (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:562652)
    at process.processTicksAndRejections (node:internal/process/task_queues:104:5)
    at async _0x3f7ad2.handleMessage (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:554788) {
  cause: 'Failed to load model',
  suggestion: undefined,
  errorData: undefined,
  data: undefined,
  displayData: undefined,
  title: 'Failed to load model.'
}
2026-06-16 18:47:58 [DEBUG]
 LlamaV4::load called with model path: /home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf
LlamaV4::load config: n_parallel=4 n_ctx=8192 kv_unified=true
2026-06-16 18:47:58 [DEBUG]
 MESA-INTEL: warning: Haswell Vulkan support is incomplete
2026-06-16 18:47:58 [DEBUG]
 Applying legacy swa_full=true default for arch gemma3
0.00.107.554 I srv    load_model: loading model '/home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf'
2026-06-16 18:47:59 [DEBUG]
 0.00.632.439 W load: control-looking token:    212 '' was not control-type; this is probably a bug in the model. its type will be overridden
2026-06-16 18:47:59 [DEBUG]
 ggml_vulkan: device Vulkan0 does not support 16-bit storage.
2026-06-16 18:47:59 [DEBUG]
 0.00.802.706 E llama_model_load: error loading model: Unsupported device
0.00.802.735 E llama_model_load_from_file_impl: failed to load model
0.00.802.737 E common_init_from_params: failed to load model '/home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf'
0.00.802.739 E srv    load_model: failed to load model, '/home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf': error loading model: Unsupported device
2026-06-16 18:47:59 [DEBUG]
 [LLMProcess] Failed to load model _0x3f9935 [Error]: Failed to load model.
    at _0x3f7ad2.loadModel (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:562652)
    at process.processTicksAndRejections (node:internal/process/task_queues:104:5)
    at async _0x3f7ad2.handleMessage (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:554788) {
  cause: 'Failed to load model',
  suggestion: undefined,
  errorData: undefined,
  data: undefined,
  displayData: undefined,
  title: 'Failed to load model.'
}
2026-06-16 18:53:59 [DEBUG]
 LlamaV4::load called with model path: /home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf
LlamaV4::load config: n_parallel=4 n_ctx=8192 kv_unified=true
2026-06-16 18:53:59 [DEBUG]
 MESA-INTEL: warning: Haswell Vulkan support is incomplete
2026-06-16 18:53:59 [DEBUG]
 Applying legacy swa_full=true default for arch gemma3
0.00.161.414 I srv    load_model: loading model '/home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf'
2026-06-16 18:54:00 [DEBUG]
 0.00.813.256
2026-06-16 18:54:00 [DEBUG]
 W load: control-looking token:    212 '' was not control-type; this is probably a bug in the model. its type will be overridden
2026-06-16 18:54:00 [DEBUG]
 ggml_vulkan: device Vulkan0 does not support 16-bit storage.
2026-06-16 18:54:00 [DEBUG]
 0.01.092.471 E llama_model_load: error loading model: Unsupported device
0.01.092.498 E llama_model_load_from_file_impl: failed to load model
0.01.092.500 E common_init_from_params: failed to load model '/home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf'
0.01.092.502 E srv    load_model: failed to load model, '/home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf': error loading model: Unsupported device
2026-06-16 18:54:00 [DEBUG]
 [LLMProcess] Failed to load model _0x3f9935 [Error]: Failed to load model.
    at _0x3f7ad2.loadModel (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:562652)
    at process.processTicksAndRejections (node:internal/process/task_queues:104:5)
    at async _0x3f7ad2.handleMessage (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:554788) {
  cause: 'Failed to load model',
  suggestion: undefined,
  errorData: undefined,
  data: undefined,
  displayData: undefined,
  title: 'Failed to load model.'
}
2026-06-16 18:56:57 [DEBUG]
 LlamaV4::load called with model path: /home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf
LlamaV4::load config: n_parallel=4 n_ctx=8192 kv_unified=true
2026-06-16 18:56:58 [DEBUG]
 MESA-INTEL: warning: Haswell Vulkan support is incomplete
2026-06-16 18:56:58 [DEBUG]
 Applying legacy swa_full=true default for arch gemma3
0.00.113.876 I srv    load_model: loading model '/home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf'
2026-06-16 18:56:58 [DEBUG]
 0.00.664.116 W load: control-looking token:    212 '' was not control-type; this is probably a bug in the model. its type will be overridden
2026-06-16 18:56:58 [DEBUG]
 ggml_vulkan: device Vulkan0 does not support 16-bit storage.
2026-06-16 18:56:58 [DEBUG]
 0.00.832.215 E llama_model_load: error loading model: Unsupported device
0.00.832.233 E llama_model_load_from_file_impl: failed to load model
0.00.832.234 E common_init_from_params: failed to load model '/home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf'
0.00.832.237 E srv    load_model: failed to load model, '/home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf': error loading model: Unsupported device
2026-06-16 18:56:58 [DEBUG]
 [LLMProcess] Failed to load model _0x3f9935 [Error]: Failed to load model.
    at _0x3f7ad2.loadModel (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:562652)
    at process.processTicksAndRejections (node:internal/process/task_queues:104:5)
    at async _0x3f7ad2.handleMessage (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:554788) {
  cause: 'Failed to load model',
  suggestion: undefined,
  errorData: undefined,
  data: undefined,
  displayData: undefined,
  title: 'Failed to load model.'
}
2026-06-16 18:58:53 [DEBUG]
 LlamaV4::load called with model path: /home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf
LlamaV4::load config: n_parallel=4 n_ctx=8196 kv_unified=true
2026-06-16 18:58:53 [DEBUG]
 MESA-INTEL: warning: Haswell Vulkan support is incomplete
2026-06-16 18:58:53 [DEBUG]
 Applying legacy swa_full=true default for arch gemma3
0.00.141.947 I srv    load_model: loading model '/home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf'
2026-06-16 18:58:54 [DEBUG]
 0.00.766.698 W load: control-looking token:    212 '' was not control-type; this is probably a bug in the model. its type will be overridden
2026-06-16 18:58:54 [DEBUG]
 ggml_vulkan: device Vulkan0 does not support 16-bit storage.
2026-06-16 18:58:54 [DEBUG]
 0.00.952.940 E llama_model_load: error loading model: Unsupported device
0.00.952.974 E llama_model_load_from_file_impl: failed to load model
0.00.952.976 E common_init_from_params: failed to load model '/home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf'
0.00.952.979 E srv    load_model: failed to load model, '/home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf': error loading model: Unsupported device
2026-06-16 18:58:54 [DEBUG]
 [LLMProcess] Failed to load model _0x3f9935 [Error]: Failed to load model.
    at _0x3f7ad2.loadModel (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:562652)
    at process.processTicksAndRejections (node:internal/process/task_queues:104:5)
    at async _0x3f7ad2.handleMessage (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:554788) {
  cause: 'Failed to load model',
  suggestion: undefined,
  errorData: undefined,
  data: undefined,
  displayData: undefined,
  title: 'Failed to load model.'
}
2026-06-16 18:59:22 [DEBUG]
 LlamaV4::load called with model path: /home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf
LlamaV4::load config: n_parallel=4 n_ctx=8196 kv_unified=true
2026-06-16 18:59:22 [DEBUG]
 MESA-INTEL: warning: Haswell Vulkan support is incomplete
2026-06-16 18:59:22 [DEBUG]
 Applying legacy swa_full=true default for arch gemma3
0.00.125.143 I srv    load_model: loading model '/home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf'
2026-06-16 18:59:23 [DEBUG]
 0.00.736.472 W load: control-looking token:    212 '' was not control-type; this is probably a bug in the model. its type will be overridden
2026-06-16 18:59:23 [DEBUG]
 ggml_vulkan: device Vulkan0 does not support 16-bit storage.
2026-06-16 18:59:23 [DEBUG]
 0.00.917.603 E llama_model_load: error loading model: Unsupported device
0.00.917.620 E llama_model_load_from_file_impl: failed to load model
0.00.917.622 E common_init_from_params: failed to load model '/home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf'
0.00.917.624 E srv    load_model: failed to load model, '/home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf': error loading model: Unsupported device
2026-06-16 18:59:23 [DEBUG]
 [LLMProcess] Failed to load model _0x3f9935 [Error]: Failed to load model.
    at _0x3f7ad2.loadModel (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:562652)
    at process.processTicksAndRejections (node:internal/process/task_queues:104:5)
    at async _0x3f7ad2.handleMessage (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:554788) {
  cause: 'Failed to load model',
  suggestion: undefined,
  errorData: undefined,
  data: undefined,
  displayData: undefined,
  title: 'Failed to load model.'
}
2026-06-16 18:59:29 [DEBUG]
 LlamaV4::load called with model path: /home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf
LlamaV4::load config: n_parallel=4 n_ctx=8196 kv_unified=true
2026-06-16 18:59:29 [DEBUG]
 MESA-INTEL: warning: Haswell Vulkan support is incomplete
2026-06-16 18:59:29 [DEBUG]
 Applying legacy swa_full=true default for arch gemma3
2026-06-16 18:59:29 [DEBUG]
 0.00.117.496 I srv    load_model: loading model '/home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf'
2026-06-16 18:59:30 [DEBUG]
 0.00.682.134 W load: control-looking token:    212 '' was not control-type; this is probably a bug in the model. its type will be overridden
2026-06-16 18:59:30 [DEBUG]
 ggml_vulkan: device Vulkan0 does not support 16-bit storage.
2026-06-16 18:59:30 [DEBUG]
 0.00.855.008 E llama_model_load: error loading model: Unsupported device
0.00.855.042 E llama_model_load_from_file_impl: failed to load model
0.00.855.044 E common_init_from_params: failed to load model '/home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf'
0.00.855.046 E srv    load_model: failed to load model, '/home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf': error loading model: Unsupported device
2026-06-16 18:59:30 [DEBUG]
 [LLMProcess] Failed to load model _0x3f9935 [Error]: Failed to load model.
    at _0x3f7ad2.loadModel (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:562652)
    at process.processTicksAndRejections (node:internal/process/task_queues:104:5)
    at async _0x3f7ad2.handleMessage (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:554788) {
  cause: 'Failed to load model',
  suggestion: undefined,
  errorData: undefined,
  data: undefined,
  displayData: undefined,
  title: 'Failed to load model.'
}
2026-06-16 19:02:02 [DEBUG]
 LlamaV4::load called with model path: /home/jain/.lmstudio/models/LiquidAI/LFM2-350M-GGUF/LFM2-350M-Q4_0.gguf
LlamaV4::load config: n_parallel=4 n_ctx=8192 kv_unified=true
2026-06-16 19:02:02 [DEBUG]
 MESA-INTEL: warning: Haswell Vulkan support is incomplete
2026-06-16 19:02:02 [DEBUG]
 0.00.092.224 I srv    load_model: loading model '/home/jain/.lmstudio/models/LiquidAI/LFM2-350M-GGUF/LFM2-350M-Q4_0.gguf'
2026-06-16 19:02:02 [DEBUG]
 ggml_vulkan: device Vulkan0 does not support 16-bit storage.
2026-06-16 19:02:02 [DEBUG]
 0.00.300.094 E llama_model_load: error loading model: Unsupported device
0.00.300.121 E llama_model_load_from_file_impl: failed to load model
0.00.300.123 E common_init_from_params: failed to load model '/home/jain/.lmstudio/models/LiquidAI/LFM2-350M-GGUF/LFM2-350M-Q4_0.gguf'
0.00.300.125 E srv    load_model: failed to load model, '/home/jain/.lmstudio/models/LiquidAI/LFM2-350M-GGUF/LFM2-350M-Q4_0.gguf': error loading model: Unsupported device
2026-06-16 19:02:02 [DEBUG]
 [LLMProcess] Failed to load model _0x3f9935 [Error]: Failed to load model.
    at _0x3f7ad2.loadModel (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:562652)
    at process.processTicksAndRejections (node:internal/process/task_queues:104:5)
    at async _0x3f7ad2.handleMessage (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:554788) {
  cause: 'Failed to load model',
  suggestion: undefined,
  errorData: undefined,
  data: undefined,
  displayData: undefined,
  title: 'Failed to load model.'
}
2026-06-16 19:02:06 [DEBUG]
 LlamaV4::load called with model path: /home/jain/.lmstudio/models/LiquidAI/LFM2-350M-GGUF/LFM2-350M-Q4_0.gguf
LlamaV4::load config: n_parallel=4 n_ctx=8192 kv_unified=true
2026-06-16 19:02:06 [DEBUG]
 MESA-INTEL: warning: Haswell Vulkan support is incomplete
2026-06-16 19:02:06 [DEBUG]
 0.00.077.805 I srv    load_model: loading model '/home/jain/.lmstudio/models/LiquidAI/LFM2-350M-GGUF/LFM2-350M-Q4_0.gguf'
2026-06-16 19:02:06 [DEBUG]
 ggml_vulkan: device Vulkan0 does not support 16-bit storage.
2026-06-16 19:02:07 [DEBUG]
 0.00.301.853 E llama_model_load: error loading model: Unsupported device
0.00.301.872 E llama_model_load_from_file_impl: failed to load model
0.00.301.874 E common_init_from_params: failed to load model '/home/jain/.lmstudio/models/LiquidAI/LFM2-350M-GGUF/LFM2-350M-Q4_0.gguf'
0.00.301.876 E srv    load_model: failed to load model, '/home/jain/.lmstudio/models/LiquidAI/LFM2-350M-GGUF/LFM2-350M-Q4_0.gguf': error loading model: Unsupported device
2026-06-16 19:02:07 [DEBUG]
 [LLMProcess] Failed to load model _0x3f9935 [Error]: Failed to load model.
    at _0x3f7ad2.loadModel (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:562652)
    at process.processTicksAndRejections (node:internal/process/task_queues:104:5)
    at async _0x3f7ad2.handleMessage (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:554788) {
  cause: 'Failed to load model',
  suggestion: undefined,
  errorData: undefined,
  data: undefined,
  displayData: undefined,
  title: 'Failed to load model.'
}
2026-06-16 19:02:38 [DEBUG]
 LlamaV4::load called with model path: /home/jain/.lmstudio/models/LiquidAI/LFM2-350M-GGUF/LFM2-350M-Q4_0.gguf
LlamaV4::load config: n_parallel=4 n_ctx=8192 kv_unified=true
2026-06-16 19:02:38 [DEBUG]
 MESA-INTEL: warning: Haswell Vulkan support is incomplete
2026-06-16 19:02:38 [DEBUG]
 0.00.079.694 I srv    load_model: loading model '/home/jain/.lmstudio/models/LiquidAI/LFM2-350M-GGUF/LFM2-350M-Q4_0.gguf'
2026-06-16 19:02:38 [DEBUG]
 ggml_vulkan: device Vulkan0 does not support 16-bit storage.
2026-06-16 19:02:38 [DEBUG]
 0.00.316.444 E llama_model_load: error loading model: Unsupported device
0.00.316.476 E llama_model_load_from_file_impl: failed to load model
0.00.316.478 E common_init_from_params: failed to load model '/home/jain/.lmstudio/models/LiquidAI/LFM2-350M-GGUF/LFM2-350M-Q4_0.gguf'
0.00.316.480 E srv    load_model: failed to load model, '/home/jain/.lmstudio/models/LiquidAI/LFM2-350M-GGUF/LFM2-350M-Q4_0.gguf': error loading model: Unsupported device
2026-06-16 19:02:38 [DEBUG]
 [LLMProcess] Failed to load model _0x3f9935 [Error]: Failed to load model.
    at _0x3f7ad2.loadModel (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:562652)
    at process.processTicksAndRejections (node:internal/process/task_queues:104:5)
    at async _0x3f7ad2.handleMessage (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:554788) {
  cause: 'Failed to load model',
  suggestion: undefined,
  errorData: undefined,
  data: undefined,
  displayData: undefined,
  title: 'Failed to load model.'
}
Tags: Generative AI, Agentic AI

Liquid LFM2.5-1.2B: A Tiny AI Model with Surprisingly Big Ambitions

See All on AI Model Releases    « Previously

Exploring how Liquid AI's compact 1.2-billion-parameter model delivers fast, efficient, and capable AI experiences on everyday devices.

The AI industry has spent years chasing larger and larger language models, often requiring powerful servers, expensive GPUs, and significant energy consumption. Liquid AI is taking a different path with the LFM2.5-1.2B family, a set of compact models designed to bring advanced AI capabilities directly onto edge devices while maintaining competitive performance.

Available in Base, Instruct, and Thinking variants, the LFM2.5-1.2B models demonstrate that useful AI doesn't necessarily require tens or hundreds of billions of parameters. Instead, Liquid AI focuses on efficiency, optimized architecture design, and extensive training to maximize performance within a small footprint.

What Is LFM2.5-1.2B?

LFM2.5-1.2B is part of Liquid AI's latest generation of Liquid Foundation Models (LFMs). Built specifically for on-device deployment, the model contains approximately 1.2 billion parameters and is optimized for low memory usage, fast inference, and real-world deployment scenarios.

  • Model Size: 1.2 billion parameters
  • Context Length: up to 32K tokens
  • Deployment: on-device, edge, cloud
  • Memory Footprint: under 1 GB in optimized deployments
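A back-of-the-envelope calculation shows how a model of this size can land under 1 GB once quantized. The sketch below is my own estimate, not Liquid AI's method. It assumes exactly 1.2 billion weights, all stored in one GGUF quantization type, and uses the ggml block layouts (Q4_0: 18 bytes per 32 weights, Q6_K: 210 bytes per 256 weights). Real files differ somewhat because some tensors are kept at other precisions, and the KV cache adds memory at run time. The function name estimate_weights_mb is invented for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: estimated weight size in MB (10^6 bytes)
# for a parameter count and an average bits-per-weight figure.
estimate_weights_mb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.0f\n", p * b / 8 / 1000000 }'
}

estimate_weights_mb 1200000000 4.5      # Q4_0: 18*8/32 = 4.5 bits/weight    -> 675
estimate_weights_mb 1200000000 6.5625   # Q6_K: 210*8/256 = 6.5625 bits/weight -> 984
```

Both estimates stay below 1 GB, which is consistent with the footprint quoted above.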

The model builds upon Liquid AI's hybrid architecture approach, combining attention mechanisms with specialized convolutional components to improve speed and efficiency compared to traditional transformer-only designs.

Why Small Models Matter Again

As enterprises and developers seek lower costs, improved privacy, and reduced latency, compact models are becoming increasingly important. Rather than sending every request to a remote server, organizations can deploy lightweight AI systems directly on laptops, smartphones, embedded systems, and edge infrastructure.

Key Idea: Instead of competing solely on parameter count, LFM2.5-1.2B competes on efficiency, speed, and deployment flexibility.

This shift is particularly valuable for industries where privacy, reliability, and offline operation are critical. Healthcare, industrial automation, customer service, field operations, and mobile applications can all benefit from running AI locally.

The Three Variants of LFM2.5-1.2B

1. LFM2.5-1.2B-Base

The Base version serves as a foundation model intended for customization, fine-tuning, and specialized applications. Developers can adapt it to domain-specific tasks without starting from scratch.

2. LFM2.5-1.2B-Instruct

The Instruct model is optimized for conversational AI and instruction following. It delivers a user-friendly chat experience while maintaining fast response times and low hardware requirements.

3. LFM2.5-1.2B-Thinking

The Thinking version is designed for reasoning-heavy tasks. It generates intermediate reasoning steps before producing answers, helping improve performance on multi-step problems, logical reasoning, planning, and complex decision-making tasks.

Performance Beyond Its Size

One of the most impressive aspects of LFM2.5-1.2B is how effectively it uses its limited parameter budget. Through expanded pretraining, reinforcement learning techniques, and architecture optimization, Liquid AI positions the model as a serious competitor to significantly larger open-source alternatives.

The company reports strong benchmark performance across reasoning, instruction following, and knowledge tasks while maintaining inference speeds suitable for edge deployment.

Built for Edge AI

Edge AI is rapidly becoming one of the most important trends in machine learning. Users increasingly expect intelligent systems to work instantly, privately, and without constant internet connectivity.

LFM2.5-1.2B was designed with these requirements in mind:

  • Fast CPU inference
  • Mobile NPU compatibility
  • Low memory consumption
  • Reduced cloud costs
  • Improved privacy through local execution
  • Support for long-context applications

This makes the model particularly attractive for AI-powered mobile apps, personal assistants, local copilots, and enterprise edge deployments.

Developer Ecosystem and Deployment Options

Another strength of the LFM2.5 ecosystem is broad tooling support. Developers can deploy the model using popular frameworks and runtimes, enabling quick experimentation and production deployment.

  • llama.cpp
  • vLLM
  • MLX
  • Ollama
  • Transformers
  • Fine-tuning frameworks such as Unsloth and TRL

This flexibility lowers adoption barriers and makes the model accessible to startups, enterprises, and independent developers alike.

Potential Use Cases

Thanks to its balance between size and capability, LFM2.5-1.2B can power a wide range of applications:

  • AI chat assistants
  • Document analysis and summarization
  • Retrieval-augmented generation (RAG)
  • Agent-based workflows
  • Code assistance
  • Knowledge management systems
  • Offline AI applications
  • Smart device integrations

The Future of Efficient AI

The AI industry is gradually realizing that bigger isn't always better. While frontier-scale models continue to push capability boundaries, compact models like LFM2.5-1.2B are making advanced AI accessible to a much broader range of devices and users.

This trend mirrors the evolution of computing itself: powerful technologies eventually become smaller, cheaper, and more widely available.

Final Thoughts

Liquid AI's LFM2.5-1.2B demonstrates how thoughtful architecture, extensive training, and deployment-focused engineering can create a highly capable language model without requiring massive computational resources.

For developers seeking fast, private, and cost-effective AI solutions, LFM2.5-1.2B represents an exciting glimpse into the future of edge AI. Rather than chasing scale alone, it proves that efficiency can be just as transformative.


Tags: Agentic AI, Large Language Models, AI Model Alert