survival8: Agentic AI


Tuesday, June 16, 2026

Attempting to run a local LLM via LM Studio

Explain these logs... (logs attached at the end of the post)

The key error is this line:


ggml_vulkan: device Vulkan0 does not support 16-bit storage.
...
llama_model_load: error loading model: Unsupported device

Everything else is LM Studio repeatedly trying to load different GGUF models and failing for the same reason.

What's happening

LM Studio is using the Vulkan backend (GPU acceleration via llama.cpp/ggml).

When it starts loading a model, it detects your Vulkan-capable GPU:


MESA-INTEL: warning: Haswell Vulkan support is incomplete

This is already a warning sign. It means you're using an Intel Haswell-generation GPU (around 2013–2015), and Mesa's Vulkan implementation for it is incomplete.

Then:


ggml_vulkan: device Vulkan0 does not support 16-bit storage.

Modern llama.cpp Vulkan kernels require the Vulkan feature:


VK_KHR_16bit_storage

or equivalent 16-bit storage support.

Your GPU/driver does not provide it.

As soon as ggml checks the GPU capabilities, model initialization aborts:


error loading model: Unsupported device
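To see for yourself whether the driver reports the feature, you can inspect the text output of `vulkaninfo`. The sketch below is only an illustration: it assumes the 16-bit storage check in the log corresponds to the `storageBuffer16BitAccess` field that `vulkaninfo` prints, and the helper name `supports_16bit_storage` is made up for this post.

```python
import re
from typing import Optional

# Matches lines such as "storageBuffer16BitAccess = true" in vulkaninfo text output.
# The longer "uniformAndStorageBuffer16BitAccess" field is deliberately not matched.
_FEATURE_RE = re.compile(
    r"^\s*storageBuffer16BitAccess\s*=\s*(true|false)\b", re.MULTILINE
)


def supports_16bit_storage(vulkaninfo_text: str) -> Optional[bool]:
    """Hypothetical helper: True/False if the field is reported, None if absent.

    vulkaninfo may print the field more than once (several devices, or several
    feature structs), so this returns True only if every occurrence is "true".
    """
    values = _FEATURE_RE.findall(vulkaninfo_text)
    if not values:
        return None
    return all(value == "true" for value in values)
```

To try it, capture the tool's output with `subprocess.run(["vulkaninfo"], capture_output=True, text=True).stdout` and pass that string in. A result of `False` on this machine would match what the log reports.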

Why every model fails

You tried multiple models:

LFM2.5-1.2B-Instruct-Q6_K
gemma-3-270m-it-qat-Q4_0
LFM2-350M-Q4_0

All fail identically.

That tells us:

✅ Models are not corrupted

✅ Quantization is not the issue

✅ Context size is not the issue

✅ RAM is probably not the issue

❌ GPU backend initialization is failing before model loading completes
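The "all fail identically" observation can be checked mechanically. The sketch below counts failures per model file in LM Studio developer logs. It assumes the log line formats shown in this post (`LlamaV4::load called with model path: ...` and `llama_model_load: error loading model: ...`); the function name is illustrative only.

```python
import re
from collections import Counter

# Patterns taken from the log lines quoted in this post; other LM Studio
# versions may format these lines differently.
LOAD_RE = re.compile(r"LlamaV4::load called with model path: (\S+)")
ERROR_RE = re.compile(r"llama_model_load: error loading model: (.+)")


def summarize_load_failures(log_text: str) -> Counter:
    """Count (model file, error message) pairs across all load attempts."""
    failures: Counter = Counter()
    current_model = None
    for line in log_text.splitlines():
        load = LOAD_RE.search(line)
        if load:
            # Keep only the file name, not the full path.
            current_model = load.group(1).rsplit("/", 1)[-1]
            continue
        error = ERROR_RE.search(line)
        if error and current_model is not None:
            failures[(current_model, error.group(1).strip())] += 1
    return failures
```

If every key in the result carries the same error message, the problem is almost certainly in the backend rather than in any one model file.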

The important clues

Clue 1


MESA-INTEL: warning: Haswell Vulkan support is incomplete

Haswell Intel GPUs include:

HD 4200
HD 4400
HD 4600
Iris 5100
Iris Pro 5200

These GPUs have only partial Vulkan support in Mesa. Older Mesa releases handled them in the ANV driver; since Mesa 22.3 they are served by the separate legacy HASVK driver.

Clue 2


device Vulkan0 does not support 16-bit storage

llama.cpp Vulkan backend requires GPU features that Haswell generally lacks.

Therefore LM Studio cannot use Vulkan on this hardware.

The strange token warning

This warning is unrelated:


control-looking token: 212 '</s>' was not control-type

This comes from the Gemma GGUF metadata.

It's usually harmless and not the reason for failure.

Most likely setup

You're probably running:

Linux
Intel Haswell integrated graphics
Mesa Vulkan driver
LM Studio attempting GPU acceleration

The GPU backend is unsupported.

How to verify

Run:

Bash
lspci | grep VGA

and

Bash
vulkaninfo | grep "deviceName"

or:

Bash
glxinfo | grep "OpenGL renderer"

You'll likely see something like:


Intel HD Graphics 4600

or another Haswell GPU.

Fixes

Option 1: Disable GPU acceleration (recommended)

In LM Studio:

Settings
Runtime / Hardware
Disable Vulkan GPU acceleration
Force CPU inference

or set GPU layers to:

0

Then reload the model.

For small models like:

270M
350M
1.2B

CPU inference should work fine.

Option 2: Update Mesa

Check:

Bash
glxinfo | grep "OpenGL version"

or:

Bash
vulkaninfo

A newer Mesa may expose more features, but Haswell hardware itself generally lacks the required capability, so don't expect this to solve it.

Option 3: Use OpenCL/CPU backend

If LM Studio offers:

CPU
OpenBLAS
CLBlast/OpenCL

use those instead of Vulkan.

Option 4: Use a newer GPU

Any reasonably modern GPU supports the needed Vulkan features:

Intel Xe
Intel Arc
AMD RDNA
NVIDIA GTX 10xx and newer

Bottom line

Your models are failing because LM Studio is trying to use the Vulkan backend on an older Intel Haswell GPU, and that GPU does not support Vulkan 16-bit storage, which the llama.cpp Vulkan implementation requires:


ggml_vulkan: device Vulkan0 does not support 16-bit storage

The practical fix is to disable GPU/Vulkan acceleration and run the models on CPU, or use newer graphics hardware.

(base) jain@jain-ThinkPad-T440s:~$ lspci | grep VGA
00:02.0 VGA compatible controller: Intel Corporation Haswell-ULT Integrated Graphics Controller (rev 0b)

That confirms it.

Your machine has:


Intel Corporation Haswell-ULT Integrated Graphics Controller

On a ThinkPad T440s this is typically one of:

Intel HD Graphics 4400
Intel HD Graphics 5000

These are 4th-generation Intel GPUs (2013-era Haswell).

Why LM Studio fails

The log showed:


ggml_vulkan: device Vulkan0 does not support 16-bit storage

and


error loading model: Unsupported device

The Vulkan backend used by LM Studio (via llama.cpp/ggml) expects GPU features that Haswell iGPUs do not provide. Even though Mesa exposes Vulkan support, it is incomplete:


MESA-INTEL: warning: Haswell Vulkan support is incomplete

So LM Studio sees a Vulkan device, tries to use it, then aborts when the required feature checks fail.

What you should do

Disable GPU acceleration completely

In LM Studio:

Open Settings
Go to Runtime or Hardware Acceleration
Select CPU instead of Vulkan
Set GPU Offload Layers to 0
Reload the model

For your hardware this is the correct configuration.

Can the T440s still run local models?

Yes, but on CPU.

The T440s usually has:

2-core/4-thread Haswell-U CPU
8–12 GB RAM (sometimes 16 GB)

Reasonable models:

Model	Status
Gemma 3 270M	Excellent
LFM2 350M	Excellent
TinyLlama 1.1B	Good
Qwen 2.5 0.5B	Good
Qwen 2.5 1.5B Q4	Usable
Phi-3 Mini Q4	Slow but usable
7B models	Very slow

Verify LM Studio is using CPU

After changing settings, load a model and check the logs.

You should not see either of these strings:


ggml_vulkan


Vulkan0

Instead you'll see messages mentioning CPU buffers, threads, and model tensors loading successfully.
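If you copy the developer logs into a file, a short script can do this check for you. This is a minimal sketch that only looks for the two marker strings mentioned above; the function name and the idea of saving the log to a file are assumptions for illustration.

```python
# Marker strings taken from the failing logs in this post.
VULKAN_MARKERS = ("ggml_vulkan", "Vulkan0")


def vulkan_lines(log_text: str) -> list[str]:
    """Return every log line that still mentions the Vulkan backend."""
    return [
        line.strip()
        for line in log_text.splitlines()
        if any(marker in line for marker in VULKAN_MARKERS)
    ]
```

An empty list after reloading a model suggests the Vulkan backend is no longer being initialised. Any returned lines show exactly where it still is.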

If LM Studio still insists on Vulkan

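You can try launching it with an environment variable that disables Vulkan. Treat both variable names below as unverified: check the LM Studio and llama.cpp documentation before relying on them, because your build may simply ignore them.

Bash
LIBGGML_VULKAN=0 lmstudio

or:

Bash
GGML_VULKAN_DISABLE=1 lmstudio

The launcher name also depends on how LM Studio is installed.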

If you're using the AppImage, tell me:

Bash
which lmstudio

or how you installed it (AppImage, .deb, Flatpak, etc.), and I can give the exact command to force CPU mode.

Complete Logs


Developer Logs
2026-06-16 18:43:30 [DEBUG]
 LlamaV4::load called with model path: /home/jain/.lmstudio/models/lmstudio-community/LFM2.5-1.2B-Instruct-GGUF/LFM2.5-1.2B-Instruct-Q6_K.gguf
LlamaV4::load config: n_parallel=4 n_ctx=8192 kv_unified=true
2026-06-16 18:43:30 [DEBUG]
 MESA-INTEL: warning: Haswell Vulkan support is incomplete
2026-06-16 18:43:30 [DEBUG]
 0.00.109.877 I srv    load_model: loading model '/home/jain/.lmstudio/models/lmstudio-community/LFM2.5-1.2B-Instruct-GGUF/LFM2.5-1.2B-Instruct-Q6_K.gguf'
2026-06-16 18:43:31 [DEBUG]
 ggml_vulkan: device Vulkan0 does not support 16-bit storage.
2026-06-16 18:43:31 [DEBUG]
 0.00.349.937 E llama_model_load: error loading model: Unsupported device
0.00.349.964 E llama_model_load_from_file_impl: failed to load model
0.00.349.965 E common_init_from_params: failed to load model '/home/jain/.lmstudio/models/lmstudio-community/LFM2.5-1.2B-Instruct-GGUF/LFM2.5-1.2B-Instruct-Q6_K.gguf'
0.00.349.968 E srv    load_model: failed to load model, '/home/jain/.lmstudio/models/lmstudio-community/LFM2.5-1.2B-Instruct-GGUF/LFM2.5-1.2B-Instruct-Q6_K.gguf': error loading model: Unsupported device
2026-06-16 18:43:31 [DEBUG]
 [LLMProcess] Failed to load model _0x3f9935 [Error]: Failed to load model.
    at _0x3f7ad2.loadModel (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:562652)
    at process.processTicksAndRejections (node:internal/process/task_queues:104:5)
    at async _0x3f7ad2.handleMessage (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:554788) {
  cause: 'Failed to load model',
  suggestion: undefined,
  errorData: undefined,
  data: undefined,
  displayData: undefined,
  title: 'Failed to load model.'
}
2026-06-16 18:43:49 [DEBUG]
 LlamaV4::load called with model path: /home/jain/.lmstudio/models/lmstudio-community/LFM2.5-1.2B-Instruct-GGUF/LFM2.5-1.2B-Instruct-Q6_K.gguf
LlamaV4::load config: n_parallel=4 n_ctx=8192 kv_unified=true
2026-06-16 18:43:49 [DEBUG]
 MESA-INTEL: warning: Haswell Vulkan support is incomplete
2026-06-16 18:43:49 [DEBUG]
 0.00.160.092
2026-06-16 18:43:49 [DEBUG]
 I srv    load_model: loading model '/home/jain/.lmstudio/models/lmstudio-community/LFM2.5-1.2B-Instruct-GGUF/LFM2.5-1.2B-Instruct-Q6_K.gguf'
2026-06-16 18:43:49 [DEBUG]
 ggml_vulkan: device Vulkan0 does not support 16-bit storage.
2026-06-16 18:43:50 [DEBUG]
 0.00.405.523 E llama_model_load: error loading model: Unsupported device
0.00.405.543 E llama_model_load_from_file_impl: failed to load model
0.00.405.545 E common_init_from_params: failed to load model '/home/jain/.lmstudio/models/lmstudio-community/LFM2.5-1.2B-Instruct-GGUF/LFM2.5-1.2B-Instruct-Q6_K.gguf'
0.00.405.547 E srv    load_model: failed to load model, '/home/jain/.lmstudio/models/lmstudio-community/LFM2.5-1.2B-Instruct-GGUF/LFM2.5-1.2B-Instruct-Q6_K.gguf': error loading model: Unsupported device
2026-06-16 18:43:50 [DEBUG]
 [LLMProcess] Failed to load model _0x3f9935 [Error]: Failed to load model.
    at _0x3f7ad2.loadModel (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:562652)
    at process.processTicksAndRejections (node:internal/process/task_queues:104:5)
    at async _0x3f7ad2.handleMessage (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:554788) {
  cause: 'Failed to load model',
  suggestion: undefined,
  errorData: undefined,
  data: undefined,
  displayData: undefined,
  title: 'Failed to load model.'
}
2026-06-16 18:44:03 [DEBUG]
 LlamaV4::load called with model path: /home/jain/.lmstudio/models/lmstudio-community/LFM2.5-1.2B-Instruct-GGUF/LFM2.5-1.2B-Instruct-Q6_K.gguf
LlamaV4::load config: n_parallel=4 n_ctx=8192 kv_unified=true
2026-06-16 18:44:03 [DEBUG]
 MESA-INTEL: warning: Haswell Vulkan support is incomplete
2026-06-16 18:44:03 [DEBUG]
 0.00.071.932 I srv    load_model: loading model '/home/jain/.lmstudio/models/lmstudio-community/LFM2.5-1.2B-Instruct-GGUF/LFM2.5-1.2B-Instruct-Q6_K.gguf'
2026-06-16 18:44:03 [DEBUG]
 ggml_vulkan: device Vulkan0 does not support 16-bit storage.
2026-06-16 18:44:03 [DEBUG]
 0.00.269.473 E llama_model_load: error loading model: Unsupported device
0.00.269.499 E llama_model_load_from_file_impl: failed to load model
0.00.269.500 E common_init_from_params: failed to load model '/home/jain/.lmstudio/models/lmstudio-community/LFM2.5-1.2B-Instruct-GGUF/LFM2.5-1.2B-Instruct-Q6_K.gguf'
0.00.269.502 E srv    load_model: failed to load model, '/home/jain/.lmstudio/models/lmstudio-community/LFM2.5-1.2B-Instruct-GGUF/LFM2.5-1.2B-Instruct-Q6_K.gguf': error loading model: Unsupported device
2026-06-16 18:44:03 [DEBUG]
 [LLMProcess] Failed to load model _0x3f9935 [Error]: Failed to load model.
    at _0x3f7ad2.loadModel (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:562652)
    at process.processTicksAndRejections (node:internal/process/task_queues:104:5)
    at async _0x3f7ad2.handleMessage (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:554788) {
  cause: 'Failed to load model',
  suggestion: undefined,
  errorData: undefined,
  data: undefined,
  displayData: undefined,
  title: 'Failed to load model.'
}
2026-06-16 18:47:29 [DEBUG]
 LlamaV4::load called with model path: /home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf
LlamaV4::load config: n_parallel=4 n_ctx=8192 kv_unified=true
2026-06-16 18:47:29 [DEBUG]
 MESA-INTEL: warning: Haswell Vulkan support is incomplete
2026-06-16 18:47:29 [DEBUG]
 Applying legacy swa_full=true default for arch gemma3
0.00.110.637 I srv    load_model: loading model '/home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf'
2026-06-16 18:47:30 [DEBUG]
 0.00.668.125 W load: control-looking token:    212 '' was not control-type; this is probably a bug in the model. its type will be overridden
2026-06-16 18:47:30 [DEBUG]
 ggml_vulkan: device Vulkan0 does not support 16-bit storage.
2026-06-16 18:47:30 [DEBUG]
 0.00.836.575 E llama_model_load: error loading model: Unsupported device
0.00.836.602 E llama_model_load_from_file_impl: failed to load model
0.00.836.603 E common_init_from_params: failed to load model '/home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf'
0.00.836.606 E srv    load_model: failed to load model, '/home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf': error loading model: Unsupported device
2026-06-16 18:47:30 [DEBUG]
 [LLMProcess] Failed to load model _0x3f9935 [Error]: Failed to load model.
    at _0x3f7ad2.loadModel (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:562652)
    at process.processTicksAndRejections (node:internal/process/task_queues:104:5)
    at async _0x3f7ad2.handleMessage (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:554788) {
  cause: 'Failed to load model',
  suggestion: undefined,
  errorData: undefined,
  data: undefined,
  displayData: undefined,
  title: 'Failed to load model.'
}
2026-06-16 18:47:58 [DEBUG]
 LlamaV4::load called with model path: /home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf
LlamaV4::load config: n_parallel=4 n_ctx=8192 kv_unified=true
2026-06-16 18:47:58 [DEBUG]
 MESA-INTEL: warning: Haswell Vulkan support is incomplete
2026-06-16 18:47:58 [DEBUG]
 Applying legacy swa_full=true default for arch gemma3
0.00.107.554 I srv    load_model: loading model '/home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf'
2026-06-16 18:47:59 [DEBUG]
 0.00.632.439 W load: control-looking token:    212 '' was not control-type; this is probably a bug in the model. its type will be overridden
2026-06-16 18:47:59 [DEBUG]
 ggml_vulkan: device Vulkan0 does not support 16-bit storage.
2026-06-16 18:47:59 [DEBUG]
 0.00.802.706 E llama_model_load: error loading model: Unsupported device
0.00.802.735 E llama_model_load_from_file_impl: failed to load model
0.00.802.737 E common_init_from_params: failed to load model '/home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf'
0.00.802.739 E srv    load_model: failed to load model, '/home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf': error loading model: Unsupported device
2026-06-16 18:47:59 [DEBUG]
 [LLMProcess] Failed to load model _0x3f9935 [Error]: Failed to load model.
    at _0x3f7ad2.loadModel (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:562652)
    at process.processTicksAndRejections (node:internal/process/task_queues:104:5)
    at async _0x3f7ad2.handleMessage (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:554788) {
  cause: 'Failed to load model',
  suggestion: undefined,
  errorData: undefined,
  data: undefined,
  displayData: undefined,
  title: 'Failed to load model.'
}
2026-06-16 18:53:59 [DEBUG]
 LlamaV4::load called with model path: /home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf
LlamaV4::load config: n_parallel=4 n_ctx=8192 kv_unified=true
2026-06-16 18:53:59 [DEBUG]
 MESA-INTEL: warning: Haswell Vulkan support is incomplete
2026-06-16 18:53:59 [DEBUG]
 Applying legacy swa_full=true default for arch gemma3
0.00.161.414 I srv    load_model: loading model '/home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf'
2026-06-16 18:54:00 [DEBUG]
 0.00.813.256
2026-06-16 18:54:00 [DEBUG]
 W load: control-looking token:    212 '' was not control-type; this is probably a bug in the model. its type will be overridden
2026-06-16 18:54:00 [DEBUG]
 ggml_vulkan: device Vulkan0 does not support 16-bit storage.
2026-06-16 18:54:00 [DEBUG]
 0.01.092.471 E llama_model_load: error loading model: Unsupported device
0.01.092.498 E llama_model_load_from_file_impl: failed to load model
0.01.092.500 E common_init_from_params: failed to load model '/home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf'
0.01.092.502 E srv    load_model: failed to load model, '/home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf': error loading model: Unsupported device
2026-06-16 18:54:00 [DEBUG]
 [LLMProcess] Failed to load model _0x3f9935 [Error]: Failed to load model.
    at _0x3f7ad2.loadModel (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:562652)
    at process.processTicksAndRejections (node:internal/process/task_queues:104:5)
    at async _0x3f7ad2.handleMessage (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:554788) {
  cause: 'Failed to load model',
  suggestion: undefined,
  errorData: undefined,
  data: undefined,
  displayData: undefined,
  title: 'Failed to load model.'
}
2026-06-16 18:56:57 [DEBUG]
 LlamaV4::load called with model path: /home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf
LlamaV4::load config: n_parallel=4 n_ctx=8192 kv_unified=true
2026-06-16 18:56:58 [DEBUG]
 MESA-INTEL: warning: Haswell Vulkan support is incomplete
2026-06-16 18:56:58 [DEBUG]
 Applying legacy swa_full=true default for arch gemma3
0.00.113.876 I srv    load_model: loading model '/home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf'
2026-06-16 18:56:58 [DEBUG]
 0.00.664.116 W load: control-looking token:    212 '' was not control-type; this is probably a bug in the model. its type will be overridden
2026-06-16 18:56:58 [DEBUG]
 ggml_vulkan: device Vulkan0 does not support 16-bit storage.
2026-06-16 18:56:58 [DEBUG]
 0.00.832.215 E llama_model_load: error loading model: Unsupported device
0.00.832.233 E llama_model_load_from_file_impl: failed to load model
0.00.832.234 E common_init_from_params: failed to load model '/home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf'
0.00.832.237 E srv    load_model: failed to load model, '/home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf': error loading model: Unsupported device
2026-06-16 18:56:58 [DEBUG]
 [LLMProcess] Failed to load model _0x3f9935 [Error]: Failed to load model.
    at _0x3f7ad2.loadModel (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:562652)
    at process.processTicksAndRejections (node:internal/process/task_queues:104:5)
    at async _0x3f7ad2.handleMessage (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:554788) {
  cause: 'Failed to load model',
  suggestion: undefined,
  errorData: undefined,
  data: undefined,
  displayData: undefined,
  title: 'Failed to load model.'
}
2026-06-16 18:58:53 [DEBUG]
 LlamaV4::load called with model path: /home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf
LlamaV4::load config: n_parallel=4 n_ctx=8196 kv_unified=true
2026-06-16 18:58:53 [DEBUG]
 MESA-INTEL: warning: Haswell Vulkan support is incomplete
2026-06-16 18:58:53 [DEBUG]
 Applying legacy swa_full=true default for arch gemma3
0.00.141.947 I srv    load_model: loading model '/home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf'
2026-06-16 18:58:54 [DEBUG]
 0.00.766.698 W load: control-looking token:    212 '' was not control-type; this is probably a bug in the model. its type will be overridden
2026-06-16 18:58:54 [DEBUG]
 ggml_vulkan: device Vulkan0 does not support 16-bit storage.
2026-06-16 18:58:54 [DEBUG]
 0.00.952.940 E llama_model_load: error loading model: Unsupported device
0.00.952.974 E llama_model_load_from_file_impl: failed to load model
0.00.952.976 E common_init_from_params: failed to load model '/home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf'
0.00.952.979 E srv    load_model: failed to load model, '/home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf': error loading model: Unsupported device
2026-06-16 18:58:54 [DEBUG]
 [LLMProcess] Failed to load model _0x3f9935 [Error]: Failed to load model.
    at _0x3f7ad2.loadModel (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:562652)
    at process.processTicksAndRejections (node:internal/process/task_queues:104:5)
    at async _0x3f7ad2.handleMessage (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:554788) {
  cause: 'Failed to load model',
  suggestion: undefined,
  errorData: undefined,
  data: undefined,
  displayData: undefined,
  title: 'Failed to load model.'
}
2026-06-16 18:59:22 [DEBUG]
 LlamaV4::load called with model path: /home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf
LlamaV4::load config: n_parallel=4 n_ctx=8196 kv_unified=true
2026-06-16 18:59:22 [DEBUG]
 MESA-INTEL: warning: Haswell Vulkan support is incomplete
2026-06-16 18:59:22 [DEBUG]
 Applying legacy swa_full=true default for arch gemma3
0.00.125.143 I srv    load_model: loading model '/home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf'
2026-06-16 18:59:23 [DEBUG]
 0.00.736.472 W load: control-looking token:    212 '' was not control-type; this is probably a bug in the model. its type will be overridden
2026-06-16 18:59:23 [DEBUG]
 ggml_vulkan: device Vulkan0 does not support 16-bit storage.
2026-06-16 18:59:23 [DEBUG]
 0.00.917.603 E llama_model_load: error loading model: Unsupported device
0.00.917.620 E llama_model_load_from_file_impl: failed to load model
0.00.917.622 E common_init_from_params: failed to load model '/home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf'
0.00.917.624 E srv    load_model: failed to load model, '/home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf': error loading model: Unsupported device
2026-06-16 18:59:23 [DEBUG]
 [LLMProcess] Failed to load model _0x3f9935 [Error]: Failed to load model.
    at _0x3f7ad2.loadModel (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:562652)
    at process.processTicksAndRejections (node:internal/process/task_queues:104:5)
    at async _0x3f7ad2.handleMessage (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:554788) {
  cause: 'Failed to load model',
  suggestion: undefined,
  errorData: undefined,
  data: undefined,
  displayData: undefined,
  title: 'Failed to load model.'
}
2026-06-16 18:59:29 [DEBUG]
 LlamaV4::load called with model path: /home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf
LlamaV4::load config: n_parallel=4 n_ctx=8196 kv_unified=true
2026-06-16 18:59:29 [DEBUG]
 MESA-INTEL: warning: Haswell Vulkan support is incomplete
2026-06-16 18:59:29 [DEBUG]
 Applying legacy swa_full=true default for arch gemma3
2026-06-16 18:59:29 [DEBUG]
 0.00.117.496 I srv    load_model: loading model '/home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf'
2026-06-16 18:59:30 [DEBUG]
 0.00.682.134 W load: control-looking token:    212 '' was not control-type; this is probably a bug in the model. its type will be overridden
2026-06-16 18:59:30 [DEBUG]
 ggml_vulkan: device Vulkan0 does not support 16-bit storage.
2026-06-16 18:59:30 [DEBUG]
 0.00.855.008 E llama_model_load: error loading model: Unsupported device
0.00.855.042 E llama_model_load_from_file_impl: failed to load model
0.00.855.044 E common_init_from_params: failed to load model '/home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf'
0.00.855.046 E srv    load_model: failed to load model, '/home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf': error loading model: Unsupported device
2026-06-16 18:59:30 [DEBUG]
 [LLMProcess] Failed to load model _0x3f9935 [Error]: Failed to load model.
    at _0x3f7ad2.loadModel (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:562652)
    at process.processTicksAndRejections (node:internal/process/task_queues:104:5)
    at async _0x3f7ad2.handleMessage (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:554788) {
  cause: 'Failed to load model',
  suggestion: undefined,
  errorData: undefined,
  data: undefined,
  displayData: undefined,
  title: 'Failed to load model.'
}
2026-06-16 19:02:02 [DEBUG]
 LlamaV4::load called with model path: /home/jain/.lmstudio/models/LiquidAI/LFM2-350M-GGUF/LFM2-350M-Q4_0.gguf
LlamaV4::load config: n_parallel=4 n_ctx=8192 kv_unified=true
2026-06-16 19:02:02 [DEBUG]
 MESA-INTEL: warning: Haswell Vulkan support is incomplete
2026-06-16 19:02:02 [DEBUG]
 0.00.092.224 I srv    load_model: loading model '/home/jain/.lmstudio/models/LiquidAI/LFM2-350M-GGUF/LFM2-350M-Q4_0.gguf'
2026-06-16 19:02:02 [DEBUG]
 ggml_vulkan: device Vulkan0 does not support 16-bit storage.
2026-06-16 19:02:02 [DEBUG]
 0.00.300.094 E llama_model_load: error loading model: Unsupported device
0.00.300.121 E llama_model_load_from_file_impl: failed to load model
0.00.300.123 E common_init_from_params: failed to load model '/home/jain/.lmstudio/models/LiquidAI/LFM2-350M-GGUF/LFM2-350M-Q4_0.gguf'
0.00.300.125 E srv    load_model: failed to load model, '/home/jain/.lmstudio/models/LiquidAI/LFM2-350M-GGUF/LFM2-350M-Q4_0.gguf': error loading model: Unsupported device
2026-06-16 19:02:02 [DEBUG]
 [LLMProcess] Failed to load model _0x3f9935 [Error]: Failed to load model.
    at _0x3f7ad2.loadModel (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:562652)
    at process.processTicksAndRejections (node:internal/process/task_queues:104:5)
    at async _0x3f7ad2.handleMessage (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:554788) {
  cause: 'Failed to load model',
  suggestion: undefined,
  errorData: undefined,
  data: undefined,
  displayData: undefined,
  title: 'Failed to load model.'
}
2026-06-16 19:02:06 [DEBUG]
 LlamaV4::load called with model path: /home/jain/.lmstudio/models/LiquidAI/LFM2-350M-GGUF/LFM2-350M-Q4_0.gguf
LlamaV4::load config: n_parallel=4 n_ctx=8192 kv_unified=true
2026-06-16 19:02:06 [DEBUG]
 MESA-INTEL: warning: Haswell Vulkan support is incomplete
2026-06-16 19:02:06 [DEBUG]
 0.00.077.805 I srv    load_model: loading model '/home/jain/.lmstudio/models/LiquidAI/LFM2-350M-GGUF/LFM2-350M-Q4_0.gguf'
2026-06-16 19:02:06 [DEBUG]
 ggml_vulkan: device Vulkan0 does not support 16-bit storage.
2026-06-16 19:02:07 [DEBUG]
 0.00.301.853 E llama_model_load: error loading model: Unsupported device
0.00.301.872 E llama_model_load_from_file_impl: failed to load model
0.00.301.874 E common_init_from_params: failed to load model '/home/jain/.lmstudio/models/LiquidAI/LFM2-350M-GGUF/LFM2-350M-Q4_0.gguf'
0.00.301.876 E srv    load_model: failed to load model, '/home/jain/.lmstudio/models/LiquidAI/LFM2-350M-GGUF/LFM2-350M-Q4_0.gguf': error loading model: Unsupported device
2026-06-16 19:02:07 [DEBUG]
 [LLMProcess] Failed to load model _0x3f9935 [Error]: Failed to load model.
    at _0x3f7ad2.loadModel (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:562652)
    at process.processTicksAndRejections (node:internal/process/task_queues:104:5)
    at async _0x3f7ad2.handleMessage (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:554788) {
  cause: 'Failed to load model',
  suggestion: undefined,
  errorData: undefined,
  data: undefined,
  displayData: undefined,
  title: 'Failed to load model.'
}
2026-06-16 19:02:38 [DEBUG]
 LlamaV4::load called with model path: /home/jain/.lmstudio/models/LiquidAI/LFM2-350M-GGUF/LFM2-350M-Q4_0.gguf
LlamaV4::load config: n_parallel=4 n_ctx=8192 kv_unified=true
2026-06-16 19:02:38 [DEBUG]
 MESA-INTEL: warning: Haswell Vulkan support is incomplete
2026-06-16 19:02:38 [DEBUG]
 0.00.079.694 I srv    load_model: loading model '/home/jain/.lmstudio/models/LiquidAI/LFM2-350M-GGUF/LFM2-350M-Q4_0.gguf'
2026-06-16 19:02:38 [DEBUG]
 ggml_vulkan: device Vulkan0 does not support 16-bit storage.
2026-06-16 19:02:38 [DEBUG]
 0.00.316.444 E llama_model_load: error loading model: Unsupported device
0.00.316.476 E llama_model_load_from_file_impl: failed to load model
0.00.316.478 E common_init_from_params: failed to load model '/home/jain/.lmstudio/models/LiquidAI/LFM2-350M-GGUF/LFM2-350M-Q4_0.gguf'
0.00.316.480 E srv    load_model: failed to load model, '/home/jain/.lmstudio/models/LiquidAI/LFM2-350M-GGUF/LFM2-350M-Q4_0.gguf': error loading model: Unsupported device
2026-06-16 19:02:38 [DEBUG]
 [LLMProcess] Failed to load model _0x3f9935 [Error]: Failed to load model.
    at _0x3f7ad2.loadModel (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:562652)
    at process.processTicksAndRejections (node:internal/process/task_queues:104:5)
    at async _0x3f7ad2.handleMessage (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:554788) {
  cause: 'Failed to load model',
  suggestion: undefined,
  errorData: undefined,
  data: undefined,
  displayData: undefined,
  title: 'Failed to load model.'
}

Tags: Generative AI,Agentic AI,

Day out with LM Studio (for running local LLMs)


LM Studio is a popular way to run local LLMs if you prefer a clean, visual interface over a terminal window. It hides most of the command-line arguments of tools like llama.cpp while still exposing developer controls under the hood.

Setting it up and getting your first model running takes less than 10 minutes.

1. System Check (What Fits?)

Before downloading a massive model that locks up your computer, check your hardware specs. LM Studio relies heavily on VRAM (GPU Memory), with system RAM as a fallback.

Total Available VRAM	Recommended Model Size	Best Quantization Format
8 GB	7B - 8B models (e.g., Llama 3 8B)	`Q4_K_M` (Practical baseline)
12 GB - 16 GB	12B - 14B models (e.g., Gemma 4 12B, Qwen 3.6 14B)	`Q4_K_M` or `Q6_K`
24 GB	32B - 35B models (e.g., Qwen 3.6 35B MoE)	`Q4_K_M` (The sweet spot; `Q6_K` weights alone usually exceed 24 GB at this size)
48 GB+	70B+ models	`Q4_K_M` (`Q8_0` and `BF16` need roughly 75 GB and 140 GB for 70B weights alone)
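A rough way to sanity-check any model and quantization pairing is to multiply the parameter count by the bits stored per weight. The sketch below uses approximate bits-per-weight figures for common GGUF formats; these are rounded, real files vary slightly by model, and the estimate covers weights only (the KV cache and runtime buffers come on top).

```python
# Approximate bits per weight for common GGUF quantizations.
# Q4_0 and Q8_0 follow from their block layouts; Q4_K_M and Q6_K are rounded averages.
BITS_PER_WEIGHT = {
    "Q4_0": 4.5,
    "Q4_K_M": 4.85,
    "Q6_K": 6.56,
    "Q8_0": 8.5,
    "BF16": 16.0,
}


def weight_size_gb(params_billions: float, quant: str) -> float:
    """Estimate the size of the model weights in GB (10^9 bytes)."""
    bits = BITS_PER_WEIGHT[quant]
    # params * bits / 8 gives bytes; the factors of 1e9 cancel out.
    return params_billions * bits / 8


for size, quant in [(8, "Q4_K_M"), (14, "Q6_K"), (32, "Q4_K_M"), (70, "Q4_K_M"), (70, "Q8_0")]:
    print(f"{size}B at {quant}: ~{weight_size_gb(size, quant):.1f} GB")
```

Leave a few GB of headroom beyond the printed figure for context and buffers before deciding that a model fits.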

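💡 Apple Silicon Note: On an M-series Mac, LM Studio can run MLX-format models through Apple's MLX runtime, while GGUF models still go through llama.cpp. Because Apple Silicon uses unified memory, the GPU draws on system RAM directly, so read the VRAM column above as total system memory minus what macOS and your other apps need.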

2. Step-by-Step Setup Guide

Download and Install

~2 minutes

Go to lmstudio.ai and download the installer matching your OS (Windows x64/ARM, macOS M-series, or Linux AppImage). Run the installer to open the GUI.

Discover and Download a Model

~3-5 minutes

Click the Search/Discover icon (Magnifying Glass) on the left sidebar. Type in a popular open model like Gemma 4 12B or Qwen 3.6 Coder.

LM Studio will display a list of available Hugging Face files. Look for the green rocket icon next to the files—this indicates the model quantization will comfortably fit your hardware profile. Click Download.

Configure Your Hardware Engine

~1 minute

Head to the AI Chat view (Bubble icon) and look at the right-hand settings panel. Under Hardware Settings, select your runtime engine:

NVIDIA: Choose CUDA 12 llama.cpp.
Apple Silicon: Leave it on MLX.
AMD/Intel GPU: Choose Vulkan llama.cpp.
CPU Only: Choose CPU llama.cpp (if you don't have a dedicated GPU).

Adjust GPU Offload and Context

~1 minute

If you're using a discrete GPU (like NVIDIA), locate the GPU Offload slider. Toggle it to Max to push as many layers of the model into your VRAM as possible.

Set your Context Length next (start with 4096 or 8192 tokens). KV-cache memory grows roughly linearly with context length, so doubling the context roughly doubles that part of the VRAM budget.
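As a rough sketch of how context length affects memory, the KV cache of a standard transformer stores one key and one value vector per layer, per KV head, per token. The model dimensions below are hypothetical (an 8B-class model with grouped-query attention) and the formula ignores sliding-window attention, cache quantization, and runtime overhead.

```python
def kv_cache_bytes(
    n_ctx: int,
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    bytes_per_value: int = 2,  # 2 bytes assumes an FP16 cache
) -> int:
    """Estimate KV-cache size: 2 tensors (K and V) per layer, per token."""
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_value


# Hypothetical dimensions, chosen only for illustration.
dims = dict(n_layers=32, n_kv_heads=8, head_dim=128)

for n_ctx in (4096, 8192, 16384):
    gib = kv_cache_bytes(n_ctx, **dims) / 1024**3
    print(f"n_ctx={n_ctx}: ~{gib:.2f} GiB")
```

With these example dimensions, each doubling of the context doubles the cache, which is the pattern to expect when you move the slider.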

Load and Chat

Instant

At the very top of the window, click the "Select a model to load" dropdown and select your downloaded model. Once the progress bar fills, type your prompt in the bottom text box and enjoy 100% private, offline AI.

3. Power-User Features to Explore Later

Once you have basic chat working, LM Studio has major features designed for software development and local workflows:

Local OpenAI-Compatible Server

Click the Developer tab (Code brackets icon) on the left menu. Here, you can click Start Server to spin up a local API endpoint on localhost:1234. Because it is fully OpenAI-compatible, you can drop this endpoint straight into developer setups, IDE extensions (like Continue or VS Code Copilot alternatives), or local scripts using the standard OpenAI SDK format:

Python
                                    
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="local-model", # It automatically targets whatever model is currently loaded
    messages=[{"role": "user", "content": "Write a quick Python sort algorithm."}]
)
print(response.choices[0].message.content)
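If you would rather pass a real model identifier than a placeholder, OpenAI-compatible servers normally expose a model listing at `/v1/models`. The sketch below assumes LM Studio's server returns the usual OpenAI-style payload (`{"data": [{"id": ...}]}`) and that it is running on the default port; both function names are illustrative.

```python
import json
from urllib.request import urlopen


def parse_model_ids(payload: str) -> list[str]:
    """Extract model ids from an OpenAI-style /v1/models JSON response."""
    data = json.loads(payload)
    return [entry["id"] for entry in data.get("data", [])]


def list_models(base_url: str = "http://localhost:1234/v1") -> list[str]:
    """Query the local server; raises URLError if the server is not running."""
    with urlopen(f"{base_url}/models", timeout=5) as response:
        return parse_model_ids(response.read().decode("utf-8"))
```

Calling `list_models()` while the server is started should return the identifiers you can pass as `model=` in the chat completion call.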

Chat with Documents (Local RAG)

You can attach local text files, PDFs, or code repositories directly into your chat. LM Studio handles the text extraction and local embedding vectorization completely offline, allowing you to ask questions about your private files without data leaking to external servers.

LM Link (Remote Workloads)

If you have a powerful machine (like a desktop rig with a great GPU) but want to work from a lightweight laptop on your couch, you can turn on LM Link in your settings. It leverages a secure, end-to-end encrypted mesh network (powered by Tailscale) to let you stream your desktop's heavy model processing directly to your laptop as if it were running locally.

Tags: Large Language Models,Generative AI,Agentic AI,
