Showing posts with label Agentic AI. Show all posts

Tuesday, June 16, 2026

Attempting to run a local LLM via LM Studio


Explain these logs... (logs attached at the end of the post)




The key error is this line:

ggml_vulkan: device Vulkan0 does not support 16-bit storage.
...
llama_model_load: error loading model: Unsupported device

Everything else is LM Studio repeatedly trying to load different GGUF models and failing for the same reason.

What's happening

LM Studio is using the Vulkan backend (GPU acceleration via llama.cpp/ggml).

When it starts loading a model, it detects your Vulkan-capable GPU:

MESA-INTEL: warning: Haswell Vulkan support is incomplete

This is already a warning sign. It means you're using an Intel Haswell-generation GPU (around 2013–2015), and Mesa's Vulkan implementation for it is incomplete.

Then:

ggml_vulkan: device Vulkan0 does not support 16-bit storage.

Modern llama.cpp Vulkan kernels require the Vulkan feature:

VK_KHR_16bit_storage

or equivalent 16-bit storage support.

Your GPU/driver does not provide it.

As soon as ggml checks the GPU capabilities, model initialization aborts:

error loading model: Unsupported device
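If you want to check this capability yourself, one option is to look for the `storageBuffer16BitAccess` flag in the output of `vulkaninfo`. The sketch below is a minimal example, not a definitive tool. It assumes `vulkaninfo` (from the `vulkan-tools` package) is installed, that it prints features as `name = true/false` lines, and that this flag is the one ggml checks. The helper names `supports_16bit_storage` and `check_local_gpu` are made up for illustration.

```python
import re
import shutil
import subprocess

# Feature flag associated with VK_KHR_16bit_storage (assumed to be what ggml checks).
FEATURE = "storageBuffer16BitAccess"


def supports_16bit_storage(vulkaninfo_text: str) -> bool:
    """Return True if any device in the vulkaninfo dump reports the feature as true."""
    pattern = re.compile(rf"^\s*{FEATURE}\s*=\s*(true|false)\s*$", re.MULTILINE)
    return any(value == "true" for value in pattern.findall(vulkaninfo_text))


def check_local_gpu() -> None:
    """Run vulkaninfo, if it is installed, and print the verdict."""
    if shutil.which("vulkaninfo") is None:
        print("vulkaninfo not found; install vulkan-tools first")
        return
    result = subprocess.run(["vulkaninfo"], capture_output=True, text=True, check=False)
    print("16-bit storage supported:", supports_16bit_storage(result.stdout))
```

Calling `check_local_gpu()` on the affected machine should print `False` if the driver really lacks the feature. The parsing is kept separate from the subprocess call so it can also be run on a saved `vulkaninfo` dump.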

Why every model fails

You tried multiple models:

  • LFM2.5-1.2B-Instruct-Q6_K

  • gemma-3-270m-it-qat-Q4_0

  • LFM2-350M-Q4_0

All fail identically.

That tells us:

✅ Models are not corrupted

✅ Quantization is not the issue

✅ Context size is not the issue

✅ RAM is probably not the issue

❌ GPU backend initialization is failing before model loading completes
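The "all fail identically" observation can be checked mechanically instead of by eye. The sketch below groups the failure lines from the developer log by model file and error reason. It assumes the log keeps the `load_model: failed to load model, '<path>': error loading model: <reason>` shape seen in this post, and `summarize_failures` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches the final "srv load_model" failure line of each load attempt.
FAIL_RE = re.compile(
    r"load_model: failed to load model, '(?P<path>[^']+)': "
    r"error loading model: (?P<reason>.+)$"
)


def summarize_failures(log_text: str) -> Counter:
    """Count (model file, reason) pairs across all failed load attempts."""
    counts = Counter()
    for line in log_text.splitlines():
        match = FAIL_RE.search(line)
        if match:
            model_file = match.group("path").rsplit("/", 1)[-1]
            counts[(model_file, match.group("reason").strip())] += 1
    return counts
```

If every key in the result carries the same reason, the problem sits in the backend rather than in any single model file.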


The important clues

Clue 1

MESA-INTEL: warning: Haswell Vulkan support is incomplete

Haswell Intel GPUs are:

  • HD 4200

  • HD 4400

  • HD 4600

  • Iris 5100

  • Iris Pro 5200

These GPUs have only partial Vulkan support in Mesa: through the ANV driver in older releases, and through the split-out HASVK driver since Mesa 22.3.


Clue 2

device Vulkan0 does not support 16-bit storage

The llama.cpp Vulkan backend requires 16-bit storage, which the Haswell Vulkan driver does not expose.

Therefore LM Studio cannot use Vulkan on this hardware.


The strange token warning

This warning is unrelated:

control-looking token: 212 '</s>' was not control-type

This comes from the Gemma GGUF metadata.

It's usually harmless and not the reason for failure.


Most likely setup

You're probably running:

  • Linux

  • Intel Haswell integrated graphics

  • Mesa Vulkan driver

  • LM Studio attempting GPU acceleration

The GPU backend is unsupported.


How to verify

Run:

Bash
lspci | grep VGA

and

Bash
vulkaninfo | grep "deviceName"

or:

Bash
glxinfo | grep "OpenGL renderer"

You'll likely see something like:

Intel HD Graphics 4600

or another Haswell GPU.
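If you would rather script the check, the sketch below pulls the VGA controller names out of `lspci` output. Matching on the word "Haswell" is only a heuristic, since some Haswell parts are listed under other names. `vga_controllers` and `looks_like_haswell` are hypothetical helper names.

```python
def vga_controllers(lspci_text: str) -> list[str]:
    """Return the device names of all VGA controllers found in lspci output."""
    marker = "VGA compatible controller:"
    names = []
    for line in lspci_text.splitlines():
        if marker in line:
            names.append(line.split(marker, 1)[1].strip())
    return names


def looks_like_haswell(name: str) -> bool:
    """Heuristic: lspci often, but not always, names the GPU generation."""
    return "haswell" in name.lower()
```

Feed it the text produced by `lspci`, for example via `subprocess.run(["lspci"], capture_output=True, text=True).stdout`.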


Fixes

Option 1: Disable GPU acceleration (recommended)

In LM Studio:

  1. Settings

  2. Runtime / Hardware

  3. Disable Vulkan GPU acceleration

  4. Force CPU inference

or set GPU layers to:

0

Then reload the model.

For small models like:

  • 270M

  • 350M

  • 1.2B

CPU inference should work fine.


Option 2: Update Mesa

Check the installed Mesa version:

Bash
glxinfo | grep "OpenGL version"

or:

Bash
vulkaninfo --summary

A newer Mesa may expose more features, but Haswell hardware itself generally lacks the required capability, so don't expect this to solve it.


Option 3: Use OpenCL/CPU backend

If LM Studio offers:

  • CPU

  • OpenBLAS

  • CLBlast/OpenCL

use those instead of Vulkan.


Option 4: Use a newer GPU

Any reasonably modern GPU supports the needed Vulkan features:

  • Intel Xe

  • Intel Arc

  • AMD RDNA

  • NVIDIA GTX 10xx and newer


Bottom line

Your models are failing because LM Studio is trying to use the Vulkan backend on an older Intel Haswell GPU, and that GPU does not support Vulkan 16-bit storage, which the llama.cpp Vulkan implementation requires:

ggml_vulkan: device Vulkan0 does not support 16-bit storage

The practical fix is to disable GPU/Vulkan acceleration and run the models on CPU, or use newer graphics hardware.




(base) jain@jain-ThinkPad-T440s:~$ lspci | grep VGA
00:02.0 VGA compatible controller: Intel Corporation Haswell-ULT Integrated Graphics Controller (rev 0b)




That confirms it.

Your machine has:

Intel Corporation Haswell-ULT Integrated Graphics Controller

On a ThinkPad T440s this is typically one of:

  • Intel HD Graphics 4400

  • Intel HD Graphics 5000

These are 4th-generation Intel GPUs (2013-era Haswell).

Why LM Studio fails

The log showed:

ggml_vulkan: device Vulkan0 does not support 16-bit storage

and

error loading model: Unsupported device

The Vulkan backend used by LM Studio (via llama.cpp/ggml) expects GPU features that Haswell iGPUs do not provide. Even though Mesa exposes Vulkan support, it is incomplete:

MESA-INTEL: warning: Haswell Vulkan support is incomplete

So LM Studio sees a Vulkan device, tries to use it, then aborts when the required feature checks fail.


What you should do

Disable GPU acceleration completely

In LM Studio:

  1. Open Settings

  2. Go to Runtime or Hardware Acceleration

  3. Select CPU instead of Vulkan

  4. Set GPU Offload Layers to 0

  5. Reload the model

For your hardware this is the correct configuration.


Can the T440s still run local models?

Yes, but on CPU.

The T440s usually has:

  • 2-core/4-thread Haswell-U CPU (Core i5 or i7)

  • 8–12 GB RAM (sometimes 16 GB)

Reasonable models:

Model | Status
Gemma 3 270M | Excellent
LFM2 350M | Excellent
TinyLlama 1.1B | Good
Qwen 2.5 0.5B | Good
Qwen 2.5 1.5B Q4 | Usable
Phi-3 Mini Q4 | Slow but usable
7B models | Very slow

Verify LM Studio is using CPU

After changing settings, load a model and check the logs.

You should not see:

ggml_vulkan

or

Vulkan0

Instead you'll see messages mentioning CPU buffers, threads, and model tensors loading successfully.
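To avoid scanning the log by eye, you can paste it into a small script that flags any remaining Vulkan lines. This is a minimal sketch. The two markers are the strings quoted above, and `vulkan_lines` is a hypothetical helper name.

```python
# Strings that indicate the Vulkan backend is still being initialised.
VULKAN_MARKERS = ("ggml_vulkan", "Vulkan0")


def vulkan_lines(log_text: str) -> list[str]:
    """Return every log line that mentions one of the Vulkan markers."""
    return [
        line.strip()
        for line in log_text.splitlines()
        if any(marker in line for marker in VULKAN_MARKERS)
    ]
```

An empty result suggests the Vulkan backend is no longer being touched. Any returned line means LM Studio is still probing the GPU.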


If LM Studio still insists on Vulkan

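You can try launching it with an environment variable, but treat both names below as unverified guesses. As far as I know, neither is a documented LM Studio or llama.cpp setting, so they may have no effect:

Bash
LIBGGML_VULKAN=0 lmstudio

or

Bash
GGML_VULKAN_DISABLE=1 lmstudio

The dependable route is still to select the CPU runtime in LM Studio's settings.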

If you're using the AppImage, tell me:

Bash
which lmstudio

or how you installed it (AppImage, .deb, Flatpak, etc.), and I can give the exact command to force CPU mode.




Complete Logs


Developer Logs
2026-06-16 18:43:30 [DEBUG]
 LlamaV4::load called with model path: /home/jain/.lmstudio/models/lmstudio-community/LFM2.5-1.2B-Instruct-GGUF/LFM2.5-1.2B-Instruct-Q6_K.gguf
LlamaV4::load config: n_parallel=4 n_ctx=8192 kv_unified=true
2026-06-16 18:43:30 [DEBUG]
 MESA-INTEL: warning: Haswell Vulkan support is incomplete
2026-06-16 18:43:30 [DEBUG]
 0.00.109.877 I srv    load_model: loading model '/home/jain/.lmstudio/models/lmstudio-community/LFM2.5-1.2B-Instruct-GGUF/LFM2.5-1.2B-Instruct-Q6_K.gguf'
2026-06-16 18:43:31 [DEBUG]
 ggml_vulkan: device Vulkan0 does not support 16-bit storage.
2026-06-16 18:43:31 [DEBUG]
 0.00.349.937 E llama_model_load: error loading model: Unsupported device
0.00.349.964 E llama_model_load_from_file_impl: failed to load model
0.00.349.965 E common_init_from_params: failed to load model '/home/jain/.lmstudio/models/lmstudio-community/LFM2.5-1.2B-Instruct-GGUF/LFM2.5-1.2B-Instruct-Q6_K.gguf'
0.00.349.968 E srv    load_model: failed to load model, '/home/jain/.lmstudio/models/lmstudio-community/LFM2.5-1.2B-Instruct-GGUF/LFM2.5-1.2B-Instruct-Q6_K.gguf': error loading model: Unsupported device
2026-06-16 18:43:31 [DEBUG]
 [LLMProcess] Failed to load model _0x3f9935 [Error]: Failed to load model.
    at _0x3f7ad2.loadModel (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:562652)
    at process.processTicksAndRejections (node:internal/process/task_queues:104:5)
    at async _0x3f7ad2.handleMessage (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:554788) {
  cause: 'Failed to load model',
  suggestion: undefined,
  errorData: undefined,
  data: undefined,
  displayData: undefined,
  title: 'Failed to load model.'
}
2026-06-16 18:43:49 [DEBUG]
 LlamaV4::load called with model path: /home/jain/.lmstudio/models/lmstudio-community/LFM2.5-1.2B-Instruct-GGUF/LFM2.5-1.2B-Instruct-Q6_K.gguf
LlamaV4::load config: n_parallel=4 n_ctx=8192 kv_unified=true
2026-06-16 18:43:49 [DEBUG]
 MESA-INTEL: warning: Haswell Vulkan support is incomplete
2026-06-16 18:43:49 [DEBUG]
 0.00.160.092
2026-06-16 18:43:49 [DEBUG]
 I srv    load_model: loading model '/home/jain/.lmstudio/models/lmstudio-community/LFM2.5-1.2B-Instruct-GGUF/LFM2.5-1.2B-Instruct-Q6_K.gguf'
2026-06-16 18:43:49 [DEBUG]
 ggml_vulkan: device Vulkan0 does not support 16-bit storage.
2026-06-16 18:43:50 [DEBUG]
 0.00.405.523 E llama_model_load: error loading model: Unsupported device
0.00.405.543 E llama_model_load_from_file_impl: failed to load model
0.00.405.545 E common_init_from_params: failed to load model '/home/jain/.lmstudio/models/lmstudio-community/LFM2.5-1.2B-Instruct-GGUF/LFM2.5-1.2B-Instruct-Q6_K.gguf'
0.00.405.547 E srv    load_model: failed to load model, '/home/jain/.lmstudio/models/lmstudio-community/LFM2.5-1.2B-Instruct-GGUF/LFM2.5-1.2B-Instruct-Q6_K.gguf': error loading model: Unsupported device
2026-06-16 18:43:50 [DEBUG]
 [LLMProcess] Failed to load model _0x3f9935 [Error]: Failed to load model.
    at _0x3f7ad2.loadModel (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:562652)
    at process.processTicksAndRejections (node:internal/process/task_queues:104:5)
    at async _0x3f7ad2.handleMessage (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:554788) {
  cause: 'Failed to load model',
  suggestion: undefined,
  errorData: undefined,
  data: undefined,
  displayData: undefined,
  title: 'Failed to load model.'
}
2026-06-16 18:44:03 [DEBUG]
 LlamaV4::load called with model path: /home/jain/.lmstudio/models/lmstudio-community/LFM2.5-1.2B-Instruct-GGUF/LFM2.5-1.2B-Instruct-Q6_K.gguf
LlamaV4::load config: n_parallel=4 n_ctx=8192 kv_unified=true
2026-06-16 18:44:03 [DEBUG]
 MESA-INTEL: warning: Haswell Vulkan support is incomplete
2026-06-16 18:44:03 [DEBUG]
 0.00.071.932 I srv    load_model: loading model '/home/jain/.lmstudio/models/lmstudio-community/LFM2.5-1.2B-Instruct-GGUF/LFM2.5-1.2B-Instruct-Q6_K.gguf'
2026-06-16 18:44:03 [DEBUG]
 ggml_vulkan: device Vulkan0 does not support 16-bit storage.
2026-06-16 18:44:03 [DEBUG]
 0.00.269.473 E llama_model_load: error loading model: Unsupported device
0.00.269.499 E llama_model_load_from_file_impl: failed to load model
0.00.269.500 E common_init_from_params: failed to load model '/home/jain/.lmstudio/models/lmstudio-community/LFM2.5-1.2B-Instruct-GGUF/LFM2.5-1.2B-Instruct-Q6_K.gguf'
0.00.269.502 E srv    load_model: failed to load model, '/home/jain/.lmstudio/models/lmstudio-community/LFM2.5-1.2B-Instruct-GGUF/LFM2.5-1.2B-Instruct-Q6_K.gguf': error loading model: Unsupported device
2026-06-16 18:44:03 [DEBUG]
 [LLMProcess] Failed to load model _0x3f9935 [Error]: Failed to load model.
    at _0x3f7ad2.loadModel (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:562652)
    at process.processTicksAndRejections (node:internal/process/task_queues:104:5)
    at async _0x3f7ad2.handleMessage (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:554788) {
  cause: 'Failed to load model',
  suggestion: undefined,
  errorData: undefined,
  data: undefined,
  displayData: undefined,
  title: 'Failed to load model.'
}
2026-06-16 18:47:29 [DEBUG]
 LlamaV4::load called with model path: /home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf
LlamaV4::load config: n_parallel=4 n_ctx=8192 kv_unified=true
2026-06-16 18:47:29 [DEBUG]
 MESA-INTEL: warning: Haswell Vulkan support is incomplete
2026-06-16 18:47:29 [DEBUG]
 Applying legacy swa_full=true default for arch gemma3
0.00.110.637 I srv    load_model: loading model '/home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf'
2026-06-16 18:47:30 [DEBUG]
 0.00.668.125 W load: control-looking token:    212 '' was not control-type; this is probably a bug in the model. its type will be overridden
2026-06-16 18:47:30 [DEBUG]
 ggml_vulkan: device Vulkan0 does not support 16-bit storage.
2026-06-16 18:47:30 [DEBUG]
 0.00.836.575 E llama_model_load: error loading model: Unsupported device
0.00.836.602 E llama_model_load_from_file_impl: failed to load model
0.00.836.603 E common_init_from_params: failed to load model '/home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf'
0.00.836.606 E srv    load_model: failed to load model, '/home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf': error loading model: Unsupported device
2026-06-16 18:47:30 [DEBUG]
 [LLMProcess] Failed to load model _0x3f9935 [Error]: Failed to load model.
    at _0x3f7ad2.loadModel (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:562652)
    at process.processTicksAndRejections (node:internal/process/task_queues:104:5)
    at async _0x3f7ad2.handleMessage (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:554788) {
  cause: 'Failed to load model',
  suggestion: undefined,
  errorData: undefined,
  data: undefined,
  displayData: undefined,
  title: 'Failed to load model.'
}
2026-06-16 18:47:58 [DEBUG]
 LlamaV4::load called with model path: /home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf
LlamaV4::load config: n_parallel=4 n_ctx=8192 kv_unified=true
2026-06-16 18:47:58 [DEBUG]
 MESA-INTEL: warning: Haswell Vulkan support is incomplete
2026-06-16 18:47:58 [DEBUG]
 Applying legacy swa_full=true default for arch gemma3
0.00.107.554 I srv    load_model: loading model '/home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf'
2026-06-16 18:47:59 [DEBUG]
 0.00.632.439 W load: control-looking token:    212 '' was not control-type; this is probably a bug in the model. its type will be overridden
2026-06-16 18:47:59 [DEBUG]
 ggml_vulkan: device Vulkan0 does not support 16-bit storage.
2026-06-16 18:47:59 [DEBUG]
 0.00.802.706 E llama_model_load: error loading model: Unsupported device
0.00.802.735 E llama_model_load_from_file_impl: failed to load model
0.00.802.737 E common_init_from_params: failed to load model '/home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf'
0.00.802.739 E srv    load_model: failed to load model, '/home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf': error loading model: Unsupported device
2026-06-16 18:47:59 [DEBUG]
 [LLMProcess] Failed to load model _0x3f9935 [Error]: Failed to load model.
    at _0x3f7ad2.loadModel (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:562652)
    at process.processTicksAndRejections (node:internal/process/task_queues:104:5)
    at async _0x3f7ad2.handleMessage (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:554788) {
  cause: 'Failed to load model',
  suggestion: undefined,
  errorData: undefined,
  data: undefined,
  displayData: undefined,
  title: 'Failed to load model.'
}
2026-06-16 18:53:59 [DEBUG]
 LlamaV4::load called with model path: /home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf
LlamaV4::load config: n_parallel=4 n_ctx=8192 kv_unified=true
2026-06-16 18:53:59 [DEBUG]
 MESA-INTEL: warning: Haswell Vulkan support is incomplete
2026-06-16 18:53:59 [DEBUG]
 Applying legacy swa_full=true default for arch gemma3
0.00.161.414 I srv    load_model: loading model '/home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf'
2026-06-16 18:54:00 [DEBUG]
 0.00.813.256
2026-06-16 18:54:00 [DEBUG]
 W load: control-looking token:    212 '' was not control-type; this is probably a bug in the model. its type will be overridden
2026-06-16 18:54:00 [DEBUG]
 ggml_vulkan: device Vulkan0 does not support 16-bit storage.
2026-06-16 18:54:00 [DEBUG]
 0.01.092.471 E llama_model_load: error loading model: Unsupported device
0.01.092.498 E llama_model_load_from_file_impl: failed to load model
0.01.092.500 E common_init_from_params: failed to load model '/home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf'
0.01.092.502 E srv    load_model: failed to load model, '/home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf': error loading model: Unsupported device
2026-06-16 18:54:00 [DEBUG]
 [LLMProcess] Failed to load model _0x3f9935 [Error]: Failed to load model.
    at _0x3f7ad2.loadModel (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:562652)
    at process.processTicksAndRejections (node:internal/process/task_queues:104:5)
    at async _0x3f7ad2.handleMessage (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:554788) {
  cause: 'Failed to load model',
  suggestion: undefined,
  errorData: undefined,
  data: undefined,
  displayData: undefined,
  title: 'Failed to load model.'
}
2026-06-16 18:56:57 [DEBUG]
 LlamaV4::load called with model path: /home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf
LlamaV4::load config: n_parallel=4 n_ctx=8192 kv_unified=true
2026-06-16 18:56:58 [DEBUG]
 MESA-INTEL: warning: Haswell Vulkan support is incomplete
2026-06-16 18:56:58 [DEBUG]
 Applying legacy swa_full=true default for arch gemma3
0.00.113.876 I srv    load_model: loading model '/home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf'
2026-06-16 18:56:58 [DEBUG]
 0.00.664.116 W load: control-looking token:    212 '' was not control-type; this is probably a bug in the model. its type will be overridden
2026-06-16 18:56:58 [DEBUG]
 ggml_vulkan: device Vulkan0 does not support 16-bit storage.
2026-06-16 18:56:58 [DEBUG]
 0.00.832.215 E llama_model_load: error loading model: Unsupported device
0.00.832.233 E llama_model_load_from_file_impl: failed to load model
0.00.832.234 E common_init_from_params: failed to load model '/home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf'
0.00.832.237 E srv    load_model: failed to load model, '/home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf': error loading model: Unsupported device
2026-06-16 18:56:58 [DEBUG]
 [LLMProcess] Failed to load model _0x3f9935 [Error]: Failed to load model.
    at _0x3f7ad2.loadModel (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:562652)
    at process.processTicksAndRejections (node:internal/process/task_queues:104:5)
    at async _0x3f7ad2.handleMessage (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:554788) {
  cause: 'Failed to load model',
  suggestion: undefined,
  errorData: undefined,
  data: undefined,
  displayData: undefined,
  title: 'Failed to load model.'
}
2026-06-16 18:58:53 [DEBUG]
 LlamaV4::load called with model path: /home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf
LlamaV4::load config: n_parallel=4 n_ctx=8196 kv_unified=true
2026-06-16 18:58:53 [DEBUG]
 MESA-INTEL: warning: Haswell Vulkan support is incomplete
2026-06-16 18:58:53 [DEBUG]
 Applying legacy swa_full=true default for arch gemma3
0.00.141.947 I srv    load_model: loading model '/home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf'
2026-06-16 18:58:54 [DEBUG]
 0.00.766.698 W load: control-looking token:    212 '' was not control-type; this is probably a bug in the model. its type will be overridden
2026-06-16 18:58:54 [DEBUG]
 ggml_vulkan: device Vulkan0 does not support 16-bit storage.
2026-06-16 18:58:54 [DEBUG]
 0.00.952.940 E llama_model_load: error loading model: Unsupported device
0.00.952.974 E llama_model_load_from_file_impl: failed to load model
0.00.952.976 E common_init_from_params: failed to load model '/home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf'
0.00.952.979 E srv    load_model: failed to load model, '/home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf': error loading model: Unsupported device
2026-06-16 18:58:54 [DEBUG]
 [LLMProcess] Failed to load model _0x3f9935 [Error]: Failed to load model.
    at _0x3f7ad2.loadModel (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:562652)
    at process.processTicksAndRejections (node:internal/process/task_queues:104:5)
    at async _0x3f7ad2.handleMessage (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:554788) {
  cause: 'Failed to load model',
  suggestion: undefined,
  errorData: undefined,
  data: undefined,
  displayData: undefined,
  title: 'Failed to load model.'
}
2026-06-16 18:59:22 [DEBUG]
 LlamaV4::load called with model path: /home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf
LlamaV4::load config: n_parallel=4 n_ctx=8196 kv_unified=true
2026-06-16 18:59:22 [DEBUG]
 MESA-INTEL: warning: Haswell Vulkan support is incomplete
2026-06-16 18:59:22 [DEBUG]
 Applying legacy swa_full=true default for arch gemma3
0.00.125.143 I srv    load_model: loading model '/home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf'
2026-06-16 18:59:23 [DEBUG]
 0.00.736.472 W load: control-looking token:    212 '' was not control-type; this is probably a bug in the model. its type will be overridden
2026-06-16 18:59:23 [DEBUG]
 ggml_vulkan: device Vulkan0 does not support 16-bit storage.
2026-06-16 18:59:23 [DEBUG]
 0.00.917.603 E llama_model_load: error loading model: Unsupported device
0.00.917.620 E llama_model_load_from_file_impl: failed to load model
0.00.917.622 E common_init_from_params: failed to load model '/home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf'
0.00.917.624 E srv    load_model: failed to load model, '/home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf': error loading model: Unsupported device
2026-06-16 18:59:23 [DEBUG]
 [LLMProcess] Failed to load model _0x3f9935 [Error]: Failed to load model.
    at _0x3f7ad2.loadModel (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:562652)
    at process.processTicksAndRejections (node:internal/process/task_queues:104:5)
    at async _0x3f7ad2.handleMessage (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:554788) {
  cause: 'Failed to load model',
  suggestion: undefined,
  errorData: undefined,
  data: undefined,
  displayData: undefined,
  title: 'Failed to load model.'
}
2026-06-16 18:59:29 [DEBUG]
 LlamaV4::load called with model path: /home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf
LlamaV4::load config: n_parallel=4 n_ctx=8196 kv_unified=true
2026-06-16 18:59:29 [DEBUG]
 MESA-INTEL: warning: Haswell Vulkan support is incomplete
2026-06-16 18:59:29 [DEBUG]
 Applying legacy swa_full=true default for arch gemma3
2026-06-16 18:59:29 [DEBUG]
 0.00.117.496 I srv    load_model: loading model '/home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf'
2026-06-16 18:59:30 [DEBUG]
 0.00.682.134 W load: control-looking token:    212 '' was not control-type; this is probably a bug in the model. its type will be overridden
2026-06-16 18:59:30 [DEBUG]
 ggml_vulkan: device Vulkan0 does not support 16-bit storage.
2026-06-16 18:59:30 [DEBUG]
 0.00.855.008 E llama_model_load: error loading model: Unsupported device
0.00.855.042 E llama_model_load_from_file_impl: failed to load model
0.00.855.044 E common_init_from_params: failed to load model '/home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf'
0.00.855.046 E srv    load_model: failed to load model, '/home/jain/.lmstudio/models/lmstudio-community/gemma-3-270m-it-qat-GGUF/gemma-3-270m-it-qat-Q4_0.gguf': error loading model: Unsupported device
2026-06-16 18:59:30 [DEBUG]
 [LLMProcess] Failed to load model _0x3f9935 [Error]: Failed to load model.
    at _0x3f7ad2.loadModel (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:562652)
    at process.processTicksAndRejections (node:internal/process/task_queues:104:5)
    at async _0x3f7ad2.handleMessage (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:554788) {
  cause: 'Failed to load model',
  suggestion: undefined,
  errorData: undefined,
  data: undefined,
  displayData: undefined,
  title: 'Failed to load model.'
}
2026-06-16 19:02:02 [DEBUG]
 LlamaV4::load called with model path: /home/jain/.lmstudio/models/LiquidAI/LFM2-350M-GGUF/LFM2-350M-Q4_0.gguf
LlamaV4::load config: n_parallel=4 n_ctx=8192 kv_unified=true
2026-06-16 19:02:02 [DEBUG]
 MESA-INTEL: warning: Haswell Vulkan support is incomplete
2026-06-16 19:02:02 [DEBUG]
 0.00.092.224 I srv    load_model: loading model '/home/jain/.lmstudio/models/LiquidAI/LFM2-350M-GGUF/LFM2-350M-Q4_0.gguf'
2026-06-16 19:02:02 [DEBUG]
 ggml_vulkan: device Vulkan0 does not support 16-bit storage.
2026-06-16 19:02:02 [DEBUG]
 0.00.300.094 E llama_model_load: error loading model: Unsupported device
0.00.300.121 E llama_model_load_from_file_impl: failed to load model
0.00.300.123 E common_init_from_params: failed to load model '/home/jain/.lmstudio/models/LiquidAI/LFM2-350M-GGUF/LFM2-350M-Q4_0.gguf'
0.00.300.125 E srv    load_model: failed to load model, '/home/jain/.lmstudio/models/LiquidAI/LFM2-350M-GGUF/LFM2-350M-Q4_0.gguf': error loading model: Unsupported device
2026-06-16 19:02:02 [DEBUG]
 [LLMProcess] Failed to load model _0x3f9935 [Error]: Failed to load model.
    at _0x3f7ad2.loadModel (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:562652)
    at process.processTicksAndRejections (node:internal/process/task_queues:104:5)
    at async _0x3f7ad2.handleMessage (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:554788) {
  cause: 'Failed to load model',
  suggestion: undefined,
  errorData: undefined,
  data: undefined,
  displayData: undefined,
  title: 'Failed to load model.'
}
2026-06-16 19:02:06 [DEBUG]
 LlamaV4::load called with model path: /home/jain/.lmstudio/models/LiquidAI/LFM2-350M-GGUF/LFM2-350M-Q4_0.gguf
LlamaV4::load config: n_parallel=4 n_ctx=8192 kv_unified=true
2026-06-16 19:02:06 [DEBUG]
 MESA-INTEL: warning: Haswell Vulkan support is incomplete
2026-06-16 19:02:06 [DEBUG]
 0.00.077.805 I srv    load_model: loading model '/home/jain/.lmstudio/models/LiquidAI/LFM2-350M-GGUF/LFM2-350M-Q4_0.gguf'
2026-06-16 19:02:06 [DEBUG]
 ggml_vulkan: device Vulkan0 does not support 16-bit storage.
2026-06-16 19:02:07 [DEBUG]
 0.00.301.853 E llama_model_load: error loading model: Unsupported device
0.00.301.872 E llama_model_load_from_file_impl: failed to load model
0.00.301.874 E common_init_from_params: failed to load model '/home/jain/.lmstudio/models/LiquidAI/LFM2-350M-GGUF/LFM2-350M-Q4_0.gguf'
0.00.301.876 E srv    load_model: failed to load model, '/home/jain/.lmstudio/models/LiquidAI/LFM2-350M-GGUF/LFM2-350M-Q4_0.gguf': error loading model: Unsupported device
2026-06-16 19:02:07 [DEBUG]
 [LLMProcess] Failed to load model _0x3f9935 [Error]: Failed to load model.
    at _0x3f7ad2.loadModel (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:562652)
    at process.processTicksAndRejections (node:internal/process/task_queues:104:5)
    at async _0x3f7ad2.handleMessage (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:554788) {
  cause: 'Failed to load model',
  suggestion: undefined,
  errorData: undefined,
  data: undefined,
  displayData: undefined,
  title: 'Failed to load model.'
}
2026-06-16 19:02:38 [DEBUG]
 LlamaV4::load called with model path: /home/jain/.lmstudio/models/LiquidAI/LFM2-350M-GGUF/LFM2-350M-Q4_0.gguf
LlamaV4::load config: n_parallel=4 n_ctx=8192 kv_unified=true
2026-06-16 19:02:38 [DEBUG]
 MESA-INTEL: warning: Haswell Vulkan support is incomplete
2026-06-16 19:02:38 [DEBUG]
 0.00.079.694 I srv    load_model: loading model '/home/jain/.lmstudio/models/LiquidAI/LFM2-350M-GGUF/LFM2-350M-Q4_0.gguf'
2026-06-16 19:02:38 [DEBUG]
 ggml_vulkan: device Vulkan0 does not support 16-bit storage.
2026-06-16 19:02:38 [DEBUG]
 0.00.316.444 E llama_model_load: error loading model: Unsupported device
0.00.316.476 E llama_model_load_from_file_impl: failed to load model
0.00.316.478 E common_init_from_params: failed to load model '/home/jain/.lmstudio/models/LiquidAI/LFM2-350M-GGUF/LFM2-350M-Q4_0.gguf'
0.00.316.480 E srv    load_model: failed to load model, '/home/jain/.lmstudio/models/LiquidAI/LFM2-350M-GGUF/LFM2-350M-Q4_0.gguf': error loading model: Unsupported device
2026-06-16 19:02:38 [DEBUG]
 [LLMProcess] Failed to load model _0x3f9935 [Error]: Failed to load model.
    at _0x3f7ad2.loadModel (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:562652)
    at process.processTicksAndRejections (node:internal/process/task_queues:104:5)
    at async _0x3f7ad2.handleMessage (/opt/LM-Studio/resources/app/.webpack/lib/llmworker.js:1:554788) {
  cause: 'Failed to load model',
  suggestion: undefined,
  errorData: undefined,
  data: undefined,
  displayData: undefined,
  title: 'Failed to load model.'
}
Tags: Generative AI, Agentic AI

Day out with LM Studio (for running local LLMs)


LM Studio is a popular choice for running local LLMs if you prefer a clean, visual interface over a terminal window. It hides most of the command-line arguments of tools like llama.cpp while still exposing detailed developer controls under the hood.

Setting it up and getting your first model running takes less than 10 minutes.

1. System Check (What Fits?)

Before downloading a massive model that locks up your computer, check your hardware specs. LM Studio relies heavily on VRAM (GPU Memory), with system RAM as a fallback.

Total Available VRAM | Recommended Model Size | Best Quantization Format
8 GB | 7B - 8B models (e.g., Llama 3 8B) | Q4_K_M (practical baseline)
12 GB - 16 GB | 12B - 14B models (e.g., Gemma 4 12B, Qwen 3.6 14B) | Q4_K_M or Q6_K
24 GB | 32B - 35B models (e.g., Qwen 3.6 35B MoE) | Q4_K_M (the sweet spot; Q6_K at this size is roughly 26 GB or more and spills into system RAM)
48 GB+ | 70B+ models | Q4_K_M (about 42 GB for 70B; Q8_0 needs roughly 74 GB and BF16 roughly 140 GB)
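A rough way to sanity-check whether a model fits is to multiply the parameter count by the bits per weight of the quantization. The sketch below is an approximation under stated assumptions. The bits-per-weight figures are typical values for GGUF formats (the Q4_K_M figure in particular varies a little by model), the result covers weights only, and `approx_weights_gb` is a hypothetical helper name.

```python
# Approximate bits per weight for common GGUF formats (assumed typical values).
BITS_PER_WEIGHT = {
    "Q4_0": 4.5,
    "Q4_K_M": 4.85,  # approximate; depends on the tensor mix of the model
    "Q6_K": 6.5625,
    "Q8_0": 8.5,
    "BF16": 16.0,
}


def approx_weights_gb(params_billions: float, quant: str) -> float:
    """Estimate the size of the weights in GB (10^9 bytes), excluding KV cache."""
    return params_billions * BITS_PER_WEIGHT[quant] / 8


for size, quant in [(8, "Q4_K_M"), (14, "Q6_K"), (70, "Q4_K_M"), (70, "Q8_0")]:
    print(f"{size}B at {quant}: about {approx_weights_gb(size, quant):.1f} GB")
```

Leave a few GB of headroom on top of the estimate for the context (KV cache) and runtime buffers.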

💡 Apple Silicon Note: On an M-series Mac, LM Studio can run models through Apple's MLX runtime as well as llama.cpp. Because Apple Silicon uses unified memory, the GPU draws on system RAM directly, so total RAM is the number that limits model size.

2. Step-by-Step Setup Guide

Step 1: Download and Install (~2 minutes)

Go to lmstudio.ai and download the installer matching your OS (Windows x64/ARM, macOS M-series, or Linux AppImage). Run the installer to open the GUI.

Step 2: Discover and Download a Model (~3-5 minutes)

Click the Search/Discover icon (Magnifying Glass) on the left sidebar. Type in a popular open model like Gemma 4 12B or Qwen 3.6 Coder.

LM Studio will display a list of available Hugging Face files. Look for the green rocket icon next to the files—this indicates the model quantization will comfortably fit your hardware profile. Click Download.

Step 3: Configure Your Hardware Engine (~1 minute)

Head to the AI Chat view (Bubble icon) and look at the right-hand settings panel. Under Hardware Settings, select your runtime engine:

  • NVIDIA: Choose CUDA 12 llama.cpp.

  • Apple Silicon: Leave it on MLX.

  • AMD/Intel GPU: Choose Vulkan llama.cpp.

  • CPU Only: Choose CPU llama.cpp (if you don't have a dedicated GPU).
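The choice above can be written down as a small decision function. This is only a sketch of the mapping in the list, not anything LM Studio exposes. The extra `has_16bit_storage` parameter is an assumption added here, reflecting that older integrated GPUs whose Vulkan driver lacks 16-bit storage can fail to load models with the Vulkan runtime. `pick_runtime` is a hypothetical name.

```python
from typing import Optional


def pick_runtime(gpu_vendor: Optional[str], has_16bit_storage: bool = True) -> str:
    """Map a GPU vendor to one of the runtime names listed above."""
    vendor = (gpu_vendor or "").strip().lower()
    if vendor == "nvidia":
        return "CUDA 12 llama.cpp"
    if vendor == "apple":
        return "MLX"
    if vendor in ("amd", "intel") and has_16bit_storage:
        return "Vulkan llama.cpp"
    # No usable GPU, or a GPU whose Vulkan driver lacks 16-bit storage.
    return "CPU llama.cpp"
```

For example, `pick_runtime("intel", has_16bit_storage=False)` falls back to the CPU runtime.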

Step 4: Adjust GPU Offload and Context (~1 minute)

If you're using a discrete GPU (like NVIDIA), locate the GPU Offload slider. Toggle it to Max to push as many layers of the model into your VRAM as possible.

Set your Context Length next (start with 4096 or 8192 tokens). Memory for the context (the KV cache) grows roughly linearly with context length, so doubling the context roughly doubles that part of VRAM usage.
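To get a feel for how context length affects memory, here is a rough KV-cache estimate for a standard transformer with full attention. Treat it as a sketch under assumptions. The layer and head counts are Llama 3 8B-style dimensions used purely for illustration, the cache is assumed to be stored in 16-bit values, and models with sliding-window attention or a quantized cache will use less. `kv_cache_bytes` is a hypothetical helper name.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   n_ctx: int, bytes_per_value: int = 2) -> int:
    """Estimate KV-cache size: one K and one V tensor per layer, per token."""
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_value


# Illustrative dimensions (assumed): 32 layers, 8 KV heads, head size 128.
for n_ctx in (4096, 8192, 16384):
    gib = kv_cache_bytes(32, 8, 128, n_ctx) / 1024**3
    print(f"{n_ctx:>6} tokens -> {gib:.2f} GiB")
```

In this formula the estimate scales in direct proportion to `n_ctx`: with these dimensions it comes to 0.5 GiB at 4096 tokens and 1 GiB at 8192.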

Step 5: Load and Chat (instant)

At the very top of the window, click the "Select a model to load" dropdown and select your downloaded model. Once the progress bar fills, type your prompt in the bottom text box and enjoy 100% private, offline AI.

3. Power-User Features to Explore Later

Once you have basic chat working, LM Studio has major features designed for software development and local workflows:

Local OpenAI-Compatible Server

Click the Developer tab (Code brackets icon) on the left menu. Here, you can click Start Server to spin up a local API endpoint on localhost:1234. Because it is fully OpenAI-compatible, you can drop this endpoint straight into developer setups, IDE extensions (like Continue or VS Code Copilot alternatives), or local scripts using the standard OpenAI SDK format:

Python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="local-model", # It automatically targets whatever model is currently loaded
    messages=[{"role": "user", "content": "Write a quick Python sort algorithm."}]
)
print(response.choices[0].message.content)
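Before sending chat requests, it can help to confirm the server is up and see which model identifiers it reports. The sketch below queries the OpenAI-style `/v1/models` endpoint with the standard library only. It assumes the server runs on the default `localhost:1234` and returns the usual `{"data": [{"id": ...}]}` shape. `parse_model_ids` and `list_models` are hypothetical helper names.

```python
import json
import urllib.request


def parse_model_ids(payload: dict) -> list[str]:
    """Extract model identifiers from an OpenAI-style model list response."""
    return [entry["id"] for entry in payload.get("data", [])]


def list_models(base_url: str = "http://localhost:1234/v1") -> list[str]:
    """Ask the local server which models it currently exposes."""
    with urllib.request.urlopen(f"{base_url}/models", timeout=5) as resp:
        return parse_model_ids(json.load(resp))
```

Calling `list_models()` while the server is running returns the identifiers you can pass as `model=` in the chat request. If the server is not started, it raises a `urllib.error.URLError`.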

Chat with Documents (Local RAG)

You can attach local text files, PDFs, or code repositories directly into your chat. LM Studio handles the text extraction and local embedding vectorization completely offline, allowing you to ask questions about your private files without data leaking to external servers.

LM Link (Remote Workloads)

If you have a powerful machine (like a desktop rig with a great GPU) but want to work from a lightweight laptop on your couch, you can turn on LM Link in your settings. It leverages a secure, end-to-end encrypted mesh network (powered by Tailscale) to let you stream your desktop's heavy model processing directly to your laptop as if it were running locally.

Tags: Large Language Models, Generative AI, Agentic AI