Showing posts with label Large Language Models. Show all posts

Tuesday, June 16, 2026

Day out with LM Studio (for running local LLMs)


LM Studio is one of the most popular ways to run local LLMs if you prefer a clean, visual interface over a terminal window. It hides the command-line arguments of engines like llama.cpp while still giving you detailed developer controls under the hood.

Setting it up and getting your first model running takes less than 10 minutes.

1. System Check (What Fits?)

Before downloading a massive model that locks up your computer, check your hardware specs. LM Studio relies heavily on VRAM (GPU Memory), with system RAM as a fallback.

Total Available VRAM | Recommended Model Size | Best Quantization Format
8 GB | 7B-8B models (e.g., Llama 3 8B) | Q4_K_M (practical baseline)
12-16 GB | 12B-14B models (e.g., Gemma 4 12B, Qwen 3.6 14B) | Q4_K_M, or Q6_K if memory allows
24 GB | 32B-35B models (e.g., Qwen 3.6 35B MoE) | Q4_K_M (the sweet spot; Q6_K weights alone exceed 24 GB at this size)
48 GB+ | 70B+ models | Q4_K_M at 48 GB; 8-bit (Q8_0, ~75 GB) or unquantized BF16 (~140 GB) needs considerably more memory
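To sanity-check a row of this table, multiply the parameter count by the bits stored per weight. The sketch below does that arithmetic. The bits-per-weight figures are assumed approximate averages for each GGUF format, and the fixed 1.5 GB allowance for the KV cache and runtime buffers is a hypothetical placeholder. Treat the output as a ballpark, not as what LM Studio will report.

```python
# Rough weight-memory estimate for a quantized model.
# Bits-per-weight values are approximate averages (an assumption): real GGUF
# files keep some tensors at higher precision, so sizes vary by model.
BITS_PER_WEIGHT = {
    "Q4_K_M": 4.85,
    "Q6_K": 6.56,
    "Q8_0": 8.5,
    "BF16": 16.0,
}


def weight_memory_gb(params_billion: float, quant: str) -> float:
    """Approximate size of the weights in GB (10^9 bytes)."""
    # params * bits / 8 bits-per-byte; billions of params map directly to GB
    return params_billion * BITS_PER_WEIGHT[quant] / 8


def fits_in_vram(params_billion: float, quant: str, vram_gb: float,
                 overhead_gb: float = 1.5) -> bool:
    """True if weights plus a fixed allowance for KV cache and buffers fit.

    overhead_gb is a hypothetical placeholder, not a measured value;
    long contexts need considerably more.
    """
    return weight_memory_gb(params_billion, quant) + overhead_gb <= vram_gb


if __name__ == "__main__":
    for params, quant, vram in [(8, "Q4_K_M", 8), (32, "Q4_K_M", 24),
                                (35, "Q6_K", 24), (70, "Q4_K_M", 48),
                                (70, "Q8_0", 48)]:
        size = weight_memory_gb(params, quant)
        verdict = "fits" if fits_in_vram(params, quant, vram) else "does not fit"
        print(f"{params}B @ {quant}: ~{size:.1f} GB -> {verdict} in {vram} GB")
```

Under these assumptions, a 70B model's weights alone come to roughly 74 GB at Q8_0 but about 42 GB at Q4_K_M. That is why the quantization column matters as much as the VRAM column.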

💡 Apple Silicon Note: If you are running an M-series Mac, LM Studio automatically defaults to Apple's MLX runtime. Because M-series Macs use unified memory, the GPU draws directly on system RAM, so read the VRAM column above as your total RAM minus what macOS keeps for itself.

2. Step-by-Step Setup Guide

Step 1: Download and Install (~2 minutes)

Go to lmstudio.ai and download the installer matching your OS (Windows x64/ARM, macOS M-series, or Linux AppImage). Run the installer to open the GUI.

Step 2: Discover and Download a Model (~3-5 minutes)

Click the Search/Discover icon (Magnifying Glass) on the left sidebar. Type in a popular open model like Gemma 4 12B or Qwen 3.6 Coder.

LM Studio will display a list of available files from Hugging Face. Look for the green rocket icon next to a file. It indicates that the quantization should fit comfortably in your hardware. Click Download.

Step 3: Configure Your Hardware Engine (~1 minute)

Head to the AI Chat view (Bubble icon) and look at the right-hand settings panel. Under Hardware Settings, select your runtime engine:

  • NVIDIA: Choose CUDA 12 llama.cpp.

  • Apple Silicon: Leave it on MLX.

  • AMD/Intel GPU: Choose Vulkan llama.cpp.

  • CPU Only: Choose CPU llama.cpp (if you don't have a dedicated GPU).

Step 4: Adjust GPU Offload and Context (~1 minute)

If you're using a discrete GPU (such as NVIDIA), locate the GPU Offload slider and drag it to Max to load as many of the model's layers into VRAM as possible. If the model fails to load, lower the slider so the remaining layers run from system RAM.

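Set your Context Length next (start with 4096 or 8192 tokens). Higher context lengths need more VRAM: the KV cache grows linearly with context length, so doubling the context roughly doubles the cache's memory footprint.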
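Context length costs memory because of the KV cache: for every token in the context, each layer stores a key vector and a value vector. Here is a minimal estimate. It assumes a Llama 3 8B-style layout (32 layers, 8 key/value heads, head dimension 128) and a 16-bit cache. Other models and cache settings will give different numbers.

```python
def kv_cache_bytes(n_ctx: int, n_layers: int = 32, n_kv_heads: int = 8,
                   head_dim: int = 128, bytes_per_elem: int = 2) -> int:
    """Approximate KV-cache size for a decoder-only transformer.

    Defaults assume a Llama 3 8B-style layout (32 layers, 8 KV heads,
    head dim 128) with a 16-bit cache; other models differ.
    """
    # Factor 2: one key tensor and one value tensor per layer.
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem


if __name__ == "__main__":
    for ctx in (4096, 8192, 16384, 32768):
        gib = kv_cache_bytes(ctx) / 2**30
        print(f"context {ctx:>6}: ~{gib:.2f} GiB of KV cache")
```

Under these assumptions the cache is 0.5 GiB at 4096 tokens, 1 GiB at 8192 and 4 GiB at 32768. It doubles when the context doubles, which is linear growth.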

Step 5: Load and Chat (instant)

At the very top of the window, click the "Select a model to load" dropdown and choose your downloaded model. Once the progress bar fills, type your prompt in the bottom text box. Inference runs entirely on your machine, so you can chat offline without sending prompts to external servers.

3. Power-User Features to Explore Later

Once you have basic chat working, LM Studio has major features designed for software development and local workflows:

Local OpenAI-Compatible Server

Click the Developer tab (code brackets icon) on the left menu. Here, you can click Start Server to spin up a local API endpoint on localhost:1234. Because it exposes OpenAI-compatible endpoints (such as /v1/models and /v1/chat/completions), you can drop it straight into developer setups, IDE extensions (like Continue or other VS Code Copilot alternatives), or local scripts using the standard OpenAI SDK:

Python
from openai import OpenAI

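# The SDK requires an api_key value; "lm-studio" is only a placeholder for a local server
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="local-model",  # placeholder; LM Studio typically answers with the loaded model, but the exact ID from /v1/models is safer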
    messages=[{"role": "user", "content": "Write a quick Python sort algorithm."}]
)
print(response.choices[0].message.content)
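If you'd rather pass an exact model identifier than a placeholder, the server's OpenAI-compatible /v1/models endpoint lists what it can serve. Below is a minimal stdlib-only sketch. It assumes the default localhost:1234 address and prints a message instead of crashing when the server isn't running.

```python
import json
import urllib.request

BASE_URL = "http://localhost:1234/v1"  # LM Studio's default server address


def parse_model_ids(payload: dict) -> list[str]:
    """Extract model IDs from an OpenAI-style /v1/models response."""
    # Expected shape: {"object": "list", "data": [{"id": "..."}, ...]}
    return [entry["id"] for entry in payload.get("data", [])]


def list_model_ids(base_url: str = BASE_URL, timeout: float = 5.0) -> list[str]:
    """Ask the running server which models it can serve."""
    with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
        return parse_model_ids(json.load(resp))


if __name__ == "__main__":
    try:
        print(list_model_ids())
    except OSError as exc:  # URLError subclasses OSError (e.g. server not started)
        print(f"LM Studio server not reachable at {BASE_URL}: {exc}")
```

Copy one of the printed IDs into the `model` argument of the chat call.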

Chat with Documents (Local RAG)

You can attach local text files, PDFs, or code repositories directly into your chat. LM Studio handles the text extraction and local embedding vectorization completely offline, allowing you to ask questions about your private files without data leaking to external servers.

LM Link (Remote Workloads)

If you have a powerful machine (like a desktop rig with a great GPU) but want to work from a lightweight laptop on your couch, you can turn on LM Link in your settings. It leverages a secure, end-to-end encrypted mesh network (powered by Tailscale) to let you stream your desktop's heavy model processing directly to your laptop as if it were running locally.

Tags: Large Language Models, Generative AI, Agentic AI

Saturday, June 13, 2026

Using DeepSeek With GitHub Copilot




Important Links


DeepSeek API Docs
DeepSeek V4 for Copilot Chat (By: Vizards)
Integrate with GitHub Copilot
Awesome DeepSeek Integrations

*** IMPORTANT ***

Optional: Vision Support. DeepSeek V4 is text-only, but the extension handles images automatically. Drop a screenshot into chat and the extension proxies it through another installed Copilot model (Claude, GPT-4o), which describes the image before the request goes to DeepSeek. Run the DeepSeek: Set Vision Proxy Model command to pick which model handles image descriptions.

Help From Gemini

I don't have any other model to use as a vision proxy for DeepSeek in GitHub Copilot. How can I use screenshots with DeepSeek V4 in GitHub Copilot?

DeepSeek V4 (both the Pro and Flash models) can be used in GitHub Copilot Chat through the "DeepSeek V4 for Copilot Chat" extension, but there is an architectural catch: DeepSeek V4 is text-only, so screenshots are handled through a transparent proxy.

If you are using this extension or an OpenAI-compatible third-party bridge, you do not need a separate vision model with its own Claude or OpenAI API key. The proxy can use a model your Copilot subscription already provides (like GPT-4o or Claude 3.5 Sonnet), so there is no extra API key or billing to set up. Those requests count toward your Copilot plan like any other.

If it isn't working (for example, you get an unknown variant image_url error, or the screenshot is simply ignored), use the following steps to configure and use screenshots with DeepSeek V4 in Copilot:

The Setup Method

Step 1: Verify Extension Prerequisites (required version)

Ensure your VS Code is updated to the latest version (1.116 or later) and your GitHub Copilot extension is active. Open Copilot Chat and ensure you can see DeepSeek V4 Pro or DeepSeek V4 Flash in your model picker.

Step 2: Configure the Built-in Copilot Vision Proxy (Command Palette)

You don't need an external API key for vision. Open the Command Palette (Cmd+Shift+P on Mac or Ctrl+Shift+P on Windows) and run:

DeepSeek: Set Vision Proxy Model

Step 3: Select an Installed Copilot Model (choose fallback)

A dropdown will appear listing the native models your GitHub Copilot subscription already provides (e.g., gpt-4o or claude-3.5-sonnet). Pick one. This does not change your main chat model; it simply tells Copilot which of its native models to use to look at your screenshot.

Step 4: Paste and Reference the Screenshot (in the Chat window)

Drag and drop your screenshot into the Copilot Chat panel, or copy the image and press Ctrl+V / Cmd+V. Add your text prompt (e.g., "Fix the styling issue in this screenshot") and hit enter.

How it Works Behind the Scenes

When you drop a screenshot into the chat while DeepSeek V4 is active, the workflow bypasses DeepSeek's text-only limitation seamlessly:

  1. The Interception: The extension intercepts the image before it reaches the DeepSeek API.

  2. The Description: It sends the image to the Copilot model you selected in Step 3 (like GPT-4o).

  3. The Text Injection: That model generates a detailed text description of what the screenshot shows (UI layout, error messages, code snippets).

  4. The DeepSeek Execution: The extension appends that description to your original prompt and forwards it to DeepSeek V4, which processes the instructions as if it had "seen" the image.
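The four steps above amount to rewriting the chat messages before they reach the text-only API. The sketch below is a hypothetical illustration of that idea, not the extension's actual code. It assumes OpenAI-style message parts (the image_url part type named in the 400 error) and takes the vision model as a stand-in function, `describe`.

```python
from collections.abc import Callable


def inline_image_descriptions(messages: list[dict],
                              describe: Callable[[str], str]) -> list[dict]:
    """Replace OpenAI-style image parts with text a text-only model accepts.

    `describe` is a hypothetical stand-in for the vision-capable model: it
    receives the image URL or data URI and returns a text description.
    """
    rewritten = []
    for msg in messages:
        content = msg["content"]
        if isinstance(content, str):  # plain text message: pass through unchanged
            rewritten.append(msg)
            continue
        texts = []
        for part in content:
            if part["type"] == "text":
                texts.append(part["text"])
            elif part["type"] == "image_url":
                # A text-only endpoint rejects image_url parts, so swap in a description.
                description = describe(part["image_url"]["url"])
                texts.append(f"[Screenshot description: {description}]")
            # Other part types are ignored in this sketch.
        rewritten.append({**msg, "content": "\n".join(texts)})
    return rewritten


def fake_vision(url: str) -> str:
    """Hypothetical vision model used only for the demo below."""
    return "A login form; the Submit button overlaps the footer."


if __name__ == "__main__":
    msgs = [{"role": "user", "content": [
        {"type": "text", "text": "Fix the styling issue in this screenshot"},
        {"type": "image_url", "image_url": {"url": "data:image/png;base64,AAAA"}},
    ]}]
    print(inline_image_descriptions(msgs, fake_vision))
```

If this rewrite step is skipped, the image_url part reaches an endpoint that doesn't accept it, which matches the error in the troubleshooting tip.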

Troubleshooting Tip: If you get a 400 error (unknown variant image_url) when pasting a screenshot, the raw image is being passed straight to DeepSeek's text-only endpoint instead of being converted into a description. Update the DeepSeek V4 for Copilot Chat extension to the newest patch release, which should include the fix for this format translation layer.


Tags: Agentic AI, Generative AI, Large Language Models

Friday, December 26, 2025

Accenture Skill Proficiency Test - Large Language Models - Dec 2025



📘 Accenture Proficiency Test on LLMs


Q1. Few-shot Learning with GPT-3

Question

You are developing a large language model using GPT-3 and want to apply few-shot learning techniques. You have a limited dataset for a specific task and want the model to generalize well. Which approach would be most effective?

Options:
a) Train the model on the entire dataset, then fine-tune on a small subset
b) Provide examples of the task in the input and let the model generate
c) LLMs are unable to handle single tasks

Correct Answer

b) Provide examples of the task in the input and let the model generate

💡 Hint

  • Few-shot learning = prompt engineering

  • No retraining required

  • You show the task via examples in the prompt
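Here is what "show the task via examples in the prompt" looks like in practice. This is a minimal sketch that builds a completion-style few-shot prompt, the kind GPT-3's completions API consumed. The Input:/Output: labels and the sentiment task are illustrative choices, not a required format.

```python
def build_few_shot_prompt(instruction: str,
                          examples: list[tuple[str, str]],
                          query: str) -> str:
    """Build a completion-style prompt: instruction, worked examples, then the query."""
    lines = [instruction, ""]
    for example_input, example_output in examples:
        # Each demonstration shows the model the exact input/output pattern.
        lines.append(f"Input: {example_input}")
        lines.append(f"Output: {example_output}")
        lines.append("")
    # End on an open "Output:" so the model's continuation is the answer.
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)


if __name__ == "__main__":
    prompt = build_few_shot_prompt(
        "Classify the sentiment of each review as positive or negative.",
        [
            ("The delivery was late and the box was crushed.", "negative"),
            ("Setup took two minutes and it works perfectly.", "positive"),
        ],
        "Battery life is far shorter than advertised.",
    )
    print(prompt)
```

No weights change here. The examples live only in the prompt, which is why option (b) needs no retraining.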


Q2. Edge AI with LLMs

Question

You are using an LLM for an edge AI application requiring real-time object detection. Which approach is most efficient?

Options:
a) Cloud-based LLM processing
b) Use a complex LLM regardless of compute
c) Use an LLM optimized for edge devices balancing accuracy and efficiency
d) Store data and process later
e) Manual input-based LLM

Correct Answer

c) Use an LLM optimized for edge devices

💡 Hint

  • Edge AI prioritizes low latency + low compute

  • Cloud = latency bottleneck

  • “Optimized” is the keyword
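Quantization (storing weights in fewer bits) is one of the most common ways a model gets "optimized" for edge hardware. Below is a simplified sketch of symmetric per-tensor int8 quantization, for illustration only. Real toolkits typically use per-channel or per-block scales and calibration data.

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor int8 quantization: w is approximated by q * scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest step and clamp to the symmetric int8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q: list[int], scale: float) -> list[float]:
    """Map int8 codes back to approximate float weights."""
    return [v * scale for v in q]


if __name__ == "__main__":
    w = [0.82, -1.27, 0.003, 0.5, -0.41]
    q, scale = quantize_int8(w)
    approx = dequantize_int8(q, scale)
    print(q)  # stored as 1 byte per weight instead of 4 for float32
    print(max(abs(a - b) for a, b in zip(w, approx)))  # bounded by scale / 2
```

The trade-off is the one the hint describes: a quarter of the float32 memory in exchange for a small, bounded rounding error per weight.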


Q3. Improving Fine-tuned LLM Performance (Select Two)

Question

You fine-tuned a pre-trained LLM but performance is poor. What steps improve it?

Options:
a) Gather more annotated Q&A data with better supervision
b) Change architecture to Mixture of Experts / Mixture of Tokens
c) Simplify the task definition
d) Smaller chunks reduce retrieval complexity
e) Smaller chunks improve generation accuracy

Correct Answers

a) Gather more annotated data
c) Simplify the task definition

💡 Hint

  • First fix data quality & task framing

  • Architecture changes come later

  • Accenture favors practical ML hygiene


Q4. Chunking in RAG Systems

Question

Why do smaller chunks improve a RAG pipeline?

Correct Statements:
✔ Smaller chunks reduce retrieval complexity
✔ Smaller chunks improve generation accuracy

💡 Hint

  • Retrieval works better with semantic focus

  • Overly large chunks dilute meaning
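A toy similarity measure is enough to show the dilution effect. This sketch uses bag-of-words cosine similarity as a stand-in for a real embedding model. That is an assumption for illustration, since real encoders capture meaning rather than raw word overlap. Adding unrelated sentences to a chunk lowers its score against the query, even though the relevant sentence is still inside it.

```python
import math
from collections import Counter


def bow_vector(text: str) -> Counter:
    """Toy stand-in for an embedding: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


query = "refund policy for damaged items"
small_chunk = "our refund policy covers damaged items within 30 days"
# Same relevant sentence, padded with unrelated content.
large_chunk = small_chunk + " " + (
    "shipping times vary by region and carrier availability "
    "gift cards never expire and can be used online "
    "store hours are nine to five on weekdays"
)

query_vec = bow_vector(query)
print(f"small chunk: {cosine(query_vec, bow_vector(small_chunk)):.3f}")
print(f"large chunk: {cosine(query_vec, bow_vector(large_chunk)):.3f}")
```

With these strings, the small chunk scores about 0.60 and the padded one about 0.30, so a retriever ranking by similarity would favor the focused chunk.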


Q5. Challenges of Local LLMs in Chatbots

Question

What is a potential challenge local LLMs face in long-term task planning?

Options:
a) Unable to adjust plans when errors occur
b) Unable to handle complex tasks
c) Unable to handle multiple queries
d) Unable to use task decomposition

Correct Answer

a) Unable to adjust plans when errors occur

💡 Hint

  • Local models lack persistent memory & feedback loops

  • Planning correction is the real limitation


Q6. RAG Pipeline – Poor Semantic Representation

Question

Why might embeddings not represent semantic meaning correctly?

Options:
a) Encoder not trained on similar data
b) Text chunks too large
c) Encoder incompatible with RAG
d) Incorrect chunk splitting
e) Encoder not initialized

Correct Answers

a) Domain mismatch in training data
b) Chunk size too large

💡 Hint

  • Embeddings fail due to domain shift or context overflow

  • Initialization issues are rare in practice


Q7. Designing Advanced RAG – Chunking Decision

Question

Which is NOT a valid reason for splitting documents into smaller chunks?

Options:
a) Large chunks are harder to search
b) Small chunks won’t fit in context window
c) Smaller chunks improve indexing efficiency

Correct Answer

b) Small chunks won’t fit in context window

💡 Hint

  • Smaller chunks fit better, not worse

  • This is a classic reverse-logic trap
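A size cap on chunks is what makes option (b) backwards: every chunk is bounded, so it fits a context window more easily, not less. Here is a minimal word-window chunker with overlap. It assumes word counts as a rough proxy for tokens, whereas production pipelines usually count tokens with the model's own tokenizer.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into overlapping windows of at most chunk_size words.

    Word counts approximate tokens here (an assumption).
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    sample = " ".join(f"word{i}" for i in range(500))
    chunks = chunk_words(sample, chunk_size=200, overlap=40)
    print(len(chunks), [len(c.split()) for c in chunks])
```

The overlap keeps a sentence that straddles a boundary retrievable from either neighboring chunk.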


Q8. Intent in Chatbots

Question

What is the purpose of an intent in a chatbot?

Correct Answer

To determine the user’s goal

💡 Hint

  • Intent ≠ entity

  • Intent answers “what does the user want?”
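A toy example makes the intent/entity split concrete: the intent is the goal, and the entity is a detail needed to act on it. This is a rule-based sketch with hypothetical intent names and keywords. Production chatbots usually train an intent classifier on labeled utterances instead.

```python
import re

# Hypothetical intents and trigger phrases, for illustration only.
INTENT_KEYWORDS = {
    "check_order_status": ["where is my order", "track", "order status"],
    "cancel_order": ["cancel"],
    "request_refund": ["refund", "money back"],
}


def detect_intent(utterance: str) -> str:
    """Intent: what the user wants to achieve (first matching rule wins)."""
    text = utterance.lower()
    for intent, phrases in INTENT_KEYWORDS.items():
        if any(p in text for p in phrases):
            return intent
    return "fallback"


def extract_order_id(utterance: str):
    """Entity: a detail the intent needs, here an order number of 5+ digits."""
    match = re.search(r"#?(\d{5,})", utterance)
    return match.group(1) if match else None


if __name__ == "__main__":
    msg = "I want my money back for order #123456"
    print(detect_intent(msg), extract_order_id(msg))
```

The same order number could appear under a refund, cancellation or tracking intent. The intent decides what the bot does with it.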


Q9. Healthcare LLM Security

Question

Which strategy best ensures privacy and compliance for patient data?

Options:
a) Public API & public cloud
b) Layered security: encryption, access control, audits, private network
c) No security changes
d) Plaintext storage
e) Unverified 3rd party services

Correct Answer

b) Layered security approach

💡 Hint

  • Healthcare = defense in depth

  • Accenture loves encryption + audits + private infra


Q10. Edge AI Programming Language

Question

Which language is commonly used for developing ML models in Edge AI?

Options:

  • Java

  • Python

  • C++

  • JavaScript

  • Ruby

Correct Answer

Python

💡 Hint

  • Python dominates ML tooling

  • C++ is more common in on-device deployment runtimes than in model development


Q11. Customizing METEOR Scoring

Question

How do you customize METEOR’s scoring function?

Correct Answer

Modify tool configuration or run with command-line flags

💡 Hint

  • METEOR supports custom weighting

  • Not hardcoded, no paid version needed
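"Custom weighting" is easiest to understand from the scoring formula. In the parameterized form of METEOR, α balances precision against recall, while β and γ shape the fragmentation penalty. The defaults α = 0.9, β = 3, γ = 0.5 reproduce the original 10PR/(R + 9P) mean and 0.5·(chunks/matches)³ penalty. NLTK's meteor_score, for example, exposes these as alpha, beta and gamma keyword arguments. The sketch below computes the score from alignment counts only and omits the word-alignment step (exact, stem and synonym matching).

```python
def meteor_from_counts(matches: int, hyp_len: int, ref_len: int, chunks: int,
                       alpha: float = 0.9, beta: float = 3.0,
                       gamma: float = 0.5) -> float:
    """METEOR score from alignment counts (alignment itself is not computed here).

    matches: unigrams aligned between hypothesis and reference
    chunks:  number of contiguous runs those matched unigrams form
    """
    if matches == 0:
        return 0.0
    precision = matches / hyp_len
    recall = matches / ref_len
    # Weighted harmonic mean; alpha = 0.9 weights recall far more than precision.
    f_mean = precision * recall / (alpha * precision + (1 - alpha) * recall)
    # Fragmentation penalty: more, shorter chunks mean worse word order.
    penalty = gamma * (chunks / matches) ** beta
    return f_mean * (1 - penalty)


if __name__ == "__main__":
    # Same alignment scored with default weights and with a gentler penalty.
    print(meteor_from_counts(5, 6, 7, chunks=2))
    print(meteor_from_counts(5, 6, 7, chunks=2, gamma=0.25))
```

Changing alpha, beta or gamma changes the scoring without touching the alignment, which is the kind of customization the answer refers to.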


Q12. Bias Mitigation in LLMs

Question

First step when an LLM is found biased?

Correct Answer

Identify the source of bias

💡 Hint

  • Diagnosis before correction

  • Retraining comes later
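Diagnosis usually starts by measuring where the bias shows up. This minimal sketch assumes you already have model outputs labeled favorable or not for each group (the data here is hypothetical). It computes the favorable-outcome rate per group and the largest gap between them, a demographic-parity style check. A large gap tells you where to look, not why it happens.

```python
from collections import defaultdict


def positive_rate_by_group(records: list[tuple[str, bool]]) -> dict[str, float]:
    """Share of favorable model outputs per group (e.g. per demographic slice)."""
    totals, positives = defaultdict(int), defaultdict(int)
    for group, favorable in records:
        totals[group] += 1
        positives[group] += favorable  # True counts as 1, False as 0
    return {g: positives[g] / totals[g] for g in totals}


def parity_gap(rates: dict[str, float]) -> float:
    """Largest difference in favorable-outcome rate between any two groups."""
    return max(rates.values()) - min(rates.values())


if __name__ == "__main__":
    # Hypothetical audit data: (group, was the model's output favorable?)
    audit = [("A", True), ("A", True), ("A", True), ("A", False),
             ("B", True), ("B", False), ("B", False), ("B", False)]
    rates = positive_rate_by_group(audit)
    print(rates, parity_gap(rates))
```

Once a gap like this is found, tracing it back (training data, prompt wording, labeling) is the "identify the source" step that comes before any retraining.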


Q13. DeepEval Tool – Advanced Features

Question

Which statement is correct about DeepEval advanced usage?

Correct Answer

Configure advanced features in Python scripts

💡 Hint

  • Tool-level configuration

  • No paid version needed


Tags: Interview Preparation, Generative AI, Large Language Models