LM Studio

LM Studio

GUI-first local model runner. Downloads and runs open-weight models via a desktop interface. Targets people who want local inference without command-line setup. Complementary to ollama, not a direct replacement — they serve different workflows.

Current version: 0.4.15. Available for Apple Silicon macOS, x64/ARM64 Windows, and x64 Linux.

Core Features

  • Model browser: Search and download from Hugging Face directly in the UI
  • Chat interface: Built-in chat UI for interactive use
  • Local server: OpenAI-compatible API server at http://localhost:1234/v1 — base URL swap makes it compatible with most tooling
  • GGUF support: Runs quantized models in GGUF format, the standard for local inference
  • Headless CLI: llmster provides a CLI interface for headless/server use cases

Vs. Ollama

LM Studio Ollama
Interface GUI + API CLI + API
Model discovery In-app browser Pull by name
Scripting Awkward Native
Setup ease Easier for non-CLI users Easier for developers
Customization Limited More configurable

Note: LM Studio's API runs on port 1234 (http://localhost:1234/v1), not 11434 — that's ollama's port. Keep this straight when configuring tooling.

Choose LM Studio when: you're evaluating models interactively and want a visual interface. Choose ollama when: you're building pipelines or need reliable scripting integration.

Practical Use

Good for: testing model capabilities before committing to a choice, comparing responses from multiple models side-by-side, non-technical users who need local inference, macOS with Apple Silicon (LM Studio has good MLX support for Apple Silicon).

The local API server makes it usable as an ollama substitute in scripts if you prefer the GUI for management:

from openai import OpenAI
client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

Hardware Notes

Same hardware requirements as Ollama — VRAM determines what you can run. LM Studio does a good job surfacing model requirements before download and warns if a model won't fit in available VRAM.

Apple Silicon support is notably good — the unified memory architecture (up to 192GB on M3 Ultra) lets you run larger models than discrete GPU setups with equivalent VRAM. LM Studio uses Apple's MLX framework on Apple Silicon for optimized inference.

Strengths

  • Lowest friction for getting started with local models
  • Good model discovery UI
  • Strong macOS/Apple Silicon optimization (MLX)
  • OpenAI-compatible API

Weaknesses

  • Not scriptable for pipelines — use Ollama for that
  • GUI adds overhead that doesn't make sense for headless/server use
  • Slower to update with new model support compared to Ollama

Related

ollama · gemma-4 · deepseek-r1 · rag

Sources