# app
# Welcome to LM Studio Docs!
> Learn how to run Llama, DeepSeek, Qwen, Phi, and other LLMs locally with LM Studio.
To get LM Studio, head over to the [Downloads page](/download) and download an installer for your operating system.
LM Studio is available for macOS, Windows, and Linux.
## What can I do with LM Studio?
1. Download and run local LLMs like gpt-oss, Llama, or Qwen
2. Use a simple and flexible chat interface
3. Connect MCP servers and use them with local models
4. Search & download functionality (via Hugging Face 🤗)
5. Serve local models on OpenAI-like endpoints, locally and on the network
6. Manage your local models, prompts, and configurations
## System requirements
LM Studio generally supports Apple Silicon Macs, x64/ARM64 Windows PCs, and x64 Linux PCs.
Consult the [System Requirements](app/system-requirements) page for more detailed information.
## Run llama.cpp (GGUF) or MLX models
LM Studio supports running LLMs on Mac, Windows, and Linux using [`llama.cpp`](https://github.com/ggerganov/llama.cpp).
On Apple Silicon Macs, LM Studio also supports running LLMs using Apple's [`MLX`](https://github.com/ml-explore/mlx).
To install or manage LM Runtimes, press `⌘` `Shift` `R` on Mac or `Ctrl` `Shift` `R` on Windows/Linux.
## LM Studio as an MCP client
You can install MCP servers in LM Studio and use them with your local models.
See the docs for more: [Use MCP server](/docs/app/plugins/mcp).
If you're developing an MCP server, check out the [Add to LM Studio Button](/docs/app/plugins/mcp/deeplink).
## Run an LLM like `gpt-oss`, `Llama`, `Qwen`, `Mistral`, or `DeepSeek R1` on your computer
To run an LLM on your computer you first need to download the model weights.
You can do this right within LM Studio! See [Download an LLM](app/basics/download-model) for guidance.
## Chat with documents entirely offline on your computer
You can attach documents to your chat messages and interact with them entirely offline, also known as "RAG".
Read more about how to use this feature in the [Chat with Documents](app/basics/rag) guide.
## Use LM Studio's API from your own apps and scripts
LM Studio provides a REST API that you can use to interact with your local models from your own apps and scripts.
- [OpenAI Compatibility API](api/openai-api)
- [LM Studio REST API (beta)](api/rest-api)
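For example, here is a minimal sketch of calling the OpenAI Compatibility API from Python using the `openai` package. It assumes the local server has been started (e.g. via `lms server start`) on the default port 1234 and that a model such as `openai/gpt-oss-20b` is downloaded; adjust the base URL and model name to your setup.
```python
# Minimal sketch: point the standard OpenAI client at LM Studio's local server.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[{"role": "user", "content": "Say hello from a local model."}],
)
print(response.choices[0].message.content)
```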
## Community
Join the LM Studio community on [Discord](https://discord.gg/aPQfnNkxGC) to ask questions, share knowledge, and get help from other users and the LM Studio team.
## System Requirements
> Supported CPU, GPU types for LM Studio on Mac (M1/M2/M3/M4), Windows (x64/ARM), and Linux (x64/ARM64)
## macOS
- Chip: Apple Silicon (M1/M2/M3/M4).
- macOS 13.4 or newer is required.
- For MLX models, macOS 14.0 or newer is required.
- 16GB+ RAM recommended.
- You may still be able to use LM Studio on 8GB Macs, but stick to smaller models and modest context sizes.
- Intel-based Macs are currently not supported. Chime in [here](https://github.com/lmstudio-ai/lmstudio-bug-tracker/issues/9) if you are interested in this.
## Windows
LM Studio is supported on both x64 and ARM (Snapdragon X Elite) based systems.
- CPU: AVX2 instruction set support is required (for x64)
- RAM: LLMs can consume a lot of RAM. At least 16GB of RAM is recommended.
- GPU: at least 4GB of dedicated VRAM is recommended.
## Linux
LM Studio is supported on both x64 and ARM64 (aarch64) based systems.
- LM Studio for Linux is distributed as an AppImage.
- Ubuntu 20.04 or newer is required
- Ubuntu versions newer than 22 are not well tested. Let us know if you're running into issues by opening a bug [here](https://github.com/lmstudio-ai/lmstudio-bug-tracker).
- CPU:
  - On x64, LM Studio ships with AVX2 support by default
## Offline Operation
> LM Studio can operate entirely offline, just make sure to get some model files first.
```lms_notice
In general, LM Studio does not require the internet in order to work. This includes core functions like chatting with models, chatting with documents, or running a local server, none of which require the internet.
```
### Operations that do NOT require connectivity
#### Using downloaded LLMs
Once you have an LLM on your machine, the model runs locally and you should be good to go entirely offline. Nothing you enter into LM Studio when chatting with LLMs leaves your device.
#### Chatting with documents (RAG)
When you drag and drop a document into LM Studio to chat with it or perform RAG, that document stays on your machine. All document processing is done locally, and nothing you upload into LM Studio leaves the application.
#### Running a local server
LM Studio can be used as a server to provide LLM inferencing on localhost or the local network. Requests to LM Studio use OpenAI endpoints and return OpenAI-like response objects, but stay local.
### Operations that require connectivity
Several operations, described below, rely on internet connectivity. Once you get an LLM onto your machine, you should be good to go entirely offline.
#### Searching for models
When you search for models in the Discover tab, LM Studio makes network requests (e.g. to huggingface.co). Search will not work without an internet connection.
#### Downloading new models
In order to download models you need a stable (and decently fast) internet connection. You can also 'sideload' models (use models that were procured outside the app). See instructions for [sideloading models](/docs/advanced/sideload).
#### Discover tab's model catalog
Any given version of LM Studio ships with an initial model catalog built-in. The entries in the catalog are typically the state of the online catalog near the moment we cut the release. However, in order to show stats and download options for each model, we need to make network requests (e.g. to huggingface.co).
#### Downloading runtimes
[LM Runtimes](advanced/lm-runtimes) are individually packaged software libraries, or LLM engines, that allow running certain formats of models (e.g. `llama.cpp`). As of LM Studio 0.3.0 (read the [announcement](https://lmstudio.ai/blog/lmstudio-v0.3.0)) it's easy to download and even hot-swap runtimes without a full LM Studio update. To check for available runtimes, and to download them, we need to make network requests.
#### Checking for app updates
On macOS and Windows, LM Studio has a built-in app updater. The Linux in-app updater [is in the works](https://github.com/lmstudio-ai/lmstudio-bug-tracker/issues/89). When you open LM Studio, the updater will make a network request to check whether any new updates are available. If there's a new version, the app will show you a notification to update now or later.
Without internet connectivity you will not be able to update the app via the in-app updater.
## basics
## Get started with LM Studio
> Download and run Large Language Models like Qwen, Mistral, Gemma, or gpt-oss in LM Studio.
Double check that your computer meets the minimum [system requirements](/docs/system-requirements).
```lms_info
You might sometimes see terms such as `open-source models` or `open-weights models`. Different models might be released under different licenses and varying degrees of 'openness'. In order to run a model locally, you need to be able to get access to its "weights", often distributed as one or more files that end with `.gguf`, `.safetensors` etc.
```
## Getting up and running
First, **install the latest version of LM Studio**. You can get it from [here](/download).
Once you're all set up, you need to **download your first LLM**.
### 1. Download an LLM to your computer
Head over to the Discover tab to download models. Pick one of the curated options or search for models by search query (e.g. `"Llama"`). See more in-depth information about downloading models [here](/docs/basics/download-models).
### 2. Load a model to memory
Head over to the **Chat** tab, and
1. Open the model loader
2. Select one of the models you downloaded (or [sideloaded](/docs/advanced/sideload)).
3. Optionally, choose load configuration parameters.
##### What does loading a model mean?
Loading a model typically means allocating memory to be able to accommodate the model's weights and other parameters in your computer's RAM.
### 3. Chat!
Once the model is loaded, you can start a back-and-forth conversation with the model in the Chat tab.
### Community
Chat with other LM Studio users, discuss LLMs, hardware, and more on the [LM Studio Discord server](https://discord.gg/aPQfnNkxGC).
### Manage chats
> Manage conversation threads with LLMs
LM Studio has a ChatGPT-like interface for chatting with local LLMs. You can create many different conversation threads and manage them in folders.
### Create a new chat
You can create a new chat by clicking the "+" button or by using a keyboard shortcut: `⌘` + `N` on Mac, or `ctrl` + `N` on Windows / Linux.
### Create a folder
Create a new folder by clicking the new folder button or by pressing: `⌘` + `shift` + `N` on Mac, or `ctrl` + `shift` + `N` on Windows / Linux.
### Drag and drop
You can drag and drop chats in and out of folders, and even drag folders into folders!
### Duplicate chats
You can duplicate a whole chat conversation by clicking the `•••` menu and selecting "Duplicate". If the chat has any files in it, they will be duplicated too.
## FAQ
#### Where are chats stored in the file system?
Right-click on a chat and choose "Reveal in Finder" / "Show in File Explorer".
Conversations are stored in JSON format. It is NOT recommended to edit them manually, nor to rely on their structure.
#### Does the model learn from chats?
The model doesn't 'learn' from chats. The model only 'knows' the content that is present in the chat or is provided to it via configuration options such as the "system prompt".
## Conversations folder filesystem path
Mac / Linux:
```shell
~/.lmstudio/conversations/
```
Windows:
```ps
%USERPROFILE%\.lmstudio\conversations
```
### Community
Chat with other LM Studio users, discuss LLMs, hardware, and more on the [LM Studio Discord server](https://discord.gg/aPQfnNkxGC).
### Download an LLM
> Discover and download supported LLMs in LM Studio
LM Studio comes with a built-in model downloader that lets you download any supported model from [Hugging Face](https://huggingface.co).
### Searching for models
You can search for models by keyword (e.g. `llama`, `gemma`, `lmstudio`), or by providing a specific `user/model` string. You can even insert full Hugging Face URLs into the search bar!
###### Pro tip: you can jump to the Discover tab from anywhere by pressing `⌘` + `2` on Mac, or `ctrl` + `2` on Windows / Linux.
### Which download option to choose?
You will often see several options for any given model, named things like `Q3_K_S`, `Q_8`, etc. These are all copies of the same model provided at varying degrees of fidelity. The `Q` refers to a technique called "quantization", which roughly means compressing the model weights to reduce file size at the cost of some quality.
Choose a 4-bit option or higher if your machine is capable enough to run it.
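As a rough rule of thumb (an estimate, not an exact figure), the weights alone take about (parameter count) × (bits per weight ÷ 8) bytes: an 8B-parameter model at 4-bit quantization works out to roughly 8B × 0.5 bytes ≈ 4 GB, before accounting for context and runtime overhead.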
`Advanced`
### Changing the models directory
You can change the models directory from the My Models tab.
### Community
Chat with other LM Studio users, discuss LLMs, hardware, and more on the [LM Studio Discord server](https://discord.gg/aPQfnNkxGC).
### Chat with Documents
> How to provide local documents to an LLM as additional context
You can attach document files (`.docx`, `.pdf`, `.txt`) to chat sessions in LM Studio.
This will provide additional context to LLMs you chat with through the app.
## Terminology
- **Retrieval**: Identifying relevant portions of a long source document
- **Query**: The input to the retrieval operation
- **RAG**: Retrieval-Augmented Generation\*
- **Context**: the 'working memory' of an LLM. Has a maximum size
###### \* In this context, 'Generation' means the output of the LLM.
###### Context sizes are measured in "tokens". One token is often about 3/4 of a word.
## RAG vs. Full document 'in context'
If the document is short enough (i.e., if it fits in the model's context), LM Studio will add the file contents to the conversation in full. This is particularly useful for models that support longer context sizes such as Meta's Llama 3.1 and Mistral Nemo.
If the document is very long, LM Studio will opt into using "Retrieval Augmented Generation", frequently referred to as "RAG". RAG means attempting to fish out relevant bits of a very long document (or several documents) and providing them to the model for reference. This technique sometimes works really well, but sometimes it requires some tuning and experimentation.
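The sketch below is purely illustrative (it is not LM Studio's actual implementation); it only restates the decision described above: inject the full document when it fits, otherwise retrieve the most relevant chunks.
```python
# Illustrative only; real retrieval systems typically use embeddings, not word overlap.
def relevance(chunk: str, query: str) -> float:
    q = set(query.lower().split())
    return len(q & set(chunk.lower().split())) / (len(q) or 1)

def build_context(document: str, query: str, context_limit_tokens: int) -> str:
    est_tokens = (len(document.split()) * 4) // 3  # one token is often ~3/4 of a word
    if est_tokens <= context_limit_tokens:
        return document  # short enough: provide the full document 'in context'
    # Too long: keep only the chunks most relevant to the query (RAG).
    chunks = [document[i:i + 2000] for i in range(0, len(document), 2000)]
    top = sorted(chunks, key=lambda c: relevance(c, query), reverse=True)[:5]
    return "\n\n".join(top)
```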
## Tip for successful RAG
Provide as much context in your query as possible. Mention the terms, ideas, and words you expect to appear in the relevant source material. This often increases the chance that the system will provide useful context to the LLM. As always, experimentation is the best way to find what works best.
## mcp
## Use MCP Servers
> Connect MCP servers to LM Studio
Starting with LM Studio 0.3.17, the app acts as a **Model Context Protocol (MCP) Host**. This means you can connect MCP servers to the app and make them available to your models.
### Be cautious
Never install MCPs from untrusted sources.
```lms_warning
Some MCP servers can run arbitrary code, access your local files, and use your network connection. Always be cautious when installing and using MCP servers. If you don't trust the source, don't install it.
```
# Use MCP servers in LM Studio
Starting with 0.3.17 (b10), LM Studio supports both local and remote MCP servers. You can add MCPs by editing the app's `mcp.json` file or via the ["Add to LM Studio" Button](mcp/deeplink), when available. LM Studio currently follows Cursor's `mcp.json` notation.
## Install new servers: `mcp.json`
Switch to the "Program" tab in the right hand sidebar. Click `Install > Edit mcp.json`.
This will open the `mcp.json` file in the in-app editor. You can add MCP servers by editing this file.
### Example MCP to try: Hugging Face MCP Server
This MCP server provides access to functions like model and dataset search.
```json
{
  "mcpServers": {
    "hf-mcp-server": {
      "url": "https://huggingface.co/mcp",
      "headers": {
        "Authorization": "Bearer <YOUR_HF_TOKEN>"
      }
    }
  }
}
```
###### You will need to replace `<YOUR_HF_TOKEN>` with your actual Hugging Face token. Learn more [here](https://huggingface.co/docs/hub/en/security-tokens).
Use the [deeplink button](mcp/deeplink), or copy the JSON snippet above and paste it into your `mcp.json` file.
---
## Gotchas and Troubleshooting
- Never install MCP servers from untrusted sources. Some MCPs can have far reaching access to your system.
- Some MCP servers were designed to be used with Claude, ChatGPT, or Gemini and might use excessive amounts of tokens.
- Watch out for this. It may quickly bog down your local model and trigger frequent context overflows.
- When adding MCP servers manually, copy only the content after `"mcpServers": {` and before the closing `}`.
### `Add to LM Studio` Button
> Add MCP servers to LM Studio using a deeplink
You can install MCP servers in LM Studio with one click using a deeplink.
Starting with version 0.3.17 (b10), LM Studio can act as an MCP host. Learn more about it [here](../mcp).
---
# Generate your own MCP install link
Enter your MCP JSON entry to generate a deeplink for the `Add to LM Studio` button.
```lms_mcp_deep_link_generator
```
## Try an example
Try to copy and paste the following into the link generator above.
```json
{
  "hf-mcp-server": {
    "url": "https://huggingface.co/mcp",
    "headers": {
      "Authorization": "Bearer <YOUR_HF_TOKEN>"
    }
  }
}
```
### Deeplink format
```bash
lmstudio://add_mcp?name=hf-mcp-server&config=eyJ1cmwiOiJodHRwczovL2h1Z2dpbmdmYWNlLmNvL21jcCIsImhlYWRlcnMiOnsiQXV0aG9yaXphdGlvbiI6IkJlYXJlciA8WU9VUl9IRl9UT0tFTj4ifX0%3D
```
#### Parameters
```lms_params
- name: "lmstudio://"
type: "protocol"
description: "The protocol scheme to open LM Studio"
- name: "add_mcp"
type: "path"
description: "The action to install an MCP server"
- name: "name"
type: "query parameter"
description: "The name of the MCP server to install"
- name: "config"
type: "query parameter"
description: "Base64 encoded JSON configuration for the MCP server"
```
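As a rough sketch of how the `config` parameter above can be produced: the server's JSON entry is serialized, base64-encoded, and then URL-encoded. `make_deeplink` below is a hypothetical helper, not part of any LM Studio tooling, and the exact base64 output can differ depending on JSON whitespace.
```python
import base64
import json
from urllib.parse import quote

def make_deeplink(name: str, config: dict) -> str:
    # Serialize the MCP server entry, base64-encode it, then URL-encode the result.
    encoded = base64.b64encode(json.dumps(config).encode("utf-8")).decode("ascii")
    return f"lmstudio://add_mcp?name={quote(name)}&config={quote(encoded)}"

print(make_deeplink(
    "hf-mcp-server",
    {"url": "https://huggingface.co/mcp",
     "headers": {"Authorization": "Bearer <YOUR_HF_TOKEN>"}},
))
```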
## modelyaml
## Introduction to `model.yaml`
> Describe models with the cross-platform `model.yaml` specification.
`Draft`
[`model.yaml`](https://modelyaml.org) describes a model and all of its variants in a single portable file. Models in LM Studio's [model catalog](https://lmstudio.ai/models) are all implemented using model.yaml.
This allows abstracting away the underlying format (GGUF, MLX, etc.) and presenting a single entry point for a given model. Furthermore, the model.yaml file supports baking in additional metadata, load and inference options, and even custom logic (e.g. enabling/disabling thinking).
**You can clone existing model.yaml files on the LM Studio Hub and even [publish your own](./modelyaml/publish)!**
## Core fields
### `model`
The canonical identifier in the form `publisher/model`.
```yaml
model: qwen/qwen3-8b
```
### `base`
Points to the "concrete" model files or other virtual models. Each entry uses a unique `key` and one or more `sources` from which the file can be fetched.
The snippet below demonstrates a case where the model (`qwen/qwen3-8b`) can resolve to one of 3 different concrete models.
```yaml
model: qwen/qwen3-8b
base:
  - key: lmstudio-community/qwen3-8b-gguf
    sources:
      - type: huggingface
        user: lmstudio-community
        repo: Qwen3-8B-GGUF
  - key: lmstudio-community/qwen3-8b-mlx-4bit
    sources:
      - type: huggingface
        user: lmstudio-community
        repo: Qwen3-8B-MLX-4bit
  - key: lmstudio-community/qwen3-8b-mlx-8bit
    sources:
      - type: huggingface
        user: lmstudio-community
        repo: Qwen3-8B-MLX-8bit
```
Concrete model files refer to the actual weights.
### `metadataOverrides`
Overrides the base model's metadata. This is useful for presentation purposes, for example in LM Studio's model catalog or in app model search. It is not used for any functional changes to the model.
```yaml
metadataOverrides:
  domain: llm
  architectures:
    - qwen3
  compatibilityTypes:
    - gguf
    - safetensors
  paramsStrings:
    - 8B
  minMemoryUsageBytes: 4600000000
  contextLengths:
    - 40960
  vision: false
  reasoning: true
  trainedForToolUse: true
```
### `config`
Use this to "bake in" default runtime settings (such as sampling parameters) and even load time options.
This works similarly to [Per Model Defaults](/docs/app/advanced/per-model).
- `operation:` inference time parameters
- `load:` load time parameters
```yaml
config:
  operation:
    fields:
      - key: llm.prediction.topKSampling
        value: 20
      - key: llm.prediction.temperature
        value: 0.7
  load:
    fields:
      - key: llm.load.contextLength
        value: 42690
```
### `customFields`
Define model-specific custom fields.
```yaml
customFields:
  - key: enableThinking
    displayName: Enable Thinking
    description: Controls whether the model will think before replying
    type: boolean
    defaultValue: true
    effects:
      - type: setJinjaVariable
        variable: enable_thinking
```
In order for the above example to work, the Jinja template needs to have a variable named `enable_thinking`.
## Complete example
Taken from https://lmstudio.ai/models/qwen/qwen3-8b
```yaml
# model.yaml is an open standard for defining cross-platform, composable AI models
# Learn more at https://modelyaml.org
model: qwen/qwen3-8b
base:
  - key: lmstudio-community/qwen3-8b-gguf
    sources:
      - type: huggingface
        user: lmstudio-community
        repo: Qwen3-8B-GGUF
  - key: lmstudio-community/qwen3-8b-mlx-4bit
    sources:
      - type: huggingface
        user: lmstudio-community
        repo: Qwen3-8B-MLX-4bit
  - key: lmstudio-community/qwen3-8b-mlx-8bit
    sources:
      - type: huggingface
        user: lmstudio-community
        repo: Qwen3-8B-MLX-8bit
metadataOverrides:
  domain: llm
  architectures:
    - qwen3
  compatibilityTypes:
    - gguf
    - safetensors
  paramsStrings:
    - 8B
  minMemoryUsageBytes: 4600000000
  contextLengths:
    - 40960
  vision: false
  reasoning: true
  trainedForToolUse: true
config:
  operation:
    fields:
      - key: llm.prediction.topKSampling
        value: 20
      - key: llm.prediction.minPSampling
        value:
          checked: true
          value: 0
customFields:
  - key: enableThinking
    displayName: Enable Thinking
    description: Controls whether the model will think before replying
    type: boolean
    defaultValue: true
    effects:
      - type: setJinjaVariable
        variable: enable_thinking
```
The [GitHub specification](https://github.com/modelyaml/modelyaml) contains further details and the latest schema.
### Publish a `model.yaml`
> Upload your model definition to the LM Studio Hub.
Share portable models by uploading a [`model.yaml`](./) to your page on the LM Studio Hub.
After you publish a model.yaml to the LM Studio Hub, it will be available for other users to download with `lms get`.
###### Note: `model.yaml` refers to metadata only. This means it does not include the actual model weights.
# Quickstart
The easiest way to get started is by cloning an existing model, modifying it, and then running `lms push`.
For example, you can clone the Qwen 3 8B model:
```shell
lms clone qwen/qwen3-8b
```
This will result in a local copy of `model.yaml`, `README.md`, and other metadata files. Importantly, this does NOT download the model weights.
```lms_terminal
$ ls
README.md manifest.json model.yaml thumbnail.png
```
## Change the publisher to your user
The first part of the `model:` field should be the username of the publisher. Change it to the username of a user or organization for which you have write access.
```diff
- model: qwen/qwen3-8b
+ model: your-user-here/qwen3-8b
  base:
    - key: lmstudio-community/qwen3-8b-gguf
      sources:
        # ... the rest of the file
```
## Sign in
Authenticate with the Hub from the command line:
```shell
lms login
```
The CLI will print an authentication URL. After you approve access, the session token is saved locally so you can publish models.
## Publish your model
Run the push command in the directory containing `model.yaml`:
```shell
lms push
```
The command packages the file, uploads it, and prints a revision number for the new version.
### Override metadata at publish time
Use `--overrides` to tweak fields without editing the file:
```shell
lms push --overrides '{"description": "Qwen 3 8B model"}'
```
## Downloading a model and using it in LM Studio
After publishing, the model appears under your user or organization profile on the LM Studio Hub.
It can then be downloaded with:
```shell
lms get my-user/my-model
```
## presets
## Config Presets
> Save your system prompts and other parameters as Presets for easy reuse across chats.
Presets are a way to bundle together a system prompt and other parameters into a single configuration that can be easily reused across different chats.
New in 0.3.15: You can [import](/docs/app/presets/import) Presets from a file or URL, and even [publish](/docs/app/presets/publish) your own Presets to share with others on the LM Studio Hub.
## Saving, resetting, and deselecting Presets
Below is the anatomy of the Preset manager:
## Importing, Publishing, and Updating Downloaded Presets
Presets are JSON files. You can share them by sending around the JSON, or you can share them by publishing them to the LM Studio Hub.
You can also import Presets from other users by URL. See the [Import](/docs/app/presets/import) and [Publish](/docs/app/presets/publish) sections for more details.
## Example: Build your own Prompt Library
You can create your own prompt library by using Presets.
In addition to system prompts, every parameter under the Advanced Configuration sidebar can be recorded in a named Preset.
For example, you might want to always use a certain Temperature, Top P, or Max Tokens for a particular use case. You can save these settings as a Preset (with or without a system prompt) and easily switch between them.
#### The Use Case for Presets
- Save your system prompts and inference parameters as a named `Preset`.
- Easily switch between different use cases, such as reasoning, creative writing, multi-turn conversations, or brainstorming.
## Where Presets are stored
Presets are stored in the following directory:
#### macOS or Linux
```xml
~/.lmstudio/config-presets
```
#### Windows
```xml
%USERPROFILE%\.lmstudio\config-presets
```
### Migration from LM Studio 0.2.\* Presets
- Presets you've saved in LM Studio 0.2.\* are automatically readable in 0.3.3 with no migration step needed.
- If you save **new changes** in a **legacy preset**, it'll be **copied** to a new format upon save.
- The old files are NOT deleted.
- Notable difference: Load parameters are not included in the new preset format.
- Favor editing the model's default config in My Models. See [how to do it here](/docs/configuration/per-model).
### Community
Chat with other LM Studio users, discuss LLMs, hardware, and more on the [LM Studio Discord server](https://discord.gg/aPQfnNkxGC).
### Importing and Sharing
> You can import preset files directly from disk, or pull presets made by others via URL.
You can import presets by file or URL. This is useful for sharing presets with others, or for importing presets from other users.
# Import Presets
First, click the presets dropdown in the sidebar. You will see a list of your presets along with 2 buttons: `+ New Preset` and `Import`.
Click the `Import` button to import a preset.
## Import Presets from File
Once you click the Import button, you can select the source of the preset you want to import. You can either import from a file or from a URL.
## Import Presets from URL
Presets that are [published](/docs/app/presets/publish) to the LM Studio Hub can be imported by providing their URL.
Importing public presets does not require logging in within LM Studio.
### Using `lms` CLI
You can also use the CLI to import presets from URL. This is useful for sharing presets with others.
```
lms get {author}/{preset-name}
```
Example:
```bash
lms get neil/qwen3-thinking
```
### Find your config-presets directory
LM Studio manages config presets on disk. Presets are local and private by default. You or others can choose to share them by sharing the file.
Click on the `•••` button in the Preset dropdown and select "Reveal in Finder" (or "Show in Explorer" on Windows).
This will reveal the preset file's location on disk, so you can inspect it or share it with others.
### Where Hub shared presets are stored
Presets you share, and ones you download from the LM Studio Hub are saved in `~/.lmstudio/hub` on macOS and Linux, or `%USERPROFILE%\.lmstudio\hub` on Windows.
### Publish Your Presets
> Publish your Presets to the LM Studio Hub. Share your Presets with the community or with your colleagues.
`Feature In Preview`
Starting with LM Studio 0.3.15, you can publish your Presets to the LM Studio community. This allows you to share your Presets with others and import Presets from other users.
This feature is early and we would love to hear your feedback. Please report bugs and feedback to bugs@lmstudio.ai.
---
## Step 1: Click the Publish Button
Identify the Preset you want to publish in the Preset dropdown. Click the `•••` button and select "Publish" from the menu.
## Step 2: Set the Preset Details
You will be prompted to set the details of your Preset. This includes the name (slug) and optional description.
Community presets are public and can be used by anyone on the internet!
#### Privacy and Terms
For good measure, visit the [Privacy Policy](https://lmstudio.ai/hub-privacy) and [Terms of Service](https://lmstudio.ai/hub-terms) to understand what's suitable to share on the Hub and how data is handled. Community presets are public and visible to everyone. Make sure you agree with these documents before publishing your Preset.
### Pull Updates
> How to pull the latest revisions of your Presets, or presets you have imported from others.
`Feature In Preview`
You can pull the latest revisions of your Presets, or presets you have imported from others. This is useful for keeping your Presets up to date with the latest changes.
## How to Pull Updates
Click the `•••` button in the Preset dropdown and select "Pull" from the menu.
## Your Presets vs Others'
Both your published Presets and other downloaded Presets can be pulled and updated the same way.
### Push New Revisions
> Publish new revisions of your Presets to the LM Studio Hub.
`Feature In Preview`
Starting with LM Studio 0.3.15, you can publish your Presets to the LM Studio community. This allows you to share your Presets with others and import Presets from other users.
This feature is early and we would love to hear your feedback. Please report bugs and feedback to bugs@lmstudio.ai.
---
## Published Presets
Presets you share on the LM Studio Hub can be updated.
## Step 1: Make Changes and Commit
Make any changes to your Preset, either to parameters already included in the Preset or by adding new parameters.
## Step 2: Click the Push Button
Once changes are committed, you will see a `Push` button. Click it to push your changes to the Hub.
Pushing changes will result in a new revision of your Preset on the Hub.
## advanced
### Speculative Decoding
> Speed up generation with a draft model
`Advanced`
Speculative decoding is a technique that can substantially increase the generation speed of large language models (LLMs) without reducing response quality.
## What is Speculative Decoding
Speculative decoding relies on the collaboration of two models:
- A larger, "main" model
- A smaller, faster "draft" model
During generation, the draft model rapidly proposes potential tokens (subwords), which the main model can verify faster than it would take to generate them from scratch. To maintain quality, the main model only accepts tokens that match what it would have generated. After the last accepted draft token, the main model always generates one additional token.
For a model to be used as a draft model, it must have the same "vocabulary" as the main model.
## How to enable Speculative Decoding
On `Power User` mode or higher, load a model, then select a `Draft Model` within the `Speculative Decoding` section of the chat sidebar:
### Finding compatible draft models
If the `Draft Model` dropdown shows no compatible options, try downloading a lower-parameter variant of the model you have loaded, if one exists. If no smaller versions of your model exist, find a model pairing that does.
Once you have both a main and a draft model loaded, simply begin chatting to enable speculative decoding.
## Key factors affecting performance
Speculative decoding speed-up is generally dependent on two things:
1. How small and fast the _draft model_ is compared with the _main model_
2. How often the draft model is able to make "good" suggestions
In simple terms, you want to choose a draft model that's much smaller than the main model. And some prompts will work better than others.
### An important trade-off
Running a draft model alongside a main model to enable speculative decoding requires more **computation and resources** than running the main model on its own.
The key to faster generation of the main model is choosing a draft model that's both small and capable enough.
Here are general guidelines for the **maximum** draft model size you should select based on main model size (in parameters):
| Main Model Size | Max Draft Model Size to Expect Speed-Ups |
| :-------------: | :--------------------------------------: |
| 3B | - |
| 7B | 1B |
| 14B | 3B |
| 32B | 7B |
Generally, the larger the size difference is between the main model and the draft model, the greater the speed-up.
Note: if the draft model is not fast enough or effective enough at making "good" suggestions to the main model, the generation speed will not increase, and could actually decrease.
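Speculative decoding can also be requested through the local server. The sketch below is an assumption-laden example based on the Speculative Decoding API entry in the changelog later in these docs: it assumes the OpenAI-compatible `/v1/chat/completions` endpoint on the default port and that both listed models are downloaded.
```python
import requests

resp = requests.post(
    "http://localhost:1234/v1/chat/completions",
    json={
        "model": "deepseek-r1-distill-qwen-7b",          # main model
        "draft_model": "deepseek-r1-distill-qwen-0.5b",  # smaller, faster draft model
        "messages": [{"role": "user", "content": "What is the quadratic equation formula?"}],
    },
    timeout=300,
)
resp.raise_for_status()
data = resp.json()
print(data["choices"][0]["message"]["content"])
print(data.get("stats", {}))  # may include accepted/rejected draft token counts
```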
### Prompt dependent
One thing you will likely notice when using speculative decoding is that the generation speed is not consistent across all prompts.
This is because, for some prompts, the draft model is less likely to make "good" suggestions to the main model.
Here are some extreme examples that illustrate this concept:
#### 1. Discrete Example: Mathematical Question
Prompt: "What is the quadratic equation formula?"
In this case, both a 70B model and a 0.5B model are very likely to give the standard formula `x = (-b ± √(b² - 4ac))/(2a)`. So if the draft model suggested this formula as the next tokens, the target model would likely accept it, making this an ideal case for speculative decoding to work efficiently.
#### 2. Creative Example: Story Generation
Prompt: "Write a story that begins: 'The door creaked open...'"
In this case, the smaller model's draft tokens are likely to be rejected more often by the larger model, as each next word could branch into countless valid possibilities. While "4" is the only reasonable answer to "2+2", this story could continue with "revealing a monster", "as the wind howled", "and Sarah froze", or hundreds of other perfectly valid continuations, making the smaller model's specific word predictions much less likely to match the larger model's choices.
### Import Models
> Use model files you've downloaded outside of LM Studio
You can use compatible models you've downloaded outside of LM Studio by placing them in the expected directory structure.
### Use `lms import` (experimental)
To import a `GGUF` model you've downloaded outside of LM Studio, run the following command in your terminal:
```bash
lms import <path/to/model.gguf>
```
###### Follow the interactive prompt to complete the import process.
### LM Studio's expected models directory structure
LM Studio aims to preserve the directory structure of models downloaded from Hugging Face. The expected directory structure is as follows:
```xml
~/.lmstudio/models/
└── publisher/
    └── model/
        └── model-file.gguf
```
For example, if you have a model named `ocelot-v1` published by `infra-ai`, the structure would look like this:
```xml
~/.lmstudio/models/
└── infra-ai/
    └── ocelot-v1/
        └── ocelot-v1-instruct-q4_0.gguf
```
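If you prefer to script the copy instead of using `lms import`, here is a minimal sketch. The publisher, model name, and source path reuse the hypothetical example above; adjust them to your files.
```python
from pathlib import Path
import shutil

models_dir = Path.home() / ".lmstudio" / "models"
publisher, model_name = "infra-ai", "ocelot-v1"  # hypothetical names from the example above
src = Path("~/Downloads/ocelot-v1-instruct-q4_0.gguf").expanduser()

dest_dir = models_dir / publisher / model_name
dest_dir.mkdir(parents=True, exist_ok=True)   # create publisher/model folders if missing
shutil.copy2(src, dest_dir / src.name)        # copy the GGUF into the expected location
print(f"Copied to {dest_dir / src.name}")
```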
### Community
Chat with other LM Studio users, discuss LLMs, hardware, and more on the [LM Studio Discord server](https://discord.gg/aPQfnNkxGC).
### Per-model Defaults
> You can set default settings for each model in LM Studio
`Advanced`
You can set default load settings for each model in LM Studio.
When the model is loaded anywhere in the app (including through [`lms load`](/docs/cli#load-a-model-with-options)) these settings will be used.
### Setting default parameters for a model
Head to the My Models tab and click on the gear ⚙️ icon to edit the model's default parameters.
This will open a dialog where you can set the default parameters for the model.
Next time you load the model, these settings will be used.
```lms_protip
#### Reasons to set default load parameters (not required, totally optional)
- Set particular GPU offload settings for a given model
- Set a particular context size for a given model
- Choose whether or not to use Flash Attention for a given model
```
## Advanced Topics
### Changing load settings before loading a model
When you load a model, you can optionally change the default load settings.
### Saving your changes as the default settings for a model
If you make changes to load settings when you load a model, you can save them as the default settings for that model.
### Community
Chat with other LM Studio power users, discuss configs, models, hardware, and more on the [LM Studio Discord server](https://discord.gg/aPQfnNkxGC).
### Prompt Template
> Optionally set or modify the model's prompt template
`Advanced`
By default, LM Studio will automatically configure the prompt template based on the model file's metadata.
However, you can customize the prompt template for any model.
### Overriding the Prompt Template for a Specific Model
Head over to the My Models tab and click on the gear ⚙️ icon to edit the model's default parameters.
###### Pro tip: you can jump to the My Models tab from anywhere by pressing `⌘` + `3` on Mac, or `ctrl` + `3` on Windows / Linux.
### Customize the Prompt Template
###### 💡 In most cases you don't need to change the prompt template
When a model doesn't come with prompt template information, LM Studio will surface the `Prompt Template` config box in the **🧪 Advanced Configuration** sidebar.
You can make this config box always show up by right-clicking the sidebar and selecting **Always Show Prompt Template**.
### Prompt template options
#### Jinja Template
You can express the prompt template in Jinja.
###### 💡 [Jinja](https://en.wikipedia.org/wiki/Jinja_(template_engine)) is a templating engine used to encode the prompt template in several popular LLM model file formats.
#### Manual
You can also express the prompt template manually by specifying message role prefixes and suffixes.
#### Reasons you might want to edit the prompt template:
1. The model's metadata is incorrect, incomplete, or LM Studio doesn't recognize it
2. The model does not have a prompt template in its metadata (e.g. custom or older models)
3. You want to customize the prompt template for a specific use case
## user-interface
### LM Studio in your language
> LM Studio is available in English, Chinese, Spanish, French, German, Korean, Russian, and 26+ more languages.
LM Studio is available in `English`, `Spanish`, `Japanese`, `Chinese`, `German`, `Norwegian`, `Turkish`, `Russian`, `Korean`, `Polish`, `Vietnamese`, `Czech`, `Ukrainian`, `Portuguese (BR,PT)` and many more languages thanks to incredible community localizers.
### Selecting a Language
You can choose a language in the Settings tab.
Use the dropdown menu under Preferences > Language.
```lms_protip
You can jump to Settings from anywhere in the app by pressing `cmd` + `,` on macOS or `ctrl` + `,` on Windows/Linux.
```
###### To get to the Settings page, you need to be on [Power User mode](/docs/modes) or higher.
#### Big thank you to community localizers
- Spanish [@xtianpaiva](https://github.com/xtianpaiva), [@AlexisGross](https://github.com/AlexisGross), [@Tonband](https://github.com/Tonband)
- Norwegian [@Exlo84](https://github.com/Exlo84)
- German [@marcelMaier](https://github.com/marcelMaier), [@Goekdeniz-Guelmez](https://github.com/Goekdeniz-Guelmez)
- Romanian (ro) [@alexandrughinea](https://github.com/alexandrughinea)
- Turkish (tr) [@progesor](https://github.com/progesor), [@nossbar](https://github.com/nossbar)
- Russian [@shelomitsky](https://github.com/shelomitsky), [@mlatysh](https://github.com/mlatysh), [@Adjacentai](https://github.com/Adjacentai), [@HostFly](https://github.com/HostFly), [@MotyaDev](https://github.com/MotyaDev), [@Autumn-Whisper](https://github.com/Autumn-Whisper), [@seropheem](https://github.com/seropheem)
- Korean [@williamjeong2](https://github.com/williamjeong2)
- Polish [@danieltechdev](https://github.com/danieltechdev)
- Czech [@ladislavsulc](https://github.com/ladislavsulc)
- Vietnamese [@trinhvanminh](https://github.com/trinhvanminh), [@godkyo98](https://github.com/godkyo98)
- Portuguese (BR) [@Sm1g00l](https://github.com/Sm1g00l), [@altiereslima](https://github.com/altiereslima)
- Portuguese (PT) [@catarino](https://github.com/catarino)
- Chinese (zh-CN) [@neotan](https://github.com/neotan), [@SweetDream0256](https://github.com/SweetDream0256), [@enKl03B](https://github.com/enKl03B), [@evansrrr](https://github.com/evansrrr), [@xkonglong](https://github.com/xkonglong), [@shadow01a](https://github.com/shadow01a)
- Chinese (zh-HK), (zh-TW) [@neotan](https://github.com/neotan), [ceshizhuanyong895](https://github.com/ceshizhuanyong895), [@BrassaiKao](https://github.com/BrassaiKao)
- Chinese (zh-Hant) [@kywarai](https://github.com/kywarai), [ceshizhuanyong895](https://github.com/ceshizhuanyong895)
- Ukrainian (uk) [@hmelenok](https://github.com/hmelenok)
- Japanese (ja) [@digitalsp](https://github.com/digitalsp)
- Dutch (nl) [@alaaf11](https://github.com/alaaf11)
- Italian (it) [@fralapo](https://github.com/fralapo), [@Bl4ck-D0g](https://github.com/Bl4ck-D0g), [@nikypalma](https://github.com/nikypalma)
- Indonesian (id) [@dwirx](https://github.com/dwirx)
- Greek (gr) [@ilikecatgirls](https://github.com/ilikecatgirls)
- Swedish (sv) [@reinew](https://github.com/reinew)
- Catalan (ca) [@Gopro3010](https://github.com/Gopro3010)
- French [@Plexi09](https://github.com/Plexi09)
- Finnish (fi) [@divergentti](https://github.com/divergentti)
- Bengali (bn) [@AbiruzzamanMolla](https://github.com/AbiruzzamanMolla)
- Malayalam (ml) [@prasanthc41m](https://github.com/prasanthc41m)
- Thai (th) [@gnoparus](https://github.com/gnoparus)
- Bosnian (bs) [@0haris0](https://github.com/0haris0)
- Bulgarian (bg) [@DenisZekiria](https://github.com/DenisZekiria)
- Hindi (hi) [@suhailtajshaik](https://github.com/suhailtajshaik)
- Hungarian (hu) [@Mekemoka](https://github.com/Mekemoka)
- Persian (Farsi) (fa) [@mohammad007kh](https://github.com/mohammad007kh), [@darwindev](https://github.com/darwindev)
- Arabic (ar) [@haqbany](https://github.com/haqbany)
Still under development (due to lack of RTL support in LM Studio):
- Hebrew: [@NHLOCAL](https://github.com/NHLOCAL)
#### Contributing to LM Studio localization
If you want to improve existing translations or contribute new ones, you're more than welcome to jump in.
LM Studio strings are maintained in https://github.com/lmstudio-ai/localization.
See instructions for contributing [here](https://github.com/lmstudio-ai/localization/blob/main/README.md).
### User, Power User, or Developer
> Hide or reveal advanced features
### Selecting a UI Mode
LM Studio can run at one of three levels of configurability.
Select between User, Power User, and Developer.
### Which mode should I choose?
#### `User`
Show only the chat interface, and auto-configure everything. This is the best choice for beginners or anyone who's happy with the default settings.
#### `Power User`
Use LM Studio in this mode if you want access to configurable [load](/docs/configuration/load) and [inference](/docs/configuration/inference) parameters as well as advanced chat features such as [insert, edit, & continue](/docs/advanced/context) (for either role, user or assistant).
#### `Developer`
Full access to all aspects of LM Studio. This includes keyboard shortcuts and development features. Check out the Developer section under Settings for more.
### Color Themes
> Customize LM Studio's color theme
### Selecting a Theme
Press `Cmd` + `K` then `T` (macOS) or `Ctrl` + `K` then `T` (Windows/Linux) to open the theme selector.
You can also choose a theme in the Settings tab (`Cmd` + `,` on macOS or `Ctrl` + `,` on Windows/Linux).
Choosing the "Auto" option will automatically switch between Light and Dark themes based on your system settings.
# developer
# LM Studio Developer Docs
> Build with LM Studio's local APIs and SDKs – TypeScript, Python, REST, and OpenAI-compatible endpoints.
```lms_hstack
## Get to know the stack
- TypeScript SDK: [lmstudio-js](/docs/typescript)
- Python SDK: [lmstudio-python](/docs/python)
- LM Studio REST API: [Stateful Chats, MCPs via API](/docs/developer/rest)
- OpenAI-compatible: [Chat, Responses, Embeddings](/docs/developer/openai-compat)
- LM Studio CLI: [`lms`](/docs/cli)
:::split:::
## What you can build
- Chat and text generation with streaming
- Tool calling and local agents with MCP
- Structured output (JSON schema)
- Embeddings and tokenization
- Model management (load, download, list)
```
## Super quick start
### TypeScript (`lmstudio-js`)
```bash
npm install @lmstudio/sdk
```
```ts
import { LMStudioClient } from "@lmstudio/sdk";
const client = new LMStudioClient();
const model = await client.llm.model("openai/gpt-oss-20b");
const result = await model.respond("Who are you, and what can you do?");
console.info(result.content);
```
Full docs: [lmstudio-js](/docs/typescript), Source: [GitHub](https://github.com/lmstudio-ai/lmstudio-js)
### Python (`lmstudio-python`)
```bash
pip install lmstudio
```
```python
import lmstudio as lms
with lms.Client() as client:
    model = client.llm.model("openai/gpt-oss-20b")
    result = model.respond("Who are you, and what can you do?")
    print(result)
```
Full docs: [lmstudio-python](/docs/python), Source: [GitHub](https://github.com/lmstudio-ai/lmstudio-python)
### HTTP (LM Studio REST API)
```bash
lms server start --port 1234
```
```bash
curl http://localhost:1234/api/v1/chat \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $LM_API_TOKEN" \
  -d '{
    "model": "openai/gpt-oss-20b",
    "input": "Who are you, and what can you do?"
  }'
```
Full docs: [LM Studio REST API](/docs/developer/rest)
## Helpful links
- [API Changelog](/docs/developer/api-changelog)
- [Local server basics](/docs/developer/core)
- [CLI reference](/docs/cli)
- [Discord Community](https://discord.gg/lmstudio)
## API Changelog
> Updates and changes to the LM Studio API.
---
###### LM Studio 0.4.0
### LM Studio native v1 REST API
- Official release of LM Studio's native v1 REST API at `/api/v1/*` endpoints.
- [MCP via API](/docs/developer/core/mcp)
- [Stateful chats](/docs/developer/rest/stateful-chats)
- [Authentication](/docs/developer/core/authentication) configuration with API tokens
- Model [download](/docs/developer/rest/download) and [load](/docs/developer/rest/load) endpoints
- See [overview](/docs/developer/rest) page for more details and [comparison](/docs/developer/rest#inference-endpoint-comparison) with OpenAI-compatible endpoints.
---
###### LM Studio 0.3.29 • 2025-10-06
### OpenAI `/v1/responses` and variant listing
- New OpenAI-compatible endpoint: `POST /v1/responses`.
- Stateful interactions via `previous_response_id`.
- Custom tool calling and Remote MCP support (opt-in).
- Reasoning support with `reasoning.effort` for `openai/gpt-oss-20b`.
- Streaming via SSE when `stream: true`.
- CLI: `lms ls --variants` lists all variants for multi-variant models.
- Docs: [/docs/developer/openai-compat](/docs/developer/openai-compat). Full release notes: [/blog/lmstudio-v0.3.29](/blog/lmstudio-v0.3.29).
---
###### LM Studio 0.3.27 • 2025-09-24
### CLI: model resource estimates, status, and interrupts
- New: `lms load --estimate-only ` prints estimated GPU and total memory before loading. Honors `--context-length` and `--gpu`, and uses an improved estimator that now accounts for flash attention and vision models.
- `lms chat`: press `Ctrl+C` to interrupt an ongoing prediction.
- `lms ps --json` now reports each model's generation status and the number of queued prediction requests.
- CLI color contrast improved for light mode.
- See docs: [/docs/cli/load](/docs/cli/load). Full release notes: [/blog/lmstudio-v0.3.27](/blog/lmstudio-v0.3.27).
---
###### LM Studio 0.3.26 • 2025-09-15
### CLI log streaming: server + model
- `lms log stream` now supports multiple sources and filters.
- `--source server` streams HTTP server logs (startup, endpoints, status)
- `--source model --filter input,output` streams formatted user input and model output
- Append `--json` for machine-readable logs; `--stats` adds tokens/sec and related metrics (model source)
- See usage and examples: [/docs/cli/log-stream](/docs/cli/log-stream). Full release notes: [/blog/lmstudio-v0.3.26](/blog/lmstudio-v0.3.26).
---
###### LM Studio 0.3.25 • 2025-09-04
### New model support (API)
- Added support for NVIDIA Nemotron-Nano-v2 with tool-calling via the OpenAI-compatible endpoints [↗](/blog/lmstudio-v0.3.25).
- Added support for Google EmbeddingGemma for the `/v1/embeddings` endpoint [↗](/blog/lmstudio-v0.3.25).
---
###### LM Studio 0.3.24 • 2025-08-28
### Seed-OSS tool-calling and template fixes
- Added support for ByteDance/Seed-OSS including tool-calling and prompt-template compatibility fixes in the OpenAI-compatible API [↗](/blog/lmstudio-v0.3.24).
- Fixed cases where tool calls were not parsed for certain prompt templates [↗](/blog/lmstudio-v0.3.24).
---
###### LM Studio 0.3.23 • 2025-08-12
### Reasoning content and tool-calling reliability
- For `gpt-oss` on `POST /v1/chat/completions`, reasoning content moves out of `message.content` and into `choices.message.reasoning` (non-streaming) and `choices.delta.reasoning` (streaming), aligning with `o3-mini` [↗](/blog/lmstudio-v0.3.23).
- Tool names are normalized (e.g., snake_case) before being provided to the model to improve tool-calling reliability [↗](/blog/lmstudio-v0.3.23).
- Fixed errors for certain tools-containing requests to `POST /v1/chat/completions` (e.g., "reading 'properties'") and non-streaming tool-call failures [↗](/blog/lmstudio-v0.3.23).
---
###### LM Studio 0.3.19 • 2025-07-21
### Bug fixes for streaming and tool calls
- Corrected usage statistics returned by OpenAI-compatible streaming responses [↗](https://lmstudio.ai/blog/lmstudio-v0.3.19#:~:text=,OpenAI%20streaming%20responses%20were%20incorrect).
- Improved handling of parallel tool calls via the streaming API [↗](https://lmstudio.ai/blog/lmstudio-v0.3.19#:~:text=,API%20were%20not%20handled%20correctly).
- Fixed parsing of correct tool calls for certain Mistral models [↗](https://lmstudio.ai/blog/lmstudio-v0.3.19#:~:text=,Ryzen%20AI%20PRO%20300%20series).
---
###### LM Studio 0.3.18 • 2025-07-10
### Streaming options and tool-calling improvements
- Added support for the `stream_options` object on OpenAI-compatible endpoints. Setting `stream_options.include_usage` to `true` returns prompt and completion token usage during streaming [↗](https://lmstudio.ai/blog/lmstudio-v0.3.18#:~:text=%2A%20Added%20support%20for%20%60,to%20support%20more%20prompt%20templates).
- Errors returned from streaming endpoints now follow the correct format expected by OpenAI clients [↗](https://lmstudio.ai/blog/lmstudio-v0.3.18#:~:text=,with%20proper%20chat%20templates).
- Tool-calling support added for Mistral v13 tokenizer models, using proper chat templates [↗](https://lmstudio.ai/blog/lmstudio-v0.3.18#:~:text=,with%20proper%20chat%20templates).
- The `response_format.type` field now accepts `"text"` in chat-completion requests [↗](https://lmstudio.ai/blog/lmstudio-v0.3.18#:~:text=,that%20are%20split%20across%20multiple).
- Fixed bugs where parallel tool calls split across multiple chunks were dropped and where root-level `$defs` in tool definitions were stripped [↗](https://lmstudio.ai/blog/lmstudio-v0.3.18#:~:text=,being%20stripped%20in%20tool%20definitions).
---
###### LM Studio 0.3.17 • 2025-06-25
### Tool-calling reliability and token-count updates
- Token counts now include the system prompt and tool definitions [↗](https://lmstudio.ai/blog/lmstudio-v0.3.17#:~:text=,have%20a%20URL%20in%20the). This makes usage reporting more accurate for both the UI and the API.
- Tool-call argument tokens are streamed as they are generated [↗](https://lmstudio.ai/blog/lmstudio-v0.3.17#:~:text=Build%206), improving responsiveness when using streamed function calls.
- Various fixes improve MCP and tool-calling reliability, including correct handling of tools that omit a `parameters` object and preventing hangs when an MCP server reloads [↗](https://lmstudio.ai/blog/lmstudio-v0.3.17#:~:text=,tool%20calls%20would%20hang%20indefinitely).
---
###### LM Studio 0.3.16 • 2025-05-23
### Model capabilities in `GET /models`
- The OpenAI-compatible REST API (`/api/v0`) now returns a `capabilities` array in the `GET /models` response. Each model lists its supported capabilities (e.g. `"tool_use"`) [↗](https://lmstudio.ai/blog/lmstudio-v0.3.16#:~:text=,response) so clients can programmatically discover tool-enabled models.
- Fixed a streaming bug where an empty function name string was appended after the first packet of streamed tool calls [↗](https://lmstudio.ai/blog/lmstudio-v0.3.16#:~:text=%2A%20Bugfix%3A%20%5BOpenAI,packet%20of%20streamed%20function%20calls).
---
###### [👾 LM Studio 0.3.15](/blog/lmstudio-v0.3.15) • 2025-04-24
### Improved Tool Use API Support
OpenAI-like REST API now supports the `tool_choice` parameter:
```json
{
"tool_choice": "auto" // or "none", "required"
}
```
- `"tool_choice": "none"` ā Model will not call tools
- `"tool_choice": "auto"` ā Model decides
- `"tool_choice": "required"` ā Model must call tools (llama.cpp only)
Chunked responses now set `"finish_reason": "tool_calls"` when appropriate.
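For illustration, here is a hedged sketch of passing `tool_choice` through the OpenAI-compatible endpoint using the `openai` Python package; the tool definition and model name are made-up examples, not part of LM Studio itself.
```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool for demonstration
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
    tool_choice="auto",  # or "none" / "required"
)
print(response.choices[0].message.tool_calls)
```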
---
###### [👾 LM Studio 0.3.14](/blog/lmstudio-v0.3.14) • 2025-03-27
### [API/SDK] Preset Support
RESTful API and SDKs support specifying presets in requests.
_(example needed)_
###### [👾 LM Studio 0.3.10](/blog/lmstudio-v0.3.10) • 2025-02-18
### Speculative Decoding API
Enable speculative decoding in API requests with `"draft_model"`:
```json
{
"model": "deepseek-r1-distill-qwen-7b",
"draft_model": "deepseek-r1-distill-qwen-0.5b",
"messages": [ ... ]
}
```
Responses now include a `stats` object for speculative decoding:
```json
"stats": {
"tokens_per_second": ...,
"draft_model": "...",
"total_draft_tokens_count": ...,
"accepted_draft_tokens_count": ...,
"rejected_draft_tokens_count": ...,
"ignored_draft_tokens_count": ...
}
```
---
###### [👾 LM Studio 0.3.9](blog/lmstudio-v0.3.9) • 2025-01-30
### Idle TTL and Auto Evict
Set a TTL (in seconds) for models loaded via API requests (docs article: [Idle TTL and Auto-Evict](/docs/developer/core/ttl-and-auto-evict))
```diff
curl http://localhost:1234/api/v0/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-r1-distill-qwen-7b",
+   "ttl": 300,
    "messages": [ ... ]
  }'
```
With `lms`:
```
lms load <model> --ttl <seconds>
```
### Separate `reasoning_content` in Chat Completion responses
For DeepSeek R1 models, get reasoning content in a separate field. See more [here](/blog/lmstudio-v0.3.9#separate-reasoningcontent-in-chat-completion-responses).
Turn this on in App Settings > Developer.
---
###### [👾 LM Studio 0.3.6](blog/lmstudio-v0.3.6) • 2025-01-06
### Tool and Function Calling API
Use any LLM that supports Tool Use and Function Calling through the OpenAI-like API.
Docs: [Tool Use and Function Calling](/docs/developer/core/tools).
---
###### [👾 LM Studio 0.3.5](blog/lmstudio-v0.3.5) • 2024-10-22
### Introducing `lms get`: download models from the terminal
You can now download models directly from the terminal using a keyword
```bash
lms get deepseek-r1
```
or a full Hugging Face URL
```bash
lms get <hugging face url>
```
To filter for MLX models only, add `--mlx` to the command.
```bash
lms get deepseek-r1 --mlx
```
## core
### Authentication
> Using API Tokens in LM Studio
##### Requires [LM Studio 0.4.0](/download) or newer.
LM Studio supports API Tokens for authentication, providing a secure and convenient way to access the LM Studio API.
### Require Authentication for each request
By default, LM Studio does not require authentication for API requests. To enable authentication so that only requests with a valid API Token are accepted, toggle the switch in the Developers Page > Server Settings.
```lms_info
Once enabled, all requests made through the REST API, Python SDK, or TypeScript SDK will need to include a valid API Token. See usage [below](#api-token-usage).
```
### Creating API Tokens
To create API Tokens, click on Manage Tokens in the Server Settings. It will open the API Tokens modal where you can create, view, and delete API Tokens.
Create a token by clicking on the Create Token button. Provide a name for the token and select the desired permissions.
Once created, make sure to copy the token as it will not be shown again.
### Configuring API Token Permissions
To edit the permissions of an existing API Token, click on the Edit button next to the token in the API Tokens modal. You can modify the name and permissions of the token.
## API Token Usage
### Using API Tokens with REST API:
```lms_noticechill
The example below requires the [allow calling servers from mcp.json](/docs/developer/core/server/settings) setting to be enabled and the [tiktoken MCP](https://gitmcp.io/openai/tiktoken) to be present in mcp.json.
```
```bash
curl -X POST \
  http://localhost:1234/api/v1/chat \
  -H "Authorization: Bearer $LM_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ibm/granite-4-micro",
    "input": "What is the first line in the tiktoken documentation?",
    "integrations": [
      {
        "type": "plugin",
        "id": "mcp/tiktoken",
        "allowed_tools": [
          "fetch_tiktoken_documentation"
        ]
      }
    ]
  }'
```
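The same request can also be made from plain Python without an SDK. This is a minimal sketch that mirrors the curl example above (minus the MCP integration) and assumes the token is available in the `LM_API_TOKEN` environment variable.
```python
import os
import requests

resp = requests.post(
    "http://localhost:1234/api/v1/chat",
    headers={"Authorization": f"Bearer {os.environ['LM_API_TOKEN']}"},
    json={"model": "ibm/granite-4-micro", "input": "Hello from an authenticated request."},
    timeout=120,
)
resp.raise_for_status()
print(resp.json())
```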
### Using API Tokens with Python SDK
To use API tokens with the Python SDK, see the [Python SDK guide](/docs/python/getting-started/authentication).
### Using API Tokens with TypeScript SDK
To use API tokens with the TypeScript SDK, see the [TS SDK guide](/docs/typescript/authentication).
### Run LM Studio as a service (headless)
> GUI-less operation of LM Studio: run in the background, start on machine login, and load models on demand
LM Studio can be run as a service without the GUI. This is useful for running LM Studio on a server or in the background on your local machine. This works on Mac, Windows, and Linux machines with a graphical user interface.
## Run LM Studio as a service
Running LM Studio as a service consists of several new features intended to make it more efficient to use LM Studio as a developer tool.
1. The ability to run LM Studio without the GUI
2. The ability to start the LM Studio LLM server on machine login, headlessly
3. On-demand model loading
## Run the LLM service on machine login
To enable this, head to app settings (`Cmd` / `Ctrl` + `,`) and check the box to run the LLM server on login.
When this setting is enabled, exiting the app will minimize it to the system tray, and the LLM server will continue to run in the background.
## Just-In-Time (JIT) model loading for REST endpoints
Useful when utilizing LM Studio as an LLM service with other frontends or applications.
#### When JIT loading is ON:
- Calls to OpenAI-compatible `/v1/models` will return all downloaded models, not only the ones loaded into memory
- Calls to inference endpoints will load the model into memory if it's not already loaded
#### When JIT loading is OFF:
- Calls to OpenAI-compatible `/v1/models` will return only the models loaded into memory
- You have to first load the model into memory before being able to use it
#### What about auto unloading?
JIT loaded models will be auto-unloaded from memory by default after a set period of inactivity ([learn more](/docs/developer/core/ttl-and-auto-evict)).
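As an illustration of JIT loading behavior, a single request to an inference endpoint is enough to load a downloaded model on demand. A minimal sketch against the OpenAI-compatible chat completions endpoint (the model name is just an example of a model you have downloaded):
```bash
# With JIT loading ON, this loads the model into memory first if needed,
# then runs the completion.
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ibm/granite-4-micro",
    "messages": [{ "role": "user", "content": "Hello!" }]
  }'
```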
## Auto Server Start
Your last server state will be saved and restored on app or service launch.
To achieve this programmatically, you can use the following command:
```bash
lms server start
```
```lms_protip
If you haven't already, bootstrap `lms` on your machine by following the instructions [here](/docs/cli).
```
### Community
Chat with other LM Studio developers, discuss LLMs, hardware, and more on the [LM Studio Discord server](https://discord.gg/aPQfnNkxGC).
Please report bugs and issues in the [lmstudio-bug-tracker](https://github.com/lmstudio-ai/lmstudio-bug-tracker/issues) GitHub repository.
### Using MCP via API
> Learn how to use Model Context Protocol (MCP) servers with the LM Studio API.
##### Requires [LM Studio 0.4.0](/download) or newer.
LM Studio supports Model Context Protocol (MCP) usage via API. MCP allows models to interact with external tools and services through standardized servers.
## How it works
MCP servers provide tools that models can call during chat requests. You can enable MCP servers in two ways: as ephemeral servers defined per-request, or as pre-configured servers in your `mcp.json` file.
## Ephemeral vs mcp.json servers
| Feature | Ephemeral | mcp.json |
| --- | --- | --- |
| How to specify in request | `integrations` -> `"type": "ephemeral_mcp"` | `integrations` -> `"type": "plugin"` |
| Configuration | Only defined per-request | Pre-configured in `mcp.json` |
| Use case | One-off requests, remote MCP tool execution | MCP servers that require a command, frequently used servers |
| Server ID | Specified via `server_label` in integration | Specified via `id` (e.g., `mcp/playwright`) in integration |
| Custom headers | Supported via `headers` field | Configured in `mcp.json` |
## Ephemeral MCP servers
Ephemeral MCP servers are defined on-the-fly in each request. This is useful for testing or when you don't want to pre-configure servers.
```lms_info
Ephemeral MCP servers require the "Allow per-request MCPs" setting to be enabled in [Server Settings](/docs/developer/core/server/settings).
```
```lms_code_snippet
variants:
curl:
language: bash
code: |
curl http://localhost:1234/api/v1/chat \
-H "Authorization: Bearer $LM_API_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"model": "ibm/granite-4-micro",
"input": "What is the top trending model on hugging face?",
"integrations": [
{
"type": "ephemeral_mcp",
"server_label": "huggingface",
"server_url": "https://huggingface.co/mcp",
"allowed_tools": ["model_search"]
}
],
"context_length": 8000
}'
Python:
language: python
code: |
import os
import requests
import json
response = requests.post(
"http://localhost:1234/api/v1/chat",
headers={
"Authorization": f"Bearer {os.environ['LM_API_TOKEN']}",
"Content-Type": "application/json"
},
json={
"model": "ibm/granite-4-micro",
"input": "What is the top trending model on hugging face?",
"integrations": [
{
"type": "ephemeral_mcp",
"server_label": "huggingface",
"server_url": "https://huggingface.co/mcp",
"allowed_tools": ["model_search"]
}
],
"context_length": 8000
}
)
print(json.dumps(response.json(), indent=2))
TypeScript:
language: typescript
code: |
const response = await fetch("http://localhost:1234/api/v1/chat", {
method: "POST",
headers: {
"Authorization": `Bearer ${process.env.LM_API_TOKEN}`,
"Content-Type": "application/json"
},
body: JSON.stringify({
"model": "ibm/granite-4-micro",
"input": "What is the top trending model on hugging face?",
"integrations": [
{
"type": "ephemeral_mcp",
"server_label": "huggingface",
"server_url": "https://huggingface.co/mcp",
"allowed_tools": ["model_search"]
}
],
"context_length": 8000
});
const data = await response.json();
console.log(data);
```
The model can now call tools from the specified MCP server:
```lms_code_snippet
variants:
response:
language: json
code: |
{
"model_instance_id": "ibm/granite-4-micro",
"output": [
{
"type": "reasoning",
"content": "..."
},
{
"type": "message",
"content": "..."
},
{
"type": "tool_call",
"tool": "model_search",
"arguments": {
"sort": "trendingScore",
"limit": 1
},
"output": "...",
"provider_info": {
"server_label": "huggingface",
"type": "ephemeral_mcp"
}
},
{
"type": "reasoning",
"content": "\n"
},
{
"type": "message",
"content": "The top trending model is ..."
}
],
"stats": {
"input_tokens": 419,
"total_output_tokens": 362,
"reasoning_output_tokens": 195,
"tokens_per_second": 27.620159487314744,
"time_to_first_token_seconds": 1.437
},
"response_id": "resp_7c1a08e3d6e279efcfecb02df9de7cbd316e93422d0bb5cb"
}
```
## MCP servers from mcp.json
MCP servers can be pre-configured in your `mcp.json` file. This is the recommended approach for using MCP servers that take actions on your computer (like [microsoft/playwright-mcp](https://github.com/microsoft/playwright-mcp)) and servers that you use frequently.
```lms_info
MCP servers from mcp.json require the "Allow calling servers from mcp.json" setting to be enabled in [Server Settings](/docs/developer/core/server/settings).
```
```lms_code_snippet
variants:
curl:
language: bash
code: |
curl http://localhost:1234/api/v1/chat \
-H "Authorization: Bearer $LM_API_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"model": "ibm/granite-4-micro",
"input": "Open lmstudio.ai",
"integrations": ["mcp/playwright"],
"context_length": 8000,
"temperature": 0
}'
Python:
language: python
code: |
import os
import requests
import json
response = requests.post(
"http://localhost:1234/api/v1/chat",
headers={
"Authorization": f"Bearer {os.environ['LM_API_TOKEN']}",
"Content-Type": "application/json"
},
json={
"model": "ibm/granite-4-micro",
"input": "Open lmstudio.ai",
"integrations": ["mcp/playwright"],
"context_length": 8000,
"temperature": 0
}
)
print(json.dumps(response.json(), indent=2))
TypeScript:
language: typescript
code: |
const response = await fetch("http://localhost:1234/api/v1/chat", {
method: "POST",
headers: {
"Authorization": `Bearer ${process.env.LM_API_TOKEN}`,
"Content-Type": "application/json"
},
body: JSON.stringify({
model: "ibm/granite-4-micro",
input: "Open lmstudio.ai",
integrations: ["mcp/playwright"],
context_length: 8000,
temperature: 0
})
});
const data = await response.json();
console.log(data);
```
The response includes tool calls from the configured MCP server:
```lms_code_snippet
variants:
response:
language: json
code: |
{
"model_instance_id": "ibm/granite-4-micro",
"output": [
{
"type": "reasoning",
"content": "..."
},
{
"type": "message",
"content": "..."
},
{
"type": "tool_call",
"tool": "browser_navigate",
"arguments": {
"url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
},
"output": "...",
"provider_info": {
"plugin_id": "mcp/playwright",
"type": "plugin"
}
},
{
"type": "reasoning",
"content": "..."
},
{
"type": "message",
"content": "The YouTube video page for ..."
}
],
"stats": {
"input_tokens": 2614,
"total_output_tokens": 594,
"reasoning_output_tokens": 389,
"tokens_per_second": 26.293245822877495,
"time_to_first_token_seconds": 0.154
},
"response_id": "resp_cdac6a9b5e2a40027112e441ce6189db18c9040f96736407"
}
```
## Restricting tool access
For both ephemeral and mcp.json servers, you can limit which tools the model can call using the `allowed_tools` field. This is useful if you do not want certain tools from an MCP server to be used, and can speed up prompt processing due to the model receiving fewer tool definitions.
```lms_code_snippet
variants:
curl:
language: bash
code: |
curl http://localhost:1234/api/v1/chat \
-H "Authorization: Bearer $LM_API_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"model": "ibm/granite-4-micro",
"input": "What is the top trending model on hugging face?",
"integrations": [
{
"type": "ephemeral_mcp",
"server_label": "huggingface",
"server_url": "https://huggingface.co/mcp",
"allowed_tools": ["model_search"]
}
],
"context_length": 8000
}'
Python:
language: python
code: |
import os
import requests
import json
response = requests.post(
"http://localhost:1234/api/v1/chat",
headers={
"Authorization": f"Bearer {os.environ['LM_API_TOKEN']}",
"Content-Type": "application/json"
},
json={
"model": "ibm/granite-4-micro",
"input": "What is the top trending model on hugging face?",
"integrations": [
{
"type": "ephemeral_mcp",
"server_label": "huggingface",
"server_url": "https://huggingface.co/mcp",
"allowed_tools": ["model_search"]
}
],
"context_length": 8000
}
)
print(json.dumps(response.json(), indent=2))
TypeScript:
language: typescript
code: |
const response = await fetch("http://localhost:1234/api/v1/chat", {
method: "POST",
headers: {
"Authorization": `Bearer ${process.env.LM_API_TOKEN}`,
"Content-Type": "application/json"
},
body: JSON.stringify({
model: "ibm/granite-4-micro",
input: "What is the top trending model on hugging face?",
integrations: [
{
type: "ephemeral_mcp",
server_label: "huggingface",
server_url: "https://huggingface.co/mcp",
allowed_tools: ["model_search"]
}
],
context_length: 8000
})
});
const data = await response.json();
console.log(data);
```
If `allowed_tools` is not provided, all tools from the server are available to the model.
## Custom headers for ephemeral servers
When using ephemeral MCP servers that require authentication, you can pass custom headers:
```lms_code_snippet
variants:
curl:
language: bash
code: |
curl http://localhost:1234/api/v1/chat \
-H "Authorization: Bearer $LM_API_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"model": "ibm/granite-4-micro",
"input": "Give me details about my SUPER-SECRET-PRIVATE Hugging face model",
"integrations": [
{
"type": "ephemeral_mcp",
"server_label": "huggingface",
"server_url": "https://huggingface.co/mcp",
"allowed_tools": ["model_search"],
"headers": {
"Authorization": "Bearer "
}
}
],
"context_length": 8000
}'
Python:
language: python
code: |
import os
import requests
import json
response = requests.post(
"http://localhost:1234/api/v1/chat",
headers={
"Authorization": f"Bearer {os.environ['LM_API_TOKEN']}",
"Content-Type": "application/json"
},
json={
"model": "ibm/granite-4-micro",
"input": "Give me details about my SUPER-SECRET-PRIVATE Hugging face model",
"integrations": [
{
"type": "ephemeral_mcp",
"server_label": "huggingface",
"server_url": "https://huggingface.co/mcp",
"allowed_tools": ["model_search"],
"headers": {
"Authorization": "Bearer "
}
}
],
"context_length": 8000
}
)
print(json.dumps(response.json(), indent=2))
TypeScript:
language: typescript
code: |
const response = await fetch("http://localhost:1234/api/v1/chat", {
method: "POST",
headers: {
"Authorization": `Bearer ${process.env.LM_API_TOKEN}`,
"Content-Type": "application/json"
},
body: JSON.stringify({
model: "ibm/granite-4-micro",
input: "Give me details about my SUPER-SECRET-PRIVATE Hugging face model",
integrations: [
{
type: "ephemeral_mcp",
server_label: "huggingface",
server_url: "https://huggingface.co/mcp",
allowed_tools: ["model_search"],
headers: {
Authorization: "Bearer "
}
}
],
context_length: 8000
})
});
const data = await response.json();
console.log(data);
```
### Idle TTL and Auto-Evict
> Optionally auto-unload idle models after a certain amount of time (TTL)
## Background
- `JIT loading` makes it easy to use your LM Studio models in other apps: you don't need to manually load the model first before being able to use it. However, this also means that models can stay loaded in memory even when they're not being used. `[Default: enabled]`
- (New) `Idle TTL` (technically: Time-To-Live) defines how long a model can stay loaded in memory without receiving any requests. When the TTL expires, the model is automatically unloaded from memory. You can set a TTL using the `ttl` field in your request payload. `[Default: 60 minutes]`
- (New) `Auto-Evict` is a feature that unloads previously JIT loaded models before loading new ones. This enables easy switching between models from client apps without having to manually unload them first. You can enable or disable this feature in Developer tab > Server Settings. `[Default: enabled]`
## Idle TTL
**Use case**: imagine you're using an app like [Zed](https://github.com/zed-industries/zed/blob/main/crates/lmstudio/src/lmstudio.rs#L340), [Cline](https://github.com/cline/cline/blob/main/src/api/providers/lmstudio.ts), or [Continue.dev](https://docs.continue.dev/customize/model-providers/more/lmstudio) to interact with LLMs served by LM Studio. These apps leverage JIT to load models on-demand the first time you use them.
**Problem**: When you're not actively using a model, you might not want it to remain loaded in memory.
**Solution**: Set a TTL for models loaded via API requests. The idle timer resets every time the model receives a request, so it won't disappear while you use it. A model is considered idle if it's not doing any work. When the idle TTL expires, the model is automatically unloaded from memory.
### Set App-default Idle TTL
By default, JIT-loaded models have a TTL of 60 minutes. You can configure the default TTL value that applies to any model loaded via JIT in the app settings.
### Set per-model TTL in API requests
When JIT loading is enabled, the **first request** to a model will load it into memory. You can specify a TTL for that model in the request payload.
This works for requests targeting both the [OpenAI compatibility API](/docs/developer/openai-api) and [LM Studio's REST API](/docs/developer/rest):
```diff
curl http://localhost:1234/api/v0/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "deepseek-r1-distill-qwen-7b",
+ "ttl": 300,
"messages": [ ... ]
}'
```
###### This will set a TTL of 5 minutes (300 seconds) for this model if it is JIT loaded.
### Set TTL for models loaded with `lms`
By default, models loaded with `lms load` do not have a TTL, and will remain loaded in memory until you manually unload them.
You can set a TTL for a model loaded with `lms` like so:
```bash
lms load <model> --ttl 3600
```
###### Load a `<model>` with a TTL of 1 hour (3600 seconds)
### Specify TTL when loading models in the server tab
You can also set a TTL when loading a model in the server tab.
## Configure Auto-Evict for JIT loaded models
With this setting, you can ensure new models loaded via JIT automatically unload previously loaded models first.
This is useful when you want to switch between models from another app without worrying about memory building up with unused models.
**When Auto-Evict is ON** (default):
- At most `1` model is kept loaded in memory at a time (when loaded via JIT)
- Non-JIT loaded models are not affected
**When Auto-Evict is OFF**:
- Switching models from an external app will keep previous models loaded in memory
- Models will remain loaded until either:
- Their TTL expires
- You manually unload them
This feature works in tandem with TTL to provide better memory management for your workflow.
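To check which models are currently loaded, and to confirm that Auto-Evict or a TTL has unloaded a model, you can list loaded models from the terminal. A small sketch using the `lms` CLI:
```bash
# List the models currently loaded in memory
lms ps
```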
### Nomenclature
`TTL` (Time-To-Live) is a term borrowed from networking protocols and cache systems. It defines how long a resource can remain allocated before it's considered stale and evicted.
### server
### LM Studio as a Local LLM API Server
> Run an LLM API server on `localhost` with LM Studio
You can serve local LLMs from LM Studio's Developer tab, either on `localhost` or on the network.
LM Studio's APIs can be used through the [REST API](/docs/developer/rest), client libraries like [lmstudio-js](/docs/typescript) and [lmstudio-python](/docs/python), and [OpenAI compatibility endpoints](/docs/developer/openai-compat).
### Running the server
To run the server, go to the Developer tab in LM Studio, and toggle the "Start server" switch to start the API server.
Alternatively, you can use `lms` ([LM Studio's CLI](/docs/cli)) to start the server from your terminal:
```bash
lms server start
```
### API options
- [LM Studio REST API](/docs/developer/rest)
- [TypeScript SDK](/docs/typescript) - `lmstudio-js`
- [Python SDK](/docs/python) - `lmstudio-python`
- [OpenAI compatibility endpoints](/docs/developer/openai-compat)
#### Server Settings
> Configure server settings for LM Studio API Server
You can configure server settings such as the port number, whether to allow other API clients to access the server, and which MCP features are available to API clients.
### Settings information
```lms_params
- name: Server Port
type: Integer
optional: false
description: Port number on which the LM Studio API server listens for incoming connections.
unstyledName: true
- name: Require Authentication
type: Switch
description: Require API clients to provide a valid API token via the `Authorization` header. Learn more in the [Authentication](/docs/developer/core/authentication) section.
unstyledName: true
- name: Serve on Local Network
type: Switch
description: Allow other devices on the same local network to access the API server. Learn more in the [Serve on Local Network](/docs/developer/core/server/serve-on-network) section.
unstyledName: true
- name: Allow per-request MCPs
type: Switch
description: Allow API clients to use MCP (Model Context Protocol) servers that are not in your mcp.json. These MCP connections are ephemeral, only existing as long as the request. At the moment, only remote MCPs are supported.
unstyledName: true
- name: Allow calling servers from mcp.json
type: Switch
description: Allow API clients to use servers you defined in your mcp.json in LM Studio. This can be a security risk if you've defined MCP servers that have access to your file system or private data. This option requires "Require Authentication" to be enabled.
unstyledName: true
- name: Enable CORS
type: Switch
description: Enable Cross-Origin Resource Sharing (CORS) to allow applications from different origins to access the API.
unstyledName: true
- name: Just in Time Model Loading
type: Switch
description: Load models dynamically at request time to save memory.
unstyledName: true
- name: Auto Unload Unused JIT Models
type: Switch
description: Automatically unload JIT-loaded models from memory when they are no longer in use.
unstyledName: true
- name: Only Keep Last JIT Loaded Model
type: Switch
description: Keep only the most recently used JIT-loaded model in memory to minimize RAM usage.
unstyledName: true
```
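Most of these settings are toggled in the UI, but the server port can also be set when starting the server from the terminal. A minimal sketch with `lms` (the port number is just an example):
```bash
# Start the API server on port 3000 instead of the default 1234
lms server start --port 3000
```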
#### Serve on Local Network
> Allow other devices on your network to use this LM Studio API server
Enabling the "Serve on Local Network" option allows the LM Studio API server running on your machine to be accessible by other devices connected to the same local network.
This is useful for scenarios where you want to:
- Use a local LLM on your other less powerful devices by connecting them to a more powerful machine running LM Studio.
- Let multiple people use a single LM Studio instance on the network.
- Use the API from IoT devices, edge computing units, or other services in your local setup.
Once enabled, the server will bind to your local network IP address instead of `localhost`. The API access URL updates accordingly, and you can use that address in your applications.
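For example, another device on the same network could list available models by replacing `localhost` with your machine's LAN IP (the address below is a placeholder):
```bash
# Replace 192.168.1.42 with the address shown in LM Studio after enabling the setting
curl http://192.168.1.42:1234/v1/models
```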
## rest
## LM Studio API
> LM Studio's REST API for local inference and model management
LM Studio offers a powerful REST API with first-class support for local inference and model management. In addition to our native API, we provide full OpenAI compatibility mode ([learn more](/docs/developer/openai-compat)).
## What's new
Previously, there was a [v0 REST API](/docs/developer/rest/endpoints). That API has since been deprecated in favor of the v1 REST API.
The v1 REST API includes enhanced features such as:
- [MCP via API](/docs/developer/core/mcp)
- [Stateful chats](/docs/developer/rest/stateful-chats)
- [Authentication](/docs/developer/core/authentication) configuration with API tokens
- Model [download](/docs/developer/rest/download) and [load](/docs/developer/rest/load) endpoints
## Supported endpoints
The following endpoints are available in LM Studio's v1 REST API.
## Inference endpoint comparison
The table below compares the features of LM Studio's `/api/v1/chat` endpoint with the OpenAI-compatible `/v1/responses` and `/v1/chat/completions` endpoints.
| Feature | `/api/v1/chat` | `/v1/responses` | `/v1/chat/completions` |
| --- | --- | --- | --- |
| Stateful chat | ā | ā | ā |
| Remote MCPs | ā | ā | ā |
| MCPs you have in LM Studio | ā | ā | ā |
| Custom tools | ā | ā | ā |
| Model load streaming events | ā | ā | ā |
| Prompt processing streaming events | ā | ā | ā |
| Specify context length in the request | ā | ā | ā |
---
Please report bugs by opening an issue on [Github](https://github.com/lmstudio-ai/lmstudio-bug-tracker/issues).
### Get up and running with the LM Studio API
> Download a model and start a simple Chat session using the REST API
## Start the server
[Install](/download) and launch LM Studio.
Then ensure the server is running through the toggle at the top left of the Developer page, or through [lms](/docs/cli) in the terminal:
```bash
lms server start
```
By default, the server is available at `http://localhost:1234`.
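To quickly confirm the server is reachable, you can list available models through the OpenAI-compatible models endpoint:
```bash
curl http://localhost:1234/v1/models
```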
If you don't have a model downloaded yet, you can download one:
```bash
lms get ibm/granite-4-micro
```
## API Authentication
By default, the LM Studio API server does **not** require authentication. You can configure the server to require authentication by API token in the [server settings](/docs/developer/core/server/settings) for added security.
To authenticate API requests, generate an API token from the Developer page in LM Studio, and include it in the `Authorization` header of your requests as follows: `Authorization: Bearer $LM_API_TOKEN`. Read more about authentication [here](/docs/developer/core/authentication).
## Chat with a model
Use the chat endpoint to send a message to a model. By default, the model will be automatically loaded if it is not already.
The `/api/v1/chat` endpoint is stateful, which means you do not need to pass the full history in every request. Read more about it [here](/docs/developer/rest/stateful-chats).
```lms_code_snippet
variants:
curl:
language: bash
code: |
curl http://localhost:1234/api/v1/chat \
-H "Authorization: Bearer $LM_API_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"model": "ibm/granite-4-micro",
"input": "Write a short haiku about sunrise."
}'
Python:
language: python
code: |
import os
import requests
import json
response = requests.post(
"http://localhost:1234/api/v1/chat",
headers={
"Authorization": f"Bearer {os.environ['LM_API_TOKEN']}",
"Content-Type": "application/json"
},
json={
"model": "ibm/granite-4-micro",
"input": "Write a short haiku about sunrise."
}
)
print(json.dumps(response.json(), indent=2))
TypeScript:
language: typescript
code: |
const response = await fetch("http://localhost:1234/api/v1/chat", {
method: "POST",
headers: {
"Authorization": `Bearer ${process.env.LM_API_TOKEN}`,
"Content-Type": "application/json"
},
body: JSON.stringify({
model: "ibm/granite-4-micro",
input: "Write a short haiku about sunrise."
})
});
const data = await response.json();
console.log(data);
```
See the full [chat](/docs/developer/rest/chat) docs for more details.
## Use MCP servers via API
Enable the model to interact with ephemeral Model Context Protocol (MCP) servers in `/api/v1/chat` by specifying servers in the `integrations` field.
```lms_code_snippet
variants:
curl:
language: bash
code: |
curl http://localhost:1234/api/v1/chat \
-H "Authorization: Bearer $LM_API_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"model": "ibm/granite-4-micro",
"input": "What is the top trending model on hugging face?",
"integrations": [
{
"type": "ephemeral_mcp",
"server_label": "huggingface",
"server_url": "https://huggingface.co/mcp",
"allowed_tools": ["model_search"]
}
],
"context_length": 8000
}'
Python:
language: python
code: |
import os
import requests
import json
response = requests.post(
"http://localhost:1234/api/v1/chat",
headers={
"Authorization": f"Bearer {os.environ['LM_API_TOKEN']}",
"Content-Type": "application/json"
},
json={
"model": "ibm/granite-4-micro",
"input": "What is the top trending model on hugging face?",
"integrations": [
{
"type": "ephemeral_mcp",
"server_label": "huggingface",
"server_url": "https://huggingface.co/mcp",
"allowed_tools": ["model_search"]
}
],
"context_length": 8000
}
)
print(json.dumps(response.json(), indent=2))
TypeScript:
language: typescript
code: |
const response = await fetch("http://localhost:1234/api/v1/chat", {
method: "POST",
headers: {
"Authorization": `Bearer ${process.env.LM_API_TOKEN}`,
"Content-Type": "application/json"
},
body: JSON.stringify({
model: "ibm/granite-4-micro",
input: "What is the top trending model on hugging face?",
integrations: [
{
type: "ephemeral_mcp",
server_label: "huggingface",
server_url: "https://huggingface.co/mcp",
allowed_tools: ["model_search"]
}
],
context_length: 8000
})
});
const data = await response.json();
console.log(data);
```
You can also use locally configured MCP plugins (from your `mcp.json`) via the `integrations` field. Using locally run MCP plugins requires authentication via an API token passed through the `Authorization` header. Read more about authentication [here](/docs/developer/core/authentication).
```lms_code_snippet
variants:
curl:
language: bash
code: |
curl http://localhost:1234/api/v1/chat \
-H "Authorization: Bearer $LM_API_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"model": "ibm/granite-4-micro",
"input": "Open lmstudio.ai",
"integrations": [
{
"type": "plugin",
"id": "mcp/playwright",
"allowed_tools": ["browser_navigate"]
}
],
"context_length": 8000
}'
Python:
language: python
code: |
import os
import requests
import json
response = requests.post(
"http://localhost:1234/api/v1/chat",
headers={
"Authorization": f"Bearer {os.environ['LM_API_TOKEN']}",
"Content-Type": "application/json"
},
json={
"model": "ibm/granite-4-micro",
"input": "Open lmstudio.ai",
"integrations": [
{
"type": "plugin",
"id": "mcp/playwright".
"allowed_tools": ["browser_navigate"]
}
],
"context_length": 8000
}
)
print(json.dumps(response.json(), indent=2))
TypeScript:
language: typescript
code: |
const response = await fetch("http://localhost:1234/api/v1/chat", {
method: "POST",
headers: {
"Authorization": `Bearer ${process.env.LM_API_TOKEN}`,
"Content-Type": "application/json"
},
body: JSON.stringify({
model: "ibm/granite-4-micro",
input: "Open lmstudio.ai",
integrations: [
{
type: "plugin",
id: "mcp/playwright",
allowed_tools: ["browser_navigate"]
}
],
context_length: 8000
})
});
const data = await response.json();
console.log(data);
```
See the full [chat](/docs/developer/rest/chat) docs for more details.
## Download a model
Use the download endpoint to download models by identifier from the [LM Studio model catalog](https://lmstudio.ai/models), or by Hugging Face model URL.
```lms_code_snippet
variants:
curl:
language: bash
code: |
curl http://localhost:1234/api/v1/models/download \
-H "Authorization: Bearer $LM_API_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"model": "ibm/granite-4-micro"
}'
Python:
language: python
code: |
import os
import requests
import json
response = requests.post(
"http://localhost:1234/api/v1/models/download",
headers={
"Authorization": f"Bearer {os.environ['LM_API_TOKEN']}",
"Content-Type": "application/json"
},
json={"model": "ibm/granite-4-micro"}
)
print(json.dumps(response.json(), indent=2))
TypeScript:
language: typescript
code: |
const response = await fetch("http://localhost:1234/api/v1/models/download", {
method: "POST",
headers: {
"Authorization": `Bearer ${process.env.LM_API_TOKEN}`,
"Content-Type": "application/json"
},
body: JSON.stringify({
model: "ibm/granite-4-micro"
})
});
const data = await response.json();
console.log(data);
```
The response will return a `job_id` that you can use to track download progress.
```lms_code_snippet
variants:
curl:
language: bash
code: |
curl -H "Authorization: Bearer $LM_API_TOKEN" \
http://localhost:1234/api/v1/models/download/status/{job_id}
Python:
language: python
code: |
import os
import requests
import json
job_id = "your-job-id"
response = requests.get(
f"http://localhost:1234/api/v1/models/download/status/{job_id}",
headers={"Authorization": f"Bearer {os.environ['LM_API_TOKEN']}"}
)
print(json.dumps(response.json(), indent=2))
TypeScript:
language: typescript
code: |
const jobId = "your-job-id";
const response = await fetch(
`http://localhost:1234/api/v1/models/download/status/${jobId}`,
{
headers: {
"Authorization": `Bearer ${process.env.LM_API_TOKEN}`
}
}
);
const data = await response.json();
console.log(data);
```
See the [download](/docs/developer/rest/download) and [download status](/docs/developer/rest/download-status) docs for more details.
### Stateful Chats
> Learn how to maintain conversation context across multiple requests
The `/api/v1/chat` endpoint is stateful by default. This means you don't need to pass the full conversation history in every request: LM Studio automatically stores and manages the context for you.
## How it works
When you send a chat request, LM Studio stores the conversation in a chat thread and returns a `response_id` in the response. Use this `response_id` in subsequent requests to continue the conversation.
```lms_code_snippet
title: Start a new conversation
variants:
curl:
language: bash
code: |
curl http://localhost:1234/api/v1/chat \
-H "Authorization: Bearer $LM_API_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"model": "ibm/granite-4-micro",
"input": "My favorite color is blue."
}'
```
The response includes a `response_id`:
```lms_info
Every response includes a unique `response_id` that you can use to reference that specific point in the conversation for future requests. This allows you to branch conversations.
```
```lms_code_snippet
title: Response
variants:
response:
language: json
code: |
{
"model_instance_id": "ibm/granite-4-micro",
"output": [
{
"type": "message",
"content": "That's great! Blue is a beautiful color..."
}
],
"response_id": "resp_abc123xyz..."
}
```
## Continue a conversation
Pass the `previous_response_id` in your next request to continue the conversation. The model will remember the previous context.
```lms_code_snippet
title: Continue the conversation
variants:
curl:
language: bash
code: |
curl http://localhost:1234/api/v1/chat \
-H "Authorization: Bearer $LM_API_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"model": "ibm/granite-4-micro",
"input": "What color did I just mention?",
"previous_response_id": "resp_abc123xyz..."
}'
```
The model can reference the previous message without you needing to resend it and will return a new `response_id` for further continuation.
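Because every response carries its own `response_id`, you can also branch the conversation by reusing an earlier `response_id` with a different follow-up. A sketch (the `response_id` value is a placeholder):
```bash
curl http://localhost:1234/api/v1/chat \
  -H "Authorization: Bearer $LM_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ibm/granite-4-micro",
    "input": "Suggest a car color other than my favorite.",
    "previous_response_id": "resp_abc123xyz..."
  }'
```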
## Disable stateful storage
If you don't want to store the conversation, set `store` to `false`. The response will not include a `response_id`.
```lms_code_snippet
title: Stateless chat
variants:
curl:
language: bash
code: |
curl http://localhost:1234/api/v1/chat \
-H "Authorization: Bearer $LM_API_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"model": "ibm/granite-4-micro",
"input": "Tell me a joke.",
"store": false
}'
```
This is useful for one-off requests where you don't need to maintain context.
### Streaming events
> When you chat with a model with `stream` set to `true`, the response is sent as a stream of events using Server-Sent Events (SSE).
Streaming events let you render chat responses incrementally over Server-Sent Events (SSE). When you call `POST /api/v1/chat` with `stream: true`, the server emits a series of named events that you can consume. These events arrive in order and may include multiple deltas (for reasoning and message content), tool call boundaries and payloads, and any errors encountered. The stream always begins with `chat.start` and concludes with `chat.end`, which contains the aggregated result equivalent to a non-streaming response.
List of event types that can be sent in an `/api/v1/chat` response stream:
- `chat.start`
- `model_load.start`
- `model_load.progress`
- `model_load.end`
- `prompt_processing.start`
- `prompt_processing.progress`
- `prompt_processing.end`
- `reasoning.start`
- `reasoning.delta`
- `reasoning.end`
- `tool_call.start`
- `tool_call.arguments`
- `tool_call.success`
- `tool_call.failure`
- `message.start`
- `message.delta`
- `message.end`
- `error`
- `chat.end`
Events will be streamed out in the following raw format:
```bash
event: <event type>
data: <event data as JSON>
```
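You can observe these events directly with `curl` by setting `stream` to `true` in an `/api/v1/chat` request; the `-N` flag disables output buffering so events print as they arrive:
```bash
curl -N http://localhost:1234/api/v1/chat \
  -H "Authorization: Bearer $LM_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ibm/granite-4-micro",
    "input": "Write a short haiku about sunrise.",
    "stream": true
  }'
```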
### `chat.start`
````lms_hstack
An event that is emitted at the start of a chat response stream.
```lms_params
- name: model_instance_id
type: string
description: Unique identifier for the loaded model instance that will generate the response.
- name: type
type: '"chat.start"'
description: The type of the event. Always `chat.start`.
```
:::split:::
```lms_code_snippet
title: Example Event Data
variants:
json:
language: json
code: |
{
"type": "chat.start",
"model_instance_id": "openai/gpt-oss-20b"
}
```
````
### `model_load.start`
````lms_hstack
Signals the start of a model being loaded to fulfill the chat request. Will not be emitted if the requested model is already loaded.
```lms_params
- name: model_instance_id
type: string
description: Unique identifier for the model instance being loaded.
- name: type
type: '"model_load.start"'
description: The type of the event. Always `model_load.start`.
```
:::split:::
```lms_code_snippet
title: Example Event Data
variants:
json:
language: json
code: |
{
"type": "model_load.start",
"model_instance_id": "openai/gpt-oss-20b"
}
```
````
### `model_load.progress`
````lms_hstack
Progress of the model load.
```lms_params
- name: model_instance_id
type: string
description: Unique identifier for the model instance being loaded.
- name: progress
type: number
description: Progress of the model load as a float between `0` and `1`.
- name: type
type: '"model_load.progress"'
description: The type of the event. Always `model_load.progress`.
```
:::split:::
```lms_code_snippet
title: Example Event Data
variants:
json:
language: json
code: |
{
"type": "model_load.progress",
"model_instance_id": "openai/gpt-oss-20b",
"progress": 0.65
}
```
````
### `model_load.end`
````lms_hstack
Signals a successfully completed model load.
```lms_params
- name: model_instance_id
type: string
description: Unique identifier for the model instance that was loaded.
- name: load_time_seconds
type: number
description: Time taken to load the model in seconds.
- name: type
type: '"model_load.end"'
description: The type of the event. Always `model_load.end`.
```
:::split:::
```lms_code_snippet
title: Example Event Data
variants:
json:
language: json
code: |
{
"type": "model_load.end",
"model_instance_id": "openai/gpt-oss-20b",
"load_time_seconds": 12.34
}
```
````
### `prompt_processing.start`
````lms_hstack
Signals the start of the model processing a prompt.
```lms_params
- name: type
type: '"prompt_processing.start"'
description: The type of the event. Always `prompt_processing.start`.
```
:::split:::
```lms_code_snippet
title: Example Event Data
variants:
json:
language: json
code: |
{
"type": "prompt_processing.start"
}
```
````
### `prompt_processing.progress`
````lms_hstack
Progress of the model processing a prompt.
```lms_params
- name: progress
type: number
description: Progress of the prompt processing as a float between `0` and `1`.
- name: type
type: '"prompt_processing.progress"'
description: The type of the event. Always `prompt_processing.progress`.
```
:::split:::
```lms_code_snippet
title: Example Event Data
variants:
json:
language: json
code: |
{
"type": "prompt_processing.progress",
"progress": 0.5
}
```
````
### `prompt_processing.end`
````lms_hstack
Signals the end of the model processing a prompt.
```lms_params
- name: type
type: '"prompt_processing.end"'
description: The type of the event. Always `prompt_processing.end`.
```
:::split:::
```lms_code_snippet
title: Example Event Data
variants:
json:
language: json
code: |
{
"type": "prompt_processing.end"
}
```
````
### `reasoning.start`
````lms_hstack
Signals the model is starting to stream reasoning content.
```lms_params
- name: type
type: '"reasoning.start"'
description: The type of the event. Always `reasoning.start`.
```
:::split:::
```lms_code_snippet
title: Example Event Data
variants:
json:
language: json
code: |
{
"type": "reasoning.start"
}
```
````
### `reasoning.delta`
````lms_hstack
A chunk of reasoning content. Multiple deltas may arrive.
```lms_params
- name: content
type: string
description: Reasoning text fragment.
- name: type
type: '"reasoning.delta"'
description: The type of the event. Always `reasoning.delta`.
```
:::split:::
```lms_code_snippet
title: Example Event Data
variants:
json:
language: json
code: |
{
"type": "reasoning.delta",
"content": "Need to"
}
```
````
### `reasoning.end`
````lms_hstack
Signals the end of the reasoning stream.
```lms_params
- name: type
type: '"reasoning.end"'
description: The type of the event. Always `reasoning.end`.
```
:::split:::
```lms_code_snippet
title: Example Event Data
variants:
json:
language: json
code: |
{
"type": "reasoning.end"
}
```
````
### `tool_call.start`
````lms_hstack
Emitted when the model starts a tool call.
```lms_params
- name: tool
type: string
description: Name of the tool being called.
- name: provider_info
type: object
description: Information about the tool provider. Discriminated union over the possible provider types.
children:
- name: Plugin provider info
type: object
description: Present when the tool is provided by a plugin.
children:
- name: type
type: '"plugin"'
description: Provider type.
- name: plugin_id
type: string
description: Identifier of the plugin.
- name: Ephemeral MCP provider info
type: object
description: Present when the tool is provided by an ephemeral MCP server.
children:
- name: type
type: '"ephemeral_mcp"'
description: Provider type.
- name: server_label
type: string
description: Label of the MCP server.
- name: type
type: '"tool_call.start"'
description: The type of the event. Always `tool_call.start`.
```
:::split:::
```lms_code_snippet
title: Example Event Data
variants:
json:
language: json
code: |
{
"type": "tool_call.start",
"tool": "model_search",
"provider_info": {
"type": "ephemeral_mcp",
"server_label": "huggingface"
}
}
```
````
### `tool_call.arguments`
````lms_hstack
Arguments streamed for the current tool call.
```lms_params
- name: tool
type: string
description: Name of the tool being called.
- name: arguments
type: object
description: Arguments passed to the tool. Can have any keys/values depending on the tool definition.
- name: provider_info
type: object
description: Information about the tool provider. Discriminated union over the possible provider types.
children:
- name: Plugin provider info
type: object
description: Present when the tool is provided by a plugin.
children:
- name: type
type: '"plugin"'
description: Provider type.
- name: plugin_id
type: string
description: Identifier of the plugin.
- name: Ephemeral MCP provider info
type: object
description: Present when the tool is provided by an ephemeral MCP server.
children:
- name: type
type: '"ephemeral_mcp"'
description: Provider type.
- name: server_label
type: string
description: Label of the MCP server.
- name: type
type: '"tool_call.arguments"'
description: The type of the event. Always `tool_call.arguments`.
```
:::split:::
```lms_code_snippet
title: Example Event Data
variants:
json:
language: json
code: |
{
"type": "tool_call.arguments",
"tool": "model_search",
"arguments": {
"sort": "trendingScore",
"limit": 1
},
"provider_info": {
"type": "ephemeral_mcp",
"server_label": "huggingface"
}
}
```
````
### `tool_call.success`
````lms_hstack
Result of the tool call, along with the arguments used.
```lms_params
- name: tool
type: string
description: Name of the tool that was called.
- name: arguments
type: object
description: Arguments that were passed to the tool.
- name: output
type: string
description: Raw tool output string.
- name: provider_info
type: object
description: Information about the tool provider. Discriminated union over the possible provider types.
children:
- name: Plugin provider info
type: object
description: Present when the tool is provided by a plugin.
children:
- name: type
type: '"plugin"'
description: Provider type.
- name: plugin_id
type: string
description: Identifier of the plugin.
- name: Ephemeral MCP provider info
type: object
description: Present when the tool is provided by an ephemeral MCP server.
children:
- name: type
type: '"ephemeral_mcp"'
description: Provider type.
- name: server_label
type: string
description: Label of the MCP server.
- name: type
type: '"tool_call.success"'
description: The type of the event. Always `tool_call.success`.
```
:::split:::
```lms_code_snippet
title: Example Event Data
variants:
json:
language: json
code: |
{
"type": "tool_call.success",
"tool": "model_search",
"arguments": {
"sort": "trendingScore",
"limit": 1
},
"output": "[{\"type\":\"text\",\"text\":\"Showing first 1 models...\"}]",
"provider_info": {
"type": "ephemeral_mcp",
"server_label": "huggingface"
}
}
```
````
### `tool_call.failure`
````lms_hstack
Indicates that the tool call failed.
```lms_params
- name: reason
type: string
description: Reason for the tool call failure.
- name: metadata
type: object
description: Metadata about the invalid tool call.
children:
- name: type
type: '"invalid_name" | "invalid_arguments"'
description: Type of error that occurred.
- name: tool_name
type: string
description: Name of the tool that was attempted to be called.
- name: arguments
type: object
optional: true
description: Arguments that were passed to the tool (only present for `invalid_arguments` errors).
- name: provider_info
type: object
optional: true
description: Information about the tool provider (only present for `invalid_arguments` errors).
children:
- name: type
type: '"plugin" | "ephemeral_mcp"'
description: Provider type.
- name: plugin_id
type: string
optional: true
description: Identifier of the plugin (when `type` is `"plugin"`).
- name: server_label
type: string
optional: true
description: Label of the MCP server (when `type` is `"ephemeral_mcp"`).
- name: type
type: '"tool_call.failure"'
description: The type of the event. Always `tool_call.failure`.
```
:::split:::
```lms_code_snippet
title: Example Event Data
variants:
json:
language: json
code: |
{
"type": "tool_call.failure",
"reason": "Cannot find tool with name open_browser.",
"metadata": {
"type": "invalid_name",
"tool_name": "open_browser"
}
}
```
````
### `message.start`
````lms_hstack
Signals the model is about to stream a message.
```lms_params
- name: type
type: '"message.start"'
description: The type of the event. Always `message.start`.
```
:::split:::
```lms_code_snippet
title: Example Event Data
variants:
json:
language: json
code: |
{
"type": "message.start"
}
```
````
### `message.delta`
````lms_hstack
A chunk of message content. Multiple deltas may arrive.
```lms_params
- name: content
type: string
description: Message text fragment.
- name: type
type: '"message.delta"'
description: The type of the event. Always `message.delta`.
```
:::split:::
```lms_code_snippet
title: Example Event Data
variants:
json:
language: json
code: |
{
"type": "message.delta",
"content": "The current"
}
```
````
### `message.end`
````lms_hstack
Signals the end of the message stream.
```lms_params
- name: type
type: '"message.end"'
description: The type of the event. Always `message.end`.
```
:::split:::
```lms_code_snippet
title: Example Event Data
variants:
json:
language: json
code: |
{
"type": "message.end"
}
```
````
### `error`
````lms_hstack
An error occurred during streaming. The final payload will still be sent in `chat.end` with whatever was generated.
```lms_params
- name: error
type: object
description: Error information.
children:
- name: type
type: '"invalid_request" | "unknown" | "mcp_connection_error" | "plugin_connection_error" | "not_implemented" | "model_not_found" | "job_not_found" | "internal_error"'
description: High-level error type.
- name: message
type: string
description: Human-readable error message.
- name: code
type: string
optional: true
description: More detailed error code (e.g., validation issue code).
- name: param
type: string
optional: true
description: Parameter associated with the error, if applicable.
- name: type
type: '"error"'
description: The type of the event. Always `error`.
```
:::split:::
```lms_code_snippet
title: Example Event Data
variants:
json:
language: json
code: |
{
"type": "error",
"error": {
"type": "invalid_request",
"message": "\"model\" is required",
"code": "missing_required_parameter",
"param": "model"
}
}
```
````
### `chat.end`
````lms_hstack
Final event containing the full aggregated response, equivalent to the non-streaming `POST /api/v1/chat` response body.
```lms_params
- name: result
type: object
description: Final response with `model_instance_id`, `output`, `stats`, and optional `response_id`. See [non-streaming chat docs](/docs/developer/rest/chat) for more details.
- name: type
type: '"chat.end"'
description: The type of the event. Always `chat.end`.
```
:::split:::
```lms_code_snippet
title: Example Event Data
variants:
json:
language: json
code: |
{
"type": "chat.end",
"result": {
"model_instance_id": "openai/gpt-oss-20b",
"output": [
{ "type": "reasoning", "content": "Need to call function." },
{
"type": "tool_call",
"tool": "model_search",
"arguments": { "sort": "trendingScore", "limit": 1 },
"output": "[{\"type\":\"text\",\"text\":\"Showing first 1 models...\"}]",
"provider_info": { "type": "ephemeral_mcp", "server_label": "huggingface" }
},
{ "type": "message", "content": "The current topātrending model is..." }
],
"stats": {
"input_tokens": 329,
"total_output_tokens": 268,
"reasoning_output_tokens": 5,
"tokens_per_second": 43.73,
"time_to_first_token_seconds": 0.781
},
"response_id": "resp_02b2017dbc06c12bfc353a2ed6c2b802f8cc682884bb5716"
}
}
```
````
### Chat with a model
> Send a message to a model and receive a response. Supports MCP integration.
````lms_hstack
`POST /api/v1/chat`
**Request body**
```lms_params
- name: model
type: string
optional: false
description: Unique identifier for the model to use.
- name: input
type: string | array