LM Studio REST API
Send a message to a model and receive a response. Supports MCP integration.
POST /api/v1/chat
Request body
model : string
Unique identifier for the model to use.
input : string | array<object>
Message to send to the model.
Input text : string
Text content of the message.
Input object : object
Object representing a message with additional metadata.
Text Input (optional) : object
Text input for user messages.
type : "message"
Type of input item.
content : string
Text content of the message.
Image Input (optional) : object
Image input for user messages.
type : "image"
Type of input item.
data_url : string
Image data as a base64-encoded data URL.
system_prompt (optional) : string
System message that sets model behavior or instructions.
integrations (optional) : array<string | object>
List of integrations (plugins, ephemeral MCP servers, etc.) to enable for this request.
Plugin id : string
Unique identifier of a plugin to use. Plugins include MCP servers installed via mcp.json (id mcp/<server_label>). Shorthand for a plugin object with no custom configuration.
Plugin : object
Specification of a plugin to use. Plugins include MCP servers installed via mcp.json (id mcp/<server_label>).
type : "plugin"
Type of integration.
id : string
Unique identifier of the plugin.
allowed_tools (optional) : array<string>
List of tool names the model can call from this plugin. If not provided, all tools from the plugin are allowed.
Ephemeral MCP server specification : object
Specification of an ephemeral MCP server. Allows defining MCP servers on-the-fly without needing to pre-configure them in your mcp.json.
type : "ephemeral_mcp"
Type of integration.
server_label : string
Label to identify the MCP server.
server_url : string
URL of the MCP server.
allowed_tools (optional) : array<string>
List of tool names the model can call from this server. If not provided, all tools from the server are allowed.
headers (optional) : object
Custom HTTP headers to send with requests to the server.
stream (optional) : boolean
Whether to stream partial outputs via SSE. Default false. See streaming events for more information.
temperature (optional) : number
Randomness in token selection. 0 is deterministic, higher values increase creativity [0,1].
top_p (optional) : number
Minimum cumulative probability for the possible next tokens [0,1].
top_k (optional) : integer
Limits next token selection to top-k most probable tokens.
min_p (optional) : number
Minimum base probability for a token to be selected for output [0,1].
repeat_penalty (optional) : number
Penalty for repeating token sequences. 1 is no penalty, higher values discourage repetition.
max_output_tokens (optional) : integer
Maximum number of tokens to generate.
reasoning (optional) : "off" | "low" | "medium" | "high" | "on"
Reasoning setting. Errors if the model in use does not support the requested reasoning setting. Defaults to the setting chosen automatically for the model.
context_length (optional) : integer
Number of tokens to consider as context. Higher values recommended for MCP usage.
store (optional) : boolean
Whether to store the chat. If true, the response will include a "response_id" field. Default true.
previous_response_id (optional) : string
Identifier of an existing response to append to. Must start with "resp_".
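Taken together, the fields above admit a few common payload shapes. A minimal sketch (the model id, base64 data, and response id are placeholders; field names follow the spec above):

```python
# Simplest form: "input" as a plain string.
simple = {
    "model": "ibm/granite-4-micro",
    "input": "Hello",
}

# "input" as an array of message/image objects, plus a system prompt.
multimodal = {
    "model": "ibm/granite-4-micro",
    "system_prompt": "You are a concise assistant.",
    "input": [
        {"type": "message", "content": "What is in this image?"},
        {"type": "image", "data_url": "data:image/png;base64,..."},  # placeholder data
    ],
}

# Continuing a stored chat: reference the previous response's id.
follow_up = {
    "model": "ibm/granite-4-micro",
    "input": "And in one sentence?",
    "previous_response_id": "resp_...",  # placeholder; returned when store is true
}
```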
curl http://localhost:1234/api/v1/chat \
-H "Authorization: Bearer $LM_API_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"model": "ibm/granite-4-micro",
"input": "Tell me the top trending model on hugging face and navigate to https://lmstudio.ai",
"integrations": [
{
"type": "ephemeral_mcp",
"server_label": "huggingface",
"server_url": "https://huggingface.co/mcp",
"allowed_tools": [
"model_search"
]
},
{
"type": "plugin",
"id": "mcp/playwright",
"allowed_tools": [
"browser_navigate"
]
}
],
"context_length": 8000,
"temperature": 0
}'
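The same request can be sent from Python using only the standard library. A sketch, assuming the default local server address and an LM_API_TOKEN environment variable as in the curl example above:

```python
import json
import os
import urllib.request

def chat(payload: dict, base_url: str = "http://localhost:1234") -> dict:
    """POST the payload to /api/v1/chat and return the parsed JSON response."""
    req = urllib.request.Request(
        f"{base_url}/api/v1/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ.get('LM_API_TOKEN', '')}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = {
    "model": "ibm/granite-4-micro",
    "input": "Tell me the top trending model on hugging face",
    "integrations": [
        {
            "type": "ephemeral_mcp",
            "server_label": "huggingface",
            "server_url": "https://huggingface.co/mcp",
            "allowed_tools": ["model_search"],
        }
    ],
    "context_length": 8000,
    "temperature": 0,
}
# result = chat(payload)  # requires a running LM Studio server
```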
Response fields
model_instance_id : string
Unique identifier for the loaded model instance that generated the response.
output : array<object>
Array of output items generated. Each item can be one of four types.
Message : object
A text message from the model.
type : "message"
Type of output item.
content : string
Text content of the message.
Tool call : object
A tool call made by the model.
type : "tool_call"
Type of output item.
tool : string
Name of the tool called.
arguments : object
Arguments passed to the tool. Can have any keys/values depending on the tool definition.
output : string
Result returned from the tool.
provider_info : object
Information about the tool provider.
type : "plugin" | "ephemeral_mcp"
Provider type.
plugin_id (optional) : string
Identifier of the plugin (when type is "plugin").
server_label (optional) : string
Label of the MCP server (when type is "ephemeral_mcp").
Reasoning : object
Reasoning content from the model.
type : "reasoning"
Type of output item.
content : string
Text content of the reasoning.
Invalid tool call : object
An invalid tool call made by the model, due to an invalid tool name or invalid arguments.
type : "invalid_tool_call"
Type of output item.
reason : string
Reason why the tool call was invalid.
metadata : object
Metadata about the invalid tool call.
type : "invalid_name" | "invalid_arguments"
Type of error that occurred.
tool_name : string
Name of the tool that was attempted to be called.
arguments (optional) : object
Arguments that were passed to the tool (only present for invalid_arguments errors).
provider_info (optional) : object
Information about the tool provider (only present for invalid_arguments errors).
type : "plugin" | "ephemeral_mcp"
Provider type.
plugin_id (optional) : string
Identifier of the plugin (when type is "plugin").
server_label (optional) : string
Label of the MCP server (when type is "ephemeral_mcp").
stats : object
Token usage and performance metrics.
input_tokens : number
Number of input tokens. Includes formatting, tool definitions, and prior messages in the chat.
total_output_tokens : number
Total number of output tokens generated.
reasoning_output_tokens : number
Number of tokens used for reasoning.
tokens_per_second : number
Generation speed in tokens per second.
time_to_first_token_seconds : number
Time in seconds to generate the first token.
model_load_time_seconds (optional) : number
Time taken to load the model for this request in seconds. Present only if the model was not already loaded.
response_id (optional) : string
Identifier of the response for subsequent requests. Starts with "resp_". Present when store is true.
{
"model_instance_id": "ibm/granite-4-micro",
"output": [
{
"type": "tool_call",
"tool": "model_search",
"arguments": {
"sort": "trendingScore",
"query": "",
"limit": 1
},
"output": "...",
"provider_info": {
"server_label": "huggingface",
"type": "ephemeral_mcp"
}
},
{
"type": "message",
"content": "..."
},
{
"type": "tool_call",
"tool": "browser_navigate",
"arguments": {
"url": "https://lmstudio.ai"
},
"output": "...",
"provider_info": {
"plugin_id": "mcp/playwright",
"type": "plugin"
}
},
{
"type": "message",
"content": "**Top Trending Model on Hugging Face** ... Below is a quick snapshot of what’s on the landing page ... more details on the model or LM Studio itself!"
}
],
"stats": {
"input_tokens": 646,
"total_output_tokens": 586,
"reasoning_output_tokens": 0,
"tokens_per_second": 29.753900615398926,
"time_to_first_token_seconds": 1.088,
"model_load_time_seconds": 2.656
},
"response_id": "resp_4ef013eba0def1ed23f19dde72b67974c579113f544086de"
}
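Because output mixes message, tool_call, reasoning, and invalid_tool_call items, callers typically walk the array by type. A sketch using an abbreviated version of the example response above (the "..." strings are placeholders):

```python
def summarize_output(response: dict) -> tuple[list[str], str]:
    """Collect the tool names called and the final assistant message."""
    tools_called = []
    final_message = ""
    for item in response["output"]:
        if item["type"] == "tool_call":
            tools_called.append(item["tool"])
        elif item["type"] == "message":
            final_message = item["content"]  # the last message wins
        elif item["type"] == "invalid_tool_call":
            print("invalid tool call:", item["reason"])

    return tools_called, final_message

example = {
    "output": [
        {"type": "tool_call", "tool": "model_search", "arguments": {}, "output": "..."},
        {"type": "message", "content": "The top trending model is ..."},
    ],
}
tools, message = summarize_output(example)
# tools == ["model_search"]
```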