LM Studio REST API
Get a list of available models on your system, including both LLMs and embedding models.
GET /api/v1/models
This endpoint has no request parameters.
Example request

curl http://localhost:1234/api/v1/models \
  -H "Authorization: Bearer $LM_API_TOKEN"
Response fields
models : array
  List of available models (both LLMs and embedding models). Each entry contains:
  type : "llm" | "embedding"
    Type of model.
  publisher : string
    Model publisher name.
  key : string
    Unique identifier for the model.
  display_name : string
    Human-readable model name.
  architecture (optional) : string | null
    Model architecture (e.g., "llama", "mistral"). Absent for embedding models.
  quantization : object | null
    Quantization information for the model.
    name : string | null
      Quantization method name.
    bits_per_weight : number | null
      Bits per weight for the quantization.
  size_bytes : number
    Size of the model in bytes.
  params_string : string | null
    Human-readable parameter count (e.g., "7B", "13B").
  loaded_instances : array
    List of currently loaded instances of this model. Each instance contains:
    id : string
      Unique identifier for the loaded model instance.
    config : object
      Configuration for the loaded instance.
      context_length : number
        Context length configured for this loaded instance, in tokens.
      eval_batch_size (optional) : number
        Number of input tokens processed together in a single batch during evaluation. Absent for embedding models.
      flash_attention (optional) : boolean
        Whether Flash Attention is enabled for optimized attention computation. Absent for embedding models.
      num_experts (optional) : number
        Number of experts for MoE (Mixture of Experts) models. Absent for embedding models.
      offload_kv_cache_to_gpu (optional) : boolean
        Whether the KV cache is offloaded to GPU memory. Absent for embedding models.
  max_context_length : number
    Maximum context length supported by the model, in tokens.
  format : "gguf" | "mlx" | null
    Model file format.
  capabilities (optional) : object
    Model capabilities. Absent for embedding models.
    vision : boolean
      Whether the model supports vision/image inputs.
    trained_for_tool_use : boolean
      Whether the model was trained for tool/function calling.
  description (optional) : string | null
    Model description. Absent for embedding models.
Example response

{
  "models": [
    {
      "type": "llm",
      "publisher": "lmstudio-community",
      "key": "gemma-3-270m-it-qat",
      "display_name": "Gemma 3 270m Instruct Qat",
      "architecture": "gemma3",
      "quantization": {
        "name": "Q4_0",
        "bits_per_weight": 4
      },
      "size_bytes": 241410208,
      "params_string": "270M",
      "loaded_instances": [
        {
          "id": "gemma-3-270m-it-qat",
          "config": {
            "context_length": 4096,
            "eval_batch_size": 512,
            "flash_attention": false,
            "num_experts": 0,
            "offload_kv_cache_to_gpu": true
          }
        }
      ],
      "max_context_length": 32768,
      "format": "gguf",
      "capabilities": {
        "vision": false,
        "trained_for_tool_use": false
      },
      "description": null
    },
    {
      "type": "embedding",
      "publisher": "gaianet",
      "key": "text-embedding-nomic-embed-text-v1.5-embedding",
      "display_name": "Nomic Embed Text v1.5",
      "quantization": {
        "name": "F16",
        "bits_per_weight": 16
      },
      "size_bytes": 274290560,
      "params_string": null,
      "loaded_instances": [],
      "max_context_length": 2048,
      "format": "gguf"
    }
  ]
}
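As a rough sketch of how a client might consume this endpoint, the Python snippet below (standard library only; the `fetch_models` and `summarize` names are illustrative, not part of LM Studio) fetches the model list and reduces each entry to its key, type, and whether any instance is currently loaded. It assumes the server is reachable at the default http://localhost:1234.

```python
import json
import urllib.request


def summarize(payload):
    # Reduce a /api/v1/models payload to (key, type, is_loaded) tuples.
    # A model is considered loaded if it has at least one loaded instance.
    return [
        (m["key"], m["type"], len(m.get("loaded_instances", [])) > 0)
        for m in payload["models"]
    ]


def fetch_models(base_url="http://localhost:1234", token=""):
    # Fetch the model list; assumes an LM Studio server is running locally
    # and that the API token is supplied by the caller.
    req = urllib.request.Request(
        f"{base_url}/api/v1/models",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Applied to the example response above, `summarize` would report the Gemma model as loaded (one entry in `loaded_instances`) and the Nomic embedding model as not loaded (empty array).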