LM Studio REST API

List your models

Get a list of available models on your system, including both LLMs and embedding models.

GET /api/v1/models

This endpoint has no request parameters.

curl http://localhost:1234/api/v1/models \
  -H "Authorization: Bearer $LM_API_TOKEN"
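The same request can be made from Python with only the standard library. This is a minimal sketch mirroring the curl example above: it assumes the server is running on the default port 1234 and that `LM_API_TOKEN` holds your API token; error handling is omitted.

```python
import json
import os
import urllib.request

def build_models_request(base_url: str = "http://localhost:1234") -> urllib.request.Request:
    # Build a GET request for /api/v1/models with the bearer-token header,
    # matching the curl invocation above.
    token = os.environ.get("LM_API_TOKEN", "")
    return urllib.request.Request(
        f"{base_url}/api/v1/models",
        headers={"Authorization": f"Bearer {token}"},
    )

def list_models(base_url: str = "http://localhost:1234") -> dict:
    # Send the request and decode the JSON response body.
    with urllib.request.urlopen(build_models_request(base_url)) as resp:
        return json.loads(resp.read())
```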

Response fields

models : array

List of available models (both LLMs and embedding models).

type : "llm" | "embedding"

Type of model.

publisher : string

Model publisher name.

key : string

Unique identifier for the model.

display_name : string

Human-readable model name.

architecture (optional) : string | null

Model architecture (e.g., "llama", "mistral"). Absent for embedding models.

quantization : object | null

Quantization information for the model.

    name : string | null

    Quantization method name.

    bits_per_weight : number | null

    Bits per weight for the quantization.

size_bytes : number

Size of the model in bytes.

params_string : string | null

Human-readable parameter count (e.g., "7B", "13B").

loaded_instances : array

List of currently loaded instances of this model.

    id : string

    Unique identifier for the loaded model instance.

    config : object

    Configuration for the loaded instance.

        context_length : number

        The context length this instance was loaded with, in number of tokens.

        eval_batch_size (optional) : number

        Number of input tokens to process together in a single batch during evaluation. Absent for embedding models.

        flash_attention (optional) : boolean

        Whether Flash Attention is enabled for optimized attention computation. Absent for embedding models.

        num_experts (optional) : number

        Number of experts for MoE (Mixture of Experts) models. Absent for embedding models.

        offload_kv_cache_to_gpu (optional) : boolean

        Whether the KV cache is offloaded to GPU memory. Absent for embedding models.

max_context_length : number

Maximum context length supported by the model in number of tokens.

format : "gguf" | "mlx" | null

Model file format.

capabilities (optional) : object

Model capabilities. Absent for embedding models.

    vision : boolean

    Whether the model supports vision/image inputs.

    trained_for_tool_use : boolean

    Whether the model was trained for tool/function calling.

description (optional) : string | null

Model description. Absent for embedding models.
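For reference, the response shape described above can be sketched as Python `TypedDict`s. The class names here are illustrative, not part of the API; `total=False` marks objects whose fields may be absent (e.g. for embedding models).

```python
from typing import List, Literal, Optional, TypedDict

class Quantization(TypedDict):
    name: Optional[str]
    bits_per_weight: Optional[float]

class InstanceConfig(TypedDict, total=False):
    context_length: int            # always present
    eval_batch_size: int           # absent for embedding models
    flash_attention: bool          # absent for embedding models
    num_experts: int               # absent for embedding models
    offload_kv_cache_to_gpu: bool  # absent for embedding models

class LoadedInstance(TypedDict):
    id: str
    config: InstanceConfig

class Capabilities(TypedDict):
    vision: bool
    trained_for_tool_use: bool

class Model(TypedDict, total=False):
    type: Literal["llm", "embedding"]
    publisher: str
    key: str
    display_name: str
    architecture: Optional[str]       # absent for embedding models
    quantization: Optional[Quantization]
    size_bytes: int
    params_string: Optional[str]
    loaded_instances: List[LoadedInstance]
    max_context_length: int
    format: Optional[Literal["gguf", "mlx"]]
    capabilities: Capabilities        # absent for embedding models
    description: Optional[str]        # absent for embedding models

class ModelsResponse(TypedDict):
    models: List[Model]
```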

{
  "models": [
    {
      "type": "llm",
      "publisher": "lmstudio-community",
      "key": "gemma-3-270m-it-qat",
      "display_name": "Gemma 3 270m Instruct Qat",
      "architecture": "gemma3",
      "quantization": {
        "name": "Q4_0",
        "bits_per_weight": 4
      },
      "size_bytes": 241410208,
      "params_string": "270M",
      "loaded_instances": [
        {
          "id": "gemma-3-270m-it-qat",
          "config": {
            "context_length": 4096,
            "eval_batch_size": 512,
            "flash_attention": false,
            "num_experts": 0,
            "offload_kv_cache_to_gpu": true
          }
        }
      ],
      "max_context_length": 32768,
      "format": "gguf",
      "capabilities": {
        "vision": false,
        "trained_for_tool_use": false
      },
      "description": null
    },
    {
      "type": "embedding",
      "publisher": "gaianet",
      "key": "text-embedding-nomic-embed-text-v1.5-embedding",
      "display_name": "Nomic Embed Text v1.5",
      "quantization": {
        "name": "F16",
        "bits_per_weight": 16
      },
      "size_bytes": 274290560,
      "params_string": null,
      "loaded_instances": [],
      "max_context_length": 2048,
      "format": "gguf"
    }
  ]
}
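Once decoded, the response is plain JSON. A short sketch of working with it, using a trimmed copy of the sample above; note that fields marked absent for embedding models (such as `capabilities`) should be read with a default:

```python
import json

# A trimmed version of the sample response above.
response = json.loads("""
{
  "models": [
    {"type": "llm", "key": "gemma-3-270m-it-qat", "size_bytes": 241410208,
     "loaded_instances": [{"id": "gemma-3-270m-it-qat"}],
     "capabilities": {"vision": false, "trained_for_tool_use": false}},
    {"type": "embedding", "key": "text-embedding-nomic-embed-text-v1.5-embedding",
     "size_bytes": 274290560, "loaded_instances": []}
  ]
}
""")

# Keys of models that currently have at least one loaded instance.
loaded = [m["key"] for m in response["models"] if m["loaded_instances"]]

# Embedding models carry no "capabilities" field, so use .get() with a default.
tool_capable = [
    m["key"] for m in response["models"]
    if m.get("capabilities", {}).get("trained_for_tool_use")
]

# Model sizes in MiB, rounded to one decimal place.
sizes = {m["key"]: round(m["size_bytes"] / 1048576, 1) for m in response["models"]}
```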
