DeepFellow API Reference: Chat Completions

Create Chat Completion

Create a model response for the given chat conversation.

Parameter support can differ depending on the model used to generate the response, particularly for newer reasoning models. Parameters that are only supported for reasoning models are noted below.

POST
/v1/chat/completions
Authorization: Bearer <token>

In: header

Header Parameters

OpenAI-Organization? (Openai-Organization)
OpenAI-Project? (Openai-Project)
Request Body

messages (Messages)

A list of messages comprising the conversation so far. Depending on the model you use, different message types (modalities) are supported, such as text, images, and audio.

model (Model)

Model ID used to generate the response, like llama3 or deepseek-r1. DeepFellow supports a wide range of models with different capabilities and performance characteristics.
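
As a sketch, here is the same kind of request made from Python with the requests library; the base URL is a placeholder for your own DeepFellow deployment, and only parameters documented on this page are used.

# Minimal chat completion request (Python + requests).
# BASE_URL is a placeholder for your DeepFellow deployment.
import requests

BASE_URL = "https://your-deepfellow-host"

resp = requests.post(
    f"{BASE_URL}/v1/chat/completions",
    headers={"Authorization": "Bearer <token>"},
    json={
        "model": "llama3.1:8b",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Hello!"},
        ],
    },
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])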

tools? (Tools)

A list of tools the model may call. Currently, only functions are supported as a tool. Use this to provide a list of functions the model may generate JSON inputs for.

tool_choice? (Tool Choice)

Controls which (if any) tool is called by the model. null means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool.

none is the default when no tools are present. auto is the default if tools are present.

Default"auto"
temperature? (Temperature)

What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

Default: 0.7
top_p? (Top P)

An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.

Default: 1
n? (N)

How many chat completion choices to generate for each input message. Keep n as 1 to receive one choice per input message.

Default: 1
stream? (Stream)

If set to true, the model response data will be streamed to the client as it is generated, using server-sent events.

Default: false
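
A sketch of consuming the stream with Python and requests, assuming the usual server-sent-events framing of OpenAI-compatible endpoints (data: lines, incremental content in choices[0].delta, and a final data: [DONE] sentinel); the exact chunk shape may differ by model or deployment.

# Stream a completion and print tokens as they arrive.
# BASE_URL is a placeholder, as in the earlier sketch.
import json
import requests

BASE_URL = "https://your-deepfellow-host"

with requests.post(
    f"{BASE_URL}/v1/chat/completions",
    headers={"Authorization": "Bearer <token>"},
    json={
        "model": "llama3.1:8b",
        "messages": [{"role": "user", "content": "Hello!"}],
        "stream": True,
    },
    stream=True,
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines(decode_unicode=True):
        if not line or not line.startswith("data: "):
            continue
        data = line[len("data: "):]
        if data == "[DONE]":  # assumed end-of-stream sentinel
            break
        chunk = json.loads(data)
        delta = chunk["choices"][0].get("delta", {})
        print(delta.get("content") or "", end="", flush=True)
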
max_completion_tokens? (Max Completion Tokens)

An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.

max_tokens? (Max Tokens, deprecated)

The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

This value is now deprecated in favor of max_completion_tokens, and is not compatible with o-series models.

response_format? (ResponseFormat | null)

An object specifying the format that the model must output.

Setting to {"type": "json_schema", "json_schema": {...}} enables Structured Outputs which ensures the model will match your supplied JSON schema.

Setting to {"type": "json_object"} enables the older JSON mode, which ensures the message the model generates is valid JSON. Using json_schema is preferred for models that support it.

seed? (Seed)

If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
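
A sketch of how you might use seed together with system_fingerprint: send the same seeded request twice and treat a fingerprint change as a signal that the backend changed, meaning outputs may differ even with an identical seed. BASE_URL and the helper below are illustrative.

# Repeat a seeded request and compare system_fingerprint values.
import requests

BASE_URL = "https://your-deepfellow-host"

def seeded_completion(seed):
    resp = requests.post(
        f"{BASE_URL}/v1/chat/completions",
        headers={"Authorization": "Bearer <token>"},
        json={
            "model": "llama3.1:8b",
            "messages": [{"role": "user", "content": "Pick a random fruit."}],
            "seed": seed,
            "temperature": 0,
        },
    )
    resp.raise_for_status()
    return resp.json()

first = seeded_completion(42)
second = seeded_completion(42)
if first["system_fingerprint"] != second["system_fingerprint"]:
    print("Backend changed between requests; results may differ despite the seed.")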

stop? (Stop)

May not be supported with the latest reasoning models, such as o3 and o4-mini.

Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.

user? (User)

A stable identifier for your end-users. Used to boost cache hit rates by better bucketing similar requests.

safety_identifier? (Safety Identifier)

A stable identifier used to help detect users of your application who may be violating OpenAI's usage policies. The ID should be a string that uniquely identifies each user. We recommend hashing their username or email address in order to avoid sending us any identifying information.

prompt_cache_key? (Prompt Cache Key)

Used by OpenAI to cache responses for similar requests to optimize your cache hit rates. Replaces the user field.

Example Request

curl -X POST "https://loading/v1/chat/completions" \
  -H "OpenAI-Organization: 5eb7cf5a86d9755df3a6c593" \
  -H "OpenAI-Project: 5eb7cf5a86d9755df3a6c593" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "content": "You are a helpful assistant.",
        "role": "system"
      },
      {
        "content": "Hello!",
        "role": "user"
      }
    ],
    "model": "llama3.1:8b",
    "stream": false
  }'

Response Body
{
  "id": "chatcmpl-395",
  "object": "chat.completion",
  "created": 1747751038,
  "model": "llama3.1:8b",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "It's nice to meet you. Is there something I can help you with?",
        "role": "assistant"
      }
    }
  ],
  "usage": {
    "prompt_tokens": 23,
    "total_tokens": 47
  },
  "system_fingerprint": "fp_ollama"
}
Validation Error Response

{
  "detail": [
    {
      "loc": [
        "string"
      ],
      "msg": "string",
      "type": "string"
    }
  ]
}