Create Chat Completion
Create a model response for the given chat conversation.
Parameter support can differ depending on the model used to generate the response, particularly for newer reasoning models. Parameters that are only supported for reasoning models are noted below.
Header Parameters

OpenAI-Organization
In: header

OpenAI-Project
In: header

Body Parameters
messages
A list of messages comprising the conversation so far. Depending on the model you use, different message types (modalities) are supported, such as text, images, and audio.
model
Model ID used to generate the response, like llama3 or deepseek-r1. Deepfellow supports a wide range of models with different capabilities and performance characteristics.
tools
A list of tools the model may call. Currently, only functions are supported as a tool. Use this to provide a list of functions the model may generate JSON inputs for.
tool_choice
Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool.
none is the default when no tools are present; auto is the default if tools are present.
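As an illustration, the following request body defines one function tool and forces the model to call it. The get_weather function and its parameter schema are invented for this example:

{
  "model": "llama3.1:8b",
  "messages": [
    { "role": "user", "content": "What is the weather in Paris?" }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {
            "city": { "type": "string" }
          },
          "required": ["city"]
        }
      }
    }
  ],
  "tool_choice": { "type": "function", "function": { "name": "get_weather" } }
}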
"auto"What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
top_p
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
Default: 1
n
How many chat completion choices to generate for each input message. Keep n as 1 to receive one choice per input message.
Default: 1
stream
If set to true, the model response data will be streamed to the client as it is generated, using server-sent events.
Default: false
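With stream set to true, the response arrives as a series of server-sent events, each carrying a chat.completion.chunk object with an incremental delta, followed by a terminating [DONE] event. The chunks below are a sketch of what a client might receive; the content values are illustrative:

data: {"id":"chatcmpl-395","object":"chat.completion.chunk","created":1747751038,"model":"llama3.1:8b","choices":[{"index":0,"delta":{"role":"assistant","content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-395","object":"chat.completion.chunk","created":1747751038,"model":"llama3.1:8b","choices":[{"index":0,"delta":{"content":" there!"},"finish_reason":null}]}

data: {"id":"chatcmpl-395","object":"chat.completion.chunk","created":1747751038,"model":"llama3.1:8b","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]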
max_completion_tokens
An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.
max_tokens (Deprecated)
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
This value is now deprecated in favor of max_completion_tokens, and is not compatible with o-series models.
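For example, a request that caps generation at 256 tokens using the newer parameter might look like this:

{
  "model": "llama3.1:8b",
  "messages": [
    { "role": "user", "content": "Summarize the plot of Hamlet." }
  ],
  "max_completion_tokens": 256
}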
response_format
An object specifying the format that the model must output.
Setting to {"type": "json_schema", "json_schema": {...}} enables Structured Outputs, which ensures the model will match your supplied JSON schema.
Setting to {"type": "json_object"} enables the older JSON mode, which ensures the message the model generates is valid JSON. Using json_schema is preferred for models that support it.
seed
If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
May not be supported by the latest reasoning models, such as o3 and o4-mini.
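A request pinning the seed might look like this; to check reproducibility, send it twice and compare both the completion and the system_fingerprint values in the two responses:

{
  "model": "llama3.1:8b",
  "messages": [
    { "role": "user", "content": "Hello!" }
  ],
  "seed": 42,
  "temperature": 0.7
}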
stop
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
user (Deprecated)
A stable identifier for your end-users. Used to boost cache hit rates by better bucketing similar requests.
safety_identifier
A stable identifier used to help detect users of your application that may be violating OpenAI's usage policies. The IDs should be a string that uniquely identifies each user. We recommend hashing their username or email address, in order to avoid sending us any identifying information.
prompt_cache_key
Used by OpenAI to cache responses for similar requests to optimize your cache hit rates. Replaces the user field.
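A request setting both identifiers might look like this; the hashed user ID and cache key values are placeholders:

{
  "model": "llama3.1:8b",
  "messages": [
    { "role": "user", "content": "Hello!" }
  ],
  "safety_identifier": "u_8f14e45fceea167a",
  "prompt_cache_key": "onboarding-flow-v2"
}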
Response Body
Example request:

curl -X POST "https://<your-deployment>/v1/chat/completions" \
  -H "OpenAI-Organization: 5eb7cf5a86d9755df3a6c593" \
  -H "OpenAI-Project: 5eb7cf5a86d9755df3a6c593" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "content": "You are a helpful assistant.",
        "role": "system"
      },
      {
        "content": "Hello!",
        "role": "user"
      }
    ],
    "model": "llama3.1:8b",
    "stream": false
  }'

Example response (200):

{
"id": "chatcmpl-395",
"object": "chat.completion",
"created": 1747751038,
"model": "llama3.1:8b",
"choices": [
{
"finish_reason": "stop",
"index": 0,
"message": {
"content": "It's nice to meet you. Is there something I can help you with?",
"role": "assistant"
}
}
],
"usage": {
"prompt_tokens": 23,
"total_tokens": 47
},
"system_fingerprint": "fp_ollama"
}{
"detail": [
{
"loc": [
"string"
],
"msg": "string",
"type": "string"
}
]
}