DeepFellow API Reference
Completions (legacy)

Create Completion

Create a model response for the given prompt.

Given a prompt, the model will return one or more predicted completions along with the probabilities of alternative tokens at each position.

Proxies a validated completion request to the configured API. Requires a user bearer token in the Authorization header.

POST /v1/completions

Authorization: Bearer <token>

In: header
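
For example, a minimal Python sketch of an authenticated call using the requests library (the base URL is a placeholder for your DeepFellow deployment, not a documented address):

import requests

BASE_URL = "https://<your-deepfellow-host>"  # placeholder host, substitute your deployment

response = requests.post(
    f"{BASE_URL}/v1/completions",
    headers={
        "Authorization": "Bearer <token>",  # user bearer token required by this endpoint
        "Content-Type": "application/json",
    },
    json={
        "model": "llama3.1:8b",
        "prompt": "Hello World",
        "stream": False,
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["text"])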

Header Parameters

OpenAI-Organization?: Openai-Organization
OpenAI-Project?: Openai-Project

Request Body

model: Model

ID of the model to use.

prompt: Prompt

The prompt(s) to generate completions for - can be string, array of strings, tokens, or token arrays.

best_of?: Best Of

Generate best_of completions and return the best one. Must be greater than n if both are set.

Default: 1
echo?: Echo

Echo back the prompt in addition to the completion.

Default: false
frequency_penalty?: Frequency Penalty

Penalize new tokens based on their existing frequency (-2.0 to 2.0).

Default: 0
logit_bias?: Logit Bias

Modify likelihood of specified tokens appearing (token ID -> bias from -100 to 100).

logprobs?: Logprobs

Include log probabilities on the most likely tokens (max 5).

max_tokens?: Max Tokens

Maximum number of tokens to generate in the completion.

Default: 16
n?: N

How many completions to generate for each prompt.

Default: 1
presence_penalty?: Presence Penalty

Penalize new tokens based on whether they appear in the text so far (-2.0 to 2.0).

Default: 0
seed?: Seed

Seed for deterministic sampling (best effort, not guaranteed).

stop?: Stop

Up to 4 sequences where the API will stop generating.

stream?: Stream

Stream back partial progress as server-sent events; see the streaming sketch after this parameter list.

Default: false
stream_options?: StreamOptions | null

Options for the streaming response (only used when stream=true).

suffix?: Suffix

The suffix that comes after a completion of inserted text.

temperature?: Temperature

Sampling temperature (0-2). Higher = more random, lower = more deterministic.

Default: 1
top_p?: Top P

Nucleus sampling - consider tokens with top_p probability mass (0-1).

Default: 1
user?: User

A unique identifier for the end user, to help monitor and detect abuse.
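
Since stream=true returns partial progress as server-sent events, here is a hedged Python sketch for consuming the stream; it assumes OpenAI-style "data: ..." event lines ending with a "[DONE]" sentinel, which this page does not spell out:

import json
import requests

BASE_URL = "https://<your-deepfellow-host>"  # placeholder host

with requests.post(
    f"{BASE_URL}/v1/completions",
    headers={"Authorization": "Bearer <token>"},
    json={"model": "llama3.1:8b", "prompt": "Hello World", "stream": True},
    stream=True,
    timeout=60,
) as response:
    response.raise_for_status()
    for line in response.iter_lines():
        if not line:
            continue  # skip blank keep-alive lines between events
        decoded = line.decode("utf-8")
        if not decoded.startswith("data: "):
            continue  # ignore SSE comments and other fields
        payload = decoded[len("data: "):]
        if payload == "[DONE]":  # assumed end-of-stream sentinel (OpenAI convention)
            break
        chunk = json.loads(payload)
        # each chunk carries the same choices[] shape as a non-streaming response
        print(chunk["choices"][0]["text"], end="", flush=True)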

Example Request

curl -X POST "https://<your-deepfellow-host>/v1/completions" \
  -H "Authorization: Bearer <token>" \
  -H "OpenAI-Organization: 5eb7cf5a86d9755df3a6c593" \
  -H "OpenAI-Project: 5eb7cf5a86d9755df3a6c593" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.1:8b",
    "prompt": "Hello World",
    "stream": false
  }'

Response Body

{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": {
        "text_offset": [
          0
        ],
        "token_logprobs": [
          0
        ],
        "tokens": [
          "string"
        ],
        "top_logprobs": [
          {
            "property1": 0,
            "property2": 0
          }
        ]
      },
      "text": "string"
    }
  ],
  "created": 0,
  "id": "string",
  "model": "string",
  "object": "text_completion",
  "system_fingerprint": "string",
  "usage": {
    "completion_tokens": 0,
    "prompt_tokens": 0,
    "total_tokens": 0,
    "completion_tokens_details": {
      "accepted_prediction_tokens": 0,
      "audio_tokens": 0,
      "reasoning_tokens": 0,
      "rejected_prediction_tokens": 0
    },
    "prompt_tokens_details": {
      "audio_tokens": 0,
      "cached_tokens": 0
    }
  }
}
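
Continuing the non-streaming Python sketch above, the fields shown in this example response can be read like this:

data = response.json()  # response object from the non-streaming sketch above

for choice in data["choices"]:
    print(choice["index"], choice["finish_reason"])
    print(choice["text"])

usage = data["usage"]
print(usage["prompt_tokens"], "prompt +", usage["completion_tokens"],
      "completion =", usage["total_tokens"], "total tokens")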

Validation Error Response

{
  "detail": [
    {
      "loc": [
        "string"
      ],
      "msg": "string",
      "type": "string"
    }
  ]
}
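
A failed request returns a detail list in the shape shown above; a small sketch (reusing the response object from the earlier sketches) for surfacing those validation messages:

if not response.ok:
    for err in response.json().get("detail", []):
        location = ".".join(str(part) for part in err["loc"])
        # 'loc' names the offending field, 'msg' and 'type' describe the problem
        print(f"{location}: {err['msg']} ({err['type']})")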
