Custom Models

DeepFellow allows you to install, configure, and use custom models in different DeepFellow Infra services. This guide is divided into sections, each describing how to use a custom model with one service.

All custom models are installed using the Infra Web Panel.

Choosing Services

The available LLM services (ollama, llamacpp, and vllm) differ in their level of hardware integration, including dependencies on specific CPU instruction sets.

To minimize hardware compatibility issues, consider the following services:

ollama – Recommended for most users. Automatically adapts to your hardware configuration with minimal setup required.
llamacpp – Supports models outside the ollama repository and the GGUF model format. May require extra configuration due to a higher chance of hardware compatibility issues.
vllm – Offers the highest performance but carries the highest risk of hardware-related complications. Recommended for experienced users who are confident in troubleshooting and system configuration.

Recommendation: If you're not sure which service to choose, start with ollama.
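
Because the services depend on specific CPU instruction sets, it can help to check which ones your host advertises before choosing. The sketch below parses Linux /proc/cpuinfo text. The flags it checks (avx, avx2, avx512f) are illustrative assumptions, since this guide does not state which instruction sets each service needs, and the function names are hypothetical.

```python
from pathlib import Path

# Illustrative flags only: this guide does not list the instruction
# sets each service requires, so treat this tuple as an assumption.
FLAGS_OF_INTEREST = ("avx", "avx2", "avx512f")


def parse_cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the CPU feature flags listed in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        # x86 kernels label the line "flags"; ARM kernels use "Features".
        if key.strip().lower() in ("flags", "features"):
            return set(value.split())
    return set()


def report_flags(cpuinfo_text: str) -> dict[str, bool]:
    """Map each flag of interest to whether the CPU advertises it."""
    flags = parse_cpu_flags(cpuinfo_text)
    return {name: name in flags for name in FLAGS_OF_INTEREST}


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():  # Linux only
        print(report_flags(cpuinfo.read_text()))
```

If a flag you expected is missing, starting with ollama, as recommended above, is likely the safer choice.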

Ollama Models

You can install any model available in the Ollama library.

Install

Figure: Services view showing the ollama tab with two buttons: "Models" and "Uninstall".

For example, to add qwen3-embedding:0.6b to your 'ollama' service:

In services view, locate "ollama" service.
Click "Models".
Click "Add custom model".
In the pop-up window enter Model ID: qwen3-embedding:0.6b.
Enter Size: 639MB.
Choose embedding Model type from the drop-down.
Click "Add custom model". New model tab will appear.
Optionally, enter a model alias and an idle timeout (how long the model should be kept when it is not in use, e.g. "5m" for five minutes).
Finally click "Install".
After a while your model will appear in the models list with green label "Installed".

Figure: Pop-up showing the fields to fill in: "Model ID", "Size", and "Model type".

Verify

To use qwen3-embedding:0.6b, call the POST /v1/embeddings endpoint, since it is an embedding model.

# The "model" field also accepts the model alias instead of the model ID.
curl -X 'POST' \
  'https://deepfellow-server-host/v1/embeddings' \
  -H 'Authorization: Bearer DEEPFELLOW-PROJECT-API-KEY' \
  -H 'Content-Type: application/json' \
  -d '{
  "encoding_format": "float",
  "input": "Hello, how are you?",
  "model": "qwen3-embedding:0.6b"
}'

import requests

response = requests.post(
    'https://deepfellow-server-host/v1/embeddings',
    json={
        "encoding_format": "float",
        "input": "Hello, how are you?",
        "model": "qwen3-embedding:0.6b" # here you can pass model alias instead
    },
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer DEEPFELLOW-PROJECT-API-KEY"
    }
)

data = response.json()
print(data)

const response = await fetch('https://deepfellow-server-host/v1/embeddings', {
    method: 'POST',
    headers: {
        'Content-Type': 'application/json',
        Authorization: 'Bearer DEEPFELLOW-PROJECT-API-KEY'
    },
    body: JSON.stringify({
        encoding_format: 'float',
        input: 'Hello, how are you?',
        model: 'qwen3-embedding:0.6b' // here you can pass model alias instead
    })
});

const data = await response.json();
console.log(data);

Response:

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "embedding": [
        -0.001106513,
        -0.0033979767,
        -0.004260577,
        ...
        -0.008845862,
        0.004314483
      ],
      "index": 0
    }
  ],
  "model": "qwen3-embedding:0.6b",
  "usage": {
    "prompt_tokens": 6,
    "total_tokens": 6
  }
}
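
Embedding vectors are usually compared rather than inspected directly. As a minimal sketch, assuming the response shape shown above, the following extracts each vector from data[].embedding and computes the cosine similarity between two of them. The short vectors in the sample are made up for illustration, and a response with two items assumes a request that sent two inputs, which this guide does not show.

```python
import math


def extract_embeddings(response_json: dict) -> list[list[float]]:
    """Return the embedding vectors ordered by their "index" field."""
    items = sorted(response_json["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up, shortened vectors in the same shape as the response above.
sample = {
    "object": "list",
    "data": [
        {"object": "embedding", "embedding": [1.0, 0.0], "index": 0},
        {"object": "embedding", "embedding": [0.0, 1.0], "index": 1},
    ],
}
first, second = extract_embeddings(sample)
print(cosine_similarity(first, first))
print(cosine_similarity(first, second))
```

Values close to 1 indicate similar inputs; values near 0 indicate unrelated ones.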

Uninstall

To uninstall the model, go to the 'ollama' service model view, find model ID qwen3-embedding:0.6b, click "Uninstall", and finally click "Remove custom model".

GGUF Models

You can install any model that is available on HuggingFace in GGUF format.

Use the llamacpp service.

Install

For example, to add ggml-org/gemma-3-4b-it-GGUF to your 'llamacpp' service:

In services view, locate "llamacpp" service.
Click "Models".
Click "Add custom model".
In the pop-up window enter Model ID: gemma-3-4b-it.
Enter Model URL: https://huggingface.co/ggml-org/gemma-3-4b-it-GGUF/resolve/main/gemma-3-4b-it-Q4_K_M.gguf?download=true.
Enter Size: 2.49GB.
Click "Add custom model". New model tab will appear.
Optionally, enter model alias.
Finally click "Install".
After a while your model will appear in the models list with green label "Installed".
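
The Model URL in the steps above follows the HuggingFace pattern of repository, "resolve", revision, and file name. A small sketch that rebuilds it; the function name is hypothetical, and the default "main" revision is taken from the example URL.

```python
from urllib.parse import quote


def gguf_download_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build a direct-download URL for a file in a HuggingFace repository.

    Mirrors the URL used in the example above:
    https://huggingface.co/<repo>/resolve/<revision>/<file>?download=true
    """
    return (
        "https://huggingface.co/"
        f"{quote(repo_id)}/resolve/{quote(revision, safe='')}/{quote(filename)}"
        "?download=true"
    )


print(gguf_download_url("ggml-org/gemma-3-4b-it-GGUF", "gemma-3-4b-it-Q4_K_M.gguf"))
```

The printed URL is the value entered in the Model URL field in the example.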

Verify

To use gemma-3-4b-it, call the POST /v1/chat/completions endpoint, since it is a chat model.

# The "model" field also accepts the model alias instead of the model ID.
curl -X 'POST' \
  'https://deepfellow-server-host/v1/chat/completions' \
  -H 'Authorization: Bearer DEEPFELLOW-PROJECT-API-KEY' \
  -H 'Content-Type: application/json' \
  -d '{
  "max_completion_tokens": 50,
  "messages": [
    {
      "content": "You are a helpful assistant.",
      "role": "system"
    },
    {
      "content": "hello!",
      "role": "user"
    }
  ],
  "model": "gemma-3-4b-it"
}'

import requests

response = requests.post(
    'https://deepfellow-server-host/v1/chat/completions',
    json={
      "max_completion_tokens": 50,
      "messages": [
        {
          "content": "You are a helpful assistant.",
          "role": "system"
        },
        {
          "content": "hello!",
          "role": "user"
        }
      ],
      "model": "gemma-3-4b-it" # here you can pass model alias instead
    },
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer DEEPFELLOW-PROJECT-API-KEY"
    }
)

data = response.json()
print(data)

const response = await fetch('https://deepfellow-server-host/v1/chat/completions', {
    method: 'POST',
    headers: {
        'Content-Type': 'application/json',
        Authorization: 'Bearer DEEPFELLOW-PROJECT-API-KEY'
    },
    body: JSON.stringify({
        max_completion_tokens: 50,
        messages: [
            {
                content: 'You are a helpful assistant.',
                role: 'system'
            },
            {
                content: 'hello!',
                role: 'user'
            }
        ],
        model: 'gemma-3-4b-it' // here you can pass model alias instead
    })
});

const data = await response.json();
console.log(data);

Response:

{
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "Hello there! How can I help you today? 😊 \n\nDo you have a question, need some information, or just want to chat? Let me know!"
            }
        }
    ],
    "created": 1760625765,
    "model": "gemma-3-4b-it",
    "system_fingerprint": "b6620-b887d2f3",
    "object": "chat.completion",
    "usage": {
        "completion_tokens": 34,
        "prompt_tokens": 18,
        "total_tokens": 52
    },
    "id": "chatcmpl-W7PglxNtJVdM3yAgUO3fvnR1uAyS8WU7",
    "timings": {
        "cache_n": 0,
        "prompt_n": 18,
        "prompt_ms": 26.931,
        "prompt_per_token_ms": 1.4961666666666666,
        "prompt_per_second": 668.3747354350005,
        "predicted_n": 34,
        "predicted_ms": 197.006,
        "predicted_per_token_ms": 5.794294117647059,
        "predicted_per_second": 172.583576134737
    }
}
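
To work with the reply programmatically, you typically need only the assistant text and the token counts. The following is a sketch that assumes the response shape shown above. The timings object appears in this llamacpp example and may be absent for other services, so it is read defensively; the function name is hypothetical.

```python
def summarize_chat_response(response_json: dict) -> dict:
    """Pull the reply text, token usage, and generation speed from a response."""
    choice = response_json["choices"][0]
    usage = response_json.get("usage", {})
    timings = response_json.get("timings") or {}

    tokens_per_second = None
    if timings.get("predicted_ms"):
        # Generated tokens divided by generation time in seconds.
        tokens_per_second = timings["predicted_n"] / (timings["predicted_ms"] / 1000)

    return {
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "total_tokens": usage.get("total_tokens"),
        "tokens_per_second": tokens_per_second,
    }


# Trimmed copy of the response above (message text shortened).
sample = {
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "message": {"role": "assistant", "content": "Hello there!"},
        }
    ],
    "usage": {"completion_tokens": 34, "prompt_tokens": 18, "total_tokens": 52},
    "timings": {"predicted_n": 34, "predicted_ms": 197.006},
}
print(summarize_chat_response(sample))
```

The computed speed should agree with the predicted_per_second value that the server reports in timings.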

Uninstall

To uninstall the model, go to the 'llamacpp' service model view, find model ID gemma-3-4b-it, click "Uninstall", and finally click "Remove custom model".

vLLM Compatible Models

You can install any model available on HuggingFace that vLLM supports.

Use the vllm service.

Install

For example, to add Qwen3-1.7B to your 'vllm' service:

In services view, locate "vllm" service.
Click "Models".
Click "Add custom model".
In the pop-up window enter Model ID: Qwen3-1.7B.
Enter HuggingFace model ID: Qwen/Qwen3-1.7B.
Enter Model Size: 4GB.
Click "Add custom model". New model tab will appear.
Optionally, enter model alias.
Finally click "Install".
After a while your model will appear in the models list with green label "Installed".

Verify

To use Qwen/Qwen3-1.7B, call the POST /v1/chat/completions endpoint, since it is a chat model.

# The "model" field also accepts the model alias instead of the model ID.
curl -X 'POST' \
  'https://deepfellow-server-host/v1/chat/completions' \
  -H 'Authorization: Bearer DEEPFELLOW-PROJECT-API-KEY' \
  -H 'Content-Type: application/json' \
  -d '{
  "max_completion_tokens": 50,
  "messages": [
    {
      "content": "You are a helpful assistant.",
      "role": "system"
    },
    {
      "content": "hello!",
      "role": "user"
    }
  ],
  "model": "Qwen/Qwen3-1.7B"
}'

import requests

response = requests.post(
    'https://deepfellow-server-host/v1/chat/completions',
    json={
      "max_completion_tokens": 50,
      "messages": [
        {
          "content": "You are a helpful assistant.",
          "role": "system"
        },
        {
          "content": "hello!",
          "role": "user"
        }
      ],
      "model": "Qwen/Qwen3-1.7B" # here you can pass model alias instead
    },
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer DEEPFELLOW-PROJECT-API-KEY"
    }
)

data = response.json()
print(data)

const response = await fetch('https://deepfellow-server-host/v1/chat/completions', {
    method: 'POST',
    headers: {
        'Content-Type': 'application/json',
        Authorization: 'Bearer DEEPFELLOW-PROJECT-API-KEY'
    },
    body: JSON.stringify({
        max_completion_tokens: 50,
        messages: [
            {
                content: 'You are a helpful assistant.',
                role: 'system'
            },
            {
                content: 'hello!',
                role: 'user'
            }
        ],
        model: 'Qwen/Qwen3-1.7B' // here you can pass model alias instead
    })
});

const data = await response.json();
console.log(data);

Response:

{
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "Hello! How can I help you?"
            }
        }
    ],
    "created": 1760629812,
    "model": "Qwen/Qwen3-1.7B",
    "system_fingerprint": "b6620-b887d2f3",
    "object": "chat.completion",
    "usage": {
        "completion_tokens": 34,
        "prompt_tokens": 18,
        "total_tokens": 52
    },
    "id": "chatcmpl-uZSktK05yOus1NDtftpT8wFkuV9jHNrN",
    "timings": {
        "cache_n": 17,
        "prompt_n": 1,
        "prompt_ms": 7.488,
        "prompt_per_token_ms": 7.488,
        "prompt_per_second": 133.54700854700855,
        "predicted_n": 34,
        "predicted_ms": 191.626,
        "predicted_per_token_ms": 5.636058823529412,
        "predicted_per_second": 177.42895014246503
    }
}
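
The examples above print whatever the server returns. In scripts it is often useful to set a timeout and fail loudly on HTTP errors. Below is a minimal sketch using only the Python standard library; the helper names and the 30-second timeout are assumptions, and the host and key are the same placeholders used above.

```python
import json
import urllib.error
import urllib.request

BASE_URL = "https://deepfellow-server-host"  # placeholder host
API_KEY = "DEEPFELLOW-PROJECT-API-KEY"  # placeholder key


def build_chat_request(model: str, user_message: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST to /v1/chat/completions."""
    payload = {
        "max_completion_tokens": 50,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_message},
        ],
        "model": model,  # model ID or alias
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )


def send_chat(model: str, user_message: str, timeout: float = 30.0) -> dict:
    """Send the request and return the parsed JSON body, raising on HTTP errors."""
    request = build_chat_request(model, user_message)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as error:
        # Include the status code and body in the error for easier debugging.
        detail = error.read().decode("utf-8", errors="replace")
        raise RuntimeError(f"HTTP {error.code}: {detail}") from error
```

With a real host and key in place, send_chat("Qwen/Qwen3-1.7B", "hello!") would return the parsed response body as a dictionary.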

Uninstall

To uninstall the model, go to the 'vllm' service model view, find model ID Qwen/Qwen3-1.7B, click "Uninstall", and finally click "Remove custom model".

Stable-Diffusion Models

You can install any image generation model compatible with Stable Diffusion (e.g., from Civitai or HuggingFace).

Use the stable-diffusion service.

Install

For example, to add SDVN6-RealXL to your 'stable-diffusion' service:

In services view, locate "stable-diffusion" service.
Click "Models".
Click "Add custom model".
In the pop-up window enter Model ID: SDVN6-RealXL.
Choose file type: Stable-diffusion.
Enter Model File URL (Download link): https://civitai.com/api/download/models/134461?type=Model&format=SafeTensor&size=full&fp=fp16.
Enter Model filename: sdvn6Realxl_detailface.safetensors.
Enter Model Size: 4GB.
Click "Add custom model". New model tab will appear.
Optionally, enter model alias.
Finally click "Install".
After a while your model will appear in the models list with green label "Installed".

Verify

See Image Generation guide to learn how to use stable-diffusion models.

Uninstall

To uninstall the model, go to the 'stable-diffusion' service model view, find model ID SDVN6-RealXL, click "Uninstall", and finally click "Remove custom model".

Docker Image Service

You can connect any containerized model, application, or microservice: anything that exposes a web API.

Install

For example, to add a custom application that has the ID my-app and exposes an /api endpoint:

In services view, locate "custom" service.
Click "Models".
Click "Add custom model" button.
In the pop-up window enter Model ID: my-app.
Check the "Model is private" box if you want to use our anonymization plugin to filter traffic to and from the app.
Enter Model endpoint, e.g. /api.
Enter Model Size.
In Docker image, enter the ID of the image, e.g. company/my-app (you can find this ID in the image registry, e.g., DockerHub).
Enter Docker image port number, e.g. 8080.
Optionally, configure Docker command, Docker volumes, and Docker environment variables.
Click "Add custom model". New model tab will appear.
Finally click "Install".
After a while your model will appear in the models list with green label "Installed".

Figure: Pop-up showing the configuration form for a Docker image.

Verify

Use the /custom/{path} endpoint to communicate with your image. For example, if you entered /api as the model endpoint, call /custom/api:

curl -X 'GET' \
  'https://deepfellow-server-host/custom/api' \
  -H 'Authorization: Bearer DEEPFELLOW-PROJECT-API-KEY' \
  -H 'Content-Type: application/json'

import requests

# Use the HTTP method your application expects; GET matches the curl example.
response = requests.get(
    'https://deepfellow-server-host/custom/api',
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer DEEPFELLOW-PROJECT-API-KEY"
    }
)

data = response.json()
print(data)

// Use the HTTP method your application expects; GET matches the curl example.
const response = await fetch('https://deepfellow-server-host/custom/api', {
    method: 'GET',
    headers: {
        'Content-Type': 'application/json',
        Authorization: 'Bearer DEEPFELLOW-PROJECT-API-KEY'
    }
});

const data = await response.json();
console.log(data);
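
The mapping from the configured model endpoint to the public path can be sketched as follows. It assumes, as described above, that the configured path is appended to /custom; the function name is hypothetical.

```python
def custom_service_url(base_url: str, model_endpoint: str) -> str:
    """Map a configured model endpoint (e.g. "/api") to its /custom/... URL."""
    path = model_endpoint.strip("/")
    if not path:
        raise ValueError("model endpoint must not be empty")
    return f"{base_url.rstrip('/')}/custom/{path}"


print(custom_service_url("https://deepfellow-server-host", "/api"))
```

Stripping the slashes first avoids a doubled slash when the base URL ends with one or the endpoint starts with one.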

Uninstall

To uninstall the model, go to the 'custom' service model view, find model ID my-app, click "Uninstall", and finally click "Remove custom model".

OpenAI Models

DeepFellow allows you to add your custom models stored at OpenAI, e.g. your fine-tuned models.

To create your own fine-tuned OpenAI model, read the supervised fine-tuning guide in the OpenAI documentation.

Install

Go to your OpenAI Finetuning Dashboard.
Select a model from the list.
Copy the value in the "Output model" field, e.g. ft:gpt-3.5-turbo-0125:personal::95X23ObX.
Open DeepFellow Infra Web Panel.
In services view, locate "openai" service.
Click "Models".
Click "Add custom model".
In the pop-up window, paste the copied value into the Model ID field: ft:gpt-3.5-turbo-0125:personal::95X23ObX.
Choose Model type from the drop-down (e.g. llm).
Check the boxes for the endpoints that the model supports.
Click "Add custom model". New model tab will appear.
Optionally, enter model alias.
Finally click "Install".
After a while your model will appear in the models list with green label "Installed".

Figure: OpenAI dashboard showing the list of the user's personal fine-tuned models and training statistics.
Figure: Pop-up with the text "Add custom model for openai", a Model ID field, a Model type field, a checkbox labeled "support v1/chat/completions", and a checkbox labeled "support v1/completions".

Verify

The example model's ID is ft:gpt-3.5-turbo-0125:personal::95X23ObX, its type is llm and it supports /v1/chat/completions. To verify it, do the following:

# The "model" field also accepts the model alias instead of the model ID.
curl -X 'POST' \
  'https://deepfellow-server-host/v1/chat/completions' \
  -H 'Authorization: Bearer DEEPFELLOW-PROJECT-API-KEY' \
  -H 'Content-Type: application/json' \
  -d '{
  "max_completion_tokens": 50,
  "messages": [
    {
      "content": "You are a helpful assistant.",
      "role": "system"
    },
    {
      "content": "hello!",
      "role": "user"
    }
  ],
  "model": "ft:gpt-3.5-turbo-0125:personal::95X23ObX"
}'

import requests

response = requests.post(
    'https://deepfellow-server-host/v1/chat/completions',
    json={
      "max_completion_tokens": 50,
      "messages": [
        {
          "content": "You are a helpful assistant.",
          "role": "system"
        },
        {
          "content": "hello!",
          "role": "user"
        }
      ],
      "model": "ft:gpt-3.5-turbo-0125:personal::95X23ObX" # here you can pass model alias instead
    },
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer DEEPFELLOW-PROJECT-API-KEY"
    }
)

data = response.json()
print(data)

const response = await fetch('https://deepfellow-server-host/v1/chat/completions', {
    method: 'POST',
    headers: {
        'Content-Type': 'application/json',
        Authorization: 'Bearer DEEPFELLOW-PROJECT-API-KEY'
    },
    body: JSON.stringify({
        max_completion_tokens: 50,
        messages: [
            {
                content: 'You are a helpful assistant.',
                role: 'system'
            },
            {
                content: 'hello!',
                role: 'user'
            }
        ],
        model: 'ft:gpt-3.5-turbo-0125:personal::95X23ObX' // here you can pass model alias instead
    })
});

const data = await response.json();
console.log(data);

Response:

{
    "id": "chatcmpl-CRg4goc3tD0uFr3lPdZi2ttaGBA8c",
    "object": "chat.completion",
    "created": 1760712626,
    "model": "ft:gpt-3.5-turbo-0125:personal::95X23ObX",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "Hello! How can I assist you today?",
                "refusal": null,
                "annotations": []
            },
            "logprobs": null,
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 19,
        "completion_tokens": 9,
        "total_tokens": 28,
        "prompt_tokens_details": {
            "cached_tokens": 0,
            "audio_tokens": 0
        },
        "completion_tokens_details": {
            "reasoning_tokens": 0,
            "audio_tokens": 0,
            "accepted_prediction_tokens": 0,
            "rejected_prediction_tokens": 0
        }
    },
    "service_tier": "default",
    "system_fingerprint": null
}

Uninstall

To uninstall the model, go to the 'openai' service model view, find model ID ft:gpt-3.5-turbo-0125:personal::95X23ObX, click "Uninstall", and finally click "Remove custom model".

Google AI Models

Install

In services view, locate "google" service.
Click "Models".
Click "Add custom model".
In the pop-up window, enter your model's ID in the Model ID field.
Choose Model type from the drop-down (e.g. llm).
Check the boxes for the endpoints that the model supports.
Click "Add custom model". New model tab will appear.
Optionally, enter model alias.
Finally click "Install".
After a while your model will appear in the models list with green label "Installed".

Verify

Assuming your model is compatible with /v1/chat/completions, verify it by doing the following:

# The "model" field also accepts the model alias instead of the model ID.
curl -X 'POST' \
  'https://deepfellow-server-host/v1/chat/completions' \
  -H 'Authorization: Bearer DEEPFELLOW-PROJECT-API-KEY' \
  -H 'Content-Type: application/json' \
  -d '{
  "max_completion_tokens": 50,
  "messages": [
    {
      "content": "You are a helpful assistant.",
      "role": "system"
    },
    {
      "content": "hello!",
      "role": "user"
    }
  ],
  "model": "MODEL-ID"
}'

import requests

response = requests.post(
    'https://deepfellow-server-host/v1/chat/completions',
    json={
      "max_completion_tokens": 50,
      "messages": [
        {
          "content": "You are a helpful assistant.",
          "role": "system"
        },
        {
          "content": "hello!",
          "role": "user"
        }
      ],
      "model": "MODEL-ID" # here you can pass model alias instead
    },
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer DEEPFELLOW-PROJECT-API-KEY"
    }
)

data = response.json()
print(data)

const response = await fetch('https://deepfellow-server-host/v1/chat/completions', {
    method: 'POST',
    headers: {
        'Content-Type': 'application/json',
        Authorization: 'Bearer DEEPFELLOW-PROJECT-API-KEY'
    },
    body: JSON.stringify({
        max_completion_tokens: 50,
        messages: [
            {
                content: 'You are a helpful assistant.',
                role: 'system'
            },
            {
                content: 'hello!',
                role: 'user'
            }
        ],
        model: 'MODEL-ID' // here you can pass model alias instead
    })
});

const data = await response.json();
console.log(data);

You will get a similar response:

{
    "id": "chatcmpl-CRg4goc3tD0uFr3lPdZi2ttaGBA8c",
    "object": "chat.completion",
    "created": 1760712628,
    "model": "MODEL-ID",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "Hello! How can I assist you today?",
                "refusal": null,
                "annotations": []
            },
            "logprobs": null,
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 19,
        "completion_tokens": 9,
        "total_tokens": 28,
        "prompt_tokens_details": {
            "cached_tokens": 0,
            "audio_tokens": 0
        },
        "completion_tokens_details": {
            "reasoning_tokens": 0,
            "audio_tokens": 0,
            "accepted_prediction_tokens": 0,
            "rejected_prediction_tokens": 0
        }
    },
    "service_tier": "default",
    "system_fingerprint": null
}

Uninstall

To uninstall the model, go to the 'google' service model view, find your model ID, click "Uninstall", and finally click "Remove custom model".
