Custom Models
DeepFellow allows you to install, configure, and use custom models in different DeepFellow Infra services. This guide is divided into sections describing how to use custom models with each service.
All custom models are installed using the Infra Web Panel.
Choosing Services
The available LLM services (ollama, llamacpp, and vllm) differ in their level of hardware integration, including dependencies on specific CPU instruction sets.
To minimize hardware compatibility issues, consider the following services:
- ollama – Recommended for most users. Automatically adapts to your hardware configuration with minimal setup required.
- llamacpp – Supports GGUF models, including models that are not available in the Ollama library. May require extra configuration due to a higher chance of hardware compatibility issues.
- vllm – Offers the highest performance but carries the highest risk of hardware-related complications. Recommended for experienced users who are confident in troubleshooting and system configuration.
Recommendation: If you're not sure which service to choose, start with ollama.
Ollama Models
You can install any model available in the Ollama library.
Install

As an example, to add qwen3-embedding:0.6b to your 'ollama' service:
- In the services view, locate the "ollama" service.
- Click "Models".
- Click "Add custom model".
- In the pop-up window, enter the Model ID: qwen3-embedding:0.6b.
- Enter the Size: 639MB.
- Choose embedding as the Model type from the drop-down.
- Click "Add custom model". A new model tab will appear.
- Optionally, enter a model alias and its idle timeout (i.e., how long the model is kept loaded while it is not used, e.g., "5m" for five minutes).
- Finally, click "Install".
- After a while, your model will appear in the models list with the green label "Installed".
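The idle timeout is entered as a short duration string such as "5m". Assuming the field accepts a whole number followed by a single unit suffix (s, m, or h), which this guide does not confirm, the sketch below converts such a value to seconds so you can sanity-check what you enter. The parse_idle_timeout helper is hypothetical and not part of DeepFellow.

```python
import re

# Hypothetical helper. Assumption: idle timeouts look like "<number><unit>",
# where unit is s (seconds), m (minutes), or h (hours).
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_idle_timeout(value: str) -> int:
    """Convert an idle timeout such as "5m" into a number of seconds."""
    match = re.fullmatch(r"(\d+)([smh])", value.strip())
    if match is None:
        raise ValueError(f"unsupported idle timeout format: {value!r}")
    amount, unit = match.groups()
    return int(amount) * _UNIT_SECONDS[unit]


if __name__ == "__main__":
    print(parse_idle_timeout("5m"))
```

If your DeepFellow version accepts other formats, extend the regular expression rather than relying on this sketch.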

Verify
To use qwen3-embedding:0.6b, call the POST v1/embeddings endpoint, because it is an embedding model. In the "model" field, you can pass the model alias instead of the model ID.
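Once the endpoint answers, a quick sanity check is to embed two related sentences and compare the vectors. The sketch below assumes a response shaped like the example response in this section (a "data" list whose items carry an "embedding" array and an "index") and computes cosine similarity in plain Python. The helper names extract_embeddings and cosine_similarity are hypothetical.

```python
import math


def extract_embeddings(response: dict) -> list[list[float]]:
    """Return the embedding vectors from a v1/embeddings response, ordered by "index"."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Toy response with 2-dimensional vectors; real embeddings are much longer.
    sample = {
        "object": "list",
        "data": [
            {"object": "embedding", "embedding": [0.0, 1.0], "index": 1},
            {"object": "embedding", "embedding": [1.0, 0.0], "index": 0},
        ],
    }
    first, second = extract_embeddings(sample)
    print(cosine_similarity(first, second))
```

To obtain two real vectors, send two requests like the ones in this section. Whether "input" also accepts a list of strings, as in the OpenAI embeddings API, is an assumption to check against your DeepFellow version.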
curl -X 'POST' \
'https://deepfellow-server-host/v1/embeddings' \
-H 'Authorization: Bearer DEEPFELLOW-PROJECT-API-KEY' \
-H 'Content-Type: application/json' \
-d '{
"encoding_format": "float",
"input": "Hello, how are you?",
"model": "qwen3-embedding:0.6b"
}'

import requests
response = requests.post(
'https://deepfellow-server-host/v1/embeddings',
json={
"encoding_format": "float",
"input": "Hello, how are you?",
"model": "qwen3-embedding:0.6b" # here you can pass model alias instead
},
headers={
"Content-Type": "application/json",
"Authorization": "Bearer DEEPFELLOW-PROJECT-API-KEY"
}
)
data = response.json()
print(data)

const response = await fetch('https://deepfellow-server-host/v1/embeddings', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
Authorization: 'Bearer DEEPFELLOW-PROJECT-API-KEY'
},
body: JSON.stringify({
encoding_format: 'float',
input: 'Hello, how are you?',
model: 'qwen3-embedding:0.6b' // here you can pass model alias instead
})
});
const data = await response.json();
console.log(data);

Response:
{
"object": "list",
"data": [
{
"object": "embedding",
"embedding": [
-0.001106513,
-0.0033979767,
-0.004260577,
...
-0.008845862,
0.004314483
],
"index": 0
}
],
"model": "qwen3-embedding:0.6b",
"usage": {
"prompt_tokens": 6,
"total_tokens": 6
}
}

Uninstall
To uninstall the model, open the 'ollama' service model view, find the model ID qwen3-embedding:0.6b, click "Uninstall", and then click "Remove custom model".
GGUF Models
You can install any model available on HuggingFace in GGUF format.
Use the llamacpp service.
Install
As an example, to add ggml-org/gemma-3-4b-it-GGUF to your 'llamacpp' service:
- In the services view, locate the "llamacpp" service.
- Click "Models".
- Click "Add custom model".
- In the pop-up window, enter the Model ID: gemma-3-4b-it.
- Enter the Model URL: https://huggingface.co/ggml-org/gemma-3-4b-it-GGUF/resolve/main/gemma-3-4b-it-Q4_K_M.gguf?download=true
- Enter the Size: 2.49GB.
- Click "Add custom model". A new model tab will appear.
- Optionally, enter a model alias.
- Finally, click "Install".
- After a while, your model will appear in the models list with the green label "Installed".
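The Model URL in this example follows the usual HuggingFace direct-download pattern https://huggingface.co/{repo}/resolve/{revision}/{filename}?download=true. If you want llamacpp to use a different quantization from the same repository, a small helper like the one below builds that URL. The function name hf_gguf_url is hypothetical, and you should still confirm the exact file name on the repository's file list.

```python
from urllib.parse import quote


def hf_gguf_url(repo: str, filename: str, revision: str = "main") -> str:
    """Build a HuggingFace direct-download URL for one file in a model repository."""
    # quote() leaves "/" unescaped by default, so "owner/name" stays intact.
    return (
        f"https://huggingface.co/{quote(repo)}/resolve/{quote(revision)}/"
        f"{quote(filename)}?download=true"
    )


if __name__ == "__main__":
    print(hf_gguf_url("ggml-org/gemma-3-4b-it-GGUF", "gemma-3-4b-it-Q4_K_M.gguf"))
```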
Verify
To use gemma-3-4b-it, call the POST v1/chat/completions endpoint, because it is a chat model. In the "model" field, you can pass the model alias instead of the model ID.
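The llamacpp response shown in this section includes a "timings" object in addition to the standard "usage" block. Its per-second fields appear to be derived from token counts and durations in milliseconds: for example, 34 predicted tokens in 197.006 ms give about 172.58 tokens per second, matching "predicted_per_second". The sketch below recomputes both rates from a response dictionary, assuming the field names shown in the example response; the helper names are hypothetical.

```python
def tokens_per_second(n_tokens: int, duration_ms: float) -> float:
    """Convert a token count and a duration in milliseconds into tokens per second."""
    return n_tokens / duration_ms * 1000.0


def throughput(response: dict) -> dict[str, float]:
    """Recompute prompt and generation speed from a llamacpp "timings" block."""
    timings = response["timings"]
    return {
        "prompt_per_second": tokens_per_second(timings["prompt_n"], timings["prompt_ms"]),
        "predicted_per_second": tokens_per_second(timings["predicted_n"], timings["predicted_ms"]),
    }


if __name__ == "__main__":
    # Values copied from the gemma-3-4b-it example response in this section.
    example = {"timings": {"prompt_n": 18, "prompt_ms": 26.931,
                           "predicted_n": 34, "predicted_ms": 197.006}}
    for name, rate in throughput(example).items():
        print(f"{name}: {rate:.2f}")
```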
curl -X 'POST' \
'https://deepfellow-server-host/v1/chat/completions' \
-H 'Authorization: Bearer DEEPFELLOW-PROJECT-API-KEY' \
-H 'Content-Type: application/json' \
-d '{
"max_completion_tokens": 50,
"messages": [
{
"content": "You are a helpful assistant.",
"role": "system"
},
{
"content": "hello!",
"role": "user"
}
],
"model": "gemma-3-4b-it"
}'

import requests
response = requests.post(
'https://deepfellow-server-host/v1/chat/completions',
json={
"max_completion_tokens": 50,
"messages": [
{
"content": "You are a helpful assistant.",
"role": "system"
},
{
"content": "hello!",
"role": "user"
}
],
"model": "gemma-3-4b-it" # here you can pass model alias instead
},
headers={
"Content-Type": "application/json",
"Authorization": "Bearer DEEPFELLOW-PROJECT-API-KEY"
}
)
data = response.json()
print(data)

const response = await fetch('https://deepfellow-server-host/v1/chat/completions', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
Authorization: 'Bearer DEEPFELLOW-PROJECT-API-KEY'
},
body: JSON.stringify({
max_completion_tokens: 50,
messages: [
{
content: 'You are a helpful assistant.',
role: 'system'
},
{
content: 'hello!',
role: 'user'
}
],
model: 'gemma-3-4b-it' // here you can pass model alias instead
})
});
const data = await response.json();
console.log(data);

Response:
{
"choices": [
{
"finish_reason": "stop",
"index": 0,
"message": {
"role": "assistant",
"content": "Hello there! How can I help you today? 😊 \n\nDo you have a question, need some information, or just want to chat? Let me know!"
}
}
],
"created": 1760625765,
"model": "gemma-3-4b-it",
"system_fingerprint": "b6620-b887d2f3",
"object": "chat.completion",
"usage": {
"completion_tokens": 34,
"prompt_tokens": 18,
"total_tokens": 52
},
"id": "chatcmpl-W7PglxNtJVdM3yAgUO3fvnR1uAyS8WU7",
"timings": {
"cache_n": 0,
"prompt_n": 18,
"prompt_ms": 26.931,
"prompt_per_token_ms": 1.4961666666666666,
"prompt_per_second": 668.3747354350005,
"predicted_n": 34,
"predicted_ms": 197.006,
"predicted_per_token_ms": 5.794294117647059,
"predicted_per_second": 172.583576134737
}
}

Uninstall
To uninstall the model, open the 'llamacpp' service model view, find the model ID gemma-3-4b-it, click "Uninstall", and then click "Remove custom model".
vLLM Compatible Models
You can install any model from HuggingFace that is supported by vLLM.
Use the vllm service.
Install
As an example, to add Qwen3-1.7B to your 'vllm' service:
- In the services view, locate the "vllm" service.
- Click "Models".
- Click "Add custom model".
- In the pop-up window, enter the Model ID: Qwen3-1.7B.
- Enter the HuggingFace model ID: Qwen/Qwen3-1.7B.
- Enter the Model Size: 4GB.
- Click "Add custom model". A new model tab will appear.
- Optionally, enter a model alias.
- Finally, click "Install".
- After a while, your model will appear in the models list with the green label "Installed".
Verify
To use Qwen/Qwen3-1.7B, call the POST v1/chat/completions endpoint, because it is a chat model. In the "model" field, you can pass the model alias instead of the model ID.
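The same chat request is used for every chat model in this guide, so it can be convenient to wrap it in a function. The sketch below uses only the Python standard library (urllib) instead of requests. The host name and API key are the placeholders used throughout this guide, and build_chat_payload, extract_reply, and chat are hypothetical names. It assumes an OpenAI-style response where the reply text is in choices[0].message.content, as in the example responses here.

```python
import json
import urllib.request

BASE_URL = "https://deepfellow-server-host"  # placeholder host from this guide
API_KEY = "DEEPFELLOW-PROJECT-API-KEY"  # placeholder project API key


def build_chat_payload(model: str, user_message: str,
                       system_message: str = "You are a helpful assistant.",
                       max_completion_tokens: int = 50) -> dict:
    """Build the request body used by the chat examples; model may be an ID or an alias."""
    return {
        "max_completion_tokens": max_completion_tokens,
        "messages": [
            {"content": system_message, "role": "system"},
            {"content": user_message, "role": "user"},
        ],
        "model": model,
    }


def extract_reply(response: dict) -> str:
    """Return the assistant text from the first choice of a chat completion."""
    return response["choices"][0]["message"]["content"]


def chat(model: str, user_message: str) -> str:
    """Send one chat completion request to DeepFellow and return the reply text."""
    request = urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(build_chat_payload(model, user_message)).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return extract_reply(json.load(response))


if __name__ == "__main__":
    # Offline demo: show the body that chat() would send.
    print(json.dumps(build_chat_payload("Qwen/Qwen3-1.7B", "hello!"), indent=2))
```

With a reachable DeepFellow server and a real API key, calling chat("Qwen/Qwen3-1.7B", "hello!") sends the request and returns only the assistant's text.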
curl -X 'POST' \
'https://deepfellow-server-host/v1/chat/completions' \
-H 'Authorization: Bearer DEEPFELLOW-PROJECT-API-KEY' \
-H 'Content-Type: application/json' \
-d '{
"max_completion_tokens": 50,
"messages": [
{
"content": "You are a helpful assistant.",
"role": "system"
},
{
"content": "hello!",
"role": "user"
}
],
"model": "Qwen/Qwen3-1.7B"
}'

import requests
response = requests.post(
'https://deepfellow-server-host/v1/chat/completions',
json={
"max_completion_tokens": 50,
"messages": [
{
"content": "You are a helpful assistant.",
"role": "system"
},
{
"content": "hello!",
"role": "user"
}
],
"model": "Qwen/Qwen3-1.7B" # here you can pass model alias instead
},
headers={
"Content-Type": "application/json",
"Authorization": "Bearer DEEPFELLOW-PROJECT-API-KEY"
}
)
data = response.json()
print(data)

const response = await fetch('https://deepfellow-server-host/v1/chat/completions', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
Authorization: 'Bearer DEEPFELLOW-PROJECT-API-KEY'
},
body: JSON.stringify({
max_completion_tokens: 50,
messages: [
{
content: 'You are a helpful assistant.',
role: 'system'
},
{
content: 'hello!',
role: 'user'
}
],
model: 'Qwen/Qwen3-1.7B' // here you can pass model alias instead
})
});
const data = await response.json();
console.log(data);

Response:
{
"choices": [
{
"finish_reason": "stop",
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! How can I help you?"
}
}
],
"created": 1760629812,
"model": "Qwen/Qwen3-1.7B",
"system_fingerprint": "b6620-b887d2f3",
"object": "chat.completion",
"usage": {
"completion_tokens": 34,
"prompt_tokens": 18,
"total_tokens": 52
},
"id": "chatcmpl-uZSktK05yOus1NDtftpT8wFkuV9jHNrN",
"timings": {
"cache_n": 17,
"prompt_n": 1,
"prompt_ms": 7.488,
"prompt_per_token_ms": 7.488,
"prompt_per_second": 133.54700854700855,
"predicted_n": 34,
"predicted_ms": 191.626,
"predicted_per_token_ms": 5.636058823529412,
"predicted_per_second": 177.42895014246503
}
}

Uninstall
To uninstall the model, open the 'vllm' service model view, find the model ID Qwen/Qwen3-1.7B, click "Uninstall", and then click "Remove custom model".
Stable-Diffusion Models
You can install any image generation model compatible with Stable Diffusion (e.g., from Civitai or HuggingFace).
Use the stable-diffusion service.
Install
As an example, to add SDVN6-RealXL to your 'stable-diffusion' service:
- In the services view, locate the "stable-diffusion" service.
- Click "Models".
- Click "Add custom model".
- In the pop-up window, enter the Model ID: SDVN6-RealXL.
- Choose the file type: Stable-diffusion.
- Enter the Model File URL (download link): https://civitai.com/api/download/models/134461?type=Model&format=SafeTensor&size=full&fp=fp16
- Enter the Model filename: sdvn6Realxl_detailface.safetensors
- Enter the Model Size: 4GB.
- Click "Add custom model". A new model tab will appear.
- Optionally, enter a model alias.
- Finally, click "Install".
- After a while, your model will appear in the models list with the green label "Installed".
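The Model File URL in this example is a Civitai download link: the number in the path (134461) identifies the model version, and the query string selects the file variant. Assuming Civitai's download endpoint keeps this layout, the sketch below rebuilds such a link from its parts with urllib.parse.urlencode, which can help when you switch to another version or precision. The helper name civitai_download_url is hypothetical; when in doubt, copy the exact link from the model page.

```python
from urllib.parse import urlencode


def civitai_download_url(version_id: int, **params: str) -> str:
    """Build a Civitai download link for a model version with optional file filters."""
    # urlencode() keeps the keyword order, e.g. type, format, size, fp.
    query = urlencode(params)
    base = f"https://civitai.com/api/download/models/{version_id}"
    return f"{base}?{query}" if query else base


if __name__ == "__main__":
    print(civitai_download_url(134461, type="Model", format="SafeTensor",
                               size="full", fp="fp16"))
```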
Verify
See the Image Generation guide to learn how to use stable-diffusion models.
Uninstall
To uninstall the model, open the 'stable-diffusion' service model view, find the model ID SDVN6-RealXL, click "Uninstall", and then click "Remove custom model".
Docker Image Service
You can connect any containerized model, application, or microservice: literally anything that provides a web API.
Install
For example, to add your custom application with the ID my-app that provides an /api endpoint:
- In the services view, locate the "custom" service.
- Click "Models".
- Click the "Add custom model" button.
- In the pop-up window, enter the Model ID: my-app.
- Check the "Model is private" box if you want to use our anonymization plugin to filter traffic to and from the app.
- Enter the Model endpoint, e.g., /api.
- Enter the Model Size.
- In Docker image, enter the ID of the image, e.g., company/my-app (you can find this ID in the registry, e.g., Docker Hub).
- Enter the Docker image port number, e.g., 8080.
- Optionally, configure the Docker command, Docker volumes, and Docker environment variables.
- Click "Add custom model". A new model tab will appear.
- Finally, click "Install".
- After a while, your model will appear in the models list with the green label "Installed".
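DeepFellow exposes the endpoint you configure for a custom service under the /custom prefix, so an endpoint of /abc is reached at /custom/abc. The sketch below expresses that mapping as a small helper that accepts the endpoint with or without a leading slash. The function name custom_service_url is hypothetical, and the host is the placeholder used throughout this guide.

```python
def custom_service_url(base_url: str, model_endpoint: str) -> str:
    """Map a custom service's configured endpoint (e.g. "/api") to its DeepFellow URL."""
    # Normalize slashes so "api", "/api", and a base URL ending in "/" all work.
    path = model_endpoint.lstrip("/")
    return f"{base_url.rstrip('/')}/custom/{path}"


if __name__ == "__main__":
    print(custom_service_url("https://deepfellow-server-host", "/api"))
```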

Verify
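Use the /custom/{path} endpoint to communicate with your container. For example, if you put /abc in the model endpoint configuration, use /custom/abc. The examples below call /custom/api for my-app; use whichever HTTP method your application expects: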
curl -X 'GET' \
'https://deepfellow-server-host/custom/api' \
-H 'Authorization: Bearer DEEPFELLOW-PROJECT-API-KEY' \
-H 'Content-Type: application/json'

import requests
response = requests.get(
'https://deepfellow-server-host/custom/api',
headers={
"Content-Type": "application/json",
"Authorization": "Bearer DEEPFELLOW-PROJECT-API-KEY"
}
)
data = response.json()
print(data)

const response = await fetch('https://deepfellow-server-host/custom/api', {
method: 'GET',
headers: {
'Content-Type': 'application/json',
Authorization: 'Bearer DEEPFELLOW-PROJECT-API-KEY'
}
});
const data = await response.json();
console.log(data);

Uninstall
To uninstall the model, open the 'custom' service model view, find the model ID my-app, click "Uninstall", and then click "Remove custom model".
OpenAI Models
DeepFellow allows you to add your custom models hosted by OpenAI, e.g., your fine-tuned models.
To create your own fine-tuned OpenAI model, read the supervised fine-tuning guide in the OpenAI documentation.
Install
- Go to your OpenAI Fine-tuning Dashboard.
- Select a model from the list.
- Copy the value in the "Output model" field, e.g., ft:gpt-3.5-turbo-0125:personal::95X23ObX.
- Open the DeepFellow Infra Web Panel.
- In the services view, locate the "openai" service.
- Click "Models".
- Click "Add custom model".
- In the pop-up window, paste the copied value into the Model ID field: ft:gpt-3.5-turbo-0125:personal::95X23ObX.
- Choose the Model type from the drop-down (e.g., llm).
- Check the boxes for the endpoints that the added model supports.
- Click "Add custom model". A new model tab will appear.
- Optionally, enter a model alias.
- Finally, click "Install".
- After a while, your model will appear in the models list with the green label "Installed".
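The copied "Output model" value is a colon-separated name. In the example ft:gpt-3.5-turbo-0125:personal::95X23ObX, the parts are the ft prefix, the base model, the owner (personal), an optional suffix (empty here, hence the double colon), and a unique identifier. Assuming this five-part layout, which follows OpenAI's naming for fine-tuned models, the sketch below splits the value so you can confirm it was copied completely before pasting it into the Model ID field. FineTunedModelId and parse_fine_tuned_id are hypothetical names.

```python
from typing import NamedTuple


class FineTunedModelId(NamedTuple):
    base_model: str
    owner: str
    suffix: str
    unique_id: str


def parse_fine_tuned_id(model_id: str) -> FineTunedModelId:
    """Split an OpenAI fine-tuned model name of the form ft:base:owner:suffix:id."""
    parts = model_id.split(":")
    if len(parts) != 5 or parts[0] != "ft":
        raise ValueError(f"not a fine-tuned model ID: {model_id!r}")
    _, base_model, owner, suffix, unique_id = parts
    # The suffix may be empty; the base model and identifier may not.
    if not base_model or not unique_id:
        raise ValueError(f"incomplete fine-tuned model ID: {model_id!r}")
    return FineTunedModelId(base_model, owner, suffix, unique_id)


if __name__ == "__main__":
    print(parse_fine_tuned_id("ft:gpt-3.5-turbo-0125:personal::95X23ObX"))
```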

Verify
The example model's ID is ft:gpt-3.5-turbo-0125:personal::95X23ObX, its type is llm, and it supports /v1/chat/completions. To verify it, send a chat completion request:
curl -X 'POST' \
'https://deepfellow-server-host/v1/chat/completions' \
-H 'Authorization: Bearer DEEPFELLOW-PROJECT-API-KEY' \
-H 'Content-Type: application/json' \
-d '{
"max_completion_tokens": 50,
"messages": [
{
"content": "You are a helpful assistant.",
"role": "system"
},
{
"content": "hello!",
"role": "user"
}
],
"model": "ft:gpt-3.5-turbo-0125:personal::95X23ObX"
}'

import requests
response = requests.post(
'https://deepfellow-server-host/v1/chat/completions',
json={
"max_completion_tokens": 50,
"messages": [
{
"content": "You are a helpful assistant.",
"role": "system"
},
{
"content": "hello!",
"role": "user"
}
],
"model": "ft:gpt-3.5-turbo-0125:personal::95X23ObX" # here you can pass model alias instead
},
headers={
"Content-Type": "application/json",
"Authorization": "Bearer DEEPFELLOW-PROJECT-API-KEY"
}
)
data = response.json()
print(data)

const response = await fetch('https://deepfellow-server-host/v1/chat/completions', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
Authorization: 'Bearer DEEPFELLOW-PROJECT-API-KEY'
},
body: JSON.stringify({
max_completion_tokens: 50,
messages: [
{
content: 'You are a helpful assistant.',
role: 'system'
},
{
content: 'hello!',
role: 'user'
}
],
model: 'ft:gpt-3.5-turbo-0125:personal::95X23ObX' // here you can pass model alias instead
})
});
const data = await response.json();
console.log(data);

Response:
{
"id": "chatcmpl-CRg4goc3tD0uFr3lPdZi2ttaGBA8c",
"object": "chat.completion",
"created": 1760712626,
"model": "ft:gpt-3.5-turbo-0125:personal::95X23ObX",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! How can I assist you today?",
"refusal": null,
"annotations": []
},
"logprobs": null,
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 19,
"completion_tokens": 9,
"total_tokens": 28,
"prompt_tokens_details": {
"cached_tokens": 0,
"audio_tokens": 0
},
"completion_tokens_details": {
"reasoning_tokens": 0,
"audio_tokens": 0,
"accepted_prediction_tokens": 0,
"rejected_prediction_tokens": 0
}
},
"service_tier": "default",
"system_fingerprint": null
}

Uninstall
To uninstall the model, open the 'openai' service model view, find your model ID, click "Uninstall", and then click "Remove custom model".
Google AI Models
DeepFellow Infra offers a variety of models that you can install from the 'google' service view in the Infra Web Panel. If you need a model that is not listed there, you can add it as a custom model.
Install
- In the services view, locate the "google" service.
- Click "Models".
- Click "Add custom model".
- In the pop-up window, enter the ID of the Google model you want to add in the Model ID field.
- Choose the Model type from the drop-down (e.g., llm).
- Check the boxes for the endpoints that the added model supports.
- Click "Add custom model". A new model tab will appear.
- Optionally, enter a model alias.
- Finally, click "Install".
- After a while, your model will appear in the models list with the green label "Installed".
Verify
Assuming your model is compatible with /v1/chat/completions, verify it as follows, replacing MODEL-ID with your model ID or alias:
curl -X 'POST' \
'https://deepfellow-server-host/v1/chat/completions' \
-H 'Authorization: Bearer DEEPFELLOW-PROJECT-API-KEY' \
-H 'Content-Type: application/json' \
-d '{
"max_completion_tokens": 50,
"messages": [
{
"content": "You are a helpful assistant.",
"role": "system"
},
{
"content": "hello!",
"role": "user"
}
],
"model": "MODEL-ID"
}'

import requests
response = requests.post(
'https://deepfellow-server-host/v1/chat/completions',
json={
"max_completion_tokens": 50,
"messages": [
{
"content": "You are a helpful assistant.",
"role": "system"
},
{
"content": "hello!",
"role": "user"
}
],
"model": "MODEL-ID" # here you can pass model alias instead
},
headers={
"Content-Type": "application/json",
"Authorization": "Bearer DEEPFELLOW-PROJECT-API-KEY"
}
)
data = response.json()
print(data)

const response = await fetch('https://deepfellow-server-host/v1/chat/completions', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
Authorization: 'Bearer DEEPFELLOW-PROJECT-API-KEY'
},
body: JSON.stringify({
max_completion_tokens: 50,
messages: [
{
content: 'You are a helpful assistant.',
role: 'system'
},
{
content: 'hello!',
role: 'user'
}
],
model: 'MODEL-ID' // here you can pass model alias instead
})
});
const data = await response.json();
console.log(data);

You will get a response similar to this:
{
"id": "chatcmpl-CRg4goc3tD0uFr3lPdZi2ttaGBA8c",
"object": "chat.completion",
"created": 1760712628,
"model": "MODEL-ID",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! How can I assist you today?",
"refusal": null,
"annotations": []
},
"logprobs": null,
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 19,
"completion_tokens": 9,
"total_tokens": 28,
"prompt_tokens_details": {
"cached_tokens": 0,
"audio_tokens": 0
},
"completion_tokens_details": {
"reasoning_tokens": 0,
"audio_tokens": 0,
"accepted_prediction_tokens": 0,
"rejected_prediction_tokens": 0
}
},
"service_tier": "default",
"system_fingerprint": null
}

Uninstall
To uninstall the model, open the 'google' service model view, find your model ID, click "Uninstall", and then click "Remove custom model".