Speech to Text
Turn audio into text.
Quickstart
DeepFellow provides two speech-to-text endpoints: transcriptions, which transcribes audio in its source language, and translations, which transcribes audio and translates it into English.
Currently supported input formats are: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, and webm.
Transcriptions
The transcriptions endpoint takes an input audio file and a desired output format.
Supported input formats are: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, and webm.
Supported output formats are: json, text, srt, verbose_json, and vtt.
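The input-format list above can be checked client-side before uploading. The helper below is illustrative only (it is not part of the API) and assumes the file extension reflects the actual container format:

```python
from pathlib import Path

# Input formats listed above; this helper is illustrative, not part of the API.
SUPPORTED_INPUT_FORMATS = {"flac", "mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm"}

def is_supported_audio(path: str) -> bool:
    """Return True if the file extension matches a documented input format."""
    return Path(path).suffix.lstrip(".").lower() in SUPPORTED_INPUT_FORMATS

print(is_supported_audio("audio.mp3"))  # True
```

Note that this checks only the extension, not the file contents.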
Example request to transcribe audio:
curl -X 'POST' \
  'https://deepfellow-server-host/v1/audio/transcriptions' \
  -H 'Authorization: Bearer DEEPFELLOW-PROJECT-API-KEY' \
  -H 'Content-Type: multipart/form-data' \
  -F 'file=@audio.mp3;type=audio/mpeg' \
  -F 'model=Systran/faster-whisper-medium'

Python:

from openai import OpenAI
client = OpenAI(
    base_url="https://deepfellow-server-host/v1",
    api_key="DEEPFELLOW-PROJECT-API-KEY",
)

audio_file = open("audio.mp3", "rb")
transcription = client.audio.transcriptions.create(
    model="Systran/faster-whisper-medium",
    file=audio_file,
)
print(transcription.model_dump_json())

Node.js:

import * as fs from 'fs';
import OpenAI from 'openai';
import { toFile } from 'openai/uploads';
const client = new OpenAI({
  baseURL: 'https://deepfellow-server-host/v1',
  apiKey: 'DEEPFELLOW-PROJECT-API-KEY'
});

const audioBuffer = await fs.promises.readFile('audio.mp3');
const audioFile = await toFile(audioBuffer, 'audio.mp3', {
  type: 'audio/mpeg'
});

const transcription = await client.audio.transcriptions.create({
  model: 'Systran/faster-whisper-medium',
  file: audioFile
});
console.log(transcription);

Response:

{ "text": "Hello, how are you?" }

See DeepFellow API Reference for the list of all available parameters.
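Assuming the server mirrors the OpenAI transcription API, as the client examples above do, the output format is selected with the response_format parameter. The helper below is a hypothetical convenience, not part of the API; it validates the format locally and assembles keyword arguments for client.audio.transcriptions.create():

```python
# Output formats documented above.
SUPPORTED_RESPONSE_FORMATS = {"json", "text", "srt", "verbose_json", "vtt"}

def build_transcription_params(model: str, response_format: str = "json") -> dict:
    """Validate the requested output format and assemble request parameters.

    A typo such as "subrip" fails fast here instead of as a server error.
    """
    if response_format not in SUPPORTED_RESPONSE_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    return {"model": model, "response_format": response_format}

params = build_transcription_params("Systran/faster-whisper-medium", "srt")
# The result can be splatted into the request:
# client.audio.transcriptions.create(file=audio_file, **params)
```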
Translations
The translations endpoint works like the transcriptions endpoint, but it takes an input audio file in any of the supported languages and produces an English transcription.
Example translation request:
curl -X 'POST' \
  'https://deepfellow-server-host/v1/audio/translations' \
  -H 'Authorization: Bearer DEEPFELLOW-PROJECT-API-KEY' \
  -H 'Content-Type: multipart/form-data' \
  -F 'file=@audio.mp3;type=audio/mpeg' \
  -F 'model=Systran/faster-whisper-medium'

Python:

from openai import OpenAI
client = OpenAI(
    api_key="DEEPFELLOW-PROJECT-API-KEY",
    base_url="https://deepfellow-server-host/v1",
)

audio_file = open("audio.mp3", "rb")
translation = client.audio.translations.create(
    model="Systran/faster-whisper-medium",
    file=audio_file,
)
print(translation.model_dump_json())

Node.js:

import * as fs from 'fs';
import OpenAI from 'openai';
import { toFile } from 'openai/uploads';
const client = new OpenAI({
  baseURL: 'https://deepfellow-server-host/v1',
  apiKey: 'DEEPFELLOW-PROJECT-API-KEY'
});

const audioBuffer = await fs.promises.readFile('audio.mp3');
const audioFile = await toFile(audioBuffer, 'audio.mp3', {
  type: 'audio/mpeg'
});

const translation = await client.audio.translations.create({
  model: 'Systran/faster-whisper-medium',
  file: audioFile
});
console.log(translation);

Response:

{ "text": "Some text translated to English." }

The supported languages will depend on the model you choose.
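With the default json output format, both endpoints return a body with a single text field, as in the responses shown above, so the transcribed or translated text can be extracted with the standard library (a minimal sketch):

```python
import json

def extract_text(response_body: str) -> str:
    """Return the "text" field from a json-format response body."""
    return json.loads(response_body)["text"]

print(extract_text('{ "text": "Some text translated to English." }'))
# Some text translated to English.
```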
To read more about speech to text, visit OpenAI API Documentation.