Speech to Text
Turn audio into text.
Quickstart
DeepFellow provides two speech-to-text endpoints: transcriptions, which transcribes audio in its source language, and translations, which transcribes audio and translates it into English.
Currently supported input formats are: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, and webm.
Transcriptions
The transcriptions endpoint takes an input audio file and a desired output format.
Supported input formats are: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, and webm.
Supported output formats are: json, text, srt, verbose_json, and vtt.
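The input-format list above can be checked client-side before uploading. The helper below is illustrative only (it is not part of the API) and assumes the file extension reflects the actual container format:

```python
from pathlib import Path

# Input formats listed above; this helper is illustrative, not part of the API.
SUPPORTED_INPUT_FORMATS = {"flac", "mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm"}

def is_supported_audio(path: str) -> bool:
    """Return True if the file extension matches a documented input format."""
    return Path(path).suffix.lstrip(".").lower() in SUPPORTED_INPUT_FORMATS

print(is_supported_audio("audio.mp3"))  # True
```

Note that this checks only the extension, not the file contents.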
Example request to transcribe audio:
curl -X 'POST' \
  'https://deepfellow-server-host/v1/audio/transcriptions' \
  -H 'Authorization: Bearer DEEPFELLOW-PROJECT-API-KEY' \
  -H 'Content-Type: multipart/form-data' \
  -F 'file=@audio.mp3;type=audio/mpeg' \
  -F 'model=Systran/faster-whisper-medium'

Python:

from openai import OpenAI
client = OpenAI(
    base_url="https://deepfellow-server-host/v1",
    api_key="DEEPFELLOW-PROJECT-API-KEY",
)

audio_file = open("audio.mp3", "rb")
transcription = client.audio.transcriptions.create(
    model="Systran/faster-whisper-medium",
    file=audio_file,
)
print(transcription.model_dump_json())

Node.js:

import * as fs from 'fs';
import OpenAI from 'openai';
import { toFile } from 'openai/uploads';
const client = new OpenAI({
  baseURL: 'https://deepfellow-server-host/v1',
  apiKey: 'DEEPFELLOW-PROJECT-API-KEY'
});

const audioBuffer = await fs.promises.readFile('audio.mp3');
const audioFile = await toFile(audioBuffer, 'audio.mp3', {
  type: 'audio/mpeg'
});

const transcription = await client.audio.transcriptions.create({
  model: 'Systran/faster-whisper-medium',
  file: audioFile
});
console.log(transcription);

Response:

{ "text": "Hello, how are you?" }

See DeepFellow API Reference for the list of all available parameters.
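Assuming the server mirrors the OpenAI transcription API, as the client examples above do, the output format is selected with the response_format parameter. The helper below is a hypothetical convenience, not part of the API; it validates the format locally and assembles keyword arguments for client.audio.transcriptions.create():

```python
# Output formats documented above.
SUPPORTED_RESPONSE_FORMATS = {"json", "text", "srt", "verbose_json", "vtt"}

def build_transcription_params(model: str, response_format: str = "json") -> dict:
    """Validate the requested output format and assemble request parameters.

    A typo such as "subrip" fails fast here instead of as a server error.
    """
    if response_format not in SUPPORTED_RESPONSE_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    return {"model": model, "response_format": response_format}

params = build_transcription_params("Systran/faster-whisper-medium", "srt")
# The result can be splatted into the request:
# client.audio.transcriptions.create(file=audio_file, **params)
```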
Translations
The translations endpoint works like the transcriptions endpoint, but it takes an input audio file in any of the supported languages and produces an English transcription.
Example translation request:
curl -X 'POST' \
  'https://deepfellow-server-host/v1/audio/translations' \
  -H 'Authorization: Bearer DEEPFELLOW-PROJECT-API-KEY' \
  -H 'Content-Type: multipart/form-data' \
  -F 'file=@audio.mp3;type=audio/mpeg' \
  -F 'model=Systran/faster-whisper-medium'

Python:

from openai import OpenAI
client = OpenAI(
    api_key="DEEPFELLOW-PROJECT-API-KEY",
    base_url="https://deepfellow-server-host/v1",
)

audio_file = open("audio.mp3", "rb")
translation = client.audio.translations.create(
    model="Systran/faster-whisper-medium",
    file=audio_file,
)
print(translation.model_dump_json())

Node.js:

import * as fs from 'fs';
import OpenAI from 'openai';
import { toFile } from 'openai/uploads';
const client = new OpenAI({
  baseURL: 'https://deepfellow-server-host/v1',
  apiKey: 'DEEPFELLOW-PROJECT-API-KEY'
});

const audioBuffer = await fs.promises.readFile('audio.mp3');
const audioFile = await toFile(audioBuffer, 'audio.mp3', {
  type: 'audio/mpeg'
});

const translation = await client.audio.translations.create({
  model: 'Systran/faster-whisper-medium',
  file: audioFile
});
console.log(translation);

Response:

{ "text": "Some text translated to English." }

The supported languages will depend on the model you choose.
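With the default json output format, both endpoints return a body with a single text field, as in the responses shown above, so the transcribed or translated text can be extracted with the standard library (a minimal sketch):

```python
import json

def extract_text(response_body: str) -> str:
    """Return the "text" field from a json-format response body."""
    return json.loads(response_body)["text"]

print(extract_text('{ "text": "Some text translated to English." }'))
# Some text translated to English.
```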
To read more about speech to text, visit OpenAI API Documentation.