Text to Speech
Turn text into spoken audio.
Quickstart
DeepFellow provides the v1/audio/speech endpoint, which produces spoken audio in multiple languages.
The v1/audio/speech endpoint takes three key inputs:
- the model you want to use
- the text you want to turn into audio
- the voice that will speak (the available voices depend on the model you use for audio tasks)
Example request:
curl -X POST \
  "https://deepfellow-server-host/v1/audio/speech" \
  -H "Authorization: Bearer DEEPFELLOW-PROJECT-API-KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "speaches-ai/piper-en_US-ryan-high",
    "voice": "ryan",
    "input": "Today is a wonderful day to build something people love!",
    "instructions": "Speak in a cheerful and positive tone."
  }' \
  --output speech.mp3

Python:
from pathlib import Path
from openai import OpenAI
client = OpenAI(
    base_url="https://deepfellow-server-host/v1",
    api_key="DEEPFELLOW-PROJECT-API-KEY",
)

speech_file_path = Path(__file__).parent / "speech.mp3"

with client.audio.speech.with_streaming_response.create(
    model="speaches-ai/piper-en_US-ryan-high",
    voice="ryan",
    input="Today is a wonderful day to build something people love!",
    instructions="Speak in a cheerful and positive tone.",
) as response:
    response.stream_to_file(speech_file_path)

JavaScript:
import * as fs from 'fs';
import OpenAI from 'openai';
const client = new OpenAI({
  baseURL: 'https://deepfellow-server-host/v1',
  apiKey: 'DEEPFELLOW-PROJECT-API-KEY'
});

const response = await client.audio.speech.create({
  model: 'speaches-ai/piper-en_US-ryan-high',
  voice: 'ryan',
  input: 'Today is a wonderful day to build something people love!',
  instructions: 'Speak in a cheerful and positive tone.'
});

const buffer = Buffer.from(await response.arrayBuffer());
await fs.promises.writeFile('speech.mp3', buffer);

The output is mp3 by default, but you can request any other supported format.
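In the OpenAI speech API, the output format is chosen with a `response_format` field. The sketch below assumes DeepFellow accepts the same field and that the format identifiers are the lowercase names of the formats listed under Supported Formats. `build_speech_payload` is a hypothetical helper written for this example, not part of any SDK.

```python
# Assumption: format identifiers mirror the OpenAI speech API.
SUPPORTED_FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}


def build_speech_payload(model: str, voice: str, text: str,
                         response_format: str = "mp3") -> dict:
    """Return a JSON-ready request body for POST /v1/audio/speech."""
    fmt = response_format.lower()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    return {
        "model": model,
        "voice": voice,
        "input": text,
        "response_format": fmt,  # assumed to be honored by the server
    }


payload = build_speech_payload(
    model="speaches-ai/piper-en_US-ryan-high",
    voice="ryan",
    text="Today is a wonderful day to build something people love!",
    response_format="wav",
)
```

With the Python client from the quickstart, the same dictionary can be unpacked into the call, for example `client.audio.speech.with_streaming_response.create(**payload)`. Remember to give the output file a matching extension.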
Supported Formats
- MP3: The default response format for general use cases.
- Opus: Low-latency format for streaming and communication.
- AAC: Format for digital compression, preferred by popular platforms, e.g. YouTube.
- FLAC: Lossless compression.
- WAV: Uncompressed WAV audio, suitable for low-latency applications to avoid decoding overhead.
- PCM: Similar to WAV, but contains only the raw samples at 24 kHz (16-bit signed, little-endian), without the header.
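Because PCM output has no header, most players cannot open it directly. A minimal sketch of wrapping such a stream in a WAV container with Python's standard `wave` module follows. The sample rate and bit depth come from the list above; the mono channel count is an assumption, and `pcm_to_wav` is a hypothetical helper name.

```python
import io
import wave

SAMPLE_RATE_HZ = 24_000   # stated in the format list
SAMPLE_WIDTH_BYTES = 2    # 16-bit signed samples
CHANNELS = 1              # assumption: the service returns mono audio


def pcm_to_wav(pcm: bytes) -> bytes:
    """Wrap raw 16-bit PCM samples in a WAV container and return the bytes."""
    buf = io.BytesIO()
    # Note: the wave module expects samples in the host's native byte order,
    # which matches little-endian PCM on x86 and typical ARM machines.
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav.setframerate(SAMPLE_RATE_HZ)
        wav.writeframes(pcm)
    return buf.getvalue()


# Half a second of silence stands in for a real PCM response body.
silence = b"\x00\x00" * (SAMPLE_RATE_HZ // 2)
wav_bytes = pcm_to_wav(silence)
```

In practice, `pcm` would be the body of a speech response requested in PCM format, and `wav_bytes` could be written to a `.wav` file for playback.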
To read more about text to speech, visit the OpenAI API documentation.