A technical guide to integrating leading AI models through Oblion's unified API.
Python Speech-to-Text Guide
Overview
The audio API provides two main endpoints:
- Transcription: converts audio to text in its original language.
- Translation: converts audio in any language to English text.
Supported Formats
- Maximum file size: 25 MB
- Supported formats: mp3, mp4, mpeg, mpg, m4a, wav, webm
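Checking these limits locally before uploading avoids a failed request. The following is a minimal sketch, not part of the API: the helper name `check_audio_file` is hypothetical, and it assumes "25 MB" means 25 × 1024 × 1024 bytes, which the service may count differently.

```python
from pathlib import Path
from typing import List, Optional

# Assumption: "25 MB" is interpreted as 25 MiB; adjust if the service counts differently.
MAX_BYTES = 25 * 1024 * 1024
ALLOWED_EXTENSIONS = {"mp3", "mp4", "mpeg", "mpg", "m4a", "wav", "webm"}


def check_audio_file(path: str, size_bytes: Optional[int] = None) -> List[str]:
    """Return a list of problems; an empty list means the file looks uploadable.

    size_bytes can be passed explicitly; otherwise the size is read from disk.
    """
    p = Path(path)
    problems = []
    # Compare the extension case-insensitively, without the leading dot
    ext = p.suffix.lower().lstrip(".")
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    size = p.stat().st_size if size_bytes is None else size_bytes
    if size > MAX_BYTES:
        problems.append(f"file too large: {size} bytes")
    return problems
```

The check looks only at the file extension, not at the actual audio encoding, so a mislabeled file can still be rejected by the server.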
Usage
1. Transcription
Convert audio into text in its original language.
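from openai import OpenAI

key = "YOUR_API_KEY"  # placeholder: replace with your API key

client = OpenAI(
    base_url="https://api.oblion.io/v1",
    api_key=key,
)

# Basic transcription
with open("/path/to/file/audio.mp3", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )
print(transcription.text)

# Specify the output format.
# The file is reopened because the first request already read it to the end.
with open("/path/to/file/audio.mp3", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="text",
    )
# With response_format="text" the result is a plain string
print(transcription)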
2. Translation
Convert audio in any language into English text.
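from openai import OpenAI

key = "YOUR_API_KEY"  # placeholder: replace with your API key

client = OpenAI(
    base_url="https://api.oblion.io/v1",
    api_key=key,
)

# Open the file in binary mode; it is closed once the request completes
with open("/path/to/file/german.mp3", "rb") as audio_file:
    translation = client.audio.translations.create(
        model="whisper-1",
        file=audio_file,
    )
print(translation.text)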
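3. Timestamp Feature
Get word-level timestamps by requesting the verbose_json response format and setting timestamp_granularities.
from openai import OpenAI

key = "YOUR_API_KEY"  # placeholder: replace with your API key

client = OpenAI(
    base_url="https://api.oblion.io/v1",
    api_key=key,
)

with open("speech.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        file=audio_file,
        model="whisper-1",
        response_format="verbose_json",  # required for timestamps
        timestamp_granularities=["word"],
    )
print(transcript.words)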
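Word-level timestamps are often post-processed into phrases, for example to build captions. The sketch below is illustrative only: `Word` is a stand-in for one entry of `transcript.words` and assumes each entry exposes `word`, `start`, and `end` (in seconds), as the OpenAI SDK's verbose JSON response does. The `group_words` helper and its 0.6-second pause threshold are hypothetical choices.

```python
from dataclasses import dataclass


@dataclass
class Word:
    # Stand-in for one entry of transcript.words (assumed fields: word, start, end)
    word: str
    start: float
    end: float


def group_words(words, max_gap=0.6):
    """Group consecutive words into phrases.

    A new phrase starts whenever the pause between two words exceeds
    max_gap seconds. Returns a list of (start, end, text) tuples.
    """
    phrases = []
    current = []
    for w in words:
        if current and w.start - current[-1].end > max_gap:
            phrases.append(current)
            current = []
        current.append(w)
    if current:
        phrases.append(current)
    return [
        (p[0].start, p[-1].end, " ".join(w.word for w in p))
        for p in phrases
    ]


sample = [
    Word("Good", 0.0, 0.3),
    Word("morning", 0.35, 0.8),
    Word("everyone", 2.0, 2.6),
]
for start, end, text in group_words(sample):
    print(f"[{start:.2f}-{end:.2f}] {text}")
```

With a real response, `group_words(transcript.words)` should work unchanged as long as the entries carry those three attributes.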
4. Handling Large Files
Files larger than 25 MB must be split before upload. PyDub can slice audio by time:
from pydub import AudioSegment

song = AudioSegment.from_mp3("good_morning.mp3")

# PyDub measures time in milliseconds
ten_minutes = 10 * 60 * 1000

# Keep the first 10 minutes and export them as a separate file
first_10_minutes = song[:ten_minutes]
first_10_minutes.export("good_morning_10.mp3", format="mp3")
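The snippet above exports only the first segment. To cover a whole recording, the same slicing can be repeated over consecutive windows. This is a sketch under assumptions: the helper names `chunk_ranges` and `split_audio`, the output naming scheme, and the 10-minute default are illustrative, not part of PyDub or the API.

```python
def chunk_ranges(total_ms: int, chunk_ms: int):
    """Return (start, end) millisecond pairs that cover [0, total_ms) in order."""
    if chunk_ms <= 0:
        raise ValueError("chunk_ms must be positive")
    return [
        (start, min(start + chunk_ms, total_ms))
        for start in range(0, total_ms, chunk_ms)
    ]


def split_audio(path: str, chunk_ms: int = 10 * 60 * 1000, prefix: str = "chunk"):
    """Export every chunk of an MP3 as prefix_000.mp3, prefix_001.mp3, ..."""
    # Imported here so chunk_ranges can be used without PyDub installed
    from pydub import AudioSegment

    audio = AudioSegment.from_mp3(path)
    names = []
    # len() of an AudioSegment is its duration in milliseconds
    for i, (start, end) in enumerate(chunk_ranges(len(audio), chunk_ms)):
        name = f"{prefix}_{i:03d}.mp3"
        audio[start:end].export(name, format="mp3")
        names.append(name)
    return names
```

A fixed duration does not guarantee a file under 25 MB at high bitrates, so it is worth checking the exported sizes. Cutting at fixed times can also split a word in half, which may affect accuracy at chunk boundaries.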
Optimization Tips
The prompt parameter can help with the following:
- Correcting the recognition of specific words or terms.
- Preserving contextual coherence, for example across the segments of a split file.
- Controlling punctuation in the output.
- Preserving filler words.
- Controlling the output text style (e.g., Simplified vs. Traditional Chinese).
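As an illustration of the first two uses, a prompt can combine the end of the preceding segment's transcript with a list of expected terms. This is a hedged sketch: `build_prompt`, `transcribe_with_prompt`, the "Glossary:" wording, and the 800-character cutoff are all hypothetical choices, and how strongly the model follows a prompt is not guaranteed.

```python
def build_prompt(terms, previous_text="", max_chars=800):
    """Combine the tail of the preceding transcript with a glossary of expected terms.

    max_chars is an arbitrary cutoff chosen for this sketch.
    """
    parts = []
    if previous_text:
        # Keep only the end of the previous transcript, closest to the new audio
        parts.append(previous_text[-max_chars:].strip())
    if terms:
        parts.append("Glossary: " + ", ".join(terms) + ".")
    return " ".join(parts)


def transcribe_with_prompt(client, path, prompt):
    """Send one file to the transcription endpoint with a prompt attached.

    client is an OpenAI client configured as in the Usage section.
    """
    with open(path, "rb") as audio_file:
        return client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
            prompt=prompt,
        )
```

For a split recording, the text returned for one segment could be passed as `previous_text` when building the prompt for the next segment.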
Supported Languages
The API supports 98 languages, including:
- Major Asian languages: Chinese, Japanese, Korean, and more.
- European languages: English, French, German, and more.
- Other languages: Arabic, Hindi, and more.
Note: Only languages with a Word Error Rate (WER) below 50% are listed above. Other languages are supported, but quality may be lower.
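For reference, WER is conventionally computed as (substitutions + deletions + insertions) divided by the number of words in the reference transcript. The following is a minimal sketch of that standard definition using word-level edit distance. It is not necessarily how the figures behind the 50% threshold were measured, and it applies no text normalization (case, punctuation), which real evaluations usually do.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.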