A technical guide to integrating leading AI models through Oblion's unified API.

Python Speech-to-Text Guide

Overview

The audio API provides two main endpoints:

  • Transcription: Audio to text
  • Translation: Audio translated into English text

Supported Formats

  • File size: 25 MB maximum
  • Supported formats: mp3, mp4, mpeg, mpg, m4a, wav, webm
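The limits above can be checked locally before uploading. The following is a minimal sketch: the helper name `check_audio_file` is hypothetical, and it assumes "25 MB" means 25 * 1024 * 1024 bytes, which this page does not confirm.

```python
from pathlib import Path

# Limits copied from the list above.
# Assumption: "25 MB" is read as 25 * 1024 * 1024 bytes.
MAX_BYTES = 25 * 1024 * 1024
SUPPORTED_EXTENSIONS = {"mp3", "mp4", "mpeg", "mpg", "m4a", "wav", "webm"}


def check_audio_file(name: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file passes both checks."""
    problems = []
    # Compare the extension case-insensitively and without the leading dot.
    extension = Path(name).suffix.lower().lstrip(".")
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {extension or 'none'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"file is {size_bytes} bytes, limit is {MAX_BYTES}")
    return problems
```

For a file on disk, the two arguments can be taken from `Path(path).name` and `Path(path).stat().st_size`. The check looks only at the file extension, not at the actual audio encoding.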

Usage

1. Transcription

Convert audio into text in its original language.

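from openai import OpenAI

key = "YOUR_API_KEY"  # placeholder: replace with your Oblion API key

client = OpenAI(
    base_url="https://api.oblion.io/v1",
    api_key=key
)

# Basic transcription
with open("/path/to/file/audio.mp3", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file
    )
print(transcription.text)

# Specify the output format. The file is reopened so that the upload
# starts from the first byte of the audio.
with open("/path/to/file/audio.mp3", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="text"
    )
# With response_format="text" the result is a plain string
print(transcription)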

2. Translation

Convert audio in any language into English text.

from openai import OpenAI

key = "YOUR_API_KEY"  # placeholder: replace with your Oblion API key

client = OpenAI(
    base_url="https://api.oblion.io/v1",
    api_key=key
)

# Translate German speech into English text
with open("/path/to/file/german.mp3", "rb") as audio_file:
    translation = client.audio.translations.create(
        model="whisper-1",
        file=audio_file
    )
print(translation.text)

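3. Timestamps

Request word-level timestamps by setting response_format to "verbose_json" and timestamp_granularities to ["word"].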

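from openai import OpenAI

key = "YOUR_API_KEY"  # placeholder: replace with your Oblion API key

client = OpenAI(
    base_url="https://api.oblion.io/v1",
    api_key=key
)

with open("speech.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        file=audio_file,
        model="whisper-1",
        # Word timestamps require the verbose_json response format
        response_format="verbose_json",
        timestamp_granularities=["word"]
    )

# Each entry carries the word with its start and end time
print(transcript.words)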
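The word list can be turned into timed caption lines. The sketch below assumes each entry exposes `word`, `start`, and `end` attributes, with times in seconds; the helper names and the `sample_words` stand-in data are hypothetical and are used only so the example runs without calling the API.

```python
from types import SimpleNamespace


def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    minutes, rest = divmod(total_ms, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{minutes:02d}:{secs:02d}.{ms:03d}"


def words_to_lines(words, max_words=8):
    """Group word entries into lines of at most max_words words each."""
    lines = []
    for i in range(0, len(words), max_words):
        group = words[i:i + max_words]
        # A line spans from the start of its first word to the end of its last.
        start = format_timestamp(group[0].start)
        end = format_timestamp(group[-1].end)
        text = " ".join(w.word for w in group)
        lines.append(f"[{start} - {end}] {text}")
    return lines


# Stand-in data shaped like the assumed word entries (not real API output).
sample_words = [
    SimpleNamespace(word="Good", start=0.0, end=0.32),
    SimpleNamespace(word="morning", start=0.32, end=0.8),
    SimpleNamespace(word="everyone", start=0.8, end=1.4),
]

for line in words_to_lines(sample_words, max_words=2):
    print(line)
```

With a real response, `words_to_lines(transcript.words)` would take the place of the stand-in data.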

4. Handling Large Files

Files larger than 25 MB must be split into smaller segments before upload, for example with PyDub:

from pydub import AudioSegment

song = AudioSegment.from_mp3("good_morning.mp3")

# PyDub measures time in milliseconds
ten_minutes = 10 * 60 * 1000

# Export the first 10 minutes as a separate file
first_10_minutes = song[:ten_minutes]
first_10_minutes.export("good_morning_10.mp3", format="mp3")
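To cover an entire recording rather than only its first segment, the slice boundaries can be computed up front. This is a minimal sketch: `chunk_bounds` is a hypothetical helper, and the output file names in the comment are illustrative.

```python
def chunk_bounds(total_ms: int, chunk_ms: int = 10 * 60 * 1000) -> list[tuple[int, int]]:
    """Return (start, end) offsets in milliseconds that cover the whole recording."""
    if chunk_ms <= 0:
        raise ValueError("chunk_ms must be positive")
    return [
        # The last chunk is clipped to the end of the recording.
        (start, min(start + chunk_ms, total_ms))
        for start in range(0, total_ms, chunk_ms)
    ]


# With PyDub, len(song) is the duration in milliseconds, so each pair can be
# used directly as a slice (file names below are illustrative):
#
#   for i, (start, end) in enumerate(chunk_bounds(len(song))):
#       song[start:end].export(f"good_morning_part{i}.mp3", format="mp3")
```

Fixed-length cuts can land in the middle of a sentence, so the text around each boundary may need manual review after transcription.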

Optimization Tips

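The prompt parameter can improve transcription quality in several ways:

  1. Correct the recognition of specific words, such as product names or acronyms.
  2. Preserve context across segments, for example by passing the transcript of the preceding segment of a split file.
  3. Control punctuation in the output.
  4. Preserve filler words.
  5. Control the output writing style (e.g., Simplified vs. Traditional Chinese).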
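As one way to apply the first and last uses above, a prompt can be assembled from a list of domain terms plus a short sample of the desired style. The helper `build_prompt` and the "Glossary:" layout are hypothetical illustrations, not a format required by the API.

```python
def build_prompt(terms, style_example=""):
    """Build a prompt string from domain terms and an optional style sample."""
    parts = []
    if terms:
        # Listing the expected spellings nudges the model toward them.
        parts.append("Glossary: " + ", ".join(terms) + ".")
    if style_example:
        # A short sample in the desired style (punctuation, fillers, script).
        parts.append(style_example.strip())
    return " ".join(parts)


prompt = build_prompt(["Oblion", "PyDub"], "Umm, let me think, like, hmm.")
print(prompt)

# The resulting string would be passed through the prompt parameter, e.g.:
#
#   client.audio.transcriptions.create(
#       model="whisper-1",
#       file=audio_file,
#       prompt=prompt
#   )
```

The prompt only influences the output and does not guarantee it, so results are worth spot-checking against the audio.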

Supported Languages

The API supports 98 languages, including:

  • Major Asian languages: Chinese, Japanese, Korean, and more.
  • European languages: English, French, German, and more.
  • Other languages: Arabic, Hindi, and more.

Note: Only languages with a Word Error Rate (WER) below 50% are listed above. Other languages are supported, but quality may be lower.