Skip to main content

Advanced Voice Synthesis Features

Discover the advanced features available in Voicelab’s voice synthesis engine.

Multispeaker Text to Speech

Create conversations and dialogues with multiple distinct voices in a single audio file.

Basic Multispeaker Usage

Generate conversations with different speakers:
import requests

response = requests.post(
    "https://api.vogent.ai/api/tts/multispeaker",
    headers={"Authorization": f"Bearer {api_key}"},
    json={
        "lines": [
            {
                "text": "Welcome to our store! How can I help you today?",
                "voiceId": "professional-female-1"
            },
            {
                "text": "Hi, I'm looking for a new laptop for work.",
                "voiceId": "conversational-male-1"
            },
            {
                "text": "Great! I can help you find the perfect laptop. What type of work do you do?",
                "voiceId": "professional-female-1"
            }
        ]
    }
)

if response.status_code == 200:
    with open("conversation.wav", "wb") as f:
        f.write(response.content)
    print("Conversation saved successfully!")

Advanced Multispeaker Scenarios

Create complex dialogues for various use cases:

Customer Service Training

training_dialogue = {
    "lines": [
        {"text": "Thank you for calling customer support. How can I assist you today?", "voiceId": "professional-female-1"},
        {"text": "I'm having trouble with my recent order. It hasn't arrived yet.", "voiceId": "conversational-male-1"},
        {"text": "I understand your concern. Let me look up your order details. Can you provide your order number?", "voiceId": "professional-female-1"},
        {"text": "Sure, it's order number 12345.", "voiceId": "conversational-male-1"},
        {"text": "Thank you. I can see your order was shipped yesterday and should arrive by tomorrow.", "voiceId": "professional-female-1"}
    ]
}

response = requests.post("https://api.vogent.ai/tts/multispeaker", headers=headers, json=training_dialogue)

Educational Content

lesson_dialogue = {
    "lines": [
        {"text": "Today we're learning about photosynthesis. Can anyone tell me what plants need for photosynthesis?", "voiceId": "professional-female-1"},
        {"text": "They need sunlight, water, and carbon dioxide!", "voiceId": "conversational-female-1"},
        {"text": "Excellent! And what do plants produce during photosynthesis?", "voiceId": "professional-female-1"},
        {"text": "They produce oxygen and glucose!", "voiceId": "conversational-male-1"}
    ]
}

Voice Selection Best Practices

Choose appropriate voices for different scenarios:

Professional Contexts

  • Business presentations: Use professional-female-1 or professional-male-1
  • Corporate training: Professional voices for instructors, conversational for participants
  • Customer service: Professional voices for representatives

Casual and Creative Content

  • Podcasts: Mix of conversational and expressive voices
  • Audiobooks: Expressive voices for characters, conversational for narration
  • Social media content: Conversational voices for relatability

Voice Pairing Guidelines

When using multiple voices in a conversation:
  1. Contrast is key: Use different genders or voice types for clarity
  2. Consistency: Keep the same voice for each character throughout
  3. Context matching: Match voice formality to the scenario
# Good voice pairing example
customer_service_scenario = {
    "lines": [
        {"text": "Good morning, how may I help you?", "voiceId": "professional-female-1"},  # Formal representative
        {"text": "Hi, I need help with my account.", "voiceId": "conversational-male-1"},    # Casual customer
        {"text": "I'd be happy to assist you with that.", "voiceId": "professional-female-1"}  # Consistent representative
    ]
}

Streaming Audio Output

Voicelab automatically streams audio bytes as they are generated, providing optimal performance for real-time applications.

Benefits of Streaming

  • Lower latency: Audio starts playing before generation is complete
  • Memory efficiency: No need to buffer entire audio files
  • Better user experience: Immediate audio feedback

Implementation Tips

import requests

# Stream audio directly to a file
response = requests.post(
    "https://api.vogent.ai/api/tts",
    headers={"Authorization": f"Bearer {api_key}"},
    json={"text": "This is a streaming audio example.", "voiceId": "professional-female-1"},
    stream=True
)

with open("streaming_audio.wav", "wb") as f:
    for chunk in response.iter_content(chunk_size=8192):
        if chunk:
            f.write(chunk)

Error Handling and Best Practices

Common Error Scenarios

import requests

def synthesize_text(text, voice_id):
    try:
        response = requests.post(
            "https://api.vogent.ai/api/tts",
            headers={"Authorization": f"Bearer {api_key}"},
            json={"text": text, "voiceId": voice_id}
        )

        if response.status_code == 200:
            return response.content
        elif response.status_code == 400:
            print("Error: Invalid input - check your text and voice ID")
        elif response.status_code == 422:
            print("Error: Validation failed - invalid voice ID or text format")
        elif response.status_code == 429:
            print("Error: Rate limit exceeded - please wait before retrying")
        else:
            print(f"Error: Unexpected status code {response.status_code}")

    except requests.exceptions.RequestException as e:
        print(f"Network error: {e}")

    return None

Rate Limiting Best Practices

import time
import requests

def synthesize_with_retry(text, voice_id, max_retries=3):
    for attempt in range(max_retries):
        try:
            response = requests.post(
                "https://api.vogent.ai/api/tts",
                headers={"Authorization": f"Bearer {api_key}"},
                json={"text": text, "voiceId": voice_id}
            )

            if response.status_code == 200:
                return response.content
            elif response.status_code == 429:
                # Rate limited, wait and retry
                wait_time = 2 ** attempt  # Exponential backoff
                print(f"Rate limited. Waiting {wait_time} seconds before retry...")
                time.sleep(wait_time)
                continue
            else:
                print(f"Error: {response.status_code}")
                break

        except requests.exceptions.RequestException as e:
            print(f"Network error: {e}")
            time.sleep(2 ** attempt)

    return None

Performance Optimization

Text Length Considerations

  • Optimal length: 100-1000 characters per request
  • Maximum length: 5000 characters for single TTS
  • Multispeaker limits: 500 characters per line, 10 lines maximum

Batch Processing

For large amounts of text, split into smaller chunks:
def batch_synthesize(long_text, voice_id, chunk_size=1000):
    chunks = [long_text[i:i+chunk_size] for i in range(0, len(long_text), chunk_size)]
    audio_files = []

    for i, chunk in enumerate(chunks):
        audio_data = synthesize_text(chunk, voice_id)
        if audio_data:
            filename = f"chunk_{i}.wav"
            with open(filename, "wb") as f:
                f.write(audio_data)
            audio_files.append(filename)

    return audio_files

Integration Examples

Web Application Integration

// Frontend JavaScript example
async function synthesizeText(text, voiceId) {
    try {
        const response = await fetch('/api/tts', {
            method: 'POST',
            headers: {
                'Content-Type': 'application/json',
            },
            body: JSON.stringify({ text, voiceId })
        });

        if (response.ok) {
            const audioBlob = await response.blob();
            const audioUrl = URL.createObjectURL(audioBlob);

            const audio = new Audio(audioUrl);
            audio.play();
        }
    } catch (error) {
        console.error('TTS Error:', error);
    }
}

Mobile App Integration

# Python backend for mobile app
from flask import Flask, request, Response
import requests

app = Flask(__name__)

@app.route('/tts', methods=['POST'])
def text_to_speech():
    data = request.json

    response = requests.post(
        "https://api.vogent.ai/tts",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json=data
    )

    if response.status_code == 200:
        return Response(
            response.content,
            mimetype='audio/wav',
            headers={'Content-Disposition': 'attachment; filename=speech.wav'}
        )
    else:
        return {'error': 'TTS generation failed'}, 500