Voice Tools Suite: Complete Guide

Transform text into natural speech and convert audio to text with AI-powered precision

  • 220+ AI voices: natural-sounding voices
  • 40+ languages: global support
  • 99.5% accuracy: speech recognition
  • 4h max duration: per audio file

📚 What You'll Learn

Text-to-Speech (TTS)

  1. Creating your first voice-over
  2. Customizing voice settings
  3. Using SSML for advanced control

Speech-to-Text (STT)

  4. Converting your first audio file
  5. Handling multiple speakers
  6. Exporting and formatting results

1. Creating Your First Voice-Over

Learn how to convert text into natural-sounding speech in 3 simple steps

Step 1: Choose Your Voice

1. Click "New Voice-Over" in the top menu

2. Select your preferred voice:

  • Google WaveNet: Best for general use
  • OpenAI: Premium quality
  • ElevenLabs: Most natural

💡 Tip: Listen to voice samples before choosing. Each voice has unique characteristics.

Step 2: Enter Your Text

1. Type or paste your text in the editor

2. Use SSML tags for control:

<speak>
<prosody rate="slow">Welcome to our guide.</prosody>
<break time="1s"/>
<prosody pitch="high">Let's get started!</prosody>
</speak>

Example: Use pauses (<break>) and changes in rate or pitch (<prosody>) to make your voice-over more engaging.
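
If you generate SSML from a script rather than typing it by hand, building it with an XML library keeps the markup well-formed and escapes characters such as & automatically. The sketch below uses only Python's standard library; the build_ssml helper and its input format are illustrative assumptions, not part of Voice Tools Suite.

```python
import xml.etree.ElementTree as ET


def build_ssml(parts):
    """Build an SSML string from (kind, value, attrs) tuples.

    kind is "prosody" (value = spoken text, attrs = prosody attributes)
    or "break" (value = pause length such as "1s", attrs ignored).
    This input format is a hypothetical convention for this sketch.
    """
    speak = ET.Element("speak")
    for kind, value, attrs in parts:
        if kind == "break":
            ET.SubElement(speak, "break", {"time": value})
        elif kind == "prosody":
            node = ET.SubElement(speak, "prosody", attrs)
            node.text = value  # special characters are escaped on output
        else:
            raise ValueError(f"unknown part kind: {kind!r}")
    return ET.tostring(speak, encoding="unicode")


# Reproduces the structure of the example above.
ssml = build_ssml([
    ("prosody", "Welcome to our guide.", {"rate": "slow"}),
    ("break", "1s", {}),
    ("prosody", "Let's get started!", {"pitch": "high"}),
])
print(ssml)
```

Paste the printed string into the editor as you would any hand-written SSML.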

Step 3: Generate & Export

1. Click "Generate" to create your audio

2. Preview the result

3. Choose export format:

  • MP3: Best for web
  • WAV: High quality
  • OGG: Compressed


2. Speech-to-Text: Complete Guide

Transform your audio into accurate text with AI-powered transcription

Step 1: Upload Your Audio

Start by uploading your audio file:

  • Supported formats: MP3, WAV, MP4, WebM, M4A
  • Maximum file size: 25MB
  • Maximum duration: 4 hours
  • Batch upload: Up to 10 files at once
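
The limits above can be checked locally before you upload, which avoids waiting on a rejected batch. This is a minimal sketch, assuming that 25MB means 25 × 1024 × 1024 bytes and that you already know each file's size and duration; the function names are hypothetical and nothing here calls the product itself.

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".mp4", ".webm", ".m4a"}
MAX_FILE_BYTES = 25 * 1024 * 1024   # assumption: binary megabytes
MAX_DURATION_SECONDS = 4 * 60 * 60  # 4 hours
MAX_BATCH_SIZE = 10


def check_file(name, size_bytes, duration_seconds):
    """Return a list of problems for one file (empty list means OK)."""
    problems = []
    ext = Path(name).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if size_bytes > MAX_FILE_BYTES:
        problems.append("file larger than 25MB")
    if duration_seconds > MAX_DURATION_SECONDS:
        problems.append("longer than 4 hours")
    return problems


def check_batch(files):
    """files is a list of (name, size_bytes, duration_seconds) tuples."""
    report = {name: check_file(name, size, dur) for name, size, dur in files}
    if len(files) > MAX_BATCH_SIZE:
        report["(batch)"] = [
            f"{len(files)} files exceeds the batch limit of {MAX_BATCH_SIZE}"
        ]
    return report


print(check_batch([("interview.mp3", 12_000_000, 1800),
                   ("notes.flac", 3_000_000, 300)]))
```

Both limits apply at once: a 4-hour recording that fits in 25MB works out to roughly 14 kbit/s, so long files will usually need heavy compression or splitting before upload.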

💡 Pro Tip: For best results, use clear audio with minimal background noise. Consider using noise reduction software before uploading.

Step 2: Configure Settings

Language Settings

  • Auto-detect language
  • Manual language selection (40+ languages)
  • Multiple language support
  • Custom vocabulary

Transcription Options

  • Speaker diarization
  • Punctuation
  • Timestamps
  • Formatting preferences

Step 3: Handling Multiple Speakers

Configure speaker identification:

Speaker Detection

  • Enable "Multiple Speakers"
  • Set number of speakers (2-10)
  • Auto-detect speakers
  • Manual speaker assignment

Speaker Labels

  • Speaker 1, Speaker 2, etc.
  • Custom names (John, Sarah)
  • Role-based (Interviewer, Guest)
  • Custom labels

Example Output:

[00:00:15] Interviewer: Welcome to our podcast. Today we're discussing AI technology.
[00:00:20] Guest: Thank you for having me. I'm excited to share my insights.
[00:00:25] Interviewer: Let's start with the basics. What is AI?
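
Output in this shape is easy to post-process. The following sketch parses "[HH:MM:SS] Speaker: text" entries into structured segments and swaps generic labels for custom names. It assumes the plain-text layout shown in the example; the regular expression and helper names are illustrative, not a documented format guarantee.

```python
import re

# One entry: [HH:MM:SS] Speaker: text, ending at the next timestamp or at
# the end of the input. Works whether entries are on one line or several.
ENTRY_RE = re.compile(
    r"\[(\d{2}):(\d{2}):(\d{2})\]\s*([^:\[]+):\s*(.*?)"
    r"(?=\s*\[\d{2}:\d{2}:\d{2}\]|\s*$)",
    re.S,
)


def parse_transcript(raw):
    """Return a list of {"start", "speaker", "text"} dicts (start in seconds)."""
    segments = []
    for h, m, s, speaker, text in ENTRY_RE.findall(raw):
        segments.append({
            "start": int(h) * 3600 + int(m) * 60 + int(s),
            "speaker": speaker.strip(),
            "text": text.strip(),
        })
    return segments


def relabel(segments, names):
    """Replace speaker labels using a mapping; unknown labels are kept."""
    return [{**seg, "speaker": names.get(seg["speaker"], seg["speaker"])}
            for seg in segments]


raw = ("[00:00:15] Interviewer: Welcome to our podcast. "
       "[00:00:20] Guest: Thank you for having me.")
segments = relabel(parse_transcript(raw), {"Guest": "Sarah"})
for seg in segments:
    print(seg["start"], seg["speaker"], seg["text"])
```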

💡 Pro Tip: For best results with multiple speakers, ensure clear audio separation and minimal background noise. Consider using separate microphones for each speaker in live recordings.

Step 4: Review & Edit

Review and refine your transcription:

Editing Tools

  • Text correction
  • Speaker reassignment
  • Timestamp adjustment
  • Punctuation editing

Quality Checks

  • Accuracy verification
  • Speaker identification check
  • Format consistency
  • Language accuracy

💡 Pro Tip: Use the keyboard shortcuts (⌘ + E for edit, ⌘ + S for save) to speed up your review process. The AI will learn from your corrections to improve future transcriptions.

Step 5: Export & Integration

Choose your export format and integration options:

Export Formats

  • TXT: Plain text with timestamps
  • SRT: Subtitle format
  • VTT: Web video subtitles
  • JSON: Structured data

Integration Options

  • Direct download
  • Cloud storage (Google Drive, Dropbox)
  • API access for developers
  • Webhook notifications

Format Examples:

SRT Format:

1
00:00:15,000 --> 00:00:20,000
Interviewer: Welcome to our podcast.

2
00:00:20,000 --> 00:00:25,000
Guest: Thank you for having me.
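
An SRT file is a sequence of blocks separated by blank lines: a running index, a line with start and end times in HH:MM:SS,mmm form, then the subtitle text. As a rough sketch of how such a file could be produced from your own segment data (the segment dictionary keys here are assumptions chosen for illustration):

```python
def srt_timestamp(seconds):
    """Format a time in seconds as HH:MM:SS,mmm (SRT uses a comma)."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Render segments with start/end (seconds), speaker, and text as SRT."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['speaker']}: {seg['text']}"
        )
    return "\n\n".join(blocks) + "\n"


srt_text = to_srt([
    {"start": 15, "end": 20, "speaker": "Interviewer",
     "text": "Welcome to our podcast."},
    {"start": 20, "end": 25, "speaker": "Guest",
     "text": "Thank you for having me."},
])
print(srt_text)
```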

JSON Format:

{
  "segments": [
    {
      "start": "00:00:15",
      "end": "00:00:20",
      "speaker": "Interviewer",
      "text": "Welcome to our podcast."
    }
  ]
}

💡 Pro Tip: Use the JSON format for programmatic access or when you need to process the transcription data further. The SRT format is ideal for video subtitles, while VTT is perfect for web video players.
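
As one example of further processing, the sketch below converts a JSON export into WebVTT cues, using the standard <v Speaker> voice tag to carry the speaker name. It assumes the JSON has exactly the shape shown in the example (a "segments" list with "HH:MM:SS" strings for "start" and "end"); the function names are hypothetical.

```python
import html
import json


def vtt_time(hms):
    """Convert "HH:MM:SS" to the WebVTT form "HH:MM:SS.mmm"."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.000"


def json_to_vtt(payload):
    """Turn a JSON transcription export (as a string) into WebVTT text."""
    data = json.loads(payload)
    lines = ["WEBVTT", ""]
    for seg in data["segments"]:
        lines.append(f"{vtt_time(seg['start'])} --> {vtt_time(seg['end'])}")
        # WebVTT cue text must not contain raw "&" or "<".
        text = html.escape(seg["text"], quote=False)
        lines.append(f"<v {seg['speaker']}>{text}")
        lines.append("")  # blank line ends the cue
    return "\n".join(lines)


payload = json.dumps({"segments": [{
    "start": "00:00:15", "end": "00:00:20",
    "speaker": "Interviewer", "text": "Welcome to our podcast.",
}]})
vtt_text = json_to_vtt(payload)
print(vtt_text)
```

The same loop is a convenient place to filter by speaker or merge short segments before writing the file.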


3. Advanced Features & Tips

Master advanced features to get the most from Voice Tools Suite

Advanced Text-to-Speech

Voice Cloning

Upload a voice sample to create a custom voice

Emotion Control

<prosody emotion="happy">I'm excited to share this!</prosody>

Advanced Speech-to-Text

Speaker Diarization

Automatically identify different speakers

Custom Vocabulary

Add industry-specific terms for better accuracy

Keyboard Shortcuts

  • ⌘ + N: New project
  • ⌘ + P: Preview audio
  • ⌘ + E: Export
  • ⌘ + /: Show all shortcuts

Ready to Transform Your Content?

Start creating professional voice-overs and transcriptions today