Voice Tools Suite: Complete Guide
Transform text into natural speech and convert audio to text with AI-powered precision
AI Voices: Natural-sounding voices
Languages: Global support
Accuracy: Speech recognition
Max Duration: Per audio file
📚 What You'll Learn
Text-to-Speech (TTS)
1. Creating your first voice-over
2. Customizing voice settings
3. Using SSML for advanced control
Speech-to-Text (STT)
4. Converting your first audio file
5. Handling multiple speakers
6. Exporting and formatting results
Creating Your First Voice-Over
Learn how to convert text into natural-sounding speech in 3 simple steps
Step 1: Choose Your Voice
1. Click "New Voice-Over" in the top menu
2. Select your preferred voice:
- Google WaveNet: Best for general use
- OpenAI: Premium quality
- ElevenLabs: Most natural
💡 Tip: Listen to the voice samples before choosing. Each voice has its own characteristics.
Step 2: Enter Your Text
1. Type or paste your text in the editor
2. Optionally, add SSML tags to control speaking rate, pauses, and pitch:
<speak>
<prosody rate="slow">Welcome to our guide.</prosody>
<break time="1s"/>
<prosody pitch="high">Let's get started!</prosody>
</speak>
Example: A slower greeting, a one-second pause, and a higher-pitched closing line make the voice-over more engaging.
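If you generate SSML from a script rather than typing it by hand, building it with an XML library avoids unbalanced tags and unescaped characters such as `&`. The following is a minimal sketch using only Python's standard library; the `build_ssml` helper and its part format are illustrative assumptions, not part of Voice Tools Suite.

```python
import xml.etree.ElementTree as ET


def build_ssml(parts):
    """Build an SSML string from a list of parts (hypothetical helper).

    Each part is either ("say", text, attrs) for a <prosody> span
    or ("pause", duration) for a <break> element.
    """
    speak = ET.Element("speak")
    for part in parts:
        if part[0] == "say":
            _, text, attrs = part
            prosody = ET.SubElement(speak, "prosody", attrs)
            prosody.text = text  # special characters are escaped on output
        elif part[0] == "pause":
            ET.SubElement(speak, "break", {"time": part[1]})
        else:
            raise ValueError(f"unknown part type: {part[0]!r}")
    return ET.tostring(speak, encoding="unicode")


ssml = build_ssml([
    ("say", "Welcome to our guide.", {"rate": "slow"}),
    ("pause", "1s"),
    ("say", "Let's get started!", {"pitch": "high"}),
])

# Parsing the result back is a cheap well-formedness check:
# ET.fromstring raises ParseError if the markup is broken.
ET.fromstring(ssml)
print(ssml)
```

The printed string can be pasted into the editor in place of hand-written SSML.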
Step 3: Generate & Export
1. Click "Generate" to create your audio
2. Preview the result
3. Choose export format:
- MP3: Best for web (small files, widely supported)
- WAV: High quality (uncompressed, larger files)
- OGG: Compressed (open format)
Speech-to-Text: Complete Guide
Transform your audio into accurate text with AI-powered transcription
Step 1: Upload Your Audio
Start by uploading your audio file:
- Supported formats: MP3, WAV, MP4, WebM, M4A
- Maximum file size: 25 MB
- Maximum duration: 4 hours
- Batch upload: Up to 10 files at once
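Checking files against these limits before uploading saves a failed round trip. The sketch below encodes the limits listed above in plain Python. The function names are hypothetical, and it assumes that "25 MB" means 25 × 1024 × 1024 bytes and that you already know each file's duration.

```python
from pathlib import Path

# Limits taken from the list above.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".mp4", ".webm", ".m4a"}
MAX_FILE_BYTES = 25 * 1024 * 1024      # assumption: 25 MB means 25 MiB
MAX_DURATION_SECONDS = 4 * 60 * 60     # 4 hours
MAX_BATCH_SIZE = 10


def check_file(name, size_bytes, duration_seconds):
    """Return a list of problems for one file; an empty list means it passes."""
    problems = []
    ext = Path(name).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if size_bytes > MAX_FILE_BYTES:
        problems.append(f"file too large: {size_bytes} bytes")
    if duration_seconds > MAX_DURATION_SECONDS:
        problems.append(f"audio too long: {duration_seconds} seconds")
    return problems


def check_batch(files):
    """Check a batch given as (name, size_bytes, duration_seconds) tuples."""
    if len(files) > MAX_BATCH_SIZE:
        raise ValueError(f"batch has {len(files)} files, limit is {MAX_BATCH_SIZE}")
    return {name: check_file(name, size, dur) for name, size, dur in files}


report = check_batch([
    ("interview.mp3", 12 * 1024 * 1024, 1800),
    ("notes.flac", 5 * 1024 * 1024, 600),
])
for name, problems in report.items():
    print(name, "OK" if not problems else problems)
```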
💡 Pro Tip: For best results, use clear audio with minimal background noise. Consider using noise reduction software before uploading.
Step 2: Configure Settings
Language Settings
- Auto-detect language
- Manual language selection (40+ languages)
- Multiple language support
- Custom vocabulary
Transcription Options
- Speaker diarization
- Punctuation
- Timestamps
- Formatting preferences
Step 3: Handling Multiple Speakers
Configure speaker identification:
Speaker Detection
- Enable "Multiple Speakers"
- Set number of speakers (2-10)
- Auto-detect speakers
- Manual speaker assignment
Speaker Labels
- Speaker 1, Speaker 2, etc.
- Custom names (John, Sarah)
- Role-based (Interviewer, Guest)
- Custom labels
Example Output:
[00:00:15] Interviewer: Welcome to our podcast. Today we're discussing AI technology.
[00:00:20] Guest: Thank you for having me. I'm excited to share my insights.
[00:00:25] Interviewer: Let's start with the basics. What is AI?
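Labeled output in this shape is easy to post-process. As a rough sketch, the Python below parses lines of the form `[HH:MM:SS] Speaker: text` into records and counts words per speaker. It assumes the plain-text export looks exactly like the example above; the names `Utterance`, `parse_transcript`, and `words_per_speaker` are illustrative.

```python
import re
from typing import NamedTuple

# Matches "[HH:MM:SS] Speaker: spoken text"
LINE_RE = re.compile(r"^\[(\d{2}):(\d{2}):(\d{2})\]\s+([^:]+):\s+(.*)$")


class Utterance(NamedTuple):
    start_seconds: int
    speaker: str
    text: str


def parse_transcript(text):
    """Parse labeled transcript lines; lines that do not match are skipped."""
    utterances = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue
        hours, minutes, seconds, speaker, spoken = match.groups()
        start = int(hours) * 3600 + int(minutes) * 60 + int(seconds)
        utterances.append(Utterance(start, speaker.strip(), spoken))
    return utterances


def words_per_speaker(utterances):
    """Count whitespace-separated words spoken by each speaker."""
    totals = {}
    for item in utterances:
        totals[item.speaker] = totals.get(item.speaker, 0) + len(item.text.split())
    return totals


SAMPLE = """\
[00:00:15] Interviewer: Welcome to our podcast. Today we're discussing AI technology.
[00:00:20] Guest: Thank you for having me. I'm excited to share my insights.
[00:00:25] Interviewer: Let's start with the basics. What is AI?
"""

utterances = parse_transcript(SAMPLE)
print(words_per_speaker(utterances))
```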
💡 Pro Tip: For best results with multiple speakers, ensure clear audio separation and minimal background noise. Consider using separate microphones for each speaker in live recordings.
Step 4: Review & Edit
Review and refine your transcription:
Editing Tools
- Text correction
- Speaker reassignment
- Timestamp adjustment
- Punctuation editing
Quality Checks
- Accuracy verification
- Speaker identification check
- Format consistency
- Language accuracy
💡 Pro Tip: Use the keyboard shortcuts (⌘ + E for edit, ⌘ + S for save) to speed up your review process. The AI will learn from your corrections to improve future transcriptions.
Step 5: Export & Integration
Choose your export format and integration options:
Export Formats
- SRT (video subtitles)
- VTT (web video players)
- JSON (programmatic access)
Integration Options
- Direct download
- Cloud storage (Google Drive, Dropbox)
- API access for developers
- Webhook notifications
Format Examples:
SRT Format:
1
00:00:15,000 --> 00:00:20,000
Interviewer: Welcome to our podcast.

2
00:00:20,000 --> 00:00:25,000
Guest: Thank you for having me.
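To work with an exported SRT file in a script, you can read it back into structured cues. The following is a minimal sketch in standard-library Python. It assumes cues are separated by a blank line, as SRT files conventionally are, and the function names are illustrative.

```python
import re

TIME_RE = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def to_seconds(stamp):
    """Convert an SRT timestamp 'HH:MM:SS,mmm' to seconds as a float."""
    match = TIME_RE.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"not an SRT timestamp: {stamp!r}")
    hours, minutes, seconds, millis = map(int, match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def parse_srt(text):
    """Parse SRT text into a list of cue dictionaries."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        if len(lines) < 3:
            continue  # skip incomplete cues
        start, end = lines[1].split("-->")
        cues.append({
            "index": int(lines[0]),
            "start": to_seconds(start),
            "end": to_seconds(end),
            "text": "\n".join(lines[2:]),
        })
    return cues


SRT_SAMPLE = """\
1
00:00:15,000 --> 00:00:20,000
Interviewer: Welcome to our podcast.

2
00:00:20,000 --> 00:00:25,000
Guest: Thank you for having me.
"""

for cue in parse_srt(SRT_SAMPLE):
    print(cue["index"], cue["end"] - cue["start"], cue["text"])
```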
JSON Format:
{
"segments": [
{
"start": "00:00:15",
"end": "00:00:20",
"speaker": "Interviewer",
"text": "Welcome to our podcast."
}
]
}
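The JSON export carries the same information as the subtitle formats, so one can be derived from the other. As an illustration, this Python sketch turns a JSON document shaped like the example above into SRT text. It assumes `start` and `end` are `HH:MM:SS` strings with no milliseconds, as shown; the function names are hypothetical.

```python
import json


def to_srt_time(stamp):
    """Convert 'HH:MM:SS' to the SRT form 'HH:MM:SS,mmm' (milliseconds set to 000)."""
    hours, minutes, seconds = (int(part) for part in stamp.split(":"))
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},000"


def segments_to_srt(document):
    """Render a JSON transcription document as SRT text."""
    data = json.loads(document)
    blocks = []
    for index, segment in enumerate(data["segments"], start=1):
        timing = to_srt_time(segment["start"]) + " --> " + to_srt_time(segment["end"])
        caption = segment["speaker"] + ": " + segment["text"]
        blocks.append("\n".join([str(index), timing, caption]))
    # SRT cues are separated by a blank line.
    return "\n\n".join(blocks) + "\n"


JSON_SAMPLE = """
{
  "segments": [
    {
      "start": "00:00:15",
      "end": "00:00:20",
      "speaker": "Interviewer",
      "text": "Welcome to our podcast."
    }
  ]
}
"""

print(segments_to_srt(JSON_SAMPLE))
```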
💡 Pro Tip: Use the JSON format for programmatic access or when you need to process the transcription data further. The SRT format is ideal for video subtitles, while VTT is perfect for web video players.
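If you only have an SRT file and need VTT for a web player, the two formats are close enough to convert with a few lines. The sketch below relies on two general properties of WebVTT: the file starts with a `WEBVTT` header line, and timestamps use a period instead of a comma before the milliseconds. The `srt_to_vtt` name is illustrative, and styling or positioning cues are not handled.

```python
import re

STAMP_RE = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt_text):
    """Convert SRT text to WebVTT text."""
    lines = ["WEBVTT", ""]
    for line in srt_text.strip().splitlines():
        # Only rewrite timing lines so caption text is left untouched.
        if "-->" in line:
            line = STAMP_RE.sub(r"\1.\2", line)
        lines.append(line)
    return "\n".join(lines) + "\n"


SRT_INPUT = """\
1
00:00:15,000 --> 00:00:20,000
Interviewer: Welcome to our podcast.

2
00:00:20,000 --> 00:00:25,000
Guest: Thank you for having me.
"""

print(srt_to_vtt(SRT_INPUT))
```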
Advanced Features & Tips
Master advanced features to get the most from Voice Tools Suite
Advanced Text-to-Speech
Voice Cloning
Upload a voice sample to create a custom voice
Emotion Control
<prosody emotion="happy">I'm excited to share this!</prosody>
Advanced Speech-to-Text
Speaker Diarization
Automatically identify different speakers
Custom Vocabulary
Add industry-specific terms for better accuracy
Keyboard Shortcuts