Voice Wrangling
Text-to-Speech, Automatic Subtitles, ...
Text-to-Speech
Slow but best quality and open-source:
Fast and open-source:
Proprietary:
- ElevenLabs (10,000 free characters per account)
- TikTok voices (free and fast)
- NaturalReader (5 minute daily limit for Plus Voices)
- Uberduck
Real-time Voice Morphing
- Koe Recast desktop app
- voice.ai
- Voicemod
- NyVox
Voice Cloning
- awesome-voice-cloning
- CorentinJ/Real-Time-Voice-Cloning (demo)
- tortoise-tts makes voice cloning easy, but its accuracy is questionable
Speech recognition
Whisper (also see Show and Tell)
- Extension for DaVinci Resolve
- WAAS — GUI and API for OpenAI Whisper
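Since this page covers automatic subtitles, here is a minimal sketch of turning Whisper output into an SRT file. It assumes the `openai-whisper` Python package, whose `transcribe()` result contains a `segments` list with `start`, `end`, and `text` fields. The helper names (`format_timestamp`, `segments_to_srt`, `transcribe_to_srt`) are made up for illustration and are not part of Whisper.

```python
def format_timestamp(seconds: float) -> str:
    """Convert seconds to the SRT timestamp format HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Build SRT text from dicts with 'start', 'end' and 'text' keys."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    # A blank line separates consecutive subtitle blocks.
    return "\n".join(blocks)


def transcribe_to_srt(audio_path: str, model_name: str = "base") -> str:
    """Hypothetical wrapper: transcribe a file with Whisper and return SRT text."""
    import whisper  # assumption: installed via `pip install openai-whisper`

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path)
    return segments_to_srt(result["segments"])
```

Only `transcribe_to_srt` needs Whisper installed; the two formatting helpers are plain Python and can be reused with segments from any other recognizer that reports start and end times in seconds.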
Online Services:
Diarization (telling speakers apart in a conversation):
- pyannote
- https://github.com/zachlatta/openai-whisper-speaker-identification
- https://github.com/openai/whisper/discussions/264
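A common way to combine the two kinds of tools above is to run diarization and transcription separately, then label each transcribed segment with the speaker whose turn overlaps it the most. The sketch below shows only that merging step in plain Python. It assumes transcript segments shaped like Whisper's (`start`, `end`, `text`) and speaker turns given as `(start, end, speaker)` tuples, which is roughly what a diarization tool such as pyannote can be converted to. The function names are hypothetical, and this is not necessarily how the linked projects do it.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments, turns):
    """Label each transcript segment with the best-overlapping speaker.

    segments: list of dicts with 'start', 'end', 'text' (seconds)
    turns:    list of (start, end, speaker) tuples from a diarizer
    """
    labeled = []
    for seg in segments:
        best_speaker, best_overlap = "unknown", 0.0
        for turn_start, turn_end, speaker in turns:
            shared = overlap(seg["start"], seg["end"], turn_start, turn_end)
            if shared > best_overlap:
                best_speaker, best_overlap = speaker, shared
        # Copy the segment so the caller's data is not mutated.
        labeled.append({**seg, "speaker": best_speaker})
    return labeled


if __name__ == "__main__":
    segments = [
        {"start": 0.0, "end": 2.0, "text": "Hi there."},
        {"start": 2.0, "end": 5.0, "text": "Hello, how are you?"},
    ]
    turns = [(0.0, 2.5, "SPEAKER_00"), (2.5, 6.0, "SPEAKER_01")]
    for seg in assign_speakers(segments, turns):
        print(f"{seg['speaker']}: {seg['text']}")
```

Segments that overlap no speaker turn are labeled `unknown` rather than guessed. Segments that straddle a speaker change go entirely to the dominant speaker, which is a simplification; word-level timestamps would allow a finer split.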
Noise removal
- Adobe Podcast Enhance
- NVIDIA RTX Voice (works in real time, even with older NVIDIA cards)
- VoiceFixer
- Extract vocal/instrumental from music