TTS & Zero-Shot Voice Cloning
From text and three seconds of a voice to convincing speech
Architecture introduced 05 Jan 2023
This system turns written text into spoken audio, and can speak it in a specific person's voice from just a few seconds of a sample. The words get cleaned up and turned into sounds-to-pronounce, a short reference clip is turned into a 'voiceprint', and the two are combined into speech you can play. The same magic that gives a voice to people who can't speak also powers scam calls that sound exactly like your boss or your bank.
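The three stages described above (clean up the text, turn a reference clip into a voiceprint, combine the two) can be pictured with a toy Python sketch. Everything here is a hypothetical stand-in invented for illustration: the function names, the letter-per-sound units, and the two-number voiceprint are assumptions, not the system's real components, which would be learned models. The sketch only shows how the pieces fit together and stops short of producing audio.

```python
import re
from dataclasses import dataclass

# Toy lookup tables (assumptions for illustration only).
ABBREVIATIONS = {"dr.": "doctor", "mr.": "mister"}
DIGITS = {"0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
          "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine"}


def normalize_text(text: str) -> str:
    """Clean up text: lowercase, expand a few abbreviations, spell out digits."""
    words = []
    for token in text.lower().split():
        token = ABBREVIATIONS.get(token, token)
        token = re.sub(r"\d", lambda m: f" {DIGITS[m.group()]} ", token)
        # Drop punctuation, keeping letters and apostrophes.
        words.extend(re.sub(r"[^a-z' ]", " ", token).split())
    return " ".join(words)


def to_sound_units(normalized: str) -> list[str]:
    """Placeholder for 'sounds-to-pronounce': one unit per letter, '|' between words."""
    units: list[str] = []
    for word in normalized.split():
        units.extend(word)
        units.append("|")
    return units[:-1]  # drop the trailing word separator


def make_voiceprint(samples: list[float]) -> tuple[float, float]:
    """Placeholder 'voiceprint': average loudness and zero-crossing rate of the clip."""
    if not samples:
        raise ValueError("reference clip is empty")
    energy = sum(abs(s) for s in samples) / len(samples)
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    return (energy, crossings / len(samples))


@dataclass
class SynthesisInput:
    """What a speech generator would receive: what to say and whose voice to use."""
    sound_units: list[str]
    voiceprint: tuple[float, float]


def prepare_synthesis(text: str, reference_samples: list[float]) -> SynthesisInput:
    """Combine the text branch and the voice branch into one generator input."""
    return SynthesisInput(
        sound_units=to_sound_units(normalize_text(text)),
        voiceprint=make_voiceprint(reference_samples),
    )


if __name__ == "__main__":
    clip = [0.5, -0.5, 0.5, -0.5]  # stand-in for a few seconds of audio samples
    print(prepare_synthesis("Dr. Smith has 3 cats!", clip))
```

The point of the split is that the text branch and the voice branch are independent until the final step, which is why the same words can be spoken in any voice for which a short clip exists.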
Diagram legend: Instructions, Data, Actions, Control / decision, Feedback / logs
Follow a request · step 1 of 6
You type the words you want spoken and point at a voice to use: maybe your own, maybe someone else's clip you uploaded.