PicoTTS Text-to-Speech component
Posted: Fri Feb 09, 2024 12:29 am
Hi all,
For quite some time I've been hoping Espressif would provide a TTS engine for English as well as Chinese. It's all well and good to be able to call out to services like OpenAI's TTS, but to me, speech interaction at the edge needs to work without Internet connectivity. We already get excellent wake word and speech command support on-chip. All that's missing for a full round trip of speech interaction is text-to-speech.
I finally got enough time to sit down with my ESP-BOX and port the PicoTTS engine to an ESP-IDF component. While the voice is nowhere near the quality of, say, OpenAI's, it's still remarkably good considering the limited resources it has available. So, if anyone other than myself has been hungering for a text-to-speech engine on the ESP, have a play with the PicoTTS component. You get a choice of English (UK/US), German, French, Italian and Spanish voices. There's a small example included for the ESP-BOX that you can use to get started. If anyone wants to provide examples for something other than the ESP-BOX, feel free to raise a PR! (Or send me other dev boards =^,^= )
Hope you all find it useful!
Kindly,
/J