Hi all,
For quite some time I've been hoping Espressif would provide a TTS engine for English as well as Chinese. It's all well and good to be able to call out to services like OpenAI's TTS, but to me, on-the-edge speech interaction needs to function without Internet connectivity. We already get excellent wake word support and great speech command support on-chip. All that's missing for a full round trip of speech interaction is text-to-speech.
I finally got enough time to sit down with my ESP-BOX and port the PicoTTS engine to an ESP-IDF component. While the voice is nowhere near the quality of, say, OpenAI's, it's still remarkably good considering the limited resources it has available. So, if there are people other than myself who have been hungering for a text-to-speech engine on the ESP, have a play with the PicoTTS component. You get a choice between English (UK/US), German, French, Italian and Spanish voices. There's a small example included for the ESP-BOX that you can use to get started. If anyone wants to provide examples for something other than the ESP-BOX, feel free to raise a PR! (Or send me other dev boards =^,^= )
Hope you all find it useful!
Kindly,
/J
PicoTTS Text-to-Speech component
Re: PicoTTS Text-to-Speech component
Hi jmattsson,
I am very happy that we share the goal of having TTS working offline on the ESP32 (I use an S3).
I've been trying different solutions for months and now I'll try yours.
I've tried porting Talkie and TTS from Arduino to the ESP32, but the biggest difficulty is that many of these libraries rely on processor timing and implement pauses by busy-waiting. Since I want a multitasking system, blocking waits are the wrong approach altogether, so I am implementing functions that produce pauses by inserting silence into the audio buffer.
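A minimal sketch of the silence-insertion idea, assuming mono, signed 16-bit PCM and a caller-managed buffer. The function names are hypothetical and not part of the Talkie or PicoTTS APIs. Instead of blocking the task for the pause, the pause length is converted to a sample count (rate × milliseconds / 1000) and written as zeros, so the audio output task keeps the timing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: number of samples for a pause of pause_ms
 * at sample_rate_hz (mono). 64-bit math avoids overflow. */
static size_t silence_frames(uint32_t sample_rate_hz, uint32_t pause_ms)
{
    return (size_t)(((uint64_t)sample_rate_hz * pause_ms) / 1000u);
}

/* Hypothetical helper: append a pause as zero-valued samples to buf,
 * starting at index `used`. Never writes past `capacity`; returns the
 * number of samples actually written, so the caller can finish the
 * pause in the next buffer if it was truncated. */
static size_t append_silence(int16_t *buf, size_t capacity, size_t used,
                             uint32_t sample_rate_hz, uint32_t pause_ms)
{
    assert(buf != NULL && used <= capacity);
    size_t want = silence_frames(sample_rate_hz, pause_ms);
    size_t room = capacity - used;
    size_t n = want < room ? want : room;
    memset(buf + used, 0, n * sizeof buf[0]);
    return n;
}
```

For example, a 250 ms pause at 16 kHz becomes 4000 zero samples queued alongside the speech, and the task never sleeps.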
We have a lot of work to do and it's nice to share ideas.
Greetings
CityHunter71
Re: PicoTTS Text-to-Speech component
Thanks for sharing with us.
Re: PicoTTS Text-to-Speech component
Hi CityHunter71!
I hope you'll have fun experimenting with PicoTTS! You'll have full control over the audio buffers if you wish, so you can tinker as much as you'd like. I did find that I ran low on CPU cycles when I tried to sample audio and generate speech at the same time, but I haven't had time to see whether further optimisation could resolve that.
Good luck, and looking forward to seeing what you come up with!
~Jade
Re: PicoTTS Text-to-Speech component
Hi jmattsson,
I too am looking to use offline STT/TTS on my ESP32-PICO (Atom M5) boards. Do you have an example project I could reference?
Best,
lukepshot
Re: PicoTTS Text-to-Speech component
Hi lukepshot,
There is an example included with the component; have a look at the example page: https://components.espressif.com/compon ... anguage=en. It currently only supports the ESP-BOX, so you'd need to provide your own minimal BSP files for your board.
If you want to be fancy, you could make the board/BSP selectable via Kconfig and raise a PR over at https://github.com/DiUS/esp-picotts
You might find the ESP32-PICO quite tight on RAM for fitting the TTS engine, but I wish you luck!
Warmly,
~Jade
Re: PicoTTS Text-to-Speech component
Hi jmattsson, regarding your original post (Fri Feb 09, 2024 12:29 am):
I am a student trying to use PicoTTS on my ESP32-S3 board, but I'm having trouble getting it to work based on the ESP-BOX example.
Is it possible to use it with an ESP32-S3 and a MAX98357A amplifier over I2S?
Thanks in advance for your response!
Re: PicoTTS Text-to-Speech component
Hi rama98,
I would expect it to be possible, but I have no direct experience. You'll need to provide a different mini BSP (Board Support Package) that performs the required initialisation and setup of your hardware. Have a look at the files main/bsp/esp-box.h and main/bsp/esp-box.c. The three functions declared in that header are what you would need to provide a custom implementation of.
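One thing a custom BSP for a MAX98357A may need to handle: the MAX98357A is driven purely over I2S and has no I2C control interface or digital volume register (gain is set by a pin), so any volume setting has to be applied in software before samples reach the I2S driver. A minimal sketch, assuming signed 16-bit PCM; `apply_software_volume` is a hypothetical helper, not one of the three functions in main/bsp/esp-box.h.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: scale 16-bit PCM in place by volume_pct (0..100).
 * Values above 100 are clamped to 100, so output magnitude never exceeds
 * input magnitude and the result always fits in int16_t. */
static void apply_software_volume(int16_t *samples, size_t count,
                                  unsigned volume_pct)
{
    if (volume_pct > 100u) {
        volume_pct = 100u;
    }
    for (size_t i = 0; i < count; i++) {
        /* Widen to 32 bits so the multiply cannot overflow. */
        int32_t scaled = (int32_t)samples[i] * (int32_t)volume_pct / 100;
        samples[i] = (int16_t)scaled;
    }
}
```

Linear percentage scaling keeps the sketch simple; a logarithmic (dB) curve usually sounds more even across the volume range.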
You may already be able to find examples of those online. Definitely have a look through the esp-bsp repo (https://github.com/espressif/esp-bsp/) and see whether your board is already supported there. If it is, you can probably copy the functions from there.
Good luck!