PicoTTS Text-to-Speech component

jmattsson · Post by **jmattsson** » Fri Feb 09, 2024 12:29 am

Hi all,

For quite some time I've been hoping Espressif would provide a TTS engine for English as well as Chinese. It's all well and good to be able to call out to services like OpenAI's TTS, but to me on-the-edge speech interaction needs to function without Internet connectivity. We already get excellent wake word support and great speech command support on-chip. All that's missing for a full round trip of speech interaction is text-to-speech.

I finally got enough time to sit down with my ESP-BOX and port the PicoTTS engine to an ESP-IDF component. While the voice is nowhere near the quality of, say, OpenAI's, it's still remarkably good considering the limited resources it has available. So, if there are people other than myself who have been hungering for a text-to-speech engine on the ESP, have a play with the PicoTTS component. You get a choice between English (UK/US), German, French, Italian and Spanish voices. There's a small example included for the ESP-BOX that you can use to get started. If anyone wants to provide examples for something other than the ESP-BOX, feel free to raise a PR! (Or send me other dev boards =^,^= )

Hope you all find it useful!

Kindly,
/J

CityHunter71 · Post by **CityHunter71** » Sat Aug 17, 2024 6:29 pm

Hi jmattsson,

I am very happy that we share the goal of having TTS working offline on the ESP32 (I use an S3).

I've been trying different solutions for months and now I'll try yours.

I've tried porting Talkie and TTS from Arduino to ESP32, but the biggest difficulty is that many of these systems rely on processor timing and produce pauses by counting delay loops. Since I want a multitasking system, the very concept of WAIT is wrong, so I am implementing functions that produce pauses by inserting silence into the audio buffer instead.
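
For readers following along, here is a minimal sketch of the "pause as silence" idea described above. It is not code from Talkie or PicoTTS: the helper names `pause_samples` and `append_silence` are hypothetical, and it assumes signed 16-bit mono PCM with the sample rate passed in by the caller. The pause length is converted to a sample count and that many zero samples are written into the buffer, so no task ever has to busy-wait.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Number of samples that span `ms` milliseconds at `sample_rate_hz`. */
static size_t pause_samples(uint32_t ms, uint32_t sample_rate_hz)
{
    return (size_t)(((uint64_t)ms * sample_rate_hz) / 1000u);
}

/* Append up to `ms` milliseconds of silence to a signed 16-bit mono PCM
 * buffer that already holds `used` samples out of `capacity`.
 * Returns the number of zero samples written, which is smaller than
 * requested when the buffer runs out of room. */
static size_t append_silence(int16_t *buf, size_t capacity, size_t used,
                             uint32_t ms, uint32_t sample_rate_hz)
{
    assert(buf != NULL);
    size_t want = pause_samples(ms, sample_rate_hz);
    size_t room = (used < capacity) ? capacity - used : 0;
    size_t n = (want < room) ? want : room;
    if (n > 0) {
        memset(buf + used, 0, n * sizeof buf[0]);
    }
    return n;
}
```

The caller adds the return value to its own `used` counter and hands the buffer to the audio output as usual. The pause then takes exactly as long as the silence takes to play.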

We have a lot of work to do and it's nice to share ideas.

Greetings
CityHunter71

aliarifat794 · Post by **aliarifat794** » Sun Aug 18, 2024 9:09 am

Thanks for sharing with us.

jmattsson · Post by **jmattsson** » Mon Aug 19, 2024 3:45 am

Hi CityHunter71!

I hope you'll have fun experimenting with PicoTTS! You'll have full control over the audio buffers if you wish, so you can tinker as much as you'd like. I did find that I ran low on CPU cycles if I tried to both sample and generate speech at the same time, but I haven't had time to see whether further optimisation could resolve that.

Good luck, and looking forward to seeing what you come up with!
~Jade

lukepshot · Post by **lukepshot** » Fri Oct 11, 2024 5:23 pm

Hi jmattsson,

I too am looking to use offline STT / TTS on my esp32-pico (Atom M5) boards. I was wondering if you have an example project I could reference?

Best,
lukepshot

jmattsson · Post by **jmattsson** » Sun Oct 13, 2024 7:26 am

Hi lukepshot,

There is an example available with the component; have a look at the example page: https://components.espressif.com/compon ... anguage=en. It currently only targets the ESP-BOX, so you'd need to provide your own minimal BSP files for your board.

If you want to be fancy, you could make the board/BSP selectable via Kconfig and raise a PR over at https://github.com/DiUS/esp-picotts

You might find the pico quite tight on RAM to fit the TTS engine into, but I wish you luck!

Warmly,
~Jade

rama98 · Post by **rama98** » Sat Nov 02, 2024 6:33 am

Hi jmattsson,

I am a student trying to use PicoTTS on my ESP32-S3 board, but I am having trouble getting it to work starting from the ESP-BOX example.
Is it possible to use the component on an ESP32-S3 with a MAX98357A amplifier over I2S?
In any case, thanks for your response!

jmattsson · Post by **jmattsson** » Tue Nov 05, 2024 11:24 pm

Hi rama98,

I would expect it to be possible, but I have no direct experience. You'll need to provide a different mini BSP (Board Support Package) that can do the required initialisation and setup of your hardware. Have a look at the files main/bsp/esp-box.h and main/bsp/esp-box.c; the three functions in that header are what you would need to provide a custom implementation of.

You may be able to find an example somewhere on the net already for those. Definitely have a look through the esp-bsp repo (https://github.com/espressif/esp-bsp/) and see if your board is already supported there. If it is, you can probably copy the functions from there.

Good luck!

CityHunter71 · Post by **CityHunter71** » Fri Nov 15, 2024 8:11 pm

I finally managed to get your code working as a managed component on an ESP32-S3 that doesn't have a DSP on board. I had to rewrite the whole initialization part using the I2S module and figure out how to manage your libraries, but now everything works fine.

First of all I created a primitive to manage sending messages more easily:

    #include <string.h>
    #include "picotts.h"

    void picotts_speak(const char *text_to_speak) {
        // Compute the length of the text, including the terminating NUL, and pass it along
        picotts_add(text_to_speak, strlen(text_to_speak) + 1);
    }

Now I need to send a message and wait until it has been played back completely. I was studying how you implemented the picotts_add function and was thinking of creating a variant that polls until the textQ queue is empty, so I was trying to understand the esp_pico_run call.

I wanted to thank you publicly because it is very interesting work, especially the hard task of getting the audio modules loaded from within the code.

I'll write to you on GitHub, so perhaps you can make the changes available to everyone in the code.

Best Regards!

CityHunter71 · Post by **CityHunter71** » Sat Nov 16, 2024 3:33 pm

You are a genius, Jade!!

I studied the library and discovered that you are one step ahead! Fantastic!

The picotts_set_idle_notify call is spectacular.
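
For anyone else who wants to block until speech has finished, here is a rough sketch of the pattern an idle notification makes possible. It rests on assumptions: it supposes the idle callback takes no arguments and returns nothing, and the helpers `tts_mark_busy`, `on_tts_idle` and `tts_wait_idle` are hypothetical names, not part of the component. Check the component header for the real signature before registering `on_tts_idle` with picotts_set_idle_notify.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* True while nothing is queued or being spoken. */
static atomic_bool tts_idle = true;

/* Call this right before queueing text with picotts_add(). */
static void tts_mark_busy(void)
{
    atomic_store(&tts_idle, false);
}

/* Meant to be registered as the idle notification callback
 * (assumed signature: no arguments, no return value). */
static void on_tts_idle(void)
{
    atomic_store(&tts_idle, true);
}

/* Poll until the engine reports idle, checking at most `max_polls` times.
 * `yield_fn` is a hypothetical hook that lets other tasks run between
 * checks (for example a wrapper around a short task delay); may be NULL.
 * Returns true if the engine went idle, false on timeout. */
static bool tts_wait_idle(unsigned max_polls, void (*yield_fn)(void))
{
    for (unsigned i = 0; i < max_polls; ++i) {
        if (atomic_load(&tts_idle)) {
            return true;
        }
        if (yield_fn != NULL) {
            yield_fn();
        }
    }
    return atomic_load(&tts_idle);
}
```

On a FreeRTOS target a binary semaphore given from the callback would avoid polling altogether. The flag version is shown only because it has no dependencies.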

Thank you
