Page 1 of 1

esp_event_post_to function waits too long

Posted: Wed Oct 27, 2021 1:40 pm
by M.Pacheu
Hi everyone ! I am currently working on a C++ project with the ESP32 with this Environment:

Development Kit: ESP32-PICO-Kit
Kit version: v4.1
Module or chip used: ESP32-PICO-D4
IDF version v5.0-dev-39-ge5f5eb3cbb
Build System: idf.py
Compiler version: xtensa-esp32-elf-gcc (crosstool-NG esp-2021r2) 8.4.0
Operating System: Linux (Ubuntu 20.04.3 LTS)
Using an IDE?: Yes : Eclipse but won't matter
Power Supply: external 5V

I use a custom esp_event_loop and a custom Task with it, and I am pretty sure that my problem described latter comes from the fact that I am indeed using a custom Task for the process of events, and not a dedicated one from the API.

This custom task runs the esp_event_loop_run() function for 5 secs, runs some other process and loops over it.
I define an Event Base with some Event IDs and I register to this Event Loop some callbacks for the events.
Besides this Loop Task, I run several Tasks that will post events for different reasons. For this post, I will only talk about my HTTP task, that posts an 'HTTP' event to the Event Loop on :
- A complete received object via the HTTP GET method
- A partial received object (for byte streams) via also the HTTP GET method, when the Content-Type is octet-stream.
With the event posted is associated a data structure which holds information about the parsed data, or the buffer itself with its length, and some other information.

Anyway, I will now talk about the parts of the esp_event_loop component that are bothering me when I use them.
- To post events to the Event Loop, I use the function esp_event_post_to() with, as arguments, the handle of the Event Loop to post to, the Event Base + ID, the Event Data + Length and the Ticks to wait.
- I check the return of this function to know if the event was actually posted or not.
- So, when my HTTP task is posting events for each byte stream object it receives, it is trying 5 times to post the event and waiting each time 50 ms.
- During this time, my Event Task is running the callback of the event each time it sees the event in the internal Queue, and can take up to 500 ms to process the data, as I use ESP_LOG_BUFFER_HEXDUMP (which takes a bit of time) to log it to the debug output.
- ==> My problem is then that, sometimes (quite often), the HTTP Task runs out of its 5 attempts * 50 ms of waiting time to post the event with the data to the Event Loop Queue.

I looked for any reason why it has this behaviour :
- It cannot be that the Queue is full and the HTTP Task never sees the opportunity to post the event because of Queue is full, as I made the Queue is 20 item long to ensure that. I even made it bigger, but the problem still occurs when only a few item have been queued.
- I made sure that the Ticks to wait are greater than the configured RTOS Ticks rate as I saw that putting this argument to less than the Tick rate is more or less equivalent to putting 0.
- If I make the HTTP Task wait up to 1000 ms and retry 5 times, it will post the event on first try, so it can eventually post the event.

So my thinking was, when the Loop Task processes an event in esp_event_loop_run(), it either locks the Queue semaphore until it has finished processing the event or locks an internal semaphore and releases it also on event process completion, making the esp_event_post_to() function waiting for either one of theses semaphores.
As I looked in these two function implementation (in esp-idf/components/esp_event/esp_event.c ), it confirmed my thoughts (at least to my understanding but I hope I made a mistake):

- in esp_event_post_to from line 861 to 892 :
- If the Event Loop has no dedicated task, it tries to Take the internal Semaphore (result = xSemaphoreTakeRecursive(loop->mutex, ticks_to_wait); at line 865) and wait the Ticks to wait until it has acquired it
- Then releases this semaphore and Post to the RTOS Queue, waiting or not depending on the Task calling this function
- If the Event Loop HAS a dedicated task, it does the same thing without Taking the internal semaphore

- in esp_event_loop_run from line 566 to 634 :
- in the while(xQueueReceive() ) loop, once an event has been taken out of the Queue, it Takes the internal Semaphore (xSemaphoreTakeRecursive(loop->mutex, portMAX_DELAY); on line 568), so the same one as before
- Then it executes some process to finally call handler_execute() which, I guess, will call the callback and do my own process.
- By doing so, it ensures that the entire handling of the event is done, as it finally Gives back the semaphore when this process is done.

Moreover, to test my theory, I commented out the lines in esp_event_post_to from line 861 to 883 and made both cases (Event Loop has no dedicated task or HAS one) execute the 'HAS a dedicated task' process ( line 878 to 882).
=====>>>> And it solved my problem !!! Indeed, my events were always posted in one try of a max wait of 50 ms.

Thus, my questions are :
- Why does this esp_event_post_to() function locks the internal Event Loop Semaphore when this Event Loop has no dedicated Task ?
- Am I spotting a real issue here ? If not, then I am either mistaken and this behaviour comes from my own code, or this is totally intended, documented and this behaviour is well known ? Thus this would make long process in Event Callbacks NOT a good practice, as it is blocking all the event post attempts.
- Does anyone have an example of WHAT is the actual good (best ?) practice of both Event Callback process and Event posting process ? For example, what should I do when the esp_event_post_to fails, but I must post the event as it is a critical one, but I cannot wait for ever for the event to be finally posted ?

In the meantime, I will just make the Event Loop have its dedicated task made by the API and execute my other process in an event or something, that should go around the issue.

I hope this post has all the elements for you to understand the problem I have encountered, I look forward to reading anyone's thoughts on this subject ! :D

M.Pacheu