vTaskDelete sometimes hangs current task
Hello everyone,
I've been chasing an infrequent and seemingly random problem that occurs on my ESP32-powered board.
After reproducing the problem, I think I have narrowed down what is happening.
I am using both cores, with multiple tasks accessing the same memory. To manage concurrent access I use a shared mutex semaphore.
At some point a "main" task has to delete other running tasks. Their task handles are saved in a shared data structure, access to which is controlled by a mutex semaphore.
Every once in a while, it appears that the vTaskDelete call ends up deleting or blocking the calling task as well. From debug prints I am sure that the task handle passed to the function is neither NULL nor the current task's handle. Is this expected behaviour?
Re: vTaskDelete sometimes hangs current task
Is it possible the task handle is corrupted and it doesn't point to a valid task at all? This could result in the behaviour you describe.
Are you certain that the main task is blocking in vTaskDelete() and not in some nearby function? For example, if a task is deleted while holding the semaphore that protects the shared "task handle info" data, then a subsequent attempt to take that semaphore will deadlock.
If you are able to post code which exhibits this behaviour, we may be able to give some more detailed suggestions.
Re: vTaskDelete sometimes hangs current task
Before calling vTaskDelete I make sure to take the shared semaphore to avoid the situation you described. I also tried making the semaphore calls non-blocking, with no difference in behaviour.
By printing the task handles I know they are not corrupted (at least, not immediately before the call) and that they do point to the correct task.
In the next few days I'll see if I can make a reduced example that is reproducible on a common demo board.
This is the specific function that deletes the task:
Code:
void __delete_task(pcb_t *todel) {
    timer_args_t *ptr;
    uint8_t map;

    // Log both handles with %p so they print correctly whatever the integer width
    ESP_LOGI(TAG, "deleting process %p (%p) with alarm %i", (void *)todel->task,
             (void *)xTaskGetCurrentTaskHandle(), todel->alarm);
    ptr = (timer_args_t *)todel->args;

    // Hold the state semaphore so the target cannot be deleted while it owns it
    takeStateSemaphore();
    ESP_LOGI(TAG, "semaphore taken");
    vTaskDelete(todel->task); // The following print is never issued
    ESP_LOGI(TAG, "deleted");
    giveStateSemaphore();

    clear_activity_bitmap_output(ptr->bitmap);
    map = ptr->dac >= 0 ? 1 << ptr->dac : 0;
    clear_activity_bitmap_dac(map);
    // Clear outputs
    clear_output_state(ptr->bitmap);
    //if (ptr->dac >= 0 && ptr->dac < TOT_DAC)
    //update_single_dac_state(ptr->dac, 0);

    // Free the task's arguments and remove its entry from the process list
    free(todel->args);
    out_procq(&process_list, todel);
    free_pcb(todel);
    ESP_LOGI(TAG, "freed memory");
}
Re: vTaskDelete sometimes hangs current task
I have tried, and the problem appears to be reproducible on a plain ESP32 DevKitC. My code uses a lot of peripherals (I2C, SPI and RS-232 serial), but they can all be ignored.
The error can be reproduced by flashing a devkit with my program and running the included stress.py Python script (python stress.py --deaf). My board is a slave that answers serial commands, and the script simply sends a barrage of random commands.
Eventually (it might take a few minutes) an important task blocks while trying to delete another one. This is evident in lines 83, 84 and 85 of asynctasks.c: the print just before the vTaskDelete call is reached, but the one just after it is never seen (and the watchdog reset is eventually triggered).
I'd really appreciate it if someone could give it a try and tell me if I'm missing something obvious.
Attachments:
- project.tar.gz (41.71 KiB)
Re: vTaskDelete sometimes hangs current task
If the following line is added directly before the call to vTaskDelete, does it print the name of the task that should be removed?
Code:
ESP_LOGI(TAG, "deleting task with name %s", pcTaskGetTaskName(todel->task));
If the correct name is not printed, todel->task doesn't contain the correct handle.
Re: vTaskDelete sometimes hangs current task
Hi maldus,
Sorry, this is too much code for us to use to debug an OS-level bug. If you have a simple example (maybe made by deleting code from this one until only a few short source files remain), we will happily look at it. Maybe someone else on the forum can help, though.
I did notice one thing: it's unclear to me whether "out_procq()" and "remove_procq()" both delete the entry that they return from the list. The two functions seem to do slightly different things, although maybe they are equivalent given the linked list structures.
The reason I mention it is that if there are stale entries in the list of tasks, there could be a race where one task is calling vTaskDelete(NULL) on itself while another task is calling delete_all_tasks(), leading to a vTaskDelete(that_task).
Best of luck debugging.
Re: vTaskDelete sometimes hangs current task
Hi Angus,
I understand. I managed to reduce the project to a smaller one that still displays the problem; please let me know whether this is enough. As before, to reproduce the issue you need to flash a DevKitC (it probably works on other demo boards as well, but I haven't tried) and run the script stress.py (python stress.py --deaf). It might take as long as 10 minutes, but the watchdog timer eventually triggers.
You are right, there is an issue with delete_all_tasks, but it's not related to this one; I almost never used that function and it is missing from this version.
Attachments:
- project.tar.gz (24.04 KiB)
Re: vTaskDelete sometimes hangs current task
Sorry, this is still not anything close to the kind of minimal example that we could use to reproduce the problem and show if it's likely to be an ESP-IDF bug. The only way I can debug with this example would be to debug your application logic, tasks framework, etc. We don't have the resources to debug that.
(Given that we have no other bug reports of vTaskDelete() hanging FreeRTOS, the chances are high that the bug is somewhere in the application logic. That's not guaranteed, but we don't have the resources to debug your app to determine it.)
With a quick look I did see at least one more race condition:
- Various commands may cause the main task to call delete_tasks_by_output() which may cause a task to be deleted by the main task.
- The task itself may time out and decide to call vTaskDelete(NULL) to delete itself.
A race where both of these happen at the same time will almost certainly hang FreeRTOS. I suggest changing the structure so that either only the main task is responsible for stopping tasks, or tasks only ever call vTaskDelete(NULL) on themselves.
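One way to rule out this kind of double deletion is to make the two paths compete for a single claim, so that only the winner ever calls vTaskDelete. The following is a minimal sketch and is not taken from the attached project: worker_slot_t and all function names are hypothetical, and the #else branch holds host-side stand-ins so that the claim logic can be compiled without FreeRTOS.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#ifdef ESP_PLATFORM
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#else
/* Host-side stand-ins (assumption: only used to exercise the logic off-target). */
typedef void *TaskHandle_t;
static int stub_delete_calls;
static void vTaskDelete(TaskHandle_t task) { (void)task; stub_delete_calls++; }
#endif

/* Hypothetical per-worker record; it does not mirror the poster's pcb_t. */
typedef struct {
    TaskHandle_t task;
    atomic_flag delete_claimed; /* set by whichever side decides to delete first */
} worker_slot_t;

void worker_slot_init(worker_slot_t *slot, TaskHandle_t task) {
    slot->task = task;
    atomic_flag_clear(&slot->delete_claimed);
}

/* Returns true for exactly one caller per slot, false for every later caller. */
bool worker_slot_claim_delete(worker_slot_t *slot) {
    return !atomic_flag_test_and_set(&slot->delete_claimed);
}

/* Main-task path: delete the worker only if it has not started exiting. */
bool main_delete_worker(worker_slot_t *slot) {
    if (!worker_slot_claim_delete(slot)) {
        return false; /* the worker is already deleting itself */
    }
    vTaskDelete(slot->task);
    return true;
}

/* Worker path (e.g. on timeout): self-delete only if main has not claimed it. */
bool worker_delete_self(worker_slot_t *slot) {
    if (!worker_slot_claim_delete(slot)) {
        return false; /* main owns the deletion; the worker should just block */
    }
    vTaskDelete(NULL); /* on the target this call does not return */
    return true;       /* reached only with the host stand-in */
}
```

A worker that gets false back would stop touching shared state and block (for example in a vTaskDelay loop) until the main task removes it. The sketch only decides who calls vTaskDelete; it does not help if a worker is deleted while it holds a mutex or owns heap memory.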
Re: vTaskDelete sometimes hangs current task
BTW, the usual way to structure the kind of "worker tasks" arrangement you're building is not to create and delete tasks at all, but to have a pool of workers, each of which is either idle (blocked receiving from a command queue) or currently working on a command. There's no reason what you're doing can't work, but you have to add a lot more checks for live/dead tasks, stale handles, bad pointers, etc. compared to a worker pool where the same tasks run for the life of the firmware.
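As a rough illustration of that arrangement, here is a minimal sketch of a worker that blocks on a FreeRTOS queue and is never deleted. The worker_cmd_t layout and the function names are assumptions made for the example, not part of the attached project, and the #else branch is a one-slot host-side stand-in for the queue so the logic can be compiled off-target.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#ifdef ESP_PLATFORM
#include "freertos/FreeRTOS.h"
#include "freertos/queue.h"
#include "freertos/task.h"
#else
/* Host-side stand-ins: a one-slot, non-blocking "queue" (assumption). */
typedef int BaseType_t;
typedef uint32_t TickType_t;
#define pdTRUE 1
#define pdFALSE 0
#define portMAX_DELAY 0xffffffffu
typedef struct { uint8_t item[16]; size_t item_size; bool full; } stub_queue_t;
typedef stub_queue_t *QueueHandle_t;
static BaseType_t xQueueSend(QueueHandle_t q, const void *item, TickType_t wait) {
    (void)wait;
    if (q->full) return pdFALSE;
    memcpy(q->item, item, q->item_size);
    q->full = true;
    return pdTRUE;
}
static BaseType_t xQueueReceive(QueueHandle_t q, void *item, TickType_t wait) {
    (void)wait;
    if (!q->full) return pdFALSE;
    memcpy(item, q->item, q->item_size);
    q->full = false;
    return pdTRUE;
}
#endif

/* Hypothetical command format; the real project's commands are not shown. */
typedef struct {
    uint8_t output;       /* which output to drive */
    uint32_t duration_ms; /* how long to drive it */
} worker_cmd_t;

/* Main-task side: queue a command without blocking; false if the queue is full. */
bool submit_command(QueueHandle_t commands, const worker_cmd_t *cmd) {
    return xQueueSend(commands, cmd, 0) == pdTRUE;
}

/* Worker side: wait for one command and run it; false if none arrived in time. */
bool worker_run_once(QueueHandle_t commands, TickType_t wait, worker_cmd_t *cmd) {
    if (xQueueReceive(commands, cmd, wait) != pdTRUE) {
        return false;
    }
    /* ... drive cmd->output for cmd->duration_ms here ... */
    return true;
}

/* Task body: created once at startup and never deleted. */
void worker_task(void *arg) {
    QueueHandle_t commands = (QueueHandle_t)arg;
    worker_cmd_t cmd;
    for (;;) {
        worker_run_once(commands, portMAX_DELAY, &cmd);
    }
}
```

On the target, the queue would come from something like xQueueCreate(8, sizeof(worker_cmd_t)) and each worker from xTaskCreate(worker_task, "worker", 4096, queue, 5, NULL); the queue depth, stack size and priority here are placeholder values. Cancelling a command that is already running would need an extra mechanism, such as a flag the worker checks while it works.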
Re: vTaskDelete sometimes hangs current task
Yes, I too have since realized this is not the optimal solution. I was convinced I had everything sorted out by using mutex semaphores to regulate access to the worker data structures (the situation you described should not be possible, as only one task can be reading or writing the list of processes at any given time).
Anyway, the race condition MUST be somewhere in my code, so I solved the problem by simply notifying each task when it is scheduled for deletion and leaving it to the task to terminate itself, with no further action from the main task.
Next time I'll probably follow your advice from the start.
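For anyone landing on this thread later, here is a minimal sketch of that kind of fix. It assumes the notification is a FreeRTOS direct-to-task notification; the thread does not say which mechanism was actually used, and the function names and the 10 ms poll interval are made up for the example. The #else branch holds host-side stand-ins so the logic can be compiled without FreeRTOS.

```c
#include <stddef.h>
#include <stdint.h>

#ifdef ESP_PLATFORM
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#else
/* Host-side stand-ins (assumption: only used to exercise the logic off-target). */
typedef void *TaskHandle_t;
typedef int BaseType_t;
typedef uint32_t TickType_t;
#define pdTRUE 1
#define pdMS_TO_TICKS(ms) ((TickType_t)(ms))
static uint32_t stub_pending; /* notifications given but not yet taken */
static int stub_self_deletes;
static BaseType_t xTaskNotifyGive(TaskHandle_t task) {
    (void)task;
    stub_pending++;
    return pdTRUE;
}
static uint32_t ulTaskNotifyTake(BaseType_t clear_on_exit, TickType_t wait) {
    (void)wait;
    uint32_t count = stub_pending;
    if (clear_on_exit) stub_pending = 0;
    else if (count > 0) stub_pending--;
    return count;
}
static void vTaskDelete(TaskHandle_t task) {
    if (task == NULL) stub_self_deletes++;
}
#endif

/* Main-task side: ask the worker to stop; never call vTaskDelete on it. */
void request_worker_stop(TaskHandle_t worker) {
    xTaskNotifyGive(worker);
}

/* Worker side: arg points to a counter of completed work cycles (hypothetical). */
void stoppable_worker_task(void *arg) {
    uint32_t *cycles = arg;
    for (;;) {
        /* Wait up to 10 ms for a stop request; a non-zero count means one arrived. */
        if (ulTaskNotifyTake(pdTRUE, pdMS_TO_TICKS(10)) != 0) {
            break;
        }
        (*cycles)++; /* placeholder for one unit of real work */
    }
    /* Release any mutexes and memory owned by this task here, then exit. */
    vTaskDelete(NULL); /* on the target this call does not return */
}
```

Because the worker is the only code that ever deletes it, the handle cannot be deleted from two places at once, and the worker can release whatever it holds before it disappears. The main task still has to avoid using the handle after requesting the stop.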