Clutching at straws: guru doesn't like me in uxListRemove() (freertos/list.c: 218)

RobMeades
Posts: 85
Joined: Thu Nov 29, 2018 1:12 pm

Postby RobMeades » Thu Nov 28, 2019 4:12 pm

I am occasionally getting a Guru Meditation Error inside FreeRTOS. It is always in the same place (uxListRemove() in freertos/list.c at line 218), the backtrace is always the same and the EXCVADDR is always the same (0x00000004).

I'd think it was me but I've COMPLETELY re-written the code that is at the origin of the call and the Guru Meditation Error is now occurring right at the start of my code, just after initialisation, so very little of my code has been executed. This is with headrev ESP-IDF as of 15 October 2019, #2e6398aff.

So I'm clutching at straws: has anyone else seen a Guru Meditation Error of this nature? We are using Wifi in STA mode but absolutely nothing has been done at this point except initialisation. Otherwise just UART, I2C, GPIOs, NVRAM and deep sleep mode, though again nothing has been done aside from initialisation. "Overall" stack smash checking is switched on and has not triggered. Unfortunately I can't attach a debugger to my system; we ran out of IO pins.

Code:

Guru Meditation Error: Core  0 panic'ed (LoadProhibited). Exception was unhandled.
Core 0 register dump:
PC      : 0x40090b19  PS      : 0x00060233  A0      : 0x8008f9de  A1      : 0x3ffc0770
0x40090b19: uxListRemove at C:/Users/rob/esp/esp-idf/components/freertos/list.c:218

A2      : 0x3ffba124  A3      : 0x3ffc07f4  A4      : 0x00000004  A5      : 0x3ffb9898
A6      : 0x00000fa2  A7      : 0x00000000  A8      : 0x00000000  A9      : 0x3ffafac8
A10     : 0x3ffc4a40  A11     : 0x00000001  A12     : 0x00000000  A13     : 0x00000000
A14     : 0x00000000  A15     : 0x00000001  SAR     : 0x0000001f  EXCCAUSE: 0x0000001c
EXCVADDR: 0x00000004  LBEG    : 0x4000c2e0  LEND    : 0x4000c2f6  LCOUNT  : 0xffffffff

ELF file SHA256: 96369d328cddc0c16505cf1631f9eea363662068eb427e4985a3ce6d5633da86

Backtrace: 0x40090b16:0x3ffc0770 0x4008f9db:0x3ffc0790 0x4008e5b0:0x3ffc07b0 0x400de9e5:0x3ffc07f0 0x400db1be:0x3ffc0820 0x400d435b:0x3ffc0900
0x40090b16: uxListRemove at C:/Users/rob/esp/esp-idf/components/freertos/list.c:214

0x4008f9db: xTaskRemoveFromEventList at C:/Users/rob/esp/esp-idf/components/freertos/tasks.c:3117

0x4008e5b0: xQueueGenericSend at C:/Users/rob/esp/esp-idf/components/freertos/queue.c:773

0x400de9e5: led at c:\users\rob\esp\device-application\build/../main/my_led.c:237

0x400db1be: doWakeUpAndSleep at c:\users\rob\esp\device-application\build/../main/main.c:645
 (inlined by) app_main at c:\users\rob\esp\device-application\build/../main/main.c:832

0x400d435b: main_task at C:/Users/rob/esp/esp-idf/components/esp32/cpu_start.c:569
Rob

ESP_Sprite
Posts: 9770
Joined: Thu Nov 26, 2015 4:08 am

Postby ESP_Sprite » Fri Nov 29, 2019 2:53 am

Seems something is being clobbered. In my esp-idf version, that code reads

Code:

List_t * const pxList = ( List_t * ) pxItemToRemove->pvContainer;

if( pxList->pxIndex == pxItemToRemove ) /* <-- error is there */

Probably pxItemToRemove has been corrupted and the code tries to dereference it, upsetting the guru. Going by your backtrace, what I think is happening is that you push something into a queue that has another task waiting on it. The control block (TCB) for that waiting task is stored in the queue's list of waiting tasks, and the send path (xQueueGenericSend() into xTaskRemoveFromEventList()) tries to take that TCB and put it back onto the list of ready tasks. It fails because the queue is corrupted.
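
As a rough illustration of why the fault address is 0x00000004: if the list pointer read out of the item is NULL, the load of pxList->pxIndex lands at the offset of pxIndex inside List_t. The sketch below is a host-side model, not FreeRTOS code. It assumes a 32-bit target with list integrity checking switched off, stores pointers as uint32_t so the offsets come out the same on a 64-bit PC, and uses made-up names (ModelList, ModelMiniListItem, model_pxindex_address) purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified 32-bit layout model. Field names follow FreeRTOS list.h,
 * but the types are hypothetical stand-ins: every pointer is stored as
 * a uint32_t so that offsets match a 32-bit target on any host.
 * Assumption: the list integrity-check words are disabled. */
typedef struct {
    uint32_t xItemValue;
    uint32_t pxNext;
    uint32_t pxPrevious;
} ModelMiniListItem;

typedef struct {
    uint32_t uxNumberOfItems;   /* offset 0 */
    uint32_t pxIndex;           /* offset 4 */
    ModelMiniListItem xListEnd; /* offset 8 */
} ModelList;

/* Address the CPU would try to load for pxList->pxIndex, given the
 * address held in pxList. */
uint32_t model_pxindex_address(uint32_t pxListAddress)
{
    return pxListAddress + (uint32_t) offsetof(ModelList, pxIndex);
}
```

With pxListAddress equal to 0 this gives 4, which is consistent with a NULL pvContainer rather than a wild pointer, though it does not say how the field came to be NULL.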

One trick you may not have used yet: you know we have a GDB stub that you can select to get invoked on a crash? This can be used to look at the system state when this happens. You can possibly look at the memory around the queue, to see if more data has been damaged and perhaps to figure out that way where the damage came from.

Postby RobMeades » Fri Nov 29, 2019 7:32 am

Hi there, many thanks for the useful hint. I _think_ I've found what it was through code inspection.

All my OS thingies are static to avoid any allocation failure issues. I have a driver which creates a mutex with xSemaphoreCreateMutexStatic(), not checking if the returned mutex is NULL since, as the example in the documentation says, "as no dynamic memory allocation was performed, xSemaphore cannot be NULL, so there is no need to check it". It also didn't bother calling vSemaphoreDelete() on that mutex when it was deinitialised. My code then sometimes goes into deep sleep or sometimes not, depending on how much time is available; sometimes it just loops.

On the occasions when it did NOT go to sleep, it would effectively have been calling xSemaphoreCreateMutexStatic() repeatedly, and I think that did actually result in a failure somewhere. I've modified the code to call vSemaphoreDelete() at deinitialisation now. Initial testing suggests that this is good but of course time will tell.

I will update this post when I've done more testing.

Postby RobMeades » Fri Nov 29, 2019 7:46 am

Darn, no, that wasn't it. I will explore further using your hint.

Postby RobMeades » Fri Nov 29, 2019 6:08 pm

Though I can get idf.py monitor to invoke the GDB stub and call GDB, it doesn't stop the target, see below:

Code:

Guru Meditation Error: Core  0 panic'ed (LoadProhibited). Exception was unhandled.
Core 0 register dump:
PC      : 0x40093cc2  PS      : 0x00060433  A0      : 0x8009297d  A1      : 0x3ffc0e20
0x40093cc2: uxListRemove at C:/Users/rob/esp/esp-idf/components/freertos/list.c:218

A2      : 0x3ffba7f4  A3      : 0x3ffc0ecc  A4      : 0x00000004  A5      : 0x3ffb9f68
A6      : 0x00000fa2  A7      : 0x00000000  A8      : 0x00000000  A9      : 0x3ffafb3c
A10     : 0x3ffb3744  A11     : 0x3ffc54b0  A12     : 0x9030d946  A13     : 0x3ffb3744
A14     : 0x00000000  A15     : 0x3ffc3678  SAR     : 0x0000001f  EXCCAUSE: 0x0000001c
EXCVADDR: 0x00000004  LBEG    : 0x4000c2e0  LEND    : 0x4000c2f6  LCOUNT  : 0xffffffff

ELF file SHA256: c19e53ecd48fe26835d1cdd53654b9927726b62c3ceee6a39d7a0845fd0f23ce

Backtrace: 0x40093cbf:0x3ffc0e20 0x4009297a:0x3ffc0e50 0x40090ee2:0x3ffc0e80 0x400e2e71:0x3ffc0ec0 0x400de43b:0x3ffc0ef0 0x400d47ff:0x3ffc0fd0
0x40093cbf: uxListRemove at C:/Users/rob/esp/esp-idf/components/freertos/list.c:214
0x4009297a: xTaskRemoveFromEventList at C:/Users/rob/esp/esp-idf/components/freertos/tasks.c:3117
0x40090ee2: xQueueGenericSend at C:/Users/rob/esp/esp-idf/components/freertos/queue.c:773
0x400e2e71: led at c:\users\rob\esp\device-application\build/../main/my_led.c:238
0x400de43b: doWakeUpAndSleep at c:\users\rob\esp\device-application\build/../main/main.c:645
 (inlined by) app_main at c:\users\rob\esp\device-application\build/../main/main.c:832
0x400d47ff: main_task at C:/Users/rob/esp/esp-idf/components/esp32/cpu_start.c:569

Entering gdb stub now.
$T0b#e6GNU gdb (crosstool-NG esp32-2019r1) 8.1.0.20180627-git
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "--host=i686-host_w64-mingw32 --target=xtensa-esp32-elf".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from c:\users\rob\esp\device-application\build\device-application.elf...done.
Remote debugging using \\.\COM17
Ignoring packet error, continuing...
warning: unrecognized item "timeout" in "qSupported" response
Ignoring packet error, continuing...
Remote replied unexpectedly to 'vMustReplyEmpty': timeout
I can see that the LED on my target is still being flashed as it cheerfully continues. Any idea why this might be happening?

Postby ESP_Sprite » Sat Nov 30, 2019 3:02 am

Ah, are you using Windows and has your board got the DTR/RTS thing to reset the ESP32? Windows may wiggle those handshake lines to reset the ESP32 in the process of going from the terminal to gdbstub...

Postby RobMeades » Sat Nov 30, 2019 8:22 am

Yes, it is Windows and with the DTR/RTS reset design. Any workarounds or do I need to get my soldering iron out?

Postby ESP_Sprite » Mon Dec 02, 2019 2:39 am

Not a Windows user here, so if there are any workarounds I wouldn't know them...

Postby RobMeades » Mon Dec 02, 2019 2:13 pm

OK, I have got my soldering iron out and removed that problem. I've also switched on the FreeRTOS list integrity checking. I can see that the ListItem_t * passed into uxListRemove() has the integrity words at either end of it but that pvContainer in the middle is set to NULL:

Code:

uxListRemove (pxItemToRemove=0x3ffbab2c <gTaskLedBuffer+8>) at C:/Users/rob/esp/esp-idf/components/freertos/list.c:218
218             if( pxList->pxIndex == pxItemToRemove )
(gdb) print (ListItem_t) *pxItemToRemove
$1 = {xListItemIntegrityValue1 = 1515870810, xItemValue = 5806, pxNext = 0x3ffc5aa8, pxPrevious = 0x3ffafb54, pvOwner = 0x3ffbab24 <gTaskLedBuffer>, pvContainer = 0x0,  xListItemIntegrityValue2 = 1515870810}
It seems unlikely that this is corruption, more likely that pvContainer has been set to NULL. Can you see anything wrong with my construction below? Is there any way that the task or queue might not have been instantiated when it is being used?

I initialise the OS items involved as follows (called only from app_main()):

Code:

if (!gInitialised) {
    // Mutex to protect running of the LED task
    gMtxTaskLedRunning = xSemaphoreCreateMutexStatic(&gMtxTaskLedRunningBuffer);
    // Queue to feed the LED task
    gQueueLed = xQueueCreateStatic(LED_QUEUE_LENGTH, sizeof(LedProperties),
                                   gQueueLedStorage, &gQueueLedBuffer);
    // Start the task
    gTaskLedHandle = xTaskCreateStatic(taskLed, "LedTask",
                                       sizeof(gTaskLedStack), NULL, 16,
                                       gTaskLedStack, &gTaskLedBuffer);
    gInitialised = true;
}
...de-initialise them with (called only from app_main()):

Code:

LedProperties properties = LED_TERMINATE_TASK;

if (gInitialised) {
    gInitialised = false;
    // Ask the LED task to exit
    xQueueSend(gQueueLed, (void *) &properties, (portTickType) portMAX_DELAY);
    // Wait for the LED task to release the mutex, which it
    // does just before it deletes itself
    xSemaphoreTake(gMtxTaskLedRunning, (portTickType) portMAX_DELAY);
    xSemaphoreGive(gMtxTaskLedRunning);
    vSemaphoreDelete(gMtxTaskLedRunning);
    vQueueDelete(gQueueLed);
}
The task itself is:

Code:

void taskLed(void *pvParameters)
{
    LedProperties properties = LED_OFF;
    LedColour colour;
    int32_t durationMs;

    xSemaphoreTake(gMtxTaskLedRunning, (portTickType) portMAX_DELAY);
    (void) pvParameters;

    while (properties != LED_TERMINATE_TASK) {
        if (xQueueReceive(gQueueLed, (void *) &properties,
                          (portTickType) portMAX_DELAY)) {
            if (properties != LED_TERMINATE_TASK) {
                colour = LED_GET_COLOUR(properties);
                durationMs = LED_GET_DURATION_MS(properties);
                setLed(colour);
                vTaskDelay(durationMs / portTICK_PERIOD_MS);
                setLed(LED_OFF);
                vTaskDelay(LED_DURATION_GAP_MILLISECONDS / portTICK_PERIOD_MS);
            }
        }
    }

    xSemaphoreGive(gMtxTaskLedRunning);

    // Delete ourself: only valid way out in FreeRTOS
    vTaskDelete(NULL);
}
...and I operate it with:

Code:

void led(LedColour colour, int32_t durationMs)
{
    LedProperties properties = LED_OFF;

    if (gInitialised) {
        LED_SET_COLOUR(properties, colour);
        LED_SET_DURATION_MS(properties, durationMs);
        xQueueSend(gQueueLed, (void *) &properties, (portTickType) portMAX_DELAY);
    }
}
...which may be called from other tasks but those tasks must have exited before the de-initialise function is called.

Postby RobMeades » Mon Dec 02, 2019 5:40 pm

A possible answer (still testing): allow the idle task to run. If I put a vTaskDelay() of 100 ms after deleting my "LED" task then, so far at least, the problem does not occur. Is it possible that FreeRTOS is getting confused when the task is created once more after it has been deleted if FreeRTOS is not allowed to do some form of clean-up first?
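
To make that hypothesis concrete, here is a toy host-side model of one way it could go wrong. It is only a sketch, not FreeRTOS code, and nothing in this thread confirms that it is what happens on the target. It assumes that a self-deleted task's list item can still be linked into some kernel list until clean-up has run, and that creating a task in the same static buffer re-initialises that item. In FreeRTOS, vListInitialiseItem() clears pvContainer without touching pxNext or pxPrevious, which would leave an item looking like the GDB dump above. All model_* names and types are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct ModelList;

/* Stand-in for ListItem_t: container plays the role of pvContainer. */
typedef struct ModelItem {
    struct ModelItem *next;
    struct ModelItem *prev;
    struct ModelList *container;
} ModelItem;

/* Stand-in for List_t, with a sentinel item like xListEnd. */
typedef struct ModelList {
    size_t count;
    ModelItem end;
} ModelList;

void model_list_init(ModelList *list)
{
    list->count = 0;
    list->end.next = &list->end;
    list->end.prev = &list->end;
    list->end.container = list;
}

/* Like vListInitialiseItem(): clears the container only and leaves
 * next/prev exactly as they were. */
void model_item_init(ModelItem *item)
{
    item->container = NULL;
}

void model_list_insert_end(ModelList *list, ModelItem *item)
{
    item->next = &list->end;
    item->prev = list->end.prev;
    list->end.prev->next = item;
    list->end.prev = item;
    item->container = list;
    list->count++;
}

/* Returns 0 on success, or -1 at the point where the real
 * uxListRemove() would dereference a NULL container and fault. */
int model_list_remove(ModelItem *item)
{
    ModelList *list = item->container;
    if (list == NULL) {
        return -1;
    }
    item->next->prev = item->prev;
    item->prev->next = item->next;
    item->container = NULL;
    list->count--;
    return 0;
}

/* The static buffer is re-initialised while its item is still linked. */
int model_reuse_before_cleanup(void)
{
    ModelList pending;
    ModelItem tcbItem;
    model_list_init(&pending);
    model_item_init(&tcbItem);
    model_list_insert_end(&pending, &tcbItem);
    model_item_init(&tcbItem); /* "task created again" too early */
    /* Neighbours still look valid, but the container is gone. */
    assert(tcbItem.next != NULL && tcbItem.container == NULL);
    return model_list_remove(&tcbItem);
}

/* Clean-up unlinks the item first, then the buffer is reused. */
int model_reuse_after_cleanup(void)
{
    ModelList pending;
    ModelItem tcbItem;
    model_list_init(&pending);
    model_item_init(&tcbItem);
    model_list_insert_end(&pending, &tcbItem);
    (void) model_list_remove(&tcbItem); /* clean-up has run */
    model_item_init(&tcbItem);
    model_list_insert_end(&pending, &tcbItem);
    return model_list_remove(&tcbItem);
}
```

In the model, reusing the item before clean-up hits the NULL container on the next remove, while reusing it afterwards does not. Note that a second remove of an already removed item leaves the same state (valid-looking neighbours, NULL container), so the dump alone cannot distinguish the two cases.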
