Suggestions for finding stack/program counter corruption

longtimer
Posts: 9
Joined: Wed Sep 27, 2017 3:18 pm

Postby longtimer » Thu Dec 07, 2017 10:17 pm

Hello,

I am hitting an infrequent problem that occurs once or twice per day and causes the system to reboot. I get a backtrace, but the addresses are nonsensical and random, which indicates that the program counter is corrupted. I have been looking for methods/tools to debug the problem but have been unable to find what I need, and I am wondering if anyone knows of options/techniques for doing this on the ESP32. So far, I have:

1) looked at the FreeRTOS hook macros but saw nothing that matched stack checking
2) looked for compiler options for protecting the stack, with no luck
3) looked for libraries/toolkits for instrumenting this

I am running on custom hardware and only have a UART for accessing the module, so gdb doesn't seem to be an option.

Any suggestions would be appreciated.

Thanks,

Jason

ESP_Angus
Posts: 2344
Joined: Sun May 08, 2016 4:11 am

Re: Suggestions for finding stack/program counter corruption

Postby ESP_Angus » Thu Dec 07, 2017 10:24 pm

You can set stack overflow checking in menuconfig, as documented here:
http://esp-idf.readthedocs.io/en/latest ... ckoverflow

The most useful option to enable is probably "Set a debug watchpoint at end of stack" which will cause an immediate debug exception if the task uses too much stack.

However, these options only detect if the task overflows its entire allocated stack memory region.

To detect stack smashing (which may be what causes this), the IDF master branch (and V3.0 once available) can enable stack smashing protection. You can find this in menuconfig under "Compiler Options".

longtimer
Posts: 9
Joined: Wed Sep 27, 2017 3:18 pm

Re: Suggestions for finding stack/program counter corruption

Postby longtimer » Thu Dec 07, 2017 11:11 pm

Thanks for the response.

I have stack smashing protection enabled, and it triggered as expected when I overwrote the end of a char array while logging, so I know it works. Unfortunately, it is never triggered in this situation.

I also have stack size checking enabled, but to the best of my knowledge this is not a stack overflow: right up until the crash occurs, there is plenty of stack. That said, I will enable the watchpoint to confirm whether it is a stack overflow.

I am wondering if there are any other options/tools to get information when the exception handler is called because both the PC and A0 registers contain nonsensical values when this occurs.

Thanks,

Jason

ESP_Angus
Posts: 2344
Joined: Sun May 08, 2016 4:11 am

Re: Suggestions for finding stack/program counter corruption

Postby ESP_Angus » Fri Dec 08, 2017 12:03 am

longtimer wrote: I am wondering if there are any other options/tools to get information when the exception handler is called because both the PC and A0 registers contain nonsensical values when this occurs.
JTAG (which I saw you've ruled out) is probably the best option, although even by the time it breaks into the debugger, the relevant data may already have been lost. The other registers in the crash dump may also give you some clues, depending on what else you know about the program state when it crashes.

Depending on how badly broken the system is when it crashes, the core dump feature may be of use:
http://esp-idf.readthedocs.io/en/latest ... _dump.html

Are you able to post the crash dump (including decoded source lines for addresses which appear in any other registers) for us to take a look at? Just in case anything jumps out from it.
