Hello,
I am hitting an infrequent problem that occurs once or twice per day and causes the system to reboot. I get a backtrace, but the addresses are nonsensical and random, which suggests that the program counter is corrupted. I have looked for methods/tools to debug the problem but have been unable to find what I need. I am wondering if anyone knows of options/techniques to do this on the ESP32. So far, I have:
1) looked at the FreeRTOS macro hooks but saw nothing that matched stack checking
2) looked for compiler options for protecting the stack with no luck.
3) looked for libraries/tool kits for instrumenting this
I am running on custom hardware and only have a UART for accessing the module, so gdb doesn't seem to be an option.
Any suggestions would be appreciated.
Thanks,
Jason
Suggestions for finding stack/program counter corruption
Re: Suggestions for finding stack/program counter corruption
You can set stack overflow checking in menuconfig, as documented here:
http://esp-idf.readthedocs.io/en/latest ... ckoverflow
The most useful option to enable is probably "Set a debug watchpoint at end of stack" which will cause an immediate debug exception if the task uses too much stack.
However, these options only detect if the task overflows its entire allocated stack memory region.
To detect stack smashing (which may be what causes this), the IDF master branch (and V3.0 once available) can enable stack smashing protection. You can find this in menuconfig under "Compiler Options".
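If the compiler's stack protector never fires, the same canary idea can be applied by hand to a buffer you suspect. The following is a minimal host-runnable sketch, not an ESP-IDF API: every name in it (guarded_buf_t, guarded_init, guarded_payload, guarded_ok, and the GUARD_* constants) is hypothetical, and the sizes are arbitrary. It surrounds a payload with known guard bytes and lets you check them at any point, for example before and after a suspect call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names and sizes throughout; none of this is an ESP-IDF API. */
#define GUARD_LEN   4u
#define GUARD_BYTE  0xA5u
#define PAYLOAD_LEN 16u

/* One array holds [front guard | payload | rear guard], so an overrun of the
 * payload in this sketch still lands inside the same object. */
typedef struct {
    uint8_t bytes[GUARD_LEN + PAYLOAD_LEN + GUARD_LEN];
} guarded_buf_t;

/* Fill both guards with the known pattern and zero the payload. */
void guarded_init(guarded_buf_t *g)
{
    assert(g != NULL);
    memset(g->bytes, GUARD_BYTE, sizeof g->bytes);
    memset(g->bytes + GUARD_LEN, 0, PAYLOAD_LEN);
}

/* Pointer to the usable PAYLOAD_LEN bytes between the guards. */
uint8_t *guarded_payload(guarded_buf_t *g)
{
    assert(g != NULL);
    return g->bytes + GUARD_LEN;
}

/* Returns 1 if both guards are intact, 0 if either was overwritten. */
int guarded_ok(const guarded_buf_t *g)
{
    assert(g != NULL);
    for (size_t i = 0; i < GUARD_LEN; i++) {
        if (g->bytes[i] != GUARD_BYTE) {
            return 0;
        }
        if (g->bytes[GUARD_LEN + PAYLOAD_LEN + i] != GUARD_BYTE) {
            return 0;
        }
    }
    return 1;
}
```

Calling guarded_ok() at a few checkpoints narrows down which code path writes out of bounds. It only catches writes that actually hit the guard bytes, so it complements the menuconfig options rather than replacing them.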
Re: Suggestions for finding stack/program counter corruption
Thanks for the response.
I have stack smashing detection/protection enabled and that worked as expected when I overwrote the end of a char array while logging so I know that that works. Unfortunately, that is never triggered in this situation.
I also have checking of the stack size but it is not a stack overflow issue to the best of my knowledge. Right up until the situation occurs, there is plenty of stack. That said, I will enable the watchpoint to see if it is a stack overflow.
I am wondering if there are any other options/tools to get information when the exception handler is called because both the PC and A0 registers contain nonsensical values when this occurs.
Thanks,
Jason
Re: Suggestions for finding stack/program counter corruption
longtimer wrote: I am wondering if there are any other options/tools to get information when the exception handler is called because both the PC and A0 registers contain nonsensical values when this occurs.
JTAG (which I saw you've ruled out) is probably the best option, although even at the point it breaks into the debugger it may have lost relevant data. The other registers in the crash dump may also give you some clues, depending on what else you know about the program state when it crashes.
Depending on how badly broken the system is when it crashes, the core dump feature may be of use:
http://esp-idf.readthedocs.io/en/latest ... _dump.html
Are you able to post the crash dump (including decoded source lines for addresses which appear in any other registers) for us to take a look at? Just in case anything jumps out from it.
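When going through the other registers by hand, it can help to sort out which values could plausibly be code addresses before decoding them. Below is a rough sketch under two assumptions that are not stated in this thread: that return addresses saved by the Xtensa windowed call convention carry the call size in their top two bits (so a value such as 0x800d1234 refers to code at 0x400d1234), and that ESP32 application code lives in the IRAM and flash-mapped ranges given by the constants. The function name and the constants are hypothetical, ROM addresses are deliberately left out, and the ranges should be checked against the memory map for your chip.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed ESP32 executable ranges; verify against your chip's memory map. */
#define IRAM_LOW   0x40080000u
#define IRAM_HIGH  0x400A0000u
#define IROM_LOW   0x400D0000u
#define IROM_HIGH  0x40400000u

/* Hypothetical helper: normalizes a raw register value and reports whether it
 * could be a code address. Writes the normalized value to *out and returns 1
 * if it falls in one of the assumed ranges, 0 otherwise. */
int decode_code_addr(uint32_t raw, uint32_t *out)
{
    assert(out != NULL);

    uint32_t addr = raw;
    /* Assumption: a set top bit means the upper two bits hold the window
     * increment of the call, so replace them with the 0x4 code prefix. */
    if (addr & 0x80000000u) {
        addr = (addr & 0x3FFFFFFFu) | 0x40000000u;
    }
    *out = addr;

    return (addr >= IRAM_LOW && addr < IRAM_HIGH) ||
           (addr >= IROM_LOW && addr < IROM_HIGH);
}
```

Values that pass this filter are the ones worth decoding to source lines. A value that fails it in PC or A0 is consistent with the corruption described above.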