Troubleshooting Double Exception

dsuliuno
Posts: 2
Joined: Wed Apr 27, 2022 6:02 am

Troubleshooting Double Exception

Postby dsuliuno » Wed Apr 27, 2022 11:00 pm

I am encountering an intermittent Double Exception fault in my application and would appreciate any suggestions on how to approach debugging this.

I am using esp-idf-v4.4 with some esp-iot-solution components (LVGL and display drivers) on an ESP32 WROOM with 8MB PSRAM enabled. My application is mostly written in C, with some C++. I am using mesh wifi with MQTT (notably, I do not get the double exception when I disable the networking components, though I have not been able to narrow this down more specifically).

I am aware that a Double Exception is the result of an exception occurring within the exception handler, and might be expected to be the result of insufficient stack, or stack corruption.
Accordingly, I have substantially oversized my task stacks. vTaskGetRunTimeStats() indicates I have >1kB of free stack on every task (while operating normally), and >2kB in most cases. My ISR stack is set to 7kB (up from 1.5kB). Further, I am careful in my application not to allocate large chunks of data on the stack, so I doubt that the exception is the result of running out of stack (though corruption via overrun is of course still a possibility). I have configured the system to abort when memory allocation fails (so my exception should not be the result of an unhandled malloc failure), and have heap checking/poisoning set to comprehensive.
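
For context on what a "free stack" figure measures: FreeRTOS paints each task stack with a fill byte (0xA5) at creation and derives the high-water mark by counting bytes that still hold that pattern. The following is a minimal host-side sketch of that idea, not the FreeRTOS implementation; stack_paint() and stack_free_bytes() are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same value FreeRTOS uses as its stack fill byte. */
#define STACK_FILL_BYTE 0xA5u

/* Paint a simulated stack region with the fill pattern,
   as is done when a task is created. */
void stack_paint(uint8_t *stack, size_t size)
{
    memset(stack, STACK_FILL_BYTE, size);
}

/* Count untouched bytes from the low-address end. The stack grows
   downward, so the low end is the last part to be used. */
size_t stack_free_bytes(const uint8_t *stack, size_t size)
{
    size_t n = 0;
    while (n < size && stack[n] == STACK_FILL_BYTE) {
        n++;
    }
    return n;
}
```

Note that this kind of check only reports how deep the stack has ever grown. A write that lands on already-used stack (for example, past the end of a local buffer) leaves the reported free figure unchanged.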

Memory is admittedly tight in my application, as I am running mesh wifi, MQTT, LVGL and display drivers, and am storing buffers of sensor data for processing. However, I have >10kB of internal/8-bit memory remaining while running normally.

Any suggestions on how to approach locating the source of this fault would be appreciated.

The backtrace of the Double Exception follows:

Guru Meditation Error: Core  0 panic'ed (Double exception).

Core  0 register dump:
PC      : 0x4008d253  PS      : 0x00040436  A0      : 0x801f06f2  A1      : 0x3ffd06d0
0x4008d253: _xt_context_save at C:/Espressif/frameworks/esp-idf-v4.4/components/freertos/port/xtensa/xtensa_context.S:195

A2      : 0x3f403518  A3      : 0x00000804  A4      : 0x0000faf6  A5      : 0x00002af4
A6      : 0x20202020  A7      : 0x09420920  A8      : 0x800d7f80  A9      : 0x3ffd0680
A10     : 0x00000003  A11     : 0x3f403518  A12     : 0x3f403664  A13     : 0x0000faf6
A14     : 0x3f403518  A15     : 0x00002af4  SAR     : 0x00000004  EXCCAUSE: 0x00000002
EXCVADDR: 0x0000fae6  LBEG    : 0x400014fd  LEND    : 0x4000150d  LCOUNT  : 0xfffffffd


Backtrace:0x4008d250:0x3ffd06d00x401f06ef:0x3ffd0960 0x401f06ef:0x09343430  |<-CORRUPTED
0x4008d250: _xt_context_save at C:/Espressif/frameworks/esp-idf-v4.4/components/freertos/port/xtensa/xtensa_context.S:194

0x401f06ef: main_task at C:/Espressif/frameworks/esp-idf-v4.4/components/freertos/port/port_common.c:129 (discriminator 2)

0x401f06ef: main_task at C:/Espressif/frameworks/esp-idf-v4.4/components/freertos/port/port_common.c:129 (discriminator 2)

Edit: I would also mention that searching for 'exception' in this forum breaks the search; no result page is given. Presumably an SQL sanitisation issue?

ESP_Sprite
Posts: 9545
Joined: Thu Nov 26, 2015 4:08 am

Re: Troubleshooting Double Exception

Postby ESP_Sprite » Thu Apr 28, 2022 3:00 am

That's a complicated issue... I do think this is a stack issue, though: your exception is in the window spill function, and A1 (the stack pointer) does look valid. If I recall correctly, the window spill logic uses pointers that are located on the stack to see where it needs to write back to-be-spilled registers. (If that does not make sense: it's deeper Xtensa magic that needs a well-formed stack to work.) I think stack checking logic doesn't really help you here, as that is done on a context switch, and it looks like your program crashes before that has the opportunity to finish. (One exception is that you may be able to get somewhere by enabling 'Set a debug watchpoint as a stack overflow check' as that triggers on the actual write... but it'd only trigger if you overflow the stack itself, not if you overflow a buffer allocated on the stack.)

You could maybe try to add a JTAG debugger, inspect the assembly at the offending instruction (judging from your register dump, it's likely a load or store to the address 0x0000fae6) and see where that comes from exactly. I'd also go over your code with a fine-tooth comb to see if you really aren't doing anything funny with stack-allocated variables. Sorry, there's no easy surefire solution to debug this.
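
One pattern worth checking in a register dump like the one in the first post, offered as a heuristic rather than a diagnosis: words made up entirely of printable ASCII or whitespace bytes (there, A6 = 0x20202020, A7 = 0x09420920, and the corrupted backtrace stack pointer 0x09343430) can hint that text was written over saved state on the stack. A small sketch of such a check follows; word_looks_like_text() is a hypothetical helper, not part of ESP-IDF.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if every byte of the 32-bit word is printable ASCII
   (0x20..0x7E), a tab, a newline, or a carriage return. */
bool word_looks_like_text(uint32_t word)
{
    for (int i = 0; i < 4; i++) {
        uint8_t b = (uint8_t)(word >> (8 * i));
        bool printable = (b >= 0x20 && b <= 0x7E);
        bool whitespace = (b == '\t' || b == '\n' || b == '\r');
        if (!printable && !whitespace) {
            return false;
        }
    }
    return true;
}
```

A genuine pointer such as A1 = 0x3ffd06d0 fails this test, while the three values listed above pass it. Small integers and some valid addresses can also pass by coincidence, so treat a match as a prompt to look closer, not as proof.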

dsuliuno
Posts: 2
Joined: Wed Apr 27, 2022 6:02 am

Re: Troubleshooting Double Exception

Postby dsuliuno » Thu Apr 28, 2022 4:08 am

Thanks for the detailed response, that's much what I expected.

You did help me find my fault quickly, so thank you! Ironically (having spent a week hunting for this fault) the issue was in my memory diagnostic code (which was still hanging around from when I was getting things running happily in the external RAM). The buffer I passed to vTaskGetRunTimeStats() was allocated on the stack, and was insufficient in size (I had more tasks than I realised when I created it, and I never reviewed the size of the buffer). I broke my rule of no buffers on the stack because it was a quick and dirty diagnostic and I thought I had space to spare... Serves me right.
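
For anyone landing here with the same mistake, a rough sketch of sizing the report buffer from the task count instead of guessing. It assumes the FreeRTOS documentation's guideline of approximately 40 bytes per task; stats_buffer_size() and the spare_tasks margin are hypothetical, and the on-target usage in the trailing comment is an untested outline.

```c
#include <assert.h>
#include <stddef.h>

/* FreeRTOS documentation suggests roughly 40 bytes of report text per task. */
#define STATS_BYTES_PER_TASK 40u

/* Hypothetical helper: bytes to allocate for a report covering task_count
   tasks, with headroom for tasks created between counting and reporting,
   plus one byte for the terminating NUL. */
size_t stats_buffer_size(size_t task_count, size_t spare_tasks)
{
    assert(task_count > 0);
    return (task_count + spare_tasks) * STATS_BYTES_PER_TASK + 1u;
}

/* On target (assumption, not compiled here):
 *   size_t len = stats_buffer_size(uxTaskGetNumberOfTasks(), 4);
 *   char *buf = malloc(len);            // heap, not the task stack
 *   if (buf) { vTaskGetRunTimeStats(buf); puts(buf); free(buf); }
 */
```

vTaskGetRunTimeStats() takes no length argument, so it cannot stop at the end of the buffer. The 40-bytes-per-task figure is only a guideline; long task names or wide counters may need more.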

Appreciate the help; the confirmation that it was likely to be stack related helped get my head straight!

ESP_Sprite
Posts: 9545
Joined: Thu Nov 26, 2015 4:08 am

Re: Troubleshooting Double Exception

Postby ESP_Sprite » Thu Apr 28, 2022 5:15 am

Glad I could help you find the issue!

kuttabilla
Posts: 2
Joined: Wed Jul 24, 2024 11:02 am

Re: Troubleshooting Double Exception

Postby kuttabilla » Wed Jul 24, 2024 11:09 am

I am facing a similar issue.
I want to receive WiFi credentials via Bluetooth, connect to WiFi, and do an OTA update on an ESP32-S3.
My serial monitor output:

Waiting for a client connection to notify...
Received: xxxxxx
SSID: xxxx
Password: xxx
Disconnected from Bluetooth
Connecting to WiFi...
Connected to WiFi
IP Address: x.x.x.x

Core 0 register dump:
PC : 0x3fcf6ab0 PS : 0x00000001 A0 : 0x00060320 A1 : 0x3fcf6b30
A2 : 0x00000000 A3 : 0xffffffff A4 : 0x00000004 A5 : 0x0000cdcd
A6 : 0x00060323 A7 : 0xb33fffff A8 : 0xb33fffff A9 : 0x803816f7
A10 : 0x3fcf6ad0 A11 : 0x3fcb7ef4 A12 : 0x00000000 A13 : 0x00000000
A14 : 0x00060323 A15 : 0xb33fffff SAR : 0xb33fffff EXCCAUSE: 0x00000002
EXCVADDR: 0xffffffe0 LBEG : 0x3fcb7ef4 LEND : 0xffffffff LCOUNT : 0x00000000

Backtrace: 0x3fcf6aad:0x3fcf6b30 |<-CORRUPTED
ELF file SHA256: f9a4798fb8d04e9f

Rebooting...
ESP-ROM:esp32s3-20210327

ESP_Sprite
Posts: 9545
Joined: Thu Nov 26, 2015 4:08 am

Re: Troubleshooting Double Exception

Postby ESP_Sprite » Thu Jul 25, 2024 12:14 am

That is not 'similar' except that you also have a crash. Please post your own topic for this, and we need a fair bit more info to diagnose this. At the very least, we'll need that backtrace decoded and ideally we'll need a minimal project that shows this issue as well.

kuttabilla
Posts: 2
Joined: Wed Jul 24, 2024 11:02 am

Re: Troubleshooting Double Exception

Postby kuttabilla » Thu Jul 25, 2024 5:25 am

It won't let me create a post of my own.

ESP_Sprite
Posts: 9545
Joined: Thu Nov 26, 2015 4:08 am

Re: Troubleshooting Double Exception

Postby ESP_Sprite » Fri Jul 26, 2024 1:53 am

I doubt that; lots of others are able to. Nevertheless, I'll split off your post into its own topic if you can add the extra information I mentioned.
