Investigating program panic from xt_highint4

Baoshi
Posts: 22
Joined: Sun Nov 22, 2015 3:30 am

Postby Baoshi » Thu Apr 25, 2024 2:42 am

Hi,
I have a program that runs into a weird situation:

Main component:
ESP32-S3 16MB / No PSRAM, esp-idf 5.1.2, using built-in unity test framework and protobuf-c

Description:
I have two test cases, one covering a protobuf unpack and one covering a protobuf pack. Each runs fine on its own, but if I run unpack and then pack back to back, TG1WDT_SYS_RST is triggered. The pack test case appears to freeze internally, which in turn causes a UART interrupt timeout. Here is what I have investigated so far:
1. If I disable CONFIG_ESP_INT_WDT, the program freezes.
2. Using JTAG, I can break in when the program freezes. It lands in panic_handler.c, line 145:

141:        // For cache error, pause the non-offending core - offending core handles panic
142:        if (panic_get_cause(frame) == PANIC_RSN_CACHEERR && core_id != esp_cache_err_get_cpuid()) {
143:            // Only print the backtrace for the offending core in case of the cache error
144:            g_exc_frames[core_id] = NULL;
145:            while (1) {
146:                ;
147:            }
148:        }
The stack trace shows:

(gdb) bt
#0  panic_handler (frame=<optimized out>, pseudo_excause=<optimized out>)
    at /home/baoshi/Workspace/ESP32/esp-idf-v5.1/components/esp_system/port/panic_handler.c:145
#1  0x40375d9c in panicHandler (frame=0x3fcaab20)
    at /home/baoshi/Workspace/ESP32/esp-idf-v5.1/components/esp_system/port/panic_handler.c:217
#2  0x40375c93 in xt_highint4 ()
    at /home/baoshi/Workspace/ESP32/esp-idf-v5.1/components/esp_system/port/soc/esp32s3/highint_hdl.S:108
#3  0x40040025 in ?? ()
#4  0x42030ed5 in repeated_field_get_packed_size (
    field=0x3c0ff5c0 <sensor_readings.field_descriptors+96>, count=1, 
    member=<optimized out>)
    at /home/baoshi/Workspace/ESP32/esp-idf-v5.1/components/protobuf-c/protobuf-c/protobuf-c/protobuf-c.c:675
#5  0x42030933 in protobuf_c_message_get_packed_size (message=0x3fcaac70)
    at /home/baoshi/Workspace/ESP32/esp-idf-v5.1/components/protobuf-c/protobuf-c/protobuf-c/protobuf-c.c:744
#6  0x4202c8a2 in sensor_readings__get_packed_size (message=0x3fcaac70)
    at /home/baoshi/Proj-Git/esp32-firmware/components/protobufs/sensor_readings.pb-c.c:20   
Frame #3 looks very strange. And since the program went straight into a while(1) loop, no information was printed.
3. I have tried heap tracking, which did not detect any problem. Checking the stack high watermark did not show any problem either, and increasing the stack to 16K did not help.
4. I can make the problem go away with ANY ONE of the following measures:
a. Set instruction cache to 32KB
b. Set Heap memory debugging to use "Comprehensive" Heap corruption detection (no corruption detected but error is gone)
c. Add some more ESP_LOGI calls in my protobuf packing code (strangely, sometimes adding one more space to the log string is enough to make the problem disappear).
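
One way to read the addresses in the backtrace from step 2 is to sort them by bus region. The following is a minimal sketch in plain C that runs on a host (it is not ESP-IDF code). The address ranges are my reading of the ESP32-S3 memory map and should be checked against the Technical Reference Manual; the names region_t and classify_addr are hypothetical.

```c
#include <stdint.h>

/* Bus regions, named loosely. The ranges below are assumptions taken from
 * my reading of the ESP32-S3 memory map; verify them against the TRM. */
typedef enum {
    REGION_UNKNOWN,
    REGION_ROM_CODE,    /* internal ROM, instruction bus */
    REGION_SRAM_CODE,   /* internal SRAM, instruction bus (IRAM) */
    REGION_FLASH_CODE,  /* external memory through the instruction cache */
    REGION_FLASH_DATA,  /* external memory through the data cache */
    REGION_SRAM_DATA    /* internal SRAM, data bus (DRAM) */
} region_t;

/* Map an address from a backtrace to the region it falls in. */
region_t classify_addr(uint32_t addr)
{
    if (addr >= 0x40000000u && addr <= 0x4005FFFFu) return REGION_ROM_CODE;
    if (addr >= 0x40370000u && addr <= 0x403DFFFFu) return REGION_SRAM_CODE;
    if (addr >= 0x42000000u && addr <= 0x43FFFFFFu) return REGION_FLASH_CODE;
    if (addr >= 0x3C000000u && addr <= 0x3DFFFFFFu) return REGION_FLASH_DATA;
    if (addr >= 0x3FC88000u && addr <= 0x3FCFFFFFu) return REGION_SRAM_DATA;
    return REGION_UNKNOWN;
}

/* Human-readable label for printing next to each frame. */
const char *region_name(region_t r)
{
    switch (r) {
    case REGION_ROM_CODE:   return "internal ROM (code)";
    case REGION_SRAM_CODE:  return "internal SRAM (code)";
    case REGION_FLASH_CODE: return "flash via instruction cache";
    case REGION_FLASH_DATA: return "flash via data cache";
    case REGION_SRAM_DATA:  return "internal SRAM (data)";
    default:                return "unknown";
    }
}
```

With these assumed ranges, 0x42030ed5 (frame #4) falls in the flash-mapped code region, 0x40375d9c (frame #1) in internal SRAM, and 0x40040025 (frame #3) in the internal ROM range. That alone does not say whether frame #3 is a genuine caller or an artifact of unwinding through the assembly handler.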

However, this problem only happens when running the Unity tests, not in my main program, and only under a certain execution sequence, where it is 100% reproducible.

My work is not affected yet, but this kind of behavior does not inspire much confidence, since the same thing might happen in some corner case.

Any insight will be helpful.

TIA
Baoshi

ESP_Sprite
Posts: 9757
Joined: Thu Nov 26, 2015 4:08 am

Re: Investigating program panic from xt_highint4

Postby ESP_Sprite » Fri Apr 26, 2024 5:02 am

I'm decently sure #3 is the level 4 interrupt handler entry; I don't think that is suspect. From the line it calls the panic handler from, it seems that the interrupt watchdog timed out hard. You're not using anything that can get you into a loop in a critical section?

Baoshi
Posts: 22
Joined: Sun Nov 22, 2015 3:30 am

Re: Investigating program panic from xt_highint4

Postby Baoshi » Sat Apr 27, 2024 9:11 am

No, I am definitely not using any critical section, and I also had the interrupt watchdog disabled. The intriguing part is that increasing the instruction cache or introducing some irrelevant (logging) code within the function call fixes the issue. Is there any reason other than the watchdogs for a level 4 interrupt to trigger?
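
Going only by the condition in the snippet quoted in the first post, a core reaches the while(1) at line 145 when the panic cause is PANIC_RSN_CACHEERR and it is not the core reported by esp_cache_err_get_cpuid(). Under that reading, it may be worth checking what the other core was doing at the time, not only the watchdogs. A minimal host-side model of that branch, with hypothetical names (cause_t, core_parks_in_loop) standing in for the ESP-IDF ones:

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the panic causes; only the cache error case
 * matters for the branch quoted from panic_handler.c. */
typedef enum {
    CAUSE_OTHER,
    CAUSE_INT_WDT,
    CAUSE_CACHE_ERROR
} cause_t;

/* Mirrors the quoted condition: the core spins only when the cause is a
 * cache error AND this core is not the one flagged as offending. */
bool core_parks_in_loop(cause_t cause, int core_id, int offending_core)
{
    return cause == CAUSE_CACHE_ERROR && core_id != offending_core;
}
```

In this model, an interrupt watchdog cause never reaches the spin loop, and neither does the offending core itself, which is the one expected to handle the panic and print output.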

Jonathan2892
Posts: 45
Joined: Tue Dec 07, 2021 4:04 pm

Re: Investigating program panic from xt_highint4

Postby Jonathan2892 » Sun May 19, 2024 10:52 am

Hi,

Is there any news on this topic?

Best
