Investigating program panic from xt_highint4
Posted: Thu Apr 25, 2024 2:42 am
Hi,
I have a program that runs into a weird situation:
Main component:
ESP32-S3 16MB / No PSRAM, ESP-IDF 5.1.2, using the built-in Unity test framework and protobuf-c
Description:
I have two test cases, one covering a protobuf unpack and one covering a protobuf pack. Each test case runs fine on its own, but if I run unpack and then pack consecutively, a TG1WDT_SYS_RST reset is triggered. It seems the protobuf pack test case freezes internally, and that causes a UART interrupt timeout. Here is what I have found so far:
1. If I disable CONFIG_ESP_INT_WDT, the program freezes.
2. Using JTAG, I can break in when the program freezes. It lands in panic_handler.c, line 145.
Code: Select all
141: // For cache error, pause the non-offending core - offending core handles panic
142: if (panic_get_cause(frame) == PANIC_RSN_CACHEERR && core_id != esp_cache_err_get_cpuid()) {
143:     // Only print the backtrace for the offending core in case of the cache error
144:     g_exc_frames[core_id] = NULL;
145:     while (1) {
146:         ;
147:     }
148: }
The stack trace shows:
Code: Select all
(gdb) bt
#0 panic_handler (frame=<optimized out>, pseudo_excause=<optimized out>)
    at /home/baoshi/Workspace/ESP32/esp-idf-v5.1/components/esp_system/port/panic_handler.c:145
#1 0x40375d9c in panicHandler (frame=0x3fcaab20)
    at /home/baoshi/Workspace/ESP32/esp-idf-v5.1/components/esp_system/port/panic_handler.c:217
#2 0x40375c93 in xt_highint4 ()
    at /home/baoshi/Workspace/ESP32/esp-idf-v5.1/components/esp_system/port/soc/esp32s3/highint_hdl.S:108
#3 0x40040025 in ?? ()
#4 0x42030ed5 in repeated_field_get_packed_size (
    field=0x3c0ff5c0 <sensor_readings.field_descriptors+96>, count=1,
    member=<optimized out>)
    at /home/baoshi/Workspace/ESP32/esp-idf-v5.1/components/protobuf-c/protobuf-c/protobuf-c/protobuf-c.c:675
#5 0x42030933 in protobuf_c_message_get_packed_size (message=0x3fcaac70)
    at /home/baoshi/Workspace/ESP32/esp-idf-v5.1/components/protobuf-c/protobuf-c/protobuf-c/protobuf-c.c:744
#6 0x4202c8a2 in sensor_readings__get_packed_size (message=0x3fcaac70)
    at /home/baoshi/Proj-Git/esp32-firmware/components/protobufs/sensor_readings.pb-c.c:20
Frame #3 looks very strange. And since the program went straight into a while(1) loop, no information was printed out.
3. I have tried heap tracking, and it does not detect any problem. Checking the stack high watermark does not show any problem either, and increasing the stack to 16K does not help.
4. I can make the problem go away with ANY ONE of the following measures:
a. Set the instruction cache size to 32KB
b. Set Heap memory debugging to use "Comprehensive" heap corruption detection (no corruption is detected, but the error is gone)
c. Add some more ESP_LOGI calls in my protobuf packing code (strangely, sometimes adding one more space to the string solves the problem).
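To help read the addresses in the backtrace above, here is a small host-side sketch (plain C, not ESP-IDF code) that sorts an address into a coarse memory region. The region bounds are an assumption based on my reading of the ESP32-S3 memory map and should be checked against the Technical Reference Manual. The helper names classify_addr and region_name, and the enum values, are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Coarse address classes. All names here are hypothetical helpers. */
typedef enum {
    REGION_UNKNOWN = 0,
    REGION_FLASH_DATA,  /* external flash/PSRAM seen through the data cache */
    REGION_SRAM_DATA,   /* internal SRAM on the data bus */
    REGION_ROM_INSN,    /* internal ROM on the instruction bus */
    REGION_SRAM_INSN,   /* internal SRAM (IRAM) on the instruction bus */
    REGION_FLASH_INSN,  /* external flash seen through the instruction cache */
    REGION_COUNT
} region_t;

typedef struct {
    uint32_t first;     /* first address in the range (inclusive) */
    uint32_t last;      /* last address in the range (inclusive) */
    region_t region;
} range_t;

/* ASSUMPTION: bounds taken from the ESP32-S3 memory map as I understand it.
 * Verify against the Technical Reference Manual before relying on them. */
static const range_t k_ranges[] = {
    { 0x3C000000u, 0x3DFFFFFFu, REGION_FLASH_DATA },
    { 0x3FC88000u, 0x3FCFFFFFu, REGION_SRAM_DATA  },
    { 0x40000000u, 0x4005FFFFu, REGION_ROM_INSN   },
    { 0x40370000u, 0x403DFFFFu, REGION_SRAM_INSN  },
    { 0x42000000u, 0x43FFFFFFu, REGION_FLASH_INSN },
};

/* Return the region containing addr, or REGION_UNKNOWN if none matches. */
region_t classify_addr(uint32_t addr)
{
    for (size_t i = 0; i < sizeof(k_ranges) / sizeof(k_ranges[0]); i++) {
        if (addr >= k_ranges[i].first && addr <= k_ranges[i].last) {
            return k_ranges[i].region;
        }
    }
    return REGION_UNKNOWN;
}

/* Human-readable label for a region value. */
const char *region_name(region_t r)
{
    static const char *const names[REGION_COUNT] = {
        "unknown",
        "flash via data cache",
        "internal SRAM (data bus)",
        "internal ROM (instruction bus)",
        "internal SRAM / IRAM (instruction bus)",
        "flash via instruction cache",
    };
    assert((unsigned)r < (unsigned)REGION_COUNT);
    return names[r];
}
```

Under these assumed bounds, frames #4 to #6 (0x42030ed5 and friends) are in flash-mapped code, frames #1 and #2 are in IRAM, and the 0x40040025 in frame #3 falls in the internal ROM range rather than in my application, which may be why gdb shows it as ??.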
However, this problem only happens when running the Unity tests and does not happen in my main program. It also only happens under a certain execution sequence, but in that sequence it is 100% reproducible.
My work is not affected yet, but this kind of behavior does not inspire much confidence, since the same thing could happen in some corner case.
Any insight will be helpful.
TIA
Baoshi