ESP32-S3 dead core?
Posted: Fri Nov 01, 2024 5:53 pm
I'm analyzing a factory prototype with an ESP32-S3 that wasn't booting. It seemed to fail just after enabling the second core (Core 1), but before my application code runs. I was able to read and write the flash with esptool. I confirmed that the images were correct and that the eFuses matched a working board (except for fuses that normally differ, such as LDO cal, MAC, and ADC cal). I'm using ESP-IDF v5.1.2.
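For anyone repeating the eFuse comparison, here is a minimal Python sketch of the idea: diff two snapshots and skip the fields that are expected to differ per chip. The field names and values are illustrative placeholders only, not real espefuse output, and how the snapshots are captured is left open.

```python
# Sketch: diff two eFuse snapshots, ignoring fields expected to differ per chip.
# All field names and values below are hypothetical placeholders.

def efuse_diff(good, suspect, ignore=()):
    """Return {field: (good_value, suspect_value)} for unexpected mismatches."""
    skipped = set(ignore)
    diffs = {}
    for field in sorted(set(good) | set(suspect)):
        if field in skipped:
            continue
        a, b = good.get(field), suspect.get(field)
        if a != b:
            diffs[field] = (a, b)
    return diffs

# Invented snapshots for illustration only.
GOOD = {"MAC": "aa:bb:cc:00:00:01", "ADC_CAL": 412, "LDO_CAL": 7, "FLASH_MODE": 0}
SUSPECT = {"MAC": "aa:bb:cc:00:00:02", "ADC_CAL": 398, "LDO_CAL": 6, "FLASH_MODE": 0}
PER_CHIP = ("MAC", "ADC_CAL", "LDO_CAL")  # fields that normally differ

if __name__ == "__main__":
    print(efuse_diff(GOOD, SUSPECT, ignore=PER_CHIP) or "no unexpected differences")
```

An empty result only says the two snapshots agree on the fields that were compared; it says nothing about the silicon behind them.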
I checked the power rails and reset signal: all good. I turned the CPU down from 240MHz to 80MHz; it still didn't work. I switched the flash from QIO to DOUT in case one of the upper two bus pins had failed; it still didn't work. But then I set CONFIG_ESP_SYSTEM_SINGLE_CORE_MODE=y and CONFIG_FREERTOS_UNICORE=y, and the chip booted and worked properly (just slower).
Past the point where the first log below ends, the failures are quite variable. Sometimes it's a watchdog timeout, sometimes it's an ipc1 stack overflow, and sometimes it just hangs.
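Since the failure mode changes from boot to boot, it may help to tally it over many captured boots. The following is a rough Python sketch that labels one boot log by the first panic signature it finds. The patterns are taken from the panic text in the logs in this post; the function name and labels are made up for illustration, and other failure modes would need their own patterns.

```python
import re
from collections import Counter

# Checked in this order on each line, so the more specific patterns win.
SIGNATURES = [
    ("stack_overflow", re.compile(r"A stack overflow in task (\S+) has been detected")),
    ("int_wdt", re.compile(r"Interrupt wdt timeout on CPU(\d)")),
    ("panic", re.compile(r"Guru Meditation Error: Core\s+(\d) panic'ed \(([^)]+)\)")),
]

def classify_boot(log_text):
    """Label one boot log by the first recognised failure line."""
    for line in log_text.splitlines():
        for label, pattern in SIGNATURES:
            match = pattern.search(line)
            if match:
                # Use the last captured group as the detail (task, CPU or cause).
                return f"{label}:{match.group(match.lastindex)}"
    return "no_panic_seen"  # a hang and a clean boot both end up here

def tally(boot_logs):
    """Count failure labels across an iterable of boot log strings."""
    return Counter(classify_boot(text) for text in boot_logs)

if __name__ == "__main__":
    sample = [
        "***ERROR*** A stack overflow in task ipc1 has been detected.",
        "Guru Meditation Error: Core 0 panic'ed (Interrupt wdt timeout on CPU0).",
        "I (292) app_start: Starting scheduler on CPU1",
    ]
    print(tally(sample))
```

A boot that simply hangs produces no panic text, so telling a hang from a clean boot would need an extra check, for example looking for an application log line that only appears after a successful start.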
I also tried the esp_timer example: it works on the good board, but hits an interrupt watchdog timeout on the misbehaving board.
After seeing the stack overflows in my firmware, I increased the IPC task stack size (CONFIG_ESP_IPC_TASK_STACK_SIZE) from 1280 to 2048, 4096, and then 8192, but I still get this error. I'm starting to wonder if this is not a stack overflow, but something else.
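One rough cross-check is to look at the stack pointers in the panic backtrace itself. The Python sketch below parses the `PC:SP` pairs from the backtrace line of the ipc1 overflow log in this post and reports how many bytes the frames span. The helper names are made up for illustration. Note the assumption: the frames run back through xPortStartScheduler and call_start_cpu1, so they appear to describe the stack the overflow check ran on, not ipc1's own stack.

```python
import re

# Backtrace line copied from the ipc1 stack overflow log in this post.
BACKTRACE = (
    "Backtrace: 0x40379fc6:0x3fced590 0x40388049:0x3fced5b0 "
    "0x4038afa1:0x3fced5d0 0x40389843:0x3fced650 0x4038b108:0x3fced670 "
    "0x4038ac95:0x3fced690 0x40379ff7:0x3fced6b0 0x40379ab6:0x3fced6d0 "
    "0x40043a43:0x3fced6f0 |<-CORRUPTED"
)

def parse_backtrace(line):
    """Return a list of (pc, sp) integer pairs from a backtrace line."""
    pairs = re.findall(r"0x([0-9a-fA-F]+):0x([0-9a-fA-F]+)", line)
    return [(int(pc, 16), int(sp, 16)) for pc, sp in pairs]

def stack_span(frames):
    """Bytes between the lowest and highest stack pointer in the backtrace."""
    sps = [sp for _, sp in frames]
    return max(sps) - min(sps)

if __name__ == "__main__":
    frames = parse_backtrace(BACKTRACE)
    print(f"{len(frames)} frames spanning {stack_span(frames)} bytes")
    # prints "9 frames spanning 352 bytes"
```

This does not measure how much of its stack ipc1 used. It only shows that the overflow check fired during the first context switch on Core 1, a few hundred bytes into the startup call chain, which is at least consistent with the canary being damaged or misread rather than a deep call stack overrunning it.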
I copied the flash from the problematic device to a few others, and they worked fine. I suppose it's possible that a timing-dependent interrupt is involved: depending on which task is running when the interrupt fires, perhaps it does or does not run out of stack?
I believe the IPC tasks are only used in SMP mode, so if it is a stack overflow, that would explain why the single-core build works. But it's odd that the examples don't work. Do you know what the issue might be? Or do you have any ideas for things to test? I'm hoping this is limited to a single PCB, but I don't trust that until I know what happened to it.
Here is a log from the failing case:
SPIWP:0xee
mode:DIO, clock div:1
load:0x3fce3818,len:0x1380
load:0x403c9700,len:0x4
load:0x403c9704,len:0x181c
load:0x403cc700,len:0x2834
entry 0x403c9a90
[boot] call_start_cpu0:207 mask=0x000f u=0x0 cmd=0x0001 pass=0xfff0bffe crashed=1
[boot] call_start_cpu0:217 mask=0x000f u=0x0 cmd=0x0001 pass=0xfff0bffe crashed=1
[boot] reset reason:21
[boot] call_start_cpu0:222 mask=0x000f u=0x0 cmd=0x0001 pass=0xfff0bffe crashed=1
[boot] call_start_cpu0:232 mask=0x000f u=0x0 cmd=0x0001 pass=0xfff0bffe crashed=1
[boot] call_start_cpu0_por:81 mask=0x000f u=0x0 cmd=0x0001 pass=0xfff0bffe crashed=1
[boot] call_start_cpu0_por:83 mask=0x000f u=0x0 cmd=0x0004 pass=0xfff0bffb crashed=1
[boot] call_start_cpu0:276 mask=0x000f u=0x0 cmd=0x0001 pass=0xfff0bffe crashed=1
[boot] booting index: 1
I (159) cpu_start: Multicore app
I (159) cpu_start: Pro cpu up.
I (159) cpu_start: Starting app cpu, entry point is 0x40375500
I (0) cpu_start: App cpu up.
I (177) cpu_start: Pro cpu start user code
I (177) cpu_start: cpu freq: 240000000 Hz
I (177) cpu_start: Application information:
I (180) cpu_start: Project name: blitz_boot
I (185) cpu_start: App version: 0.4.2
I (190) cpu_start: Compile time: Nov 1 2024 12:15:05
I (196) cpu_start: ELF file SHA256: 53a8ca17209ecd61...
I (202) cpu_start: ESP-IDF: v5.1.2-12-ga1d649238b
I (208) cpu_start: Min chip rev: v0.0
I (213) cpu_start: Max chip rev: v0.99
I (218) cpu_start: Chip rev: v0.2
I (222) heap_init: Initializing. RAM available for dynamic allocation:
I (230) heap_init: At 3FCA09D0 len 00048D40 (291 KiB): DRAM
I (236) heap_init: At 3FCE9710 len 00005724 (21 KiB): STACK/DRAM
I (242) heap_init: At 3FCF0000 len 00008000 (32 KiB): DRAM
I (249) heap_init: At 600FE010 len 00001FD8 (7 KiB): RTCRAM
I (255) spi_flash: detected chip: gd
I (259) spi_flash: flash io: qio
I (263) sleep: Configure to isolate all GPIO pins in sleep state
I (270) sleep: Enable automatic switching of GPIO sleep configuration
I (277) coexist: coex firmware version: b6d5e8c
I (282) coexist: coexist rom version e7ae62f
I (287) app_start: Starting scheduler on CPU0
I (292) app_start: Starting scheduler on CPU1
Log from one of the ipc1 stack overflow failures:
***ERROR*** A stack overflow in task ipc1 has been detected.
Backtrace: 0x40379fc6:0x3fced590 0x40388049:0x3fced5b0 0x4038afa1:0x3fced5d0 0x40389843:0x3fced650 0x4038b108:0x3fced670 0x4038ac95:0x3fced690 0x40379ff7:0x3fced6b0 0x40379ab6:0x3fced6d0 0x40043a43:0x3fced6f0 |<-CORRUPTED
0x40379fc6: panic_abort at /COMPONENT_ESP_SYSTEM_DIR/panic.c:452
0x40388049: esp_system_abort at /COMPONENT_ESP_SYSTEM_DIR/port/esp_system_chip.c:84
0x4038afa1: vApplicationStackOverflowHook at /COMPONENT_FREERTOS_DIR/FreeRTOS-Kernel/portable/xtensa/port.c:581
0x40389843: vTaskSwitchContext at /COMPONENT_FREERTOS_DIR/FreeRTOS-Kernel/tasks.c:3728
0x4038b108: _frxt_dispatch at /COMPONENT_FREERTOS_DIR/FreeRTOS-Kernel/portable/xtensa/portasm.S:450
0x4038ac95: xPortStartScheduler at /COMPONENT_FREERTOS_DIR/FreeRTOS-Kernel/portable/xtensa/port.c:140
0x40379ff7: start_cpu_other_cores_default at /COMPONENT_ESP_SYSTEM_DIR/startup.c:255
0x40379ab6: call_start_cpu1 at /COMPONENT_ESP_SYSTEM_DIR/port/cpu_start.c:203
0x40043a43: main in ROM
ELF file SHA256: ea8855bf1d547f5a
I (326) esp_core_dump_flash: Save core dump to flash...
I (332) esp_core_dump_common: Backing up stack @ 0x3fced3b0 and use core dump stack @ 0x3fca8190
I (341) esp_core_dump_flash: Erase flash 16384 bytes @ 0x290000
Guru Meditation Error: Core 1 panic'ed (IllegalInstruction). Exception was unhandled.
Core 1 register dump:
PC : 0x00000000 PS : 0x00000000 A0 : 0x00000000 A1 : 0x00000000
A2 : 0x00000000 A3 : 0x00000000 A4 : 0x00000000 A5 : 0x00000000
A6 : 0x00000000 A7 : 0x00000000 A8 : 0x00000000 A9 : 0x00000000
A10 : 0x00000000 A11 : 0x00000000 A12 : 0x00000000 A13 : 0x00000000
A14 : 0x00000000 A15 : 0x00000000 SAR : 0x00000000 EXCCAUSE: 0x00000000
EXCVADDR: 0x00000000 LBEG : 0x3c3a6dd8 LEND : 0x00000000 LCOUNT : 0x00000000
Log from the esp_timer example on the misbehaving board:
ESP-ROM:esp32s3-20210327
Build:Mar 27 2021
rst:0x3 (RTC_SW_SYS_RST),boot:0x28 (SPI_FAST_FLASH_BOOT)
Saved PC:0x403757e0
SPIWP:0xee
mode:DIO, clock div:1
load:0x3fce3818,len:0x1760
load:0x403c9700,len:0x4
load:0x403c9704,len:0xc00
load:0x403cc700,len:0x2e04
entry 0x403c9908
I (26) boot: ESP-IDF v5.1.2-12-ga1d649238b 2nd stage bootloader
I (26) boot: compile time Nov 1 2024 12:49:14
I (27) boot: Multicore bootloader
I (31) boot: chip revision: v0.2
I (34) boot.esp32s3: Boot SPI Speed : 80MHz
I (39) boot.esp32s3: SPI Mode : DIO
I (44) boot.esp32s3: SPI Flash Size : 2MB
I (49) boot: Enabling RNG early entropy source...
I (54) boot: Partition Table:
I (58) boot: ## Label Usage Type ST Offset Length
I (65) boot: 0 nvs WiFi data 01 02 00009000 00006000
I (72) boot: 1 phy_init RF data 01 01 0000f000 00001000
I (80) boot: 2 factory factory app 00 00 00010000 00100000
I (87) boot: End of partition table
I (91) esp_image: segment 0: paddr=00010020 vaddr=3c020020 size=0a7f8h ( 43000) map
I (108) esp_image: segment 1: paddr=0001a820 vaddr=3fc92700 size=0290ch ( 10508) load
I (111) esp_image: segment 2: paddr=0001d134 vaddr=40374000 size=02ee4h ( 12004) load
I (120) esp_image: segment 3: paddr=00020020 vaddr=42000020 size=1cac8h (117448) map
I (146) esp_image: segment 4: paddr=0003caf0 vaddr=40376ee4 size=0b7b8h ( 47032) load
I (157) esp_image: segment 5: paddr=000482b0 vaddr=600fe000 size=0005ch ( 92) load
I (164) boot: Loaded app from partition at offset 0x10000
I (164) boot: Disabling RNG early entropy source...
I (177) cpu_start: Multicore app
I (178) cpu_start: Pro cpu up.
I (178) cpu_start: Starting app cpu, entry point is 0x40375414
I (0) cpu_start: App cpu up.
I (196) cpu_start: Pro cpu start user code
I (196) cpu_start: cpu freq: 160000000 Hz
I (196) cpu_start: Application information:
I (199) cpu_start: Project name: esp_timer_example
I (205) cpu_start: App version: ChessUp2_0.4.9-2-ga1d649238b
I (212) cpu_start: Compile time: Nov 1 2024 12:48:53
I (218) cpu_start: ELF file SHA256: 8347b44a570efd71...
I (224) cpu_start: ESP-IDF: v5.1.2-12-ga1d649238b
I (230) cpu_start: Min chip rev: v0.0
I (234) cpu_start: Max chip rev: v0.99
I (239) cpu_start: Chip rev: v0.2
I (244) heap_init: Initializing. RAM available for dynamic allocation:
I (251) heap_init: At 3FC95898 len 00053E78 (335 KiB): DRAM
I (257) heap_init: At 3FCE9710 len 00005724 (21 KiB): STACK/DRAM
I (264) heap_init: At 3FCF0000 len 00008000 (32 KiB): DRAM
I (270) heap_init: At 600FE05C len 00001F8C (7 KiB): RTCRAM
I (278) spi_flash: detected chip: gd
I (281) spi_flash: flash io: dio
W (285) spi_flash: Detected size(16384k) larger than the size in the binary image header(2048k). Using the size in the binary image header.
I (298) sleep: Configure to isolate all GPIO pins in sleep state
I (305) sleep: Enable automatic switching of GPIO sleep configuration
I (312) app_start: Starting scheduler on CPU0
I (317) app_start: Starting scheduler on CPU1
I (317) main_task: Started on CPU0
0x403757e0: esp_restart_noos_dig at C:/Espressif/frameworks/esp-idf-v5.1.2/components/esp_system/port/esp_system_chip.c:57 (discriminator 1)
0x40375414: call_start_cpu1 at C:/Espressif/frameworks/esp-idf-v5.1.2/components/esp_system/port/cpu_start.c:157
Guru Meditation Error: Core 0 panic'ed (Interrupt wdt timeout on CPU0).
Core 0 register dump:
PC : 0x4201c281 PS : 0x00060a34 A0 : 0x8037d1cc A1 : 0x3fc993f0
0x4201c281: main_task at C:/Espressif/frameworks/esp-idf-v5.1.2/components/freertos/app_startup.c:164 (discriminator 1)
A2 : 0x00000000 A3 : 0x00000000 A4 : 0x00000000 A5 : 0x00000000
A6 : 0x00000001 A7 : 0x00000000 A8 : 0x00000000 A9 : 0x3fc993d0
A10 : 0x00000000 A11 : 0x3fc950f0 A12 : 0x3c023ff8 A13 : 0x0000013d
A14 : 0x3c023fec A15 : 0x00000000 SAR : 0x00000019 EXCCAUSE: 0x00000005
EXCVADDR: 0x00000000 LBEG : 0x400556d5 LEND : 0x400556e5 LCOUNT : 0xfffffffd
0x400556d5: strlen in ROM
0x400556e5: strlen in ROM
Backtrace: 0x4201c27e:0x3fc993f0 0x4037d1c9:0x3fc99420
0x4201c27e: main_task at C:/Espressif/frameworks/esp-idf-v5.1.2/components/freertos/app_startup.c:163
0x4037d1c9: vPortTaskWrapper at C:/Espressif/frameworks/esp-idf-v5.1.2/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:162