ESP32-S3 dead core?

Bryght-Richard

Post by Bryght-Richard » Fri Nov 01, 2024 5:53 pm

I'm analyzing a factory prototype with an ESP32-S3 that wasn't booting. It seemed to fail just after enabling the second core (Core1), but before my application code runs. I was able to read and write the flash with esptool. I confirmed that the images were correct and that the eFuses matched those of a working board (except for fuses that normally differ, such as LDO cal, MAC, and ADC cal). I'm using ESP-IDF v5.1.2.
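
For anyone repeating the eFuse comparison, here is a minimal sketch of the idea in Python. It assumes the eFuse values of each board have already been read into a plain dict by whatever tool you use; the field names and values below are hypothetical placeholders, not real ESP32-S3 eFuse names.

```python
def diff_efuses(good, bad, ignore=()):
    """Return {field: (good_value, bad_value)} for every field that differs.

    Fields named in `ignore` are skipped, since some values are expected
    to be unique per chip.
    """
    skipped = set(ignore)
    diffs = {}
    # Walk the union of field names so a field missing on one board is reported too.
    for name in sorted(set(good) | set(bad)):
        if name in skipped:
            continue
        g, b = good.get(name), bad.get(name)
        if g != b:
            diffs[name] = (g, b)
    return diffs


# Hypothetical field names and values, for illustration only.
good_board = {"MAC": "aa:bb:cc:00:00:01", "ADC_CAL": 17, "LDO_CAL": 3, "EXAMPLE_FLAG": 0}
bad_board = {"MAC": "aa:bb:cc:00:00:02", "ADC_CAL": 21, "LDO_CAL": 4, "EXAMPLE_FLAG": 0}

per_chip = ("MAC", "ADC_CAL", "LDO_CAL")
print(diff_efuses(good_board, bad_board, ignore=per_chip))  # prints {}
```

An empty result means the two boards agree on everything outside the ignore list, which is the situation described above.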

I checked the power rails and the reset signal: all good. I turned the CPU down from 240MHz to 80MHz, and it still didn't work. I turned the flash mode down from QIO to DOUT in case one of the upper two data pins had failed, and it still didn't work. But then I set CONFIG_ESP_SYSTEM_SINGLE_CORE_MODE=y and CONFIG_FREERTOS_UNICORE=y, and the chip booted and worked properly (just slower).

Here is a log from the failing case:

Code:

SPIWP:0xee
mode:DIO, clock div:1
load:0x3fce3818,len:0x1380
load:0x403c9700,len:0x4
load:0x403c9704,len:0x181c
load:0x403cc700,len:0x2834
entry 0x403c9a90
[boot] call_start_cpu0:207 mask=0x000f u=0x0 cmd=0x0001 pass=0xfff0bffe crashed=1
[boot] call_start_cpu0:217 mask=0x000f u=0x0 cmd=0x0001 pass=0xfff0bffe crashed=1
[boot] reset reason:21
[boot] call_start_cpu0:222 mask=0x000f u=0x0 cmd=0x0001 pass=0xfff0bffe crashed=1
[boot] call_start_cpu0:232 mask=0x000f u=0x0 cmd=0x0001 pass=0xfff0bffe crashed=1
[boot] call_start_cpu0_por:81 mask=0x000f u=0x0 cmd=0x0001 pass=0xfff0bffe crashed=1
[boot] call_start_cpu0_por:83 mask=0x000f u=0x0 cmd=0x0004 pass=0xfff0bffb crashed=1
[boot] call_start_cpu0:276 mask=0x000f u=0x0 cmd=0x0001 pass=0xfff0bffe crashed=1
[boot] booting index: 1
I (159) cpu_start: Multicore app
I (159) cpu_start: Pro cpu up.
I (159) cpu_start: Starting app cpu, entry point is 0x40375500
I (0) cpu_start: App cpu up.
I (177) cpu_start: Pro cpu start user code
I (177) cpu_start: cpu freq: 240000000 Hz
I (177) cpu_start: Application information:
I (180) cpu_start: Project name:     blitz_boot
I (185) cpu_start: App version:      0.4.2
I (190) cpu_start: Compile time:     Nov  1 2024 12:15:05
I (196) cpu_start: ELF file SHA256:  53a8ca17209ecd61...
I (202) cpu_start: ESP-IDF:          v5.1.2-12-ga1d649238b
I (208) cpu_start: Min chip rev:     v0.0
I (213) cpu_start: Max chip rev:     v0.99
I (218) cpu_start: Chip rev:         v0.2
I (222) heap_init: Initializing. RAM available for dynamic allocation:
I (230) heap_init: At 3FCA09D0 len 00048D40 (291 KiB): DRAM
I (236) heap_init: At 3FCE9710 len 00005724 (21 KiB): STACK/DRAM
I (242) heap_init: At 3FCF0000 len 00008000 (32 KiB): DRAM
I (249) heap_init: At 600FE010 len 00001FD8 (7 KiB): RTCRAM
I (255) spi_flash: detected chip: gd
I (259) spi_flash: flash io: qio
I (263) sleep: Configure to isolate all GPIO pins in sleep state
I (270) sleep: Enable automatic switching of GPIO sleep configuration
I (277) coexist: coex firmware version: b6d5e8c
I (282) coexist: coexist rom version e7ae62f
I (287) app_start: Starting scheduler on CPU0
I (292) app_start: Starting scheduler on CPU1

But after this, the failures are quite variable. Sometimes it's a watchdog timeout, sometimes it's an ipc1 stack overflow, and sometimes it just hangs.

Code:

***ERROR*** A stack overflow in task ipc1 has been detected.


Backtrace: 0x40379fc6:0x3fced590 0x40388049:0x3fced5b0 0x4038afa1:0x3fced5d0 0x40389843:0x3fced650 0x4038b108:0x3fced670 0x4038ac95:0x3fced690 0x40379ff7:0x3fced6b0 0x40379ab6:0x3fced6d0 0x40043a43:0x3fced6f0 |<-CORRUPTED
0x40379fc6: panic_abort at /COMPONENT_ESP_SYSTEM_DIR/panic.c:452
0x40388049: esp_system_abort at /COMPONENT_ESP_SYSTEM_DIR/port/esp_system_chip.c:84
0x4038afa1: vApplicationStackOverflowHook at /COMPONENT_FREERTOS_DIR/FreeRTOS-Kernel/portable/xtensa/port.c:581
0x40389843: vTaskSwitchContext at /COMPONENT_FREERTOS_DIR/FreeRTOS-Kernel/tasks.c:3728
0x4038b108: _frxt_dispatch at /COMPONENT_FREERTOS_DIR/FreeRTOS-Kernel/portable/xtensa/portasm.S:450
0x4038ac95: xPortStartScheduler at /COMPONENT_FREERTOS_DIR/FreeRTOS-Kernel/portable/xtensa/port.c:140
0x40379ff7: start_cpu_other_cores_default at /COMPONENT_ESP_SYSTEM_DIR/startup.c:255
0x40379ab6: call_start_cpu1 at /COMPONENT_ESP_SYSTEM_DIR/port/cpu_start.c:203
0x40043a43: main in ROM


ELF file SHA256: ea8855bf1d547f5a

I (326) esp_core_dump_flash: Save core dump to flash...
I (332) esp_core_dump_common: Backing up stack @ 0x3fced3b0 and use core dump stack @ 0x3fca8190
I (341) esp_core_dump_flash: Erase flash 16384 bytes @ 0x290000
Guru Meditation Error: Core  1 panic'ed (IllegalInstruction). Exception was unhandled.

Core  1 register dump:
PC      : 0x00000000  PS      : 0x00000000  A0      : 0x00000000  A1      : 0x00000000
A2      : 0x00000000  A3      : 0x00000000  A4      : 0x00000000  A5      : 0x00000000
A6      : 0x00000000  A7      : 0x00000000  A8      : 0x00000000  A9      : 0x00000000
A10     : 0x00000000  A11     : 0x00000000  A12     : 0x00000000  A13     : 0x00000000
A14     : 0x00000000  A15     : 0x00000000  SAR     : 0x00000000  EXCCAUSE: 0x00000000
EXCVADDR: 0x00000000  LBEG    : 0x3c3a6dd8  LEND    : 0x00000000  LCOUNT  : 0x00000000

I also tried the esp_timer example: it works on the good board, but hits an interrupt watchdog timeout on the misbehaving board.

Code:

ESP-ROM:esp32s3-20210327
Build:Mar 27 2021
rst:0x3 (RTC_SW_SYS_RST),boot:0x28 (SPI_FAST_FLASH_BOOT)
Saved PC:0x403757e0
SPIWP:0xee
mode:DIO, clock div:1
load:0x3fce3818,len:0x1760
load:0x403c9700,len:0x4
load:0x403c9704,len:0xc00
load:0x403cc700,len:0x2e04
entry 0x403c9908
I (26) boot: ESP-IDF v5.1.2-12-ga1d649238b 2nd stage bootloader
I (26) boot: compile time Nov  1 2024 12:49:14
I (27) boot: Multicore bootloader
I (31) boot: chip revision: v0.2
I (34) boot.esp32s3: Boot SPI Speed : 80MHz
I (39) boot.esp32s3: SPI Mode       : DIO
I (44) boot.esp32s3: SPI Flash Size : 2MB
I (49) boot: Enabling RNG early entropy source...
I (54) boot: Partition Table:
I (58) boot: ## Label            Usage          Type ST Offset   Length
I (65) boot:  0 nvs              WiFi data        01 02 00009000 00006000
I (72) boot:  1 phy_init         RF data          01 01 0000f000 00001000
I (80) boot:  2 factory          factory app      00 00 00010000 00100000
I (87) boot: End of partition table
I (91) esp_image: segment 0: paddr=00010020 vaddr=3c020020 size=0a7f8h ( 43000) map
I (108) esp_image: segment 1: paddr=0001a820 vaddr=3fc92700 size=0290ch ( 10508) load
I (111) esp_image: segment 2: paddr=0001d134 vaddr=40374000 size=02ee4h ( 12004) load
I (120) esp_image: segment 3: paddr=00020020 vaddr=42000020 size=1cac8h (117448) map
I (146) esp_image: segment 4: paddr=0003caf0 vaddr=40376ee4 size=0b7b8h ( 47032) load
I (157) esp_image: segment 5: paddr=000482b0 vaddr=600fe000 size=0005ch (    92) load
I (164) boot: Loaded app from partition at offset 0x10000
I (164) boot: Disabling RNG early entropy source...
I (177) cpu_start: Multicore app
I (178) cpu_start: Pro cpu up.
I (178) cpu_start: Starting app cpu, entry point is 0x40375414
I (0) cpu_start: App cpu up.
I (196) cpu_start: Pro cpu start user code
I (196) cpu_start: cpu freq: 160000000 Hz
I (196) cpu_start: Application information:
I (199) cpu_start: Project name:     esp_timer_example
I (205) cpu_start: App version:      ChessUp2_0.4.9-2-ga1d649238b
I (212) cpu_start: Compile time:     Nov  1 2024 12:48:53
I (218) cpu_start: ELF file SHA256:  8347b44a570efd71...
I (224) cpu_start: ESP-IDF:          v5.1.2-12-ga1d649238b
I (230) cpu_start: Min chip rev:     v0.0
I (234) cpu_start: Max chip rev:     v0.99
I (239) cpu_start: Chip rev:         v0.2
I (244) heap_init: Initializing. RAM available for dynamic allocation:
I (251) heap_init: At 3FC95898 len 00053E78 (335 KiB): DRAM
I (257) heap_init: At 3FCE9710 len 00005724 (21 KiB): STACK/DRAM
I (264) heap_init: At 3FCF0000 len 00008000 (32 KiB): DRAM
I (270) heap_init: At 600FE05C len 00001F8C (7 KiB): RTCRAM
I (278) spi_flash: detected chip: gd
I (281) spi_flash: flash io: dio
W (285) spi_flash: Detected size(16384k) larger than the size in the binary image header(2048k). Using the size in the binary image header.
I (298) sleep: Configure to isolate all GPIO pins in sleep state
I (305) sleep: Enable automatic switching of GPIO sleep configuration
I (312) app_start: Starting scheduler on CPU0
I (317) app_start: Starting scheduler on CPU1
I (317) main_task: Started on CPU0
0x403757e0: esp_restart_noos_dig at C:/Espressif/frameworks/esp-idf-v5.1.2/components/esp_system/port/esp_system_chip.c:57 (discriminator 1)

0x40375414: call_start_cpu1 at C:/Espressif/frameworks/esp-idf-v5.1.2/components/esp_system/port/cpu_start.c:157

Guru Meditation Error: Core  0 panic'ed (Interrupt wdt timeout on CPU0).

Core  0 register dump:
PC      : 0x4201c281  PS      : 0x00060a34  A0      : 0x8037d1cc  A1      : 0x3fc993f0
0x4201c281: main_task at C:/Espressif/frameworks/esp-idf-v5.1.2/components/freertos/app_startup.c:164 (discriminator 1)

A2      : 0x00000000  A3      : 0x00000000  A4      : 0x00000000  A5      : 0x00000000
A6      : 0x00000001  A7      : 0x00000000  A8      : 0x00000000  A9      : 0x3fc993d0
A10     : 0x00000000  A11     : 0x3fc950f0  A12     : 0x3c023ff8  A13     : 0x0000013d
A14     : 0x3c023fec  A15     : 0x00000000  SAR     : 0x00000019  EXCCAUSE: 0x00000005
EXCVADDR: 0x00000000  LBEG    : 0x400556d5  LEND    : 0x400556e5  LCOUNT  : 0xfffffffd
0x400556d5: strlen in ROM
0x400556e5: strlen in ROM



Backtrace: 0x4201c27e:0x3fc993f0 0x4037d1c9:0x3fc99420
0x4201c27e: main_task at C:/Espressif/frameworks/esp-idf-v5.1.2/components/freertos/app_startup.c:163
0x4037d1c9: vPortTaskWrapper at C:/Espressif/frameworks/esp-idf-v5.1.2/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:162

After seeing the stack overflows on my firmware, I increased the IPC_STACK_SIZE from 1280 to 2048, then 4096, then 8192, but I still get this error. I'm starting to wonder if this is not a stack overflow, but something else.
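
As a rough sanity check on the stack question, the PC:SP pairs in the "Backtrace:" line can be parsed to see how much stack the reported frames actually cover. This is only a sketch: the helper names are made up for this post, and the SP values describe the stack the panic was taken on (the frames listed start at call_start_cpu1), which is not necessarily the ipc1 task's own stack.

```python
import re

# Each backtrace frame is printed as 0x<PC>:0x<SP>.
PAIR = re.compile(r"0x([0-9a-fA-F]+):0x([0-9a-fA-F]+)")

# Backtrace line copied from the ipc1 stack overflow log above.
LINE = (
    "Backtrace: 0x40379fc6:0x3fced590 0x40388049:0x3fced5b0 "
    "0x4038afa1:0x3fced5d0 0x40389843:0x3fced650 0x4038b108:0x3fced670 "
    "0x4038ac95:0x3fced690 0x40379ff7:0x3fced6b0 0x40379ab6:0x3fced6d0 "
    "0x40043a43:0x3fced6f0 |<-CORRUPTED"
)


def parse_backtrace(line):
    """Return a list of (pc, sp) integer pairs found in a backtrace line."""
    return [(int(pc, 16), int(sp, 16)) for pc, sp in PAIR.findall(line)]


def stack_span(frames):
    """Return the number of bytes between the lowest and highest SP."""
    sps = [sp for _, sp in frames]
    return max(sps) - min(sps)


frames = parse_backtrace(LINE)
print(len(frames), stack_span(frames))  # prints 9 352
```

The nine frames span 352 bytes (0x3fced590 to 0x3fced6f0), which by itself does not explain an overflow of a stack sized between 1280 and 8192 bytes.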

I copied the flash from the problematic device to a few others, and they worked fine. I suppose a timing-dependent interrupt could be involved: depending on which task is running when the interrupt fires, it may or may not run out of stack.
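
When moving flash contents between boards like this, it can help to confirm that two dumps really are byte-identical. The following is a minimal sketch, assuming both dumps have already been read back to the PC; the short byte strings below are stand-ins for real dump files.

```python
import hashlib


def sha256_hex(data):
    """Return the SHA-256 digest of a bytes object as a hex string."""
    return hashlib.sha256(data).hexdigest()


def first_difference(a, b):
    """Return the first offset at which a and b differ, or None if identical."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    # One dump is a prefix of the other: they differ where the shorter one ends.
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


# Stand-in data; in practice these would come from open(path, "rb").read().
dump_bad_board = bytes([0xE9, 0x06, 0x02, 0x20]) + b"\xff" * 12
dump_other_board = bytes([0xE9, 0x06, 0x02, 0x20]) + b"\xff" * 12

print(sha256_hex(dump_bad_board) == sha256_hex(dump_other_board))  # prints True
print(first_difference(dump_bad_board, dump_other_board))  # prints None
```

Matching digests only show that the flash contents are the same; they say nothing about the chip itself.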

I believe the IPC tasks are only used in SMP mode, so if it is a stack overflow, that might make sense. But it's odd that the examples don't work either. Do you know what might be the issue? Or do you have any ideas for things to test? I'm hoping this is limited to a single PCB, but I don't trust that until I know what happened to it.

MicroController

Re: ESP32-S3 dead core?

Post by MicroController » Fri Nov 01, 2024 10:48 pm

Power supply (soldering, capacitors,...) issue perhaps, playing tricks on you when the second core goes active?
Or maybe actually a defective piece.

Bryght-Richard

Re: ESP32-S3 dead core?

Post by Bryght-Richard » Mon Nov 04, 2024 7:51 pm

> Power supply (soldering, capacitors,...) issue perhaps, playing tricks on you when the second core goes active?

I monitored it with a scope using an AC trigger, and the power looks good. This was my first thought too, but it doesn't seem to be the issue.

> Or maybe actually a defective piece.

I was able to find our test logs: the board was programmed and passed self-test in our factory (the self-test uses both cores). Maybe a latent failure?
