Boot Failure from v3 bootloader into v5 app
Posted: Thu Nov 30, 2023 3:49 pm
We have many ESP32-WROVER-B (100k online, many more manufactured) with various revisions of IDF v3 bootloaders. We've been happily shipping firmware updates for ~5 years, all with IDF v3.x apps. We are now testing an IDF v5.x-based app, and sadly running into some bootloops which look like bootloader incompatibility. We could use some pointers on how to get v5 OTAs stable.
----
The bootloop does not occur for every v5 app image we build. We found that the bootloop is sensitive to seemingly minor changes like upgrading our Ninja version or adding a header file, but it then reoccurs in a later build. When we have a failing app image, it always fails on units with a certain version of bootloader, and the issue follows the a full firmware image rather than following the hardware.
I attach GDB screengrabs from a working and non-working image, where we are trying to determine root cause. The working image was built using a newer Ninja version. We were surprised that changing the Ninja version would change the Xtensa assembly for this `bootloader_init_mem` function! You can see that it crashes at an `lsx` instruction at `bootloader_init_mem+6`, but in the working version there's a `memw` instruction in the same place. Also, looking at where those `call` operations are pointing, the non-working image doesn't make much sense.
IDF version of problematic bootloaders: v3.1.2-1-g92456f014e
Current IDF app commit: v5.1.1-910-g93b9ee2315
We are aware of several menuconfig options related to bootloader-app compatibility (APP_COMPATIBLE_PRE_V3_1_BOOTLOADERS and APP_COMPATIBLE_PRE_V2_1_BOOTLOADERS) and we have those options *disabled* because they seem to apply to older bootloaders than our oldest.
Serial log from one such bootloop follows:
----
The bootloop does not occur for every v5 app image we build. We found that the bootloop is sensitive to seemingly minor changes like upgrading our Ninja version or adding a header file, but it then reoccurs in a later build. When we have a failing app image, it always fails on units with a certain version of bootloader, and the issue follows the a full firmware image rather than following the hardware.
I attach GDB screengrabs from a working and non-working image, where we are trying to determine root cause. The working image was built using a newer Ninja version. We were surprised that changing the Ninja version would change the Xtensa assembly for this `bootloader_init_mem` function! You can see that it crashes at an `lsx` instruction at `bootloader_init_mem+6`, but in the working version there's a `memw` instruction in the same place. Also, looking at where those `call` operations are pointing, the non-working image doesn't make much sense.
IDF version of problematic bootloaders: v3.1.2-1-g92456f014e
Current IDF app commit: v5.1.1-910-g93b9ee2315
We are aware of several menuconfig options related to bootloader-app compatibility (APP_COMPATIBLE_PRE_V3_1_BOOTLOADERS and APP_COMPATIBLE_PRE_V2_1_BOOTLOADERS) and we have those options *disabled* because they seem to apply to older bootloaders than our oldest.
Serial log from one such bootloop follows:
Code: Select all
I (31) boot: ESP-IDF v3.1-beta1-612-g92456f0 2nd stage bootloader
I (31) boot: compile time 19:53:59
I (31) boot: Enabling RNG early entropy source...
I (31) boot: SPI Speed : 40MHz
I (32) boot: SPI Mode : DIO
I (32) boot: SPI Flash Size : 16MB
I (33) boot: ########
I (33) boot: ######## BOND-CORE ESP32 Bootloader
I (34) boot: ######## TARGET: zermatt
I (34) boot: ######## VERSION: v2.5.2
I (35) boot: ######## BUILT: Thu Mar 7 19:54:03 UTC 2019
I (36) boot: ########
I (37) boot: Partition Table:
I (37) boot: ## Label Usage Type ST Offset Length
I (37) boot: 0 nvs WiFi data 01 02 00009000 00004000
I (38) boot: 1 otadata OTA data 01 00 0000d000 00002000
I (39) boot: 2 phy_init RF data 01 01 0000f000 00001000
I (40) boot: 3 factory factory app 00 00 00010000 002f0000
I (41) boot: 4 ota_0 OTA app 00 10 00300000 00400000
I (42) boot: 5 ota_1 OTA app 00 11 00700000 00400000
I (43) boot: 6 id unknown 40 00 00b00000 00010000
I (44) boot: 7 db unknown 40 00 00b10000 00400000
I (45) boot: 8 state unknown 40 00 00f10000 000e0000
I (46) boot: 9 coredump Unknown data 01 03 00ff0000 00010000
I (47) boot: End of partition table
I (47) boot: FACTORY_FIRST: disabled, using default esp behavior
I (48) boot: boot_index = 0
I (49) esp_image: segment 0: paddr=0x00300020 vaddr=0x3f400020 size=0x7fb4c (523084) map
I (233) esp_image: segment 1: paddr=0x0037fb74 vaddr=0x3ff80085 size=0x00008 ( 8) load
I (233) esp_image: segment 2: paddr=0x0037fb84 vaddr=0x3ffb0000 size=0x00494 ( 1172) load
I (234) esp_image: segment 3: paddr=0x00380020 vaddr=0x400d0020 size=0x135d40 (1269056) map
0x400d0020: _stext at ??:?
I (679) esp_image: segment 4: paddr=0x004b5d68 vaddr=0x3ffb0494 size=0x06e08 ( 28168) load
I (691) esp_image: segment 5: paddr=0x004bcb78 vaddr=0x40080000 size=0x1e57c (124284) load
0x40080000: _WindowOverflow4 at /Users/rafaeldantas/study/bond/bond-core/bsp/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/xtensa_vectors.S:2023
I (743) esp_image: segment 6: paddr=0x004db0fc vaddr=0x400c0000 size=0x00088 ( 136) load
I (761) boot: Loaded app from partition at offset 0x300000
I (762) boot: Disabling RNG early entropy source...
I (762) cpu_start: Multicore app
I (762) quad_psram: This chip is ESP32-D0WD
I (763) esp_psram: Found 8MB PSRAM device
I (763) esp_psram: PSRAM initialized, cache is in low/high (2-core) mode.
W (764) esp_psram: Virtual address not enough for PSRAM, map as much as we can. 4MB is mapped
Guru Meditation Error: Core 0 panic'ed (LoadProhibited). Exception was unhandled.
Core 0 register dump:
PC : 0x4017c0be PS : 0x00060930 A0 : 0x800886b5 A1 : 0x3ffe3bd0
0x4017c0be: bootloader_init_mem at /Users/rafaeldantas/study/bond/bond-core/bsp/esp-idf/components/bootloader_support/src/bootloader_mem.c:22
A2 : 0x3ffc0690 A3 : 0x00000000 A4 : 0x00000000 A5 : 0x3ffc0690
A6 : 0x3f800000 A7 : 0x3ff0005c A8 : 0x8014cff5 A9 : 0x3ffe3b90
A10 : 0x3ffe3bd4 A11 : 0x00000069 A12 : 0x00000010 A13 : 0xffffffff
A14 : 0x00000000 A15 : 0xfffffffe SAR : 0x00000004 EXCCAUSE: 0x0000001c
EXCVADDR: 0x800886c5 LBEG : 0x4014d595 LEND : 0x4014d59e LCOUNT : 0x00000000
0x4014d595: psram_clear_spi_fifo at /Users/rafaeldantas/study/bond/bond-core/bsp/esp-idf/components/esp_psram/esp32/esp_psram_impl_quad.c:239
(inlined by) psram_read_id at /Users/rafaeldantas/study/bond/bond-core/bsp/esp-idf/components/esp_psram/esp32/esp_psram_impl_quad.c:471
0x4014d59e: psram_read_id at /Users/rafaeldantas/study/bond/bond-core/bsp/esp-idf/components/esp_psram/esp32/esp_psram_impl_quad.c:472
Backtrace: 0x4017c0bb:0x3ffe3bd0 0x400886b2:0x3ffe3c00 0x4007921b:0x3ffe3c50 |<-CORRUPTED
0x4017c0bb: bootloader_init_mem at /Users/rafaeldantas/study/bond/bond-core/bsp/esp-idf/components/bootloader_support/src/bootloader_mem.c:22
0x400886b2: call_start_cpu0 at /Users/rafaeldantas/study/bond/bond-core/bsp/esp-idf/components/esp_system/port/cpu_start.c:472
ELF file SHA256: b6a12518dd5ec44f
Saving Memfault Coredump!
Guru Meditation Error: Core 0 panic'ed (StoreProhibited). Exception was unhandled.
Core 0 register dump:
PC : 0x40083c95 PS : 0x00060434 A0 : 0x80147d29 A1 : 0x3ffe37d0
0x40083c95: esp_flash_app_disable_os_functions at /Users/rafaeldantas/study/bond/bond-core/bsp/esp-idf/components/spi_flash/spi_flash_os_func_noos.c:111
A2 : 0x00000000 A3 : 0x3ffb1a88 A4 : 0x3ffc0f00 A5 : 0x00000000
A6 : 0x00000001 A7 : 0x000249f0 A8 : 0x3ffc0690 A9 : 0x3ffb56c8
A10 : 0x3ffe3a98 A11 : 0x3ffc0690 A12 : 0x3ffc0690 A13 : 0x3ffe3b10
A14 : 0x00000001 A15 : 0x00000000 SAR : 0x0000000a EXCCAUSE: 0x0000001d
EXCVADDR: 0x00000008 LBEG : 0x40089c4e LEND : 0x40089c60 LCOUNT : 0x00000000
0x40089c4e: esp_app_get_elf_sha256 at /Users/rafaeldantas/study/bond/bond-core/bsp/esp-idf/components/esp_app_format/esp_app_desc.c:90 (discriminator 3)
0x40089c60: esp_app_get_elf_sha256 at /Users/rafaeldantas/study/bond/bond-core/bsp/esp-idf/components/esp_app_format/esp_app_desc.c:93
Backtrace: 0x40083c92:0x3ffe37d0 0x40147d26:0x3ffe3800 0x4017bf0a:0x3ffe3830 0x40178d8c:0x3ffe3860 0x4017b7a0:0x3ffe3890 0x4017b872:0x3ffe3910 0x4017ba51:0x3ffe3940 0x40178712:0x3ffe3990 0x40176109:0x3ffe3a20 0x401751d2:0x3ffe3a90 0x40088969:0x3ffe3ae0 0x40081798:0x3ffe3b10 0x4017c0bb:0x3ffe3bd0 0x400886b2:0x3ffe3c00 0x4007921b:0x3ffe3c50 |<-CORRUPTED
0x40083c92: esp_flash_app_disable_os_functions at /Users/rafaeldantas/study/bond/bond-core/bsp/esp-idf/components/spi_flash/spi_flash_os_func_noos.c:111
0x40147d26: esp_flash_app_disable_protect at /Users/rafaeldantas/study/bond/bond-core/bsp/esp-idf/components/spi_flash/esp_flash_api.c:1378
0x4017bf0a: memfault_esp_spi_flash_coredump_begin at /Users/rafaeldantas/study/bond/bond-core/sys/BMemfault/memfault-firmware-sdk/ports/esp_idf/memfault/v5.x/memfault_esp_spi_flash.c:16
0x40178d8c: memfault_platform_coredump_save_begin at /Users/rafaeldantas/study/bond/bond-core/sys/BMemfault/memfault-firmware-sdk/ports/esp_idf/memfault/common/memfault_platform_coredump.c:360
0x4017b7a0: prv_write_coredump_sections at /Users/rafaeldantas/study/bond/bond-core/sys/BMemfault/memfault-firmware-sdk/components/panics/src/memfault_coredump.c:436
0x4017b872: memfault_coredump_save at /Users/rafaeldantas/study/bond/bond-core/sys/BMemfault/memfault-firmware-sdk/components/panics/src/memfault_coredump.c:537
0x4017ba51: memfault_fault_handler at /Users/rafaeldantas/study/bond/bond-core/sys/BMemfault/memfault-firmware-sdk/components/panics/src/memfault_fault_handling_xtensa.c:70
0x40178712: __wrap_esp_core_dump_to_flash at /Users/rafaeldantas/study/bond/bond-core/sys/BMemfault/memfault-firmware-sdk/ports/esp_idf/memfault/common/memfault_fault_handler.c:185
0x40176109: esp_panic_handler at /Users/rafaeldantas/study/bond/bond-core/bsp/esp-idf/components/esp_system/panic.c:362
0x401751d2: panic_handler at /Users/rafaeldantas/study/bond/bond-core/bsp/esp-idf/components/esp_system/port/panic_handler.c:190
0x40088969: xt_unhandled_exception at /Users/rafaeldantas/study/bond/bond-core/bsp/esp-idf/components/esp_system/port/panic_handler.c:225
0x40081798: _xt_handle_exc at /Users/rafaeldantas/study/bond/bond-core/bsp/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/xtensa_vectors.S:805
0x4017c0bb: bootloader_init_mem at /Users/rafaeldantas/study/bond/bond-core/bsp/esp-idf/components/bootloader_support/src/bootloader_mem.c:22
0x400886b2: call_start_cpu0 at /Users/rafaeldantas/study/bond/bond-core/bsp/esp-idf/components/esp_system/port/cpu_start.c:472
ELF file SHA256: b6a12518dd5ec44f
Re-entered core dump! Exception happened during core dump!
Rebooting...