Boot Failure from v3 bootloader into v5 app

chrismerck
Posts: 76
Joined: Tue May 29, 2018 8:00 pm

Boot Failure from v3 bootloader into v5 app

Postby chrismerck » Thu Nov 30, 2023 3:49 pm

We have many ESP32-WROVER-B (100k online, many more manufactured) with various revisions of IDF v3 bootloaders. We've been happily shipping firmware updates for ~5 years, all with IDF v3.x apps. We are now testing an IDF v5.x-based app, and sadly running into some bootloops which look like bootloader incompatibility. We could use some pointers on how to get v5 OTAs stable.

----

The bootloop does not occur for every v5 app image we build. We found that the bootloop is sensitive to seemingly minor changes like upgrading our Ninja version or adding a header file, but it then reoccurs in a later build. When we have a failing app image, it always fails on units with a certain version of bootloader, and the issue follows the a full firmware image rather than following the hardware.

I attach GDB screengrabs from a working and non-working image, where we are trying to determine root cause. The working image was built using a newer Ninja version. We were surprised that changing the Ninja version would change the Xtensa assembly for this `bootloader_init_mem` function! You can see that it crashes at an `lsx` instruction at `bootloader_init_mem+6`, but in the working version there's a `memw` instruction in the same place. Also, looking at where those `call` operations are pointing, the non-working image doesn't make much sense.

IDF version of problematic bootloaders: v3.1.2-1-g92456f014e
Current IDF app commit: v5.1.1-910-g93b9ee2315

We are aware of several menuconfig options related to bootloader-app compatibility (APP_COMPATIBLE_PRE_V3_1_BOOTLOADERS and APP_COMPATIBLE_PRE_V2_1_BOOTLOADERS) and we have those options *disabled* because they seem to apply to older bootloaders than our oldest.

Serial log from one such bootloop follows:

Code: Select all

I (31) boot: ESP-IDF v3.1-beta1-612-g92456f0 2nd stage bootloader
I (31) boot: compile time 19:53:59
I (31) boot: Enabling RNG early entropy source...
I (31) boot: SPI Speed      : 40MHz
I (32) boot: SPI Mode       : DIO
I (32) boot: SPI Flash Size : 16MB
I (33) boot: ########
I (33) boot: ######## BOND-CORE ESP32 Bootloader
I (34) boot: ######## TARGET:  zermatt
I (34) boot: ######## VERSION: v2.5.2
I (35) boot: ######## BUILT:   Thu Mar  7 19:54:03 UTC 2019
I (36) boot: ########
I (37) boot: Partition Table:
I (37) boot: ## Label            Usage          Type ST Offset   Length
I (37) boot:  0 nvs              WiFi data        01 02 00009000 00004000
I (38) boot:  1 otadata          OTA data         01 00 0000d000 00002000
I (39) boot:  2 phy_init         RF data          01 01 0000f000 00001000
I (40) boot:  3 factory          factory app      00 00 00010000 002f0000
I (41) boot:  4 ota_0            OTA app          00 10 00300000 00400000
I (42) boot:  5 ota_1            OTA app          00 11 00700000 00400000
I (43) boot:  6 id               unknown          40 00 00b00000 00010000
I (44) boot:  7 db               unknown          40 00 00b10000 00400000
I (45) boot:  8 state            unknown          40 00 00f10000 000e0000
I (46) boot:  9 coredump         Unknown data     01 03 00ff0000 00010000
I (47) boot: End of partition table
I (47) boot: FACTORY_FIRST: disabled, using default esp behavior
I (48) boot: boot_index = 0
I (49) esp_image: segment 0: paddr=0x00300020 vaddr=0x3f400020 size=0x7fb4c (523084) map
I (233) esp_image: segment 1: paddr=0x0037fb74 vaddr=0x3ff80085 size=0x00008 (     8) load
I (233) esp_image: segment 2: paddr=0x0037fb84 vaddr=0x3ffb0000 size=0x00494 (  1172) load
I (234) esp_image: segment 3: paddr=0x00380020 vaddr=0x400d0020 size=0x135d40 (1269056) map
0x400d0020: _stext at ??:?

I (679) esp_image: segment 4: paddr=0x004b5d68 vaddr=0x3ffb0494 size=0x06e08 ( 28168) load
I (691) esp_image: segment 5: paddr=0x004bcb78 vaddr=0x40080000 size=0x1e57c (124284) load
0x40080000: _WindowOverflow4 at /Users/rafaeldantas/study/bond/bond-core/bsp/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/xtensa_vectors.S:2023

I (743) esp_image: segment 6: paddr=0x004db0fc vaddr=0x400c0000 size=0x00088 (   136) load
I (761) boot: Loaded app from partition at offset 0x300000
I (762) boot: Disabling RNG early entropy source...
I (762) cpu_start: Multicore app
I (762) quad_psram: This chip is ESP32-D0WD
I (763) esp_psram: Found 8MB PSRAM device
I (763) esp_psram: PSRAM initialized, cache is in low/high (2-core) mode.
W (764) esp_psram: Virtual address not enough for PSRAM, map as much as we can. 4MB is mapped
Guru Meditation Error: Core  0 panic'ed (LoadProhibited). Exception was unhandled.

Core  0 register dump:
PC      : 0x4017c0be  PS      : 0x00060930  A0      : 0x800886b5  A1      : 0x3ffe3bd0  
0x4017c0be: bootloader_init_mem at /Users/rafaeldantas/study/bond/bond-core/bsp/esp-idf/components/bootloader_support/src/bootloader_mem.c:22

A2      : 0x3ffc0690  A3      : 0x00000000  A4      : 0x00000000  A5      : 0x3ffc0690  
A6      : 0x3f800000  A7      : 0x3ff0005c  A8      : 0x8014cff5  A9      : 0x3ffe3b90  
A10     : 0x3ffe3bd4  A11     : 0x00000069  A12     : 0x00000010  A13     : 0xffffffff  
A14     : 0x00000000  A15     : 0xfffffffe  SAR     : 0x00000004  EXCCAUSE: 0x0000001c  
EXCVADDR: 0x800886c5  LBEG    : 0x4014d595  LEND    : 0x4014d59e  LCOUNT  : 0x00000000  
0x4014d595: psram_clear_spi_fifo at /Users/rafaeldantas/study/bond/bond-core/bsp/esp-idf/components/esp_psram/esp32/esp_psram_impl_quad.c:239
 (inlined by) psram_read_id at /Users/rafaeldantas/study/bond/bond-core/bsp/esp-idf/components/esp_psram/esp32/esp_psram_impl_quad.c:471

0x4014d59e: psram_read_id at /Users/rafaeldantas/study/bond/bond-core/bsp/esp-idf/components/esp_psram/esp32/esp_psram_impl_quad.c:472



Backtrace: 0x4017c0bb:0x3ffe3bd0 0x400886b2:0x3ffe3c00 0x4007921b:0x3ffe3c50 |<-CORRUPTED
0x4017c0bb: bootloader_init_mem at /Users/rafaeldantas/study/bond/bond-core/bsp/esp-idf/components/bootloader_support/src/bootloader_mem.c:22

0x400886b2: call_start_cpu0 at /Users/rafaeldantas/study/bond/bond-core/bsp/esp-idf/components/esp_system/port/cpu_start.c:472





ELF file SHA256: b6a12518dd5ec44f

Saving Memfault Coredump!
Guru Meditation Error: Core  0 panic'ed (StoreProhibited). Exception was unhandled.

Core  0 register dump:
PC      : 0x40083c95  PS      : 0x00060434  A0      : 0x80147d29  A1      : 0x3ffe37d0  
0x40083c95: esp_flash_app_disable_os_functions at /Users/rafaeldantas/study/bond/bond-core/bsp/esp-idf/components/spi_flash/spi_flash_os_func_noos.c:111

A2      : 0x00000000  A3      : 0x3ffb1a88  A4      : 0x3ffc0f00  A5      : 0x00000000  
A6      : 0x00000001  A7      : 0x000249f0  A8      : 0x3ffc0690  A9      : 0x3ffb56c8  
A10     : 0x3ffe3a98  A11     : 0x3ffc0690  A12     : 0x3ffc0690  A13     : 0x3ffe3b10  
A14     : 0x00000001  A15     : 0x00000000  SAR     : 0x0000000a  EXCCAUSE: 0x0000001d  
EXCVADDR: 0x00000008  LBEG    : 0x40089c4e  LEND    : 0x40089c60  LCOUNT  : 0x00000000  
0x40089c4e: esp_app_get_elf_sha256 at /Users/rafaeldantas/study/bond/bond-core/bsp/esp-idf/components/esp_app_format/esp_app_desc.c:90 (discriminator 3)

0x40089c60: esp_app_get_elf_sha256 at /Users/rafaeldantas/study/bond/bond-core/bsp/esp-idf/components/esp_app_format/esp_app_desc.c:93



Backtrace: 0x40083c92:0x3ffe37d0 0x40147d26:0x3ffe3800 0x4017bf0a:0x3ffe3830 0x40178d8c:0x3ffe3860 0x4017b7a0:0x3ffe3890 0x4017b872:0x3ffe3910 0x4017ba51:0x3ffe3940 0x40178712:0x3ffe3990 0x40176109:0x3ffe3a20 0x401751d2:0x3ffe3a90 0x40088969:0x3ffe3ae0 0x40081798:0x3ffe3b10 0x4017c0bb:0x3ffe3bd0 0x400886b2:0x3ffe3c00 0x4007921b:0x3ffe3c50 |<-CORRUPTED
0x40083c92: esp_flash_app_disable_os_functions at /Users/rafaeldantas/study/bond/bond-core/bsp/esp-idf/components/spi_flash/spi_flash_os_func_noos.c:111

0x40147d26: esp_flash_app_disable_protect at /Users/rafaeldantas/study/bond/bond-core/bsp/esp-idf/components/spi_flash/esp_flash_api.c:1378

0x4017bf0a: memfault_esp_spi_flash_coredump_begin at /Users/rafaeldantas/study/bond/bond-core/sys/BMemfault/memfault-firmware-sdk/ports/esp_idf/memfault/v5.x/memfault_esp_spi_flash.c:16

0x40178d8c: memfault_platform_coredump_save_begin at /Users/rafaeldantas/study/bond/bond-core/sys/BMemfault/memfault-firmware-sdk/ports/esp_idf/memfault/common/memfault_platform_coredump.c:360

0x4017b7a0: prv_write_coredump_sections at /Users/rafaeldantas/study/bond/bond-core/sys/BMemfault/memfault-firmware-sdk/components/panics/src/memfault_coredump.c:436

0x4017b872: memfault_coredump_save at /Users/rafaeldantas/study/bond/bond-core/sys/BMemfault/memfault-firmware-sdk/components/panics/src/memfault_coredump.c:537

0x4017ba51: memfault_fault_handler at /Users/rafaeldantas/study/bond/bond-core/sys/BMemfault/memfault-firmware-sdk/components/panics/src/memfault_fault_handling_xtensa.c:70

0x40178712: __wrap_esp_core_dump_to_flash at /Users/rafaeldantas/study/bond/bond-core/sys/BMemfault/memfault-firmware-sdk/ports/esp_idf/memfault/common/memfault_fault_handler.c:185

0x40176109: esp_panic_handler at /Users/rafaeldantas/study/bond/bond-core/bsp/esp-idf/components/esp_system/panic.c:362

0x401751d2: panic_handler at /Users/rafaeldantas/study/bond/bond-core/bsp/esp-idf/components/esp_system/port/panic_handler.c:190

0x40088969: xt_unhandled_exception at /Users/rafaeldantas/study/bond/bond-core/bsp/esp-idf/components/esp_system/port/panic_handler.c:225

0x40081798: _xt_handle_exc at /Users/rafaeldantas/study/bond/bond-core/bsp/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/xtensa_vectors.S:805

0x4017c0bb: bootloader_init_mem at /Users/rafaeldantas/study/bond/bond-core/bsp/esp-idf/components/bootloader_support/src/bootloader_mem.c:22

0x400886b2: call_start_cpu0 at /Users/rafaeldantas/study/bond/bond-core/bsp/esp-idf/components/esp_system/port/cpu_start.c:472





ELF file SHA256: b6a12518dd5ec44f

Re-entered core dump! Exception happened during core dump!
Rebooting...
Attachments
crashing.png
crashing.png (139.31 KiB) Viewed 39778 times
working.png
working.png (179.7 KiB) Viewed 39778 times

ESP_Yake
Posts: 109
Joined: Mon Mar 06, 2017 12:23 pm

Re: Boot Failure from v3 bootloader into v5 app

Postby ESP_Yake » Wed Dec 06, 2023 12:31 pm

Hi, I used v3.1 bootloader to load a 5.1.1 firmware and did not test the problem you mentioned.

From the error message you printed, the problem is that there is a problem with psram initialization, which is already running normally to the APP partition.

Have you tested using v5.1.1 bootloader to load the firmware directly? I wonder if there is a problem at this time.
If there is no problem with this, could you please send me the sdkconfig file and bootloader firmware of IDF v3.1?

In addition, the warning information before crash is also very strange. I am curious whether this log also exists when you use the bootloader of v5.1.1 to load the firmware. If possible, I hope you can package the elf file for v5.1.1 along with the bin firmware and sdkconfig file and send it to me for a look

chrismerck
Posts: 76
Joined: Tue May 29, 2018 8:00 pm

Re: Boot Failure from v3 bootloader into v5 app

Postby chrismerck » Wed Dec 06, 2023 9:47 pm

Thank you for reply, Yake.

> Have you tested using v5.1.1 bootloader to load the firmware directly? I wonder if there is a problem at this time.

What I have tried is using a more recent v3.x bootloader binary. Swapping just the bootloader binaries causes the problem to follow the bootloader binary. Using just the v5.1.1 bootloader fails maybe due to old v3.1 partition table format? What I can say is that flashing the bootloader, partition table, and the new firmware (via make flash monitor) seems to always work without any issue.

> If there is no problem with this, could you please send me the sdkconfig file and bootloader firmware of IDF v3.1?

Sure. At the following URL, you will find:
bootloader_v3.1.bin
bootloader_v3.3.4.bin
sdkconfig.v3.1
sdkconfig.v3.3.4
sdkconfig.v5.1
boot_v3.1_app_v5.1.log -- shows the crash with the oldest bootloader, new app
boot_v3.3.4_app_v5.1.log -- shows success, with newer bootloader, with same app image
boot_v5.1_app_v5.1.log -- shows success, with v5.1 bootloader (newest), with again same app image
https://bond-updates.s3.amazonaws.com/b ... 231206.zip


> In addition, the warning information before crash is also very strange. I am curious whether this log also exists when you use the bootloader of v5.1.1 to load the firmware.

Yes, the "Virtual address not enough for PSRAM" warning is still printed even when doing "make flash monitor" with the v5.1-IDF (bootloader, partition table, and app all v5.1-based). I thought it was was expected due to us using 8MB PSRAM inside ESP32-WROVER-E but ESP32 only supporting memmap of 4MB.

> If possible, I hope you can package the elf file for v5.1.1 along with the bin firmware and sdkconfig file and send it to me for a look

I am sending you a link to the v5.1 bins and elfs in a DM. (The ELF file we consider confidential due to containing our debug symbols.)

We really appreciate the support here. We're heavily invested in ESP and absolutely love it. (Also really encouraging to start seeing Rust support!). Just need some help to make this v3 --> v5 jump finally :)

ESP_Yake
Posts: 109
Joined: Mon Mar 06, 2017 12:23 pm

Re: Boot Failure from v3 bootloader into v5 app

Postby ESP_Yake » Thu Dec 07, 2023 12:56 pm

Hi, I downloaded the firmware you provided, but there are still a few problems.
1. There is no problem in upgrading the v5.1 firmware with v3.3.4 bootloader, which indicates that the boot changes from v3.1 to v3.3.4 caused the problem. Have you tried to upgrade the v5.1 firmware with the v3.2 boot?
2. The bootloader_v3.1.bin size is 32.8K and your partition address is 0x8000, which will cause the partition to be overwritten by the bootloader??

I said the above two are I want to confirm your problem IDF bootloader from v3.1 to v3.3.4 which branch caused the problem, because these versions are too old, I can not quickly find the cause of this problem.

chrismerck
Posts: 76
Joined: Tue May 29, 2018 8:00 pm

Re: Boot Failure from v3 bootloader into v5 app

Postby chrismerck » Fri Dec 08, 2023 8:53 pm

> Have you tried to upgrade the v5.1 firmware with the v3.2 boot?

We can successfully boot the v5.1 app from v3.3.4 bootloader. We can use esptool to write just the boot partition and observe the issue occurring and not occurring based on using v3.1 or v3.3.4 bootloader.

We don't have a v3.2 bootloader in use.

> 32.8K

The file size of bootloader_v3.1.bin is 32768 B, which in hex is 0x8000.

The bootloader_v3.3.4.bin I gave you I copied too many bytes. I should have truncated at 0x8000. I accidentally copied also the partition table in this file.
00005da0 bc ce c2 32 0c 9e 44 86 bb 68 fb 30 d5 54 a5 86 |...2..D..h.0.T..|
00005db0 f5 fc 01 c5 c8 63 89 5f e6 9d 82 5f 5c fe b3 b1 |.....c._..._\...|
00005dc0 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
*
00008000 aa 50 01 02 00 90 00 00 00 40 00 00 6e 76 73 00 |.P.......@..nvs.|
00008010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00008020 aa 50 01 00 00 d0 00 00 00 20 00 00 6f 74 61 64 |.P....... ..otad|
> I want to confirm your problem IDF bootloader from v3.1 to v3.3.4 which branch caused the problem

We are one commit ahead of v3.1.2. We just modified the esp_ip in `tcpip_adapter_init` due to a conflict with 192.168.4.1. --- So I think we can consider it to just be v3.1.2.

chrismerck
Posts: 76
Joined: Tue May 29, 2018 8:00 pm

Re: Boot Failure from v3 bootloader into v5 app

Postby chrismerck » Fri Dec 15, 2023 8:29 pm

@ESP_Yake any tips here?

Referring to the original screen captures of gdb, you can see that the `memw` instruction is sometimes present, sometimes absent.

I guess we need to check the exact compile & link flags for all files, to make sure the PSRAM workaround with memw instructions is always flagged.

ESP_Yake
Posts: 109
Joined: Mon Mar 06, 2017 12:23 pm

Re: Boot Failure from v3 bootloader into v5 app

Postby ESP_Yake » Mon Dec 18, 2023 12:58 pm

I am very sorry for replying so late, because the BBS did not prompt me with the message, I forgot this question.
The firmware you gave me still cannot run normally. I tried to compile the hello_world demo using the config file you gave, but I changed the baud rate to 115200. The bootloader and partition used version v3.1.2(sdkconfig using sdkconfig_v3.1), while the app used version v5.1.2(sdkconfig_v5.1). I also uploaded the log and firmware from my tests,Can you test it again using our hello_world demo ,and tell me how to reproduce it?
Attachments
test_bin.zip
v3.1.2 boot v5.1.2 app
(127.92 KiB) Downloaded 1088 times

chrismerck
Posts: 76
Joined: Tue May 29, 2018 8:00 pm

Re: Boot Failure from v3 bootloader into v5 app

Postby chrismerck » Tue Apr 02, 2024 7:29 pm

Are there any consultants out there who can help with this? I really think it has to do with "memw psram workaround" flags. DM me...

chrismerck
Posts: 76
Joined: Tue May 29, 2018 8:00 pm

Re: Boot Failure from v3 bootloader into v5 app

Postby chrismerck » Tue Jul 23, 2024 8:44 pm

To bring some closure here for any other parties that have been supporting ESP32 devices in the field since early v3 IDF, ultimately we were unable to discover the root cause of the bootloader-app incompatibility here, despite some efforts from ESP staff to help us out.

For some context, we use a 3-image update scheme with "factory", and "ota0/1" images. What we did to unblock updates was:

Release an intermediate firmware version that is still compatible with the old IDF v3.1 bootloader. This image contains within it two other images: (a) a newer IDF v3.3 bootloader and (b) a newer IDF v3.3 factory image. The fw then reflashes the bootloader and factory images. Next time it boots, if it detects the newer bootloader and factory image (by reading specific offsets containing build time), then it flags to our OTAU system that it is ready to accept an update to the latest (IDF v5) firmware.

We had to patch the factory image because, to our horror, although the newer IDF v3.3 bootloader was now forwards-compatible with the new factory fw, it was now backwards-incompatible with the old IDF v3.1 factory image. Thankfully we used the 16 MB flash version of ESP32-WROVER so we had sufficient space in `ota0/1` to fit the entire rescue image inside as additional data.

This in-field patching of bootloader and factory firmware is risky and really shouldn't have been necessary... I would like to think that just a few assembler instructions in the very beginning of the new app image could have smoothed out any incompatibility with the old bootloader, just we lack the knowledge of XTensa/ESP internals to do so on our end. --- That said, we have had zero issues deploying this to hundreds of units so far, which makes me more bullish on this kind of "bootpatch" in the future.

Who is online

Users browsing this forum: Bing [Bot] and 117 guests