ESP32S3 failure in esp_ota_end()

rtborg
Posts: 67
Joined: Wed Oct 23, 2019 6:15 am

ESP32S3 failure in esp_ota_end()

Postby rtborg » Sat Dec 30, 2023 12:03 pm

I am implementing update over USB HID for my application, following the native OTA example. I can write the complete binary image using esp_ota_write, and when I call esp_ota_end, the application hangs with message:
I (184599) esp_image: segment 0: paddr=00210020 vaddr=3c060020 size=1f13ch (127292) map
and after a while a WDT reset is issued.

I found a topic dealing with similar issue here, and the solution the author proposed was modifying access to global variables. How would that help? If this is the case, how to find the offending variable? My project is quite large and I need some direction about what to look for.

Relevant log:

Code: Select all

�x������ESP-ROM:esp32s3-20210327
Build:Mar 27 2021
rst:0x1 (POWERON),boot:0xb (SPI_FAST_FLASH_BOOT)
SPIWP:0xee
mode:DIO, clock div:1
load:0x3fce3810,len:0x164c
load:0x403c9700,len:0xbe0
load:0x403cc700,len:0x2ef8
SHA-256 comparison failed:
Calculated: 3b1e244f79920e4deddd0d0122e19def7def0fbdf3709306078086a1c8bd59b0
Expected: ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
Attempting to boot anyway...
entry 0x403c9900
I (43) boot: ESP-IDF v5.0.1-dirty 2nd stage bootloader
I (43) boot: compile time 11:23:05
I (43) boot: chip revision: v0.2
I (45) boot.esp32s3: Boot SPI Speed : 80MHz
I (50) boot.esp32s3: SPI Mode       : DIO
I (55) boot.esp32s3: SPI Flash Size : 8MB
I (60) boot: Enabling RNG early entropy source...
I (65) boot: Partition Table:
I (69) boot: ## Label            Usage          Type ST Offset   Length
I (76) boot:  0 nvs              WiFi data        01 02 00009000 00004000
I (83) boot:  1 otadata          OTA data         01 00 0000d000 00002000
I (91) boot:  2 phy_init         RF data          01 01 0000f000 00001000
I (98) boot:  3 factory          factory app      00 00 00010000 00100000
I (106) boot:  4 ota_0            OTA app          00 10 00210000 00100000
I (113) boot:  5 ota_1            OTA app          00 11 00410000 00100000
I (121) boot:  6 ffat             Unknown data     01 81 00510000 00020000
I (129) boot: End of partition table
I (133) boot: Defaulting to factory image
I (138) esp_image: segment 0: paddr=00010020 vaddr=3c060020 size=1f13ch (127292) map
I (160) esp_image: segment 1: paddr=0002f164 vaddr=3fc97000 size=00eb4h (  3764) load
I (161) esp_image: segment 2: paddr=00030020 vaddr=42000020 size=54cbch (347324) map
I (205) esp_image: segment 3: paddr=00084ce4 vaddr=3fc97eb4 size=020d8h (  8408) load
I (206) esp_image: segment 4: paddr=00086dc4 vaddr=40374000 size=12f34h ( 77620) load
I (223) esp_image: segment 5: paddr=00099d00 vaddr=600fe000 size=0003ch (    60) load
I (230) boot: Loaded app from partition at offset 0x10000
I (231) boot: Disabling RNG early entropy source...
I (243) cpu_start: Pro cpu up.
I (243) cpu_start: Starting app cpu, entry point is 0x40375804
0x40375804: call_start_cpu1 at /home/null/esp/esp-idf/components/esp_system/port/cpu_start.c:142

I (0) cpu_start: App cpu up.
I (257) cpu_start: Pro cpu start user code
I (257) cpu_start: cpu freq: 160000000 Hz
I (257) cpu_start: Application information:
I (260) cpu_start: Project name:     myapp
I (266) cpu_start: App version:      1
I (271) cpu_start: Compile time:     Dec 30 2023 11:23:03
I (277) cpu_start: ELF file SHA256:  4aa97be7cff9ed89...
I (283) cpu_start: ESP-IDF:          v5.0.1-dirty
I (288) cpu_start: Min chip rev:     v0.0
I (293) cpu_start: Max chip rev:     v0.99 
I (298) cpu_start: Chip rev:         v0.2
I (302) heap_init: Initializing. RAM available for dynamic allocation:
I (310) heap_init: At 3FC9CF90 len 0004C780 (305 KiB): D/IRAM
I (316) heap_init: At 3FCE9710 len 00005724 (21 KiB): STACK/DRAM
I (323) heap_init: At 3FCF0000 len 00008000 (32 KiB): DRAM
I (329) heap_init: At 600FE03C len 00001FC4 (7 KiB): RTCRAM
I (336) spi_flash: detected chip: issi
I (340) spi_flash: flash io: dio
I (345) sleep: Configure to isolate all GPIO pins in sleep state
I (350) sleep: Enable automatic switching of GPIO sleep configuration
I (358) cpu_start: Starting scheduler on PRO CPU.

***********************************
I (709) ao_firmware_updater: Running partition type 0 subtype 0 (offset 0x00010000)
I (4809) ao_firmware_updater: Expected image size in bytes: 564544
I (6179) ao_firmware_updater: FW Update data, 1024 bytes:
I (6179) ao_firmware_updater: New firmware version: 1
I (6179) ao_firmware_updater: Running firmware version: 1
I (6189) ao_firmware_updater: Writing to partition subtype 16 at offset 0x210000
I (6189) ao_firmware_updater: esp_ota_begin succeeded, handle: 1
I (6269) ao_firmware_updater: Writting first 1024 bytes of 564544 bytes
I (6549) ao_firmware_updater: Progress: 0.36%
I (6829) ao_firmware_updater: Progress: 0.54%
***********************************

I (184389) ao_firmware_updater: Progress: 99.94%
I (184589) ao_firmware_updater: Progress: 100.00%
I (184589) ao_firmware_updater: Finished binary image write, validating...
I (184599) esp_image: segment 0: paddr=00210020 vaddr=3c060020 size=1f13ch (127292) map

/* REBOOTS... */

ESP-ROM:esp32s3-20210327
Build:Mar 27 2021
rst:0x8 (TG1WDT_SYS_RST),boot:0xb (SPI_FAST_FLASH_BOOT)
Saved PC:0x4037e1ab
0x4037e1ab: panic_handler at /home/null/esp/esp-idf/components/esp_system/port/panic_handler.c:147 (discriminator 3)

SPIWP:0xee
mode:DIO, clock div:1
load:0x3fce3810,len:0x164c
load:0x403c9700,len:0xbe0
load:0x403cc700,len:0x2ef8
SHA-256 comparison failed:
Calculated: 3b1e244f79920e4deddd0d0122e19def7def0fbdf3709306078086a1c8bd59b0
Expected: ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
Attempting to boot anyway...
entry 0x403c9900
I (47) boot: ESP-IDF v5.0.1-dirty 2nd stage bootloader
I (48) boot: compile time 11:23:05
I (48) boot: chip revision: v0.2
I (50) boot.esp32s3: Boot SPI Speed : 80MHz
I (55) boot.esp32s3: SPI Mode       : DIO
I (59) boot.esp32s3: SPI Flash Size : 8MB
W (64) boot.esp32s3: PRO CPU has been reset by WDT.
W (70) boot.esp32s3: APP CPU has been reset by WDT.
I (75) boot: Enabling RNG early entropy source...
I (81) boot: Partition Table:
I (84) boot: ## Label            Usage          Type ST Offset   Length
I (92) boot:  0 nvs              WiFi data        01 02 00009000 00004000
I (99) boot:  1 otadata          OTA data         01 00 0000d000 00002000
I (106) boot:  2 phy_init         RF data          01 01 0000f000 00001000
I (114) boot:  3 factory          factory app      00 00 00010000 00100000
I (122) boot:  4 ota_0            OTA app          00 10 00210000 00100000
I (129) boot:  5 ota_1            OTA app          00 11 00410000 00100000
I (137) boot:  6 ffat             Unknown data     01 81 00510000 00020000
I (144) boot: End of partition table
I (149) boot: Defaulting to factory image
I (153) esp_image: segment 0: paddr=00010020 vaddr=3c060020 size=1f13ch (127292) map
I (176) esp_image: segment 1: paddr=0002f164 vaddr=3fc97000 size=00eb4h (  3764) load
I (177) esp_image: segment 2: paddr=00030020 vaddr=42000020 size=54cbch (347324) map
I (221) esp_image: segment 3: paddr=00084ce4 vaddr=3fc97eb4 size=020d8h (  8408) load
I (222) esp_image: segment 4: paddr=00086dc4 vaddr=40374000 size=12f34h ( 77620) load
I (238) esp_image: segment 5: paddr=00099d00 vaddr=600fe000 size=0003ch (    60) load
I (246) boot: Loaded app from partition at offset 0x10000
I (246) boot: Disabling RNG early entropy source...
I (258) cpu_start: Pro cpu up.
I (258) cpu_start: Starting app cpu, entry point is 0x40375804
0x40375804: call_start_cpu1 at /home/null/esp/esp-idf/components/esp_system/port/cpu_start.c:142

I (0) cpu_start: App cpu up.
I (273) cpu_start: Pro cpu start user code
I (273) cpu_start: cpu freq: 160000000 Hz
I (273) cpu_start: Application information:
I (276) cpu_start: Project name:     myapp
I (282) cpu_start: App version:      1
I (286) cpu_start: Compile time:     Dec 30 2023 11:23:03
I (292) cpu_start: ELF file SHA256:  4aa97be7cff9ed89...
I (298) cpu_start: ESP-IDF:          v5.0.1-dirty
I (304) cpu_start: Min chip rev:     v0.0
I (308) cpu_start: Max chip rev:     v0.99 
I (313) cpu_start: Chip rev:         v0.2
I (318) heap_init: Initializing. RAM available for dynamic allocation:
I (326) heap_init: At 3FC9CF90 len 0004C780 (305 KiB): D/IRAM
I (332) heap_init: At 3FCE9710 len 00005724 (21 KiB): STACK/DRAM
I (338) heap_init: At 3FCF0000 len 00008000 (32 KiB): DRAM
I (344) heap_init: At 600FE03C len 00001FC4 (7 KiB): RTCRAM
I (352) spi_flash: detected chip: issi
I (355) spi_flash: flash io: dio
***********************************************************
EDIT: I get the same error using latest espidf master branch, with a watchdog reset right after the following message:
esp_image: segment 0: paddr=00210020 vaddr=3c060020 size=22eb8h (143032) map

rtborg
Posts: 67
Joined: Wed Oct 23, 2019 6:15 am

Re: ESP32S3 failure in esp_ota_end()

Postby rtborg » Sun Dec 31, 2023 8:26 am

Update: I attached a JTAG debugger, which confirms failure happens in esp_image_format.c/process_segment function.
In addition, EXCAUSE is set to 0x04.
debug_screenshot_all.png
debug_screenshot_all.png (254.11 KiB) Viewed 5323 times

rtborg
Posts: 67
Joined: Wed Oct 23, 2019 6:15 am

Re: ESP32S3 failure in esp_ota_end()

Postby rtborg » Tue Jan 02, 2024 11:30 am

Updating the topic again in the hope that anyone with a similar problem will shed some light.

The reset reason which is printed out after the chip reboots is
rst:0x8 (TG1WDT_SYS_RST),boot:0xb (SPI_FAST_FLASH_BOOT)
It is defined in rtc.h as "Timer Group1 Watch dog reset digital core". Looking in the docs, the Timer Group 1 refers to the interrupt watchdog. It's set to 1000 ms in my code, and FreeRTOS's tick rate is 100 ms. I have tries increasing the timeout to 10 seconds with no success.

According to the docs, potential reason for triggering the WDT is an interrupt or a critical section is running longer than the timeout period. In my code, I have a function which I pass to esp_register_freertos_tick_hook, and all it does is post to a queue using xQueueSendFromISR, so my assumption is that it would not block.

Tracing the exception a bit more, it appears it is triggered in function process_segment_data, and more specifically during the loop on line 639. I am able to set a breakpoint there and program execution never progresses after the loop's bounds, and function bootloader_sha256_data is never called.

Core is never dumped; tried both Flash & UART as destinations.

rtborg
Posts: 67
Joined: Wed Oct 23, 2019 6:15 am

Re: ESP32S3 failure in esp_ota_end()

Postby rtborg » Tue Jan 02, 2024 10:11 pm

Another lead: the code execution ends in panic_handlers.c:

Code: Select all

        // For cache error, pause the non-offending core - offending core handles panic
        if (panic_get_cause(frame) == PANIC_RSN_CACHEERR && core_id != esp_cache_err_get_cpuid()) {
            // Only print the backtrace for the offending core in case of the cache error
            g_exc_frames[core_id] = NULL;
            while (1) {
                ;	/* THIS IS THE END */
            }
        }
I can see that the EXCCAUSE register is set to 4, but the condition requires PANIC_RSN_CACHEERR (which is defined as 7 in panic_reason.h)

rtborg
Posts: 67
Joined: Wed Oct 23, 2019 6:15 am

Re: ESP32S3 failure in esp_ota_end()

Postby rtborg » Wed Jan 03, 2024 2:57 pm

After another day of fruitless bug hunting, I am considering the possibility of cache error. I've disabled the TWAI driver, and deregistered the FreeRTOS tick hook just before calling esp_ota_end - with no effect.

Edit: I also checked all running tasks' stack before calling esp_ota_end, did not notice anything odd. Free heap is also more than enough (~200K).

Any help from Espressif staff will be highly appreciated.

hmalpani
Posts: 11
Joined: Tue May 02, 2023 9:22 am

Re: ESP32S3 failure in esp_ota_end()

Postby hmalpani » Thu Jan 04, 2024 8:09 am

Hello

It will be really helpful if you could share some code(sample application) which can be used to recreate this issue. I have tried native_ota example and was unable to reproduce this. Could you also share the sdkconfig file with which you get this issue.

Also can you share the debug logs. You will need to enable debug logs in menuconfig.

rtborg
Posts: 67
Joined: Wed Oct 23, 2019 6:15 am

Re: ESP32S3 failure in esp_ota_end()

Postby rtborg » Thu Jan 04, 2024 9:38 am

Hello,

Unfortunately I cannot share my complete application.

I've enabled Debug log level, boot log attached.

What I also did was comment out the while loop in the panic handler:

Code: Select all

        // For cache error, pause the non-offending core - offending core handles panic
        if (panic_get_cause(frame) == PANIC_RSN_CACHEERR && core_id != esp_cache_err_get_cpuid()) {
            // Only print the backtrace for the offending core in case of the cache error
            g_exc_frames[core_id] = NULL;
            // while (1) {
            //     ;
            // }
        }
Flashing the above results in some interesting messages:

Code: Select all

I (224329) ao_firmware_updater: Progress: 99.96%
I (224499) ao_firmware_updater: Progress: 100.00%
I (224529) ao_firmware_updater: OTA END...
I (225529) esp_ota_ops: Offset: 1114112, size: 1048576
D (225529) esp_image: reading image header @ 0x110000
D (225529) esp_image: image header: 0xe9 0x06 0x02 0x03 4037589c
I (225529) esp_image: segment 0: paddr=00110020 vaddr=3c060020 size=21d6ch (138604) map
D (225539) esp_image: free data page_count 0x000001f7
GGuurruu  MMeeddiittaattiioonn  EErrrroorr::  CCoorree    //  ppaanniicc''eedd  ((CCaacchhee  ddiissaabblleedd  bbuutt  ccaacchheedd memory region accessed). 
Write back error occurred while dcache tries to write back to flash


Setting breakpoint at 0x00000000 and returning...
I (225549) esp_image: segment 1: paddr=00131d94 vaddr=3fc98000 size=03f7ch ( 16252) 
D (225549) esp_image: free data page_count 0x000001f7
I (225549) esp_image: segment 2: paddr=00135d18 vaddr=40374000 size=0a300h ( 41728) 
D (225549) esp_image: free data page_count 0x000001f7
I (225549) esp_image: segment 3: paddr=00140020 vaddr=42000020 size=59fbch (368572) map
D (225549) esp_image: free data page_count 0x000001f7
I (225549) esp_image: segment 4: paddr=00199fe4 vaddr=4037e300 size=09cb8h ( 40120) 
D (225549) esp_image: free data page_count 0x000001f7
I (225549) esp_image: segment 5: paddr=001a3ca4 vaddr=600fe000 size=0003ch (    60) 
D (225549) esp_image: free data page_count 0x000001f7
D (225549) esp_image: reading image header @ 0x110000
D (225549) esp_image: image header: 0xe9 0x06 0x02 0x03 4037589c
I (225549) esp_image: segment 0: paddr=00110020 vaddr=3c060020 size=21d6ch (138604) map
D (225549) esp_image: free data page_count 0x000001f7
I (225549) esp_image: segment 1: paddr=00131d94 vaddr=3fc98000 size=03f7ch ( 16252) 
D (225549) esp_image: free data page_count 0x000001f7
I (225549) esp_image: segment 2: paddr=00135d18 vaddr=40374000 size=0a300h ( 41728) 
D (225549) esp_image: free data page_count 0x000001f7
I (225549) esp_image: segment 3: paddr=00140020 vaddr=42000020 size=59fbch (368572) map
D (225549) esp_image: free data page_count 0x000001f7
I (225549) esp_image: segment 4: paddr=00199fe4 vaddr=4037e300 size=09cb8h ( 40120) 
D (225549) esp_image: free data page_count 0x000001f7
I (225549) esp_image: segment 5: paddr=001a3ca4 vaddr=600fe000 size=0003ch (    60) 
D (225549) esp_image: free data page_count 0x000001f7
Edit: I found a similar issue at https://github.com/espressif/esp-idf/issues/11421, however they fix described is not applicable to my code.

Edit: Here's a part of the log after the failure:

Code: Select all

W (99) boot.esp32s3: PRO CPU has been reset by WDT.
W (105) boot.esp32s3: APP CPU has been reset by WDT.
D (110) boot.esp32s3: WDT reset info: PRO CPU STATUS        0xdeadbeef
D (117) boot.esp32s3: WDT reset info: PRO CPU PID           0x00000000
D (124) boot.esp32s3: WDT reset info: PRO CPU PDEBUGINST    0x00003000
D (130) boot.esp32s3: WDT reset info: PRO CPU PDEBUGSTATUS  0x0000000c
D (137) boot.esp32s3: WDT reset info: PRO CPU PDEBUGDATA    0x00000011
D (143) boot.esp32s3: WDT reset info: PRO CPU PDEBUGPC      0x403c9c68
D (150) boot.esp32s3: WDT reset info: PRO CPU PDEBUGLS0STAT 0x00001025
D (157) boot.esp32s3: WDT reset info: PRO CPU PDEBUGLS0ADDR 0x600ce064
D (163) boot.esp32s3: WDT reset info: PRO CPU PDEBUGLS0DATA 0x00000000
D (170) boot.esp32s3: WDT reset info: APP CPU STATUS        0xdeadbeef
D (175) boot.esp32s3: WDT reset info: APP CPU PID           0x00000000
D (175) boot.esp32s3: WDT reset info: APP CPU PDEBUGINST    0x05003000
D (178) boot.esp32s3: WDT reset info: APP CPU PDEBUGSTATUS  0x00000004
D (185) boot.esp32s3: WDT reset info: APP CPU PDEBUGDATA    0x4200460b
0x4200460b: panic_handler at /home/null/esp/esp-idf/components/esp_system/port/panic_handler.c:147 (discriminator 3)

D (192) boot.esp32s3: WDT reset info: APP CPU PDEBUGPC      0x4200460b
0x4200460b: panic_handler at /home/null/esp/esp-idf/components/esp_system/port/panic_handler.c:147 (discriminator 3)

D (198) boot.esp32s3: WDT reset info: APP CPU PDEBUGLS0STAT 0x00a00020
D (205) boot.esp32s3: WDT reset info: APP CPU PDEBUGLS0ADDR 0x3fc9bfd4
D (211) boot.esp32s3: WDT reset info: APP CPU PDEBUGLS0DATA 0x00000000
Attachments
sdkconfig.txt
(54.94 KiB) Downloaded 251 times
boot_log_debug.txt
(20.69 KiB) Downloaded 259 times

rtborg
Posts: 67
Joined: Wed Oct 23, 2019 6:15 am

Re: ESP32S3 failure in esp_ota_end()

Postby rtborg » Thu Jan 04, 2024 11:28 am

It appears that a call to esp_partition_get_sha256 also results in the same crash.

I added the following code (taken from native OTA example):

Code: Select all

        uint8_t sha_256[HASH_LEN] = {0};
        esp_partition_t partition;

        // get sha256 digest for the partition table
        partition.address = ESP_PARTITION_TABLE_OFFSET;
        partition.size = ESP_PARTITION_TABLE_MAX_LEN;
        partition.type = ESP_PARTITION_TYPE_DATA;
        esp_partition_get_sha256(&partition, sha_256);
        print_sha256(sha_256, "SHA-256 for the partition table: ");

        // get sha256 digest for bootloader
        partition.address = ESP_BOOTLOADER_OFFSET;
        partition.size = ESP_PARTITION_TABLE_OFFSET;
        partition.type = ESP_PARTITION_TYPE_APP;
        esp_partition_get_sha256(&partition, sha_256);
        print_sha256(sha_256, "SHA-256 for bootloader: ");

        // get sha256 digest for running partition
        esp_partition_get_sha256(esp_ota_get_running_partition(), sha_256);
        print_sha256(sha_256, "SHA-256 for current firmware: ");
The result is that the app crashes in the call esp_partition_get_sha256(esp_ota_get_running_partition(), sha_256):

Code: Select all

I (728) ao_firmware_updater: SHA-256 for the partition table: : d03e97fee58b19a2892365ea66b01d525a1f2c77a0aa39872e76f244d7dc40b2
I (748) ao_firmware_updater: SHA-256 for bootloader: : 8846d696c5458467eef1d1a0173e95764d5091ebebdcb14abcb2afff0cfb6e3a
ESP-ROM:esp32s3-20210327
Build:Mar 27 2021
rst:0x8 (TG1WDT_SYS_RST),boot:0xb (SPI_FAST_FLASH_BOOT)
Saved PC:0x420043f3
0x420043f3: panic_handler at /home/null/esp/esp-idf/components/esp_system/port/panic_handler.c:147 (discriminator 3)
However, when using esp_partition_get_sha256() on running partition in the beginning of main() (before initializing or starting any threads and so on), it runs fine. Hence my reasoning would be that esp_partition_get_sha256() fails for some reason, and that reason may be the same preventing esp_ota_end() from running correctly. That reason will be something that is done in firmware (because before starting threads, etc, esp_partition_get_sha256(esp_ota_get_running_partition(), sha_256) runs fine on current partition).

Tracing esp_partition_get_sha256, it leads to calling process_segment_data(), which is where esp_ota_end() also crashes.

rtborg
Posts: 67
Joined: Wed Oct 23, 2019 6:15 am

Re: ESP32S3 failure in esp_ota_end()

Postby rtborg » Thu Jan 04, 2024 12:41 pm

Finally I found the reason - I was doing memcpy to a string literal. This gave no warnings (and also somehow worked). When the code was fixed, esp_ota_end does not fail.

BadAtCStrings
Posts: 1
Joined: Tue May 28, 2024 5:48 pm

Re: ESP32S3 failure in esp_ota_end()

Postby BadAtCStrings » Tue May 28, 2024 5:56 pm

rtborg wrote:
Thu Jan 04, 2024 12:41 pm
Finally I found the reason - I was doing memcpy to a string literal. This gave no warnings (and also somehow worked). When the code was fixed, esp_ota_end does not fail.
This comment just saved me days. Wrote an unsafe wrapper around nvs_get_str and only produced sudden crash after the second OTA update for some reason

Who is online

Users browsing this forum: Baidu [Spider] and 330 guests