device bricked in the field
Posted: Mon Jan 06, 2020 9:43 pm
## Environment
- Development Kit: [none, custom product using wrover module.]
- Module or chip used: [ESP32-WROVER 16M]
- IDF version: v3.1.3
- Build System: [Make]
- Compiler version: xtensa-esp32-elf-gcc.exe (crosstool-NG crosstool-ng-1.22.0-80-g6c4433a5) 5.2.0
- Operating System: [Windows]
- (Windows only) environment type: [MSYS2 mingw32].
- Power Supply: [on-board 3.3V]
## Problem Description
A unit was bricked in the field. The user may have been performing an OTA update when it failed.
My symptoms seem similar to another post (viewtopic.php?f=12&t=10346&hilit=bootloader+corrupt), but I'm not using flash encryption.
### Expected Behavior
The unit should not be able to get bricked. The factory restore procedure should always recover an otherwise bricked unit.
### Actual Behavior
The reset button does nothing. The factory restore sequence (holding the reset button during power-on) also does nothing.
### Steps to reproduce
I'm unable to reproduce the error in my lab. I may implement a stress test in the future.
The product uses the HTTP-based OTA method from the libesphttpd project. See here: https://github.com/chmorgan/esphttpd-freertos
## Debug Logs
I cut open the device enclosure to access the serial programming pins. The serial output is:
```
ets Jun 8 2016 00:22:57
rst:0x1 (POWERON_RESET),boot:0x33 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xea
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:1
load:0x3fff0018,len:4
load:0x3fff001c,len:5068
load:0x40078000,len:9244
ho 0 tail 12 room 4
load:0x40080000,len:5772
0x40080000: _WindowOverflow4 at C:/msys32/home/labview/lumitec-dev-fw-plibox-esp32/Tools/esp-idf/components/freertos/xtensa_vectors.S:1779
entry 0x40080068
0x40080068: _xt_alloca_exc at C:/msys32/home/labview/lumitec-dev-fw-plibox-esp32/Tools/esp-idf/components/freertos/xtensa_vectors.S:1835
Fatal exception (0): IllegalInstruction
epc1=0x40080068, epc2=0x00000000, epc3=0x00000000, excvaddr=0x00000000, depc=0x00000000
0x40080068: _xt_alloca_exc at C:/msys32/home/labview/lumitec-dev-fw-plibox-esp32/Tools/esp-idf/components/freertos/xtensa_vectors.S:1835
ets Jun 8 2016 00:22:57
rst:0x10 (RTCWDT_RTC_RESET),boot:0x33 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xea
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:1
load:0x3fff0018,len:4
load:0x3fff001c,len:5068
load:0x40078000,len:9244
ho 0 tail 12 room 4
load:0x40080000,len:5772
0x40080000: _WindowOverflow4 at C:/msys32/home/labview/lumitec-dev-fw-plibox-esp32/Tools/esp-idf/components/freertos/xtensa_vectors.S:1779
entry 0x40080068
0x40080068: _xt_alloca_exc at C:/msys32/home/labview/lumitec-dev-fw-plibox-esp32/Tools/esp-idf/components/freertos/xtensa_vectors.S:1835
Fatal exception (0): IllegalInstruction
epc1=0x40080068, epc2=0x00000000, epc3=0x00000000, excvaddr=0x00000000, depc=0x00000000
0x40080068: _xt_alloca_exc at C:/msys32/home/labview/lumitec-dev-fw-plibox-esp32/Tools/esp-idf/components/freertos/xtensa_vectors.S:1835
ets Jun 8 2016 00:22:57
rst:0x10 (RTCWDT_RTC_RESET),boot:0x23 (DOWNLOAD_BOOT(UART0/UART1/SDIO_REI_REO_V2))
waiting for download
```
Another trace, which showed a different error (a checksum error) on the first boot only:
```
ets Jun 8 2016 00:22:57
rst:0x1 (POWERON_RESET),boot:0x33 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xea
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:1
load:0x3fff0018,len:4
load:0x3fff001c,len:5068
load:0x40078000,len:9244
ho 0 tail 12 room 4
load:0x40080000,len:5772
csum err:0x8e!=0x8a
ets_main.c 371
ets Jun 8 2016 00:22:57
rst:0x10 (RTCWDT_RTC_RESET),boot:0x33 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xea
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:1
load:0x3fff0018,len:4
load:0x3fff001c,len:5068
load:0x40078000,len:9244
ho 0 tail 12 room 4
load:0x40080000,len:5772
entry 0x40080268
Fatal exception (0): IllegalInstruction
epc1=0x4008026b, epc2=0x00000000, epc3=0x00000000, excvaddr=0x00000000, depc=0x00000000
ets Jun 8 2016 00:22:57
rst:0x10 (RTCWDT_RTC_RESET),boot:0x23 (DOWNLOAD_BOOT(UART0/UART1/SDIO_REI_REO_V2))
waiting for download
```
## My troubleshooting
I dumped the flash and inspected it with a hex editor (FlexHEX).
I inspected the partition table and each partition, and compared them with a working unit. I didn't see any issues with the partitions, but the bootloader is corrupt (see below).
Focusing on the bootloader: dump just the bootloader to a file and compare it with the original bootloader.bin:
```
esptool.py -p COM21 -b 921600 read_flash 0x1000 0x4EE0 dump-bootloader.bin
```
Diffing the bootloader shows that only 2 bytes differ:
```
>fc dump-bootloader.bin bootloader.bin
Comparing files dump-bootloader.bin and BOOTLOADER.BIN
00000004: 68 E8
00000008: EA EE
```
Interestingly, the EA vs. EE difference at offset 8 matches the SPIWP value in the serial output, as shown below. Likewise, the 0x68 at offset 4 matches the low byte of `entry 0x40080068` in the failed trace.
The failed device outputs (notice the 0xea):
```
ets Jun 8 2016 00:22:57
rst:0x1 (POWERON_RESET),boot:0x33 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xea
```
A working device outputs (notice the 0xee):
```
ets Jun 8 2016 00:22:57
rst:0x1 (POWERON_RESET),boot:0x33 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
```
So my next steps could be to either:
- reflash the bootloader to correct the 2 bytes that differ, or
- clear the ota_data partition to force a boot from the factory app.

I'll try reflashing the bootloader first. (BTW, I saved the corrupted one here: dump-bootloader.bin)
```
esptool.py -p COM21 -b 921600 --chip esp32 --before default_reset --after no_reset write_flash -z --flash_mode dio --flash_freq 80m --flash_size 16MB 0x1000 bootloader.bin
```
After reflashing the bootloader, the device boots properly (booting from OTA0).
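Instead of eyeballing `fc` output, a small script can report each differing byte together with the bits that changed and the direction of the change. This is handy for checking a read-back after reflashing, or for characterising dumps from other field units. The following is a minimal sketch. It assumes both files are raw dumps starting at the same flash offset (0x1000 here). The helper name `diff_images` and the script name are hypothetical.

```python
import sys
from pathlib import Path


def diff_images(dump: bytes, reference: bytes):
    """Yield (offset, dump_byte, ref_byte, cleared, set_) for every differing byte.

    cleared: bits that are 1 in the reference but 0 in the dump (1 -> 0).
    set_:    bits that are 0 in the reference but 1 in the dump (0 -> 1).
    Only the common prefix of the two buffers is compared.
    """
    for offset, (d, r) in enumerate(zip(dump, reference)):
        if d != r:
            yield offset, d, r, r & ~d & 0xFF, d & ~r & 0xFF


def main(dump_path: str, ref_path: str) -> None:
    dump = Path(dump_path).read_bytes()
    ref = Path(ref_path).read_bytes()
    if len(dump) != len(ref):
        print(f"note: sizes differ ({len(dump)} vs {len(ref)} bytes), comparing common prefix")
    for off, d, r, cleared, set_ in diff_images(dump, ref):
        print(f"0x{off:08X}: dump {d:02X} ref {r:02X} "
              f"cleared(1->0)={cleared:08b} set(0->1)={set_:08b}")


# Run only when invoked as: python diff_images.py DUMP.bin REFERENCE.bin
if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

The `fc` output shows two changed bytes: 0x68 vs 0xE8 at offset 4 and 0xEA vs 0xEE at offset 8. Each change clears exactly one bit (0x80 and 0x04 respectively) and sets none. NOR flash programming can only turn bits from 1 to 0, and only an erase sets them back to 1. The pattern is therefore at least consistent with an unintended program operation reaching the bootloader sector. Treat that as a hypothesis to test, not a conclusion.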
Next, to be sure, I flashed the corrupted bootloader to a working poco and tested it:
```
esptool.py -p COM21 -b 921600 --chip esp32 --before default_reset --after no_reset write_flash -z --flash_mode dio --flash_freq 80m --flash_size 16MB 0x1000 dump-bootloader.bin
```
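The corrupted bootloader still passes the image checksum, which is consistent with the first trace: it loads all four segments and jumps to `entry 0x40080068` without a `csum err`. The reason is that esptool computes the ESP32 image checksum as an XOR over segment data only, seeded with 0xEF. It stores the result in the last byte of the 16-byte-aligned block after the final segment, and the ROM loader's check is assumed here to match esptool. The 24-byte image header is not covered, and that header holds both the entry address (bytes 4 to 7) and the WP pin (byte 8). The following is a minimal sketch of that check. It assumes the esptool/ESP-IDF v3.x layout: a 24-byte header, then per segment an 8-byte header (load address, length) followed by the data. The helper `image_checksum` is hypothetical and is not part of esptool's API.

```python
import struct

ESP_CHECKSUM_MAGIC = 0xEF   # seed value used by esptool's checksum
HEADER_LEN = 24             # esp_image_header_t, including the extended header
SEGMENT_HEADER_LEN = 8      # load address (u32) + data length (u32), little-endian


def image_checksum(image: bytes):
    """Return (computed, stored) checksum bytes for an ESP32 image.

    Only segment *data* is XORed; the image header and segment headers are not,
    so corrupting the entry address or WP pin does not change the checksum.
    """
    if image[0] != 0xE9:
        raise ValueError(f"bad magic byte 0x{image[0]:02X}")
    segment_count = image[1]
    pos = HEADER_LEN
    checksum = ESP_CHECKSUM_MAGIC
    for _ in range(segment_count):
        _load_addr, length = struct.unpack_from("<II", image, pos)
        pos += SEGMENT_HEADER_LEN
        for b in image[pos:pos + length]:
            checksum ^= b
        pos += length
    # Stored checksum = last byte of the 16-byte block that follows the segments.
    pos += 15 - (pos % 16)
    return checksum, image[pos]
```

Usage, e.g.: `image_checksum(Path("dump-bootloader.bin").read_bytes())` with `from pathlib import Path`. As a sanity check on the layout, take the four `load:` lines in the trace: 24 + 4 × 8 + (4 + 5068 + 9244 + 5772) = 20144 bytes. That is already 16-byte aligned, so the checksum sits at offset 20159 and the image through the checksum is 20160 bytes. A 32-byte appended SHA-256 digest would bring the total to 20192 = 0x4EE0 bytes, which is exactly the length passed to `read_flash` above.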
After flashing the corrupted bootloader, the previously working device no longer boots and shows a similar error.
So I'm confident that this corruption is the cause. Now to figure out how and why it got corrupted.
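As a starting point for the how and why, it helps to know what the two corrupted bytes mean. In ESP-IDF's `esp_image_header_t`, bytes 4 to 7 hold the little-endian entry address and byte 8 holds the SPI WP pin. Both fields match symptoms in the failed trace: `entry 0x40080068` and `SPIWP:0xea`. Only byte 4 of the entry address differs, so the original entry works out to 0x400800E8. 0xEE in the WP field means "disabled" (esptool's `WP_PIN_DISABLED`). The following is a minimal decoding sketch. It assumes the v3.x header layout, in which bytes 12 to 22 are reserved. `decode_header` is a hypothetical helper name.

```python
import struct

# First 24 bytes of an ESP32 image (esp_image_header_t, ESP-IDF v3.x layout):
# magic, segment_count, spi_mode, speed/size nibbles, entry_addr, wp_pin,
# spi_pin_drv[3], reserved[11], hash_appended
HEADER_FMT = "<BBBBIB3s11sB"
SPI_MODES = {0: "QIO", 1: "QOUT", 2: "DIO", 3: "DOUT"}
WP_PIN_DISABLED = 0xEE


def decode_header(image: bytes) -> dict:
    (magic, seg_count, spi_mode, speed_size, entry, wp_pin,
     _pin_drv, _reserved, hash_appended) = struct.unpack_from(HEADER_FMT, image, 0)
    return {
        "magic": magic,
        "segment_count": seg_count,
        "spi_mode": SPI_MODES.get(spi_mode, f"unknown({spi_mode})"),
        "spi_speed": speed_size & 0x0F,   # low nibble, raw esptool code
        "spi_size": speed_size >> 4,      # high nibble, raw esptool code
        "entry_addr": entry,              # bytes 4..7, byte 4 is the lowest byte
        "wp_pin": wp_pin,                 # byte 8, printed by the ROM as SPIWP
        "wp_disabled": wp_pin == WP_PIN_DISABLED,
        "hash_appended": hash_appended,
    }
```

Running this on both `dump-bootloader.bin` and `bootloader.bin` (e.g. `decode_header(Path("bootloader.bin").read_bytes())`) should show the same fields except `entry_addr` and `wp_pin`. That narrows the question to what could have touched the first sector of the bootloader region.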