ESP32 bug causes software brick

Postby urbanze » Tue May 21, 2019 1:50 am

Hi guys, something crazy happened with my ESP32 product...

I was testing our product at home (I brought it from work). I turned on the power and everything worked normally.

I power-cycled it a few more times to hear the sweet noise of the relays clicking (they are driven right at the beginning of the code), but after a few cycles, for no reason I can explain or even guess at, the code stopped working. Yes, the code and not the hardware. All voltages are fine, and the ESP32 starts and prints information BEFORE app_main(), but then the RTC_WDT resets it, in a loop.

I cannot share the code, but at the beginning it creates an rtc_wdt feed timer (for testing only), configures some LEDs, and so on. This has been working for over a year, and was still working minutes before this bug. Apparently the ESP32 is either not starting app_main() or the code has been erased/corrupted; otherwise I would at least see the LED change color. It looks like something has been corrupted, but I do not know.

I do not want to reflash the binary yet, so that I can first find out what the problem is, what caused it, and how to fix it (if possible, without reflashing).

Here is some pertinent information. I need help to dig deeper and find out why app_main() is no longer being run:

Note (again): this code has run for more than 1 year and I have never seen app_main() fail to be called (if that is really what is happening).

The terminal output loops over the same information (after the bug):

ets Jun  8 2016 00:22:57

rst:0x1 (POWERON_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:1
load:0x3fff0018,len:4
load:0x3fff001c,len:4140
load:0x40078000,len:9948
ho 0 tail 12 room 4
load:0x40080400,len:6428
entry 0x400806c8
I (224) cpu_start: Pro cpu up.
I (224) cpu_start: Application information:
I (224) cpu_start: App version:      9c1388d-dirty
I (227) cpu_start: ELF file SHA256:  d91c7c6728cfe6af...
I (233) cpu_start: ESP-IDF:          v4.0-dev-76-g96aa08a0f-dirty
I (240) cpu_start: Starting app cpu, entry point is 0x400811d0
0x400811d0: start_cpu0_default at /home/ze/esp/esp-idf/components/esp32/cpu_start.c:351

I (0) cpu_start: App cpu up.
I (250) heap_init: Initializing. RAM available for dynamic allocation:
I (257) heap_init: At 3FFAE6E0 len 00001920 (6 KiB): DRAM
I (263) heap_init: At 3FFB8910 len 000276F0 (157 KiB): DRAM
I (269) heap_init: At 3FFE0440 len 00003AE0 (14 KiB): D/IRAM
I (276) heap_init: At 3FFE4350 len 0001BCB0 (111 KiB): D/IRAM
I (282) heap_init: At 4008C578 len 00013A88 (78 KiB): IRAM
I (288) cpu_start: Pro cpu start user code
I (305) esp_core_dump_flash: Init core dump to flash
I (306) esp_core_dump_flash: Found partition 'coredump' @ 3a6000 204800 bytes
ets Jun  8 2016 00:22:57

rst:0x10 (RTCWDT_RTC_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:1
load:0x3fff0018,len:4
load:0x3fff001c,len:4140
load:0x40078000,len:9948
ho 0 tail 12 room 4
load:0x40080400,len:6428
entry 0x400806c8
I (224) cpu_start: Pro cpu up.
I (224) cpu_start: Application information:
I (224) cpu_start: App version:      9c1388d-dirty
I (227) cpu_start: ELF file SHA256:  d91c7c6728cfe6af...
I (233) cpu_start: ESP-IDF:          v4.0-dev-76-g96aa08a0f-dirty
I (240) cpu_start: Starting app cpu, entry point is 0x400811d0
0x400811d0: start_cpu0_default at /home/ze/esp/esp-idf/components/esp32/cpu_start.c:351

I (0) cpu_start: App cpu up.
I (250) heap_init: Initializing. RAM available for dynamic allocation:
I (257) heap_init: At 3FFAE6E0 len 00001920 (6 KiB): DRAM
I (263) heap_init: At 3FFB8910 len 000276F0 (157 KiB): DRAM
I (269) heap_init: At 3FFE0440 len 00003AE0 (14 KiB): D/IRAM
I (276) heap_init: At 3FFE4350 len 0001BCB0 (111 KiB): D/IRAM
I (282) heap_init: At 4008C578 len 00013A88 (78 KiB): IRAM
I (288) cpu_start: Pro cpu start user code
I (305) esp_core_dump_flash: Init core dump to flash
I (306) esp_core_dump_flash: Found partition 'coredump' @ 3a6000 204800 bytes
ets Jun  8 2016 00:22:57

rst:0x10 (RTCWDT_RTC_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:1
load:0x3fff0018,len:4
load:0x3fff001c,len:4140
load:0x40078000,len:9948
ho 0 tail 12 room 4
load:0x40080400,len:6428
entry 0x400806c8
I (224) cpu_start: Pro cpu up.
I (224) cpu_start: Application information:
I (224) cpu_start: App version:      9c1388d-dirty
I (227) cpu_start: ELF file SHA256:  d91c7c6728cfe6af...
I (233) cpu_start: ESP-IDF:          v4.0-dev-76-g96aa08a0f-dirty
I (240) cpu_start: Starting app cpu, entry point is 0x400811d0
0x400811d0: start_cpu0_default at /home/ze/esp/esp-idf/components/esp32/cpu_start.c:351

I (0) cpu_start: App cpu up.
I (250) heap_init: Initializing. RAM available for dynamic allocation:
I (257) heap_init: At 3FFAE6E0 len 00001920 (6 KiB): DRAM
I (263) heap_init: At 3FFB8910 len 000276F0 (157 KiB): DRAM
I (269) heap_init: At 3FFE0440 len 00003AE0 (14 KiB): D/IRAM
I (276) heap_init: At 3FFE4350 len 0001BCB0 (111 KiB): D/IRAM
I (282) heap_init: At 4008C578 len 00013A88 (78 KiB): IRAM
I (288) cpu_start: Pro cpu start user code
I (305) esp_core_dump_flash: Init core dump to flash
I (306) esp_core_dump_flash: Found partition 'coredump' @ 3a6000 204800 bytes
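
Editorial aside: a reset loop like the one in this log can be spotted mechanically from the "rst:" lines printed by the ROM bootloader. The following is a minimal Python sketch, not something posted in the thread. The function names and the SAMPLE text (a shortened copy of the log) are illustrative assumptions.

```python
import re

# ROM boot lines look like: "rst:0x10 (RTCWDT_RTC_RESET),boot:0x13 (...)".
RESET_LINE = re.compile(r"rst:(0x[0-9a-fA-F]+) \((\w+)\)")


def reset_reasons(log_text):
    """Return (code, name) for every reset line, in order of appearance."""
    return [(int(code, 16), name) for code, name in RESET_LINE.findall(log_text)]


def in_watchdog_loop(reasons, min_repeats=2):
    """True if the last min_repeats resets were all RTC watchdog resets."""
    tail = reasons[-min_repeats:]
    return len(tail) == min_repeats and all(
        name == "RTCWDT_RTC_RESET" for _, name in tail
    )


# Shortened copy of the log from the post (hypothetical demo data).
SAMPLE = """\
rst:0x1 (POWERON_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
I (306) esp_core_dump_flash: Found partition 'coredump' @ 3a6000 204800 bytes
rst:0x10 (RTCWDT_RTC_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
I (306) esp_core_dump_flash: Found partition 'coredump' @ 3a6000 204800 bytes
rst:0x10 (RTCWDT_RTC_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
"""

if __name__ == "__main__":
    reasons = reset_reasons(SAMPLE)
    print(reasons)
    print("watchdog loop:", in_watchdog_loop(reasons))
```

Run against a captured serial log, this only confirms the pattern that is visible by eye here: one power-on reset followed by repeated RTC watchdog resets. It says nothing about the cause.
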

Output of esptool read_flash_status:

python esptool.py read_flash_status
esptool.py v2.6
Found 3 serial ports
Serial port /dev/ttyUSB0
Connecting........_____....._____.
Detecting chip type... ESP32
Chip is ESP32D0WDQ6 (revision 1)
Features: WiFi, BT, Dual Core, 240MHz, VRef calibration in efuse, Coding Scheme None
MAC: b4:e6:2d:ce:8a:b9
Uploading stub...
Running stub...
Stub running...
Status value: 0x0200
Hard resetting via RTS pin...
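
Editorial aside: the status value 0x0200 can be split into its bit fields. The sketch below is not from the thread and rests on two assumptions: that the two-byte value holds status register 1 in the low byte and status register 2 in the high byte, and that the flash chip uses the common Winbond-style bit layout. The thread does not identify the flash chip, so its datasheet is the real authority. The function names are hypothetical.

```python
def split_status(value):
    """Split a 16-bit status value into (SR1, SR2): low byte, high byte."""
    return value & 0xFF, (value >> 8) & 0xFF


def decode_status(value):
    """Decode a few status bits, ASSUMING a Winbond-style register layout."""
    sr1, sr2 = split_status(value)
    return {
        "busy": bool(sr1 & 0x01),                # SR1 bit 0: write in progress
        "write_enable_latch": bool(sr1 & 0x02),  # SR1 bit 1
        "block_protect": (sr1 >> 2) & 0x07,      # SR1 bits 2-4: BP0..BP2
        "quad_enable": bool(sr2 & 0x02),         # SR2 bit 1: QE (assumed)
    }


if __name__ == "__main__":
    for name, value in decode_status(0x0200).items():
        print(f"{name}: {value}")
```

Under those assumptions, 0x0200 would mean no block-protect bits are set and only the quad-enable bit is on, which would make flash write protection an unlikely explanation for the boot loop.
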

Postby ESP_Angus » Tue May 21, 2019 4:58 am

It looks like maybe there is some problem with the core dump partition in flash? Is there any chance you could dump this partition with "esptool.py read_flash" and then send it to me?

BTW, you say you ran the code for more than 1 year but it looks like you've more recently rebuilt the project with a newer development IDF version. Is that right? The version is v4.0-dev-76-g96aa08a0f-dirty and this commit is from March 2019.

There has been some work on core dump in the v4.0 development timeline, so it may be worth pulling the latest master branch, rebuilding, and flashing again (please dump the flash first before doing this, to help any later diagnosis).
Angus

Postby urbanze » Tue May 21, 2019 4:02 pm

What I meant by more than a year is the algorithm that starts the system (LEDs, relays, touch, etc.). I did upgrade the IDF recently (in March, to master), but I never ran into any problems with the code OR the core dump; both worked nicely in all tests.

So if your assumption is right, even if I re-upload the code (without erasing), the error will continue, since it's a problem with the core dump?! I'll try it out in 2 days!

Reading the coredump partition (size = 200 KiB) from flash (I used the address and size from the terminal output in the first post):

python esptool.py read_flash 0x3a6000 0x32000 /home/ze/coredump.bin

The file is too large to attach, so I uploaded it to my Google Drive and sent you the link by private message.
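
Editorial aside: the read_flash arguments used in this post come from the boot log line "Found partition 'coredump' @ 3a6000 204800 bytes", where the offset is printed in hex and the size in decimal. The following Python sketch (not from the thread; the function names are hypothetical) shows the conversion and adds a 4 KiB sector alignment check, since erase operations on this kind of flash work on whole sectors.

```python
import re

# Matches: "Found partition 'coredump' @ 3a6000 204800 bytes"
FOUND = re.compile(r"Found partition '(\w+)' @ ([0-9a-fA-F]+) (\d+) bytes")
SECTOR = 0x1000  # 4 KiB flash sector


def parse_partition(log_line):
    """Return (name, offset, size) from the core dump log line."""
    match = FOUND.search(log_line)
    if match is None:
        raise ValueError("no partition info found in line")
    name, offset_hex, size_dec = match.groups()
    return name, int(offset_hex, 16), int(size_dec)


def region_args(offset, size):
    """Format offset and size as hex arguments, checking sector alignment."""
    if offset % SECTOR or size % SECTOR:
        raise ValueError("offset and size should be multiples of 4096")
    return f"{offset:#x} {size:#x}"


if __name__ == "__main__":
    line = "I (306) esp_core_dump_flash: Found partition 'coredump' @ 3a6000 204800 bytes"
    name, offset, size = parse_partition(line)
    print(name, region_args(offset, size))
```

For the line in the first post this yields 0x3a6000 0x32000, matching the values passed to read_flash here: 204800 bytes is 200 KiB, or 0x32000.
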

Postby ESP_Angus » Wed May 22, 2019 12:46 am

Got it, thanks.

If you now do "esptool.py erase_region 0x3a6000 0x32000", without changing anything else, does the firmware finish booting correctly?

Postby urbanze » Wed May 22, 2019 1:43 am

No. I erased that partition using its exact size. On the first boot an error message appeared while loading the partition table, but from the next reboot on everything continued as before (with the RTC watchdog resetting for no apparent reason), so nothing changed and the bug remained. I was able to stop by work and re-upload the code (make flash) to test the assumption that it was just a bug in the core dump.

1. When I re-uploaded the code (make flash), it remained bugged in exactly the same way.

2. When I erased the whole flash (erase_flash) and re-uploaded (make flash), it worked.

Any suggestions for this problem? Is the commit you mentioned specifically about this bug? I do not intend to update the IDF yet, and may even prefer to disable the core dump until we update the environment...

Postby ESP_Angus » Wed May 22, 2019 7:58 am

Damn. Maybe it's not related to core dump then, I was just going on the last log message before it froze.

I don't know of anything in particular with this commit, but this is the problem with "Development" versions (-dev) - they have had some basic automated testing but they're not comprehensively tested. The "beta" and "release" versions have been tested more thoroughly. Full details here.

In general, if you have a bug on an old development version then the best thing you can do is update (ideally to a beta or a release version) and see if the bug goes away.

If you update to latest master and the bug is still there, please send me your app .elf file and all project .bin files via PM and I'll take a look. However please update first, we don't support old -dev versions.

Angus

Postby urbanze » Wed May 22, 2019 4:04 pm

Thanks for the support, Angus. I will keep a closer eye on this and see if it happens again... (P.S. this never occurred before.)
If it happens again, I will post here to continue the tests and fix/avoid the problem.
