Weird hang crash lwip stack overflow behaviour after OTA download is complete

RMandR
Posts: 75
Joined: Mon Oct 29, 2018 3:13 pm

Postby RMandR » Sat Apr 02, 2022 5:30 pm

We are using IDF 4.4 with OTA code that is nearly identical to the advanced OTA example.

We noticed that the device sometimes hangs or crashes, and even the watchdog timer (WDT) does not step in to reboot it. The last messages we see come from Wi-Fi:

Code:

I (143926) esp_image: Verifying image signature...
I (143926) secure_boot_v2: Take trusted digest key(s) from eFuse block(s)
I (143936) secure_boot_v2: #0 app key digest == #0 trusted key digest
I (143936) secure_boot_v2: Verifying with RSA-PSS...
I (144026) secure_boot_v2: Signature verified successfully!
I (144096) HTTP_CLIENT: ESP_HTTPS_OTA upgrade successful. Rebooting ...
I (147096) wifi:state: run -> init (0)
I (147096) wifi:pm stop, total sleep time: 98307291 us / 140397706 us
W (147096) wifi:<ba-del>idx
I (147096) wifi:new:<11,0>, old:<11,0>, ap:<255,255>, sta:<11,0>, prof:1
W (147106) wifi:hmac tx: ifx0 stop, discard
W (147106) wifi:hmac tx: ifx0 stop, discard
<---- HANGS HERE ---->

Other times, it just works and the code reaches esp_restart().

Other times, an actual stack overflow (SO) is produced. This happens more often if we increase the vTaskDelay() duration.

It looks like the tiT task is the lwIP TCPIP task, with a default stack size of 2048. I'm not sure whether this needs an increase, or whether it is even the root cause of the crash.

Code:

I (128002) secure_boot_v2: Take trusted digest key(s) from eFuse block(s)
I (128002) secure_boot_v2: #0 app key digest == #0 trusted key digest
I (128002) secure_boot_v2: Verifying with RSA-PSS...
I (128092) secure_boot_v2: Signature verified successfully!
I (128092) HTTP_CLIENT: ESP_HTTPS_OTA upgrade successful. Rebooting ...
I (131092) wifi:state: run -> init (0)
I (131092) wifi:pm stop, total sleep time: 82194662 us / 124457581 us
W (131092) wifi:<ba-del>idx
I (131092) wifi:new:<11,0>, old:<11,0>, ap:<255,255>, sta:<11,0>, prof:1
W (131102) wifi:hmac tx: ifx0 stop, discard
W (131102) wifi:hmac tx: ifx0 stop, discard

***ERROR*** A stack overflow in task tiT has been detected.

Backtrace:0x40081ce2:0x3ffd5b900x400919a9:0x3ffd5bb0 0x40094bc2:0x3ffd5bd0 0x40093521:0x3ffd5c50 0x40091aa8:0x3ffd5c80 0x40091a5a:0x00000000  |<-CORRUPTED

0x40081ce2: panic_abort at C:/Users/user/esp-idf-v4.4/components/esp_system/panic.c:402
0x400919a9: esp_system_abort at C:/Users/user/esp-idf-v4.4/components/esp_system/esp_system.c:121
0x40094bc2: vApplicationStackOverflowHook at C:/Users/user/esp-idf-v4.4/components/freertos/port/xtensa/port.c:394
0x40093521: vTaskSwitchContext at C:/Users/user/esp-idf-v4.4/components/freertos/tasks.c:3506
0x40091aa8: _frxt_dispatch at C:/Users/user/esp-idf-v4.4/components/freertos/port/xtensa/portasm.S:436
0x40091a5a: _frxt_int_exit at C:/Users/user/esp-idf-v4.4/components/freertos/port/xtensa/portasm.S:231
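
One way to check whether tiT is really running short of stack is to read its FreeRTOS high-water mark shortly before the restart. The following is only a minimal sketch, assuming xTaskGetHandle() is enabled in the FreeRTOS configuration. task_stack_headroom() is a hypothetical helper name, and the #else branch holds host-side stand-ins (with a made-up return value) so the sketch also compiles off-target:

```c
#include <stddef.h>
#include <string.h>

#ifdef ESP_PLATFORM
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#else
/* Host-side stand-ins (assumption): they only let the sketch compile off-target. */
typedef void *TaskHandle_t;
typedef unsigned int UBaseType_t;
static int fake_tcb;
static TaskHandle_t xTaskGetHandle(const char *name)
{
    /* Pretend that only the "tiT" task exists. */
    return strcmp(name, "tiT") == 0 ? (TaskHandle_t)&fake_tcb : NULL;
}
static UBaseType_t uxTaskGetStackHighWaterMark(TaskHandle_t task)
{
    (void)task;
    return 512; /* made-up value, not a measurement */
}
#endif

/* Hypothetical helper: returns the smallest amount of free stack recorded
 * so far for the named task, or -1 if no task with that name is found. */
long task_stack_headroom(const char *task_name)
{
    TaskHandle_t task = xTaskGetHandle(task_name);
    if (task == NULL) {
        return -1;
    }
    return (long)uxTaskGetStackHighWaterMark(task);
}
```

Logging task_stack_headroom("tiT") just before esp_restart() shows how close the task has come to its limit up to that point. It cannot capture stack usage during the restart itself. If the value is small, the stack size can be raised through CONFIG_LWIP_TCPIP_TASK_STACK_SIZE in menuconfig.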

Here's the code at the end of the OTA download. The rest of the process is the same as in the advanced OTA example. There are also other tasks in the application, but they do not use net_if/TCPIP.

Code:

ota_finish_err = esp_https_ota_finish(https_ota_handle);
if ((err == ESP_OK) && (ota_finish_err == ESP_OK)) {
    ESP_LOGI(TAG, "ESP_HTTPS_OTA upgrade successful. Rebooting ...");
    vTaskDelay(3000 / portTICK_PERIOD_MS);
    esp_restart();
}

This did not happen before we upgraded from IDF 4.0 to 4.4.

Am I doing something wrong?
Last edited by RMandR on Tue Apr 05, 2022 12:51 pm, edited 1 time in total.

Re: Weird hang crash lwip stack overflow behaviour after OTA download is complete

Postby RMandR » Sun Apr 03, 2022 3:32 pm

Update:

The hang and the stack overflow (SO) both occur while esp_restart() is executing.

I tried to follow the Wi-Fi API guide, which says:

Wi-Fi Deinit Phase

s8.1: Call esp_wifi_disconnect() to disconnect the Wi-Fi connectivity.
s8.2: Call esp_wifi_stop() to stop the Wi-Fi driver.
s8.3: Call esp_wifi_deinit() to unload the Wi-Fi driver.

It didn't work immediately. However, it looks like when the UART prints introduce some delay between the calls, everything goes smoothly from there.

Code:

ESP_LOGI("FOTA", "wifi disc");
fflush(stdout);
esp_wifi_disconnect();
ESP_LOGI("FOTA", "wifi stop");
fflush(stdout);
esp_wifi_stop();
ESP_LOGI("FOTA", "wifi deinit");
fflush(stdout);
esp_wifi_deinit();
ESP_LOGI("DONE", "Rebooting");
fflush(stdout);
esp_restart();

This consistently results in:

Code:

I (113475) FOTA: wifi disc
I (113475) wifi:state: run -> init (0)
I (113475) wifi:pm stop, total sleep time: 67012390 us / 106790000 us

W (113475) wifi:<ba-del>idx
I (113475) wifi:new:<11,0>, old:<11,0>, ap:<255,255>, sta:<11,0>, prof:1
I (113485) FOTA: wifi stop
I (113485) wifi:flush txq
I (113485) wifi:stop sw txq
I (113495) wifi:lmac stop hw txq
I (113495) FOTA: wifi deinit
I (113495) wifi:Deinit lldesc rx mblock:10
I (113515) FOTA: Rebooting
ets Jul 29 2019 12:21:46
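
If the delay between the calls is what matters, it may be cleaner to make it explicit than to rely on the timing of the UART prints. The following is only a sketch of that idea, not a confirmed fix. wifi_shutdown_with_delays() and SHUTDOWN_STEP_DELAY_MS are hypothetical names, the 50 ms value is a guess, and the #else branch holds host-side stand-ins so the sketch also compiles off-target:

```c
#include <stddef.h>
#include <stdio.h>

#ifdef ESP_PLATFORM
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "esp_err.h"
#include "esp_wifi.h"
#else
/* Host-side stand-ins (assumption): they only let the sketch compile off-target. */
typedef int esp_err_t;
#define ESP_OK 0
#define pdMS_TO_TICKS(ms) (ms)
static int stub_calls; /* counts how many stubbed driver calls were made */
static esp_err_t esp_wifi_disconnect(void) { stub_calls++; return ESP_OK; }
static esp_err_t esp_wifi_stop(void)       { stub_calls++; return ESP_OK; }
static esp_err_t esp_wifi_deinit(void)     { stub_calls++; return ESP_OK; }
static void vTaskDelay(unsigned int ticks) { (void)ticks; }
#endif

#define SHUTDOWN_STEP_DELAY_MS 50 /* hypothetical value, tune on hardware */

/* Runs the three Wi-Fi deinit steps in order, pausing after each one.
 * A failing step is reported but does not stop the sequence.
 * Returns the first error seen, or ESP_OK. Does not restart the chip. */
esp_err_t wifi_shutdown_with_delays(void)
{
    esp_err_t (*const steps[])(void) = {
        esp_wifi_disconnect,
        esp_wifi_stop,
        esp_wifi_deinit,
    };
    esp_err_t first_err = ESP_OK;

    for (size_t i = 0; i < sizeof(steps) / sizeof(steps[0]); i++) {
        esp_err_t err = steps[i]();
        if (err != ESP_OK) {
            printf("wifi shutdown step %u failed: %d\n", (unsigned)i, (int)err);
            if (first_err == ESP_OK) {
                first_err = err;
            }
        }
        vTaskDelay(pdMS_TO_TICKS(SHUTDOWN_STEP_DELAY_MS));
    }
    return first_err;
}
```

The intended use is to call wifi_shutdown_with_delays() right before esp_restart(), in place of the print-and-flush sequence. Whether a fixed delay behaves the same as the UART prints has not been verified here.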
