gettimeofday() causing reboots?
Posted: Thu Sep 03, 2020 8:56 pm
I moved to the master branch two months ago and everything was smooth as butter. A few days ago I updated to the latest version and now I'm having problems with what appears to be gettimeofday(). I won't go through everything I tried to pinpoint the source of the problem and I'm not even sure I found it, but I have had many clues pointing to that function. I should also point that it is happening only when the OTA process is ongoing, but then again, not much is happening the rest of the time.
I'm using the Exception Decoder for the Arduino IDE and every time I checked the backtrace, the gettimeofday function was there. I thought it was just part of the abort process so I wasn't paying much attention. But the backtrace pointed at the MQTT client component when it was calling platform_tick_get_ms() (line 1285 of mqtt_client.c) which in turn calls gettimeofday(), the httpd server and a few others also had a call to that function when crashing. Every time I was disabling something, it would eventually happen with something else. Then the decoder pointed at a ESP_LOGI line so that couldn't possibly be bugged. I had the new timestamp feature to print system time instead of milliseconds turned on, so I turned that feature off. After many more OTA updates than anyone would consider enough, it eventually crashed again. After an OTA update I set a timer for the next automatic update check and this time the decoder pointed at that function and the line in question was where I call time() which calls gettimeofday().
I removed every function I had with IRAM_ATTR, I changed all the malloc to heap_caps_malloc() with the MALLOC_CAP_DMA, I tried everything I could think of. I'm out of ideas. The 64 bits timer option is not enabled in sdkconfig.
I checked the esp-idf source tree and saw that newlib, and especially time.c, has been updated between the time I first installed the master branch and the update from a few days ago. Was there something changed that could be causing this?
This is the last backtrace I got, I can provide more if needed.
0x40082e59: _lock_acquire_recursive at D:/Work/Electronic/Espressif/esp-idf/components/newlib/locks.c line 170
0x40167531: _vfiprintf_r at /builds/idf/crosstool-NG/.build/HOST-x86_64-w64-mingw32/xtensa-esp32-elf/src/newlib/newlib/libc/stdio/vfprintf.c line 853
0x4015f9b1: fiprintf at /builds/idf/crosstool-NG/.build/HOST-x86_64-w64-mingw32/xtensa-esp32-elf/src/newlib/newlib/libc/stdio/fiprintf.c line 48
0x4015f925: __assert_func at /builds/idf/crosstool-NG/.build/HOST-x86_64-w64-mingw32/xtensa-esp32-elf/src/newlib/newlib/libc/stdlib/assert.c line 58
0x400d80f9: esp_time_impl_get_boot_time at D:/Work/Electronic/Espressif/esp-idf/components/soc/include/soc/spinlock.h line 145
0x400d7ae7: adjust_boot_time at D:/Work/Electronic/Espressif/esp-idf/components/newlib/time.c line 63
0x40082f61: _gettimeofday_r at D:/Work/Electronic/Espressif/esp-idf/components/newlib/time.c line 104
0x40082ff3: _gettimeofday_r at D:/Work/Electronic/Espressif/esp-idf/components/newlib/time.c line 177
0x400dc0ce: hub_ota_set_check_timer at ../main/hub_ota.c line 341
0x400dc5fe: hub_ota_task at ../main/hub_ota.c line 208
This is the lines before and after line 341 in hub_ota.c:
time_t now = 0;
struct tm timeinfo = { 0 };
time(&now); // <-- Line 341
localtime_r(&now, &timeinfo);
And this is the for the line 208:
esp_wifi_set_ps(WIFI_PS_MIN_MODEM);
hub_ota_set_check_timer(); // <-- Line 208. It's the end of the whole OTA process.
xSemaphoreGive(xSemOTA);
vTaskDelete(NULL);
I'm using the Exception Decoder for the Arduino IDE and every time I checked the backtrace, the gettimeofday function was there. I thought it was just part of the abort process so I wasn't paying much attention. But the backtrace pointed at the MQTT client component when it was calling platform_tick_get_ms() (line 1285 of mqtt_client.c) which in turn calls gettimeofday(), the httpd server and a few others also had a call to that function when crashing. Every time I was disabling something, it would eventually happen with something else. Then the decoder pointed at a ESP_LOGI line so that couldn't possibly be bugged. I had the new timestamp feature to print system time instead of milliseconds turned on, so I turned that feature off. After many more OTA updates than anyone would consider enough, it eventually crashed again. After an OTA update I set a timer for the next automatic update check and this time the decoder pointed at that function and the line in question was where I call time() which calls gettimeofday().
I removed every function I had with IRAM_ATTR, I changed all the malloc to heap_caps_malloc() with the MALLOC_CAP_DMA, I tried everything I could think of. I'm out of ideas. The 64 bits timer option is not enabled in sdkconfig.
I checked the esp-idf source tree and saw that newlib, and especially time.c, has been updated between the time I first installed the master branch and the update from a few days ago. Was there something changed that could be causing this?
This is the last backtrace I got, I can provide more if needed.
0x40082e59: _lock_acquire_recursive at D:/Work/Electronic/Espressif/esp-idf/components/newlib/locks.c line 170
0x40167531: _vfiprintf_r at /builds/idf/crosstool-NG/.build/HOST-x86_64-w64-mingw32/xtensa-esp32-elf/src/newlib/newlib/libc/stdio/vfprintf.c line 853
0x4015f9b1: fiprintf at /builds/idf/crosstool-NG/.build/HOST-x86_64-w64-mingw32/xtensa-esp32-elf/src/newlib/newlib/libc/stdio/fiprintf.c line 48
0x4015f925: __assert_func at /builds/idf/crosstool-NG/.build/HOST-x86_64-w64-mingw32/xtensa-esp32-elf/src/newlib/newlib/libc/stdlib/assert.c line 58
0x400d80f9: esp_time_impl_get_boot_time at D:/Work/Electronic/Espressif/esp-idf/components/soc/include/soc/spinlock.h line 145
0x400d7ae7: adjust_boot_time at D:/Work/Electronic/Espressif/esp-idf/components/newlib/time.c line 63
0x40082f61: _gettimeofday_r at D:/Work/Electronic/Espressif/esp-idf/components/newlib/time.c line 104
0x40082ff3: _gettimeofday_r at D:/Work/Electronic/Espressif/esp-idf/components/newlib/time.c line 177
0x400dc0ce: hub_ota_set_check_timer at ../main/hub_ota.c line 341
0x400dc5fe: hub_ota_task at ../main/hub_ota.c line 208
This is the lines before and after line 341 in hub_ota.c:
time_t now = 0;
struct tm timeinfo = { 0 };
time(&now); // <-- Line 341
localtime_r(&now, &timeinfo);
And this is the for the line 208:
esp_wifi_set_ps(WIFI_PS_MIN_MODEM);
hub_ota_set_check_timer(); // <-- Line 208. It's the end of the whole OTA process.
xSemaphoreGive(xSemOTA);
vTaskDelete(NULL);