Logless system freeze.
Posted: Tue Mar 26, 2024 11:27 am
We are experiencing a strange issue where the system freezes with no log output, no brownout detection, no hard faults and no reset. It simply stops doing anything until it is manually reset, after which it works as normal.
We are making an air quality monitor with an ESP32-C6 using ESP-IDF v5.2. The system contains environmental sensors and an LCD display. Under normal operation the device reads from the sensors, sends data over MQTT and updates the display with the time and sensor data. We have had devices run for weeks at a time with no issues. We only see this issue when running the device inside an environmental chamber, where we are testing our temperature and humidity sensors. The temperature and humidity within the chamber stay within the expected operating conditions (15-30C and 30-60% humidity), so we believe the freeze may be related to lower WiFi quality inside the chamber. We are testing 3 devices simultaneously. All of them fail within the chamber, but at different times, after anywhere from 2 to 18 hours of operation.
I have tested the task watchdog and the interrupt watchdog by adding endless while loops. Both trigger errors as expected and cause a system reset.
I also created a timer using esp_timer which toggles the display backlight every 1 s. This also stops during the system freeze, so we ruled out the problem being limited to one of our RTOS tasks.
I have also tried increasing the system log level to verbose. There is still no sign of anything unusual happening before the system freezes. We monitor the remaining heap every 15s and there are no signs of any memory leaks.
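For reference, the heap check is roughly of the following shape. This is a minimal sketch rather than our exact code: `heap_trend_t`, `heap_trend_init` and `heap_trend_update` are hypothetical names made up for illustration, not ESP-IDF functions, and the sketch is plain C so it can be tried off-target. It keeps the lowest free-heap value seen and flags a run of consecutive drops as a crude hint of a leak.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not part of ESP-IDF): tracks periodic free-heap samples. */
typedef struct {
    uint32_t low_water;      /* smallest sample seen so far */
    uint32_t last;           /* previous sample */
    unsigned falling_streak; /* consecutive samples lower than the one before */
    bool primed;             /* false until the first sample arrives */
} heap_trend_t;

static void heap_trend_init(heap_trend_t *t)
{
    *t = (heap_trend_t){0};
}

/* Feed one sample. Returns true once free heap has dropped for
 * `streak_limit` samples in a row. */
static bool heap_trend_update(heap_trend_t *t, uint32_t free_bytes,
                              unsigned streak_limit)
{
    if (!t->primed) {
        /* First sample only establishes the baseline. */
        t->primed = true;
        t->low_water = free_bytes;
        t->last = free_bytes;
        return false;
    }
    if (free_bytes < t->low_water) {
        t->low_water = free_bytes;
    }
    /* Any sample that is not lower than the previous one resets the streak. */
    t->falling_streak = (free_bytes < t->last) ? t->falling_streak + 1 : 0;
    t->last = free_bytes;
    return t->falling_streak >= streak_limit;
}
```

On the device, the sample passed in every 15 s would be the value returned by `esp_get_free_heap_size()`. The streak limit is an arbitrary choice and would need tuning, since normal allocation churn can also produce a few falling samples in a row.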
The device is also capable of recovering and reconnecting when WiFi signal is lost.
Does anyone have any thoughts on what could be causing this system freeze, or have any suggestions on how to help debug the problem?