Logless system freeze.

liamm36
Posts: 4
Joined: Mon Jan 24, 2022 12:48 pm

Postby liamm36 » Tue Mar 26, 2024 11:27 am

We are experiencing a strange issue where the system freezes with no log output, no brownout detection, no hard fault and no reset. It simply stops doing anything until it is manually reset, after which it works as normal.

We are making an air quality monitor with an ESP32-C6 using ESP-IDF v5.2. The system contains environmental sensors and an LCD display. Under normal operation the device reads from the sensors, sends data over MQTT and updates the display with the time and sensor data. We have had devices run for weeks at a time with no issues. We only see this issue when running the device inside an environmental chamber, where we are testing our temperature and humidity sensors. The temperature and humidity within the chamber do not go outside the expected operating conditions (15-30 °C and 30-60% humidity), so we believe the freeze may be due to lower WiFi quality. We are also testing 3 devices simultaneously; all of them fail within the chamber but at different times, after 2-18 hours of operation.

I have tested the task and interrupt watchdogs by adding endless while loops; both trigger errors as expected and cause a system reset.

I also created a timer using esp_timer which toggled the display backlight every 1s. This also stops during the system freeze, so we ruled out it being an RTOS task issue.
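
For reference, a heartbeat of this kind might look roughly like the sketch below. It is not the code actually running on the devices: the pin number and the names `heartbeat_next_level`, `heartbeat_cb` and `heartbeat_start` are assumptions made for illustration. The toggle logic is kept as a plain C function, and the ESP-IDF wiring is guarded by `ESP_PLATFORM` so the portable part can also be built on a PC.

```c
#include <assert.h>
#include <stdint.h>

#define HEARTBEAT_PERIOD_US 1000000ULL  /* 1 s toggle period */

/* Portable part: flip the backlight state and return the new level (0 or 1). */
static int heartbeat_next_level(int current_level)
{
    assert(current_level == 0 || current_level == 1);
    return current_level ? 0 : 1;
}

#ifdef ESP_PLATFORM
#include "driver/gpio.h"
#include "esp_err.h"
#include "esp_timer.h"

#define BACKLIGHT_GPIO GPIO_NUM_5  /* hypothetical pin, adjust to the board */

static int s_level = 0;

/* With the default ESP_TIMER_TASK dispatch this callback runs in the
   esp_timer task, not in an ISR. */
static void heartbeat_cb(void *arg)
{
    (void)arg;
    s_level = heartbeat_next_level(s_level);
    gpio_set_level(BACKLIGHT_GPIO, s_level);
}

void heartbeat_start(void)
{
    const esp_timer_create_args_t args = {
        .callback = heartbeat_cb,
        .name = "heartbeat",
    };
    esp_timer_handle_t timer;

    gpio_set_direction(BACKLIGHT_GPIO, GPIO_MODE_OUTPUT);
    ESP_ERROR_CHECK(esp_timer_create(&args, &timer));
    ESP_ERROR_CHECK(esp_timer_start_periodic(timer, HEARTBEAT_PERIOD_US));
}
#endif
```

Because the default dispatch method runs the callback from the esp_timer task, a stalled heartbeat shows that this high-priority task is no longer being serviced, which may still be worth keeping in mind when interpreting the result.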

I have also tried increasing the system log level to verbose. There is still no sign of anything unusual happening before the system freezes. We monitor the remaining heap every 15s and there are no signs of any memory leaks.
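
A heap check along these lines could be sketched as follows. The window size, the "only ever shrinks" rule and the names `heap_only_shrinks` and `heap_monitor_task` are assumptions for illustration, not the monitoring code used on the devices; `esp_get_free_heap_size()` and `esp_get_minimum_free_heap_size()` are the ESP-IDF calls that supply the readings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Portable part: true if every sample is strictly lower than the one before,
   i.e. free heap only ever shrinks across the window (a crude leak hint). */
static bool heap_only_shrinks(const uint32_t *samples, size_t count)
{
    assert(samples != NULL);
    if (count < 2) {
        return false;  /* not enough data to call it a trend */
    }
    for (size_t i = 1; i < count; i++) {
        if (samples[i] >= samples[i - 1]) {
            return false;
        }
    }
    return true;
}

#ifdef ESP_PLATFORM
#include "esp_log.h"
#include "esp_system.h"
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"

#define HEAP_WINDOW 8         /* hypothetical: 8 samples = 2 minutes */
#define HEAP_PERIOD_MS 15000  /* one sample every 15 s */

static void heap_monitor_task(void *arg)
{
    (void)arg;
    uint32_t window[HEAP_WINDOW];
    size_t n = 0;

    for (;;) {
        window[n++] = esp_get_free_heap_size();
        ESP_LOGI("heap", "free=%u min_ever=%u",
                 (unsigned)window[n - 1],
                 (unsigned)esp_get_minimum_free_heap_size());
        if (n == HEAP_WINDOW) {
            if (heap_only_shrinks(window, n)) {
                ESP_LOGW("heap", "free heap fell on every sample in the window");
            }
            n = 0;  /* start a fresh window */
        }
        vTaskDelay(pdMS_TO_TICKS(HEAP_PERIOD_MS));
    }
}
#endif
```

Logging the minimum-ever free heap alongside the current value helps catch short dips that fall between two 15 s samples.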

The device is also capable of recovering and reconnecting when WiFi signal is lost.

Does anyone have any thoughts on what could be causing this system freeze, or have any suggestions on how to help debug the problem?

Tomatendose
Posts: 2
Joined: Tue Apr 16, 2024 7:19 am

Re: Logless system freeze.

Postby Tomatendose » Tue Apr 16, 2024 7:39 am

Hello,

I have much the same problem. My setup is two ESP32-C6-Mini modules using arduino-esp32 3.0.0 alpha 2, both running the same code but at different distances from the AP. The one furthest from the AP, with an RSSI of around -74 dBm, freezes after 1-2 hours. The one closer to the AP, at around -40 dBm, runs significantly longer, around 12+ hours. I have already measured the voltages to rule out a voltage drop as the reason for the freezing. The hardware timer watchdog from the example code on GitHub is also running, but it does not trigger after the freeze.

Occasionally, when it doesn't freeze, this error message appears:
reboot()
abort() was called at PC 0x408034ff on core 0
After that, the ESP32 does not restart and also remains frozen.

However, I cannot rule out that this error is triggered by other events.

If you need more information, please let me know.

Tomatendose
Posts: 2
Joined: Tue Apr 16, 2024 7:19 am

Re: Logless system freeze.

Postby Tomatendose » Wed Apr 17, 2024 5:57 am

I tested 3.0.0 rc1, which was released a few days ago. With it, the configured watchdog seems to trigger when establishing the connection takes too long. I still have to run a long-term test.

mtraven
Posts: 28
Joined: Thu Jul 07, 2022 3:34 am

Re: Logless system freeze.

Postby mtraven » Wed Apr 17, 2024 8:47 pm

I remember reading somewhere (the FreeRTOS docs, I think) that the esp_log system is used by the OS itself for "something" important... sorry, I don't recall exactly what that is. My point is, have you considered that the very act of turning the logging completely off (I assume through menuconfig) is what causes your problem?

A workaround might be to restart your device on a schedule to avoid the failures... that's kind of shoddy though...
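
Something like the sketch below would do it, assuming ESP-IDF. The 1 hour interval and the names `restart_due` and `maybe_scheduled_restart` are just placeholders for illustration; the interval would need to be shorter than the shortest time-to-freeze reported in this thread (roughly 1-2 hours) for the workaround to help at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Portable part: has the device been up for at least interval_us? */
static bool restart_due(int64_t uptime_us, int64_t interval_us)
{
    assert(interval_us > 0);
    return uptime_us >= interval_us;
}

#ifdef ESP_PLATFORM
#include "esp_system.h"
#include "esp_timer.h"

#define RESTART_INTERVAL_US (60LL * 60 * 1000000)  /* hypothetical: 1 hour */

/* Call periodically from an application task, ideally at a quiet moment
   (for example right after an MQTT publish has completed). */
void maybe_scheduled_restart(void)
{
    /* esp_timer_get_time() returns microseconds since boot. */
    if (restart_due(esp_timer_get_time(), RESTART_INTERVAL_US)) {
        esp_restart();  /* software reset; does not return */
    }
}
#endif
```

Since this runs from an ordinary task, it only prevents freezes by resetting before they happen; it cannot recover a device that has already stopped.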

If you think the WiFi signal to the board inside the box is the problem, have you explored extending your antenna? An even simpler place to start might be trying different orientations or locations in the box.

Just some stuff to think about.
