ULP wake failing with WDT reset

icrowe
Posts: 5
Joined: Fri Feb 26, 2021 11:07 pm

Postby icrowe » Tue May 11, 2021 12:14 am

We are having a problem with the ESP32 waking from deep sleep. We have multiple systems that have been reliably going into deep sleep, performing several I2C sensor readings, and then waking the main CPU to report the readings over Wi-Fi. These systems have had reliable up-times of weeks to months.

We have recently been testing the systems in warmer conditions, and have discovered occasional gaps in our sensor readings. These gaps are always associated with a WDT reset happening at the time of the regularly scheduled wakeup. This shows that the ULP is running its code loop and trying to wake the Xtensa at the correct time, but as the Xtensa wakes, it is hit by a WDT reset instead.
Here are two log excerpts to show the event.
* A normal sleep then wakeup:

I (36475) ff_boot: power killed
I (36485) ff_ulp: Setting DOWNLOADBOOT (gpio0) as EXT1 wakeup source
I (36485) ff_ulp: rtc i2c2 gpio to inputs
W (36495) ff_ulp: GPIO13 to ULP control
I (36495) ff_ulp: prepping deep sleep. Mode 11, sleeptime 5m, min ULP cycles: 12
I (36505) ff_boot: Entering deep sleep.  ULP sensor period 300s, nominally 12 sweeps

I (19) boot: ESP-IDF v3.3.2-241-g1d2d93acd 2nd stage bootloader
I (19) boot: compile time 00:32:48
D (19) boot: Enabling RTCWDT(90000 ms)
I (20) boot: Enabling RNG early entropy source...
D (24) boot: magic e9
D (26) boot: segments 04
* A watchdog event:

I (36675) ff_boot: power killed
I (36675) ff_ulp: Setting DOWNLOADBOOT (gpio0) as EXT1 wakeup source
I (36675) ff_ulp: rtc i2c2 gpio to inputs
W (36685) ff_ulp: GPIO13 to ULP control
I (36685) ff_ulp: prepping deep sleep. Mode 11, sleeptime 5m, min ULP cycles: 12
I (36695) ff_boot: Entering deep sleep.  ULP sensor period 300s, nominally 12 sweeps

W (14) boot: PRO CPU has been reset by WDT.
W (14) boot: WDT reset info: PRO CPU PC=0x2721c99f
D (14) boot: WDT reset info: PRO CPU STATUS        0x00000000
D (16) boot: WDT reset info: PRO CPU PID           0x00000002
D (22) boot: WDT reset info: PRO CPU PDEBUGINST    0x0f303000
D (27) boot: WDT reset info: PRO CPU PDEBUGSTATUS  0x00000008
D (33) boot: WDT reset info: PRO CPU PDEBUGDATA    0x7f6940aa
D (38) boot: WDT reset info: PRO CPU PDEBUGPC      0x2721c99f
D (44) boot: WDT reset info: PRO CPU PDEBUGLS0STAT 0x00001024
D (49) boot: WDT reset info: PRO CPU PDEBUGLS0ADDR 0x235d859c
D (55) boot: WDT reset info: PRO CPU PDEBUGLS0DATA 0x2467c04d
W (60) boot: WDT reset info: APP CPU PC=0x7677c9cf
D (65) boot: WDT reset info: APP CPU STATUS        0x00000000
D (70) boot: WDT reset info: APP CPU PID           0x00000003
D (75) boot: WDT reset info: APP CPU PDEBUGINST    0x0e101002
D (81) boot: WDT reset info: APP CPU PDEBUGSTATUS  0x00000026
D (86) boot: WDT reset info: APP CPU PDEBUGDATA    0xaacb95f0
D (92) boot: WDT reset info: APP CPU PDEBUGPC      0x7677c9cf
D (97) boot: WDT reset info: APP CPU PDEBUGLS0STAT 0x00b0000a
D (103) boot: WDT reset info: APP CPU PDEBUGLS0ADDR 0xc1d8b65b
D (108) boot: WDT reset info: APP CPU PDEBUGLS0DATA 0xbe30baca
I (114) boot: ESP-IDF v3.3.2-241-g1d2d93acd 2nd stage bootloader
The WDT resets are also associated with off-module (but on-PCB) sensor readings above 40C. We never see those readings directly, because they are the ones that get lost. When we graph our temperature readings, the trends show that the readings stop above 40C and restart below 40C.

40C is well within the operating temperature of both our sensor chip and the ESP32 module. To prove this, I ran a system without sleeping until it was reporting temperatures of nearly 50C, then let it sleep. It was fully operational, and there were no error messages at the time deep sleep was triggered. When I then forced the Xtensa to wake with an external trigger, the same WDT reset happened.

The WDT reset information above doesn't tell me very much; perhaps someone else will understand it better. All I note is that the PRO CPU PC address is not even in an address range that the ESP32 normally runs code from. The ESP32 datasheet says all accesses below 0x4000:0000 are treated as data access, although I've also seen information in the technical manual suggesting some code can run in the 0x3F00:0000 range.
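
To make the address-range observation concrete, here is a small host-side sketch that classifies the two PC values from the log by bus region. The 0x4000_0000 boundary is the one quoted in the post; the 0x5000_0000 boundary and the "shared" region are taken from memory of the ESP32 technical reference manual's address-mapping overview and should be treated as an assumption. The type and function names are hypothetical.

```c
#include <stdint.h>

/* Coarse ESP32 bus regions (boundaries are assumptions, see the text above). */
typedef enum { BUS_DATA, BUS_INSTRUCTION, BUS_SHARED } esp32_bus_t;

/* Classify a 32-bit address by which bus would service it. */
esp32_bus_t esp32_classify_address(uint32_t addr)
{
    if (addr < 0x40000000u) {
        return BUS_DATA;          /* below 0x4000_0000: data accesses only */
    }
    if (addr < 0x50000000u) {
        return BUS_INSTRUCTION;   /* where code normally executes from */
    }
    return BUS_SHARED;            /* 0x5000_0000 and above */
}
```

Under these assumptions, the PRO CPU PC (0x2721c99f) falls in the data region and the APP CPU PC (0x7677c9cf) falls above the instruction region, so neither looks like a plausible code address.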

This behaviour occurs even with these two settings in sdkconfig:
CONFIG_ESP32_DEEP_SLEEP_WAKEUP_DELAY=5000
CONFIG_BOOTLOADER_WDT_ENABLE=n

Is there a hardware or software bug that could be causing this issue? One suspicion is that the external flash is getting hotter and cannot be accessed even after the 5000 us delay.

Any suggested workaround?

icrowe
Posts: 5
Joined: Fri Feb 26, 2021 11:07 pm

Re: ULP wake failing with WDT reset

Postby icrowe » Wed Jun 02, 2021 5:30 pm

I have since found my own workaround for this unpopular problem.

After reset, I check the return of esp_reset_reason(), and if it is ESP_RST_WDT, I do a second check to see if the RTC memory used by the ULP is coherent. If so, I assume it was a normal wake, and not a watchdog reset.
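
As a rough illustration of the RTC memory check, the sketch below tests whether the words shared with the ULP look plausible. The post does not say what "coherent" means for this firmware, so the struct layout, the magic value, and the sweep bounds are all hypothetical. The one documented detail it relies on is that the ULP only writes the lower 16 bits of each 32-bit RTC word, so the upper half is masked off before comparing.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ULP_MAGIC      0xA5C3u  /* hypothetical marker written by the ULP program */
#define ULP_MAX_SWEEPS 64u      /* hypothetical upper bound on sweeps per sleep */

/* Hypothetical layout of the RTC words shared between the ULP and the main CPU. */
typedef struct {
    uint32_t magic;        /* ULP stores ULP_MAGIC in the low 16 bits */
    uint32_t sweep_count;  /* number of sensor sweeps completed while asleep */
} ulp_shared_t;

/* Returns true if the shared words look like output of a running ULP program. */
bool ulp_data_is_coherent(const ulp_shared_t *s, uint32_t min_sweeps)
{
    if (s == NULL) {
        return false;
    }
    /* Only the low 16 bits of each word carry ULP data. */
    uint32_t magic  = s->magic & UINT16_MAX;
    uint32_t sweeps = s->sweep_count & UINT16_MAX;
    return magic == ULP_MAGIC && sweeps >= min_sweeps && sweeps <= ULP_MAX_SWEEPS;
}
```

With the "min ULP cycles: 12" value from the logs, a call such as `ulp_data_is_coherent(&shared, 12)` would accept a wake only if the ULP reports at least 12 sweeps. On target, the fields would more likely be the `ulp_`-prefixed variables exported by the ULP build rather than a struct.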

Interestingly, the ULP is still running when this happens - although the reset reason is a watchdog reset, it has not been halted as one might expect.

With the ULP as the only wake source, this comes with a risk of bricking the system if the ULP itself has been reset. Further work remains to ensure that the ULP is really running. I'd be very interested in hearing from anybody who knows how to check whether the ULP is still alive, even when it is in the halted state.

icrowe
Posts: 5
Joined: Fri Feb 26, 2021 11:07 pm

Re: ULP wake failing with WDT reset

Postby icrowe » Thu Jun 03, 2021 10:20 pm

I think I have found a way to verify that the ULP is actually going to wake up and run its program. The ULP Coprocessor programming API Guide has this to say near the end of the https://docs.espressif.com/projects/esp ... ogram-flow section:

"The program runs until it encounters a `halt` instruction or an illegal instruction. Once the program halts, ULP coprocessor powers down, and the timer is started again.

To disable the timer (effectively preventing the ULP program from running again), clear the `RTC_CNTL_ULP_CP_SLP_TIMER_EN` bit in the `RTC_CNTL_STATE0_REG` register."

Presumably the opposite applies too: if that bit is set then the timer is running and the ULP will restart when it times out.

A full workaround runtime test for detecting this spurious watchdog reset is:
1. esp_reset_reason() returns ESP_RST_TASK_WDT
2. RTC memory contains values that make sense for a previously running ULP
3. RTC_CNTL_ULP_CP_SLP_TIMER_EN bit is set
Given these three conditions, I can determine that the watchdog reset was caused by some problem during the wake triggered by the ULP and safely continue processing the ULP-gathered data.
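
The three conditions can be combined roughly as follows. This is a sketch, not the poster's actual code: the function names are hypothetical, and the RTC memory check is passed in as a flag because it is application-specific. The thread mentions both ESP_RST_WDT and ESP_RST_TASK_WDT as the observed reset reason, so accepting either one is an assumption made here. The on-target part is guarded by `ESP_PLATFORM` so the decision logic can also be compiled and tested on a host.

```c
#include <stdbool.h>

/* Pure decision logic: all three conditions from the list must hold. */
bool wdt_reset_is_spurious_ulp_wake(bool reset_reason_is_wdt,
                                    bool rtc_data_coherent,
                                    bool ulp_timer_enabled)
{
    return reset_reason_is_wdt && rtc_data_coherent && ulp_timer_enabled;
}

#ifdef ESP_PLATFORM  /* only compiled inside an ESP-IDF build */
#include "esp_system.h"        /* esp_reset_reason(), ESP_RST_* */
#include "soc/soc.h"           /* REG_GET_BIT() */
#include "soc/rtc_cntl_reg.h"  /* RTC_CNTL_STATE0_REG, RTC_CNTL_ULP_CP_SLP_TIMER_EN */

/* rtc_data_coherent is the result of an application-specific RTC memory check. */
bool woke_from_ulp_despite_wdt(bool rtc_data_coherent)
{
    esp_reset_reason_t reason = esp_reset_reason();

    /* Assumption: treat either watchdog reason named in the thread as a match. */
    bool is_wdt = (reason == ESP_RST_WDT) || (reason == ESP_RST_TASK_WDT);

    /* Condition 3: the ULP sleep timer is still enabled, so the ULP will rerun. */
    bool timer_en =
        REG_GET_BIT(RTC_CNTL_STATE0_REG, RTC_CNTL_ULP_CP_SLP_TIMER_EN) != 0;

    return wdt_reset_is_spurious_ulp_wake(is_wdt, rtc_data_coherent, timer_en);
}
#endif
```

Keeping the decision in a plain function of three booleans makes it easy to unit test off-target, while the register and reset-reason reads stay in a thin wrapper.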
