IPC Task Lock-Up and Watchdog Triggering with SPI Flash Operations

birdistheword96
Posts: 5
Joined: Mon Apr 01, 2024 10:34 am

Postby birdistheword96 » Mon Apr 01, 2024 10:41 am

Hi all,
I'm facing a challenging issue with an ESP32 project (using ESP-IDF) in which Inter-Processor Call (IPC) tasks sometimes lock up and subsequently trigger the watchdog. The problem seems to occur within spi_flash_op_block_func(), specifically in a while loop that waits for flash operations to complete. Here's the relevant code snippet:

Code:

    while (!s_flash_op_complete) {
        // busy loop here and wait for the other CPU to finish flash operation
    }

This suggests that the s_flash_op_complete flag is sometimes not being set by the other core. I have checked the ESP_INTR_FLAG_IRAM constraints: the IRAM interrupt handlers do not access any flash content and only modify some static variables.
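One way to tell a flag that is never set from one that is merely set late is to bound the spin and report a timeout. The following is a minimal host-side sketch in portable C11, not the ESP-IDF implementation: flash_op_complete_demo, reset_flash_op_flag(), signal_flash_op_complete() and wait_flash_op_complete_bounded() are hypothetical names made up for illustration, and the iteration limit is an arbitrary assumption.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the s_flash_op_complete flag (assumption:
 * the real flag is private to ESP-IDF and is not touched here). */
static atomic_bool flash_op_complete_demo;

/* Clear the flag before a new "flash operation" starts. */
void reset_flash_op_flag(void)
{
    atomic_store_explicit(&flash_op_complete_demo, false, memory_order_release);
}

/* What the core performing the flash operation would do when it is done. */
void signal_flash_op_complete(void)
{
    atomic_store_explicit(&flash_op_complete_demo, true, memory_order_release);
}

/* Spin until the flag is set or max_spins iterations have elapsed.
 * Returns true if the flag was observed set, false on timeout. */
bool wait_flash_op_complete_bounded(uint32_t max_spins)
{
    assert(max_spins > 0);
    for (uint32_t i = 0; i < max_spins; ++i) {
        if (atomic_load_explicit(&flash_op_complete_demo, memory_order_acquire)) {
            return true;
        }
    }
    return false;
}
```

A false return here would correspond to the situation described above, where the waiting core never sees the flag set.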
The issue shows up more readily with frequent NVS writes; the NVS partition resides on the same onboard flash. Temporarily disabling NVS writes seems to stop the problem, which points to a conflict or resource contention related to flash access.
For context, all NVS writes are protected by a mutex to prevent concurrent write attempts that could block the SPI bus. Despite this, the lock-up still occurs, and since the NVS is needed for this project, there seems no obvious solution until I can work out what is causing this.
Has anyone encountered a similar issue or have insights into what might be causing this and how to resolve it? Any advice on troubleshooting or fixing this issue would be greatly appreciated. Thanks in advance for your help!

For context, this is on an ESP32-S3.

ESP_Sprite
Posts: 9730
Joined: Thu Nov 26, 2015 4:08 am

Re: IPC Task Lock-Up and Watchdog Triggering with SPI Flash Operations

Postby ESP_Sprite » Tue Apr 02, 2024 1:45 am

What hardware is this on? I'm thinking that for some reason you may have very slow flash. Does it help to increase the watchdog timeouts?

birdistheword96
Posts: 5
Joined: Mon Apr 01, 2024 10:34 am

Re: IPC Task Lock-Up and Watchdog Triggering with SPI Flash Operations

Postby birdistheword96 » Tue Apr 02, 2024 10:02 am

This is on an ESP32-S3. The watchdog is set to 5 seconds, but it gets stuck in a loop once it kicks in, and then the other core is always in the IDLE loop (since the task scheduler is disabled during flash operations). This is all using the internal flash.

ESP_Sprite
Posts: 9730
Joined: Thu Nov 26, 2015 4:08 am

Re: IPC Task Lock-Up and Watchdog Triggering with SPI Flash Operations

Postby ESP_Sprite » Wed Apr 03, 2024 2:30 am

That is odd... any chance you could reduce your code to a minimal working example of this behaviour and post it here or send it to me (jeroen at espressif dot com)? Fwiw, this is not something I've seen before, so my first thought is that your program may be doing something weird like corrupting some memory somewhere.

birdistheword96
Posts: 5
Joined: Mon Apr 01, 2024 10:34 am

Re: IPC Task Lock-Up and Watchdog Triggering with SPI Flash Operations

Postby birdistheword96 » Thu Apr 04, 2024 8:58 am

Unfortunately, it's not easy to create a minimal working example, because the issue seems to occur only when the processor has lots of work to do. After some further debugging, we have found a little more information, which I have described below:

----------------------------------------------------------------------------------------------------
Issue Description
----------------------------------------------------------------------------------------------------
We have an issue where the ipc1 task locks up when running spi_flash_op_block_func(), causing the
watchdog to report IDLE1. It happens when using the NVS API and is because the s_flash_op_complete
flag never gets set in spi_flash_enable_interrupts_caches_and_other_cpu().

Typically this is an infrequent issue but we can make it occur more quickly by adding NVS writes
at higher rates in the main task.

It only seems to occur when the system is heavily loaded, but we do not know whether this is because
of total processor loading, the number of context switches or something else. We have tried to
replicate the issue in a simpler application but have not yet managed to do so.

The issue occurs when our application is simultaneously:
- Receiving and processing 25 Hz input from a GNSS engine.
- Reading from two I2C devices at 10 Hz (accelerometer and GPIO expander).
- Updating a display at 25 Hz.
- Logging 25 Hz data to an SD card.

----------------------------------------------------------------------------------------------------
Setup Description
----------------------------------------------------------------------------------------------------
We are running ESP-IDF v5.1.1 with the following minor changes:
- Ported esp_intr_dump().
- Enabled FF_USE_FIND in the fatfs component.
- Added custom code to second stage bootloader.
- Added fatfs component support for SD card with invalid BIOS parameter block file system name.

We are using an ESP32-S3 on custom PCB and running the following tasks:
Core 0 Core 1 tskNO_AFFINITY
------ ------ ------
IDLE0 IDLE1 display_task
ipc0 ipc1 keys_task
main log_task serial_rx_task
esp_timer can_task gnss_rx_task
sys_evt data_processing_task

----------------------------------------------------------------------------------------------------
Debug process so far
----------------------------------------------------------------------------------------------------
To debug the issue we have added call counters which get printed in esp_task_wdt_isr_user_handler()
when the watchdog occurs. These suggest that when the issue occurs:


The call to esp_ipc_call() in spi_flash_disable_interrupts_caches_and_other_cpu() when called from
core 0 does not return:

Code:

    ESP_ERROR_CHECK(esp_ipc_call(other_cpuid, &spi_flash_op_block_func, (void *) other_cpuid));

The call to xTaskNotify() in esp_ipc_call_and_wait() when called from core 1 does not return:

Code:

    xTaskNotify(s_ipc_task_handle[cpu_id], wait_for, eSetValueWithOverwrite);

The call to vPortExitCritical() in xTaskGenericNotify() does not return:

Code:

    taskEXIT_CRITICAL( &xKernelLock );
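
For anyone wanting to reproduce the call-counter approach, a rough sketch follows. It is not our actual instrumentation: the probe IDs, probe_enter(), probe_return(), probe_in_flight() and probe_dump() are hypothetical names, and the code is plain C11 so it can be tried on a host. The idea is to bump one counter just before a call and another just after it, so a call that never returns shows up as a non-zero difference.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical probe IDs, one per instrumented call site. */
enum probe_id {
    PROBE_ESP_IPC_CALL,
    PROBE_XTASKNOTIFY,
    PROBE_EXIT_CRITICAL,
    PROBE_COUNT
};

static const char *const probe_names[PROBE_COUNT] = {
    "esp_ipc_call", "xTaskNotify", "taskEXIT_CRITICAL"
};

/* Static storage, so both arrays start at zero. */
static atomic_uint probe_entered[PROBE_COUNT];
static atomic_uint probe_returned[PROBE_COUNT];

/* Call immediately before the instrumented call. */
void probe_enter(enum probe_id id)
{
    assert(id < PROBE_COUNT);
    atomic_fetch_add(&probe_entered[id], 1);
}

/* Call immediately after the instrumented call returns. */
void probe_return(enum probe_id id)
{
    assert(id < PROBE_COUNT);
    atomic_fetch_add(&probe_returned[id], 1);
}

/* Number of calls that were entered but have not returned yet. */
unsigned probe_in_flight(enum probe_id id)
{
    assert(id < PROBE_COUNT);
    return atomic_load(&probe_entered[id]) - atomic_load(&probe_returned[id]);
}

/* Print every probe with outstanding calls; returns how many were found. */
int probe_dump(void)
{
    int stuck = 0;
    for (int i = 0; i < PROBE_COUNT; ++i) {
        unsigned n = probe_in_flight((enum probe_id)i);
        if (n != 0) {
            printf("%s: %u call(s) entered but not returned\n", probe_names[i], n);
            ++stuck;
        }
    }
    return stuck;
}
```

On the target, the dump would be called from esp_task_wdt_isr_user_handler(). Because that runs in interrupt context, a ROM print routine such as esp_rom_printf() is probably a better fit there than printf().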

birdistheword96
Posts: 5
Joined: Mon Apr 01, 2024 10:34 am

Re: IPC Task Lock-Up and Watchdog Triggering with SPI Flash Operations

Postby birdistheword96 » Thu Apr 04, 2024 1:59 pm

It's also worth noting that all the tasks on core 0 appear to be running OK. Only one core locks up.

birdistheword96
Posts: 5
Joined: Mon Apr 01, 2024 10:34 am

Re: IPC Task Lock-Up and Watchdog Triggering with SPI Flash Operations

Postby birdistheword96 » Sat Apr 06, 2024 4:26 pm

If we change the configuration to run on a single core, the problem disappears. It seems to be an issue with cross-core synchronisation.
