Bluetooth Controller and UART IWDT Timeout - Need Help Debugging
Posted: Tue Apr 18, 2023 8:26 pm
I've got a fairly consistent IWDT timeout that occurs about 25%-50% of the time given the following conditions:
- Setup: ESP32 running ESP-IDF v4.4.1 (tried upgrading to v4.4.4 but it did not solve the issue)
- Bluetooth enabled (NimBLE) and connected to device for WiFi provisioning. Bluetooth controller is pinned to Core 0.
- WiFi enabled and attempting to connect to access point. WiFi task is attached to Core 0.
- Communication through I2C (three devices on core 1, one device on core 0).
- Almost all of the back traces seem to show the esp32 running the btdm_controller_task and timing out while servicing an interrupt while in the function uart_hal_write_txfifo
Here is a typical backtrace (ESP-IDF v4.4.4):
a slightly different Backtrace (ESP-IDF v4.4.1):
Observations:
- The IWDT timeout is always on CPU0
- Increasing the IWDT timeout from 3,000ms to 10,000ms (incrementally) appeared to decrease the frequency of the timeout, but it was still observed at 10,000ms.
- Changing the Bluetooth core affinity to Core 1 seems to get rid of the issue. (I didn't realize there was another setting for NimBLE host core so that is still on Core 0, though the timeout is not occurring anymore nevertheless.)
- Upgrading from ESP-IDF v4.4.1 to v4.4.4 didn't change things.
Questions:
- It seems like based on the backtrace, and how changing Bluetooth to core 1 helps this issue, that a UART ISR and the Bluetooth controller are causing a deadlock condition / starvation? Or perhaps having WiFi, Bluetooth, and some i2c all on core 0 is leading to a deadlock condition? Anyone have insight into how to confirm these theories?
- The reason why we had WiFi and Bluetooth pinned to the same core is that we have a task running on other core (core 1) which is running a PID controller and we wanted to avoid starving that task and destabilizing our PID controllers. Is it possible (or advisable) to raise the priority of this controller task above the BLE task to ensure it has priority?
- Any tips for profiling the scheduler and interrupts in general?
Any advice is greatly appreciated...
- Setup: ESP32 running ESP-IDF v4.4.1 (tried upgrading to v4.4.4 but it did not solve the issue)
- Bluetooth enabled (NimBLE) and connected to device for WiFi provisioning. Bluetooth controller is pinned to Core 0.
- WiFi enabled and attempting to connect to access point. WiFi task is attached to Core 0.
- Communication through I2C (three devices on core 1, one device on core 0).
- Almost all of the back traces seem to show the esp32 running the btdm_controller_task and timing out while servicing an interrupt while in the function uart_hal_write_txfifo
Here is a typical backtrace (ESP-IDF v4.4.4):
Code: Select all
Guru Meditation Error: Core 0 panic'ed (Interrupt wdt timeout on CPU0).
Core 0 register dump:
PC : 0x400831a1 PS : 0x00050035 A0 : 0x40083891 A1 : 0x3ffbe940
0x400831a1: _xt_lowint1 at C:/Espressif/frameworks/esp-idf-v4.4.4/components/freertos/port/xtensa/xtensa_vectors.S:1114
0x40083891: queue_isr_handler at C:/Espressif/frameworks/esp-idf-v4.4.4/components/bt/controller/esp32/hli_api.c:170
A2 : 0x00020149 A3 : 0x00038149 A4 : 0x000637ff A5 : 0x3ffbe900
A6 : 0x3ffcf910 A7 : 0x3ffbf378 A8 : 0x00000001 A9 : 0x00000002
A10 : 0x3ffbf35c A11 : 0x00000000 A12 : 0x00000001 A13 : 0x3ffbe8e0
A14 : 0x3ffbf378 A15 : 0x3ffbe914 SAR : 0x00000017 EXCCAUSE: 0x00000005
EXCVADDR: 0x00000000 LBEG : 0x4000c2e0 LEND : 0x4000c2f6 LCOUNT : 0xffffffff
Core 0 was running in ISR context:
EPC1 : 0x400d3a5f EPC2 : 0x00000000 EPC3 : 0x400831bc EPC4 : 0x400831bc
0x400d3a5f: uart_hal_write_txfifo at C:/Espressif/frameworks/esp-idf-v4.4.4/components/hal/uart_hal_iram.c:35
0x400831bc: _xt_lowint1 at C:/Espressif/frameworks/esp-idf-v4.4.4/components/freertos/port/xtensa/xtensa_vectors.S:1114
0x400831bc: _xt_lowint1 at C:/Espressif/frameworks/esp-idf-v4.4.4/components/freertos/port/xtensa/xtensa_vectors.S:1114
Backtrace: 0x4008319e:0x3ffbe940 0x4008388e:0x3ffd9910 0x400f5842:0x3ffd9960 0x40019d11:0x3ffd99a0 0x40055b4d:0x3ffd99c0 0x400ebb2f:0x3ffd99e0 0x400ec169:0x3ffd9a00
0x4008319e: _xt_lowint1 at C:/Espressif/frameworks/esp-idf-v4.4.4/components/freertos/port/xtensa/xtensa_vectors.S:1114
0x4008388e: queue_isr_handler at C:/Espressif/frameworks/esp-idf-v4.4.4/components/bt/controller/esp32/hli_api.c:168
0x400f5842: r_lld_evt_deffered_elt_handler at ??:?
0x400ebb2f: r_rw_schedule at ??:?
0x400ec169: btdm_controller_task at ??:?
Core 1 register dump:
PC : 0x40192716 PS : 0x00060c35 A0 : 0x800d35ac A1 : 0x3ffbce30
0x40192716: esp_pm_impl_waiti at C:/Espressif/frameworks/esp-idf-v4.4.4/components/esp_pm/pm_impl.c:839
A2 : 0x00000000 A3 : 0x00000001 A4 : 0x8009672d A5 : 0x3ffbcd60
A6 : 0x00000003 A7 : 0x00060023 A8 : 0x80110d82 A9 : 0x3ffbce10
A10 : 0x00000000 A11 : 0x00000001 A12 : 0x1a4edd92 A13 : 0x3ff5f078
A14 : 0x3ff5f07c A15 : 0x00060023 SAR : 0x00000000 EXCCAUSE: 0x00000005
EXCVADDR: 0x00000000 LBEG : 0x00000000 LEND : 0x00000000 LCOUNT : 0x00000000
Backtrace: 0x40192713:0x3ffbce30 0x400d35a9:0x3ffbce50 0x40093ffb:0x3ffbce70
0x40192713: cpu_ll_waiti at C:/Espressif/frameworks/esp-idf-v4.4.4/components/hal/esp32/include/hal/cpu_ll.h:183
(inlined by) esp_pm_impl_waiti at C:/Espressif/frameworks/esp-idf-v4.4.4/components/esp_pm/pm_impl.c:837
0x400d35a9: esp_vApplicationIdleHook at C:/Espressif/frameworks/esp-idf-v4.4.4/components/esp_system/freertos_hooks.c:63
0x40093ffb: prvIdleTask at C:/Espressif/frameworks/esp-idf-v4.4.4/components/freertos/tasks.c:3987
Code: Select all
Guru Meditation Error: Core 0 panic'ed (Interrupt wdt timeout on CPU0).
Core 0 register dump:
PC : 0x4008ae27 PS : 0x00050c35 A0 : 0x40083064 A1 : 0x3ffbe900
0x4008ae27: i2c_isr_handler_default at C:/Espressif/frameworks/esp-idf-v4.4.1/components/driver/i2c.c:479
0x40083064: _xt_lowint1 at C:/Espressif/frameworks/esp-idf-v4.4.1/components/freertos/port/xtensa/xtensa_vectors.S:1111
A2 : 0x3ffd0758 A3 : 0x00000000 A4 : 0x00000002 A5 : 0x4009254c
0x4009254c: _frxt_int_enter at C:/Espressif/frameworks/esp-idf-v4.4.1/components/freertos/port/xtensa/portasm.S:119
A6 : 0x400f5580 A7 : 0x00000000 A8 : 0x00000000 A9 : 0x3ffbe8d0
0x400f5580: r_lld_evt_deffered_elt_handler at ??:?
A10 : 0x00000001 A11 : 0x3ffbe900 A12 : 0x3ffbe908 A13 : 0x00000000
A14 : 0x00000001 A15 : 0x3ffd087c SAR : 0x00000017 EXCCAUSE: 0x00000005
EXCVADDR: 0x00000000 LBEG : 0x4000c2e0 LEND : 0x4000c2f6 LCOUNT : 0xffffffff
Core 0 was running in ISR context:
EPC1 : 0x400d3b0f EPC2 : 0x00000000 EPC3 : 0x4000bff0 EPC4 : 0x400960d5
0x400d3b0f: uart_hal_write_txfifo at C:/Espressif/frameworks/esp-idf-v4.4.1/components/hal/uart_hal_iram.c:35
0x400960d5: compare_and_set_native at C:/Espressif/frameworks/esp-idf-v4.4.1/components/esp_hw_support/include/soc/compare_set.h:25
(inlined by) spinlock_acquire at C:/Espressif/frameworks/esp-idf-v4.4.1/components/esp_hw_support/include/soc/spinlock.h:103
(inlined by) xPortEnterCriticalTimeout at C:/Espressif/frameworks/esp-idf-v4.4.1/components/freertos/port/xtensa/port.c:288
Backtrace:0x4008ae24:0x3ffbe9000x40083061:0x3ffbe930 0x4008afce:0x3ffb5e90 0x40083611:0x3ffb5ea0 0x400839fb:0x3ffb5ec0 0x4008a962:0x3ffb5ee0 0x400f58d5:0x3ffb5f00 0x400f8401:0x3ffb5f20 0x400f56c6:0x3ffb5f70 0x40019d11:0x3ffb5fb0 0x40055b4d:0x3ffb5fd0 0x400eba77:0x3ffb5ff0 0x400ec0c4:0x3ffb6010
0x4008ae24: i2c_isr_handler_default at C:/Espressif/frameworks/esp-idf-v4.4.1/components/driver/i2c.c:476
0x40083061: _xt_lowint1 at C:/Espressif/frameworks/esp-idf-v4.4.1/components/freertos/port/xtensa/xtensa_vectors.S:1111
0x4008afce: i2c_isr_handler_default at C:/Espressif/frameworks/esp-idf-v4.4.1/components/driver/i2c.c:531
0x40083611: hli_intr_restore at C:/Espressif/frameworks/esp-idf-v4.4.1/components/bt/controller/esp32/hli_api.c:117
0x400839fb: interrupt_hlevel_restore at C:/Espressif/frameworks/esp-idf-v4.4.1/components/bt/controller/esp32/bt.c:590
0x4008a962: r_global_int_restore at ??:?
0x400f58d5: coex_ble_lld_last_active_evt_counter_set at ??:?
0x400f8401: r_lld_pdu_check at ??:?
0x400f56c6: r_lld_evt_deffered_elt_handler at ??:?
0x400eba77: r_rw_schedule at ??:?
0x400ec0c4: btdm_controller_task at ??:?
Core 1 register dump:
PC : 0x400961ca PS : 0x00060f35 A0 : 0x80093469 A1 : 0x3ffd34f0
0x400961ca: spinlock_release at C:/Espressif/frameworks/esp-idf-v4.4.1/components/esp_hw_support/include/soc/spinlock.h:159
(inlined by) vPortExitCritical at C:/Espressif/frameworks/esp-idf-v4.4.1/components/freertos/port/xtensa/port.c:310
A2 : 0x3ffd087c A3 : 0x3ffd3560 A4 : 0x8009621d A5 : 0x3ffd3440
A6 : 0x00000003 A7 : 0x00060023 A8 : 0x00000000 A9 : 0x0000abab
A10 : 0x00060f23 A11 : 0x3ffd0888 A12 : 0x00000004 A13 : 0x3ffd3564
A14 : 0x00000000 A15 : 0x00000000 SAR : 0x00000019 EXCCAUSE: 0x00000005
EXCVADDR: 0x00000000 LBEG : 0x4000c2e0 LEND : 0x4000c2f6 LCOUNT : 0xffffffff
Backtrace:0x400961c7:0x3ffd34f00x40093466:0x3ffd3510 0x4010555b:0x3ffd3550 0x400d92a5:0x3ffd3590 0x400d8c8d:0x3ffd35d0 0x40102f35:0x3ffd35f0 0x401028ab:0x3ffd3610 0x400da9b5:0x3ffd3640
0x400961c7: spinlock_release at C:/Espressif/frameworks/esp-idf-v4.4.1/components/esp_hw_support/include/soc/spinlock.h:159
(inlined by) vPortExitCritical at C:/Espressif/frameworks/esp-idf-v4.4.1/components/freertos/port/xtensa/port.c:310
0x40093466: xQueueReceive at C:/Espressif/frameworks/esp-idf-v4.4.1/components/freertos/queue.c:1432
0x4010555b: i2c_master_cmd_begin at C:/Espressif/frameworks/esp-idf-v4.4.1/components/driver/i2c.c:1466
0x400d92a5: i2c_common_read_mprl_pressure_sensor at C:/Project/components/board/i2c_common.c:149
0x400d8c8d: i2c_3_read_pressure at C:/Project/components/board/MLB_3_X/i2c_3.c:91
0x40102f35: sensor_interface_get_pressure_adc at C:/Project/components/board/MLB_3_X/sensor_implementation_3.c:134
0x401028ab: read_pressure at C:/Project/components/sensors_actuators/sensors.c:170
0x400da9b5: control_task_update_sensors at C:/Project/components/control_task/control_task.c:414
(inlined by) control_task at C:/Project/components/control_task/control_task.c:85
- The IWDT timeout is always on CPU0
- Increasing the IWDT timeout from 3,000ms to 10,000ms (incrementally) appeared to decrease the frequency of the timeout, but it was still observed at 10,000ms.
- Changing the Bluetooth core affinity to Core 1 seems to get rid of the issue. (I didn't realize there was another setting for NimBLE host core so that is still on Core 0, though the timeout is not occurring anymore nevertheless.)
- Upgrading from ESP-IDF v4.4.1 to v4.4.4 didn't change things.
Questions:
- It seems like based on the backtrace, and how changing Bluetooth to core 1 helps this issue, that a UART ISR and the Bluetooth controller are causing a deadlock condition / starvation? Or perhaps having WiFi, Bluetooth, and some i2c all on core 0 is leading to a deadlock condition? Anyone have insight into how to confirm these theories?
- The reason why we had WiFi and Bluetooth pinned to the same core is that we have a task running on other core (core 1) which is running a PID controller and we wanted to avoid starving that task and destabilizing our PID controllers. Is it possible (or advisable) to raise the priority of this controller task above the BLE task to ensure it has priority?
- Any tips for profiling the scheduler and interrupts in general?
Any advice is greatly appreciated...