ESP32 as HCI controller for BlueZ: MTU change causes missing indications!

Posted: Fri Oct 04, 2024 2:32 pm
by danergo
I have a fairly complex scenario and I'm hoping some common knowledge can help solve it:

Components:
  • BLE client device (CL)
  • ESP-WROOM-32 (ESP32): details below
  • Linux computer: details below
ESP-WROOM-32 (ESP32)
  • Used as a Bluetooth co-processor: it only provides Bluetooth functionality to the main computer
  • Firmware is configured and built with esp-idf, using an (almost) latest version (ESP-IDF v5.4-dev-3201-g46acfdce96)
  • Firmware is controller_hci_uart_esp32, accessible from esp-idf's examples
  • Connected to the Linux computer over a full-featured UART (RX, TX, CTS, RTS) at 921600 baud
Computer
  • OS: Debian bookworm (Debian GNU/Linux 12)
  • Kernel: 6.1.21-v8+
  • BlueZ: 5.66
The ESP32 runs the "controller_hci_uart" example firmware and is attached to the computer with this command:

Code:

/usr/bin/btattach -S 921600 -B /dev/ttyS0
This gives us an HCI device:

Code:

# hciconfig 
hci0:   Type: Primary  Bus: UART
        BD Address: AA:BB:CC:DD:EE:FF  ACL MTU: 1021:9  SCO MTU: 255:4
        UP RUNNING 
        RX bytes:7652480 acl:647 sco:0 events:213150 errors:0
        TX bytes:44545 acl:788 sco:0 commands:2064 errors:0
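
A note on reading the "ACL MTU: 1021:9" field above: as far as I understand, hciconfig prints it as <max ACL data length>:<number of buffers>, which are the values the controller reports for ACL packets going from host to controller. A minimal Python sketch for pulling those numbers out of the output (the helper name parse_mtu is made up for illustration):

```python
import re

# Second line of the hciconfig output shown above.
HCICONFIG_LINE = "BD Address: AA:BB:CC:DD:EE:FF  ACL MTU: 1021:9  SCO MTU: 255:4"


def parse_mtu(line: str, kind: str) -> tuple[int, int]:
    """Return (max_data_length, buffer_count) for kind 'ACL' or 'SCO'.

    parse_mtu is a hypothetical helper, not part of BlueZ.
    """
    match = re.search(rf"{kind} MTU: (\d+):(\d+)", line)
    if match is None:
        raise ValueError(f"no {kind} MTU field in line: {line!r}")
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    acl_len, acl_count = parse_mtu(HCICONFIG_LINE, "ACL")
    print(f"ACL: {acl_count} buffers of up to {acl_len} bytes each")
```

Since these buffers describe the host-to-controller direction, they don't by themselves say how the ESP32 handles long packets on the way back to the computer.
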
Test scenario: listen to HCI events continuously (similar to hcitool lescan) and occasionally connect to the BLE client (CL) to send a few GATT write requests (similar to gatttool).

Before continuing, here are some details of the BLE client device (CL):
  • After successful connection, it wants to increase the MTU (from 23 to 247)
  • It can work with the base MTU (23) too, but then it will split its messages up
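
To give a feel for what the MTU change means for CL's replies, here is a minimal Python sketch. It assumes each indication carries at most ATT_MTU - 3 bytes of value (1-byte opcode plus 2-byte attribute handle in the ATT PDU). The 100-byte message length is a made-up example, not something measured on CL, and the function names are for illustration only.

```python
import math

# ATT Handle Value Indication: 1-byte opcode + 2-byte attribute handle.
ATT_INDICATION_OVERHEAD = 3


def max_value_len(att_mtu: int) -> int:
    """Largest attribute value that fits in one indication."""
    return att_mtu - ATT_INDICATION_OVERHEAD


def indications_needed(message_len: int, att_mtu: int) -> int:
    """How many indications a message of message_len bytes is split into."""
    return math.ceil(message_len / max_value_len(att_mtu))


if __name__ == "__main__":
    example_len = 100  # hypothetical message size
    for mtu in (23, 247):
        print(
            f"MTU {mtu}: up to {max_value_len(mtu)} bytes per indication, "
            f"{indications_needed(example_len, mtu)} indication(s) "
            f"for {example_len} bytes"
        )
```

So with the default MTU every reply arrives as several short packets, while with MTU 247 the same reply can arrive as one long packet.
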
So, after running the test scenario for about 10-12 hours, the functionality starts breaking: we can't connect to CL anymore. Scanning still works.

I investigated the problem a bit further. To make it easier to follow, let's define two phases:
  • Phase1 (testing): constant scanning, and occasional successful GATT connection
  • Phase2 (testing failed): constant scanning works, but GATT connection fails
How does it fail? CL doesn't reply to our request.

To be more exact: we don't receive ANY replies (indications) above a certain length (assumed to be 23 bytes), so we consider CL to have timed out.

BUT! I investigated this even further: if we don't accept the MTU raise request (so the MTU stays at the default 23), CL answers! (Or, to be fair: we CAN receive CL's reply.)

So the problem is that the configuration works perfectly at first, but after a number of hours raising the MTU starts causing problems. I can't prove it, but my feeling is that the ESP32 still receives CL's response and just doesn't forward it to us over HCI.

Important! If I reset the ESP32 (and only that; all other components stay online), we immediately jump back to Phase1, and the MTU raise won't cause problems again for several hours.

I hate this kind of error, but I'm really curious whether anyone has advice here. What should we do?

Re: ESP32 as HCI controller for BlueZ: MTU change causes missing indications!

Posted: Fri Oct 04, 2024 5:19 pm
by danergo
One addition: I hooked up an oscilloscope to RX (from the computer's point of view), so I can see packets arriving at the computer.

Now, when I don't accept the MTU change request (so it stays at 23), I can see traces on the scope.

When we accept the client's request to raise the MTU to 247, the scope stays silent, meaning the ESP32 is NOT PUSHING OUT the longer messages anymore.
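
For anyone who wants to compare against their own scope traces, here is a rough Python sketch of how long a full-size packet should occupy the RX line. It rests on several assumptions that I have not verified on this setup: the ATT PDU arrives as a single unfragmented HCI ACL packet, framing is H4 (1-byte packet indicator) plus a 4-byte HCI ACL header and a 4-byte L2CAP header, and the UART runs 8N1 (10 bits on the wire per byte). The function names are made up for illustration.

```python
H4_INDICATOR_LEN = 1  # H4 packet type byte (0x02 for ACL data)
HCI_ACL_HEADER_LEN = 4  # connection handle + flags (2), data length (2)
L2CAP_HEADER_LEN = 4  # PDU length (2), channel ID (2)


def h4_acl_frame_len(att_pdu_len: int) -> int:
    """Bytes on the UART for one unfragmented ACL packet carrying an ATT PDU."""
    return H4_INDICATOR_LEN + HCI_ACL_HEADER_LEN + L2CAP_HEADER_LEN + att_pdu_len


def wire_time_us(n_bytes: int, baud: int = 921600, bits_per_byte: int = 10) -> float:
    """Time in microseconds to send n_bytes, assuming 8N1 and no gaps."""
    return n_bytes * bits_per_byte / baud * 1e6


if __name__ == "__main__":
    for att_mtu in (23, 247):
        frame = h4_acl_frame_len(att_mtu)  # ATT PDU filling the whole MTU
        print(f"ATT MTU {att_mtu}: {frame} bytes, ~{wire_time_us(frame):.0f} us on the wire")
```

Under those assumptions a full 23-byte PDU is a burst of a few hundred microseconds, while a full 247-byte PDU would be a burst of a few milliseconds, so the long packets should be easy to spot on the scope if they were being sent at all.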

NO log messages are printed on the ESP32's debug serial console, and no watchdog is triggering.

I'd like to debug this further, but if I restart, I'll have to wait another 10-12 hours. It would be EXTREMELY helpful if someone could help soon.