I have an ESP32 that reads data from multiple sensors (via I2C and SPI), converts it to JSON, and sends it to an MQTT broker over Ethernet (external LAN8720 chip, 100 Mb/s) once per second. Everything worked fine for some time (measurements, conversion, MQTT pub/sub, data sending) until I tried running it on a network where the router is a TP-Link. Before that, I had tested on 5 different non-TP-Link routers and everything was fine. I tested with both a cloud-based MQTT broker and a local MQTT broker on the same network.
On the TP-Link network I get a lot of TCP retransmissions, and esp_mqtt_client_publish() basically hangs until the 10 s timeout (the default). Strangely, I also tried a "good" router with TP-Link PLC adapters between the router and my device's Ethernet cable (plugged in close to each other in a power strip), and I saw a lot of issues and retransmissions there too. Then I tried a small TP-Link pocket router that I had in a drawer and ... same issue.
While reading about ESP32 and MQTT, I saw some people complaining about issues connecting their ESP32-based IoT devices to HA or external services via MQTT when using the Wi-Fi network of TP-Link routers (mostly the Archer AX series). Since they were sending small JSON packets, the data eventually went through, but people reported delays of 5-30 seconds. With other (non-TP-Link) routers everything worked fine. I am sending larger JSONs, so they get split across multiple TCP segments. Since my application is quasi real-time, I can't afford 5 s delays, not to mention that I have limited space for buffering.
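To make the segmentation point concrete, here is a rough sketch of how many TCP segments a payload needs. It assumes an MSS of 1440 bytes, which I believe is the lwIP default in ESP-IDF (CONFIG_LWIP_TCP_MSS), so check your own sdkconfig. The helper name tcp_segments_needed is made up for illustration, and the calculation ignores the few bytes of MQTT header that are added to each publish.

```c
#include <stddef.h>

/* Assumption: 1440 bytes is the TCP MSS configured in lwIP
 * (CONFIG_LWIP_TCP_MSS); adjust this to match your sdkconfig. */
#define ASSUMED_TCP_MSS 1440u

/* Hypothetical helper: number of TCP segments needed to carry
 * payload_len bytes when each segment holds at most mss bytes.
 * This is a ceiling division; it returns 0 for an empty payload
 * or for an invalid MSS of 0. */
static size_t tcp_segments_needed(size_t payload_len, size_t mss)
{
    if (mss == 0) {
        return 0;
    }
    return (payload_len + mss - 1) / mss;
}
```

For example, tcp_segments_needed(35, ASSUMED_TCP_MSS) gives 1 for a small attribute message, while a hypothetical 5000-byte measurement JSON gives 4. Every one of those segments has to be acknowledged before the publish can complete, so a single lost or delayed segment stalls the whole message.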
I have tried different Ethernet cable lengths (0.5 m ... 20 m). All of them show the issue in the TP-Link scenario, and none of them show it on other routers. Strangely, when I connect my device to an RPi that shares its Wi-Fi connectivity (from the TP-Link) over Ethernet (I am not sure whether that was bridged or just routed), the issue is gone.
I have tried recompiling with ESP-IDF v5.0, v5.1 and v5.2-beta -> no difference.
Here is a Wireshark capture from the ASUS communication:
Here is a Wireshark capture from the TP-Link communication:
MQTT is non-secure, on port 1883. As mentioned, the issue occurs both when the broker is cloud-based and when it is on the local network.
Right after reboot I can send and receive small JSONs (i.e. pub/sub, getting single configuration attributes), but after that it gets messy.
Here is a sample log from a TP-Link scenario:
I (2337) ETHERNET: Ethernet Started
I (2340) ETHERNET: Ethernet Link Up
I (2340) ETHERNET: Ethernet HW Addr 24:dc:c3:2a:2c:7f
I (3840) esp_netif_handlers: eth ip: 192.168.1.112, mask: 255.255.255.0, gw: 192.168.1.254
I (3841) ETHERNET: Ethernet Got IP Address
I (3843) ETHERNET: ~~~~~~~~~~~
I (3847) ETHERNET: ETHIP:192.168.1.112
I (3851) ETHERNET: ETHMASK:255.255.255.0
I (3856) ETHERNET: ETHGW:192.168.1.254
I (3860) ETHERNET: ~~~~~~~~~~~
I (3866) SOCKET: Socket created
I (3870) SOCKET: Socket bound, port 3333
I (4878) MQTT: Other event id:7
I (4986) MQTT: MQTT_EVENT_CONNECTED
I (4988) MQTT: sent subscribe successful, msg_id=61612
I (4991) MQTT: sent subscribe successful, res_id=56439
I (5027) MQTT: MQTT_EVENT_SUBSCRIBED, msg_id=61612
I (5067) MQTT: MQTT_EVENT_SUBSCRIBED, msg_id=56439
I (6000) JSON DATA: Attr: {"sys_gain":53936,"sys_phase":2.92}
I (70563) MQTT: MQTT_EVENT_DATA
I (70565) MQTT: Start measurement cmd received
W (83170) transport_base: Poll timeout or error, errno=Success, fd=55, timeout_ms=10000
E (83170) mqtt_client: Writing didn't complete in specified timeout: errno=0
I (83177) MQTT: MQTT_EVENT_DISCONNECTED
W (83180) mqtt_client: Publish: Losing qos0 data when client not connected
Any ideas what could be wrong? I am using the default MQTT client config. Is there a way to adjust this config to work around the issue? It is very strange that it happens only on TP-Link devices. I am not saying that it won't happen with other brands, but I have tested on ASUS, UPC (the router from my ISP), Fritzbox and some CN router as well. What else could I test to share more info?
Thanks.