ESP32 mqtt client ws_read() failure

PeterR
Posts: 621
Joined: Mon Jun 04, 2018 2:47 pm

ESP32 mqtt client ws_read() failure

Postby PeterR » Fri Sep 25, 2020 2:57 pm

ws_read_payload() is failing on:

Code: Select all

if (bytes_to_read != 0 && (rlen = esp_transport_read(ws->parent, buffer, bytes_to_read, timeout_ms)) <= 0)
Adding some diagnostics:

Code: Select all

static int ws_read_payload(esp_transport_handle_t t, char *buffer, int len, int timeout_ms)
{
    transport_ws_t *ws = esp_transport_get_context_data(t);

    int bytes_to_read;
    int rlen = 0;

    ESP_LOGI(TAG, "%p: ws_read_payload: len: %d, bytes_remaining: %d, timeout: %d", t, len, ws->frame_state.bytes_remaining, timeout_ms);
    if (ws->frame_state.bytes_remaining > len) {
        ESP_LOGD(TAG, "Actual data to receive (%d) are longer than ws buffer (%d)", ws->frame_state.bytes_remaining, len);
        bytes_to_read = len;

    } else {
        bytes_to_read = ws->frame_state.bytes_remaining;
    }

    // Receive and process payload
    if (bytes_to_read != 0 && (rlen = esp_transport_read(ws->parent, buffer, bytes_to_read, timeout_ms)) <= 0) {
        ESP_LOGE(TAG, "(1) Error read data: bytes:%d, rlen %d", bytes_to_read, rlen);
        ESP_LOGI(TAG, "ws_read_payload() end");
        return rlen;
    }
Yields:

Code: Select all

I (29692) TRANSPORT_WS: 0x3ffc2d08: ws_read_payload: len: 1, bytes_remaining: 5, timeout: 0
E (29702) TRANSPORT_WS: (1) Error read data: bytes:1, rlen 0
The background is that I have created an MQTT QoS0 server. The server is attached to my HTTP server (port 80) on uri /mqtt.
I have tested the server with HiveMq and the HiveMq browser client seems quite happy, can connect, subscribe, receieves subscriptions and may publish.
The MQTT server is cross platform and protocol agnostic. The transmit function ends up as

Code: Select all

httpd_ws_send_frame_async()
(because sometime the server must send disconnect outside the context of an incomming packet).

I have now added the ESP32 MQTT client to the application. The client PUBLISHes to the MQTT server and so feeds a number of local web application stations. PUBLISH works fine. I have connected multiple HiveMq browsers the browsers SUBSCRIBE and receives the ESP32 MQTT client's data.

The ESP32 application must accept web application commands and so SUBSCRIBEs to server topics using an ESP32 MQTT client.
The SUBSCRIPTION generates the above error.

It is interresting that during the CONNECT process the ws_read_payload() timeout is reported as '1000' but is '0' during SUBSCRIBE (i.e. differences between grabbing CONACK and SUBACK).

It is also interesting that the MQTT server works well on its own. The ESP32 MQTT client is failing. Could this be because of the combination of websocket server (used in MQTT server) and websocket client?

Any suggestion for logging etc. I have tried with IDF MASTER and same result.
& I also believe that IDF CAN should be fixed.

ESP-Marius
Posts: 74
Joined: Wed Oct 23, 2019 1:49 am

Re: ESP32 mqtt client ws_read() failure

Postby ESP-Marius » Sun Sep 27, 2020 2:34 am

Interesting issue, can't say I know what the problem is, but lets see if we can get to the bottom of it together.
It is interresting that during the CONNECT process the ws_read_payload() timeout is reported as '1000' but is '0' during SUBSCRIBE (i.e. differences between grabbing CONACK and SUBACK).


This is as expected I think. After connect we block and wait network_timeout_ms, but the regular read is non-blocking.

My first suggestion would be to connect the MQTT server using a seperate ESP32 if that's possible. That should help narrow down if the issue is due to running the two in parallel at least.

I assume the error doesn't occur when you SUBSCRIBE, but rather when you receive the SUBACK? It reporting a payload of 5 bytes seems consistent with a SUBACK package.

PeterR
Posts: 621
Joined: Mon Jun 04, 2018 2:47 pm

Re: ESP32 mqtt client ws_read() failure

Postby PeterR » Sun Sep 27, 2020 10:36 am

Hey, thanks for the support!
I assume the error doesn't occur when you SUBSCRIBE, but rather when you receive the SUBACK? It reporting a payload of 5 bytes seems consistent with a SUBACK package.
Correct, on SUBACK.
I know that the SUBACK packet is in the pipework as I dump during the MQTT server's transmit.
My first suggestion would be to connect the MQTT server using a seperate ESP32 if that's possible.
I will get on that.
& I also believe that IDF CAN should be fixed.

PeterR
Posts: 621
Joined: Mon Jun 04, 2018 2:47 pm

Re: ESP32 mqtt client ws_read() failure

Postby PeterR » Mon Sep 28, 2020 11:04 am

I get the same result when using two different ESP32.
I did not start the webserver nor the MQTT server on the ESP32 running the client.

Next I tried to connect to Hive but am not sure which settings to use. I tired:

Code: Select all

    sprintf(host, "broker.mqttdashboard.com");
    config.host = host;
    config.port = 8000;
    config.transport = MQTT_TRANSPORT_OVER_WS;
but get:

Code: Select all

E (26913) TRANSPORT_WS: Error read response for Upgrade header GET / HTTP/1.1
Connection: Upgrade
Host: broker.mqttdashboard.com:8000
User-Agent: ESP32 Websocket Client
Sec-WebSocket-Version: 13
Sec-WebSocket-Key: dgP1c41a7p410IefQ9B4pQ==
Sec-WebSocket-Protocol: mqtt
E (26923) MQTT_CLIENT: Error transport connect
& I also believe that IDF CAN should be fixed.

PeterR
Posts: 621
Joined: Mon Jun 04, 2018 2:47 pm

Re: ESP32 mqtt client ws_read() failure

Postby PeterR » Mon Sep 28, 2020 12:05 pm

I fired up a Mosquitto MQTT broker on Linux & I can SUBSCRIBE to that.

So guess the problem must be with: (1) my ESP32 MQTT server, (2) the ESP32 http webserver or (3) the ESP32 web socket library.

Hive's browser client subscribes to my server ok & displays data published from another Hive browser client. EDIT: So my MQTT server would seem fine. Struggling to see the issue except that ESP32 MQTT client is more fussy than most.

I am struggling to track down where the client reads the web socket frame and stores. Any suggestions?
EDIT: Managed one sucessful subscribe in all my testing. Seems to suggest an RT element.
Gonna try the WS example with my Mosquito server. The host/uri details have me a little foxed however & especially when debugging, perhaps color the setup with Mosquito & Hive examples?
& I also believe that IDF CAN should be fixed.

ESP-Marius
Posts: 74
Joined: Wed Oct 23, 2019 1:49 am

Re: ESP32 mqtt client ws_read() failure

Postby ESP-Marius » Tue Sep 29, 2020 2:19 am

Hmm, at least this rules out that there is any unintended interaction between the server and the client due to running on the same chip.
I am struggling to track down where the client reads the web socket frame and stores. Any suggestions?
Not sure exactly what you mean here, but the MQTT read that triggers the error is probably the call to mqtt_process_receive() and the underlying websocket transport stuff happens in components/tcp_transport/transport_ws.c

It could definitely be a combination between your MQTT server behaving a little bit different than most servers and the ESP MQTT client being more fussy about the details than other clients.

Any chance you could capture the exchange with wireshark?
The host/uri details have me a little foxed however & especially when debugging, perhaps color the setup with Mosquito & Hive examples?
If you could describe in more detail what you mean here I would be happy to update the docs/example. But the rule of thumb is that the URI is the easy way to specify the connection details, so you shouldn't even need the host/port part.

PeterR
Posts: 621
Joined: Mon Jun 04, 2018 2:47 pm

Re: ESP32 mqtt client ws_read() failure

Postby PeterR » Tue Sep 29, 2020 10:12 am

Hi,
I will try and get a trace together.
I am struggling to track down where the client reads the web socket frame and stores. Any suggestions?
I want to put diagnostics at the point that an ws frame is received e.g. dump the ws frame & start from there.
& I also believe that IDF CAN should be fixed.

PeterR
Posts: 621
Joined: Mon Jun 04, 2018 2:47 pm

Re: ESP32 mqtt client ws_read() failure

Postby PeterR » Tue Sep 29, 2020 11:58 am

Enclosed are two Wireshark traces along with annotated server console log. The 'Wireshark n' lines are annotations and link the console lines immediately above to Wireshark log lines.
The Hive protocol is MQIsdp where as ESP32 is MQTT. I do not believe that this explains events*.
I have not checked the websocket upgrade.

* EDIT: https://github.com/mqtt/mqtt.github.io/ ... -and-3.1.1

EDIT: I have seen the ws MQTT example 'work' with my server once or twice. The example client very infrequently gets as far PUBLISH and displays the PUBLISHed data. Mostly iI get the read error on the SUBACK. Also, even if the example gets as far as this then eventually the example falls over when processing a PINGACK.
Attachments
wireshark.zip
ESP & Hive client logs
(5.44 KiB) Downloaded 435 times
& I also believe that IDF CAN should be fixed.

PeterR
Posts: 621
Joined: Mon Jun 04, 2018 2:47 pm

Re: ESP32 mqtt client ws_read() failure

Postby PeterR » Tue Sep 29, 2020 5:05 pm

The ESP32 MQTT client's WS upgrade requests is different to Hive's:

Code: Select all

GET /mqtt HTTP/1.1
Connection: Upgrade
Host: 192.168.80.101:80
User-Agent: ESP32 Websocket Client
Upgrade: websocket
Sec-WebSocket-Version: 13
Sec-WebSocket-Key: AltBqVd5o0eZCCz77Bwfrg==
Sec-WebSocket-Protocol: mqtt
The HiveMq client:
GET /mqtt HTTP/1.1
Host: 192.168.0.48
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:81.0) Gecko/20100101 Firefox/81.0
Accept: */*
Accept-Language: en-GB,en;q=0.5
Accept-Encoding: gzip, deflate
Sec-WebSocket-Version: 13
Origin: http://www.hivemq.com
Sec-WebSocket-Protocol: mqttv3.1
Sec-WebSocket-Extensions: permessage-deflate
Sec-WebSocket-Key: 8Mn5YEQqAi7mPPHfAsssFA==
Connection: keep-alive, Upgrade
Pragma: no-cache
Cache-Control: no-cache
Upgrade: websocket
[/code]

As the WS frame / MQTT packet is going through the ESP32 HTTP server I wonder if the differences are significant?
& I also believe that IDF CAN should be fixed.

ESP-Marius
Posts: 74
Joined: Wed Oct 23, 2019 1:49 am

Re: ESP32 mqtt client ws_read() failure

Postby ESP-Marius » Wed Sep 30, 2020 3:23 am

Thanks for providing all of this information to help me troubleshoot!

The MQTT packets themselves look normal to me.

I wonder if this could be happening due to the TCP payload being fragmented? From the wireshark capture it looks like you first receive the websocket header, and then the TCP segment containing the payload. And from looking at the code for transport_ws I think that could be causing your failure.

This would explain why it sometimes can subscribe successfully (it's timing related, if the payload package is there quickly enough there is no issue) and why other clients are OK (they handle the fragmentation more gracefully than us...)

Maybe try adding a delay before ws_read_payload() and see if that hides the issue.

Unfortunately I'm not at the office at the moment, but as soon as I am back after the Chinese National day holiday I'll investigate this further and see if we can push a fix.

Who is online

Users browsing this forum: No registered users and 76 guests