ESP32 mqtt client ws_read() failure
Re: ESP32 mqtt client ws_read() failure
Thanks; great call!
A task delay works.
At the start of the investigation I added logic to test for a zero timeout and set it to 1000 ms, but it still failed. I must have put that in the wrong place!
The takeaway is:
(1) The fault happens when the ESP-IDF MQTT client receives fragmented WS frames, i.e. when the header & payload do not arrive at the same time.
(2) I use the ESP-IDF WebSocket server to transmit MQTT server packets (both httpd_ws_send_frame_async() & httpd_ws_send_frame() produce this error).
Checking httpd_ws.c, the WS header & payload are indeed sent separately.
I would suggest that this is a fault in the ESP-IDF WebSocket transport layer. It is perfectly possible for the header and payload to land at different times, & indeed this is guaranteed when using the ESP-IDF WebSocket server (albeit the segments may land close enough together not to matter)!
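To illustrate why a reader cannot assume the header and payload arrive together, here is a minimal sketch of a reader that loops over short reads and parses the variable-length WS header (RFC 6455, section 5.2) in stages. It assumes a plain BSD-style socket descriptor (lwIP provides these on the ESP32); `read_exact()` and `read_ws_header()` are hypothetical helpers for illustration, not the code in transport_ws.c.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: read exactly `len` bytes from `fd`, looping over
 * short reads. Returns 0 on success, -1 on error or if the peer closes. */
static int read_exact(int fd, uint8_t *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    size_t got = 0;
    while (got < len) {
        ssize_t n = recv(fd, buf + got, len - got, 0);
        if (n > 0) {
            got += (size_t)n;       /* partial data: keep going */
        } else if (n < 0 && errno == EINTR) {
            continue;               /* interrupted by a signal: retry */
        } else {
            return -1;              /* error, or peer closed (n == 0) */
        }
    }
    return 0;
}

/* Hypothetical helper: read one complete WS frame header, however many
 * TCP segments it is split across. Stores the payload length, whether the
 * MASK bit is set and (if set) the 4-byte mask key. Returns 0 or -1. */
static int read_ws_header(int fd, uint64_t *payload_len, int *masked,
                          uint8_t mask_key[4])
{
    uint8_t hdr[2];
    if (read_exact(fd, hdr, 2) != 0) return -1;

    *masked = (hdr[1] & 0x80) != 0;
    uint64_t len = hdr[1] & 0x7F;          /* 7-bit length or 126/127 marker */
    if (len == 126) {                      /* 16-bit extended length */
        uint8_t ext[2];
        if (read_exact(fd, ext, 2) != 0) return -1;
        len = ((uint64_t)ext[0] << 8) | ext[1];
    } else if (len == 127) {               /* 64-bit extended length */
        uint8_t ext[8];
        if (read_exact(fd, ext, 8) != 0) return -1;
        len = 0;
        for (int i = 0; i < 8; i++) len = (len << 8) | ext[i];
    }
    if (*masked && read_exact(fd, mask_key, 4) != 0) return -1;
    *payload_len = len;
    return 0;
}
```

The caller then reads the payload with `read_exact()` as well. Because every stage loops until it has all its bytes, it makes no difference whether the header, extended length, mask key and payload arrive in one TCP segment or in several.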
I will go back and look at my timeout logic again and see if I can suggest a more polished fix.
Thanks again for the support & happy holidays!
& I also believe that IDF CAN should be fixed.
Re: ESP32 mqtt client ws_read() failure
I had my timeout logic in the wrong place.
Setting:
Code:
timeout_ms = 1000;
after the header has been grabbed also works.
We discussed the timeout at the start of this topic & I think I now understand why you do not want a timeout on the header search, but you definitely want one after it!
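One way to express the "no timeout on the header search, timeout after" rule is sketched below. It assumes a BSD-style socket and `poll()`; `read_bytes_timed()` and its `first_wait_ms`/`next_wait_ms` parameters are hypothetical, chosen for illustration, and are not the `timeout_ms` handling inside the ESP-IDF transport.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Wait until fd is readable. timeout_ms < 0 waits forever.
 * Returns 1 if readable, 0 on timeout, -1 on error. */
static int wait_readable(int fd, int timeout_ms)
{
    struct pollfd p = { .fd = fd, .events = POLLIN, .revents = 0 };
    for (;;) {
        int r = poll(&p, 1, timeout_ms);
        if (r < 0 && errno == EINTR) continue;  /* restarts the full wait */
        return r < 0 ? -1 : (r > 0 ? 1 : 0);
    }
}

/* Hypothetical reader: the first byte may take up to first_wait_ms (use -1
 * for an idle connection); once anything has arrived, each further wait is
 * bounded by next_wait_ms, so a frame that stalls part-way fails instead of
 * hanging. Returns 0 on success, -1 on error/peer close, -2 on timeout. */
static int read_bytes_timed(int fd, uint8_t *buf, size_t len,
                            int first_wait_ms, int next_wait_ms)
{
    assert(buf != NULL || len == 0);
    size_t got = 0;
    while (got < len) {
        int w = wait_readable(fd, got == 0 ? first_wait_ms : next_wait_ms);
        if (w == 0) return -2;
        if (w < 0) return -1;
        ssize_t n = recv(fd, buf + got, len - got, 0);
        if (n > 0) got += (size_t)n;
        else if (n < 0 && errno == EINTR) continue;
        else return -1;
    }
    return 0;
}
```

A caller would read the header with `first_wait_ms = -1` (or its own idle poll interval) and `next_wait_ms = 1000`, then read the payload with 1000 for both: waiting indefinitely for a frame to start is normal on an idle MQTT connection, but once the header is in, the rest of the frame should follow promptly.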
Re: ESP32 mqtt client ws_read() failure
Adding a timeout seems to fix.
There seems to be another websocket issue as detailed here:
Re: ESP32 mqtt client ws_read() failure
Mind testing if this works for you? (without any of your timeouts)
Attached a diff
Attachment: diff.txt (1011 Bytes)
Re: ESP32 mqtt client ws_read() failure
Hi,
The patch does not apply to my IDF: SHA-1: 84b51781c80740fda92784dafcfc96c13b0d8b66
The patch needs to be applied to latest IDF: SHA-1: 8bc19ba893e5544d571a753d82b44a84799b94b1
If I swap over to latest IDF & make -j8 flash then:
Code:
The following Python requirements are not satisfied:
gdbgui==0.13.2.0
pygdbmi<=0.9.0.2
reedsolo>=1.5.3,<=1.5.4
bitstring>=3.1.6
The recommended way to install a packages is via "pacman". Please run "pacman -Ss <package_name>" for searching the package database and if found then "pacman -S mingw-w64-i686-python-<package_name>" for installing it.
NOTE: You may need to run "pacman -Syu" if your package database is older and run twice if the previous run updated "pacman" itself.
Please read https://github.com/msys2/msys2/wiki/Using-packages for further information about using "pacman"
Diagnostic information:
IDF_PYTHON_ENV_PATH: (not set)
Python interpreter used: C:/msys32/mingw32/bin/python.exe
Warning: python interpreter not running from IDF_PYTHON_ENV_PATH
PATH: C:\msys32\mingw32\bin;C:\msys32\opt\xtensa-esp32-elf\bin;C:\msys32\mingw32\bin;C:\msys32\usr\local\bin;C:\msys32\usr\bin;C:\msys32\usr\bin;C:\Windows\System32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\msys32\usr\bin\site_perl;C:\msys32\usr\bin\vendor_perl;C:\msys32\usr\bin\core_perl
I've been down this path before & even with a clean MINGW toolchain I was unable to resolve it quickly.
I have a working change (time-out). Happy to try yours though if you make an 84b51781c80740fda92784dafcfc96c13b0d8b66 patch.
Re: ESP32 mqtt client ws_read() failure
@ESP-Marius
Hi,
My timeout logic is not perfect. I still get:
Code:
TRANSPORT_WS: Error read data
TRANSPORT_WS: Error reading payload data
Would you please create your patch for my SHA 84b51781c80740fda92784dafcfc96c13b0d8b66?
Re: ESP32 mqtt client ws_read() failure
Hi,
PeterR wrote (Thu Oct 22, 2020 10:49 am):
> @ESP-Marius
> Hi,
> My timeout logic is not perfect. I still get:
> TRANSPORT_WS: Error read data
> TRANSPORT_WS: Error reading payload data
> Would you please create your patch for my SHA 84b51781c80740fda92784dafcfc96c13b0d8b66?
You can try the one I've attached now and see if that applies/helps.
Attachment: ws.diff.txt (8.36 KiB)
Re: ESP32 mqtt client ws_read() failure
Thanks, that looks good. I will comment again in a couple of weeks when it has been bedded in.
Had a couple of whitespace issues, fixed with:
Code:
git apply --whitespace=fix ws.diff.txt
EDIT: PS - would you describe the change, please? You modified the ESP MQTT client, but the MQTT packet to WS frame mapping may be (0..1 : 0..1). I have not had a chance to review it in detail, but I am interested in what you think was wrong with the ESP MQTT client.
I know; gift horse etc....
Re: ESP32 mqtt client ws_read() failure
Hi,
I think that the patch fixes the fragmentation issue; however, there are other issues behind this.
(1) On occasion I get:
Code:
httpd_ws: httpd_ws_recv_frame: WS frame is not properly masked
This error is generated by my internal MQTT server & (I believe) only as a result of my own MQTT client's PUBLISH.
(2) On occasion I get:
Code:
MQTTS: Session ending for socket: 53
This is my own log message, made from my MQTT server's session-end callback, which is registered with:
Code:
mqtts_set_session_context(session, &on_session_end, newContext);
The log is made for my IDF MQTT client's socket & without any other messages (even with MQTT_CLIENT logging at verbose).
(3) and very infrequently:
Code:
no mem for receive buffer
I have about 35KB of data memory free.
Now (3) seems to be the clue: emac_esp32_rx_task() was unable to allocate memory & pass the packet on. I wonder if the MQTT client is also scratching around for memory...?
I suspect that (2) is also the result of a memory shortage, and that (1) might be an alternative path/race, also triggered by a lack of memory (e.g. if the IDF MQTT WS transport sends in sections...).
It is clear that available memory depends on emac_esp32_rx_task() (i.e. its malloc() calls) and so on network traffic, processing, etc.
Ideally an MQTT connection should be maintained. Dropping a WebSocket MQTT connection is a big deal for my application & results in an outage of 1 second or so. AJAX would be relatively immune to this issue because, whilst a single request might fail (lack of memory), AJAX is not 'connected'. So if you make AJAX requests fast enough you'll only see a judder, but you are clearly limited in other ways.
SO: memory indeterminism seems to be the result of emac_esp32_rx_task() & its mallocs. This then gives rise to WS frame errors and to some other requests to close the client socket, which I believe emanate from the ESP-IDF MQTT WebSocket client service.
I wonder if the MQTT client library and WebSocket transport could just return 'try again' or fail? Better failure logging could certainly be added. I suspect that the IDF MQTT client's packet send() ends up in multiple parts and that the WS transport might also end up in multiple TCP/IP packets. If instead each MQTT packet send() were done in one section, then we could both report the transport error back & stop the MQTT server from treating fragmented WS frames as a protocol error & killing the connection (after emac_esp32_rx_task() has 'eaten all the pies'). There would no longer be any incomplete MQTT packets & so no reason to kick an ESP-IDF MQTT client into the bin!
Assuming WS fragmentation has been fixed, this is just an IDF MQTT client fragmentation issue.
Would be really keen for pointers and/or a patch!
EDIT: PS It's QoS0 after all (& it would be hard to achieve better on an embedded server), so just return fail!
EDIT: PPS Bring on common browser uni/multicast support
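For reference on error (1): RFC 6455 requires every client-to-server frame to have the MASK bit set, and the server must close the connection on an unmasked frame. The sketch below is a hypothetical buffer-based parser (not the code in httpd_ws.c) showing that check and the unmasking step. It also suggests one way the error could appear even when the client does mask its frames: if the server loses frame alignment, for example after a partial read, it will parse payload bytes as a header, and the MASK bit it sees is then essentially arbitrary. That is a hypothesis, not something confirmed by the logs above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Result codes for the hypothetical parser below. */
enum ws_parse_result { WS_OK = 0, WS_NEED_MORE = 1, WS_NOT_MASKED = -1 };

/* Parse one client-to-server frame held in buf[0..len) (RFC 6455, 5.2).
 * On WS_OK the payload is unmasked in place, *payload points at it,
 * *payload_len is its length and *frame_len the total bytes consumed.
 * WS_NEED_MORE means the frame has not fully arrived yet. */
static enum ws_parse_result ws_parse_client_frame(uint8_t *buf, size_t len,
        uint8_t **payload, uint64_t *payload_len, size_t *frame_len)
{
    assert(buf != NULL || len == 0);
    if (len < 2) return WS_NEED_MORE;
    if ((buf[1] & 0x80) == 0) return WS_NOT_MASKED;  /* clients MUST mask */

    size_t pos = 2;
    uint64_t plen = buf[1] & 0x7F;
    if (plen == 126) {                               /* 16-bit length */
        if (len < pos + 2) return WS_NEED_MORE;
        plen = ((uint64_t)buf[2] << 8) | buf[3];
        pos += 2;
    } else if (plen == 127) {                        /* 64-bit length */
        if (len < pos + 8) return WS_NEED_MORE;
        plen = 0;
        for (int i = 0; i < 8; i++) plen = (plen << 8) | buf[pos + i];
        pos += 8;
    }
    if (len < pos + 4) return WS_NEED_MORE;          /* mask key */
    const uint8_t *key = buf + pos;
    pos += 4;
    if (len - pos < plen) return WS_NEED_MORE;       /* payload incomplete */

    for (uint64_t i = 0; i < plen; i++)
        buf[pos + i] ^= key[i % 4];                  /* unmask in place */

    *payload = buf + pos;
    *payload_len = plen;
    *frame_len = pos + (size_t)plen;
    return WS_OK;
}
```

Note that the payload is only unmasked once the whole frame is present, so a caller can append newly received bytes to the buffer and retry after WS_NEED_MORE without corrupting anything.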
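The "send it in one section" idea could look something like the sketch below: the complete masked client frame (header, mask key and payload) is assembled in one buffer and handed to the stack in a single send loop, so the header and payload are never written separately. `ws_build_client_frame()` and `send_all()` are hypothetical names; the sketch only handles payloads up to 65535 bytes, and the caller is assumed to supply a mask key from a strong random source, as RFC 6455 requires. The TCP stack may still split the frame across segments, so the receiver must still tolerate partial reads; what this avoids is the sender itself emitting a frame in pieces.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Build a complete, masked client-to-server WS binary frame (FIN set,
 * opcode 0x2) into `out`. Returns the frame length, or 0 if `out_cap` is
 * too small or the payload exceeds 65535 bytes (not handled here). */
static size_t ws_build_client_frame(const uint8_t *payload, size_t plen,
                                    const uint8_t mask_key[4],
                                    uint8_t *out, size_t out_cap)
{
    assert(payload != NULL || plen == 0);
    size_t hdr = (plen < 126) ? 2 + 4 : 4 + 4;   /* header + mask key */
    if (plen > 0xFFFF || out_cap < hdr + plen) return 0;

    out[0] = 0x82;                               /* FIN + binary opcode */
    size_t pos;
    if (plen < 126) {
        out[1] = (uint8_t)(0x80 | plen);         /* MASK bit + 7-bit length */
        pos = 2;
    } else {
        out[1] = 0x80 | 126;                     /* MASK bit + 16-bit length */
        out[2] = (uint8_t)(plen >> 8);
        out[3] = (uint8_t)(plen & 0xFF);
        pos = 4;
    }
    for (int i = 0; i < 4; i++) out[pos + i] = mask_key[i];
    pos += 4;
    for (size_t i = 0; i < plen; i++)
        out[pos + i] = (uint8_t)(payload[i] ^ mask_key[i % 4]);
    return pos + plen;
}

/* Hand the whole frame to the stack, looping only on short writes.
 * Returns 0 on success, -1 on error (the caller can then fail the QoS0
 * publish instead of leaving half a frame on the wire). */
static int send_all(int fd, const uint8_t *buf, size_t len)
{
    size_t sent = 0;
    while (sent < len) {
        ssize_t n = send(fd, buf + sent, len - sent, 0);
        if (n > 0) sent += (size_t)n;
        else if (n < 0 && errno == EINTR) continue;
        else return -1;
    }
    return 0;
}
```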