I want to try disabling Nagle's algorithm in lwip.
I have tried tcp_nagle_disable(conn), but it doesn't seem to stop small netconn_write(conn, p_data, length, NETCONN_COPY) calls from being bundled together.
How to use TCP_NODELAY with netconn to disable Nagle?
Re: How to use TCP_NODELAY with netconn to disable Nagle?
tcp_nagle_disable(conn->pcb.tcp) does change tcp_nagle_disabled(conn->pcb.tcp) from 0 to 1, but when I send lots of small packets they are still bundled together over 40ms. In some cases I'd like less latency at the expense of throughput.
Re: How to use TCP_NODELAY with netconn to disable Nagle?
Hi jcsbanks,
Just to make sure I understand: when you look at IP layer packet captures you see some TCP packets contain the results of multiple calls to netconn_send() or netconn_write(), yes?
I think a 40ms delay is probably due to task timing rather than LWIP deliberately "nagling" the packets.
When you send to a socket from a task (using either the netconn or BSD socket APIs), the packet is added to a queue for the TCP/IP task to handle. When the task runs, it will send all of the waiting data that it can for a particular socket. If other tasks or interrupts in the system prevent the TCP/IP task from running until after multiple writes have been done to that particular socket, the TCP/IP task will combine these writes into a single IP packet (which is desirable, to reduce packet overhead).
i.e. the TCP/IP task will send packets as fast as it can, but only if it's able to run.
The other possibility is that if an ACK is lost or delayed, the LWIP stack will start queueing up packets to be sent after the un-acked packet in the stream. So this may cause some combining of data.
I wrote a quick bit of test code and I actually was unable to make LWIP combine any writes at all, with or without tcp_nagle_disable() - all packets had 6 byte payloads. I put this down partly to a fast network, but mostly to nothing else being active on the ESP32 when the task is running.
The best thing you can do is to lower the priority of other task(s) you are running in the system (and reduce the frequency of any interrupts, if you can), to give the TCP/IP task the maximum possibility of running.
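For the BSD socket API mentioned above, the counterpart of tcp_nagle_disable() is the TCP_NODELAY socket option. The following is only a minimal sketch using standard POSIX headers; set_tcp_nodelay() and get_tcp_nodelay() are hypothetical helper names, and on an lwIP build with the sockets API enabled the include is assumed to be lwip/sockets.h instead. With netconn, the call used in this thread, tcp_nagle_disable(conn->pcb.tcp), sets the same flag.

```c
#include <netinet/in.h>   /* IPPROTO_TCP */
#include <netinet/tcp.h>  /* TCP_NODELAY */
#include <sys/socket.h>   /* socket, setsockopt, getsockopt */

/* Hypothetical helper: disable (nodelay = 1) or re-enable (nodelay = 0)
 * Nagle's algorithm on a TCP socket.
 * Returns 0 on success, -1 on error (errno is set by setsockopt). */
int set_tcp_nodelay(int sock, int nodelay)
{
    return setsockopt(sock, IPPROTO_TCP, TCP_NODELAY,
                      &nodelay, sizeof(nodelay));
}

/* Hypothetical helper: read the option back.
 * Returns 1 if Nagle is disabled, 0 if it is enabled, -1 on error. */
int get_tcp_nodelay(int sock)
{
    int value = 0;
    socklen_t len = sizeof(value);

    if (getsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &value, &len) != 0) {
        return -1;
    }
    return value != 0;
}
```

Reading the option back plays the same role as checking tcp_nagle_disabled(conn->pcb.tcp) on a netconn: it confirms the flag is set, but says nothing about delays introduced elsewhere, such as task scheduling or delayed ACKs on the peer.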
Re: How to use TCP_NODELAY with netconn to disable Nagle?
Thanks! Lots of great points there I will work on.
Yes, I am getting multiple websockets added to one TCP frame.
I think I am having this problem due to Windows delayed ACK, but I need to prove this. If so, I need to find a way around it from within the application, since I do not want users to make registry changes that affect their whole WiFi interface. The Nagle interaction with delayed ACK was my initial thought, hence trying to disable Nagle. If it is not Nagle, since it seems I did disable it, perhaps there are some transmit or receive windows I can alter so that more than 1 frame can be in flight?
In the 40ms example I was sending about 600 websockets a second but often 20 or more were in a TCP packet. In another example, if I send a websocket from ESP32 to PC and back again, I could actually see 600 TCP frames per second. The reply from the PC removes the delayed ACK problem.
I have tried 1000Hz tick rate and higher task priorities but think the delayed ACK from Windows is the rate limiting clock here.
I am happy to bundle small websockets but don't always want the latency as the project is a WiFi to CAN gateway and some protocols can have as low as 2ms round trip time.
I will test your great example and report back.
Re: How to use TCP_NODELAY with netconn to disable Nagle?
ESP_Angus, thanks so much, your example was useful to learn how to do a minimal example with TCP sending.
With ESP32 as AP, using iperf config settings...
netconn_write "OHAI!\n" at 1000Hz:
Nagle disabled: 1000 TCP sent packets per second, with every other one being ACK'd by Windows very fast (so 500 TCP received packets per second). This is as fast as USB on 1ms cycle.
Nagle enabled: 25 TCP packets per second, with each one ACK'd by Windows after 40ms. The throughput is no problem, because once the volume of data exceeds one packet, two packets are sent and Windows ACKs immediately. But for low latency when sending frequently, either Nagle or delayed ACK must be disabled. https://support.microsoft.com/en-us/hel ... by-using-a looks useful for avoiding delayed ACK.
Interestingly, with Nagle disabled or not, Windows will report some multiple "OHAI!\n" in one Socket receive even though Wireshark shows them individually.
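The 25 packets per second figure is what a simple model predicts: with Nagle enabled, a small write is held back while an earlier segment is still unacknowledged, and the receiver acknowledges a lone segment only after its delayed-ACK timer (40ms in the capture above). The sketch below is a toy simulation of that interaction, not lwIP's actual logic; simulate_nagle() and nagle_result are hypothetical names, and network transit time is assumed to be zero.

```c
/* Result of the toy simulation. */
typedef struct {
    int packets;                /* TCP segments sent */
    int max_writes_per_packet;  /* largest number of writes merged into one segment */
} nagle_result;

/* Simulate an application writing one small message every write_interval_ms
 * for duration_ms, stepping time in 1ms ticks.
 * Model assumptions (simplified, not lwIP's implementation):
 *  - with Nagle enabled, queued data is sent only when nothing is unacked;
 *  - the receiver ACKs each lone segment after ack_delay_ms;
 *  - network transit time is zero. */
nagle_result simulate_nagle(int nagle_enabled, int write_interval_ms,
                            int ack_delay_ms, int duration_ms)
{
    nagle_result r = {0, 0};
    int queued = 0;    /* writes waiting to be sent */
    int ack_due = -1;  /* time the outstanding ACK arrives, -1 if none */

    for (int t = 0; t < duration_ms; t++) {
        if (ack_due >= 0 && t >= ack_due) {
            ack_due = -1;                 /* delayed ACK has arrived */
        }
        if (t % write_interval_ms == 0) {
            queued++;                     /* application writes a message */
        }
        if (queued > 0 && (!nagle_enabled || ack_due < 0)) {
            r.packets++;                  /* send everything queued so far */
            if (queued > r.max_writes_per_packet) {
                r.max_writes_per_packet = queued;
            }
            queued = 0;
            ack_due = t + ack_delay_ms;
        }
    }
    return r;
}
```

Calling simulate_nagle(1, 1, 40, 1000) gives 25 packets with up to 40 writes merged into one, while simulate_nagle(0, 1, 40, 1000) gives 1000 packets of one write each, in line with the two measurements above. The same arithmetic applied to the earlier 600 messages per second case gives 600 × 0.04 = 24 messages per packet, consistent with the "20 or more" observed.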
Re: How to use TCP_NODELAY with netconn to disable Nagle?
Hi jcsbanks,
Very glad that was useful for you and you made some useful progress.
jcsbanks wrote: This is as fast as USB on 1ms cycle
It's impressive you got this level of performance & low latency. One thing to keep in mind as you chase this very low latency: WiFi is always going to be latency prone sometimes. Maybe some other device transmits heavily on the WiFi, or some other 2.4GHz radio transmits something, or there's some random RF noise from a natural or unnatural source, a microwave oven is running, etc. There's no shortage of things which will introduce random delays when frames are lost and re-sent at the WiFi level (WiFi has a whole system of acks, retries & reliable delivery underneath the IP layer, invisible to application level programs).
There's nothing much you can do at that level. Making sure you have a strong WiFi signal from the AP, and minimising interference from other devices will help. Switching to UDP may also help a bit, if that's an option.
jcsbanks wrote: Interestingly, with Nagle disabled or not, Windows will report some multiple "OHAI!\n" in one Socket receive even though Wireshark shows them individually.
Windows scheduler time slice is (from memory) 25ms, so Windows will sometimes be switching away from the program reading the socket and spending time in the OS layer or some other task. If packets are received at this time, they'll all be aggregated when the process returns to its socket read.
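Because TCP delivers a byte stream rather than discrete messages, the receiving program has to find message boundaries itself, whatever the packets looked like on the wire. As a rough sketch, assuming newline-terminated messages like the "OHAI!\n" test payload (a websocket library does the equivalent using frame headers), the hypothetical helper count_complete_lines() below counts the complete messages in a receive buffer and reports how many bytes they occupy.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count the complete '\n'-terminated messages in
 * buf[0..len). On return, *consumed (if not NULL) holds the number of bytes
 * those messages occupy; any trailing partial message should be kept and
 * completed by the next receive call. */
size_t count_complete_lines(const char *buf, size_t len, size_t *consumed)
{
    size_t count = 0;
    size_t used = 0;

    while (used < len) {
        const char *nl = memchr(buf + used, '\n', len - used);
        if (nl == NULL) {
            break;                        /* rest is an incomplete message */
        }
        used = (size_t)(nl - buf) + 1;    /* include the newline itself */
        count++;
    }
    if (consumed != NULL) {
        *consumed = used;
    }
    return count;
}
```

For a 15-byte receive holding "OHAI!\nOHAI!\nOHA", this reports 2 complete messages occupying 12 bytes, leaving the 3-byte fragment to be joined with the data from the next read.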