How to use TCP_NODELAY with netconn to disable Nagle?

jcsbanks
Posts: 305
Joined: Tue Mar 28, 2017 8:03 pm

Postby jcsbanks » Sun Jul 22, 2018 12:10 am

I want to try disabling Nagle's algorithm in lwip.

I have tried tcp_nagle_disable(conn), but it doesn't seem to stop small netconn_write(conn, p_data, length, NETCONN_COPY) calls from being bundled together.

Re: How to use TCP_NODELAY with netconn to disable Nagle?

Postby jcsbanks » Sun Jul 22, 2018 12:57 am

tcp_nagle_disable(conn->pcb.tcp) does change tcp_nagle_disabled(conn->pcb.tcp) from 0 to 1, but lots of small writes are still bundled together over 40ms. In some cases I'd like lower latency at the expense of throughput.
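
For comparison, the same option can be set through the BSD sockets API, where it is called TCP_NODELAY. The sketch below assumes POSIX headers so that it can be tried on a desktop. lwIP's sockets layer is understood to accept the same setsockopt() call through lwip/sockets.h, but that is an assumption here and has not been tested on the ESP32. The helper name set_nodelay_and_verify is made up for illustration. It reads the option back, much like checking tcp_nagle_disabled() above.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: disable Nagle on a TCP socket, then read the
 * option back. Returns 1 if TCP_NODELAY is set afterwards, 0 if it is
 * not, and -1 if either socket call fails. */
int set_nodelay_and_verify(int fd)
{
    int on = 1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on)) != 0) {
        return -1;
    }

    int value = 0;
    socklen_t len = sizeof(value);
    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &value, &len) != 0) {
        return -1;
    }
    return value != 0;
}
```

A return value of 1 only confirms that the flag is set. It says nothing about whether writes are still being coalesced somewhere else in the path.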

ESP_Angus
Posts: 2344
Joined: Sun May 08, 2016 4:11 am

Re: How to use TCP_NODELAY with netconn to disable Nagle?

Postby ESP_Angus » Mon Jul 23, 2018 5:42 am

Hi jcsbanks,

Just to make sure I understand: when you look at IP layer packet captures you see some TCP packets contain the results of multiple calls to netconn_send() or netconn_write(), yes?

I think a 40ms delay is probably due to task timing rather than LWIP deliberately "nagling" the packets.

When you send to a socket from a task (using either the netconn or BSD socket APIs), the data is added to a queue for the TCP/IP task to handle. When the task runs, it will send all of the waiting data that it can for a particular socket. If other tasks or interrupts in the system prevent the TCP/IP task from running until after multiple writes have been done to that particular socket, the TCP/IP task will combine these writes into a single IP packet (which is desirable, to reduce packet overhead).

i.e. the TCP/IP task will send packets as fast as it can, but only if it's able to run.

The other possibility is that if an ACK is lost or delayed, the LWIP stack will start queueing up packets to be sent after the un-acked packet in the stream. So this may cause some combining of data.

I wrote a quick bit of test code and I was actually unable to make LWIP combine any writes at all, with or without tcp_nagle_disable() - all packets had 6 byte payloads. I put this down partly to a fast network, but mostly to nothing else being active on the ESP32 while the task is running.

The best thing you can do is to lower the priority of other task(s) you are running in the system (and reduce the frequency of any interrupts, if you can), to give the TCP/IP task the maximum possibility of running.

Re: How to use TCP_NODELAY with netconn to disable Nagle?

Postby jcsbanks » Mon Jul 23, 2018 9:14 pm

Thanks! Lots of great points there I will work on.

Yes, I am getting multiple websockets added to one TCP frame.

I think I am having this problem due to Windows delayed ACK, but I need to prove this. If this is so, I need to think of ways around it from within the application, since I do not want users to have to make registry changes for their whole WiFi interface. Nagle's interaction with delayed ACK was my initial thought, hence trying to disable Nagle. If it is not Nagle, since it seems I disabled it, perhaps there are some transmit or receive windows I can alter so that more than 1 frame can be in flight?

In the 40ms example I was sending about 600 websockets a second but often 20 or more were in a TCP packet. In another example, if I send a websocket from ESP32 to PC and back again, I could actually see 600 TCP frames per second. The reply from the PC removes the delayed ACK problem.

I have tried 1000Hz tick rate and higher task priorities but think the delayed ACK from Windows is the rate limiting clock here.

I am happy to bundle small websockets but don't always want the latency as the project is a WiFi to CAN gateway and some protocols can have as low as 2ms round trip time.

I will test your great example and report back.

Re: How to use TCP_NODELAY with netconn to disable Nagle?

Postby jcsbanks » Tue Jul 24, 2018 12:18 pm

ESP_Angus, thanks so much, your example was useful to learn how to do a minimal example with TCP sending.

With ESP32 as AP, using iperf config settings...

netconn_write "OHAI!\n" at 1000Hz:

Nagle disabled: 1000 TCP sent packets per second with every other one being ACK'd by Windows very fast (so 500 TCP received packets per second) :) This is as fast as USB on 1ms cycle :)

Nagle enabled: 25 TCP packets per second with each one ACK'd by Windows after 40ms. The throughput is no problem because once the volume of data exceeds one packet, two packets are sent and Windows ACKs immediately. But for low latency when sending frequently, either Nagle or delayed ACK must be disabled. https://support.microsoft.com/en-us/hel ... by-using-a looks useful for avoiding delayed ACK.
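
The 25 packets per second figure fits a simple model: with Nagle enabled, a new small segment waits until the previous one is acknowledged, and the delayed ACK arrives roughly every 40ms. The sketch below is just that arithmetic, assuming a constant ACK delay. The function names are illustrative only.

```c
/* Rough model, assuming one small segment is released per delayed ACK
 * and the ACK delay is constant. */

/* Segments sent per second when each one waits ack_delay_ms for its ACK. */
double packets_per_second(double ack_delay_ms)
{
    return 1000.0 / ack_delay_ms;
}

/* Average number of application writes coalesced into each segment. */
double writes_per_packet(double writes_per_sec, double ack_delay_ms)
{
    return writes_per_sec / packets_per_second(ack_delay_ms);
}
```

With a 40ms delay this gives 25 packets per second, so 1000 writes per second average 40 writes per packet. The earlier case of about 600 per second averages 24, in line with the "20 or more" seen in one TCP packet.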

Interestingly, with Nagle disabled or not, Windows will report some multiple "OHAI!\n" in one Socket receive even though Wireshark shows them individually.

Re: How to use TCP_NODELAY with netconn to disable Nagle?

Postby ESP_Angus » Tue Jul 24, 2018 11:11 pm

Hi jcsbanks,

Very glad that was useful for you and you made some useful progress.
jcsbanks wrote: This is as fast as USB on 1ms cycle :)
It's impressive you got this level of performance & low latency. One thing to keep in mind as you chase this very low latency: WiFi will always be prone to occasional latency spikes. Maybe some other device transmits heavily on the WiFi, or some other 2.4GHz radio transmits something, or there's some random RF noise from a natural or unnatural source, a microwave oven is running, etc, etc. There's no shortage of things which will introduce random delays when frames are lost and re-sent at the WiFi level (WiFi has a whole system of acks, retries & reliable delivery underneath the IP layer, invisible to application level programs).

There's nothing much you can do at that level. Making sure you have a strong WiFi signal from the AP, and minimising interference from other devices will help. Switching to UDP may also help a bit, if that's an option.
jcsbanks wrote: Interestingly, with Nagle disabled or not, Windows will report some multiple "OHAI!\n" in one Socket receive even though Wireshark shows them individually.
Windows scheduler time slice is (from memory) 25ms, so Windows will sometimes be switching away from the program reading the socket and spending time in the OS layer or some other task. If packets are received during this time, they'll all be aggregated when the process returns to its socket read.
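
Since TCP delivers a byte stream and not discrete messages, a receiver generally has to split the data itself, however it was packetized on the wire. A minimal sketch, assuming newline-terminated messages like the "OHAI!\n" test string; count_complete_messages is a hypothetical helper, not part of any socket API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts the complete '\n'-terminated messages in
 * buf[0..len). *consumed is set to the number of bytes covered by those
 * messages; any remaining bytes are a partial message that the caller
 * should keep and prepend to the next receive. */
size_t count_complete_messages(const char *buf, size_t len, size_t *consumed)
{
    size_t start = 0;
    size_t count = 0;

    while (start < len) {
        const char *nl = (const char *)memchr(buf + start, '\n', len - start);
        if (nl == NULL) {
            break; /* only a partial message is left */
        }
        start = (size_t)(nl - buf) + 1; /* step past the newline */
        count++;
    }

    *consumed = start;
    return count;
}
```

For a 15 byte receive holding two full "OHAI!\n" messages followed by "OHA", this reports 2 messages and 12 bytes consumed, leaving 3 bytes to carry over to the next read.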
