[FIXED] High bandwidth TCP stops sending ACKs
Posted: Mon Mar 26, 2018 2:49 am
I was partway through writing this post when I decided to retest on the latest master from GitHub, and that seems to have fixed it. (Then again, the problem comes and goes, so maybe not...)
But in case somebody else is having this problem, or the sample code is otherwise useful, here's the original problem:
My HTTP client stops working after receiving several kilobytes. If I rate-limit the receive severely, it works, but it's too slow to be usable.
To reproduce, I took the sample HTTP client from examples/protocols/http_request, commented out the line "putchar(recv_buf[i]);" (the UART rate-limits it enough to prevent the problem), and set the server to a test file I put on S3:
Code:
#define WEB_SERVER "s3-us-west-2.amazonaws.com"
#define WEB_PORT 80
#define WEB_URL "/13c554ee-e4f7-4932-9183-0684b27f9bf8/testfile"
I have a slightly more elaborate repro at https://gist.github.com/piquan/6a9cdebb ... ddb76d44d1 that also demonstrates the problem, with some extra features to make it easier to see. It's based on the sample code, but with several additions (most of which are controlled by #ifdefs near the top):
- Uses a larger receive buffer (the sample code uses a 64-byte buffer)
- Displays HTTP headers
- Displays the MD5 hash of the HTTP body (note that for files served from S3, like mine, the ETag header is the MD5; this is not true on most other web servers)
- Displays progress as dots, with a light enough UART load to still demonstrate the problem and to show when it occurs
- Retries the read on timeouts (shows "_" on the status)
- (Disabled by default) Blinks an LED on GPIO 5 as it reads data
- (Disabled by default) Rate-limiting: delays a certain number of ticks after each read, based on the number of bytes read
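For anyone who doesn't want to open the gist, here is a rough sketch of how the last two ideas (retrying on read timeouts, and delaying in proportion to the bytes read) could look. This is not the code from the gist: the names delay_ticks_for, classify_read and read_outcome_t are made up for illustration, and it assumes the socket has a receive timeout set so that a timed-out read() returns -1 with errno set to EAGAIN, as the sample client does.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical helper: number of ticks to wait after reading `bytes_read`
 * bytes so the average rate stays at or below `max_bytes_per_sec`.
 * `tick_rate_hz` would be configTICK_RATE_HZ on FreeRTOS.
 * A limit of 0 is treated here as "rate limiting disabled". */
static uint32_t delay_ticks_for(size_t bytes_read,
                                uint32_t max_bytes_per_sec,
                                uint32_t tick_rate_hz)
{
    if (max_bytes_per_sec == 0) {
        return 0;
    }
    /* Round up so short reads still pay for the time they used. */
    uint64_t ticks = ((uint64_t)bytes_read * tick_rate_hz
                      + max_bytes_per_sec - 1) / max_bytes_per_sec;
    return ticks > UINT32_MAX ? UINT32_MAX : (uint32_t)ticks;
}

/* Hypothetical classification of a read() result. */
typedef enum {
    READ_DATA,     /* got bytes: print "." and keep going */
    READ_TIMEOUT,  /* receive timeout: print "_" and retry */
    READ_EOF,      /* peer closed the connection */
    READ_ERROR     /* anything else: give up */
} read_outcome_t;

/* `r` is the return value of read(); `err` is errno captured right after. */
static read_outcome_t classify_read(ssize_t r, int err)
{
    if (r > 0) {
        return READ_DATA;
    }
    if (r == 0) {
        return READ_EOF;
    }
    if (err == EAGAIN || err == EWOULDBLOCK) {
        return READ_TIMEOUT;
    }
    return READ_ERROR;
}
```

In the receive loop, the idea would be to call classify_read(r, errno) after each read(), retry on READ_TIMEOUT, and on READ_DATA call vTaskDelay(delay_ticks_for(r, limit, configTICK_RATE_HZ)) when rate-limiting is enabled.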
--- Note: This is where I was when I decided to pull the latest master (mine was from Mar 2, about three weeks old). Since that fixed the problem, I didn't finish this post.
I'll add that the symptom I saw with tcpdump was that the ESP32 stopped sending ACKs, and so the server stopped transmitting new data. The server would resend data packets, and the ESP32 wouldn't ACK them. If I sent new data from the ESP32 while it was in this state, the data wouldn't be sent (even though the peer's TCP window had lots of space).
If somebody knows what fixed it, I'd appreciate knowing. I was pulling my hair out for weeks thinking it was a problem in my application!