Inconsistent WiFi TCP throughput (1.6→0.48 MBit/s)

PanicanWhyasker
Posts: 45
Joined: Sun Jan 06, 2019 12:42 pm

Post by PanicanWhyasker » Thu Sep 23, 2021 10:13 am

Hi,
I'm seeing slow and inconsistent transfer speeds when transmitting from an ESP32-WROVER-32D to a PC through a WiFi AP. The changes in speed are somewhat random: the transfer starts out not-so-bad at 1.6-2 MBit/s (200-250 kbytes/s), stays there for a minute or two, then drops to ~0.5 MBit/s (actually varying between 320 and 480 kbit/s).
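
Since the figures here mix bits and bytes, the conversion can be spelled out. This is a minimal sketch assuming decimal units (1 kbyte = 1000 bytes, 1 MBit = 1,000,000 bits); the function name is only for illustration.

```python
def kbytes_to_mbit(kbytes_per_s):
    """Convert kbytes/s to MBit/s, assuming decimal prefixes (k = 1000, M = 1,000,000)."""
    return kbytes_per_s * 8 / 1000


# 200-250 kbytes/s is the initial rate quoted in the post;
# 40-60 kbytes/s corresponds to the 320-480 kbit/s range after the drop.
for kbytes in (250, 200, 60, 40):
    print(f"{kbytes:3d} kbytes/s = {kbytes_to_mbit(kbytes):.2f} MBit/s")
```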

The ESP32-WROVER-32D is part of a SparkFun ESP32 Thing Plus board (with a PCB strip antenna). While the old versions of the ESP32 Thing were bad, this one isn't: I ran the IPerf example, and it achieves 5-10 MBit/s in the scenario I'm interested in (ESP->PC, TCP transfer, through a home AP that is two concrete walls away from the ESP32). If I move the module into the same room as the AP, it reaches 25 MBit/s. The ESP32 Thing Plus is connected to another device and an SD card:

Code:

+--------------+                    +-------------+
|    Other     |                    | ESP32 Thing |                         +------+
| ESP32-based  |<------UART-------->|    Plus     |- - - - WiFi TCP - - - ->|  AP  |----->PC Server
|   device     |                    |(problematic)|                         +------+
+--------------+                    +-------------+
                                      ^
                                      |
+--------------+                      | 
|    SD Card   |<--------SPI----------+
+--------------+
The other ESP32-based device also uses some WiFi, but very minimal (just a few small UDP packets per second).

My test code reads a large file from the SD card, and sends it to the PC over a plain TCP socket.
Example transfer progress from the receiving app on the PC:

Code:

size:  23721272
00:05;   21 MB left;   223 kbytes/s
00:10;   20 MB left;   171 kbytes/s
00:15;   19 MB left;   152 kbytes/s
00:20;   19 MB left;   122 kbytes/s
00:25;   18 MB left;   148 kbytes/s
00:30;   17 MB left;   185 kbytes/s
00:35;   17 MB left;   121 kbytes/s
00:40;   16 MB left;   182 kbytes/s
00:45;   15 MB left;   159 kbytes/s
00:50;   14 MB left;   195 kbytes/s
00:55;   13 MB left;   197 kbytes/s
01:00;   12 MB left;   243 kbytes/s
01:05;   11 MB left;   254 kbytes/s
01:10;   10 MB left;   202 kbytes/s
01:15;    9 MB left;   173 kbytes/s
01:20;    8 MB left;   179 kbytes/s
01:25;    7 MB left;   135 kbytes/s
01:30;    7 MB left;    93 kbytes/s
01:35;    6 MB left;    90 kbytes/s
01:40;    6 MB left;   105 kbytes/s
01:45;    5 MB left;   130 kbytes/s
01:50;    4 MB left;   178 kbytes/s
01:55;    4 MB left;    64 kbytes/s
02:00;    4 MB left;    45 kbytes/s
02:05;    3 MB left;    47 kbytes/s
02:10;    3 MB left;    46 kbytes/s
02:15;    3 MB left;    49 kbytes/s
02:20;    3 MB left;    50 kbytes/s
02:25;    2 MB left;    50 kbytes/s
02:30;    2 MB left;    50 kbytes/s
02:36;    2 MB left;    43 kbytes/s
02:41;    2 MB left;    51 kbytes/s
02:46;    2 MB left;    50 kbytes/s
02:51;    1 MB left;    52 kbytes/s
02:56;    1 MB left;    48 kbytes/s
03:01;    1 MB left;    47 kbytes/s
03:06;    1 MB left;    52 kbytes/s
03:11;    0 MB left;    49 kbytes/s
03:16;    0 MB left;    53 kbytes/s
03:21;    0 MB left;    46 kbytes/s
03:26;    0 MB left;    50 kbytes/s
receiving finished
As you can see, the transfer speed drops about 4x (from roughly 180 to 50 kbytes/s) around the 2-minute mark.
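
To put a number on the drop rather than eyeballing it, the progress lines can be parsed and scanned for the first sample that falls well below the running average. The following is a minimal sketch using a handful of lines copied from the log above; `parse_progress`, `find_drop`, and the factor-of-2 threshold are assumptions chosen for illustration.

```python
import re

# A few lines copied from the log above (not the full transfer).
LOG = """\
00:05;   21 MB left;   223 kbytes/s
01:00;   12 MB left;   243 kbytes/s
01:50;    4 MB left;   178 kbytes/s
01:55;    4 MB left;    64 kbytes/s
02:00;    4 MB left;    45 kbytes/s
02:36;    2 MB left;    43 kbytes/s
03:26;    0 MB left;    50 kbytes/s
"""

LINE = re.compile(r"(\d+):(\d+);\s*\d+ MB left;\s*(\d+) kbytes/s")


def parse_progress(text):
    """Return a list of (elapsed_seconds, kbytes_per_s) tuples."""
    return [(int(m) * 60 + int(s), int(rate)) for m, s, rate in LINE.findall(text)]


def find_drop(samples, factor=2.0):
    """Index of the first sample below 1/factor of the mean of all earlier samples."""
    total = 0
    for i, (_, rate) in enumerate(samples):
        if i > 0 and rate < total / i / factor:
            return i
        total += rate
    return None  # no drop found


samples = parse_progress(LOG)
i = find_drop(samples)
before = sum(rate for _, rate in samples[:i]) / i
after = sum(rate for _, rate in samples[i:]) / (len(samples) - i)
print(f"drop at {samples[i][0]} s: {before:.1f} -> {after:.1f} kbytes/s ({before / after:.1f}x)")
```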

The test PC app is a very simple Python script that uses a blocking TCP socket to receive the data in full.
The real PC app is written in C++ with non-blocking sockets, and transfers the file in parts.
The transfer speeds of the test and real apps are comparable, so I'm inclined to think the problem is in the ESP-side code.
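
For anyone who wants to reproduce the measurement, a receiver along these lines would be enough. This is a minimal sketch and not the actual test script: the 4-byte little-endian size header, the decimal units, and the names `recv_exact` and `receive_file` are all assumptions.

```python
import socket
import struct
import time


def recv_exact(conn, n):
    """Read exactly n bytes from a blocking socket, or raise on early close."""
    buf = bytearray()
    while len(buf) < n:
        chunk = conn.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed before all data arrived")
        buf.extend(chunk)
    return bytes(buf)


def receive_file(conn, report_every=5.0):
    """Receive a size-prefixed stream, print progress lines, return the byte count."""
    # Assumed framing: the sender first transmits the file size as uint32, little-endian.
    (size,) = struct.unpack("<I", recv_exact(conn, 4))
    print("size: ", size)
    received = 0
    start = last_report = time.monotonic()
    bytes_at_last_report = 0
    while received < size:
        chunk = conn.recv(min(65536, size - received))
        if not chunk:
            raise ConnectionError("peer closed before all data arrived")
        received += len(chunk)
        now = time.monotonic()
        if now - last_report >= report_every:
            # Rate over the last reporting interval, in decimal kbytes/s.
            rate = (received - bytes_at_last_report) / (now - last_report) / 1000
            elapsed = int(now - start)
            left_mb = (size - received) // 1_000_000
            print(f"{elapsed // 60:02d}:{elapsed % 60:02d}; {left_mb:4d} MB left; {rate:5.0f} kbytes/s")
            last_report, bytes_at_last_report = now, received
    print("receiving finished")
    return received
```

To use it, accept a connection on a listening socket (`socket.create_server(("", port))`, with a port of your choice) and pass the accepted connection to `receive_file()`.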

On the ESP side, for best performance, I've copied most, if not all, of the sdkconfig tweaks from the IPerf example.
I cannot share all of the ESP sending code, but the procedure is roughly like this:

Code:

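        // Run the read/send pump for at most 5 ms, then return to serve other work.
        uint32_t startProc = xthal_get_ccount();
        uint32_t maxCyclesToSpend = clockCyclesPerMicrosecond() * 5000; // 5 ms budget in CPU cycles
        uint32_t x = startProc, y;

        do {
          procFileRead();                 // refill the buffer from the file
          y = xthal_get_ccount();
          m_timeInFread += y - x;         // cycles spent reading
          procFileSend();                 // non-blocking send() on the TCP socket
          x = xthal_get_ccount();
          m_timeInSend += x - y;          // cycles spent sending
          // Unsigned subtraction stays correct across a cycle-counter wraparound.
        } while (!m_fileError && xthal_get_ccount() - startProc < maxCyclesToSpend);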
It tries, for 5 ms, to send data over a non-blocking TCP socket; whenever some data gets sent, it refills the buffer by reading more from the file.
The loop only runs for 5 ms at a time because I want to serve other tasks, in this case commands from the other ESP device over the UART. These take very little time (less than a millisecond), after which the code re-enters the loop above, spends another 5 ms there, and so on.
I have some code that prints m_timeInFread and m_timeInSend, and I can see that almost all the time is spent in the code above, mostly in the procFileSend() part.
procFileSend() calls the socket send() function repeatedly, and it mostly returns EAGAIN (TX buffer full, I guess).
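
The pattern described above (a time-boxed loop that refills a buffer and retries a non-blocking send() on EAGAIN) can be modelled on a PC to see how much of the budget goes into polling. This is a host-side Python sketch of the pattern, not the ESP32 code; `send_for_budget`, the 4096-byte chunk size, and the returned counters are assumptions for illustration.

```python
import socket
import time


def send_for_budget(sock, source, pending, budget_s=0.005):
    """Push bytes from `source` through a non-blocking socket for at most budget_s.

    Returns (pending, sent_total, eagain_count), where `pending` is data that
    has been read from `source` but not yet accepted by the socket.
    """
    deadline = time.monotonic() + budget_s
    sent_total = 0
    eagain_count = 0
    while time.monotonic() < deadline:
        if not pending:
            pending = source.read(4096)   # refill, in the role of procFileRead()
            if not pending:
                break                     # end of file
        try:
            n = sock.send(pending)        # in the role of procFileSend()
        except BlockingIOError:
            # EAGAIN / EWOULDBLOCK: the socket's send buffer is full.
            # Like the loop described above, this simply retries until the deadline.
            eagain_count += 1
            continue
        pending = pending[n:]
        sent_total += n
    return pending, sent_total, eagain_count
```

Calling it repeatedly with the `pending` value it returned last time mirrors the 5 ms slices; a high `eagain_count` per slice means most of the budget is spent polling a full buffer rather than moving data.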

I believe this could be written in a better way (e.g. with the UART handling in a separate task, so it can respond quickly, while the TCP send uses blocking calls), but I'm not sure whether that would help. The main thing I want to understand is why the transfer starts out not-so-bad (200-250 kbytes/s) and then falls abruptly, even though the environment has not changed at all in the meantime. This looks like a physical-layer issue to me, and I don't know how to dig into it further, e.g.:

- Are there any settings in sdkconfig that could help avoid this throughput drop?
- Is there any way to get more info about what the physical layer is doing?
- How can I restore the transfer speed to 200-250 kbytes/s? (Once it drops, it seems to stay stuck at 0.32-0.48 MBit/s, and only a reset resolves it.)

Any ideas are greatly appreciated!

EDIT: I forgot to add: I've analyzed the TCP stream with Wireshark on the PC end, and it looks good. Nothing suspicious wrt TCP window size, packet size, or dropped packets.

Re: Inconsistent WiFi TCP throughput (1.6→0.48 MBit/s)

Post by PanicanWhyasker » Thu Sep 30, 2021 7:01 pm

Any ideas? At the very least, why did the transfer speed drop without any change on my side?
