lwip blocking send problem

gabriel.c
Posts: 2
Joined: Tue Aug 29, 2017 8:29 am

lwip blocking send problem

Postby gabriel.c » Tue Aug 29, 2017 9:26 am

Hi all,

I've searched this forum and quite a lot on google as well, but I can't find an explanation.
I am working on a project which uses the ESP32 as a TCP socket client which only sends data to a TCP server (no receiving, just sending). I am using the lwIP stack with standard UNIX sockets API (connect, send, recv, etc), and I see a strange (to me, at least) phenomenon happening: the blocking send() function that I am using blocks for a duration of time which is dependent on the frequency at which I call the function!

I am using the ESP32 dev kit with the ESP-WROOM-32 module running a slightly modified version of the 'tcp_perf' example as a client, while for the TCP server I use netcat (started with 'nc -l 1985'). I use the esp-idf release v.2.1 for development.

My scenario is: call send() with ~28kB of data to send, then sleep for an amount of milliseconds. I have made some tests, with the server being on my local network and on an Amazon EC2 VM. Here are my results:
- local network: 100 ms sleep: send takes ~27 ms
500 ms sleep: send takes ~110 ms
1000 ms sleep: send takes ~120 ms
- Amazon EC2 VM: 100 ms sleep: send takes ~900 ms
500 ms sleep: send takes ~810 ms
1000 ms sleep: send takes ~820 ms
My connection to the EC2 VM is not the limiting factor, because I can wget to/from it with a rate of about 8 Mbit/s.

So my question is, why is this happening? What causes send() to block for longer amounts of time when calling more rarely?
Is this a known lwIP issue?
I am using the tcp_perf example with the default configuration.
Here is the code:

Code: Select all

char databuff[19*EXAMPLE_DEFAULT_PKTSIZE];
...
//send data
void send_data(void *pvParameters)
{
    int len = 0;
    
    memset(databuff, EXAMPLE_PACK_BYTE_IS, EXAMPLE_DEFAULT_PKTSIZE);
    vTaskDelay(100/portTICK_RATE_MS);
    ESP_LOGI(TAG, "start sending...");
#if EXAMPLE_ESP_TCP_PERF_TX && EXAMPLE_ESP_TCP_DELAY_INFO
    //delaytime
    struct timeval tv_start;
    struct timeval tv_finish;
    unsigned long send_delay_ms;
#endif /*EXAMPLE_ESP_TCP_PERF_TX && EXAMPLE_ESP_TCP_DELAY_INFO*/
    while(1) {

#if EXAMPLE_ESP_TCP_PERF_TX && EXAMPLE_ESP_TCP_DELAY_INFO
    	total_pack++;
    	gettimeofday(&tv_start, NULL);
#endif /*EXAMPLE_ESP_TCP_PERF_TX && EXAMPLE_ESP_TCP_DELAY_INFO*/
    	
	//send function
    gpio_set_level(14, 0);
    len = send(connect_socket, databuff, sizeof(databuff), 0);
    gpio_set_level(14, 1);
    vTaskDelay(100/portTICK_RATE_MS);
#if EXAMPLE_ESP_TCP_PERF_TX && EXAMPLE_ESP_TCP_DELAY_INFO
    	gettimeofday(&tv_finish, NULL);
#endif /*EXAMPLE_ESP_TCP_PERF_TX && EXAMPLE_ESP_TCP_DELAY_INFO*/
I have tried to increase the TCP_WND size to 4*TCP_MSS, but the results are the same.
I have also written a small C application which runs on Linux, based on the tcp_perf example, to check the behavior of the TCP/IP stack on a PC, but I don't get the variation of send() duration as I see on the ESP32. Also, I get from 60 to 90 ms send() duration when sending data to Amazon EC2.
Here is the PC code (the entire C file).

Code: Select all

/* tcp_perf Example

   This example code is in the Public Domain (or CC0 licensed, at your option.)

   Unless required by applicable law or agreed to in writing, this
   software is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
   CONDITIONS OF ANY KIND, either express or implied.
*/
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>
#include <netinet/in.h>
#include <arpa/inet.h>

#include <sys/time.h>

#define EXAMPLE_DEFAULT_SERVER_IP "52.53.111.152"
#define EXAMPLE_DEFAULT_PORT    (1985)
#define EXAMPLE_PACK_BYTE_IS 97 //'a'
#define EXAMPLE_DEFAULT_PKTSIZE (1460)

/*socket*/
static struct sockaddr_in server_addr;
static int connect_socket = 0;

int total_data = 0;



char databuff[19*EXAMPLE_DEFAULT_PKTSIZE];

struct timeval timeBefore, timeAfter;

//send data
void send_data(void *pvParameters)
{
    memset(databuff, EXAMPLE_PACK_BYTE_IS, EXAMPLE_DEFAULT_PKTSIZE);

    while(1) {
    
    (void)gettimeofday(&timeBefore, NULL);
	//send function
    (void)send(connect_socket, databuff, sizeof(databuff), 0);
    (void)gettimeofday(&timeAfter, NULL);
    long int elapsedTimeMs = (timeAfter.tv_sec * 1000000) + timeAfter.tv_usec - ((timeBefore.tv_sec * 1000000) + timeBefore.tv_usec);
    printf("-- Elapsed time: %ldms\n", elapsedTimeMs);
    usleep(100*1000);
    }
}


//use this esp32 as a tcp client. return ESP_OK:success ESP_FAIL:error
void create_tcp_client()
{
    
    connect_socket = socket(AF_INET, SOCK_STREAM, 0);
    if (connect_socket < 0) {
    	printf("could not create client socket!\n");
	    return;
    }
    printf("socket = %d\n", connect_socket);
    server_addr.sin_family = AF_INET;
    server_addr.sin_port = htons(EXAMPLE_DEFAULT_PORT);
    server_addr.sin_addr.s_addr = inet_addr(EXAMPLE_DEFAULT_SERVER_IP);
    
    if (connect(connect_socket, (struct sockaddr *)&server_addr, sizeof(server_addr)) < 0) {
    	printf("could not connect to server socket!\n");
    }
}


//wifi_init_softap
int main(void)
{
    create_tcp_client();
    send_data(NULL);
    return 0;
}

ESP_Angus
Posts: 2344
Joined: Sun May 08, 2016 4:11 am

Re: lwip blocking send problem

Postby ESP_Angus » Wed Aug 30, 2017 1:18 am

It's hard to tell without taking a packet capture, but here are some factors to consider:
  • On your local network, the behaviour is strange but I suspect this has something to do with sleep modes - either the AP is not listening to the ESP32, the server OS has task switched away from the client program or put its own WiFi interface to sleep, or the ESP32 has gone into modem sleep mode (if this is enabled). Or some combination of these. But like I said I'm not certain, the difference is quite high!
  • A small amount of packet loss can cause a large delay in sending TCP data.
  • Linux can afford to buffer the full 28kB of data in an OS-level socket buffer, which is why send() is returning almost immediately on the desktop computer. LWIP on the ESP32 only buffers 5744 bytes per socket by default, so send() will be waiting until most of the data has already been sent out over the wire and can be discarded (this means waiting for ACKs, also).
  • To work around this, you can increase the default send buffer size in LWIP TCP settings to >28kB. But bear in mind this would increase the RAM usage of every TCP socket in the system by this amount. You can also set a per-socket sendbuffer size with setsockopt(s, TCP_SNDBUF, ...).
  • This buffering is probably also why calling send() rapidly in the EC2 example slows down a small amount, there is still some TCP data remaining to be flushed the next time around.
  • It's possible the ESP32 is applying Nagle's Algorithm to the last piece of data in the send(). This shouldn't effect the time for that call to send(), but it will effect the subsequent call (as buffered data which could have been flushed after the last call isn't flushed yet.) To work around this use setsockopt() to enable TCP_NODELAY.
These are all educated guesses, and may not fully explain the behaviour you are seeing.

The best thing to do next is to take packet captures on the server computers (using wireshark or tcpdump) and look at the comparative packet timing, windowing, etc.

gabriel.c
Posts: 2
Joined: Tue Aug 29, 2017 8:29 am

Re: lwip blocking send problem

Postby gabriel.c » Wed Aug 30, 2017 8:18 am

Hi ESP_Angus,

Thank you very much for your answer!
Indeed, I also thought about the fact that the entire send buffer would fit inside the TCP window on the Ubuntu PC, but not on the ESP, this could cause the long delays that I am seeing (most probably). I did not attempt to increase the TCP_WND on the ESP yet, because in my project I am unfortunately very short on memory :( But I will try it on the tcp_perf example and come back with the results.

Regarding the WiFi sleep mode on the ESP32, I did not activate it in the tcp_perf example. I did activate it however in my project, but I did not see any difference in timings, the delays are the same in both cases...

I will look into this in more depth then, although I was hoping I would be a simple tweak/fix that I am not aware of.

Thanks again!

ESP_Angus
Posts: 2344
Joined: Sun May 08, 2016 4:11 am

Re: lwip blocking send problem

Postby ESP_Angus » Thu Aug 31, 2017 12:04 am

gabriel.c wrote: Regarding the WiFi sleep mode on the ESP32, I did not activate it in the tcp_perf example. I did activate it however in my project, but I did not see any difference in timings, the delays are the same in both cases...
OK. Consider the possibility that a power saving mode be enabled on the computer you are using on the local network, as well. Depending on OS/driver, it can be hard to know if this is the case. Running some other WiFi traffic (like streaming some low bandwidth audio or video) on the "server" at the same time you test is probably the easiest way to rule it out.

Who is online

Users browsing this forum: Google [Bot], MicroController and 90 guests