Page 1 of 1
Random ERRNO 9 (EBADF) on recv() or send()
Posted: Tue Jul 14, 2020 9:03 am
by sharkx
The context
Several nodes running as APs connecting to each other in a tree-like network; all nodes run the same code.
There's a pretty basic TCP server that listens for small packets of data; the server is running a 'classic' soccket() -> bind() -> listen() -> select() - accept() -> recv() scenario, using the recv() in a separate task; the socket file descriptor is released immediately after the recv completes.
The server processes an average of 2 packets per second. When a packet is received, it is relayed to some of the stations that are connected to the module, or to the 'parent' node depending on data.
The problem
Every now and then, the recv() or send() calls return error code 9, EBADF, and I can't seem to figure out why ...
Sometimes everything goes well for minutes, then there's a 'burst' of EBADFs; sometimes there's one EBADF every minute ort so... there doesn't seem to be a particular pattern.
Any help about debugging this issue is appeciated.
Thanks.
Re: Random ERRNO 9 (EBADF) on recv() or send()
Posted: Wed Jul 15, 2020 2:43 pm
by ESP_Sprite
Are you sure you're handling your file descriptors correctly? No race conditions there?
Re: Random ERRNO 9 (EBADF) on recv() or send()
Posted: Thu Jul 16, 2020 5:01 am
by sharkx
Well... That's part of the problem; there none that I could find. Stripping all error handling and other minor stuff, here's how it's done (in a simplified, c-like language)
Code: Select all
function serverTask() {
server_socket = socket();
bind(server_socket);
listen(server_socket);
while (1) {
if (select(server_socket) >= 0) {
server_accept = accept(server_socket);
createTask(recvTask, server_accept);
}
}
}
function recvTask(server_accept) {
if (recv(server_accept) > 0) {
process(received_data);
}
closeSocket(server_accept)
}
It's pretty basic. This is how it's supposed to be done, right ?
There is some synchronization done in the processing section, but it does not involve the socket; besides, the socket fd is sent to the task and should become available for another operation after the close occurrs in the recvTask.
Re: Random ERRNO 9 (EBADF) on recv() or send()
Posted: Thu Oct 24, 2024 1:03 pm
by FrancoisB-HEX
Did you ever get a grip on this issue?
I am seeing something similar with an AP setup on ESP32-S3 using ESP-IDF v5.3.1. I use the BSD sockets api to setup a TCP server:
- assign server socket file descriptor (serverFD) using bind() and then listen() with O_NONBLOCK flag set
- periodically poll file descriptors using select() with non-blocking timeout (0)
- if serverFD is set for reading - attempt to accept() a client connection as client socket file descriptor clientFD. If clientFD == -1 - close serverFD, otherwise assign clientFD
- if clientFD is set for reading - attempt to read data from clientFD using recv(). If zero bytes read - client closed the connection so close clientFD, if less than zero bytes read (and errno != EAGAIN/EINPROGRESS/EWOULDBLOCK) - connection error so close clientFD, if bytesRead - TCP data received from client - handle and queue responseBuffer
- if clientFD is set for writing and responseBuffer has data - attempt to transmit data to clientFD using send(). This is where I intermittently get errno == 9 (EBADF), which does not make sense because I have just received data on that FD and it has not closed yet
With the exact same sockets code using a STA Wifi connection, I DO NOT have the same issue.
Re: Random ERRNO 9 (EBADF) on recv() or send()
Posted: Fri Oct 25, 2024 9:54 am
by FrancoisB-HEX
FrancoisB-HEX wrote: ↑Thu Oct 24, 2024 1:03 pm
Did you ever get a grip on this issue?
I am seeing something similar with an AP setup on ESP32-S3 using ESP-IDF v5.3.1. I use the berkley sockets api to setup a TCP server:
- assign server socket file descriptor (
serverFD) using
bind() and then
listen() with
O_NONBLOCK flag set
- periodically poll file descriptors using
select() with non-blocking timeout (0)
- if
serverFD is set for reading - attempt to
accept() a client connection as client socket file descriptor
clientFD. If
clientFD == -1 - close
serverFD, otherwise assign
clientFD
- if
clientFD is set for reading - attempt to read data from
clientFD using
recv(). If zero bytes read - client closed the connection so close
clientFD, if less than zero bytes read (and
errno != EAGAIN/EINPROGRESS/EWOULDBLOCK) - connection error so close
clientFD, if bytesRead - TCP data received from client - handle and queue responseBuffer
- if
clientFD is set for writing and responseBuffer has data - attempt to transmit data to
clientFD using
send().
This is where I intermittently get errno == 9 (EBADF), which does not make sense because I have just received data on that FD and it has not closed yet
With the exact same sockets code using a STA Wifi connection, I DO NOT have the same issue.
Here is the LWIP Debug logs when this happens:
Code: Select all
D (174239) lwip: lwip_recvfrom(56): addr=
D (174239) lwip: 192.168.4.2
D (174240) lwip: port=51689 len=2
D (174284) lwip: lwip_select(57, 0x3fca8310, 0x3fca8318, 0x0, tvsec=0 tvusec=0)
D (174285) lwip: lwip_selscan: fd=56 ready for reading
D (174285) lwip: ip_input: iphdr->dest 0xa493214 netif->ip_addr 0x104a8c0 (0x493214, 0x4a8c0, 0xa000000)
D (174286) lwip: ip_input: iphdr->dest 0xa493214 netif->ip_addr 0x100007f (0x14, 0x7f, 0xa493200)
D (174286) lwip: ip4_input: packet not for us.
D (174287) lwip: lwip_selscan: fd=56 ready for writing
D (174287) lwip: lwip_select: nready=2
D (174287) lwip: lwip_recvfrom(56, 0x3fcb7084, 1440, 0x1, ..)
D (174288) lwip: lwip_recv_tcp: top while sock->lastdata=0x0
D (174288) lwip: lwip_recv_tcp: netconn_recv err=-15, pbuf=0x0
D (174289) lwip: lwip_recv_tcp: p == NULL, error is "Connection closed."!
D (174289) lwip: lwip_recvfrom(56): addr=
D (174289) lwip: 192.168.4.2
D (174289) lwip: port=51689 len=0
D (174290) lwip: lwip_close(56)
D (174290) lwip: tcp_close: closing in
D (174290) lwip: State: CLOSE_WAIT
D (174290) lwip: ip4_output_if: ap2
D (174291) lwip: IP header:
D (174291) lwip: +-------------------------------+
D (174291) lwip: | 4 | 5 | 0x00 | 68 | (v, hl, tos, len)
D (174291) lwip: +-------------------------------+
D (174292) lwip: | 179 |000| 0 | (id, flags, offset)
D (174292) lwip: +-------------------------------+
D (174292) lwip: | 64 | 6 | 0xf0ad | (ttl, proto, chksum)
D (174292) lwip: +-------------------------------+
D (174292) lwip: | 192 | 168 | 4 | 1 | (src)
D (174293) lwip: +-------------------------------+
D (174293) lwip: | 192 | 168 | 4 | 2 | (dest)
D (174293) lwip: +-------------------------------+
D (174293) lwip: ip4_output_if: call netif->output()
D (174295) lwip: lwip_send(56, data=0x3fcf21e4, size=4, flags=0x0)
W (174295) ./main/hsm/wifi/tcp-server-hsm.c: tcpServerCb_ClientTransmit: Error sending to client 56 on instance 1: 9
Re: Random ERRNO 9 (EBADF) on recv() or send()
Posted: Mon Oct 28, 2024 7:44 am
by FrancoisB-HEX
FrancoisB-HEX wrote: ↑Fri Oct 25, 2024 9:54 am
FrancoisB-HEX wrote: ↑Thu Oct 24, 2024 1:03 pm
Did you ever get a grip on this issue?
I am seeing something similar with an AP setup on ESP32-S3 using ESP-IDF v5.3.1. I use the berkley sockets api to setup a TCP server:
- assign server socket file descriptor (
serverFD) using
bind() and then
listen() with
O_NONBLOCK flag set
- periodically poll file descriptors using
select() with non-blocking timeout (0)
- if
serverFD is set for reading - attempt to
accept() a client connection as client socket file descriptor
clientFD. If
clientFD == -1 - close
serverFD, otherwise assign
clientFD
- if
clientFD is set for reading - attempt to read data from
clientFD using
recv(). If zero bytes read - client closed the connection so close
clientFD, if less than zero bytes read (and
errno != EAGAIN/EINPROGRESS/EWOULDBLOCK) - connection error so close
clientFD, if bytesRead - TCP data received from client - handle and queue responseBuffer
- if
clientFD is set for writing and responseBuffer has data - attempt to transmit data to
clientFD using
send().
This is where I intermittently get errno == 9 (EBADF), which does not make sense because I have just received data on that FD and it has not closed yet
With the exact same sockets code using a STA Wifi connection, I DO NOT have the same issue.
Here is the LWIP Debug logs when this happens:
Code: Select all
D (174239) lwip: lwip_recvfrom(56): addr=
D (174239) lwip: 192.168.4.2
D (174240) lwip: port=51689 len=2
D (174284) lwip: lwip_select(57, 0x3fca8310, 0x3fca8318, 0x0, tvsec=0 tvusec=0)
D (174285) lwip: lwip_selscan: fd=56 ready for reading
D (174285) lwip: ip_input: iphdr->dest 0xa493214 netif->ip_addr 0x104a8c0 (0x493214, 0x4a8c0, 0xa000000)
D (174286) lwip: ip_input: iphdr->dest 0xa493214 netif->ip_addr 0x100007f (0x14, 0x7f, 0xa493200)
D (174286) lwip: ip4_input: packet not for us.
D (174287) lwip: lwip_selscan: fd=56 ready for writing
D (174287) lwip: lwip_select: nready=2
D (174287) lwip: lwip_recvfrom(56, 0x3fcb7084, 1440, 0x1, ..)
D (174288) lwip: lwip_recv_tcp: top while sock->lastdata=0x0
D (174288) lwip: lwip_recv_tcp: netconn_recv err=-15, pbuf=0x0
D (174289) lwip: lwip_recv_tcp: p == NULL, error is "Connection closed."!
D (174289) lwip: lwip_recvfrom(56): addr=
D (174289) lwip: 192.168.4.2
D (174289) lwip: port=51689 len=0
D (174290) lwip: lwip_close(56)
D (174290) lwip: tcp_close: closing in
D (174290) lwip: State: CLOSE_WAIT
D (174290) lwip: ip4_output_if: ap2
D (174291) lwip: IP header:
D (174291) lwip: +-------------------------------+
D (174291) lwip: | 4 | 5 | 0x00 | 68 | (v, hl, tos, len)
D (174291) lwip: +-------------------------------+
D (174292) lwip: | 179 |000| 0 | (id, flags, offset)
D (174292) lwip: +-------------------------------+
D (174292) lwip: | 64 | 6 | 0xf0ad | (ttl, proto, chksum)
D (174292) lwip: +-------------------------------+
D (174292) lwip: | 192 | 168 | 4 | 1 | (src)
D (174293) lwip: +-------------------------------+
D (174293) lwip: | 192 | 168 | 4 | 2 | (dest)
D (174293) lwip: +-------------------------------+
D (174293) lwip: ip4_output_if: call netif->output()
D (174295) lwip: lwip_send(56, data=0x3fcf21e4, size=4, flags=0x0)
W (174295) ./main/hsm/wifi/tcp-server-hsm.c: tcpServerCb_ClientTransmit: Error sending to client 56 on instance 1: 9
OK so my issue was that the client socket was in fact just closed before calling
send() on its file descriptor. The sequence was:
- select(readFDs,writeFDs) -> readFD set for clientSocketFD, and writeFD set for clientSocketFD (and writeBuffer has data to be sent)
- read(clientSocketFD) with 0 bytes -> clientSocket closed and clientSocketFD set for dereferencing
- write(clientSocketFD) ->
EBADF since socket closed
My underlying issue is still that TCP data is getting adversely delayed when using Soft AP which causes the TCP client to close the connection. With STA WiFi this does not occur. I have also disabled Nagle's algorithm (with TCP_NODELAY) which improves the situation, but does not solve it for this scenario.