httpd_queue_work() - memory leak when client fails

Volvox
Posts: 21
Joined: Sat Mar 18, 2017 7:54 pm

httpd_queue_work() - memory leak when client fails

Postby Volvox » Wed May 10, 2023 9:13 am

Hello,
We have an ethernet device based on ESP32. So no Wifi, but I do not think it is related to Ethernet or Wifi frontend.
Using ADF2.5 (and IDF4.4.4)
I am struggling with a problem concerning open WebSocket connections.
We have a listener that holds open Websocket connections to control the ESP via a website.
So far everything is working, but:
If we have an open Websocket connection from a laptop/computer to the ESP and then plugging the network cable off the laptop, then instantly the heap decreases until empty :)
I tried to figured out where this is happening, and it seems to be a problem that the TCP connection is still opened, but the remote client is not present any longer.
Attached the piece of my code where the problem comes up:

Code: Select all

static void ws_async_send(void *arg)
{
    struct async_resp_arg *resp_arg = (async_resp_arg_t*)arg;
    httpd_handle_t hd = resp_arg->hd;
    int fd = resp_arg->fd;
    httpd_ws_frame_t ws_pkt;
    memset(&ws_pkt, 0, sizeof(httpd_ws_frame_t));
    if(resp_arg != NULL) {
    	ws_pkt.payload = (uint8_t*)resp_arg->message;
    	ws_pkt.len = strlen(resp_arg->message);
    }
    ws_pkt.type = HTTPD_WS_TYPE_TEXT;
    httpd_ws_send_frame_async(hd, fd, &ws_pkt);
    free(resp_arg->message);
    free(resp_arg);
}

void _ws_server_send_messages(const char* mymessage, httpd_req_t* req)
{
	if (!_is_websocket_api_init) { // httpd might not have been created by now
		ESP_LOGD(TAG, "Websocket not yet initialized");
		return;
    }
	if(req == NULL) { //broadcast to all clients
		size_t clients = WS_MAX_CLIENTS;
		int    client_fds[WS_MAX_CLIENTS];
		if (httpd_get_client_list(wsServer, &clients, client_fds) == ESP_OK) {
			for (size_t i=0; i < clients; ++i) {
				int sock = client_fds[i];
				if (httpd_ws_get_fd_info(wsServer, sock) == HTTPD_WS_CLIENT_WEBSOCKET) {
					//ESP_LOGE(TAG, "Active client (fd=%d) -> sending async message", sock);
					struct async_resp_arg *resp_arg =  (async_resp_arg_t*)malloc(sizeof(struct async_resp_arg));
					resp_arg->hd = wsServer;
					resp_arg->fd = sock;
					if(mymessage != NULL)
						resp_arg->message = strdup(mymessage);
					if (httpd_queue_work(resp_arg->hd, ws_async_send, resp_arg) != ESP_OK) {
						free(resp_arg->message);
						free(resp_arg);
						ESP_LOGE(TAG, "httpd_queue_work failed!");
						break;
					}
				}
				else
					ESP_LOGV(TAG, "Client (fd=%d) no WS:%d", sock, httpd_ws_get_fd_info(wsServer, sock));
			}
		} else {
			ESP_LOGE(TAG, "httpd_get_client_list failed!");
			return;
		}
	} else { //send to specific Client
		struct async_resp_arg *resp_arg = (async_resp_arg_t*)malloc(sizeof(struct async_resp_arg));
		resp_arg->hd = req->handle;
		resp_arg->fd = httpd_req_to_sockfd(req);
		if(mymessage != NULL)
			resp_arg->message = strdup(mymessage);
		if (httpd_queue_work(req->handle, ws_async_send, resp_arg) != ESP_OK) {
			free(resp_arg->message);
			free(resp_arg);
			ESP_LOGE(TAG, "httpd_queue_work failed!");
			return;
		}
	}
}
So if the remote (laptop) is disconnected, the callback "ws_async_send " of the httpd_queue_work is not called any more, but httpd_queue_work does return "ESP_OK". Then our messages are occupying memory more and more until no more memory left, because this socket will never be closed or freed or something, it just stocks in this state being a memory hole every time when calling the httpd_queue_work() on the bad socket :)

What I tried so far:
- Setting up the SO_LINGER, that the socket will be forced to close. -> Better because the socket is freed after some time and not blocked, but the free heap decreases too fast to wait for this, because we are sending lots of messages to the client when connected.
- Build a FreeRTOS Queue around our "need to send" packets to have an upper limit of memory consumption, even if the callback is not called any more, then the queue is full and drops needed packets and clears on the next try when Websocket is recovered -> helps me to ensure the rest of the ESP having enough space to run properly, but it is only a bad workaround for the Websocket API itself.

Do you have an idea what is happening there and how to avoid this state that the callback is not called anymore, how to do a better memory management when using the httpd_queue_work() function, or is this a bug somewhere inside IDF?

If you need further information I can share, I tried to reduce the posted code to a minimum.
Thanks a lot if someone has an idea.

MicroController
Posts: 1725
Joined: Mon Oct 17, 2022 7:38 pm
Location: Europe, Germany

Re: httpd_queue_work() - memory leak when client fails

Postby MicroController » Wed May 10, 2023 1:55 pm

What's the (last) return value from httpd_ws_send_frame_async when the connection is broken?

Volvox
Posts: 21
Joined: Sat Mar 18, 2017 7:54 pm

Re: httpd_queue_work() - memory leak when client fails

Postby Volvox » Wed May 10, 2023 2:14 pm

The problem ist that this function (httpd_ws_send_frame_async) is never called any longer because it is inside the callback function ws_async_send() that is never called any longer once the connection is broken

MicroController
Posts: 1725
Joined: Mon Oct 17, 2022 7:38 pm
Location: Europe, Germany

Re: httpd_queue_work() - memory leak when client fails

Postby MicroController » Wed May 10, 2023 5:31 pm

Volvox wrote:
Wed May 10, 2023 2:14 pm
The problem ist that this function (httpd_ws_send_frame_async) is never called any longer because it is inside the callback function ws_async_send() that is never called any longer once the connection is broken
Actually, the problem is that you keep allocating memory and trying to send more data irrespective of whether or not the data can be received and acknowledged by the other side fast enough or at all.

As you are aware, in face of non-infinite buffer space/memory one valid solution is to limit the amount of messages/data pending transmission, and decide what to do if/when this threshold is reached.

I still suggest to first try and detect a broken connection by checking return values or the state of the socket before pushing more data into the buffers.

Volvox
Posts: 21
Joined: Sat Mar 18, 2017 7:54 pm

Re: httpd_queue_work() - memory leak when client fails

Postby Volvox » Wed May 24, 2023 12:24 pm

Yes, thanks, I try to optimize this workaround.
What I do not understand is, that the httpd_queue_work() function is nearly always returning ESP_OK even if there is no remote available any more. My assumption was that the callback will be called somewhen even if the remote is lost on this socket.(that is where I freed the memory at the moment). Otherwise I have no possibility to access the memory any more I sent to this httpd_queue_work(). And I am never sure if a special packet sent to the httpd_queue_work() is really sent or processed. It is more a "should be sent".

I think it is a general problem to detect when a tcp connection is broken, and then trying to free failed prepared messages in a queue or buffer or whatever to ensure no memory leak. I think it would be great if there can be implemented some "callback on failure" as well because otherwise every message allocated needs to be stored in a separate queue or saving the pointers until it is really sent by the async callback to free it afterwards.

MicroController
Posts: 1725
Joined: Mon Oct 17, 2022 7:38 pm
Location: Europe, Germany

Re: httpd_queue_work() - memory leak when client fails

Postby MicroController » Wed May 24, 2023 8:35 pm

the httpd_queue_work() function is nearly always returning ESP_OK even if there is no remote available
This is as expected. The "work queue" is not correlated to any connection or the like, and does neither know nor care if the "work" you enqueue operates on a connection or does something completely different.

As you say that your callback ceases to be executed, I imagine that in case of a broken connection httpd_ws_send_frame_async may block until a timeout notices the break. This would in turn block the processing of the work queue, i.e. subsequent callbacks.
Again, the solution would be to limit either the amount of messages/data "in flight" or the time a queued message is allowed to remain "unsent". If you limit the max. pending messages to a constant number you can potentially use a ring buffer to ping-pong messages and available message buffers between the producing and the consuming tasks. The edge case is to only allow a single message to be pending, in which case the ring buffer basically degenerates to a single binary flag.

Who is online

Users browsing this forum: Bing [Bot] and 200 guests