IP suddenly stops responding after several hours running

mbratch
Posts: 303
Joined: Fri Jun 11, 2021 1:51 pm

IP suddenly stops responding after several hours running

Postby mbratch » Mon Jun 21, 2021 12:16 am

I have two separate ESP32 Pico kits running a simple REST API server. The applications are identical in function. One is written with the Arduino framework. The other I ported to ESP-IDF.

I am using 'curl' in a script on a Linux server to send REST API calls to each ESP32 as an endurance test. The tests run simultaneously against both boards: one ESP32 gets a request every second and the other every two seconds (half the rate).

The ESP32s respond fine to the REST API calls for several hours (8 to 18) and then suddenly and simultaneously stop responding to any requests. They do not respond to 'ping' either. If I reset the ESP32s, they start responding again with no changes on the client side (Linux test tool).

I finally captured the output that appears on the console when the failure occurs (IP stops responding):

I (10605501) wifi:state: run -> init (ec0)
I (10605501) wifi:pm stop, total sleep time: 9582020957 us / 10604598223 us

I (10605501) wifi:new:<11,0>, old:<11,0>, ap:<255,255>, sta:<11,0>, prof:1
W (10605511) httpd_txrx: httpd_sock_err: error in recv : 113

It seems to me that something must be happening on the network that both ESP32s react to and that for some reason kills the IP link. I can start both ESP32s at totally different times, but they fail simultaneously. Note that this network has both Windows and Linux machines on it.

Has anyone encountered this before? I've done a lot of searching and have found one or two similar problems, but those occurred within 10 or 15 minutes. In those cases the issue had something to do with NetBIOS. Since I have Windows clients on my network, some mysterious network activity may be occurring. ;)
Last edited by mbratch on Thu Jun 24, 2021 10:32 am, edited 7 times in total.

ESP_Sprite
Posts: 9724
Joined: Thu Nov 26, 2015 4:08 am

Re: AsyncWebServer suddenly stops responding to http requests after a couple of hours running

Postby ESP_Sprite » Mon Jun 21, 2021 1:11 am

Possibly a memory leak? You may want to print the amount of free memory (using e.g. heap_caps_get_free_size(MALLOC_CAP_8BIT)) to see if that consistently goes down.
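
A minimal sketch of the check suggested above, assuming an ESP-IDF build. `heap_caps_get_free_size(MALLOC_CAP_8BIT)` is the call named in the post; the names `heap_trend_t`, `heap_trend_update`, and `heap_monitor_task`, and the 10-second interval, are illustrative assumptions. The trend helper is plain C so it can be tried on a host; the task that actually reads the heap only builds under ESP-IDF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: remembers the first and the lowest free-heap sample. */
typedef struct {
    size_t first;    /* free bytes at the first sample */
    size_t lowest;   /* lowest free-byte count seen so far */
    bool   started;  /* false until the first sample arrives */
} heap_trend_t;

/* Record one sample. Returns how many bytes the current sample is below
 * the first one, or 0 if it is not below it. */
static size_t heap_trend_update(heap_trend_t *t, size_t free_bytes)
{
    assert(t != NULL);
    if (!t->started) {
        t->first = free_bytes;
        t->lowest = free_bytes;
        t->started = true;
    }
    if (free_bytes < t->lowest) {
        t->lowest = free_bytes;
    }
    return (free_bytes < t->first) ? (t->first - free_bytes) : 0;
}

#ifdef ESP_PLATFORM  /* defined by the ESP-IDF build system */
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "esp_heap_caps.h"
#include "esp_log.h"

/* Logs the free 8-bit-capable heap every 10 seconds (interval is an assumption). */
void heap_monitor_task(void *arg)
{
    (void)arg;
    heap_trend_t trend = {0};
    for (;;) {
        size_t free_now = heap_caps_get_free_size(MALLOC_CAP_8BIT);
        size_t drop = heap_trend_update(&trend, free_now);
        ESP_LOGI("heap", "free=%u lowest=%u drop=%u",
                 (unsigned)free_now, (unsigned)trend.lowest, (unsigned)drop);
        vTaskDelay(pdMS_TO_TICKS(10000));
    }
}
#endif
```

Starting it from `app_main` with something like `xTaskCreate(heap_monitor_task, "heap_mon", 4096, NULL, 1, NULL);` would print one line per interval. A `drop` value that keeps growing over hours would point toward a leak; a flat value would point away from one.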

mbratch
Posts: 303
Joined: Fri Jun 11, 2021 1:51 pm

Re: AsyncWebServer suddenly stops responding to http requests after a couple of hours running

Postby mbratch » Mon Jun 21, 2021 2:13 am

ESP_Sprite wrote:
Mon Jun 21, 2021 1:11 am
Possibly a memory leak? You may want to print the amount of free memory (using e.g. heap_caps_get_free_size(MALLOC_CAP_8BIT)) to see if that consistently goes down.
Possibly. However, I'm sending packets to one ESP32 at double the rate that I'm sending them to the other, yet they stop responding at exactly (I mean EXACTLY) the same time. I've also had both of them run for nearly a full day before they stopped simultaneously. That's why I suspect something is happening on the network that the ESP32s are reacting to and that I don't know about.

I'll try displaying the free memory during the test. That's a good suggestion. Ultimately, maybe Wireshark...

mbratch
Posts: 303
Joined: Fri Jun 11, 2021 1:51 pm

Re: AsyncWebServer suddenly stops responding to http requests after a couple of hours running

Postby mbratch » Thu Jun 24, 2021 12:23 am

ESP_Sprite wrote:
Mon Jun 21, 2021 1:11 am
Possibly a memory leak? You may want to print the amount of free memory (using e.g. heap_caps_get_free_size(MALLOC_CAP_8BIT)) to see if that consistently goes down.
I ran `heap_caps_get_free_size(MALLOC_CAP_8BIT)` at regular intervals. The output was steady as a rock, so at least it did not show any sign of a memory leak.
Last edited by mbratch on Thu Jun 24, 2021 10:31 am, edited 1 time in total.

mbratch
Posts: 303
Joined: Fri Jun 11, 2021 1:51 pm

Re: IP suddenly stops responding after several hours running

Postby mbratch » Thu Jun 24, 2021 10:21 am

I converted my application over to ESP-IDF and I still have the identical problem. Here's a message I see on the monitor when the wifi connection drops:

I (10605501) wifi:state: run -> init (ec0)
I (10605501) wifi:pm stop, total sleep time: 9582020957 us / 10604598223 us

I (10605501) wifi:new:<11,0>, old:<11,0>, ap:<255,255>, sta:<11,0>, prof:1
W (10605511) httpd_txrx: httpd_sock_err: error in recv : 113

MasterExploder
Posts: 26
Joined: Wed Feb 08, 2023 6:17 pm

Re: IP suddenly stops responding after several hours running

Postby MasterExploder » Thu Dec 07, 2023 4:18 pm

Did you manage to solve this?

mbratch
Posts: 303
Joined: Fri Jun 11, 2021 1:51 pm

Re: IP suddenly stops responding after several hours running

Postby mbratch » Thu Dec 07, 2023 6:01 pm

MasterExploder wrote:
Thu Dec 07, 2023 4:18 pm
Did you manage to solve this?
Yes.

Originally, I worked around the problem by writing my own wifi event handler in ESP-IDF that reconnects to the AP whenever an unexpected disconnect occurs.

As for the root cause: there's a problem that happens with only some AP routers. When I wrote my own handler, I displayed the reason code for the disconnect, which indicated a "MIC failure". I submitted a ticket on GitHub, and all the details are on that ticket. I believe a fix was eventually made in the ESP-IDF library, but I was unable to test whether it fixed the root cause in my case because I had since changed my router, and my new router did not exhibit the issue.
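
The handler itself is not shown in this thread, so the following is only a rough sketch of the workaround described above, assuming an ESP-IDF station-mode setup with the default event loop. The ESP-IDF calls (`esp_event_handler_instance_register`, `esp_wifi_connect`, the `WIFI_EVENT_STA_DISCONNECTED` event and its `wifi_event_sta_disconnected_t` payload) are real; the helper names `reason_label`, `is_unexpected`, `on_sta_disconnected`, and `wifi_watch_register`, and the rule for what counts as "unexpected", are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* IEEE 802.11 reason codes used below. ESP-IDF exposes the same values as
 * WIFI_REASON_ASSOC_LEAVE and WIFI_REASON_MIC_FAILURE. */
enum { REASON_ASSOC_LEAVE = 8, REASON_MIC_FAILURE = 14 };

/* Hypothetical helper: short label for the reason codes this sketch cares about. */
static const char *reason_label(uint8_t reason)
{
    switch (reason) {
    case REASON_ASSOC_LEAVE: return "station left";
    case REASON_MIC_FAILURE: return "MIC failure";
    default:                 return "other";
    }
}

/* Assumption: a disconnect is "unexpected" unless the station left on its own. */
static bool is_unexpected(uint8_t reason)
{
    return reason != REASON_ASSOC_LEAVE;
}

#ifdef ESP_PLATFORM  /* defined by the ESP-IDF build system */
#include "esp_err.h"
#include "esp_event.h"
#include "esp_log.h"
#include "esp_wifi.h"

static const char *TAG = "wifi_watch";

/* Runs on the default event loop whenever the station loses its AP. */
static void on_sta_disconnected(void *arg, esp_event_base_t base,
                                int32_t id, void *data)
{
    (void)arg; (void)base; (void)id;
    const wifi_event_sta_disconnected_t *ev = data;

    /* Log the reason code so a recurring cause (e.g. MIC failure) is visible. */
    ESP_LOGW(TAG, "disconnected, reason %d (%s)",
             ev->reason, reason_label(ev->reason));

    if (is_unexpected(ev->reason)) {
        esp_err_t err = esp_wifi_connect();
        if (err != ESP_OK) {
            ESP_LOGE(TAG, "esp_wifi_connect: %s", esp_err_to_name(err));
        }
    }
}

/* Call once after esp_event_loop_create_default() and esp_wifi_init(). */
void wifi_watch_register(void)
{
    ESP_ERROR_CHECK(esp_event_handler_instance_register(
        WIFI_EVENT, WIFI_EVENT_STA_DISCONNECTED,
        &on_sta_disconnected, NULL, NULL));
}
#endif
```

Reconnecting immediately keeps the sketch short. A real application would probably add a retry limit or a delay between attempts so that a permanently unavailable AP does not cause a tight reconnect loop.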
