WIFI sockets fail after a while

FrankJensen
Posts: 52
Joined: Sun Mar 10, 2024 9:34 pm

WIFI sockets fail after a while

Postby FrankJensen » Sat Mar 16, 2024 10:47 pm

Hi there.

I am running 2 sockets, a http_client and a webserver on a ESP32-S3.
I have in configmenu changed the number of sockets to 16.

But after around 15 minutes, it fails. In a way, that all services fail at the same time.
If I run one service at a time, it seems like they run forever.

My suspesion is, that when one of them fails, due to a bad connection, faulty data or something else, the whole house of cards falls. And it never recovers. Even though, I make a delay for 5 minutes on all services, when they fail, in order to any TIME_WAIT to time out. But the situation never recovers. Only a reboot helps.

I do use SO_REUSEADDR on the sockets, but not on the client and webserver. And I do close the sockets, after each request. If I didnt, I would run out of sockets a lot earlier, since the sockets and client runs every 10 seconds.

What I really would like, is a way to list all the active sockets, and what that are/was connected to. That would make it a lot easier to find the fault, in case something hangs. So how do I list all the actice sockets?

FrankJensen
Posts: 52
Joined: Sun Mar 10, 2024 9:34 pm

Re: WIFI sockets fail after a while

Postby FrankJensen » Sun Mar 17, 2024 12:36 am

As seen here, all servives fail. They run in separate tasks. In this particular case, it happened, when I opened a webpage. But even though I have 7-8 connections, I think 16 sockets should be enough, unless some are left in TIME_WAIT ..... There are 1 Huawei inverter, and 3 MUST inverters. So I would expect that to be 4 sockets?

I (776928) Must: Data received successfully
W (777878) crc: Telegram read. CRC: 4CDF = 4CDF. PASS = YES
W (779438) Must: Inverter: 2, packet: 0
I (779448) Must: Data sent successfully
I (779718) Must: Data received successfully
E (780888) uart_mbusmeter: Status: 0c
W (782458) Must: Inverter: 2, packet: 1
I (782458) Must: Data sent successfully
I (782788) Must: Data received successfully
W (785468) Must: Inverter: 2, packet: 2
I (785468) Must: Data sent successfully
I (785858) Must: Data received successfully
I (786928) Huawei: Data sent successfully
I (787928) Huawei: Data received successfully
W (787928) Huawei: Sleeping
E (787928) Huawei: read ERROR
W (788068) crc: Telegram read. CRC: 4D05 = 05D0. PASS = NO
W (788488) Must: Inverter: 0, packet: 0
I (788498) Must: Data sent successfully
I (788928) Must: Data received successfully
E (789418) uart_mbusmeter: Status: 0c
W (791508) Must: Inverter: 0, packet: 1
I (791508) Must: Data sent successfully
I (791998) Must: Data received successfully
W (794518) Must: Inverter: 0, packet: 2
I (794538) Must: Data sent successfully
I (795068) Must: Data received successfully
W (797528) Must: Inverter: 1, packet: 0
I (797528) Must: Data sent successfully
I (797838) Must: Data received successfully
I (797928) Huawei: Data sent successfully
E (798018) uart_mbusmeter: Status: 0c
W (798138) crc: Telegram read. CRC: 0D9F = 0D9F. PASS = YES
I (798928) Huawei: Data received successfully
W (798928) Huawei: Sleeping
E (798928) Huawei: read ERROR
W (800538) Must: Inverter: 1, packet: 1
I (800578) Must: Data sent successfully
I (800898) Must: Data received successfully
W (803548) Must: Inverter: 1, packet: 2
E (806618) uart_mbusmeter: Status: 0c
W (807308) crc: Telegram read. CRC: 632C = 632C. PASS = YES
E (815068) uart_mbusmeter: Status: 0c
W (817508) crc: Telegram read. CRC: DB22 = DB22. PASS = YES
E (823638) uart_mbusmeter: Status: 0c
W (827648) crc: Telegram read. CRC: 0FE9 = 0FE9. PASS = YES
E (832228) uart_mbusmeter: Status: 0c
W (837918) crc: Telegram read. CRC: BBFB = BBFB. PASS = YES
E (840798) uart_mbusmeter: Status: 0c
W (848018) crc: Telegram read. CRC: FC9D = FC9D. PASS = YES
E (849358) uart_mbusmeter: Status: 0c
E (857868) uart_mbusmeter: Status: 0c
W (858158) crc: Telegram read. CRC: 0686 = 0686. PASS = YES
E (866898) uart_mbusmeter: Status: 0c
W (867428) crc: Telegram read. CRC: 093E = 71D3. PASS = NO
E (875948) uart_mbusmeter: Status: 0c
W (877518) crc: Telegram read. CRC: 9C55 = 9C55. PASS = YES
E (885108) uart_mbusmeter: Status: 0c
W (887668) crc: Telegram read. CRC: B5DE = B5DE. PASS = YES
E (893618) uart_mbusmeter: Status: 0c
W (897858) crc: Telegram read. CRC: 077C = 077C. PASS = YES
E (902228) uart_mbusmeter: Status: 0c
W (908028) crc: Telegram read. CRC: 6537 = 6537. PASS = YES
E (910848) uart_mbusmeter: Status: 0c
W (918268) crc: Telegram read. CRC: 6F6A = F2D4. PASS = NO
E (919618) uart_mbusmeter: Status: 0c
W (927438) crc: Telegram read. CRC: 2ED0 = 7F94. PASS = NO
E (928128) uart_mbusmeter: Status: 0c
E (936738) uart_mbusmeter: Status: 0c
W (937538) crc: Telegram read. CRC: 2279 = 2279. PASS = YES

FrankJensen
Posts: 52
Joined: Sun Mar 10, 2024 9:34 pm

Re: WIFI sockets fail after a while

Postby FrankJensen » Sun Mar 17, 2024 9:17 pm

I now have only one service running, together with the webserver. And everyghing just halts after some minutes. I have anabled all logging, but no errors are thrown. And the service have running, uses only reuses one socket (57), so its not using up the sockets.

The Json is loaded from the webserver with a ajax call, around every 20 seconds.
Any ideas?

I (1384325) wifi: Connected..... -73 God
I (1385285) wifi: Connected..... -73 God
W (1385945) Must: Inverter: 1, packet: 1
W (1385945) SOCKET_STAT: OPEN No: 0, Socket: 57, app: Must2, last: Must1
I (1385945) Must: Data sent successfully
I (1386345) wifi: Connected..... -74 God
I (1386445) Must: Data received successfully
W (1386445) SOCKET_STAT: CLOSE No: 0, Socket: 57, app: Closed, last: Must2
I (1387305) wifi: Connected..... -73 God
I (1388265) wifi: Connected..... -73 God
W (1388965) Must: Inverter: 1, packet: 2
W (1388965) SOCKET_STAT: OPEN No: 0, Socket: 57, app: Must2, last: Must2
I (1388965) Must: Data sent successfully
I (1389325) wifi: Connected..... -73 God
I (1389515) Must: Data received successfully
W (1389515) SOCKET_STAT: CLOSE No: 0, Socket: 57, app: Closed, last: Must2
I (1390285) wifi: Connected..... -74 God
I (1391345) wifi: Connected..... -73 God
W (1391975) Must: Inverter: 2, packet: 1
W (1391975) SOCKET_STAT: OPEN No: 0, Socket: 57, app: Must3, last: Must2
I (1391975) Must: Data sent successfully
I (1392285) Must: Data received successfully
W (1392285) SOCKET_STAT: CLOSE No: 0, Socket: 57, app: Closed, last: Must3
I (1392305) wifi: Connected..... -73 God
E (1392755) uart_mbusmeter: Status: 0c
I (1393265) wifi: Connected..... -73 God
I (1394325) wifi: Connected..... -73 God
W (1394985) Must: Inverter: 2, packet: 2
W (1394985) SOCKET_STAT: OPEN No: 0, Socket: 57, app: Must3, last: Must3
I (1394985) Must: Data sent successfully
I (1395285) wifi: Connected..... -74 God
I (1395355) Must: Data received successfully
W (1395365) SOCKET_STAT: CLOSE No: 0, Socket: 57, app: Closed, last: Must3
I (1396345) wifi: Connected..... -75 God
I (1397305) wifi: Connected..... -74 God
W (1398005) Must: Inverter: 0, packet: 1
W (1398005) SOCKET_STAT: OPEN No: 0, Socket: 57, app: Must1, last: Must3
I (1398015) Must: Data sent successfully
I (1398265) wifi: Connected..... -74 God
I (1398415) Must: Data received successfully
W (1398415) SOCKET_STAT: CLOSE No: 0, Socket: 57, app: Closed, last: Must1
I (1399325) wifi: Connected..... -74 God
I (1400285) wifi: Connected..... -74 God
W (1401025) Must: Inverter: 0, packet: 2
W (1401025) SOCKET_STAT: OPEN No: 0, Socket: 57, app: Must1, last: Must1
I (1401025) Must: Data sent successfully
I (1401345) wifi: Connected..... -73 God
I (1401485) Must: Data received successfully
W (1401495) SOCKET_STAT: CLOSE No: 0, Socket: 57, app: Closed, last: Must1
E (1401775) uart_mbusmeter: Status: 0c
I (1402305) wifi: Connected..... -73 God
I (1403265) wifi: Connected..... -73 God
W (1404035) Must: Inverter: 1, packet: 1
W (1404035) SOCKET_STAT: OPEN No: 0, Socket: 57, app: Must2, last: Must1
I (1404035) Must: Data sent successfully
I (1404325) wifi: Connected..... -73 God
I (1404555) Must: Data received successfully
W (1404555) SOCKET_STAT: CLOSE No: 0, Socket: 57, app: Closed, last: Must2
I (1405285) wifi: Connected..... -73 God
I (1406345) wifi: Connected..... -73 God
W (1407045) Must: Inverter: 1, packet: 2
W (1407045) SOCKET_STAT: OPEN No: 0, Socket: 57, app: Must2, last: Must2
I (1407045) Must: Data sent successfully
I (1407305) wifi: Connected..... -73 God
I (1407625) Must: Data received successfully
W (1407635) SOCKET_STAT: CLOSE No: 0, Socket: 57, app: Closed, last: Must2
I (1408265) wifi: Connected..... -73 God
I (1409325) wifi: Connected..... -73 God
W (1410055) Must: Inverter: 2, packet: 1
W (1410055) SOCKET_STAT: OPEN No: 0, Socket: 57, app: Must3, last: Must2
I (1410055) Must: Data sent successfully
I (1410285) wifi: Connected..... -73 God
E (1410315) uart_mbusmeter: Status: 0c
I (1410395) Must: Data received successfully
W (1410395) SOCKET_STAT: CLOSE No: 0, Socket: 57, app: Closed, last: Must3
I (1411345) wifi: Connected..... -73 God
W (1412235) webserver: Enter
I (1412265) GETJSON: Received: getDeviceJson /getDeviceJson?packet=19
I (1412285) JSON_maker: {"datestamp":"2024.03.16 23:00:00", "timestamp":"2024.03.17 21:10:41", "packetstamp":"2024.03.17 21:10:31", "Device":[{"reg":0, "nam":"Time", "val":0.00, "txt":"17.03.24 21:47:33", "unt":""}, {"reg":0, "nam":"Model", "val":0.00, "txt":"PV 1800", "unt":""}, {"reg":0, "nam":"Status", "val":0.00, "txt":"ON", "unt":""}, {"reg":8, "nam":"Output prioriry", "val":0.00, "txt":"SBU", "unt":""}, {"reg":18, "nam":"Op/aflad limit2", "val":44.00, "txt":"44.0", "unt":"[V]"}, {"reg":22, "nam":"Oplad BULK", "val":56.00, "txt":"56.0",
"unt":"[V]"}, {"reg":21, "nam":"Oplad FLOAT", "val":55.00, "txt":"55.0", "unt":"[V]"}, {"reg":17, "nam":"Aflad stop", "val":45.00, "txt":"45.0", "unt":"[V]"}, {"reg":26, "nam":"Aflad end stop", "val":43.20, "txt":"43.2", "unt":"[V]"}, {"reg":24, "nam":"Max grid", "val":1.00, "txt":"1.0", "unt":"[A]"}, {"reg":42, "nam":"Lade prioritet", "val":1.00, "txt":"SOL or UTI", "unt":""}, {"reg":3, "nam":"Search mode", "val":1.00, "txt":"ON", "unt":""}, {"reg":0, "nam":"Mode", "val":0.00, "txt":"Grid tie", "unt":""}, {"reg":6, "nam":"Grid", "val":232.50, "txt":"232.5", "unt":"[V]"}, {"reg":5, "nam":"Load", "val":232.20, "txt":"232.2", "unt":"[V]"}, {"reg":4, "nam":"Batteri", "val":53.10, "txt":"53.1", "unt":"[V]"}, {"reg":10, "nam":"Grid", "val":2.60, "txt":"2.6", "unt":"[A]"}, {"reg":11, "nam":"Load", "val":3.90, "txt":"3.9", "unt":"[A]"}, {"reg":9, "nam":"Batteri", "val":3.10, "txt":"3.1", "unt":"[A]"}, {"reg":13, "nam":"Grid", "val":-73.00, "txt":"-73", "unt":"[W]"}, {"reg":14, "nam":"Load", "val":790.00, "txt":"790", "unt":"[W]"}, {"reg":12, "nam":"Batteri", "val":-729.00, "txt":"-729", "unt":"[W]"}, {"reg":15, "nam":"Load", "val":15.00, "txt":"15", "unt":"[%]"}]}
I (1412305) wifi: Connected..... -73 God
W (1412445) webserver: Sending
W (1412445) webserver: After sending
W (1413075) Must: Inverter: 2, packet: 2
I (1413295) wifi: Connected..... -73 God
I (1414255) wifi: Connected..... -73 God
I (1415315) wifi: Connected..... -73 God
I (1416275) wifi: Connected..... -73 God
I (1417335) wifi: Connected..... -73 God
I (1418295) wifi: Connected..... -73 God
E (1419005) uart_mbusmeter: Status: 0c
I (1419255) wifi: Connected..... -73 God
I (1420315) wifi: Connected..... -72 God
I (1421275) wifi: Connected..... -72 God
I (1422335) wifi: Connected..... -72 God
I (1423295) wifi: Connected..... -73 God
I (1424255) wifi: Connected..... -74 God
I (1425315) wifi: Connected..... -74 God
I (1426275) wifi: Connected..... -74 God
I (1427335) wifi: Connected..... -78 OK
E (1427575) uart_mbusmeter: Status: 0c
I (1428295) wifi: Connected..... -75 God
I (1429255) wifi: Connected..... -76 OK
I (1430315) wifi: Connected..... -76 OK
W (1430345) wifi:[ADDBA]rx delba, code:39, delete tid:0
I (1431275) wifi: Connected..... -76 OK
I (1432335) wifi: Connected..... -76 OK
I (1433295) wifi: Connected..... -76 OK
I (1434255) wifi: Connected..... -76 OK
I (1435315) wifi: Connected..... -75 God
E (1436045) uart_mbusmeter: Status: 0c
I (1436275) wifi: Connected..... -76 OK
I (1437335) wifi: Connected..... -76 OK
I (1438295) wifi: Connected..... -76 OK
I (1439255) wifi: Connected..... -76 OK

ESP_Sprite
Posts: 9764
Joined: Thu Nov 26, 2015 4:08 am

Re: WIFI sockets fail after a while

Postby ESP_Sprite » Mon Mar 18, 2024 2:50 am

Maybe a memory leak? You could try spinning up a separate task that prints the amount of available memory every x seconds, see if that goes down.

FrankJensen
Posts: 52
Joined: Sun Mar 10, 2024 9:34 pm

Re: WIFI sockets fail after a while

Postby FrankJensen » Mon Mar 18, 2024 12:37 pm

Enabling logging for esp_wifi, communication ends with:

W (1198238) wifi:m f null

what does that mean?
Wifi is still connected, but all communication on webserver, http_client and sockets stop.


I (1195228) Must: Data sent successfully
I (1195538) Must: Data received successfully
W (1195538) SOCKET_STAT: CLOSE No: 0, Socket: 51, app: Closed, last: Must2
I (1196088) wifi: Connected..... -82 OK
I (1197048) wifi: Connected..... -82 OK
I (1198108) wifi: Connected..... -81 OK
W (1198238) Must: Inverter: 2, packet: 1
W (1198238) wifi:m f null

I (1199068) wifi: Connected..... -82 OK
I (1200028) wifi: Connected..... -82 OK
E (1200068) uart_mbusmeter: Status: 0c
I (1201088) wifi: Connected..... -82 OK
I (1202048) wifi: Connected..... -83 OK
I (1203108) wifi: Connected..... -83 OK
W (1203848) crc: Telegram read. CRC: 74A9 = 74A9. PASS = YES
I (1204068) wifi: Connected..... -82 OK
I (1205028) wifi: Connected..... -83 OK

FrankJensen
Posts: 52
Joined: Sun Mar 10, 2024 9:34 pm

Re: WIFI sockets fail after a while

Postby FrankJensen » Mon Mar 18, 2024 1:31 pm

You might be right.... My SRAM seems very low... How ever, my heap seems way too big?
But my SRAM counts down, and around 12Kb the wifi_m f null occurs....

Is this correct?

Code: Select all

size_t free_sram = heap_caps_get_free_size(MALLOC_CAP_INTERNAL);
size_t free_heap = esp_get_free_heap_size();
size_t free_psram = heap_caps_get_free_size(MALLOC_CAP_SPIRAM);
I (1197338) MEMORY: Free heap memory: 8343.90 Kb, free PSRAM: 8331.59 Kb, , free SRAM: 12.31 Kb
I (1197338) wifi: Connected..... -75 God
W (1198198) Must: Inverter: 2, packet: 1
W (1198198) wifi:m f null

I (1198298) MEMORY: Free heap memory: 8343.40 Kb, free PSRAM: 8331.59 Kb, , free SRAM: 11.81 Kb
I (1198298) wifi: Connected..... -75 God

ESP_Sprite
Posts: 9764
Joined: Thu Nov 26, 2015 4:08 am

Re: WIFI sockets fail after a while

Postby ESP_Sprite » Mon Mar 18, 2024 2:10 pm

That sounds like your issue. WiFi in particular needs some things to be in SRAM rather than external RAM, and that 12K that is left is likely fragmented enough not to be able to give up a buffer. Note that your total heap is SRAM plus external RAM, that's why it's so big.

Craige Hales
Posts: 94
Joined: Tue Sep 07, 2021 12:07 pm

Re: WIFI sockets fail after a while

Postby Craige Hales » Mon Mar 18, 2024 2:54 pm

agree about memory leaks. Something like this

Code: Select all

unsigned int totFreeBytes() { // Total free bytes in the heap. Equivalent to multi_free_heap_size().
    heap_caps_get_info(&sMHIT, MALLOC_CAP_INTERNAL | MALLOC_CAP_8BIT);
    return sMHIT.total_free_bytes;
}
static unsigned int total_allocated_bytes() { // Total bytes allocated to data in the heap.
    return sMHIT.total_allocated_bytes;
}
static unsigned int largest_free_block() { // Size of largest free block in the heap. This is the largest malloc-able size.
    return sMHIT.largest_free_block;
}
static unsigned int minimum_free_bytes() { // Lifetime minimum free heap size. Equivalent to multi_minimum_free_heap_size().
    return sMHIT.minimum_free_bytes;
}
unsigned int minFreeBytes(){
    heap_caps_get_info(&sMHIT, MALLOC_CAP_INTERNAL | MALLOC_CAP_8BIT);
    return sMHIT.minimum_free_bytes;
}
That code expects a monitor task to call totFreeBytes() to reload the structure, then the other functions.
I find keeping an eye on the minFreeBytes really helpful; in my typical ESP32 projects >50K is good and <20K is going to fail eventually.
Here's a web page that watches several values:
meters.png
meters.png (67.87 KiB) Viewed 1882 times
That's 65 days since the last reboot.
Craige

FrankJensen
Posts: 52
Joined: Sun Mar 10, 2024 9:34 pm

Re: WIFI sockets fail after a while

Postby FrankJensen » Mon Mar 18, 2024 8:35 pm

ESP_Sprite, Craige - thank you.

I found my malloc, that was not free'ed sometimes.

The min Heap is very usefull :-)

Code: Select all

multi_heap_info_t info;
heap_caps_get_info(&info, MALLOC_CAP_INTERNAL | MALLOC_CAP_8BIT);
ESP_LOGI("MEMORY2", "Total free: %.2f Kb, total alloc:  %.2f Kb, , Largest block:  %.2f Kb, Min free: %.2f Kb", (float)info.total_free_bytes/1000, (float)info.total_allocated_bytes/1000, (float)info.largest_free_block/1000, (float)info.minimum_free_bytes/1000);
And I scooped some of my structs to MALLOC_CAP_SPIRAM

So looks like, I am running stable > 90 kB free. Lets see in a couple of days :)

joglz8
Posts: 20
Joined: Wed Jan 09, 2019 9:46 am

Re: WIFI sockets fail after a while

Postby joglz8 » Tue Mar 19, 2024 10:09 am

This is off topic, but I have to ask Craige about that web page with gauges.

Did you program that? Is it available on line?

I want to make a gauge display for my combine and those gauges would be ideal.

Thanks,
JP.
Old controls guy, SW Ontario, Canada .

Who is online

Users browsing this forum: atx823, Google [Bot], MicroController and 146 guests