CORRUPT HEAP after reconnect

boarchuz
Posts: 606
Joined: Tue Aug 21, 2018 5:28 am

Re: CORRUPT HEAP after reconnect

Postby boarchuz » Mon Oct 24, 2022 11:28 pm

marclee wrote:
Sat Oct 15, 2022 5:33 pm
Can anybody confirm that bug?
I commented in your GitHub issue, I'm getting it too. In my case the ESP32 provides an AP and HTTP server for configuring WiFi connection and so forth - pretty typical scenario - and when the ESP32 does a WiFi scan for access points, my laptop often disconnects and reconnects to the ESP32's AP, which has a high probability of triggering the behaviour you describe and log.

It looks like a use-after-free on the old "bss" structure after reallocation. It's 824 bytes, and you'll find the heap corruption is always 768 (0x300) bytes from the logged bss pointer. Hopefully it's a quick fix.

User avatar
mbratch
Posts: 303
Joined: Fri Jun 11, 2021 1:51 pm

Re: CORRUPT HEAP after reconnect

Postby mbratch » Tue Oct 25, 2022 2:53 pm

The original application is much more complex but I reduced the code to the relevant functions. According to the documentation "Expected 0xfefefefe got 0x..." indicates the app has a use-after-free bug.
It could also be caused by an event handler or task or timer using too much stack.

You mentioned you are showing simplified code and the problem occurs with your more complex app. Are you actually running the simplified version you are showing and it is exhibiting the failure? I don't see anything in the source you are showing that would lead to the error.

marclee
Posts: 51
Joined: Fri Apr 09, 2021 1:09 pm

Re: CORRUPT HEAP after reconnect

Postby marclee » Wed Oct 26, 2022 8:56 am

mbratch wrote:
The original application is much more complex but I reduced the code to the relevant functions. According to the documentation "Expected 0xfefefefe got 0x..." indicates the app has a use-after-free bug.
It could also be caused by an event handler or task or timer using too much stack.

You mentioned you are showing simplified code and the problem occurs with your more complex app. Are you actually running the simplified version you are showing and it is exhibiting the failure? I don't see anything in the source you are showing that would lead to the error.

I also wrote: "As there is no dynamic allocation in the application code, I suppose there is a bug somewhere else in the IDE."
And of course the error occurs with the simplified version that I published.

marclee
Posts: 51
Joined: Fri Apr 09, 2021 1:09 pm

Re: CORRUPT HEAP after reconnect

Postby marclee » Wed Oct 26, 2022 9:04 am

boarchuz wrote:
Mon Oct 24, 2022 11:28 pm
marclee wrote:
Sat Oct 15, 2022 5:33 pm
Can anybody confirm that bug?
I commented in your GitHub issue, I'm getting it too. In my case the ESP32 provides an AP and HTTP server for configuring WiFi connection and so forth - pretty typical scenario - and when the ESP32 does a WiFi scan for access points, my laptop often disconnects and reconnects to the ESP32's AP, which has a high probability of triggering the behaviour you describe and log.

It looks like a use-after-free on the old "bss" structure after reallocation. It's 824 bytes, and you'll find the heap corruption is always 768 (0x300) bytes from the logged bss pointer. Hopefully it's a quick fix.

Thank you for your feedback. Could you publish your code and error log?

dmitrij999
Posts: 71
Joined: Sat Mar 02, 2019 8:06 pm

Re: CORRUPT HEAP after reconnect

Postby dmitrij999 » Fri Oct 28, 2022 12:08 am

The frequent cause of corrupt heaps is the lack of stack given to FreeRTOS task, getting array/list item by index out of its range or calling function or object which pointer is null.
Could you please check stack first?

marclee
Posts: 51
Joined: Fri Apr 09, 2021 1:09 pm

Re: CORRUPT HEAP after reconnect

Postby marclee » Fri Oct 28, 2022 9:30 am

dmitrij999 wrote: The frequent cause of corrupt heaps is the lack of stack given to FreeRTOS task, getting array/list item by index out of its range or calling function or object which pointer is null.
Could you please check stack first?

Which task are you talking about? Have you read my source code? Please have a look at my source code, which I have published with my first post.

According the documentation https://docs.espressif.com/projects/esp ... -detection:
If a call to malloc() or realloc() causes a crash because it expected to find the pattern 0xFEFEFEFE in free memory and a different pattern was found, then this indicates the app has a use-after-free bug where it is writing to memory which has already been freed.
There's no dynamic memory allocation within my code. So it must be a bug within the SDK (I already mentioned that).

Is there anybody from Espressif staff who is willing to look after this serious bug?

boarchuz
Posts: 606
Joined: Tue Aug 21, 2018 5:28 am

Re: CORRUPT HEAP after reconnect

Postby boarchuz » Sat Oct 29, 2022 2:52 pm

marclee wrote:
Fri Oct 28, 2022 9:30 am
Is there anybody from Espressif staff who is willing to look after this serious bug?
Someone has been assigned recently. Keep an eye on the issue on GitHub.

User avatar
mbratch
Posts: 303
Joined: Fri Jun 11, 2021 1:51 pm

Re: CORRUPT HEAP after reconnect

Postby mbratch » Sat Oct 29, 2022 3:16 pm

Just as some added information... I tried your code on my ESP32 I'm running ESP-IDF 4.4.2. I am using the default sdkconfig settings.

I attempted your steps as listed:
  1. Smartphone connects to ESP32-AP (channel 1) - I saw the AP in the list on my smartphone and manually connected to it successfully
  2. ESP32 connects to external access point (channel 3 .. 11) - this connection is successful
  3. Smartphone is not capable to switch channel during a connection, so it disconnects - From prior step my smartphone is connected to the ESP32 AP. I manually on the smartphone in the current step connect to my wifi's AP, which causes it to disconnect from the ESP32 AP.
  4. The ESP32 doesn't recognise the disconnect - My ESP32 recognizes the disconnect as shown in the log. It looks like your code isn't checking for the correct event. The event is WIFI_EVENT_AP_STADISCONNECTED not WIFI_EVENT_STA_DISCONNECT.
  5. The smartphone connects again to the ESP32-AP - I reconnected to the ESP32 AP manually on the phone which worked succesfully.
  6. CORRUPT HEAP: Invalid data at 0x3ffdd4a0. Expected 0xfefefefe got 0x00000000 - after several repeats of the above steps, I did not see this error on my system.
Note that I updated your event handler to explicitly identify more events just to see better what was happening. So the log output reflects this change:

Code: Select all

static void wifi_event_handler(void* arg, esp_event_base_t event_base, int32_t event_id, void* event_data)
{
  ESP_LOGI(TAG, "event_base = %i, event_id = %i", (int16_t) *event_base, (int16_t) event_id);

  if ((event_base == WIFI_EVENT) && (event_id == WIFI_EVENT_STA_START)) {
      ESP_LOGI(TAG, "STA start");
      connect_timer = 1;
  }
  else if ((event_base == WIFI_EVENT) && (event_id == WIFI_EVENT_AP_START)) {
      ESP_LOGI(TAG, "AP start");
      connect_timer = 1;
  }
  else if ((event_base == WIFI_EVENT) && (event_id == WIFI_EVENT_STA_CONNECTED)) {
      ESP_LOGI(TAG, "STA connected");
  }
  else if ((event_base == WIFI_EVENT) && (event_id == WIFI_EVENT_AP_STACONNECTED)) {
      ESP_LOGI(TAG, "AP STA connected");
  }
  else if ((event_base == WIFI_EVENT) && (event_id == WIFI_EVENT_STA_DISCONNECTED)) {
      ESP_LOGI(TAG, "STA Disconnect");
      connect_timer = 1;
  }
  else if ((event_base == WIFI_EVENT) && (event_id == WIFI_EVENT_AP_STADISCONNECTED)) {
      ESP_LOGI(TAG, "AP STA Disconnect");
      connect_timer = 1;
  }
  else if ((event_base == IP_EVENT) && (event_id == IP_EVENT_STA_GOT_IP)) {
      ip_event_got_ip_t* event = (ip_event_got_ip_t*) event_data;
      ESP_LOGI(TAG, "got ip:" IPSTR, IP2STR(&event->ip_info.ip));
  }
  else {
      ESP_LOGE(TAG, "Unknown event");
  }
}
Console output:

Code: Select all

I (27) boot: ESP-IDF v4.4.2-dirty 2nd stage bootloader
I (27) boot: compile time 09:50:21
I (27) boot: chip revision: 1
I (31) boot_comm: chip revision: 1, min. bootloader chip revision: 0
I (38) boot.esp32: SPI Speed      : 40MHz
I (42) boot.esp32: SPI Mode       : DIO
I (47) boot.esp32: SPI Flash Size : 2MB
I (51) boot: Enabling RNG early entropy source...
I (57) boot: Partition Table:
I (60) boot: ## Label            Usage          Type ST Offset   Length
I (68) boot:  0 nvs              WiFi data        01 02 00009000 00006000
I (75) boot:  1 phy_init         RF data          01 01 0000f000 00001000
I (83) boot:  2 factory          factory app      00 00 00010000 00100000
I (90) boot: End of partition table
I (94) boot_comm: chip revision: 1, min. application chip revision: 0
I (101) esp_image: segment 0: paddr=00010020 vaddr=3f400020 size=12dd4h ( 77268) map
I (138) esp_image: segment 1: paddr=00022dfc vaddr=3ffb0000 size=03804h ( 14340) load
I (144) esp_image: segment 2: paddr=00026608 vaddr=40080000 size=09a10h ( 39440) load
I (160) esp_image: segment 3: paddr=00030020 vaddr=400d0020 size=6d8b8h (448696) map
I (323) esp_image: segment 4: paddr=0009d8e0 vaddr=40089a10 size=0aad0h ( 43728) load
I (341) esp_image: segment 5: paddr=000a83b8 vaddr=50000000 size=00010h (    16) load
I (351) boot: Loaded app from partition at offset 0x10000
I (351) boot: Disabling RNG early entropy source...
I (363) cpu_start: Pro cpu up.
I (364) cpu_start: Starting app cpu, entry point is 0x40081198
I (0) cpu_start: App cpu up.
I (378) cpu_start: Pro cpu start user code
I (378) cpu_start: cpu freq: 160000000
I (378) cpu_start: Application information:
I (382) cpu_start: Project name:     app-template
I (388) cpu_start: App version:      1
I (392) cpu_start: Compile time:     Oct 29 2022 09:47:23
I (398) cpu_start: ELF file SHA256:  d4f52920418d8785...
I (404) cpu_start: ESP-IDF:          v4.4.2-dirty
I (410) heap_init: Initializing. RAM available for dynamic allocation:
I (417) heap_init: At 3FFAE6E0 len 00001920 (6 KiB): DRAM
I (423) heap_init: At 3FFB74C8 len 00028B38 (162 KiB): DRAM
I (429) heap_init: At 3FFE0440 len 00003AE0 (14 KiB): D/IRAM
I (435) heap_init: At 3FFE4350 len 0001BCB0 (111 KiB): D/IRAM
I (442) heap_init: At 400944E0 len 0000BB20 (46 KiB): IRAM
I (449) spi_flash: detected chip: gd
I (452) spi_flash: flash io: dio
W (456) spi_flash: Detected size(4096k) larger than the size in the binary image header(2048k). Usin                                 g the size in the binary image header.
I (470) cpu_start: Starting scheduler on PRO CPU.
I (0) cpu_start: Starting scheduler on APP CPU.
I (589) wifi:wifi driver task: 3ffbfef0, prio:23, stack:6656, core=0
I (589) system_api: Base MAC address is not set
I (589) system_api: read default base MAC address from EFUSE
I (609) wifi:wifi firmware version: eeaa27d
I (609) wifi:wifi certification version: v7.0
I (609) wifi:config NVS flash: enabled
I (609) wifi:config nano formating: disabled
I (619) wifi:Init data frame dynamic rx buffer num: 32
I (619) wifi:Init management frame dynamic rx buffer num: 32
I (629) wifi:Init management short buffer num: 32
I (629) wifi:Init dynamic tx buffer num: 32
I (639) wifi:Init static rx buffer size: 1600
I (639) wifi:Init static rx buffer num: 10
I (639) wifi:Init dynamic rx buffer num: 32
I (649) wifi_init: rx ba win: 6
I (649) wifi_init: tcpip mbox: 32
I (659) wifi_init: udp mbox: 6
I (659) wifi_init: tcp mbox: 6
I (659) wifi_init: tcp tx win: 5744
I (669) wifi_init: tcp rx win: 5744
I (669) wifi_init: tcp mss: 1440
I (669) wifi_init: WiFi IRAM OP enabled
I (679) wifi_init: WiFi RX IRAM OP enabled
I (699) main.c: start AP STA
I (699) main.c: IP: 192.168.4.1
I (699) main.c: GW: 192.168.4.1
I (699) main.c: Mask: 255.255.255.0
I (709) main.c: wifi_init_softap finished. SSID: test123 password: mypassword
I (709) phy_init: phy_version 4670,719f9f6,Feb 18 2021,17:07:07
I (809) wifi:mode : sta (30:83:98:fa:77:dc) + softAP (30:83:98:fa:77:dd)
I (809) wifi:enable tsf
I (819) wifi:Total power save buffer number: 16
I (819) wifi:Init max length of beacon: 752/752
I (819) wifi:Init max length of beacon: 752/752
I (819) main.c: event_base = 87, event_id = 2
I (829) main.c: STA start
I (829) main.c: event_base = 87, event_id = 12
I (839) main.c: AP start
I (3829) wifi:ap channel adjust o:1,1 n:9,2
I (3829) wifi:new:<9,0>, old:<1,1>, ap:<9,2>, sta:<9,0>, prof:1
I (3839) wifi:state: init -> auth (b0)
I (3839) wifi:state: auth -> assoc (0)
I (3839) wifi:state: assoc -> run (10)
W (3849) wifi:<ba-add>idx:0 (ifx:0, 3c:84:6a:82:57:68), tid:7, ssn:0, winSize:64
I (3859) wifi:connected with externalSSID, aid = 12, channel 9, BW20, bssid = 3c:84:6a:82:57:68
I (3859) wifi:security: WPA2-PSK, phy: bgn, rssi: -33
I (3869) wifi:pm start, type: 1

I (3869) main.c: event_base = 87, event_id = 4
I (3869) main.c: STA connected
W (3879) wifi:<ba-add>idx:1 (ifx:0, 3c:84:6a:82:57:68), tid:6, ssn:0, winSize:64
I (3889) wifi:AP's beacon interval = 102400 us, DTIM period = 1
I (4579) main.c: event_base = 73, event_id = 0
I (4579) main.c: got ip:192.168.1.67
I (4579) esp_netif_handlers: sta ip: 192.168.1.67, mask: 255.255.255.0, gw: 192.168.1.1
I (17399) wifi:new:<9,0>, old:<9,0>, ap:<9,2>, sta:<9,0>, prof:1
I (17399) wifi:station: 98:b8:ba:01:f9:37 join, AID=1, bgn, 20
I (17429) main.c: event_base = 87, event_id = 14
I (17439) main.c: AP STA connected
W (17509) wifi:<ba-add>idx:2 (ifx:1, 98:b8:ba:01:f9:37), tid:0, ssn:0, winSize:64
I (17689) esp_netif_lwip: DHCP server assigned IP to a station, IP is: 192.168.4.2
I (43909) wifi:station: 98:b8:ba:01:f9:37 leave, AID = 1, bss_flags is 658547, bss:0x3ffc6edc
I (43909) wifi:new:<9,0>, old:<9,0>, ap:<9,2>, sta:<9,0>, prof:1
W (43909) wifi:<ba-del>idx
I (43919) main.c: event_base = 87, event_id = 15
I (43919) main.c: AP STA Disconnect
I (65349) wifi:new:<9,0>, old:<9,0>, ap:<9,2>, sta:<9,0>, prof:1
I (65349) wifi:station: 98:b8:ba:01:f9:37 join, AID=1, bgn, 20
I (65379) main.c: event_base = 87, event_id = 14
I (65379) main.c: AP STA connected
W (65429) wifi:<ba-add>idx:2 (ifx:1, 98:b8:ba:01:f9:37), tid:0, ssn:0, winSize:64
I (65559) esp_netif_lwip: DHCP server assigned IP to a station, IP is: 192.168.4.2


marclee
Posts: 51
Joined: Fri Apr 09, 2021 1:09 pm

Re: CORRUPT HEAP after reconnect

Postby marclee » Sat Oct 29, 2022 4:38 pm

mbratch wrote:
Sat Oct 29, 2022 3:16 pm
Just as some added information... I tried your code on my ESP32 I'm running ESP-IDF 4.4.2. I am using the default sdkconfig settings.
If you want to figure out a bug in ESP-IDF v5.0, you need to use ESP-IDF v5.0, not ESP-IDF v4.4.2.

mbratch wrote:
Sat Oct 29, 2022 3:16 pm
  1. ...
  2. Smartphone is not capable to switch channel during a connection, so it disconnects - From prior step my smartphone is connected to the ESP32 AP. I manually on the smartphone in the current step connect to my wifi's AP, which causes it to disconnect from the ESP32 AP.
  3. The ESP32 doesn't recognise the disconnect - My ESP32 recognizes the disconnect as shown in the log. It looks like your code isn't checking for the correct event. The event is WIFI_EVENT_AP_STADISCONNECTED not WIFI_EVENT_STA_DISCONNECT.
You manually disconnected your smartphone, so your smartphone sent a disconnect command to the ESP32. This is a different situation.
In my case the smartphone just cut the connection without sending a disconnect command. Hence the ESP32 didn't receive a disconnect command. The smartphone reconnected to the ESP32 while ESP32 was thinking that it is still connected.

If you want to reproduce the described error, your smartphone has to act as I described in my first post. It has to reconnect, while ESP32 is thinking that it is still connected.

mbratch wrote:
Sat Oct 29, 2022 3:16 pm
It looks like your code isn't checking for the correct event. The event is WIFI_EVENT_AP_STADISCONNECTED not WIFI_EVENT_STA_DISCONNECT.
As you can see in my code from my first post, every event gets logged. And, according to documentation, WIFI_EVENT_STA_DISCONNECTED is the correct event designator? https://docs.espressif.com/projects/esp ... CONNECTEDE

User avatar
mbratch
Posts: 303
Joined: Fri Jun 11, 2021 1:51 pm

Re: CORRUPT HEAP after reconnect

Postby mbratch » Sat Oct 29, 2022 4:53 pm

Yes your code logs every event. It doesn't specifically handle the correct specific disconnect event however.

Sorry I missed your note you were using v5.0. My apologies for having wasted both my time and yours. Have a nice day.

Who is online

Users browsing this forum: VinayDand and 456 guests