Diagnosing heap corruption

permal
Posts: 384
Joined: Sun May 14, 2017 5:36 pm

Diagnosing heap corruption

Postby permal » Sun Oct 15, 2017 8:48 pm

Hi,

How do I get some usable information out of a "corrupted heap"-message like the one below (I included the entire output for completeness)?
Leaving...
Hard resetting...
MONITOR
--- idf_monitor on /dev/ttyUSB0 115200 ---
--- Quit: Ctrl+] | Menu: Ctrl+T | Help: Ctrl+T followed by Ctrl+H ---
ets Jun 8 2016 00:22:57

rst:0x1 (POWERON_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
ets Jun 8 2016 00:22:57

rst:0x10 (RTCWDT_RTC_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:2
load:0x3fff0010,len:4
load:0x3fff0014,len:5404
load:0x40078000,len:0
load:0x40078000,len:12040
entry 0x40078f30
I (31) boot: ESP-IDF v3.0-dev-265-g969f1bb 2nd stage bootloader
I (31) boot: compile time 11:00:35
I (110) boot: Enabling RNG early entropy source...
I (110) boot: SPI Speed : 40MHz
I (120) boot: SPI Mode : DIO
I (158) boot: SPI Flash Size : 4MB
I (196) boot: Partition Table:
I (230) boot: ## Label Usage Type ST Offset Length
I (298) boot: 0 nvs WiFi data 01 02 00009000 00006000
I (368) boot: 1 phy_init RF data 01 01 0000f000 00001000
I (438) boot: 2 factory factory app 00 00 00010000 00100000
I (508) boot: End of partition table
I (547) esp_image: segment 0: paddr=0x00010020 vaddr=0x3f400020 size=0x29874 (170100) map
I (937) esp_image: segment 1: paddr=0x0003989c vaddr=0x3ffb0000 size=0x02f58 ( 12120) load
I (961) esp_image: segment 2: paddr=0x0003c7fc vaddr=0x40080000 size=0x00400 ( 1024) load
0x40080000: _WindowOverflow4 at /home/permal/esp/esp-idf/components/freertos/./xtensa_vectors.S:1675

I (1001) esp_image: segment 3: paddr=0x0003cc04 vaddr=0x40080400 size=0x0340c ( 13324) load
I (1109) esp_image: segment 4: paddr=0x00040018 vaddr=0x400d0018 size=0x89f6c (565100) map
0x400d0018: _flash_cache_start at ??:?

I (2187) esp_image: segment 5: paddr=0x000c9f8c vaddr=0x4008380c size=0x0c558 ( 50520) load
0x4008380c: spi_flash_read at /home/permal/esp/esp-idf/components/spi_flash/./flash_ops.c:421

I (2289) esp_image: segment 6: paddr=0x000d64ec vaddr=0x400c0000 size=0x00000 ( 0) load
I (2319) boot: Loaded app from partition at offset 0x10000
I (2327) boot: Disabling RNG early entropy source...
I (257) cpu_start: Pro cpu up.
I (261) cpu_start: Starting app cpu, entry point is 0x4008107c
0x4008107c: call_start_cpu1 at /home/permal/esp/esp-idf/components/esp32/./cpu_start.c:219

I (0) cpu_start: App cpu up.
I (271) heap_init: Initializing. RAM available for dynamic allocation:
D (278) heap_init: New heap initialised at 0x3ffae6e0
I (283) heap_init: At 3FFAE6E0 len 00001920 (6 KiB): DRAM
D (289) heap_init: New heap initialised at 0x3ffb9e50
I (295) heap_init: At 3FFB9E50 len 000261B0 (152 KiB): DRAM
I (301) heap_init: At 3FFE0440 len 00003BC0 (14 KiB): D/IRAM
I (307) heap_init: At 3FFE4350 len 0001BCB0 (111 KiB): D/IRAM
D (314) heap_init: New heap initialised at 0x4008fd64
I (319) heap_init: At 4008FD64 len 0001029C (64 KiB): IRAM
I (325) cpu_start: Pro cpu start user code
D (337) clk: RTC_SLOW_CLK calibration value: 3405607
V (48) intr_alloc: esp_intr_alloc_intrstatus (cpu 0): checking args
V (48) intr_alloc: esp_intr_alloc_intrstatus (cpu 0): Args okay. Resulting flags 0xE
D (53) intr_alloc: Connected src 46 to int 2 (cpu 0)
V (58) intr_alloc: esp_intr_alloc_intrstatus (cpu 0): checking args
V (64) intr_alloc: esp_intr_alloc_intrstatus (cpu 0): Args okay. Resulting flags 0xC0E
D (72) intr_alloc: Connected src 57 to int 3 (cpu 0)
V (78) intr_alloc: esp_intr_alloc_intrstatus (cpu 0): checking args
V (84) intr_alloc: esp_intr_alloc_intrstatus (cpu 0): Args okay. Resulting flags 0xE
D (91) intr_alloc: Connected src 16 to int 9 (cpu 0)
V (96) intr_alloc: esp_intr_alloc_intrstatus (cpu 0): checking args
V (103) intr_alloc: esp_intr_alloc_intrstatus (cpu 0): Args okay. Resulting flags 0x40E
D (111) intr_alloc: Connected src 24 to int 12 (cpu 0)
I (116) cpu_start: Starting scheduler on PRO CPU.
V (0) intr_alloc: esp_intr_alloc_intrstatus (cpu 1): checking args
V (0) intr_alloc: esp_intr_alloc_intrstatus (cpu 1): Args okay. Resulting flags 0x40E
D (10) intr_alloc: Connected src 25 to int 2 (cpu 1)
I (10) cpu_start: Starting scheduler on APP CPU.
D (166) heap_init: New heap initialised at 0x3ffe0440
D (166) heap_init: New heap initialised at 0x3ffe4350
V (176) Queue: Creating queue 'system_event-TaskEventQueue', with 10 items of size 48.
D (186) nvs: nvs_flash_init_custom partition=nvs start=9 count=6
V (206) intr_alloc: esp_intr_alloc_intrstatus (cpu 0): checking args
V (206) intr_alloc: esp_intr_alloc_intrstatus (cpu 0): Args okay. Resulting flags 0xE
D (206) intr_alloc: Connected src 22 to int 13 (cpu 0)
V (216) Queue: Creating queue 'mqtt_data-TaskEventQueue', with 10 items of size 36.
V (226) Queue: Creating queue 'TX_empty-TaskEventQueue', with 5 items of size 8.
V (236) Queue: Creating queue 'data_available-TaskEventQueue', with 5 items of size 4.
V (246) Queue: Creating queue 'connection_status-TaskEventQueue', with 5 items of size 12.
V (246) Queue: Creating queue 'timer_events-TaskEventQueue', with 5 items of size 4.
V (256) Queue: Creating queue 'control_event-TaskEventQueue', with 5 items of size 4.
V (266) Queue: Creating queue 'system_event-TaskEventQueue', with 5 items of size 48.
I (276) gpio: GPIO[5]| InputEn: 0| OutputEn: 1| OpenDrain: 0| Pullup: 0| Pulldown: 0| Intr:0
V (286) Task: Initializing task 'main', 0x3ffc1adc
V (286) Queue: Creating queue 'SocketDispatcher-TaskEventQueue', with 10 items of size 48.
V (296) Task: Initializing task 'SocketDispatche', 0x3ffc9c04
D (296) nvs: nvs_open_from_partition misc 1
V (306) Task: Task 'SocketDispatche' initialized, 0x3ffc9c04
D (316) nvs: nvs_get_str_or_blob log
I (316) system_api: Base MAC address is not set, read default base MAC address from BLK0 of EFUSE
I (326) system_api: Base MAC address is not set, read default base MAC address from BLK0 of EFUSE
D (336) nvs: nvs_open_from_partition nvs.net80211 1
D (336) nvs: nvs_get opmode 1
D (346) nvs: nvs_get_str_or_blob sta.ssid
D (346) nvs: nvs_get_str_or_blob sta.mac
D (346) nvs: nvs_get sta.authmode 1
D (356) nvs: nvs_get_str_or_blob sta.pswd
D (356) nvs: nvs_get_str_or_blob sta.pmk
D (366) nvs: nvs_get sta.chan 1
D (366) nvs: nvs_get auto.conn 1
D (366) nvs: nvs_get bssid.set 1
D (376) nvs: nvs_get_str_or_blob sta.bssid
D (376) nvs: nvs_get sta.phym 1
D (376) nvs: nvs_get sta.phybw 1
D (386) nvs: nvs_get_str_or_blob sta.apsw
D (386) nvs: nvs_get_str_or_blob sta.apinfo
D (396) nvs: nvs_get sta.scan_method 1
D (396) nvs: nvs_get sta.sort_method 1
D (396) nvs: nvs_get sta.minrssi 1
D (406) nvs: nvs_get sta.minauth 1
D (406) nvs: nvs_get_str_or_blob ap.ssid
D (406) nvs: nvs_get_str_or_blob ap.mac
D (416) nvs: nvs_get_str_or_blob ap.passwd
D (416) nvs: nvs_get_str_or_blob ap.pmk
D (426) nvs: nvs_get ap.chan 1
D (426) nvs: nvs_get ap.authmode 1
D (426) nvs: nvs_get ap.hidden 1
D (436) nvs: nvs_get ap.max.conn 1
D (436) nvs: nvs_get bcn.interval 2
D (436) nvs: nvs_get ap.phym 1
D (446) nvs: nvs_get ap.phybw 1
D (446) nvs: nvs_get ap.sndchan 1
D (446) nvs: nvs_set_blob sta.mac 6
D (466) nvs: nvs_set_blob ap.mac 6
D (466) phy_init: loading PHY init data from application binary
D (466) nvs: nvs_open_from_partition phy 0
D (466) nvs: nvs_get cal_version 4
V (466) phy_init: phy_get_rf_cal_version: 359

D (476) nvs: nvs_get_str_or_blob cal_mac
D (476) nvs: nvs_get_str_or_blob cal_data
D (486) nvs: nvs_close 3
V (486) phy_init: register_chipv7_phy, init_data=0x3f4131c8, cal_data=0x3ffd02f4, mode=0
I (506) phy: phy_version: 359.0, e79c19d, Aug 31 2017, 17:06:07, 1, 0
D (506) event: SYSTEM_EVENT_STA_START
V (506) event: enter default callback
V (516) event: exit default callback
V (516) Application: ESP32 station start
V (526) Task: Initializing task 'G2', 0x3ffd2a60
V (526) Task: Task 'main' initialized, 0x3ffc1adc
V (526) Task: Task 'G2' initialized, 0x3ffd2a60
V (1286) esp_pthread: pthread_once: call init_routine 0x3ffb1c20
V (1296) esp_pthread: pthread_once: call init_routine 0x3ffb1c18
D (1316) event: SYSTEM_EVENT_STA_CONNECTED, ssid:Our place, ssid_len:9, bssid:f0:9f:c2:aa:b0:3f, channel:11, authmode:3
V (1316) event: enter default callback
V (1316) event: exit default callback
V (1326) Application: ESP32 station connected to AP
D (3646) event: SYSTEM_EVENT_STA_GOT_IP, ip:192.168.10.43, mask:255.255.255.0, gw:192.168.10.254
V (3646) event: enter default callback
I (3646) event: ip: 192.168.10.43, mask: 255.255.255.0, gw: 192.168.10.254
V (3656) event: exit default callback
V (3656) Application: ESP32 station got IP from connected AP
V (3676) Socket: [192.168.10.44, 1883, 0, 0x3ffd09d4]: Created socket
V (3676) Socket: [192.168.10.44, 1883, 0, 0x3ffd09d4]: Connecting
V (3696) Socket: [192.168.10.44, 1883, 0, 0x3ffd09d4]: Connected
CORRUPT HEAP: multi_heap.c:395 detected at 0x3ffd2ec0
abort() was called at PC 0x40086c69 on core 0
0x40086c69: multi_heap_assert at /home/permal/esp/esp-idf/components/heap/./multi_heap_platform.h:55
(inlined by) multi_heap_free_impl at /home/permal/esp/esp-idf/components/heap/./multi_heap.c:395


Backtrace: 0x40087214:0x3ffc4650 0x40087313:0x3ffc4670 0x40086c69:0x3ffc4690 0x4008291a:0x3ffc46b0 0x40082d19:0x3ffc46d0 0x4000bec7:0x3ffc46f0 0x400fd209:0x3ffc4710 0x400f898c:0x3ffc4730
0x40087214: invoke_abort at /home/permal/esp/esp-idf/components/esp32/./panic.c:553

0x40087313: abort at /home/permal/esp/esp-idf/components/esp32/./panic.c:553

0x40086c69: multi_heap_assert at /home/permal/esp/esp-idf/components/heap/./multi_heap_platform.h:55
(inlined by) multi_heap_free_impl at /home/permal/esp/esp-idf/components/heap/./multi_heap.c:395

0x4008291a: heap_caps_free at /home/permal/esp/esp-idf/components/heap/./heap_caps.c:282

0x40082d19: _free_r at /home/permal/esp/esp-idf/components/newlib/./syscalls.c:42

0x400fd209: sys_timeouts_mbox_fetch at /home/permal/esp/esp-idf/components/lwip/core/timers.c:570

0x400f898c: tcpip_thread at /home/permal/esp/esp-idf/components/lwip/api/tcpip.c:474


Rebooting...
ets Jun 8 2016 00:22:57

rst:0xc (SW_CPU_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:2
load:0x3fff0010,len:4
load:0x3fff0014,len:5404
load:0x40078000,len:0
load:0x40078000,len:12040
entry 0x40078f30

ESP_igrr
Posts: 2071
Joined: Tue Dec 01, 2015 8:37 am

Re: Diagnosing heap corruption

Postby ESP_igrr » Mon Oct 16, 2017 1:23 am

Several suggestions are given here: http://esp-idf.readthedocs.io/en/latest ... corruption

permal
Posts: 384
Joined: Sun May 14, 2017 5:36 pm

Re: Diagnosing heap corruption

Postby permal » Mon Oct 16, 2017 5:52 am

Sorry, that was a really badly worded question on my part. Let me rephrase.

Given a stack dump like the above, is there anyway to get an initial clue to what static buffer or caller that it detected the corruption in, if you haven't added any calls to the diagnostic/integrity checking functions yet? That backtrack looks interesting, but what is it a trace of, I can't see that mentioned in the docs?

ESP_Angus
Posts: 2344
Joined: Sun May 08, 2016 4:11 am

Re: Diagnosing heap corruption

Postby ESP_Angus » Mon Oct 16, 2017 7:57 am

permal wrote: Given a stack dump like the above, is there anyway to get an initial clue to what static buffer or caller that it detected the corruption in, if you haven't added any calls to the diagnostic/integrity checking functions yet? That backtrack looks interesting, but what is it a trace of, I can't see that mentioned in the docs?
There's probably some context missing from the heap docs in this case, because those docs are supposed to take you from this kind of error to the steps to diagnosing it. (This is a criticism of the docs - which I wrote - not a criticism of you.)

The thing to focus on in this case is this line:

Code: Select all

CORRUPT HEAP: multi_heap.c:395 detected at 0x3ffd2ec0
Which tells you the address in heap (0x3ffd2ec0) where corruption was detected. This is a dynamic (allocated by malloc) address, not a static memory address.

The backtrace:

Code: Select all

0x40087214: invoke_abort at /home/permal/esp/esp-idf/components/esp32/./panic.c:553

0x40087313: abort at /home/permal/esp/esp-idf/components/esp32/./panic.c:553

0x40086c69: multi_heap_assert at /home/permal/esp/esp-idf/components/heap/./multi_heap_platform.h:55
(inlined by) multi_heap_free_impl at /home/permal/esp/esp-idf/components/heap/./multi_heap.c:395

0x4008291a: heap_caps_free at /home/permal/esp/esp-idf/components/heap/./heap_caps.c:282

0x40082d19: _free_r at /home/permal/esp/esp-idf/components/newlib/./syscalls.c:42

0x400fd209: sys_timeouts_mbox_fetch at /home/permal/esp/esp-idf/components/lwip/core/timers.c:570

0x400f898c: tcpip_thread at /home/permal/esp/esp-idf/components/lwip/api/tcpip.c:474
Tells you that the heap corruption came when sys_timeouts_mbox_fetch (in the TCP/IP thread) called free(), and the free() implementation discovered that the heap was corrupt.

This is not the call which caused the heap corruption, though. It's just the call where the system discovered that corruption had occurred. Corruption happened silently at some previous time.

With that context in mind, try reading through the heap corruption docs again. If you have more questions about what exactly the docs are suggesting, please ask and I can explain further:
https://esp-idf.readthedocs.io/en/lates ... -detection

permal
Posts: 384
Joined: Sun May 14, 2017 5:36 pm

Re: Diagnosing heap corruption

Postby permal » Mon Oct 16, 2017 8:02 am

I will, thanks for the added explanation.

permal
Posts: 384
Joined: Sun May 14, 2017 5:36 pm

Re: Diagnosing heap corruption

Postby permal » Mon Oct 16, 2017 9:46 pm

If I'm reading the docs correctly, by calling heap_caps_check_integrity_addr() I can check for out-of-bounds writes of any dynamically allocated memory area, including the buffer of std::vector<uint8_t>, retrieved via the data() method. Correct?

permal
Posts: 384
Joined: Sun May 14, 2017 5:36 pm

Re: Diagnosing heap corruption

Postby permal » Mon Oct 16, 2017 10:24 pm

Another question: Why does heap_caps_check_integrity_addr() take an intptr_t instead of a pointer? Taking a pointer would save everyone the need to cast to intptr_t which isn't a pointer, despite its name.

ESP_Angus
Posts: 2344
Joined: Sun May 08, 2016 4:11 am

Re: Diagnosing heap corruption

Postby ESP_Angus » Tue Oct 17, 2017 1:34 am

permal wrote:If I'm reading the docs correctly, by calling heap_caps_check_integrity_addr() I can check for out-of-bounds writes of any dynamically allocated memory area, including the buffer of std::vector<uint8_t>, retrieved via the data() method. Correct?
That's right. This is a performance boost over checking all the heap regions with heap_caps_check_integrity(), you can check only the heap containing a particular address rather than all heaps matching a particular capability (due to the ESP32's memory structure there are some non-contiguous heap regions in DRAM.)
permal wrote:Another question: Why does heap_caps_check_integrity_addr() take an intptr_t instead of a pointer? Taking a pointer would save everyone the need to cast to intptr_t which isn't a pointer, despite its name.
This is something of a "swings and roundabouts" situation. The use case I imagined for this function is chasing up a message like:

Code: Select all

CORRUPT HEAP: multi_heap.c:395 detected at 0x3ffd2ec0
So to narrow down when this corruption occurs you may want to run heap_caps_check_integrity_addr(0x3ffd2ec0, true) regularly, at least while debugging. This checks the heap containing this address. If this was a void * you would need to cast the other way, here.

permal
Posts: 384
Joined: Sun May 14, 2017 5:36 pm

Re: Diagnosing heap corruption

Postby permal » Tue Oct 17, 2017 10:57 am

Ok, lets say I get an address of 0x1234. I add additional checks to heap_caps_check_integrity(0x1234) and recompile. What guarantees that the dynamic allocation of the overflown buffer places it in the same heap this time?

ESP_Angus
Posts: 2344
Joined: Sun May 08, 2016 4:11 am

Re: Diagnosing heap corruption

Postby ESP_Angus » Wed Oct 18, 2017 2:16 am

permal wrote:Ok, lets say I get an address of 0x1234. I add additional checks to heap_caps_check_integrity(0x1234) and recompile. What guarantees that the dynamic allocation of the overflown buffer places it in the same heap this time?
No guarantee, it entirely depends on your code. If you see that heap corruption is happening at random places rather than the same place each time then call heap_caps_check_integrity() to check everything instead of heap_caps_check_integrity_addr().

Who is online

Users browsing this forum: No registered users and 83 guests