Diagnosing heap corruption

permal
Posts: 384
Joined: Sun May 14, 2017 5:36 pm

Re: Diagnosing heap corruption

Post by permal » Wed Oct 18, 2017 4:48 am

Ok, that's what I thought. Thanks.

permal

Post by permal » Wed Oct 18, 2017 5:26 am

Code:

CORRUPT HEAP: Bad head at 0x3ffd42dc. Expected 0xabba1234 got 0xfefefefe
CORRUPT HEAP: Bad head at 0x3ffd424c. Expected 0xabba1234 got 0xfefefefe
CORRUPT HEAP: Bad head at 0x3ffd5158. Expected 0xabba1234 got 0xfefefefe
CORRUPT HEAP: Bad head at 0x3ffd424c. Expected 0xabba1234 got 0xfefefefe
CORRUPT HEAP: Bad head at 0x3ffd424c. Expected 0xabba1234 got 0xfefefefe
CORRUPT HEAP: Bad head at 0x3ffd4270. Expected 0xabba1234 got 0xfefefefe
CORRUPT HEAP: Bad head at 0x3ffd424c. Expected 0xabba1234 got 0xfefefefe
CORRUPT HEAP: Bad head at 0x3ffd424c. Expected 0xabba1234 got 0xfefefefe
CORRUPT HEAP: Bad head at 0x3ffd4270. Expected 0xabba1234 got 0xfefefefe
The above messages are fairly reproducible, but which of the two cases in the docs do they indicate? The following two descriptions are very similar; I can't make out the difference.
If an application crashes reading/writing an address related to 0xFEFEFEFE, this indicates it is reading heap memory after it has been freed (a “use after free bug”). The application should be changed to not access heap memory after it has been freed.

If the IDF heap allocator fails because the pattern 0xFEFEFEFE was not found in freed memory then this indicates the app has a use-after-free bug where it is writing to memory which has already been freed.
Also, there is no mention of 0xabba1234; is that value of any significance?

I'm guessing it's the latter, since the application keeps running until it eventually crashes, though I don't know whether the two are actually related:

Code:

Guru Meditation Error of type InstrFetchProhibited occurred on core  0. Exception was unhandled.
Register dump:
PC      : 0x00000000  PS      : 0x00060530  A0      : 0x8013c1d6  A1      : 0x3ffd3a60  
A2      : 0x3ffc7bc0  A3      : 0x3f40f774  A4      : 0x3ffd3b20  A5      : 0x00000000  
A6      : 0x3ffd3a70  A7      : 0x3ffca3a0  A8      : 0x80144368  A9      : 0x3ffd3a30  
A10     : 0x3ffc7bc0  A11     : 0x00000000  A12     : 0x00000004  A13     : 0x3ffd3b20  
A14     : 0x00000000  A15     : 0xff000000  SAR     : 0x00000018  EXCCAUSE: 0x00000014  
EXCVADDR: 0x00000000  LBEG    : 0x4000c28c  LEND    : 0x4000c296  LCOUNT  : 0x00000000  

Backtrace: 0x00000000:0x3ffd3a60 0x4013c1d3:0x3ffd3c10 0x4013c259:0x3ffd3c30 0x4013fbb9:0x3ffd3c50 0x40145829:0x3ffd3c80 0x4015a15d:0x3ffd3ca0 0x4013c26c:0x3ffd3cc0 0x40138161:0x3ffd3ce0 0x401381dd:0x3ffd3d40
0x4013c1d3: _ZN6smooth11application7network4mqtt10MqttClient11send_packetERNS2_6packet10MQTTPacketE at /home/permal/esp/xtensa-esp32-elf/xtensa-esp32-elf/include/c++/5.2.0/xtensa-esp32-elf/bits/gthr-default.h:778

0x4013c259: _ZThn80_N6smooth11application7network4mqtt10MqttClient11send_packetERNS2_6packet10MQTTPacketE at ??:?

0x4013fbb9: _ZN6smooth11application7network4mqtt11Publication12publish_nextERNS2_11IMqttClientE at /home/permal/code/SmoothTest/components/Smooth/application/network/mqtt/Publication.cpp:116

0x40145829: _ZN6smooth11application7network4mqtt5state8RunState4tickEv at /home/permal/code/SmoothTest/components/Smooth/application/network/mqtt/state/RunState.cpp:25

0x4015a15d: _ZN6smooth11application7network4mqtt5state7MqttFSMINS3_13MQTTBaseStateEE4tickEv at /home/permal/code/SmoothTest/components/Smooth/include/smooth/application/network/mqtt/state/MqttFSM.h:81

0x4013c26c: _ZN6smooth11application7network4mqtt10MqttClient4tickEv at /home/permal/esp/xtensa-esp32-elf/xtensa-esp32-elf/include/c++/5.2.0/xtensa-esp32-elf/bits/gthr-default.h:778

0x40138161: _ZN6smooth4core4Task4execEv at /home/permal/code/SmoothTest/components/Smooth/core/Task.cpp:106

0x401381dd: _ZZN6smooth4core4Task5startEvENUlPvE_4_FUNES2_ at /home/permal/code/SmoothTest/components/Smooth/core/Task.cpp:63
 (inlined by) _FUN at /home/permal/code/SmoothTest/components/Smooth/core/Task.cpp:63

ESP_Angus
Posts: 2344
Joined: Sun May 08, 2016 4:11 am

Post by ESP_Angus » Wed Oct 18, 2017 6:52 am

permal wrote: The above messages are fairly reproducible, but which of the two cases in the docs do they indicate? The following two descriptions are very similar; I can't make out the difference.
If an application crashes reading/writing an address related to 0xFEFEFEFE, this indicates it is reading heap memory after it has been freed (a “use after free bug”.) The application should be changed to not access heap memory after it has been freed.
Also, there is no mention of 0xabba1234; is that value of any significance?
I have some good news and some bad news. I was just looking into a bug that looked very similar, and it turns out it's a race condition bug in the "Comprehensive" level heap allocator. This error does not indicate heap corruption, and the program can keep running normally after it is displayed.

0xABBA1234 is the poison "head" word that is written before every block of memory allocated in the heap when the debug level is set to "Light Impact" or "Comprehensive". When the debug level is set to "Comprehensive", memory is also overwritten with 0xFEFEFEFE when freed. At these heap debugging levels, heap_caps_check_integrity() verifies both of these things for all heap blocks.
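To make the two checks concrete (and the difference between the two cases in the docs), here is a toy model in C. It is only a sketch under stated assumptions, not the ESP-IDF code: the fixed block array, toy_malloc(), toy_free() and toy_check_integrity() are hypothetical names. The only details taken from this thread are the 0xABBA1234 head word, the 0xFEFEFEFE fill on free, and the fact that the integrity check verifies both.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Toy model only: layout and names are hypothetical, not ESP-IDF's. */
#define HEAD_CANARY    0xABBA1234u /* head word written before each allocated block */
#define FREE_FILL      0xFEFEFEFEu /* every word of a freed block */
#define FREE_FILL_BYTE 0xFE
#define BLOCK_WORDS    8           /* words[0] = head word, words[1..7] = user data */
#define NUM_BLOCKS     4

typedef struct {
    bool allocated;
    uint32_t words[BLOCK_WORDS];
} toy_block_t;

static toy_block_t toy_heap[NUM_BLOCKS];

/* Start with every block free and filled with 0xFEFEFEFE. */
static void toy_init(void) {
    for (size_t i = 0; i < NUM_BLOCKS; i++) {
        toy_heap[i].allocated = false;
        memset(toy_heap[i].words, FREE_FILL_BYTE, sizeof toy_heap[i].words);
    }
}

/* Write the head word and hand out the memory just past it. */
static uint32_t *toy_malloc(void) {
    for (size_t i = 0; i < NUM_BLOCKS; i++) {
        if (!toy_heap[i].allocated) {
            toy_heap[i].allocated = true;
            toy_heap[i].words[0] = HEAD_CANARY;
            return &toy_heap[i].words[1];
        }
    }
    return NULL; /* toy heap exhausted */
}

/* "Comprehensive"-style free: the whole block, head word included, becomes 0xFEFEFEFE. */
static void toy_free(uint32_t *p) {
    for (size_t i = 0; i < NUM_BLOCKS; i++) {
        if (toy_heap[i].allocated && p == &toy_heap[i].words[1]) {
            toy_heap[i].allocated = false;
            memset(toy_heap[i].words, FREE_FILL_BYTE, sizeof toy_heap[i].words);
            return;
        }
    }
    assert(!"toy_free: pointer did not come from toy_malloc");
}

/* Rough analogue of an integrity check: returns the number of bad blocks. */
static int toy_check_integrity(void) {
    int bad = 0;
    for (size_t i = 0; i < NUM_BLOCKS; i++) {
        const toy_block_t *b = &toy_heap[i];
        if (b->allocated) {
            if (b->words[0] != HEAD_CANARY) { /* something overwrote the head word */
                printf("toy check: block %zu bad head 0x%08" PRIx32 "\n", i, b->words[0]);
                bad++;
            }
            continue;
        }
        for (size_t w = 0; w < BLOCK_WORDS; w++) {
            if (b->words[w] != FREE_FILL) { /* written to after it was freed */
                printf("toy check: block %zu free fill broken at word %zu\n", i, w);
                bad++;
                break;
            }
        }
    }
    return bad;
}
```

In this model, writing through p after toy_free(p) breaks the fill and is caught by the next toy_check_integrity() call (the second case in the docs). Reading a pointer out of a freed block instead yields 0xFEFEFEFE, so on the device dereferencing it would fault at an address related to 0xFEFEFEFE (the first case).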

The race is that the heap poisoning implementation doesn't lock the heap before setting 0xFEFEFEFE in multi_heap_free(). This means there is a brief window in which all the data in the block (including the 0xABBA1234 "head word") has already been overwritten with 0xFEFEFEFE. This doesn't matter for normal operation, because the memory is in the process of being freed, so no one should be using it. But if heap_caps_check_integrity() happens to verify the block at that exact moment, it sees all 0xFEFEFEFE instead of the expected head word.
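A minimal sketch of that window, and one way to close it, again as a toy model. Assumptions: a POSIX mutex stands in for whatever lock the IDF heap actually uses, and toy_alloc(), toy_free_racy(), toy_free_fixed() and toy_block_ok() are hypothetical. This is not the actual change that went into multi_heap_free().

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy model of the race; not the real multi_heap_free() code. */
#define HEAD_CANARY    0xABBA1234u
#define FREE_FILL_BYTE 0xFE
#define BLOCK_WORDS    8

typedef struct {
    bool allocated;              /* heap metadata: is the block still in use? */
    uint32_t words[BLOCK_WORDS]; /* words[0] is the head word */
} toy_block_t;

static pthread_mutex_t heap_lock = PTHREAD_MUTEX_INITIALIZER;

static void toy_alloc(toy_block_t *b) {
    pthread_mutex_lock(&heap_lock);
    assert(!b->allocated);
    b->allocated = true;
    b->words[0] = HEAD_CANARY;
    pthread_mutex_unlock(&heap_lock);
}

/* Integrity check for one block, always done under the heap lock. */
static bool toy_block_ok(toy_block_t *b) {
    pthread_mutex_lock(&heap_lock);
    bool ok = !b->allocated || b->words[0] == HEAD_CANARY;
    pthread_mutex_unlock(&heap_lock);
    return ok;
}

/* Racy pattern: the fill happens before the lock is taken, so for a moment the
   block is still marked allocated while its head word already reads 0xFEFEFEFE.
   A toy_block_ok() call from another task in that window reports a bad head
   even though nothing is corrupt. */
static void toy_free_racy(toy_block_t *b) {
    memset(b->words, FREE_FILL_BYTE, sizeof b->words);
    pthread_mutex_lock(&heap_lock);
    b->allocated = false;
    pthread_mutex_unlock(&heap_lock);
}

/* One way to close the window: poison and update the metadata inside the same
   critical section the checker uses. */
static void toy_free_fixed(toy_block_t *b) {
    pthread_mutex_lock(&heap_lock);
    memset(b->words, FREE_FILL_BYTE, sizeof b->words);
    b->allocated = false;
    pthread_mutex_unlock(&heap_lock);
}
```

With toy_free_fixed(), a checker holding the same lock sees a block either still allocated with its head word intact, or fully freed and filled, never the half-way state that produces the spurious "Bad head" message.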

I'll have a fix ASAP, but for now you can disregard "errors" that match the form "Bad head at X. Expected 0xabba1234 got 0xfefefefe".
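If these messages clutter a captured serial log in the meantime, a small filter can set aside exactly this signature. is_benign_bad_head() is a hypothetical helper, not part of IDF. It only matches a head word of 0xabba1234 that reads back as 0xfefefefe, so any other "got" value is still treated as a possible real corruption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: true if the line is the "Bad head ... Expected 0xabba1234
   got 0xfefefefe" message attributed to the poisoning race in this thread. */
static bool is_benign_bad_head(const char *line) {
    assert(line != NULL);
    const char *msg = strstr(line, "CORRUPT HEAP: Bad head at ");
    if (msg == NULL) {
        return false;
    }
    unsigned long expected = 0, got = 0;
    /* %*lx skips the block address; only the two pattern words are kept. */
    if (sscanf(msg, "CORRUPT HEAP: Bad head at 0x%*lx. Expected 0x%lx got 0x%lx",
               &expected, &got) != 2) {
        return false;
    }
    return expected == 0xabba1234UL && got == 0xfefefefeUL;
}
```

Lines for which it returns true can be skipped when scanning a log; once you are on an IDF version that contains the fix, the filter should no longer be needed.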

permal

Post by permal » Wed Oct 18, 2017 7:14 am

Alright. Then I can concentrate on looking at the logic surrounding the crash instead of trying to find a non-existent corrupted heap. Thanks for the quick reply.

ESP_Angus

Post by ESP_Angus » Fri Oct 20, 2017 9:07 am

ESP_Angus wrote: race condition bug in the "Comprehensive" level heap allocator. This error does not indicate heap corruption and the program can keep running normally after it is displayed.
ESP_Angus wrote: Will have a fix ASAP, but for now you can disregard "errors" that meet the form "Bad head at X. Expected 0xabba1234 got 0xfefefefe".
This fix is now in the IDF master branch on GitHub.

permal

Post by permal » Fri Oct 20, 2017 9:09 am

Thanks. Good job.