Heap corruption diagnostics causing heap corruption?.
Posted: Sat Feb 03, 2018 6:31 pm
Hi,
This became rather lengthy, please bear with me.
Assuming Heap Debugging is set to "Comprehensive", when allocating a block of memory (multi_heap_malloc()), an additional poison_head_t and poison_tail_t are allocated. Then, it goes on to poison the allocated buffer in poison_allocated_region(), setting the head and tail with HEAD_CANARY_PATTERN (0xABBA1234) and TAIL_CANARY_PATTERN (0xBAAD5678), respectively. When back in multi_heap_malloc, we call verify_fill_pattern(), to verify that the buffer in between head an tail contains the FREE_FILL_WORD (0xfefefefe), and replacing it with MALLOC_FILL_WORD (0xcececece)
First question: What fills the memory with 0xfe in the first place?
Ok, so we now know that whenever we allocate a buffer, we can expect it to be filled with 0xce, correct?
Moving on to freeing memory.
Again, assuming that Heap Debugging is set to "Comprehensive", when freeing a block of memory (multi_heap_free()), the region being freed is verified via verify_allocated_region() such that the head and tail values are intact, i.e. no buffer under/overruns have happened (at least none that wrote any data). Next, the entire buffer, including head and tail are filled with 0xfe, after which the entire buffer is handed back to the heap.
All correct so far?
Now, to my actual problem. I am consistently (as in 100%) getting the stack trace at the bottom of this post very shortly after start up.
First off, I've replaced cJSON with version 1.7.1 as 1.6.0 which ships with ESP-IDF has a memory bug which causes a too small buffer to be allocated.
While calling cJSON_free in smooth::core::json::Value::to_string[abi:cxx11]() in Value.cpp:285 (using default mapping to free()) I expect that to eventually call function multi_heap_free(), which it seems to do just before entering newlib-code (the line numbers seems off once it enters this part of the code):
Deeper in the stack it also wants to allocate a block of memory for whatever reason(?), which eventually ends up in the corrupt heap message and total stop.
A newly allocated block should have a tail value of 0xBAAD5678, which the message also states, but look closer at what it got:
That is the MALLOC_FILL_WORD, the value used to fill the buffer between head and tail in multi_heap_malloc()/verify_fill_pattern().
I 'm not certain that I'm right, but I've literally spent days with cppcheck, valgrind and -fsanitize trying to come to the conclusion that the error lies in my code and not in the code for heap poisoning, but I always come back to the later one over and over. The read failing value in the tail is no random number, it is always MALLOC_FILL_WORD and there is no user code that ever writes that specific value.
I've also attempted to use a breakpoint to stop the application when it hits 0x3ffe04d1:
If I set it at the very start of the program, it seems the memory is first accessed by the Wifi:
Setting the same breakpoint slightly prior to where cJSON_free is called (as in the problem description above), like so:
it seems that cJSON_print() is the culprit:
And yes, if I don't call cJSON_Print(), I no longer have an issue. However, I can also just not start a task that literally only sleeps in the current use case and the issue also "goes away". I'm not sure what this tells us.
If I'm looking at all this the wrong way, please tell me. No one will be happier than me if it is possible to limit the search for this issue to my own code.
Update:
I'm currently running the exact same code with "Light impact" mode in which the part where it writes the *_FILL_WORD bytes to the buffer is disabled and I'm not getting any issues with destroyed tails.
This is the complete stacktrace from which the snippets above are taken:
This became rather lengthy, please bear with me.
Assuming Heap Debugging is set to "Comprehensive", when allocating a block of memory (multi_heap_malloc()), an additional poison_head_t and poison_tail_t are allocated. Then, it goes on to poison the allocated buffer in poison_allocated_region(), setting the head and tail with HEAD_CANARY_PATTERN (0xABBA1234) and TAIL_CANARY_PATTERN (0xBAAD5678), respectively. When back in multi_heap_malloc, we call verify_fill_pattern(), to verify that the buffer in between head an tail contains the FREE_FILL_WORD (0xfefefefe), and replacing it with MALLOC_FILL_WORD (0xcececece)
First question: What fills the memory with 0xfe in the first place?
Ok, so we now know that whenever we allocate a buffer, we can expect it to be filled with 0xce, correct?
Moving on to freeing memory.
Again, assuming that Heap Debugging is set to "Comprehensive", when freeing a block of memory (multi_heap_free()), the region being freed is verified via verify_allocated_region() such that the head and tail values are intact, i.e. no buffer under/overruns have happened (at least none that wrote any data). Next, the entire buffer, including head and tail are filled with 0xfe, after which the entire buffer is handed back to the heap.
All correct so far?
Now, to my actual problem. I am consistently (as in 100%) getting the stack trace at the bottom of this post very shortly after start up.
First off, I've replaced cJSON with version 1.7.1 as 1.6.0 which ships with ESP-IDF has a memory bug which causes a too small buffer to be allocated.
While calling cJSON_free in smooth::core::json::Value::to_string[abi:cxx11]() in Value.cpp:285 (using default mapping to free()) I expect that to eventually call function multi_heap_free(), which it seems to do just before entering newlib-code (the line numbers seems off once it enters this part of the code):
Code: Select all
0x4012bc71: __assert_func at /Users/ivan/e/newlib_xtensa-2.2.0-bin/newlib_xtensa-2.2.0/xtensa-esp32-elf/newlib/libc/stdlib/../../../.././newlib/libc/stdlib/assert.c:59 (discriminator 8)
0x40088dbf: multi_heap_free at /home/permal/esp/esp-idf/components/heap/./multi_heap_poisoning.c:339
A newly allocated block should have a tail value of 0xBAAD5678, which the message also states, but look closer at what it got:
Code: Select all
CORRUPT HEAP: Bad tail at 0x3ffe04d1. Expected 0xbaad5678 got 0xcececece
I 'm not certain that I'm right, but I've literally spent days with cppcheck, valgrind and -fsanitize trying to come to the conclusion that the error lies in my code and not in the code for heap poisoning, but I always come back to the later one over and over. The read failing value in the tail is no random number, it is always MALLOC_FILL_WORD and there is no user code that ever writes that specific value.
I've also attempted to use a breakpoint to stop the application when it hits 0x3ffe04d1:
Code: Select all
esp_set_watchpoint(0, (void *)0x3ffe04d1, 4, ESP_WATCHPOINT_STORE);
Code: Select all
0x4013d814: tcpip_adapter_set_hostname_api at /home/permal/esp/esp-idf/components/tcpip_adapter/./tcpip_adapter_lwip.c:1150
V (1566) Task: Task 'I2CTask' initialized
Guru Meditation Error: Core 0 panic'ed (Unhandled debug exception)
Debug exception reason: Watchpoint 0 triggered
Core 0 register dump:
PC : 0x40088c92 PS : 0x00060a36 A0 : 0x80088d74 A1 : 0x3ffcea30
0x40088c92: verify_fill_pattern at /home/permal/esp/esp-idf/components/heap/./multi_heap_poisoning.c:339
A2 : 0x3ffe04d0 A3 : 0x000005d8 A4 : 0xcececece A5 : 0x00000001
A6 : 0x00000001 A7 : 0xfefefefe A8 : 0x3ffe0aa8 A9 : 0x00000001
A10 : 0x00000001 A11 : 0x3ffe3ff9 A12 : 0x00003548 A13 : 0xfefefefe
A14 : 0x00003b9c A15 : 0x00000000 SAR : 0x00000018 EXCCAUSE: 0x00000001
EXCVADDR: 0x00000000 LBEG : 0x4000c46c LEND : 0x4000c477 LCOUNT : 0x00000000
Backtrace: 0x40088c92:0x3ffcea30 0x40088d71:0x3ffcea60 0x40082e74:0x3ffcea80 0x40082ea5:0x3ffceaa0 0x40083349:0x3ffceac0 0x4000beaf:0x3ffceae0 0x4008535d:0x3ffceb00 0x400f1481:0x3ffceb20 0x400f085d:0x3ffceb50 0x400ec422:0x3ffcebb0
0x40088c92: verify_fill_pattern at /home/permal/esp/esp-idf/components/heap/./multi_heap_poisoning.c:339
0x40088d71: multi_heap_malloc at /home/permal/esp/esp-idf/components/heap/./multi_heap_poisoning.c:339
0x40082e74: heap_caps_malloc at /home/permal/esp/esp-idf/components/heap/./heap_caps.c:136
0x40082ea5: heap_caps_malloc_default at /home/permal/esp/esp-idf/components/heap/./heap_caps.c:136
0x40083349: _malloc_r at /home/permal/esp/esp-idf/components/newlib/./syscalls.c:37
0x4008535d: wifi_malloc at /home/permal/esp/esp-idf/components/esp32/./wifi_internal.c:28
0x400f1481: esf_buf_alloc at ??:?
0x400f085d: wdevProcessRxSucDataAll at ??:?
0x400ec422: ppTask at ??:?
Code: Select all
std::string Value::to_string()
{
esp_set_watchpoint(0, (void *)0x3ffe04d1, 4, ESP_WATCHPOINT_STORE);
auto* p = cJSON_Print(data);
esp_clear_watchpoint(0);
std::string s{p};
cJSON_free(p);
return s;
}
Code: Select all
Guru Meditation Error: Core 0 panic'ed (Unhandled debug exception)
Debug exception reason: Watchpoint 0 triggered
Core 0 register dump:
PC : 0x40088c92 PS : 0x00060636 A0 : 0x80088d74 A1 : 0x3ffbe800
0x40088c92: verify_fill_pattern at /home/permal/esp/esp-idf/components/heap/./multi_heap_poisoning.c:339
A2 : 0x3ffe04d0 A3 : 0x00000094 A4 : 0xcececece A5 : 0x00000001
A6 : 0x00000001 A7 : 0xfefefefe A8 : 0x3ffe0564 A9 : 0x00000001
A10 : 0x00000001 A11 : 0x3ffe3ff9 A12 : 0x00003a8c A13 : 0xfefefefe
A14 : 0x00003b9c A15 : 0xff000000 SAR : 0x0000001e EXCCAUSE: 0x00000001
EXCVADDR: 0x00000000 LBEG : 0x4000c46c LEND : 0x4000c477 LCOUNT : 0x00000000
Backtrace: 0x40088c92:0x3ffbe800 0x40088d71:0x3ffbe830 0x40082e74:0x3ffbe850 0x40082ea5:0x3ffbe870 0x40083349:0x3ffbe890 0x4000beaf:0x3ffbe8b0 0x40193370:0x3ffbe8d0 0x40193b1c:0x3ffbe920 0x40174777:0x3ffbe940 0x401206c7:0x3ffbe960 0x40120878:0x3ffbeab0 0x40110809:0x3ffbead0 0x4015e80a:0x3ffbeb10 0x4015ea17:0x3ffbec40 0x4011265c:0x3ffbeca0 0x400d1112:0x3ffbfc30
0x40088c92: verify_fill_pattern at /home/permal/esp/esp-idf/components/heap/./multi_heap_poisoning.c:339
0x40088d71: multi_heap_malloc at /home/permal/esp/esp-idf/components/heap/./multi_heap_poisoning.c:339
0x40082e74: heap_caps_malloc at /home/permal/esp/esp-idf/components/heap/./heap_caps.c:136
0x40082ea5: heap_caps_malloc_default at /home/permal/esp/esp-idf/components/heap/./heap_caps.c:136
0x40083349: _malloc_r at /home/permal/esp/esp-idf/components/newlib/./syscalls.c:37
0x40193370: print at /home/permal/esp/esp-idf/components/json/library/cJSON.c:82
0x40193b1c: cJSON_Print at /home/permal/esp/esp-idf/components/json/library/cJSON.c:82
0x40174777: smooth::core::json::Value::to_string[abi:cxx11]() at /home/permal/code/G2-Alarmsystem/components/Smooth/core/json/Value.cpp:282
0x401206c7: G2Alarm::event(AnalogValue const&) at /home/permal/code/G2-Alarmsystem/components/Smooth/include/smooth/core/ipc/TaskEventQueue.h:90
0x40120878: non-virtual thunk to G2Alarm::event(AnalogValue const&) at ??:?
0x40110809: smooth::core::ipc::TaskEventQueue<AnalogValue>::forward_to_event_queue() at /home/permal/code/G2-Alarmsystem/components/Smooth/include/smooth/core/ipc/TaskEventQueue.h:90
0x4015e80a: smooth::core::Task::exec() at /home/permal/code/G2-Alarmsystem/components/Smooth/core/Task.cpp:131
0x4015ea17: smooth::core::Task::start() at /home/permal/code/G2-Alarmsystem/components/Smooth/core/Task.cpp:56
0x4011265c: app_main at /home/permal/code/G2-Alarmsystem/main/./main.cpp:21
0x400d1112: main_task at /home/permal/esp/esp-idf/components/esp32/./cpu_start.c:455
CPU halted.
If I'm looking at all this the wrong way, please tell me. No one will be happier than me if it is possible to limit the search for this issue to my own code.
Update:
I'm currently running the exact same code with "Light impact" mode in which the part where it writes the *_FILL_WORD bytes to the buffer is disabled and I'm not getting any issues with destroyed tails.
This is the complete stacktrace from which the snippets above are taken:
Code: Select all
CORRUPT HEAP: Bad tail at 0x3ffe04d1. Expected 0xbaad5678 got 0xcececece
CORRUPT HEAP: multi_heap.c:370 detected at 0x3ffe04d8
abort() was called at PC 0x400889bb on core 0
0x400889bb: multi_heap_assert at /home/permal/esp/esp-idf/components/heap/./multi_heap.c:339
(inlined by) multi_heap_malloc_impl at /home/permal/esp/esp-idf/components/heap/./multi_heap.c:370
Backtrace: 0x40089108:0x3ffbe3e0 0x400892ab:0x3ffbe400 0x400889bb:0x3ffbe420 0x40088d5d:0x3ffbe440 0x40082e74:0x3ffbe460 0x40082ea5:0x3ffbe480 0x40083349:0x3ffbe4a0 0x4000bbc2:0x3ffbe4c0 0x4005918d:0x3ffbe4e0 0x40058d62:0x3ffbe540 0x40130002:0x3ffbe560 0x4012bd01:0x3ffbe820 0x4012bc71:0x3ffbe870 0x40088dbf:0x3ffbe8a0 0x40082f62:0x3ffbe8c0 0x40083355:0x3ffbe8e0 0x4000bec7:0x3ffbe900 0x40193cae:0x3ffbe920 0x40174785:0x3ffbe940 0x401206c7:0x3ffbe960 0x40120878:0x3ffbeab0 0x40110809:0x3ffbead0 0x4015e7fe:0x3ffbeb10 0x4015ea0b:0x3ffbec40 0x4011265c:0x3ffbeca0 0x400d1112:0x3ffbfc30
0x40089108: invoke_abort at /home/permal/esp/esp-idf/components/esp32/./panic.c:648
0x400892ab: abort at /home/permal/esp/esp-idf/components/esp32/./panic.c:648
0x400889bb: multi_heap_assert at /home/permal/esp/esp-idf/components/heap/./multi_heap.c:339
(inlined by) multi_heap_malloc_impl at /home/permal/esp/esp-idf/components/heap/./multi_heap.c:370
0x40088d5d: multi_heap_malloc at /home/permal/esp/esp-idf/components/heap/./multi_heap_poisoning.c:339
0x40082e74: heap_caps_malloc at /home/permal/esp/esp-idf/components/heap/./heap_caps.c:136
0x40082ea5: heap_caps_malloc_default at /home/permal/esp/esp-idf/components/heap/./heap_caps.c:136
0x40083349: _malloc_r at /home/permal/esp/esp-idf/components/newlib/./syscalls.c:37
0x40130002: _vfiprintf_r at /Users/ivan/e/newlib_xtensa-2.2.0-bin/newlib_xtensa-2.2.0/xtensa-esp32-elf/newlib/libc/stdio/../../../.././newlib/libc/stdio/vfprintf.c:865 (discriminator 3)
0x4012bd01: fiprintf at /Users/ivan/e/newlib_xtensa-2.2.0-bin/newlib_xtensa-2.2.0/xtensa-esp32-elf/newlib/libc/stdio/../../../.././newlib/libc/stdio/fiprintf.c:50
0x4012bc71: __assert_func at /Users/ivan/e/newlib_xtensa-2.2.0-bin/newlib_xtensa-2.2.0/xtensa-esp32-elf/newlib/libc/stdlib/../../../.././newlib/libc/stdlib/assert.c:59 (discriminator 8)
0x40088dbf: multi_heap_free at /home/permal/esp/esp-idf/components/heap/./multi_heap_poisoning.c:339
0x40082f62: heap_caps_free at /home/permal/esp/esp-idf/components/heap/./heap_caps.c:136
0x40083355: _free_r at /home/permal/esp/esp-idf/components/newlib/./syscalls.c:42
0x40193cae: cJSON_free at /home/permal/esp/esp-idf/components/json/library/cJSON.c:82
0x40174785: smooth::core::json::Value::to_string[abi:cxx11]() at /home/permal/code/G2-Alarmsystem/components/Smooth/core/json/Value.cpp:285
0x401206c7: G2Alarm::event(AnalogValue const&) at /home/permal/code/G2-Alarmsystem/components/Smooth/include/smooth/core/ipc/TaskEventQueue.h:90
0x40120878: non-virtual thunk to G2Alarm::event(AnalogValue const&) at ??:?
0x40110809: smooth::core::ipc::TaskEventQueue<AnalogValue>::forward_to_event_queue() at /home/permal/code/G2-Alarmsystem/components/Smooth/include/smooth/core/ipc/TaskEventQueue.h:90
0x4015e7fe: smooth::core::Task::exec() at /home/permal/code/G2-Alarmsystem/components/Smooth/core/Task.cpp:131
0x4015ea0b: smooth::core::Task::start() at /home/permal/code/G2-Alarmsystem/components/Smooth/core/Task.cpp:56
0x4011265c: app_main at /home/permal/code/G2-Alarmsystem/main/./main.cpp:20
0x400d1112: main_task at /home/permal/esp/esp-idf/components/esp32/./cpu_start.c:455