heap corruption
Posted: Thu Apr 26, 2018 1:57 pm
I enabled heap corruption detection and I'm seeing strange things.
Using EspStackTraceDecoder.jar I see:
This was triggered by sending a response to a REST request to the mongoose webserver.
The code that sends the response is this:
Now, my first thought is that maybe the buffer isn't null terminated, and the heap corruption detection has caused the end of the buffer not to be zeroed, so I put in a logging statement:
Now, the code runs without triggering the heap corruption, and the logging looks fine:
Hmm, didn't expect that. The length looks fine. Maybe adding the line of code triggered a fortunate series of events and the string is now accidentally null terminated, and the real error is that json_dumps doesn't return a null terminated string? It has to though really, as it doesn't return a length. Yep, see https://github.com/akheron/jansson/blob ... src/dump.c and https://github.com/akheron/jansson/blob ... c/memory.c
Hmm. OK, let's remove the logging again and run again to prove the problem is repeatable. No, the function now works without heap corruption. Tried it a bunch of times.
I'm now struggling to understand what could be happening. Look at the corrupt heap error: "Expected 0xbaad5678 got 0xbaac5678" Whatever this is, it's off by exactly 64K?
Adding and removing a statement "fixed" a heap error with a "bad tail" off by 64k.
I know the ESP32 uses a complex "multi-heap" with different areas. Are there any caveats to this?
Code: Select all
CORRUPT HEAP: Bad tail at 0x3ffe270a. Expected 0xbaad5678 got 0xbaac5678
assertion "head != NULL" failed: file "/Users/tim/ESP/esp-idf/components/heap/multi_heap_poisoning.c", line 201, function: multi_heap_free
abort() was called at PC 0x400dca43 on core 0
Backtrace: 0x4008e150:0x3ffc3a60 0x4008e2f3:0x3ffc3a80 0x400dca43:0x3ffc3aa0 0x4008de61:0x3ffc3ad0 0x4008292e:0x3ffc3af0 0x40082949:0x3ffc3b10 0x400829f2:0x3ffc3b30 0x40082df4:0x3ffc3b50 0x40082e57:0x3ffc3ba0 0x400fa29f:0x3ffc3bc0 0x400fa31d:0x3ffc3be0 0x400fea50:0x3ffc3c00 0x400feb63:0x3ffc3c30 0x400fefa3:0x3ffc3c60 0x400ff1fd:0x3ffc3ca0 0x400fb071:0x3ffc3cf0 0x400d3ebc:0x3ffc3d10
CPU halted.
Code: Select all
0x400dca43: __assert_func at /Users/ivan/e/newlib_xtensa-2.2.0-bin/newlib_xtensa-2.2.0/xtensa-esp32-elf/newlib/libc/stdlib/../../../.././newlib/libc/stdlib/assert.c:63 (discriminator 8)
0x4008e150: invoke_abort at /Users/tim/ESP/esp-idf/components/esp32/panic.c:648
0x4008e2f3: abort at /Users/tim/ESP/esp-idf/components/esp32/panic.c:648
0x400dca43: __assert_func at /Users/ivan/e/newlib_xtensa-2.2.0-bin/newlib_xtensa-2.2.0/xtensa-esp32-elf/newlib/libc/stdlib/../../../.././newlib/libc/stdlib/assert.c:63 (discriminator 8)
0x4008de61: multi_heap_free at /Users/tim/ESP/esp-idf/components/heap/multi_heap_poisoning.c:299
0x4008292e: heap_caps_free at /Users/tim/ESP/esp-idf/components/heap/heap_caps.c:123
0x40082949: heap_caps_realloc at /Users/tim/ESP/esp-idf/components/heap/heap_caps.c:123
0x400829f2: heap_caps_realloc_default at /Users/tim/ESP/esp-idf/components/heap/heap_caps.c:123
0x40082df4: trace_realloc at /Users/tim/ESP/esp-idf/components/heap/heap_trace.c:367
0x40082e57: __wrap_realloc at /Users/tim/ESP/esp-idf/components/heap/heap_trace.c:402
0x400fa29f: mbuf_resize at /Users/tim/ESP/workspace/firmware/components/Mongoose/mongoose.c:11327
0x400fa31d: mbuf_trim at /Users/tim/ESP/workspace/firmware/components/Mongoose/mongoose.c:11327
0x400fea50: mg_if_sent_cb at /Users/tim/ESP/workspace/firmware/components/Mongoose/mongoose.c:11327
0x400feb63: mg_write_to_socket at /Users/tim/ESP/workspace/firmware/components/Mongoose/mongoose.c:11327
0x400fefa3: mg_mgr_handle_conn at /Users/tim/ESP/workspace/firmware/components/Mongoose/mongoose.c:11327
0x400ff1fd: mg_socket_if_poll at /Users/tim/ESP/workspace/firmware/components/Mongoose/mongoose.c:11327
0x400fb071: mg_mgr_poll at /Users/tim/ESP/workspace/firmware/components/Mongoose/mongoose.c:11327
0x400d3ebc: net_task at /Users/tim/ESP/workspace/firmware/main/net_task.c:390 (discriminator 1)
The code that sends the response is this:
Code: Select all
char *msg=json_dumps(jmsg,JSON_COMPACT);
if (msg){
int msglen=strlen(msg);
mg_printf(nc, "HTTP/1.1 200 OK\r\nContent-Length: %lu\r\n\r\n%.*s",
(unsigned long) msglen, (int) msglen, msg);
vPortFree(msg);
}
Code: Select all
char *msg=json_dumps(jmsg,JSON_COMPACT);
if (msg){
int msglen=strlen(msg);
ESP_LOGI(tag,"sending %s, length %i",msg,msglen);
mg_printf(nc, "HTTP/1.1 200 OK\r\nContent-Length: %lu\r\n\r\n%.*s",
(unsigned long) msglen, (int) msglen, msg);
vPortFree(msg);
}
Code: Select all
I (27886) Rest: sending {"acknowledged":true}, length 21
Hmm. OK, let's remove the logging again and run again to prove the problem is repeatable. No, the function now works without heap corruption. Tried it a bunch of times.
I'm now struggling to understand what could be happening. Look at the corrupt heap error: "Expected 0xbaad5678 got 0xbaac5678" Whatever this is, it's off by exactly 64K?
Adding and removing a statement "fixed" a heap error with a "bad tail" off by 64k.
I know the ESP32 uses a complex "multi-heap" with different areas. Are there any caveats to this?