Hello, I've got an app that was originally built on ESP-IDF v4.2 and later ported to IDF 4.4.
After upgrading the framework I had to work through a number of compatibility issues and minor bugs in our code.
We have it mostly stable now, but there is still some lingering heap corruption/stack clobbering going on. I'm having a hard time tracing the root cause, so I'd definitely appreciate any suggestions or ideas on how to track it down.
What we're seeing on the console is CORRUPT HEAP: Bad head at 0x3f8819c8. Expected 0xabba1234 got 0x6d656761
which indicates a buffer overrun somewhere. If we decode that hex, we see the string "mega", which doesn't appear anywhere in our codebase or, as far as I can see, anywhere in the IDF. So unfortunately I couldn't find an obvious culprit just by searching for that string...
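One detail that may be worth checking when decoding the value: the ESP32 is little-endian, so the bytes of the 32-bit word 0x6d656761 sit in memory lowest byte first (61 67 65 6d). A minimal sketch of that decoding follows; the helper name `word_to_mem_ascii` is hypothetical and not part of ESP-IDF.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an ESP-IDF API): writes the four bytes of `word`
 * into `out` in the order they occupy memory on a little-endian CPU
 * (lowest address first). Non-printable bytes are replaced with '.'.
 * `out` must have room for 5 chars including the terminator. */
static void word_to_mem_ascii(uint32_t word, char *out)
{
    assert(out != NULL);
    for (size_t i = 0; i < 4; i++) {
        uint8_t b = (uint8_t)(word >> (8 * i));   /* byte at address + i */
        out[i] = (b >= 0x20 && b < 0x7f) ? (char)b : '.';
    }
    out[4] = '\0';
}
```

For 0x6d656761 this yields "agem" rather than "mega", so the overrunning data may be a fragment of a longer string that contains "agem". Searching the codebase and the runtime strings for that sequence might turn up candidates that a search for "mega" misses.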
I've tried turning on the GCC flags for stack smashing protection, but unfortunately I can only enable it at the basic level (at any higher level my binary consumes too much IRAM). The basic protection hasn't caught any offenders yet, so that's not going to help much.
Unfortunately JTAG isn't an option with our hardware setup.
The heap corruption is regularly detected deep inside esp-idf-4.4-release/components/heap/multi_heap_poisoning.c, but it is commonly triggered inside a section of our code that uses the json component (cJSON) to generate JSON string data, which gets shipped back via MQTT to our cloud services. I've been over all the strings assembled into that JSON, and none of them contain "mega" either (including strings generated at runtime that aren't in the code).
The issue occurs every 2-4 hours, which makes it difficult to trace (I haven't yet narrowed it down enough to make it happen more often).
As I said above, any and all suggestions/feedback are greatly appreciated.
Here is an example of a full error and stack trace:
[code]CORRUPT HEAP: Bad head at 0x3f8819c8. Expected 0xabba1234 got 0x6d656761
assert failed: multi_heap_free multi_heap_poisoning.c:253 (head != NULL)
Backtrace: 0x40082409:0x3ffe8f20 0x40096fc9:0x3ffe8f40 0x4009e18d:0x3ffe8f60 0x4009d20b:0x3ffe9080 0x40082931:0x3ffe90a0 0x4008f765:0x3ffe90c0 0x4016d5b2:0x3ffe90e0 0x4015fd22:0x3ffe9100 0x40160aa5:0x3ffe9120 0x4015b149:0x3ffe9140 0x4015b159:0x3ffe9160 0x4015ad42:0x3ffe9180 0x4015bf32:0x3ffe91a0 0x401c2973:0x3ffe91c0 0x4012b281:0x3ffe91e0 0x4012d169:0x3ffe9200 0x400f5bc5:0x3ffe9240 0x400f5c05:0x3ffe9260 0x400f5c36:0x3ffe9280 0x400f76b5:0x3ffe92a0 0x400f7714:0x3ffe94f0 0x400f559c:0x3ffe9510
0x40082409: panic_abort at /esp/esp-idf-4.4-release/components/esp_system/panic.c:402
0x40096fc9: esp_system_abort at /esp/esp-idf-4.4-release/components/esp_system/esp_system.c:128
0x4009e18d: __assert_func at /esp/esp-idf-4.4-release/components/newlib/assert.c:85
0x4009d20b: multi_heap_free at /esp/esp-idf-4.4-release/components/heap/multi_heap_poisoning.c:253
(inlined by) multi_heap_free at /esp/esp-idf-4.4-release/components/heap/multi_heap_poisoning.c:245
0x40082931: heap_caps_free at /esp/esp-idf-4.4-release/components/heap/heap_caps.c:367
0x4008f765: esp_mbedtls_mem_free at /esp/esp-idf-4.4-release/components/mbedtls/port/esp_mem.c:46
0x4016d5b2: mbedtls_free at /esp/esp-idf-4.4-release/components/mbedtls/mbedtls/library/platform.c:66
0x4015fd22: mbedtls_ssl_free at /esp/esp-idf-4.4-release/components/mbedtls/mbedtls/library/ssl_tls.c:6877
0x40160aa5: __wrap_mbedtls_ssl_free at /esp/esp-idf-4.4-release/components/mbedtls/port/dynamic/esp_ssl_tls.c:242
0x4015b149: esp_mbedtls_cleanup at /esp/esp-idf-4.4-release/components/esp-tls/esp_tls_mbedtls.c:309
0x4015b159: esp_mbedtls_conn_delete at /esp/esp-idf-4.4-release/components/esp-tls/esp_tls_mbedtls.c:255
0x4015ad42: esp_tls_conn_destroy at /esp/esp-idf-4.4-release/components/esp-tls/esp_tls.c:105
0x4015bf32: base_close at /esp/esp-idf-4.4-release/components/tcp_transport/transport_ssl.c:276
0x401c2973: esp_transport_close at /esp/esp-idf-4.4-release/components/tcp_transport/transport.c:222
0x4012b281: esp_mqtt_abort_connection at /esp/esp-idf-4.4-release/components/mqtt/esp-mqtt/mqtt_client.c:703
0x4012d169: esp_mqtt_client_publish at /esp/esp-idf-4.4-release/components/mqtt/esp-mqtt/mqtt_client.c:1846
0x400f5bc5: mqtt_publish at /Development/Flow/fgv3-firmware/components/flow_cloud/flow_cloud_mqtt.c:296
0x400f5c05: mqtt_publish_message at /Development/Flow/fgv3-firmware/components/flow_cloud/flow_cloud_mqtt.c:228
0x400f5c36: flow_mqtt_publish at /Development/Flow/fgv3-firmware/components/flow_cloud/flow_cloud_mqtt.c:237
0x400f76b5: logship_handler at /Development/Flow/fgv3-firmware/components/flow_cloud/flow_cloud_logshipper.c:54
0x400f7714: check_logshipper_queue at /Development/Flow/fgv3-firmware/components/flow_cloud/flow_cloud_logshipper.c:37
0x400f559c: task_flow_cloud at /Development/Flow/fgv3-firmware/components/flow_cloud/flow_cloud_core.c:80
[/code]
Corrupt Heap in app built on IDF 4.4, looking for some suggestions in troubleshooting
Re: Corrupt Heap in app built on IDF 4.4, looking for some suggestions in troubleshooting
Can you try turning on the (forensic) gdb stub? When the crash happens, idf.py monitor should drop you to a gdb prompt, which at least lets you inspect the offending address a bit more and see whether there's text around the 'mega' that hints at where it comes from.
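As a rough sketch of what that could look like in sdkconfig, assuming the option names from the IDF 4.4 menuconfig (worth double-checking against your own tree before relying on them):

```
# Panic handler: drop into the GDB stub instead of printing registers and rebooting
CONFIG_ESP_SYSTEM_PANIC_GDBSTUB=y
# Optional: comprehensive heap poisoning, which also fills allocated/freed memory
# with known patterns (costs some performance)
CONFIG_HEAP_POISONING_COMPREHENSIVE=y
```

Once at the gdb prompt, a command such as `x/64xb 0x3f8819a8` dumps the raw bytes starting just before the address from the log, and `x/s` on a nearby address shows them as a string.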