Heap corruption diagnostics causing heap corruption?.
Re: Heap corruption diagnostics causing heap corruption?.
Note that cJSON printing reallocates memory as it appends values to the string. If the issue was somewhere else in the system cJSON’s print does a ton of calls that result in heap checking.
Not saying it isn’t an issue in cJSON but they run their unit tests under valgrind I think so I’m surprised we are catching these issues here.
Not saying it isn’t an issue in cJSON but they run their unit tests under valgrind I think so I’m surprised we are catching these issues here.
Re: Heap corruption diagnostics causing heap corruption?.
Yes, they do and so am I. I've been up and down that code, run my own code through both -fsanitize=nnn, Valgrind and cppcheck. Can't find anything.cmorgan wrote:Not saying it isn’t an issue in cJSON but they run their unit tests under valgrind I think so I’m surprised we are catching these issues here.
@ESP_Angus I'm not been able to modify my code such that it runs without I2C and still crashes. Anything that changes the timing seems to affect this issue.
Out of curiosity, why is newlib doing allocating new memory during our call to free(), as seen in the stack trace in the first post? I see it uses vfprintf, is it logging something?
Re: Heap corruption diagnostics causing heap corruption?.
Note that my implementation does not crash immediately, I have to leave some memory unfreed (in a separate part of the application, not related to cJSON) and let the heap usage build up a bit before I get the crash. I don't know what this means, but if it's not crashing, see if you can put the cJSON_Print function in a loop, and don't free all the memory? Maybe do:permal wrote: @ESP_Angus I'm not been able to modify my code such that it runs without I2C and still crashes. Anything that changes the timing seems to affect this issue.
Code: Select all
while(1) {
vTaskDelay(10);
cJSON *json = cJSON_CreateObject();
cJSON_AddObject(....);
char *text = cJSON_Print(json);
cJSON_Delete(json);
// Don't free the text
}
Re: Heap corruption diagnostics causing heap corruption?.
OK.permal wrote:Yes, they do and so am I. I've been up and down that code, run my own code through both -fsanitize=nnn, Valgrind and cppcheck. Can't find anything.cmorgan wrote:Not saying it isn’t an issue in cJSON but they run their unit tests under valgrind I think so I’m surprised we are catching these issues here.
@ESP_Angus I'm not been able to modify my code such that it runs without I2C and still crashes. Anything that changes the timing seems to affect this issue.
newlib initializes a per-task lock for stdout on demand. abort() prints some error messages. So if an abort message is the first printf that the task does, it will try to malloc() in this code path to create the lock. Hence the double-crash on some heap corruption. There's no super-easy way around this, unfortunately - we can swap heap corruption aborts to print with ets_printf, but then they stop being thread safe so will print line noise under some situations.Out of curiosity, why is newlib doing allocating new memory during our call to free(), as seen in the stack trace in the first post? I see it uses vfprintf, is it logging something?
Re: Heap corruption diagnostics causing heap corruption?.
Ok, thanks for that explanation.
Just an update - my device has now been running for 28h with "light impact", creating, printing and destroying cJSON objects (all with different contents) approx 10 times per second with the rest of the system working at as normal. If cJSON actually is corrupting the heap I find this rather remarkable.
Tomorrow I will add a much higher pressure on the heap by allocating a large amount of one byte sized memory segments just before the call to cJSON_print(), then releasing them after the call, with calls to heap_caps_check_integrity_all() in between. I expect this to increase increase the chance of any errors in cJSON to hit these segments, if there actually is an error in the library. I'm opting for many small segments so that the majority of the memory will be the tail and head canary bytes, easily triggering the diagnostics if something corrupts the heap. Thoughts on this?
FYI, I made an experiment to increase the canary tail to 3 * uint32_t, and all three bytes gets overwritten with the MALLOC_FILL_WORD.
Just an update - my device has now been running for 28h with "light impact", creating, printing and destroying cJSON objects (all with different contents) approx 10 times per second with the rest of the system working at as normal. If cJSON actually is corrupting the heap I find this rather remarkable.
Tomorrow I will add a much higher pressure on the heap by allocating a large amount of one byte sized memory segments just before the call to cJSON_print(), then releasing them after the call, with calls to heap_caps_check_integrity_all() in between. I expect this to increase increase the chance of any errors in cJSON to hit these segments, if there actually is an error in the library. I'm opting for many small segments so that the majority of the memory will be the tail and head canary bytes, easily triggering the diagnostics if something corrupts the heap. Thoughts on this?
FYI, I made an experiment to increase the canary tail to 3 * uint32_t, and all three bytes gets overwritten with the MALLOC_FILL_WORD.
Re: Heap corruption diagnostics causing heap corruption?.
Hi folks,
Thanks everyone for your patience with this issue. I've found a bug in realloc() which is almost certainly responsible for this. It's not in the "Comprehensive" heap checking feature as such, but it is very significantly more likely to be triggered when Comprehensive heap corruption is turned on (because we don't resize buffers in place in Comprehensive mode).
cJSON_Print() shrinks its output buffer using realloc before it returns, and this is why it's so good at triggering this bug.
Fix should be available shortly.
Angus
Thanks everyone for your patience with this issue. I've found a bug in realloc() which is almost certainly responsible for this. It's not in the "Comprehensive" heap checking feature as such, but it is very significantly more likely to be triggered when Comprehensive heap corruption is turned on (because we don't resize buffers in place in Comprehensive mode).
cJSON_Print() shrinks its output buffer using realloc before it returns, and this is why it's so good at triggering this bug.
Fix should be available shortly.
Angus
Re: Heap corruption diagnostics causing heap corruption?.
Good news, Angus. I'm very happy to hear that.
Re: Heap corruption diagnostics causing heap corruption?.
Fantastic Angus! Thank you so much, keep me from tearing my hair out.
Re: Heap corruption diagnostics causing heap corruption?.
I just updated to the latest IDF incl. submodules and am now running with comprehensive mode. No longer crashed on startup so that's a good sign. Will leave it overnight and report back.
Re: Heap corruption diagnostics causing heap corruption?.
permal wrote:I just updated to the latest IDF incl. submodules and am now running with comprehensive mode. No longer crashed on startup so that's a good sign. Will leave it overnight and report back.
Code: Select all
- memcpy(new_p, ptr, old_size);
+ memcpy(new_p, ptr, MIN(size, old_size));
Who is online
Users browsing this forum: No registered users and 143 guests