Heap corruption diagnostics causing heap corruption?.

cmorgan
Posts: 89
Joined: Thu Aug 24, 2017 12:52 am

Re: Heap corruption diagnostics causing heap corruption?.

Postby cmorgan » Wed Feb 07, 2018 5:17 pm

Note that cJSON printing reallocates memory as it appends values to the string. If the issue was somewhere else in the system cJSON’s print does a ton of calls that result in heap checking.

Not saying it isn’t an issue in cJSON but they run their unit tests under valgrind I think so I’m surprised we are catching these issues here.

permal
Posts: 384
Joined: Sun May 14, 2017 5:36 pm

Re: Heap corruption diagnostics causing heap corruption?.

Postby permal » Wed Feb 07, 2018 5:56 pm

cmorgan wrote:Not saying it isn’t an issue in cJSON but they run their unit tests under valgrind I think so I’m surprised we are catching these issues here.
Yes, they do and so am I. I've been up and down that code, run my own code through both -fsanitize=nnn, Valgrind and cppcheck. Can't find anything.

@ESP_Angus I'm not been able to modify my code such that it runs without I2C and still crashes. Anything that changes the timing seems to affect this issue.

Out of curiosity, why is newlib doing allocating new memory during our call to free(), as seen in the stack trace in the first post? I see it uses vfprintf, is it logging something?

caseymdk
Posts: 15
Joined: Wed Feb 07, 2018 2:35 am

Re: Heap corruption diagnostics causing heap corruption?.

Postby caseymdk » Wed Feb 07, 2018 11:30 pm

permal wrote: @ESP_Angus I'm not been able to modify my code such that it runs without I2C and still crashes. Anything that changes the timing seems to affect this issue.
Note that my implementation does not crash immediately, I have to leave some memory unfreed (in a separate part of the application, not related to cJSON) and let the heap usage build up a bit before I get the crash. I don't know what this means, but if it's not crashing, see if you can put the cJSON_Print function in a loop, and don't free all the memory? Maybe do:

Code: Select all

while(1) {
vTaskDelay(10);
cJSON *json = cJSON_CreateObject();
cJSON_AddObject(....);
char *text = cJSON_Print(json);
cJSON_Delete(json);
// Don't free the text
}
I need to get some of this code finished for work, but as soon as I'm done I'll work on a test case.

ESP_Angus
Posts: 2344
Joined: Sun May 08, 2016 4:11 am

Re: Heap corruption diagnostics causing heap corruption?.

Postby ESP_Angus » Thu Feb 08, 2018 4:00 am

permal wrote:
cmorgan wrote:Not saying it isn’t an issue in cJSON but they run their unit tests under valgrind I think so I’m surprised we are catching these issues here.
Yes, they do and so am I. I've been up and down that code, run my own code through both -fsanitize=nnn, Valgrind and cppcheck. Can't find anything.

@ESP_Angus I'm not been able to modify my code such that it runs without I2C and still crashes. Anything that changes the timing seems to affect this issue.
OK.
Out of curiosity, why is newlib doing allocating new memory during our call to free(), as seen in the stack trace in the first post? I see it uses vfprintf, is it logging something?
newlib initializes a per-task lock for stdout on demand. abort() prints some error messages. So if an abort message is the first printf that the task does, it will try to malloc() in this code path to create the lock. Hence the double-crash on some heap corruption. There's no super-easy way around this, unfortunately - we can swap heap corruption aborts to print with ets_printf, but then they stop being thread safe so will print line noise under some situations.

permal
Posts: 384
Joined: Sun May 14, 2017 5:36 pm

Re: Heap corruption diagnostics causing heap corruption?.

Postby permal » Thu Feb 08, 2018 11:08 pm

Ok, thanks for that explanation.

Just an update - my device has now been running for 28h with "light impact", creating, printing and destroying cJSON objects (all with different contents) approx 10 times per second with the rest of the system working at as normal. If cJSON actually is corrupting the heap I find this rather remarkable.

Tomorrow I will add a much higher pressure on the heap by allocating a large amount of one byte sized memory segments just before the call to cJSON_print(), then releasing them after the call, with calls to heap_caps_check_integrity_all() in between. I expect this to increase increase the chance of any errors in cJSON to hit these segments, if there actually is an error in the library. I'm opting for many small segments so that the majority of the memory will be the tail and head canary bytes, easily triggering the diagnostics if something corrupts the heap. Thoughts on this?

FYI, I made an experiment to increase the canary tail to 3 * uint32_t, and all three bytes gets overwritten with the MALLOC_FILL_WORD.

ESP_Angus
Posts: 2344
Joined: Sun May 08, 2016 4:11 am

Re: Heap corruption diagnostics causing heap corruption?.

Postby ESP_Angus » Fri Feb 09, 2018 3:50 am

Hi folks,

Thanks everyone for your patience with this issue. I've found a bug in realloc() which is almost certainly responsible for this. It's not in the "Comprehensive" heap checking feature as such, but it is very significantly more likely to be triggered when Comprehensive heap corruption is turned on (because we don't resize buffers in place in Comprehensive mode).

cJSON_Print() shrinks its output buffer using realloc before it returns, and this is why it's so good at triggering this bug.

Fix should be available shortly.


Angus

permal
Posts: 384
Joined: Sun May 14, 2017 5:36 pm

Re: Heap corruption diagnostics causing heap corruption?.

Postby permal » Fri Feb 09, 2018 7:06 am

Good news, Angus. I'm very happy to hear that.

caseymdk
Posts: 15
Joined: Wed Feb 07, 2018 2:35 am

Re: Heap corruption diagnostics causing heap corruption?.

Postby caseymdk » Fri Feb 09, 2018 7:44 pm

Fantastic Angus! Thank you so much, keep me from tearing my hair out. :)

permal
Posts: 384
Joined: Sun May 14, 2017 5:36 pm

Re: Heap corruption diagnostics causing heap corruption?.

Postby permal » Fri Feb 09, 2018 7:49 pm

I just updated to the latest IDF incl. submodules and am now running with comprehensive mode. No longer crashed on startup so that's a good sign. Will leave it overnight and report back.

caseymdk
Posts: 15
Joined: Wed Feb 07, 2018 2:35 am

Re: Heap corruption diagnostics causing heap corruption?.

Postby caseymdk » Fri Feb 09, 2018 8:33 pm

permal wrote:I just updated to the latest IDF incl. submodules and am now running with comprehensive mode. No longer crashed on startup so that's a good sign. Will leave it overnight and report back.

Code: Select all

-        memcpy(new_p, ptr, old_size);
+        memcpy(new_p, ptr, MIN(size, old_size));
Whew...that seems like a major buffer overrun bug! Was that a serious one or am I misreading/misunderstanding?

Who is online

Users browsing this forum: ESP_Sprite and 155 guests