Understanding/Debugging Stack Smashing

0xffff
Posts: 41
Joined: Tue Jun 19, 2018 1:53 am

Postby 0xffff » Thu Aug 02, 2018 7:39 pm

I have configured my build with extensive stack smashing protection and with the gdbstub invoked on panic. On startup, as I launch various tasks, I get a stack smashing panic in WiFi.begin:

Code:

I (2742) wifi: Init dynamic tx buffer num: 32
I (2742) wifi: Init data frame dynamic rx buffer num: 64
I (2745) wifi: Init management frame dynamic rx buffer num: 64
I (2751) wifi: wifi driver task: 3ffc80f4, prio:23, stack:3584
I (2756) wifi: Init static rx buffer num: 5
I (2760) wifi: Init dynamic rx buffer num: 0
I (2764) wifi: wifi power manager task: 0x3ffcaf68 prio: 21 stack: 2560

Stack smashing protect failure!

abort() was called at PC 0x400d309c on core 0
0x400d309c: __stack_chk_fail at /dev/projA/Firmware/esp-idf/components/esp32/./stack_check.c:36

Backtrace: 0x4008a670:0x3ffc2940 0x4008a73b:0x3ffc2960 0x400d309c:0x3ffc2980 0x400fadbf:0x3ffc29a0 0x400fb096:0x3ffc29d0 0x400fb39d:0x3ffc29f0 0x400d6c35:0x3ffc2b10 0x400d7127:0x3ffc2b30
0x4008a670: invoke_abort at /dev/projA/Firmware/esp-idf/components/esp32/./panic.c:572

0x4008a73b: abort at /dev/projA/Firmware/esp-idf/components/esp32/./panic.c:572

0x400d309c: __stack_chk_fail at /dev/projA/Firmware/esp-idf/components/esp32/./stack_check.c:36

0x400fadbf: WiFiGenericClass::getMode() at /dev/projA/Firmware/solsense_esp32/components/arduino-esp32/libraries/WiFi/src//WiFiGeneric.cpp:517

0x400fb096: WiFiGenericClass::enableSTA(bool) at /dev/projA/Firmware/solsense_esp32/components/arduino-esp32/libraries/WiFi/src//WiFiGeneric.cpp:517

0x400fb39d: WiFiSTAClass::begin(char const*, char const*, int, unsigned char const*, bool) at /dev/projA/Firmware/solsense_esp32/components/arduino-esp32/libraries/WiFi/src//WiFiSTA.cpp:604

0x400d6c35: wifi_init() at /dev/projA/Firmware/solsense_esp32/main/./connection_manager.cpp:236

0x400d7127: connection_manager(void*) at /dev/projA/Firmware/solsense_esp32/main/./connection_manager.cpp:1128


Entering gdb stub now.
$T0b#e6
GNU gdb (crosstool-NG crosstool-ng-1.22.0-80-g6c4433a) 7.10
Copyright (C) 2015 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "--host=x86_64-build_apple-darwin16.3.0 --target=xtensa-esp32-elf".

If my understanding is correct, "strong" stack smashing protection places a canary at the end of each function's stack space and, on return, checks that the canary value has not been overwritten. The trace seems to show that WiFiGenericClass::getMode() is somehow writing past the bounds of its stack (which I find hard to believe). In any case, the stack for that function starts at 0x3ffc29a0 and moves downward, so I should be able to inspect the stack from the backtrace:

Code:

Backtrace: 0x4008a670:0x3ffc2940 0x4008a73b:0x3ffc2960 0x400d309c:0x3ffc2980 0x400fadbf:0x3ffc29a0 0x400fb096:0x3ffc29d0 0x400fb39d:0x3ffc29f0 0x400d6c35:0x3ffc2b10 0x400d7127:0x3ffc2b30
Ignoring the panic code, the method of interest is at address 0x400fadbf, which has a stack address starting at 0x3ffc29a0. Looking at stack around there:

Code:

(gdb) x/20x 0x3ffc2980
0x3ffc2980:	0x00000000	0x3ffc2360	0x3ffc2360	0x00000003
0x3ffc2990:	0x800fb3a0	0x3ffc29d0	0x3ffb4428	0x00000001
0x3ffc29a0:	0xa5a5a5a5	0xa5a5a5a5	0x00a5a5a5	0x2c000000
0x3ffc29b0:	0x3ffc2690	0x0000000c	0x00000000	0xff000000
0x3ffc29c0:	0x800d6c38	0x3ffc29f0	0x3ffb4428	0x3f4039cc
And .... I'm stuck - now I don't know what to do with that. I would have thought that I would see the address of the instruction it's returning to, then some values, then the canary bytes but I'm stuck interpreting this.

kolban
Posts: 1683
Joined: Mon Nov 16, 2015 4:43 pm
Location: Texas, USA

Re: Understanding/Debugging Stack Smashing

Postby kolban » Fri Aug 03, 2018 4:57 am

If the problem is reproducible, I'd first examine the amount of stack space on the task that is running the Arduino WiFi classes. I personally have low skills in Arduino ESP32 programming and do most of my work in ESP-IDF exclusively. As such, I'm afraid I don't know how tasks are created in an Arduino framework. Mirroring to ESP-IDF, one can ask a FreeRTOS task for its high-water marks for stack usage. Typically, one can then increase the stack to a larger than needed allocation size and then run the app for a while and see how much stack it actually needed. If you allocate less stack than it needs, all bets are off.
Free book on ESP32 available here: https://leanpub.com/kolban-ESP32

0xffff
Posts: 41
Joined: Tue Jun 19, 2018 1:53 am

Re: Understanding/Debugging Stack Smashing

Postby 0xffff » Fri Aug 03, 2018 5:50 am

Thanks. I tried that to no avail. My understanding is that the arduino WiFi library creates its own tasks per the messages that it produces.

I'm hoping someone can teach me or point me to resources where I can learn a bit more about how to analyze the stack contents - is my understanding of stack smashing correct? Is my understanding of the stack backtrace correct? Am I looking at the right memory values? Maybe an addition to the next version of your book? ;)

ESP_Angus
Posts: 2344
Joined: Sun May 08, 2016 4:11 am

Re: Understanding/Debugging Stack Smashing

Postby ESP_Angus » Fri Aug 03, 2018 6:38 am

0xffff wrote: the trace is showing that WiFiGenericClass::getMode() is somehow overwriting the bounds of its stack (which I find hard to believe).

How old is your version of the Arduino repo? There was (technically) a stack smashing bug in this function which was fixed very recently. The type of the "mode" local variable was originally uint8_t, but it should be wifi_mode_t. Enums are 4 bytes wide here, so calling esp_wifi_get_mode(&mode) writes 4 bytes into a 1-byte variable and overflows 3 bytes onto the caller's stack.

I say "technically" this is a stack smashing bug because stacks are 16-byte aligned, so nothing else ever occupied those 3 bytes which were clobbered. But stack smashing detection will still catch it, and rightly so! Updating Arduino should fix it.
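
To make the size mismatch concrete, here is a minimal host-side sketch in C. It does not use the real ESP-IDF API: demo_mode_t and demo_get_mode() are hypothetical stand-ins for wifi_mode_t and esp_wifi_get_mode(), and the "stack" is just a byte array pre-filled with 0xa5 so the overrun can be counted safely. It assumes the compiler makes the enum 4 bytes wide, as GCC does by default.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for wifi_mode_t; assumed to be 4 bytes wide. */
typedef enum { DEMO_MODE_NULL = 0, DEMO_MODE_STA = 1 } demo_mode_t;

/* Hypothetical stand-in for esp_wifi_get_mode(): stores a whole enum. */
void demo_get_mode(demo_mode_t *mode)
{
    *mode = DEMO_MODE_STA;
}

/* Count the bytes changed beyond a local variable of var_size bytes when a
 * full demo_mode_t is stored at its address. The store is simulated in a
 * byte array so the sketch itself has no undefined behaviour. */
size_t bytes_clobbered(size_t var_size)
{
    unsigned char frame[16];
    demo_mode_t value;
    size_t i;
    size_t clobbered = 0;

    memset(frame, 0xa5, sizeof frame);      /* known fill pattern */
    demo_get_mode(&value);
    memcpy(frame, &value, sizeof value);    /* the enum-sized store */

    for (i = var_size; i < sizeof frame; i++) {
        if (frame[i] != 0xa5) {
            clobbered++;                    /* a byte outside the variable changed */
        }
    }
    return clobbered;
}
```

Under those assumptions, bytes_clobbered(sizeof(uint8_t)) reports 3 and bytes_clobbered(sizeof(demo_mode_t)) reports 0, which is why declaring the local with the enum type removes the overrun.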
0xffff wrote:

Code:

(gdb) x/20x 0x3ffc2980
0x3ffc2980:	0x00000000	0x3ffc2360	0x3ffc2360	0x00000003
0x3ffc2990:	0x800fb3a0	0x3ffc29d0	0x3ffb4428	0x00000001
0x3ffc29a0:	0xa5a5a5a5	0xa5a5a5a5	0x00a5a5a5	0x2c000000
0x3ffc29b0:	0x3ffc2690	0x0000000c	0x00000000	0xff000000
0x3ffc29c0:	0x800d6c38	0x3ffc29f0	0x3ffb4428	0x3f4039cc
And .... I'm stuck - now I don't know what to do with that. I would have thought that I would see the address of the instruction it's returning to, then some values, then the canary bytes but I'm stuck interpreting this.
The return addresses are there, but they've had the high bits manipulated by the Xtensa calling convention. Replace any leading 0x8 with 0x4 to see them. Also, they point to the address *after* the calling address (i.e. the address to return to), so you need to subtract 3 to get back to the call itself. For example, the stack word 0x800d6c38 corresponds to the backtrace entry 0x400d6c35.

(Look at the putEntry() function in panic.c to see how this is done to print the backtrace.)
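
The conversion described above can be sketched as a small C helper. This is an illustration, not a copy of panic.c: the function name is hypothetical, and masking the top two bits is my assumption about how "replace a leading 0x8 with 0x4" generalises.

```c
#include <stdint.h>

/* Hypothetical helper: turn a return-address word read from the stack dump
 * into the call-site address as it appears in the backtrace. */
uint32_t call_site_from_stack_word(uint32_t word)
{
    uint32_t pc = word;

    if (pc & 0x80000000u) {
        /* Assumption: force the top two bits to 01, so 0x8... becomes 0x4... */
        pc = (pc & 0x3fffffffu) | 0x40000000u;
    }
    /* Step back from the return address to the calling address. */
    return pc - 3u;
}
```

Applied to the dump in the quote, 0x800d6c38 gives 0x400d6c35 (wifi_init) and 0x800fb3a0 gives 0x400fb39d (WiFiSTAClass::begin), matching the backtrace.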

The stack protector "canary" is a random 32-bit value (unique each time the ESP32 resets) so it's hard to tell what it was without a longer stack dump.
kolban wrote: If the problem is reproducible, I'd first examine the amount of stack space on the task that is running the Arduino WiFi classes. I personally have low skills in Arduino ESP32 programming and do most of my work in ESP-IDF exclusively. As such, I'm afraid I don't know how tasks are created in an Arduino framework. Mirroring to ESP-IDF, one can ask a FreeRTOS task for its high-water marks for stack usage. Typically, one can then increase the stack to a larger than needed allocation size and then run the app for a while and see how much stack it actually needed. If you allocate less stack than it needs, all bets are off.
Something to remember here is that this kind of stack smashing is different to what you've just described.

In ESP-IDF (or any embedded system, really), it's possible for otherwise 100% correct C or C++ code to use more stack than it has available; it simply runs off the end of the stack space. With the default ESP-IDF configuration, the FreeRTOS "stack canary", which puts known bytes at the end of the block of memory set aside for the stack, should catch this and produce a fatal stack overflow crash.

The solution in that case is usually to add more stack to the task (or to reduce the task's overall stack usage). There's no other bug.
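
As a rough host-side illustration of that end-of-stack check (not FreeRTOS source): the sketch below pre-fills a simulated stack with 0xa5, the byte FreeRTOS uses as its stack fill pattern, then checks whether the far end is still untouched. The function names and the 16-byte guard length are assumptions made for the example. The same fill pattern is what makes a high-water mark measurable.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_FILL_BYTE 0xa5u  /* fill pattern used by this sketch */
#define DEMO_GUARD_LEN 16u    /* hypothetical number of guard bytes checked */

/* Prepare a simulated task stack by filling it with the pattern. */
void demo_stack_init(unsigned char *stack, size_t size)
{
    memset(stack, DEMO_FILL_BYTE, size);
}

/* The stack grows downward, so stack[0] is its far end. Returns 1 if the
 * guard bytes there still hold the pattern, 0 if something overwrote them. */
int demo_stack_end_intact(const unsigned char *stack, size_t size)
{
    size_t n = size < DEMO_GUARD_LEN ? size : DEMO_GUARD_LEN;
    size_t i;

    for (i = 0; i < n; i++) {
        if (stack[i] != DEMO_FILL_BYTE) {
            return 0;
        }
    }
    return 1;
}

/* Count untouched bytes from the far end: the least free stack seen so far. */
size_t demo_high_water_mark(const unsigned char *stack, size_t size)
{
    size_t n = 0;

    while (n < size && stack[n] == DEMO_FILL_BYTE) {
        n++;
    }
    return n;
}
```

On a real target you would not write this yourself; uxTaskGetStackHighWaterMark() is the FreeRTOS call for querying the mark.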

What's happened here is that the C or C++ code is wrong. The code has said "I need 1 byte to store this variable", and then it has passed that variable's address to a function which writes 4 bytes to that memory address. It doesn't matter if there are still kilobytes of unused stack memory in the task: this is a stack smash, because those additional 3 bytes should not have been written to (in this case it happened to be benign, but often this class of bug leads to a crash or worse). This is what the toolchain "stack smashing protection" is detecting.
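
A simplified model of that per-function check, written as plain C that runs on a host: the frame layout (a 1-byte local directly below a 32-bit canary), the names, and the canary value passed in are all assumptions for illustration. The real check is emitted by the compiler and ends in __stack_chk_fail, as seen in the backtrace.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_LOCAL_OFFSET  3u  /* assumed: 1-byte local just below the canary */
#define DEMO_CANARY_OFFSET 4u

/* Simulate one protected function call. bytes_written is how many bytes the
 * callee stores at the local's address. Returns 1 if the exit check would
 * report a smash, 0 if the canary is intact. */
int demo_smash_detected(uint32_t canary, size_t bytes_written)
{
    unsigned char frame[8] = {0};
    uint32_t value = 0;   /* data the callee writes */
    uint32_t check;

    if (bytes_written > sizeof value) {
        bytes_written = sizeof value;   /* keep the simulated store in bounds */
    }

    /* Function entry: place the canary above the locals. */
    memcpy(frame + DEMO_CANARY_OFFSET, &canary, sizeof canary);

    /* Function body: the callee writes through the pointer to the local. */
    memcpy(frame + DEMO_LOCAL_OFFSET, &value, bytes_written);

    /* Function exit: compare the stored canary with the reference value. */
    memcpy(&check, frame + DEMO_CANARY_OFFSET, sizeof check);
    return check != canary;
}
```

With a made-up canary such as 0x2c5a17e3, a 1-byte write leaves the canary intact, while a 4-byte write changes it and is detected, even though plenty of simulated stack remains.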

(In ESP-IDF we use "stack smashing" and "stack overflow" pretty consistently to differentiate these two types of stack problem, but technically they're both kinds of stack overflows.)

0xffff
Posts: 41
Joined: Tue Jun 19, 2018 1:53 am

Re: Understanding/Debugging Stack Smashing

Postby 0xffff » Fri Aug 03, 2018 3:09 pm

Thanks Angus, super-informative.

I am using ESP-IDF v3.0.2 but the latest arduino-esp doesn't compile with it because it depends on enums that don't exist in the core .h definitions, so I backtracked commits until it does compile, and that's where I got the stack-smashing bug. Probably I don't quite understand how to find the correct version of arduino-esp to use with the official released esp-idf.

ESP_Angus
Posts: 2344
Joined: Sun May 08, 2016 4:11 am

Re: Understanding/Debugging Stack Smashing

Postby ESP_Angus » Sun Aug 05, 2018 11:18 pm

0xffff wrote:Probably I don't quite understand how to find the correct version of arduino-esp to use with the official released esp-idf.
I think arduino-esp32 tends to track "master" rather than releases. You can tell which ESP-IDF version was used to build the precompiled libraries (used by the Arduino IDE) by looking at the commit log of the tools/sdk/lib directory:
https://github.com/espressif/arduino-es ... ls/sdk/lib

(The commit hashes for updates of the IDE are given there.)

I suspect the release/v3.1 branch would work as well.

0xffff
Posts: 41
Joined: Tue Jun 19, 2018 1:53 am

Re: Understanding/Debugging Stack Smashing

Postby 0xffff » Sun Aug 05, 2018 11:46 pm

Thanks - I ended up fixing the stack-smashing bug per your notes manually. Works.
