We are seeing the ESP32 halt after a stack overflow. If we trigger stack overflows repeatedly, after some time the ESP32 stays in a halted state and does not reset.
The GPIO states after the ESP32 has halted are similar to those in reset.
We placed a gpio_set_level() call inside the panic_handler() function and confirmed that execution enters the panic handler but never exits it.
Code:
static void panic_handler(void *frame, bool pseudo_excause)
{
    panic_info_t info = { 0 };

    /*
     * Setup environment and perform necessary architecture/chip specific
     * steps here prior to the system panic handler.
     * */
    int core_id = cpu_hal_get_core_id();

    // If multiple cores arrive at panic handler, save frames for all of them
    g_exc_frames[core_id] = frame;

#if !CONFIG_ESP_SYSTEM_SINGLE_CORE_MODE
    // These are cases where both CPUs both go into panic handler. The following code ensures
    // only one core proceeds to the system panic handler.
    if (pseudo_excause) {
#define BUSY_WAIT_IF_TRUE(b) { if (b) while(1); }
        // For WDT expiry, pause the non-offending core - offending core handles panic
        BUSY_WAIT_IF_TRUE(panic_get_cause(frame) == PANIC_RSN_INTWDT_CPU0 && core_id == 1);
        BUSY_WAIT_IF_TRUE(panic_get_cause(frame) == PANIC_RSN_INTWDT_CPU1 && core_id == 0);

        // For cache error, pause the non-offending core - offending core handles panic
        if (panic_get_cause(frame) == PANIC_RSN_CACHEERR && core_id != esp_cache_err_get_cpuid()) {
            // Only print the backtrace for the offending core in case of the cache error
            g_exc_frames[core_id] = NULL;
            while (1) {
                ;
            }
        }
    }

    // Need to reconfigure WDTs before we stall any other CPU
    esp_panic_handler_reconfigure_wdts();

    esp_rom_delay_us(1);
    SOC_HAL_STALL_OTHER_CORES();
    // gpio_set_level(4, 1);
#endif

    esp_ipc_isr_stall_abort();

    if (esp_cpu_in_ocd_debug_mode()) {
#if __XTENSA__
        if (!(esp_ptr_executable(cpu_ll_pc_to_ptr(panic_get_address(frame))) && (panic_get_address(frame) & 0xC0000000U))) {
            /* Xtensa ABI sets the 2 MSBs of the PC according to the windowed call size
             * Incase the PC is invalid, GDB will fail to translate addresses to function names
             * Hence replacing the PC to a placeholder address in case of invalid PC
             */
            panic_set_address(frame, (uint32_t)&_invalid_pc_placeholder);
        }
#endif
        if (panic_get_cause(frame) == PANIC_RSN_INTWDT_CPU0
#if !CONFIG_ESP_SYSTEM_SINGLE_CORE_MODE
                || panic_get_cause(frame) == PANIC_RSN_INTWDT_CPU1
#endif
           ) {
            wdt_hal_write_protect_disable(&wdt0_context);
            wdt_hal_handle_intr(&wdt0_context);
            wdt_hal_write_protect_enable(&wdt0_context);
        }
    }

    // Convert architecture exception frame into abstracted panic info
    frame_to_panic_info(frame, &info, pseudo_excause);

    // Call the system panic handler
    esp_panic_handler(&info);
}
Code:
SOC_HAL_STALL_OTHER_CORES();
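To make the dual-core branch of panic_handler() above easier to reason about, here is a minimal host-side sketch of the decision it makes: which core parks itself in a while(1) loop and which one continues to the system panic handler. This is only a model written for discussion, not ESP-IDF code. The names model_cause_t, MODEL_RSN_*, model_core_parks() and the cache_err_core parameter (standing in for the result of esp_cache_err_get_cpuid()) are hypothetical, and the enum values are not the real PANIC_RSN_* constants.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the ESP-IDF panic reasons. The numeric values
 * are illustrative only and are NOT the real PANIC_RSN_* constants. */
typedef enum {
    MODEL_RSN_OTHER = 0,
    MODEL_RSN_INTWDT_CPU0,
    MODEL_RSN_INTWDT_CPU1,
    MODEL_RSN_CACHEERR,
} model_cause_t;

/* Returns true if, following the branch in panic_handler(), the given core
 * would spin forever instead of continuing to the system panic handler.
 * cache_err_core models the value returned by esp_cache_err_get_cpuid(). */
static bool model_core_parks(bool pseudo_excause, model_cause_t cause,
                             int core_id, int cache_err_core)
{
    if (!pseudo_excause) {
        /* The busy-wait checks are only made for pseudo exceptions. */
        return false;
    }
    /* Interrupt WDT expiry: the non-offending core is parked. */
    if (cause == MODEL_RSN_INTWDT_CPU0 && core_id == 1) {
        return true;
    }
    if (cause == MODEL_RSN_INTWDT_CPU1 && core_id == 0) {
        return true;
    }
    /* Cache error: every core except the one that caused it is parked. */
    if (cause == MODEL_RSN_CACHEERR && core_id != cache_err_core) {
        return true;
    }
    return false;
}
```

For example, model_core_parks(true, MODEL_RSN_INTWDT_CPU0, 1, 0) is true (core 1 spins while core 0 handles the panic), whereas any call with pseudo_excause set to false returns false. In the model, a parked core never leaves its loop on its own, so recovery depends on the other core finishing the panic handling.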
We came across this issue when our main application hit a stack overflow and got stuck on site.
We have reproduced it with a simple program that has WiFi enabled and a large number of print statements.
In this example (attached below), we trigger a stack-overflow panic every 9 seconds, and the ESP32 hangs after about 10-20 minutes.
We have noticed that the issue does not occur if WiFi is not initialized, and that it happens sooner if we print long strings of data.
We think it is similar to https://github.com/espressif/esp-idf/issues/8033; however, we do not call esp_restart().
Our development environment:
ESP-IDF : v4.4
OS : Ubuntu
Module : ESP32-WROOM-32E 8MB
Please find the attached code. We have also raised a GitHub issue about this, which includes some debug logs: https://github.com/espressif/esp-idf/issues/10110