Random Crashes in TCP Task
Posted: Thu Jan 20, 2022 10:19 am
Hi All,
I have a largely complete ESP32 project running on custom hardware, but it crashes every now and then.
The ESP32 listens to serial data from another processor and forwards it to any connected Websocket clients. It also stores some data on an SD card and monitors a couple of GPIOs using interrupts. Every part of it works, right up until it crashes.
The crashes are invariably in the TCP task, usually when the main app task is trying to forward data to a Websocket client. Sometimes the TCP task asserts in tcpip_thread_handle_msg; other times it crashes in lwip_netconn_do_write, and sometimes it gets as far as lwip_netconn_do_writemore. Very occasionally it crashes in udp_sendto_if_src (the main app task uses a UDP socket for control messages).
All crashes appear to happen because something has overwritten the api_msg on the main app task's stack. The main app stack is usually corrupt, so the core dump doesn't really help to pinpoint the cause. Looking at the main app stack manually almost always shows a reference to _xt_user_exit at SP and another nearer the top, above which the stack looks normal (refs to vPortTaskWrapper and app_task). There are also references to _xt_lowint1, _frxt_int_enter and usually lwip_netconn_do_write and vprintf, alongside several 0xA5 bytes and a couple of 0xbaad5678 words (I have comprehensive heap corruption detection enabled).
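For anyone wanting to repeat the manual stack inspection described above, here is a minimal host-side sketch that counts marker words in a raw stack dump. It assumes the dump has already been extracted as an array of 32-bit words, that 0xA5 is the FreeRTOS stack fill byte, and that 0xBAAD5678 is the tail canary written by ESP-IDF's heap poisoning. The names scan_stack_dump and stack_scan_t are hypothetical and not part of any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed marker values (not taken from the original post's tooling):
 * 0xA5 repeated is the FreeRTOS stack fill pattern, 0xBAAD5678 is assumed
 * to be the heap-poisoning tail canary. */
#define STACK_FILL_WORD  0xA5A5A5A5u
#define HEAP_TAIL_CANARY 0xBAAD5678u

/* Hypothetical result type: how many marker words were seen in the dump. */
typedef struct {
    size_t fill_words;         /* words still holding the stack fill pattern */
    size_t canary_words;       /* words equal to the heap tail canary */
    size_t first_canary_index; /* index of the first canary word, n if none */
} stack_scan_t;

/* Walk a stack dump word by word and count the two marker patterns. */
stack_scan_t scan_stack_dump(const uint32_t *words, size_t n)
{
    assert(words != NULL || n == 0);

    stack_scan_t r = { 0, 0, n };
    for (size_t i = 0; i < n; i++) {
        if (words[i] == STACK_FILL_WORD) {
            r.fill_words++;
        } else if (words[i] == HEAP_TAIL_CANARY) {
            if (r.canary_words == 0) {
                r.first_canary_index = i;
            }
            r.canary_words++;
        }
    }
    return r;
}
```

Fill words in the middle of the used part of a stack, or canary words where return addresses would be expected, are only hints about where to look; they do not by themselves identify what did the overwriting.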
Occasionally this seems to happen in a different task with a much smaller stack, which causes a stack overflow crash instead, but the damage to the stack is much the same as above. A colleague has another unit running the same firmware, and it crashes more often than mine.
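Since a small stack is involved here, one thing worth ruling out is plain stack exhaustion. The sketch below shows, on the host, the idea behind a stack high-water mark: count how many bytes at the low-address end of the stack still hold the 0xA5 fill pattern. It assumes a stack that grows downwards and a captured copy of the stack memory; unused_stack_bytes is a hypothetical name. On the target, FreeRTOS's uxTaskGetStackHighWaterMark() reports the equivalent figure for a live task.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed FreeRTOS stack fill byte. */
#define STACK_FILL_BYTE 0xA5u

/* Count the untouched fill bytes starting at the lowest stack address.
 * With a downward-growing stack these are the bytes never reached, so a
 * small result means the task came close to overflowing. */
size_t unused_stack_bytes(const uint8_t *stack_low, size_t size)
{
    assert(stack_low != NULL || size == 0);

    size_t n = 0;
    while (n < size && stack_low[n] == STACK_FILL_BYTE) {
        n++;
    }
    return n;
}
```

A result near zero for the small task would point at a genuine overflow, while a healthy margin combined with corrupted contents would suggest that something else is writing into the stack.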
Does anyone have any idea what could possibly be causing this?
Is there anything else I could post that would help track down the cause?
Cheers
Paul