HTTP Server stop function can cause endless loop.
Posted: Wed Oct 11, 2023 5:50 am
I'm using IDF 4.4.5 on ESP32 and I've come across what I think is a bug, or maybe an oversight, in the way the HTTP server is implemented.
I have a commercial project on a few different hardware platforms and I run multiple network interfaces simultaneously (e.g. STA, AP, ETH, PPP). They all start up and shut down independently (without a reboot) in response to user actions etc.
I might not be understanding this perfectly, but as far as I can see:
The HTTP server code calls select() with an infinite timeout, blocking on a loopback socket that carries the control messages used to signal a shutdown to the httpd thread. The shutdown function then blocks the calling task until the thread sets a shutdown flag.
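To make the pattern concrete, here is a minimal desktop sketch of it using POSIX sockets and pthreads. It is not the esp_http_server source: every name (`demo_server_t`, `demo_start`, `demo_stop`, the `'Q'` message byte) is hypothetical, and ignoring the result of `sendto()` is an assumption made only to mirror the behaviour described above.

```c
#include <assert.h>
#include <netinet/in.h>
#include <pthread.h>
#include <stdatomic.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical names throughout; this is not the esp_http_server source. */
typedef struct {
    int ctrl_fd;                  /* UDP control socket bound to 127.0.0.1 */
    struct sockaddr_in ctrl_addr; /* address the stop message is sent to */
    atomic_int stopped;           /* set by the worker just before it exits */
    pthread_t thread;
} demo_server_t;

/* Sleep helper built on select() so the sketch needs nothing beyond POSIX. */
static void sleep_ms(int ms)
{
    struct timeval tv = { .tv_sec = ms / 1000, .tv_usec = (ms % 1000) * 1000 };
    select(0, NULL, NULL, NULL, &tv);
}

static void *demo_worker(void *arg)
{
    demo_server_t *s = arg;
    for (;;) {
        fd_set rd;
        FD_ZERO(&rd);
        FD_SET(s->ctrl_fd, &rd);
        /* NULL timeout: the thread sleeps here until a datagram arrives. */
        if (select(s->ctrl_fd + 1, &rd, NULL, NULL, NULL) > 0) {
            char msg = 0;
            if (recv(s->ctrl_fd, &msg, 1, 0) == 1 && msg == 'Q') {
                break;
            }
        }
    }
    atomic_store(&s->stopped, 1);
    return NULL;
}

static int demo_start(demo_server_t *s)
{
    assert(s != NULL);
    memset(&s->ctrl_addr, 0, sizeof(s->ctrl_addr));
    s->ctrl_addr.sin_family = AF_INET;
    s->ctrl_addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    s->ctrl_addr.sin_port = 0; /* let the OS choose a free port */
    atomic_init(&s->stopped, 0);

    s->ctrl_fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (s->ctrl_fd < 0) {
        return -1;
    }
    socklen_t len = sizeof(s->ctrl_addr);
    if (bind(s->ctrl_fd, (struct sockaddr *)&s->ctrl_addr, len) < 0 ||
        getsockname(s->ctrl_fd, (struct sockaddr *)&s->ctrl_addr, &len) < 0 ||
        pthread_create(&s->thread, NULL, demo_worker, s) != 0) {
        close(s->ctrl_fd);
        return -1;
    }
    return 0;
}

static int demo_stop(demo_server_t *s)
{
    assert(s != NULL);
    int tx = socket(AF_INET, SOCK_DGRAM, 0);
    if (tx < 0) {
        return -1;
    }
    char msg = 'Q';
    /* Result ignored on purpose (an assumption, to mirror the description):
     * if this datagram never arrives, the loop below never ends. */
    (void)sendto(tx, &msg, 1, 0, (struct sockaddr *)&s->ctrl_addr,
                 sizeof(s->ctrl_addr));
    close(tx);
    while (!atomic_load(&s->stopped)) {
        sleep_ms(10); /* the caller is blocked until the worker sets the flag */
    }
    pthread_join(s->thread, NULL);
    close(s->ctrl_fd);
    return 0;
}
```

On a desktop OS the loopback datagram is delivered and `demo_stop()` returns promptly. The point of the sketch is that both the worker and the caller depend on that one datagram getting through.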
However, if the interface that is acting as loopback (e.g. STA) goes down, the loopback socket seems to die, the signal never reaches the httpd thread, and it becomes impossible to stop and restart the HTTP server. The lwIP docs say that if there is a valid interface, it will act as the loopback rather than a loopback-only dummy interface being created.
(I haven't checked whether these are the docs for the exact same version of lwIP!)
https://www.nongnu.org/lwip/2_0_x/group ... 91e7ea58c5
So what seems to be happening is this:
1. Device boots, STA enabled.
2. Web server starts, and uses STA netif for loopback.
3. STA disconnects (on purpose, or even by accident). The netif still exists, but the link is down.
4. I call httpd_stop().
5. select() never unblocks because the loopback socket is dead.
6. The calling task is stuck in a loop, pending (indirectly) on the select().
I see the master branch does something different in the httpd shutdown function: it returns an error code in this situation if it can't signal the httpd thread. I still don't think that helps unless you can restart (and reconnect) the Wi-Fi. It stops the caller blocking, but doesn't help the HTTP server thread.
Does anyone have any ideas about what I should do here? I can't guarantee a clean shutdown of Wi-Fi in situations like an abrupt disconnection. With this code, I'd get stuck in a loop until the watchdog catches it.
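One general way to avoid depending on the control datagram is to give select() a finite timeout and have the worker check a stop flag on every wake-up. The sketch below shows that idea in desktop POSIX C under stated assumptions: all names (`polled_server_t`, `polled_start`, `polled_stop`) are hypothetical, the 50 ms timeout is arbitrary, and whether the same change can be made to the ESP-IDF component without patching it is not something this sketch establishes.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical names; a stand-in for a server thread, not ESP-IDF code. */
typedef struct {
    int ctrl_fd;               /* control socket that never receives anything */
    atomic_int stop_requested; /* set by the caller to request shutdown */
    atomic_int timeouts;       /* how many times select() timed out */
    pthread_t thread;
} polled_server_t;

static void *polled_worker(void *arg)
{
    polled_server_t *s = arg;
    while (!atomic_load(&s->stop_requested)) {
        fd_set rd;
        FD_ZERO(&rd);
        FD_SET(s->ctrl_fd, &rd);
        /* Re-initialised each pass because select() may modify the timeout. */
        struct timeval tv = { .tv_sec = 0, .tv_usec = 50 * 1000 };
        int n = select(s->ctrl_fd + 1, &rd, NULL, NULL, &tv);
        if (n > 0) {
            char buf[16];
            (void)recv(s->ctrl_fd, buf, sizeof(buf), 0); /* drain the message */
        } else if (n == 0) {
            atomic_fetch_add(&s->timeouts, 1); /* woke up with nothing to read */
        }
    }
    return NULL;
}

static int polled_start(polled_server_t *s)
{
    assert(s != NULL);
    atomic_init(&s->stop_requested, 0);
    atomic_init(&s->timeouts, 0);
    /* Unbound socket that nobody writes to: simulates a dead control path. */
    s->ctrl_fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (s->ctrl_fd < 0) {
        return -1;
    }
    if (pthread_create(&s->thread, NULL, polled_worker, s) != 0) {
        close(s->ctrl_fd);
        return -1;
    }
    return 0;
}

static int polled_stop(polled_server_t *s)
{
    assert(s != NULL);
    atomic_store(&s->stop_requested, 1);
    /* The worker notices the flag within one select() timeout period. */
    if (pthread_join(s->thread, NULL) != 0) {
        return -1;
    }
    close(s->ctrl_fd);
    return 0;
}
```

Here `polled_stop()` returns even though no control message is ever delivered. The trade-off is a periodic wake-up of the server thread and a shutdown latency of up to one timeout period.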
Any help would be appreciated!