MQTT Client causes Chip Hardlock
Posted: Tue May 14, 2024 7:27 pm
I have a system based on an ESP32-S3 (N8) that uses Ethernet and the ESP-IDF MQTT client (mqtt_client.h), and I'm experiencing hardlocks that I want to call intermittent, except that when they do happen, they happen at the same place in my code in a deterministic manner.
The hardlocks occur when the system talks to the broker: on the initial connection, when adding a subscription, or when publishing a message. By "hardlock" I mean that the chip stops completely: all threads quit reporting to the console log, all external communication stops, and no apparent processing takes place.
I am unable to reproduce the issue on demand; all I know is that some builds "just hardlock". I'll do a full build and flash it to the chip, and if the issue occurs, it usually happens early in the boot process (typically at my connection init or while adding subscriptions). Sometimes it won't happen until a message arrives on a subscribed topic.
I've been dealing with this issue for a few months. Back then I saw something suggesting it could be related to WiFi, but I now have my own hardware with a WIZnet W5500 Ethernet module, and I'm still seeing the hardlocks.
How I've gotten around the issue may provide some hints about what is happening, I hope. The behavior is deterministic per build: IF a build is going to hardlock, it does so at the exact same place every time. My workaround is to "make the code different enough", which I do by adding or removing ESP_LOGX() log lines around my code.
I do not currently have JTAG debugging working, as I can't get OpenOCD in my development setup to see my chip via USB (I've tried just about every fix I've found on the web, but nothing has worked). When I was looking for solutions a few months back, I ran across a similar issue where people with working JTAG showed that the system was stuck in some kind of MQTT wait (I've since looked for those threads, to no avail).
I've tried turning up logging for the MQTT-related features by calling esp_log_level_set() with ESP_LOG_VERBOSE for the "mqtt5_client" and "MQTT5EventHandler" tags, but that doesn't produce any additional information. If there are other modules whose logs could be helpful, please send those along.
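A minimal sketch of widening that logging to more tags at once, under stated assumptions: the first two tags are the ones named in this post, the remaining tag names are assumed from the ESP-IDF MQTT examples and may differ between IDF versions, and raise_mqtt_log_levels() is a hypothetical helper name. The host-side stand-in for esp_log_level_set() exists only so the sketch compiles off-target. Keep in mind that esp_log_level_set() cannot raise a tag above the compiled-in maximum (CONFIG_LOG_MAXIMUM_LEVEL), which is one possible reason verbose output never shows up.

```c
#include <stddef.h>
#include <stdio.h>

#ifdef ESP_PLATFORM
#include "esp_log.h"
#else
/* Host-only stand-ins (NOT part of ESP-IDF) so the sketch compiles off-target. */
typedef enum {
    ESP_LOG_NONE, ESP_LOG_ERROR, ESP_LOG_WARN,
    ESP_LOG_INFO, ESP_LOG_DEBUG, ESP_LOG_VERBOSE
} esp_log_level_t;

static void esp_log_level_set(const char *tag, esp_log_level_t level)
{
    printf("log level for '%s' -> %d\n", tag, (int)level);
}
#endif

/* Tags to raise. The first two come from this post; the rest are ASSUMED
 * from the ESP-IDF MQTT examples and may not match every IDF version. */
static const char *const k_verbose_tags[] = {
    "mqtt5_client",
    "MQTT5EventHandler",
    "mqtt_client",
    "outbox",
    "transport",
    "transport_base",
    "esp-tls",
};

/* Hypothetical helper: sets every tag in the table to verbose and
 * returns how many tags were touched. */
static size_t raise_mqtt_log_levels(void)
{
    const size_t count = sizeof(k_verbose_tags) / sizeof(k_verbose_tags[0]);
    for (size_t i = 0; i < count; ++i) {
        esp_log_level_set(k_verbose_tags[i], ESP_LOG_VERBOSE);
    }
    return count;
}
```

Calling raise_mqtt_log_levels() once at the top of app_main(), before the MQTT client is started, would apply the levels ahead of the first broker connection.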
Other things I've tried: moving the tasks between cores (keeping all networking on Core 0, moving networking to Core 1), changing the priorities of the MQTT and networking tasks, etc.
In general, my code is not much more than the example code: I connect to a broker, set up some subscriptions, and publish some messages. I'm not doing anything that I consider especially esoteric.
Thanks for reading, any help is welcome!
A
Code: component dependency and include for the MQTT client
PRIV_REQUIRES mqttclient
#include "mqtt_client.h"
Code: log levels raised for the MQTT-related tags
esp_log_level_set("mqtt5_client", ESP_LOG_VERBOSE);
esp_log_level_set("MQTT5EventHandler", ESP_LOG_VERBOSE);