ESP Core MQTT Agent blocks for 3s when sending MQTT messages

jcolebaker
Posts: 64
Joined: Thu Mar 18, 2021 12:23 am

ESP Core MQTT Agent blocks for 3s when sending MQTT messages

Postby jcolebaker » Tue Feb 21, 2023 9:37 pm

Hi,

We're using esp-aws-iot libraries (https://github.com/espressif/esp-aws-iot) to connect our device to AWS. We are using release/202012.04-LTS.

Specifically, I am using the coreMQTT-Agent library to connect to AWS and send device shadow updates and telemetry messages.

I am observing a strange behavior where it takes the MQTT Agent typically 3 seconds to send each MQTT message.

We have an MQTT agent task which connects to the broker with "xTlsConnect", then calls "MQTTAgent_CommandLoop" to start the MQTT Agent.

Note it uses the "xTlsConnect" implementation in esp-aws-iot\libraries\coreMQTT\port\network_transport\network_transport.c, which is what the examples used.

When I send several agent commands (e.g. with MQTTAgent_Publish), the first one will be processed quickly (i.e. data sent to the server) but subsequent commands will be delayed by up to 3 seconds before the agent sends them.

When I traced the code and added log messages, I found this:

* MQTT agent task receives commands from a queue and processes them in processCommand(core_mqtt_agent.c)
* After calling the command function, MQTT_ProcessLoop is called (if commandOutParams.runProcessLoop is TRUE, e.g. for publish commands)
* MQTT_ProcessLoop calls receiveSingleIteration then MQTT_GetIncomingPacketTypeAndLength
* MQTT_GetIncomingPacketTypeAndLength calls the transport interface "readFunc" to read from the TLS connection
* "readFunc" maps to espTlsTransportRecv in "network_transport.c"
* If there is no data waiting, espTlsTransportRecv will BLOCK for 3000 mS before returning.

It looks to me like this call is expected to be a non-blocking read? However, the "non_block" option is NOT specified in the esp_tls_cfg_t passed to esp_tls_conn_new_sync (in xTlsConnect).

I'm confused by this. It looks like it will almost always block for 3 seconds per message when several MQTT messages are sent, and this is what I observe. It became very obvious when I tried to request a firmware update file (in 1 kb chunks) via MQTT file stream, using QOS 0, and each request took 3 seconds, which was spent waiting for the agent to publish the request message.

Am I using this interface incorrectly? Or is it supposed to have "non_block = 1" in the esp_tls_cfg_t?

**txf**
Posts: 12
Joined: Thu Apr 21, 2022 3:03 pm

Re: ESP Core MQTT Agent blocks for 3s when sending MQTT messages

Postby **txf** » Tue Mar 21, 2023 11:50 am

Maybe a little late but yes the default configuration in

Code: Select all

components/esp-aws-iot/libraries/coreMQTT/port/network_transport/network_transport.c
Is blocking. In

Code: Select all

xTlsConnect()
you need to change timeout_ms to a lower value. However, the lower the value, the greater likelihood of ESP-TLS just dropping the connection.

You can also turn on .non_block and increase the timeout, otherwise ESP-TLS will have issues connecting. This is my config as of right now:

Code: Select all

    esp_tls_cfg_t xEspTlsConfig = {
        .cacert_buf = (const unsigned char*) ( pxNetworkContext->pcServerRootCAPem ),
        .cacert_bytes = strlen( pxNetworkContext->pcServerRootCAPem ) + 1,
        .clientcert_buf = (const unsigned char*) ( pxNetworkContext->pcClientCertPem ),
        .clientcert_bytes = strlen( pxNetworkContext->pcClientCertPem ) + 1,
        .skip_common_name = pxNetworkContext->disableSni,
        .alpn_protos = pxNetworkContext->pAlpnProtos,
        .non_block = /*BLOCKING_MODE_TLS,*/ true,
#if CONFIG_CORE_MQTT_USE_SECURE_ELEMENT
        .use_secure_element = true,
#elif CONFIG_CORE_MQTT_USE_DS_PERIPHERAL
        .ds_data = pxNetworkContext->ds_data,
#else
        .use_secure_element = false,
        .ds_data = NULL,
        .clientkey_buf = ( const unsigned char* )( pxNetworkContext->pcClientKeyPem ),
        .clientkey_bytes = strlen( pxNetworkContext->pcClientKeyPem ) + 1,
#endif
        .timeout_ms = 15000/*TIMEOUT_TLS*/
    };
One thing to note that regardless of what you do, with non_block, ESP-TLS will constantly be dropping the connection, so if your message rate is time sensitive you will have issues. The default OTA settings with non_block will constantly cause the tls connection to be dropped.

Overall its a pretty rotten situation, if you need to send messages fast and maintain a stable connection.

jcolebaker
Posts: 64
Joined: Thu Mar 18, 2021 12:23 am

Re: ESP Core MQTT Agent blocks for 3s when sending MQTT messages

Postby jcolebaker » Tue Apr 04, 2023 1:07 am

Hi,

I have updated the TLS config in file esp-aws-iot\libraries\coreMQTT\port\network_transport\network_transport.c as follows (from around line 13):

Code: Select all

    esp_tls_cfg_t xEspTlsConfig = {
        .cacert_buf = (const unsigned char*) ( pxNetworkContext->pcServerRootCA ),
        .cacert_bytes = pxNetworkContext->pcServerRootCASize,
        .clientcert_buf = (const unsigned char*) ( pxNetworkContext->pcClientCert ),
        .clientcert_bytes = pxNetworkContext->pcClientCertSize,
        .skip_common_name = pxNetworkContext->disableSni,
        .alpn_protos = pxNetworkContext->pAlpnProtos,
        .use_secure_element = pxNetworkContext->use_secure_element,
        .ds_data = pxNetworkContext->ds_data,
        .clientkey_buf = ( const unsigned char* )( pxNetworkContext->pcClientKey ),
        .clientkey_bytes = pxNetworkContext->pcClientKeySize,

        // ****  PATCH - Non-blocking MQTT Port ***********************
        .timeout_ms = 4000,
        .non_block = true,
        //*******************************************************************

    };
This is working well for us - The TLS connection is stable so long as the Wi-Fi is good. I did have to increase the timeout to 4000 mS to get the TLS handshake to work, and this value might still be a bit low as I've seen occasional failures when AWS is a bit slow.

However, this only solves half the problem. The other problem is that the MQTT agent command loop is always blocking waiting for a command on its queue (see MQTTAgent_CommandLoop function in libraries\coreMQTT-Agent\coreMQTT-Agent\source\core_mqtt_agent.c). While the agent task is blocked waiting for a command, it's not checking for messages from the TLS connection. The default block time for the agent is 1000 mS. This means you will often see a delay of up to 1 second between an MQTT message arriving "on the wire" and actually getting processed by the agent. The agent really needs to implement some kind of "select" mechanism here, so that it can wait for a command OR incoming data.

The quick and dirty solution was to make the agent block time shorter (we use 200 mS). This is done by setting the following config value:

Code: Select all

CONFIG_MQTT_AGENT_MAX_EVENT_QUEUE_WAIT_TIME=200
It uses a bit more CPU time (effectively polling the TLS connection for data) but I haven't seen any impact on our code at 200 mS.

We don't really need the MQTT to be very responsive most of the time. However, we use MQTT to download firmware updates in 2 KiB chunks, and the delay made the download process very slow.

Who is online

Users browsing this forum: Majestic-12 [Bot] and 177 guests