Crash running https_request example

mgleason_3
Posts: 44
Joined: Mon Nov 07, 2016 5:04 pm

Re: Crash running https_request example

Postby mgleason_3 » Mon Jun 12, 2017 6:55 am

Changed a couple things:
  1. Modified app_main to Pin "https_get_task" to CPU1 (see below)
  2. Added a printf to print the available heap before each "Starting again!"
  3. Did a "make erase_flash" - then "make flash" and "make monitor"
And, it runs without crashing - no "Guru Meditation Error"s, no reboots!

...For about 30 minutes. Then "https_get_task" hangs while app_main keeps running. I think both times "https_get_task" hung at "Performing the SSL/TLS handshake..." The second time it was at 31:49.

Any idea what's going on here? The Free Heap size was: 191364, so I don't think its a lack of memory. Is it possible that some timer kills tasks? So possibly we need to kill and re-create them every 30 minutes?

Code: Select all

void app_main()
{
    ESP_ERROR_CHECK( nvs_flash_init() );
    initialise_wifi();
    // xTaskCreate(&https_get_task, "https_get_task", 8192, NULL, 5, NULL);
    xTaskCreatePinnedToCore(&https_get_task, "https_get_task", 8192, NULL, 5, NULL, 1);

    while (1) {
         vTaskDelay(100/portTICK_PERIOD_MS);
         print_time();
    }
}

mgleason_3
Posts: 44
Joined: Mon Nov 07, 2016 5:04 pm

Re: Crash running https_request example

Postby mgleason_3 » Mon Jun 12, 2017 6:36 pm

@f.h-f.s. - Credit - your suggestion worked!

mgleason_3
Posts: 44
Joined: Mon Nov 07, 2016 5:04 pm

Re: Crash running https_request example

Postby mgleason_3 » Mon Jun 12, 2017 6:40 pm

Conversely (to killing tasks)

- Is there a way to monitor tasks and possibly be notified if they die so they can be restarted?

User avatar
kolban
Posts: 1683
Joined: Mon Nov 16, 2015 4:43 pm
Location: Texas, USA

Re: Crash running https_request example

Postby kolban » Mon Jun 12, 2017 7:23 pm

There is a FreeRTOS API called vTaskList ... see:

http://www.freertos.org/a00021.html#vTaskList

However, it comes with caveats (see the docs). It is also not enabled by default in ESP-IDF but can be enabled by changing the FreeRTOS config files.

See also:

https://github.com/espressif/esp-idf/issues/411
Free book on ESP32 available here: https://leanpub.com/kolban-ESP32

mgleason_3
Posts: 44
Joined: Mon Nov 07, 2016 5:04 pm

Re: Crash running https_request example

Postby mgleason_3 » Mon Jun 12, 2017 11:43 pm

@kolban
Thanks - looks like it's "not intended for normal application runtime use but as a debug aid" - so probably better to focus on why it's hanging.

Any thoughts on the best way to debug why "https_get_task" hangs at random times?
  • So far it's been after 30-40 minutes (31 Minutes, 49 Secs yesterday and 43 Minutes, 58 Secs today).
  • Each time, it's been after printing "Performing the SSL/TLS handshake..." - so I'm assuming it's during that "Phase"
  • Is there some resource that could be running out? Any-way to see what that resource is?

User avatar
kolban
Posts: 1683
Joined: Mon Nov 16, 2015 4:43 pm
Location: Texas, USA

Re: Crash running https_request example

Postby kolban » Tue Jun 13, 2017 12:35 am

Non-deterministic or timing problems can be ugly to diagnose. If a hang is occurring, what I would suggest is to find the statement that was entered but has not returned and see if it is consistent across hangs. Without the ability to "break into" code, the best suggestion I have is to liberally sprinkle printf() statements across your code each with a unique string. If these are captured in the serial log, we will eventually see a "last" log entry which, if done correctly, will show us exactly which statement we entered but didn't return from.

With that knowledge in hand, something else might pop into our minds. Depending on what the statement is, we may be able to switch on additional debugging for that code or ... in worst case ... insert further diagnostics into that code and recompile it.
Free book on ESP32 available here: https://leanpub.com/kolban-ESP32

mgleason_3
Posts: 44
Joined: Mon Nov 07, 2016 5:04 pm

Re: Crash running https_request example

Postby mgleason_3 » Tue Jun 13, 2017 3:04 am

Thanks again.

A couple things:
  1. Received the new ESP32 - in this case a NodeMCU Dev Kit. Tested with the original (unpinned) demo and it still crashes. So, pretty sure it has nothing to do with brownouts / power issues. Pretty sure it's a bug in the demo code pure and simple. It wouldn't have taken any special testing to see it - just let the demo run for a while. Is EspressIf testing their code?!?
  2. It turns out you can get detailed mbedtls debugging messages.
    Run "make menuconfig" and choose "Component config ---> mbedTLS ---> Enable mbedTLS debugging"
With debugging on: it hung after printing "Server used HashAlgorithm 5":

I (3346) mbedtls: ssl_tls.c:2405 ssl->f_recv(_timeout)() returned 333 (-0xfffffeb3)
I (3356) mbedtls: ssl_tls.c:2417 <= fetch input
I (3366) mbedtls: ssl_tls.c:3975 <= read record
I (3366) mbedtls: ssl_cli.c:1912 ECDH curve: secp256r1
I (3376) mbedtls: ssl_cli.c:2134 Server used SignatureAlgorithm 1
I (3386) mbedtls: ssl_cli.c:2135 Server used HashAlgorithm 5
HERE -->

If it had continued, the next thing printed would be something like:
I (XXXX) mbedtls: ssl_cli.c:2509 <= parse server key exchange

I tried setting the read_timeout, but apparently theres another setting needed for it to work. So, I'll have to test this in the next round...

mgleason_3
Posts: 44
Joined: Mon Nov 07, 2016 5:04 pm

Re: Crash running https_request example

Postby mgleason_3 » Tue Jun 13, 2017 4:53 am

After a few more tests, "https_get_task" has hung in as little as 1Minute:30sec. The last hung at 7:45. The test is still running to see of it ever comes back (app_main is still printing the time every 100ms).

"Enable mbedTLS debugging" is turned on.

It turns out that the read_timeout (AKA read/recv timeout) was set and is being called - repeatedly - which is weird because:
  1. it's being called very quickly (I can tell because there aren't a bunch of "times-being-printed-every-100ms" and then a timeout)
  2. I've got the timeout period set to 10000 which I thought was 10 seconds - but maybe I need to set a longer timeout period?!?
Compared the debugging messages printed by mbedTLS on the:
  1. last cycle (which hung)
    the last thing printed in this case was:
    "mbedtls: ssl_cli.c:2135 Server used HashAlgorithm 5"
  2. 2nd-to-last cycle (which successfully completed)
    In the case it continued:
    mbedtls: ssl_cli.c:2135 Server used HashAlgorithm 5
    mbedtls: ssl_cli.c:2509 <= parse server key exchange
    ...
    mbedtls: ssl_tls.c:6316 <= handshake
If anyone wants to see the complete listings, I can post them...But it looks like the problem is somewhere between "Server used HashAlgorithm 5" and "parse server key exchange".

User avatar
kolban
Posts: 1683
Joined: Mon Nov 16, 2015 4:43 pm
Location: Texas, USA

Re: Crash running https_request example

Postby kolban » Tue Jun 13, 2017 2:13 pm

I tend to suggest always go ahead and post the traces and logs but post them to pastebin.com and include a link(s). Since your forum post is public domain and you are prepared to share the traces, it doesn't seem to do any harm to post them to pastebin.
Free book on ESP32 available here: https://leanpub.com/kolban-ESP32

mgleason_3
Posts: 44
Joined: Mon Nov 07, 2016 5:04 pm

Re: Crash running https_request example

Postby mgleason_3 » Wed Jun 14, 2017 4:46 am

So, there's really two different issues. Continued
  1. Here: https://github.com/espressif/esp-idf/issues/698
    and
  2. Here: https://github.com/espressif/esp-idf/issues/630

Who is online

Users browsing this forum: Majestic-12 [Bot] and 61 guests