Reboot problem when using NVS

Zingemneire
Posts: 68
Joined: Tue Apr 17, 2018 7:35 am

Reboot problem when using NVS

Postby Zingemneire » Thu May 31, 2018 9:36 am

HI,

I have been using ESP-IDF and its component libraries for a few weeks now.
The first application I built with it was a test application which used only the Wifi component and it worked just fine: operate for a few seconds, go to sleep for 5 seconds, wake up again, work, ....

The second one is the base for our final application, all it does for now is use the NVS library to initialise a parameter storage area, sleep for 60 seconds and start again, ....

it is based on the following enum table to identify each parameter:

Code: Select all

enum Parameters
{
   ParameterDataVersion,    // parameter type: uint32_t
   RequiredRxRssiLevel,     // parameter type: int8_t
   AccessPointSSID_1,       // parameter type: char [ MAX_STRING_PARAMETER_SIZE ]
   AccessPointSSID_2,       // parameter type: char [ MAX_STRING_PARAMETER_SIZE ]
   AccessPointSSID_3,       // parameter type: char [ MAX_STRING_PARAMETER_SIZE ]
   AccessPointSSID_4,       // parameter type: char [ MAX_STRING_PARAMETER_SIZE ]
   AccessPointSSID_5,       // parameter type: char [ MAX_STRING_PARAMETER_SIZE ]
   AccessPointPassword_1,   // parameter type: char [ MAX_STRING_PARAMETER_SIZE ]
   AccessPointPassword_2,   // parameter type: char [ MAX_STRING_PARAMETER_SIZE ]
   AccessPointPassword_3,   // parameter type: char [ MAX_STRING_PARAMETER_SIZE ]
   AccessPointPassword_4,   // parameter type: char [ MAX_STRING_PARAMETER_SIZE ]
   AccessPointPassword_5,   // parameter type: char [ MAX_STRING_PARAMETER_SIZE ]
   OtaServerRootCertificate,// parameter type: char [ MAX_CERTIFICATE_PARAMETER_SIZE ]

   DefinedParameters        // dummy parameter enum to generate the number of parameters value
};
There is also an associated area in RAM to hold copies of the parameter data in RAM, which also contain default values for each parameter:

Code: Select all

static uint32_t _ParameterDataVersion_                                        = PARAMETER_DATA_VERSION;
static int8_t   _RequiredRxRssiLevel_                                         = -75;           //unit = dB
static char     _AccessPointSSID_1_        [ MAX_STRING_PARAMETER_SIZE ]      = "************";
static char     _AccessPointSSID_2_        [ MAX_STRING_PARAMETER_SIZE ]      = "************";
static char     _AccessPointSSID_3_        [ MAX_STRING_PARAMETER_SIZE ]      = "************";
static char     _AccessPointSSID_4_        [ MAX_STRING_PARAMETER_SIZE ]      = "************";
static char     _AccessPointSSID_5_        [ MAX_STRING_PARAMETER_SIZE ]      = "************";
static char     _AccessPointPassword_1_    [ MAX_STRING_PARAMETER_SIZE ]      = "************";
static char     _AccessPointPassword_2_    [ MAX_STRING_PARAMETER_SIZE ]      = "************";
static char     _AccessPointPassword_3_    [ MAX_STRING_PARAMETER_SIZE ]      = "************";
static char     _AccessPointPassword_4_    [ MAX_STRING_PARAMETER_SIZE ]      = "************";
static char     _AccessPointPassword_5_    [ MAX_STRING_PARAMETER_SIZE ]      = "************";
static char     _OtaServerRootCertificate_ [ MAX_CERTIFICATE_PARAMETER_SIZE ] = OTA_SERVER_ROOT_CA_PEM;
Note: the data shown as "************" is not the real deal of course. :)

To bind it all together there is a table providing some essential data such as the NVS key strings to use, the parameter's data types and the pointers to the parameter values listed just above:

Code: Select all

static ParameterListEntry_t ParameterList[ DefinedParameters ]=
{
   { "ParDataVersion ", Type_uint32, &_ParameterDataVersion_     },   // enum index: ParameterDataVersion
   { "ReqRxRssiLevel ", Type_int8,   &_RequiredRxRssiLevel_      },   // enum index: RequiredRxRssiLevel
   { "AccessPntSSID_1", Type_string, &_AccessPointSSID_1_        },   // enum index: AccessPointSSID_1
   { "AccessPntSSID_2", Type_string, &_AccessPointSSID_2_        },   // enum index: AccessPointSSID_2
   { "AccessPntSSID_3", Type_string, &_AccessPointSSID_3_        },   // enum index: AccessPointSSID_3
   { "AccessPntSSID_4", Type_string, &_AccessPointSSID_4_        },   // enum index: AccessPointSSID_4
   { "AccessPntSSID_5", Type_string, &_AccessPointSSID_5_        },   // enum index: AccessPointSSID_5
   { "AccessPntPwrd_1", Type_string, &_AccessPointPassword_1_    },   // enum index: AccessPointPassword_1
   { "AccessPntPwrd_2", Type_string, &_AccessPointPassword_2_    },   // enum index: AccessPointPassword_2
   { "AccessPntPwrd_3", Type_string, &_AccessPointPassword_3_    },   // enum index: AccessPointPassword_3
   { "AccessPntPwrd_4", Type_string, &_AccessPointPassword_4_    },   // enum index: AccessPointPassword_4
   { "AccessPntPwrd_5", Type_string, &_AccessPointPassword_5_    },   // enum index: AccessPointPassword_5
   { "OtaServRootCert", Type_string, &_OtaServerRootCertificate_ },   // enum index: OtaServerRootCertificate
};
The idea is to see if the ParameterdataVersion is in the flash, check if it corresponds to what it should be and if not initialise the values in flash to the defined default values in RAM.
The function to do the work is called for each parameter in reverse order ( to guarantee that the ParameterDataVersion value is written to flash last ) and it looks like this:

Code: Select all

           switch (dataType)
            {
               case Type_uint8:       espError = nvs_set_u8  ( NvsHandle, key, *( uint8_t * )  dataLocation ); break;
               case Type_int8:        espError = nvs_set_i8  ( NvsHandle, key, *( int8_t * )   dataLocation ); break;
               case Type_uint16:      espError = nvs_set_u16 ( NvsHandle, key, *( uint16_t * ) dataLocation ); break;
               case Type_int16:       espError = nvs_set_i16 ( NvsHandle, key, *( int16_t * )  dataLocation ); break;
               case Type_uint32:      espError = nvs_set_u32 ( NvsHandle, key, *( uint32_t * ) dataLocation ); break;
               case Type_int32:       espError = nvs_set_i32 ( NvsHandle, key, *( int32_t * )  dataLocation ); break;
               case Type_uint64:      espError = nvs_set_u64 ( NvsHandle, key, *( uint64_t * ) dataLocation ); break;
               case Type_int64:       espError = nvs_set_i64 ( NvsHandle, key, *( int64_t * )  dataLocation ); break;
               case Type_string:      espError = nvs_set_str ( NvsHandle, key,  ( char * )     dataLocation ); break;
               case Type_certificate: espError = nvs_set_str ( NvsHandle, key,  ( char * )     dataLocation ); break;
            }


From the attached serial port output file ( sensitive data converted to "*" symbols ) it is clear to see that it actually works until it suddenly stops while executing the nvs_set function for the ParameterDataVersion ( using key string "ParDataVersion "). From that point onward there is no more serial port output whatsoever. It appears to crash but oddly enough it does perform the 60 second sleep period at the end of the top level program.
One things is certain: the actual ParameterDataVersion does not get written into the flash because the code always performs the same way. That can only be caused by the value not being written so it tries to perform the exact same initialisation procedure over and over.

Some Extra observations:
  • It always fails/stops after 78.7 milliseconds
    The default log level in this case is DEBUG_LEVEL
    Adding vTaskDelay(5) does not help much either: it just slows it down a bit and does manage to send the debug message for the ParameterDataVersion parameter.
Is there anyone out there with a clue as to what might be going on ?
Attachments
CommPortOutput.txt
(12.11 KiB) Downloaded 671 times

Zingemneire
Posts: 68
Joined: Tue Apr 17, 2018 7:35 am

Re: Reboot problem when using NVS

Postby Zingemneire » Fri Jun 01, 2018 7:26 am

Hi,

It took about a full day to figure this one out but in the end I cracked it. The conclusion though is not comforting.

My original software was brand new and pretty extensive so it had some bugs in it, so whenever an "nvs_..." function was called incorrectly it appeared to crash but it did respect the original deep sleep instruction of 60 seconds at the very end of the sofware's execution path.

That made me suspect that one way or another the intensive use use of the functions in the NVS library somehow affected the serial port output while in reality the software was executing normally. I also noticed that the more data you send via the serial port the sooner it seems to fail so I set the log level back to info.

Putting some "vTaskDelay(10)" instructions after each iterative "nvs_get..." or "nvs_set..." helped because putting ESP_LOGI messages to aid in debugging after each iterative operation I managed to iron out all of the problems in the code. The "vTaskDelay(10)" instructions even made it possible to put in ESP_LOGI messages to confirm proper operation as the software components went through their cyclic "nvs_set..." and "nvs_get..." operations.

In the end I managed to iron out all of the problems and ended up with a version that does a number of things 100% correctly:
  • - if the specified "ParameterDataVersion" is changed a restart of the unit will trigger an erase operation of the partition allocated to store parameters.
    - After that the new list of parameter default values will be written to the nvs (flash).
    - When the ESP module then reboots it will check the version, find that it is ok, and read the parameters from flash into their RAM equivalents.
All of the functions called always return an esp_error_t to indicate whether or not they executed correctly, if they do work correctly they return ESP_OK.
In case my final answer is ESP_OK I terminate the software with a 5 seconds deep sleep command, if it is not I terminate the software with a 20 seconds deep sleep command.

End result: I see a regular restart of the software with an interval of 5 seconds which means that the software effectively returns ESP_OK in the end. In that case the execution path that has been followed contains a single ESP_LOGI message too describe which operation has been performed, for example: ESP_LOGI ( TAG, "RAM values set to flash values" ) ;.
In spite of the fact that I know the software executes correctly and the ESP_LOGI message just mentioned is what I should see in case of repetitive reboots I don't get to see it.
Instead the intense use of the NVS library seems to kill the serial port output as I always get this output from the start:
rst:0x5 (DEEPSLEEP_RESET),boot:0x37 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:2
load:0x3fff0018,len:4
load:0x3fff001c,len:5676
load:0x40078000,len:0
load:0x40078000,len:14868
entry 0x40078628
<ESC>[0;32mI (29) boot: ESP-IDF v3.1-dev-961-ga2556229 2nd stage bootloader<ESC>[0m
<ESC>[0;32mI (29) boot: compile time 13:10:46<ESC>[0m
<ESC>[0;32mI (35) boot: Enabling RNG early entropy source...<ESC>[0m
<ESC>[0;32mI (35) boot: SPI Speed : 40MHz<ESC>[0m
<ESC>[0;32mI (39) boot: SPI Mode : DIO<ESC>[0m
<ESC>[0;32mI (43) boot: SPI Flash Size : 4MB<ESC>[0m
<ESC>[0;32mI (47) boot: Partition Table:<ESC>[0m
<ESC>[0;32mI (51) boot: ## Label Usage Type ST Offset Length<ESC>[0m
<ESC>[0;32mI (58) boot: 0 nvs WiFi data 01 02 00009000 00004000<ESC>[0m
<ESC>[0;32mI (66) boot: 1 otadata OTA data 01 00 0000d000 00002000<ESC>[0m
<ESC>[0;32mI (73) boot: 2 phy_init RF data 01 01 0000f000 00001000<ESC>[0m
<ESC>[0;32mI (80) boot: 3 factory factory app 00 00 00010000 00100000<ESC>[0m
<ESC>[0;32mI (88) boot: 4 ota_0 OTA app 00 10 00110000 00100000<ESC>[0m
<ESC>[0;32mI (95) boot: 5 ota_1 OTA app 00 11 00210000 00100000<ESC>[0m
<ESC>[0;32mI (103) boot: 6 ParameterPart WiFi data 01 02 00310000 00010000<ESC>[0m
<ESC>[0;32mI (110) boot: 7 EventLogPart WiFi data 01 02 00320000 00040000<ESC>[0m
<ESC>[0;32mI (118) boot: End of partition table<ESC>[0m
<ESC>[0;31mE (122) boot: ota data partition invalid, falling back to factory<ESC>[0m
<ESC>[0;32mI (129) esp_image: segment 0: paddr=0x00010020 vaddr=0x3f400020 size=0x0690c ( 26892) map<ESC>[0m
<ESC>[0;32mI (147) esp_image: segment 1: paddr=0x00016934 vaddr=0x3ffb0000 size=0x02c94 ( 11412) load<ESC>[0m
<ESC>[0;32mI (152) esp_image: segment 2: paddr=0x000195d0 vaddr=0x40080000 size=0x00400 ( 1024) load<ESC>[0m
<ESC>[0;32mI (156) esp_image: segment 3: paddr=0x000199d8 vaddr=0x40080400 size=0x06638 ( 26168) load<ESC>[0m
<ESC>[0;32mI (176) esp_image: segment 4: paddr=0x00020018 vaddr=0x400d0018 size=0x16b08 ( 92936) map<ESC>[0m
<ESC>[0;32mI (208) esp_image: segment 5: paddr=0x00036b28 vaddr=0x40086a38 size=0x02cfc ( 11516) load<ESC>[0m
<ESC>[0;32mI (214) esp_image: segment 6: paddr=0x0003982c vaddr=0x400c0000 size=0x00064 ( 100) <ESC>[0m
<ESC>[0;32mI (221) boot: Loaded app from partition at offset 0x10000<ESC>[0m
<ESC>[0;32mI (221) boot: Disabling RNG early entropy source...<ESC>[0m
<ESC>[0;32mI (227) cpu_start: Pro cpu up.<ESC>[0m
<ESC>[0;32mI (230) cpu_start: Starting app cpu, entry point is 0x40080f00<ESC>[0m
<ESC>[0;32mI (0) cpu_start: App cpu up.<ESC>[0m
<ESC>[0;32mI (241) heap_init: Initializing. RAM available for dynamic allocation:<ESC>[0m
<ESC>[0;32mI (247) heap_init: At 3FFAE6E0 len 00001920 (6 KiB): DRAM<ESC>[0m
<ESC>[0;32mI (253) heap_init: At 3FFB34D0 len 0002CB30 (178 KiB): DRAM<ESC>[0m
<ESC>[0;32mI (260) heap_init: At 3FFE0440 len 00003BC0 (14 KiB): D/IRAM<ESC>[0m
<ESC>[0;32mI (266) heap_init: At 3FFE4350 len 0001BCB0 (111 KiB): D/IRAM<ESC>[0m
<ESC>[0;32mI (273) heap_init: At 40089734 len 000168CC (90 KiB): IRAM<ESC>[0m
<ESC>[0;32mI (279) cpu_start: Pro cpu start user code<ESC>[0m
<ESC>[0;32mI (297) cpu_start: Starting scheduler on PRO CPU.<ESC>[0m
<ESC>[0;32mI (0) cpu_start: Starting scheduler on APP CPU.<ESC>[0m
<ESC>[0;3
From all the testing I did I can conclude two things:
  • - Calling an NVS function incorrectly terminates the serial port communication within a few millieseconds.
    - Calling NVS functions correctly and consecutively achieves the same within a very short time.
Knowing the software works is great but how do I test other software functions I add later as it takes only about 297 milliseconds before the main debugging output channel calls it quits ????

Anyone out there who has an idea or suggestion on why this is happening?
It would be even better if someone knows how to keep the serial port up and running or restart it?

ESP_Sprite
Posts: 9708
Joined: Thu Nov 26, 2015 4:08 am

Re: Reboot problem when using NVS

Postby ESP_Sprite » Sat Jun 02, 2018 7:51 am

That sounds pretty strange, as internal things like WiFi etc also make use of NVS to store parameters; if NVS writes have this effect, we'd be up to our neck in bug reports by now. Is there any chance you could whittle your code down to the smallest example which exhibits this behaviour and post that somewhere?

Zingemneire
Posts: 68
Joined: Tue Apr 17, 2018 7:35 am

Re: Reboot problem when using NVS

Postby Zingemneire » Mon Jun 04, 2018 7:46 am

Hi ESP_Sprite,

I agree, it is very strange behaviour and as you say if it were to happen regularly you would be inundated with bug reports.

Later today I have a meeting on the progression of the software development so I will discuss it there what we can provide.

I will let you know what develops.

Zingemneire
Posts: 68
Joined: Tue Apr 17, 2018 7:35 am

Re: Reboot problem when using NVS

Postby Zingemneire » Wed Jun 06, 2018 7:21 am

Hi ESP_Sprite,

I tried to send this via PM but I seem to be unable to attach files there so hereby attached you will find an AES encrypted zip file.
I will send you the password to unlock it via PM. 7z or any other zip tool should be able to unpack it.

It is a stripped down version of our application which only performs the Parameter Storage initialisation.
If you run it on an ESP32-WROVER module you will see that the first LOGI message "Here we go" is transmitted on the serial port but none of the others make it.
At least that is what happens on my hardware which is nothing more than an ESP32-WROVER module with a few extra bits to read data from an I2C temperature sensor.
The data stored: WIFI SSIDs, WIFI Passwords and some OTA certificates are not real of course but they have exactly the same size as the real data. As they are not used for now the difference is not relevant.

The project is configured to be edited in ECLIPSE under the following directory: C:\Work\EclipseWS\NVS_TEST but it does build correctly when using the "make" command on the MSYS32 MINGW32 command line so you should be able to put it anywhere.
To keep down the size of the file a "make clean" command was executed prior to zipping it.

I hope you can figure out what be wrong or what I am doing wrong.
Attachments
NVS_test.zip
(168.16 KiB) Downloaded 405 times

Zingemneire
Posts: 68
Joined: Tue Apr 17, 2018 7:35 am

Re: Reboot problem when using NVS

Postby Zingemneire » Wed Jun 06, 2018 12:20 pm

Hi ESP_Sprite,

Some more bits of info:

1) I sent the PM with the password but it seems to be stuck in my "Outbox" although it indicates I sent it this morning, not sure if it is supposed to be that way as I would have expected it to be in my "Sent messages" area, like the first PM I sent you a few days ago.

2) I have experimented further with NVS to store Event log entries in a reserved 256 kByte partition for that. It works OK until I come to the point where I erase 100 log entries to reduce the maximum number of entries from 1100 to 1000 by means of the "nvs_erase_key" function. I see the ESP_LOGI message to indicate the start of that 100 times "nvs_erase_key" loop operation but I don't see the one saying it is done so the serial port has stopped operating along the way.

ESP_Sprite
Posts: 9708
Joined: Thu Nov 26, 2015 4:08 am

Re: Reboot problem when using NVS

Postby ESP_Sprite » Thu Jun 07, 2018 3:09 am

Your program does exactly what it's supposed to do, but bu going into deep sleep immediately after sending the 'Initialization done' log line, you don't give the UART enough time to finish sending that to the UART. FYI, you could have seen that by the fact that the ESP32 clearly indicates DEEPSLEEP_RESET as the reset reason.

Zingemneire
Posts: 68
Joined: Tue Apr 17, 2018 7:35 am

Re: Reboot problem when using NVS

Postby Zingemneire » Thu Jun 07, 2018 5:34 am

Hi ESP_Sprite,

Many thanks for that, so: the conclusion is that there is not a real problem, just a minor timing issue, which is a great relief.

I knew about the deep sleep as I was using it to check whether or not the result of my initialisation worked OK or not but I did not realise how fast it works. I will put in a 1/2 second delay or so to give the UART the chance to send everything it needs to send.

Just a case of "Glaringly obvious once pointed out", thanks for making my day!

Who is online

Users browsing this forum: ulaoulao and 137 guests