NVS crashing ESP32 after used a few times

Posted: Thu Feb 02, 2017 8:45 pm
by josmunpav
Hi all,

I managed to save complex structs into NVS as key/value pairs. My code works so far; the problem is that writing to the NVS several times seems to crash the ESP32 entirely, and I need to reboot it. I save all kinds of structs, each with a well-defined size of no more than 512 bytes. This is typical code:

Code:

/*
 * Save any kind of structure into NVS by a Module/Key
 * Module Handler must have a valid value.
 */
esp_err_t ConfigurationHandler::saveDataToModule(nvs_handle *moduleHandler, String key, void* data, size_t size)
{
	Serial.printf("\r\nSaving data with key: '%s'", key.c_str());

	// Stage the blob; only commit if staging succeeded, so that a
	// set_blob error is not overwritten by the commit result
	esp_err_t opResult = nvs_set_blob(*moduleHandler, key.c_str(), data, size);
	if (opResult == ESP_OK)
	{
		delay(500);
		opResult = nvs_commit(*moduleHandler);
		delay(500);
	}

	if (opResult == ESP_OK)
		Serial.println("\r\nData saved");
	else
		Serial.println("\r\nData could not be saved");

	return opResult;
}
My first version used a delay of 10 ms between setting the blob and committing. I have increased that up to 500 ms in various attempts to find stability, with no luck so far. Anything above 30 ms seemed stable at first, but after several attempts it still crashes :(

I am currently using IDF 1.0, so I am not sure whether the NVS code has been revised since then.

Many thanks

Re: NVS crashing ESP32 after used a few times

Posted: Thu Feb 02, 2017 9:18 pm
by ESP_igrr
Could you please share the application which exhibits this issue? Thanks.

Re: NVS crashing ESP32 after used a few times

Posted: Thu Feb 02, 2017 9:55 pm
by josmunpav
I cannot copy the whole code, but for example I have this structure:

Code:

/**
 * Defines a Device
 */
struct DeviceNVS
{
	char name[CHAR_NORMAL_ARRAY_SIZE];
	SystemElementId systemElementId;			// enum
	ConnectedDeviceType connectedDeviceType;	// enum
	SystemPinOrMask pinOrMask;					// enum
	uint8_t enabled;
};
And I use the previously posted save method with that struct like this:

Code:

		// Relays
		for (int i=0; i<relays.size(); i++)
		{
			DeviceNVS device = relays.get(i)->toDeviceNVS();
			String key(CONFIGURATION_KEY_RELAYS);
			key += i;
			saveDataToModule(&moduleHandler, key, &device, sizeof(DeviceNVS));
		}
Here the key is something like "DEVICE", and the module handler is created in another method like this:

Code:

       return nvs_open("NVS module name here", openMode, moduleHandler);

The main constraint in my application is that it is fully configurable by the user through a web interface. That data is sent to the ESP32 in JSON format, parsed into structs and then saved to (or read from) the NVS module.

So in theory I can have an indeterminate number of different NVS elements (limited by NVS size), each of a different size. I wonder whether keys are somehow overwriting each other, as the data is not a fixed configuration but a changing one. The problem only occurs when writing, though.
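One thing that may be worth ruling out on the key side: NVS key names are limited to 15 characters, so a prefix with an index appended can exceed the limit as names grow. A too-long key should make nvs_set_blob return an error rather than crash, so this is only a sanity check. Below is a minimal host-side sketch of building and validating indexed keys. `makeIndexedKey` and `kMaxNvsKeyLength` are hypothetical names that are not in the code above, and the sketch uses `std::string` instead of the Arduino `String` class.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Maximum length of an NVS key name in characters
// (hypothetical constant name, introduced for this sketch).
constexpr std::size_t kMaxNvsKeyLength = 15;

// Hypothetical helper: builds "<prefix><index>" and returns an empty
// string when the result would be too long to use as an NVS key.
std::string makeIndexedKey(const std::string& prefix, unsigned index)
{
    std::string key = prefix + std::to_string(index);
    if (key.size() > kMaxNvsKeyLength)
        return std::string();
    return key;
}
```

A caller would treat an empty result as "do not write" and report the problem instead of passing the key on to the save method.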

What I am testing right now is writing every allowed configuration (they are hierarchical and I limit the number of children) with fake data to "reserve" the space, and only using it when necessary.

E.g.

Code:

SystemNVS (64 bytes) -> 1 main struct
   Subsystem_1_NVS (32 bytes) -> 10 max structs (reserve space)
      Subsystem_1_1_NVS (50 bytes) -> 20 max structs (reserve space)
   Subsystem_2_NVS (16 bytes) -> 5 max structs (reserve space)
   Subsystem_3_NVS (100 bytes) -> 15 max structs (reserve space)
...
and so on.
This way I would lose the flexibility of NVS keys/values, but I can guarantee that no keys overlap if I delete and recreate them, as they will all have their space reserved in advance.
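As a rough check that such a reserved layout fits the NVS partition, the sizes listed above can be added up. The sketch below rests on two assumptions: that each listed maximum is a total per struct type (not per parent), and that NVS stores a blob as one 32-byte header entry plus one 32-byte entry per 32 bytes of data. `ReservedType`, `entriesForBlob` and the other names are hypothetical, and only the five types listed above are counted.

```cpp
#include <cassert>
#include <cstddef>

// One row of the reserved layout: payload size and maximum instance count.
struct ReservedType { std::size_t bytes; std::size_t maxCount; };

// Assumed NVS entry size in bytes.
constexpr std::size_t kEntrySize = 32;

// Assumed cost of one blob: a header entry plus the data entries.
constexpr std::size_t entriesForBlob(std::size_t bytes)
{
    return 1 + (bytes + kEntrySize - 1) / kEntrySize;
}

// The five struct types from the layout above.
constexpr ReservedType kLayout[] = {
    {64, 1}, {32, 10}, {50, 20}, {16, 5}, {100, 15},
};

// Sum of raw struct bytes over all reserved instances.
std::size_t totalPayloadBytes()
{
    std::size_t total = 0;
    for (const ReservedType& t : kLayout)
        total += t.bytes * t.maxCount;
    return total;
}

// Sum of NVS entries over all reserved instances.
std::size_t totalEntries()
{
    std::size_t total = 0;
    for (const ReservedType& t : kLayout)
        total += entriesForBlob(t.bytes) * t.maxCount;
    return total;
}
```

Under these assumptions the listed types need 2964 bytes of payload and 168 entries (5376 bytes of flash), before counting anything hidden behind the "..." in the list.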

Hope it makes sense.

Re: NVS crashing ESP32 after used a few times

Posted: Thu Feb 02, 2017 10:32 pm
by kolban
Howdy there ... for me, the core phrase in your postings is "seems to crash entirely the ESP32". I think I'd like to examine that some more. Can you clarify what you mean by that? Is it a hang or some other symptom? If a true "crash" then we should anticipate seeing an exception and if we get an exception we can get a core dump and determine exactly where the exception is occurring ... and from there work backwards to the cause.

What I think I'd like to hear from you are the details of the exception ... and if the exception is indeed an ESP32 crash then let us further look at capturing a core dump ... see http://esp-idf.readthedocs.io/en/latest/core_dump.html

Once we get those details, the puzzle may open up for an easy resolution ... and if not obvious, at least it will start to point us in a better direction.

Re: NVS crashing ESP32 after used a few times

Posted: Fri Feb 03, 2017 1:38 pm
by aschweiz
Hi josmunpav,

did you try running your application on a single core only?
(make menuconfig -> Component Config -> FreeRTOS -> Run only on 1st core)

I also have sporadic problems with writing to flash memory (crash with IllegalInstruction exception, or a core hangs) but could solve it temporarily by using only 1 core.

From a quick look at the esp32 sources, I believe that it has something to do with disabling the caches before writing to the flash, may be a race condition in cache_utils.c, but I didn't have time yet to debug this in detail.

HTH
Andreas

Re: NVS crashing ESP32 after used a few times

Posted: Sat Feb 04, 2017 12:05 am
by WiFive
There have been bugfixes since 1.0 that may affect this.

Re: NVS crashing ESP32 after used a few times

Posted: Sat Feb 04, 2017 9:10 am
by josmunpav
aschweiz wrote:
Hi josmunpav,

did you try running your application on a single core only?
(make menuconfig -> Component Config -> FreeRTOS -> Run only on 1st core)

I also have sporadic problems with writing to flash memory (crash with IllegalInstruction exception, or a core hangs) but could solve it temporarily by using only 1 core.

From a quick look at the esp32 sources, I believe that it has something to do with disabling the caches before writing to the flash, may be a race condition in cache_utils.c, but I didn't have time yet to debug this in detail.

HTH
Andreas
Hi Andreas,

I did notice yesterday that with the second task (the web server) disabled, by simply commenting out the code that starts it and leaving just the main task doing simple but repeated NVS writes, I could write up to 80 times without issues (I stopped it at that point). So it looks like the problem is related to that.

I am using the Sloeber IDE for an Arduino-friendly environment, so I am not sure I can run menuconfig to set up only one core. However, yesterday I also ran more tests with the tasks pinned to two different cores. By default the Arduino main loop task runs on core 0, so I pinned the web server app to core 1. In this case stability is much better, but the hang still happens after a few dozen writes, sometimes even on the first attempt. So no luck there.

I now want to try suspending the main task before accessing the NVS and resuming it afterwards. Not the most elegant solution, I know, but it is only needed when updating the whole system, and it might even bring some benefits to my app, so I can live with it.

I am also trying to reproduce this behaviour with a very simple IDF app, with no luck at the moment. In this case the app does not hang and finishes its task OK. My app is far more complex than this, though. Here is the code if you guys want to play with it:

Code:

#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "esp_wifi.h"
#include "esp_system.h"
#include "esp_event.h"
#include "esp_event_loop.h"
#include "nvs_flash.h"
#include "driver/gpio.h"
#include "esp_log.h"
#include "nvs.h"

const char * MODULE = "TEST_MOD";
const char * KEY = "TEST_KEY";
const int DELAY = 5;
const int NUMBER_OF_TESTS = 100;
const char * tag = "NVS_TESTS";
const char * tagSecond = "SECOND_TASK";

const int SECOND_TASK_CORE = 0;

struct TestNVSStruct
{
	char name[32];
	int myInt;
	float myFloat;	
	int myIntArray[10];
}testStruct;

/** Second Dummy Task */ 
void secondDummyTask(void *data)
{
	for(;;)
	{		
		vTaskDelay(100 / portTICK_PERIOD_MS);
		testStruct.myFloat += testStruct.myInt;
	}
}


/** Main App Task */
void app_main(void)
{
    nvs_flash_init();	
	ESP_LOGI(tag, "STARTING SECOND DUMMY TASK");
	xTaskCreatePinnedToCore(&secondDummyTask, "secondDummyTask", 4096, NULL, 5, NULL, SECOND_TASK_CORE);	
	
	ESP_LOGI(tag, "STARTING NVS TESTS IN 5 SECONDS");
	vTaskDelay(5000 / portTICK_PERIOD_MS);
	int testCompleted = 0;

    while (true) 
	{		
		for (int i=1; i<=NUMBER_OF_TESTS && testCompleted == 0; i++)
		{
			ESP_LOGI(tag, "NVS Test number %d", i);


			testStruct.myInt = i;
			
			nvs_handle moduleHandler;
			ESP_ERROR_CHECK(nvs_open(MODULE, NVS_READWRITE, &moduleHandler));
			vTaskDelay(DELAY / portTICK_PERIOD_MS);			
			ESP_ERROR_CHECK(nvs_set_blob(moduleHandler, KEY, &testStruct, sizeof(testStruct)));
			vTaskDelay(DELAY / portTICK_PERIOD_MS);			
			ESP_ERROR_CHECK(nvs_commit(moduleHandler));	
			vTaskDelay(DELAY / portTICK_PERIOD_MS);
			nvs_close(moduleHandler);	
			ESP_LOGI(tag, "NVS Test number %d Completed, waiting 1 seconds to do next\r\n\r\n", i);			
			
			vTaskDelay(1000 / portTICK_PERIOD_MS);
		}
		
		if (testCompleted == 0)
		{
			ESP_LOGI(tag, "NVS TESTS COMPLETED");
			testCompleted = 1;
		}
		
		vTaskDelay(1000 / portTICK_PERIOD_MS);
    }
}

Re: NVS crashing ESP32 after used a few times

Posted: Sat Feb 04, 2017 9:16 am
by josmunpav
kolban wrote:
Howdy there ... for me, the core phrase in your postings is "seems to crash entirely the ESP32". I think I'd like to examine that some more. Can you clarify what you mean by that? Is it a hang or some other symptom? If a true "crash" then we should anticipate seeing an exception and if we get an exception we can get a core dump and determine exactly where the exception is occurring ... and from there work backwards to the cause.

What I think I'd like to hear from you are the details of the exception ... and if the exception is indeed an ESP32 crash then let us further look at capturing a core dump ... see http://esp-idf.readthedocs.io/en/latest/core_dump.html

Once we get those details, the puzzle may open up for an easy resolution ... and if not obvious, at least it will start to point us in a better direction.
The app does not restart; it just hangs there, showing crazy symbols on the UART.

I am currently using the Sloeber IDE (Arduino plugin for Eclipse), so I am not sure I can run menuconfig or something similar. I need to investigate this option, but thanks for the tip.

Re: NVS crashing ESP32 after used a few times

Posted: Sat Feb 04, 2017 12:07 pm
by josmunpav
Some light at least!

I have found what seems to be a stable solution when both tasks run on the same core, at least from my preliminary tests. Basically I suspend the main task and also use a critical section while the system update runs in the second task (this is where the NVS writing happens). Something like:

Code:

/** Waits until the System is ready for an Update */
void UpdateSystemHelper::waitForSystemUpdatePreparation()
{
	Serial.println("UPDATE STARTED");
	updateRequestACK = false;
	updateInProgress = true;
	delay(50);

	// Wait for Update ACK
	while(!updateRequestACK) // This flag is set by the main app (not the caller to the update)
	{
		Serial.println("Update task waiting for Request ACK");
		delay(100);
	}

	vTaskEnterCritical(&mux);
	delay(50);
	suspendMainTask();
	delay(50);
}


/** Calls the Update to be completed */
void UpdateSystemHelper::updateCompleted()
{
	Serial.println("UPDATE COMPLETED");
	updateInProgress = false;
	updateRequestACK = false;
	vTaskExitCritical(&mux);
	delay(50);
	resumeMainTask();
	delay(50);
}

boolean UpdateSystemHelper::suspendMainTask()
{
	Serial.println("Suspending Main Task");
	vTaskSuspend(mainTaskHandler);
	return true;

}

boolean UpdateSystemHelper::resumeMainTask()
{
	Serial.println("Resuming Main Task");
	vTaskResume(mainTaskHandler);
	return true;
}


The secondary task, in this case the REST API called by the web app, requests the update, suspends the main task, makes its changes to the NVS and then resumes the main task. It seems pretty fast and reliable. E.g.:

Code:


// This code happens in the secondary task (web server)
updateSystem->waitForSystemUpdatePreparation();
esp_err_t opResult = configHandler->saveConfiguration(); // this is where NVS writing happens
updateSystem->updateCompleted();

// This code happens in the main task 
// Check for updates
if (updateSystem.isUpdateInProgress())
{
      updateSystem.acknowledgeUpdateRequest();
      while(updateSystem.isUpdateInProgress())
            delay(250);
}


With this I have been able to save a series of structs totalling about 2-5 KB without issues, repeating the process several times where it was hanging before. Again, this only works when both tasks run on the same core.

I still need to find a workaround for possible deadlocks, e.g. if the web request fails, but that should not be a major issue. I hope this finally works.
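For the deadlock concern, one possible approach is to bound the wait for the acknowledgement, so the updating task can give up instead of spinning forever. The following is a host-side sketch in standard C++ (`std::atomic`, `std::this_thread::sleep_for`) rather than the Arduino `delay()` loop. `waitForAck` and its parameters are hypothetical and are not part of the code posted above.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical helper: polls `ack` until it becomes true or `timeout`
// elapses. Returns true if acknowledged, false if the wait timed out.
bool waitForAck(const std::atomic<bool>& ack,
                std::chrono::milliseconds timeout,
                std::chrono::milliseconds pollInterval)
{
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (!ack.load())
    {
        if (std::chrono::steady_clock::now() >= deadline)
            return false;
        std::this_thread::sleep_for(pollInterval);
    }
    return true;
}
```

On a timeout the caller would skip the NVS write and clear the in-progress flag, so the main task is not left waiting on an update that never completes.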

PS: By the way, Kolban, I am using your port of Mongoose, which is working great in my app. I read in your dedicated post that Cesanta has already ported it to ESP32, but I will stick with your code, which is working pretty well.