Page 1 of 2
[SOLVED] ESP_ERR_NVS_NOT_FOUND from nvs_get_blob() due to nvs_commit() race condition
Posted: Mon Jun 01, 2020 12:55 am
by Charger
Are there any known instances in which an NVS read will fail even though the NVS seems to contain valid data?
Using a 64 KiB NVS partition to store 250 B blobs, the NVS works for a while, but after some seemingly random number of reads/writes/reboots the nvs_get_blob() will return 4354 (ESP_ERR_NVS_NOT_FOUND, "Id namespace doesn't exist yet and mode is NVS_READONLY"). Once the NVS gets into this state, it seems to subsequently keep returning the same error even if the device is rebooted. Critical sections of the code:
Code: Select all
// Initialisation and read (called from the start of app_main()):
esp_err_t ret = nvs_flash_init_partition(NVS_PARTITION_NAME);
nvs_handle handle;
size_t sz = sizeof(dev_context.dev);
ESP_ERROR_CHECK(nvs_open_from_partition(NVS_PARTITION_NAME, "device_info", NVS_READWRITE, &handle));
esp_err_t err = nvs_get_blob(handle, "info", &dev_context.dev, &sz); // Failure here sometimes
// Write (in a helper function):
ESP_ERROR_CHECK(nvs_open_from_partition(NVS_PARTITION_NAME, "device_info", NVS_READWRITE, &handle));
ESP_ERROR_CHECK(nvs_set_blob(handle, "info", &dev_context.dev, sz));
ESP_ERROR_CHECK(nvs_commit(handle));
nvs_close(handle);
// Deinitialisation (only at the end of app_main()):
nvs_flash_deinit_partition(NVS_PARTITION_NAME);
Anecdotally, the error may be associated with sleep and/or restarts. Is it possible the NVS is being corrupted?
Re: Random ESP_ERR_NVS_NOT_FOUND from nvs_get_blob() sometimes?
Posted: Mon Jun 01, 2020 4:43 am
by nickname
Hi Charger,
Could you try to use different handle instances for `nvs_get_blob()` and `nvs_set_blob` or alternatively close the handle after the first `nvs_set_blob()`? Not sure whether that's related though. The error `ESP_ERR_NVS_NOT_FOUND` may in fact occur in different situations than "Id namespace doesn't exist yet and mode is NVS_READONLY". I think that's a mistake in the documentation.
The way it's designed, NVS is supposed to not get into an inconsistent state even during sudden power losses. If it is indeed in an inconsistent state, it's a bug.
How large are the blobs you're trying to save?
Re: Random ESP_ERR_NVS_NOT_FOUND from nvs_get_blob() sometimes?
Posted: Mon Jun 01, 2020 5:23 am
by Charger
The blob is 250 bytes in size ... not huge, but probably bigger than the original key:value intention for NVS. I use it to serialise a struct with all the persistent settings.
That's a great idea with the handles. I actually call open/get/close and open/set/commit/close every time. Might try also moving the nvs_flash_init_partition() and nvs_flash_deinit_partition() calls into the getter and setter.
Not sure if it's worth anything, but when I call nvs_get_stats() in this error condition, it reports that used_entries == 0 and namespace_count == 1 (presumably "device_info" from the open). As if the NVS had never been used. But reading the flash out shows all the writes up to the point of the error, in the first n blocks.
Re: Random ESP_ERR_NVS_NOT_FOUND from nvs_get_blob() sometimes?
Posted: Mon Jun 01, 2020 6:22 am
by nickname
Hi Charger,
this looks a bit more like an issue in nvs. used_entries shouldn't be 0 if you set values already. The partition should be large enough, in particular if you only save one of these entries on the whole partition.
Is it possible to read out the partition in this state and send it to us so we can analyze it? You can post it here directly. If you are concerned that it might leak too much information, you can also send it to me or ESP_igrr directly.
Re: Random ESP_ERR_NVS_NOT_FOUND from nvs_get_blob() sometimes?
Posted: Mon Jun 01, 2020 7:31 am
by Charger
Sure! Thank you for offering to take a look. I've attached the bin file as copied from the 64 KiB NVS partition using esptool.
Re: Random ESP_ERR_NVS_NOT_FOUND from nvs_get_blob() sometimes?
Posted: Wed Jun 03, 2020 9:52 am
by ESP_jakob
Hi Charger,
This is nickname, but I changed my account name to reflect my relationship to Espressif a bit better...
did this nvs issue happen only on one device so far or on multiple devices?
Best,
Jakob
Re: Random ESP_ERR_NVS_NOT_FOUND from nvs_get_blob() sometimes?
Posted: Thu Jun 04, 2020 12:35 am
by Charger
Hi Jacob! Unfortunately this happens on all three prototypes.
I've now rewritten our code so instead of serialising a struct it gets/sets each of the struct's members (as ints and blobs) individually. This will decrease the speed at which we use NVS, but I'm not sure if it will resolve the problem or just make it more rare.
Re: Random ESP_ERR_NVS_NOT_FOUND from nvs_get_blob() sometimes?
Posted: Thu Jun 04, 2020 2:18 am
by ESP_jakob
OK...
The reason why I'm asking is that there seems to be some kind of corruption of the NVS memory which triggers a delete of the blob because the checksum doesn't match anymore. These are the last 32 bytes of the blob data:
Code: Select all
b2e4 b9b6 b8ff ca00 0000 0000 0000 0000 0000 0000 0000 0000 0000 ffff ffff ffff
If you change
to
then the checksum will match again. If we then also correct the entry table of this page to make the blob data entry and the blob data itself readable again (changing the entry state from 0b00 -> 0b10), then the read works again. I did exactly that with the partition in the attachment.
However, we're currently not sure what caused this corruption, so we'll continue investigating...
Please let us know if the other approach works fine, in particular if it doesn't use blobs anymore, as this would indicate some problem with the nvs blob code.
Best,
JaKob
Re: Random ESP_ERR_NVS_NOT_FOUND from nvs_get_blob() sometimes?
Posted: Thu Jun 04, 2020 8:30 am
by Charger
Thanks, Jakob (and apologies about the spelling before)! You've pinpointed what appears to be a race condition.
That bit in the blob corresponds to a struct member which is set immediately
after the write helper function returns, i.e. the flow is:
Code: Select all
ESP_ERROR_CHECK(nvs_set_blob(handle, "info", &dev_context.dev, sz));
ESP_ERROR_CHECK(nvs_commit(handle));
nvs_close(handle);
// Helper function returns, after which:
dev_context.dev.is_charging = is_on;
This indicates nvs_commit() returns before the data are successfully committed. nvs_commit()'s documentation states that it returns "ESP_OK if the changes have been written successfully", but the implementation in nvs_api.cpp has "no-op for now, to be used when intermediate cache is added", and indeed doesn't seem to perform a sync.
This solves both the mystery and our problem.
Re: [SOLVED] ESP_ERR_NVS_NOT_FOUND from nvs_get_blob() due to nvs_commit() race condition
Posted: Thu Jun 04, 2020 9:34 am
by ESP_jakob
Happy to hear that!
So you're doing an asynchronous write (with threads/FreeRTOS tasks), correct?
And yes, seems like our documentation needs some more clarity at this point.
Best,
Jakob