[SOLVED] ESP_ERR_NVS_NOT_FOUND from nvs_get_blob() due to nvs_commit() race condition

Charger
Posts: 8
Joined: Mon Jun 01, 2020 12:25 am
Location: Australia

[SOLVED] ESP_ERR_NVS_NOT_FOUND from nvs_get_blob() due to nvs_commit() race condition

Postby Charger » Mon Jun 01, 2020 12:55 am

Are there any known instances in which an NVS read will fail even though the NVS seems to contain valid data?

Using a 64 KiB NVS partition to store 250 B blobs, the NVS works for a while, but after some seemingly random number of reads/writes/reboots the nvs_get_blob() will return 4354 (ESP_ERR_NVS_NOT_FOUND, "Id namespace doesn't exist yet and mode is NVS_READONLY"). Once the NVS gets into this state, it seems to subsequently keep returning the same error even if the device is rebooted. Critical sections of the code:

Code: Select all

// Initialisation and read (called from the start of app_main()):
    esp_err_t ret = nvs_flash_init_partition(NVS_PARTITION_NAME);
    nvs_handle handle;
    size_t sz = sizeof(dev_context.dev);
    ESP_ERROR_CHECK(nvs_open_from_partition(NVS_PARTITION_NAME, "device_info", NVS_READWRITE, &handle));
    esp_err_t err = nvs_get_blob(handle, "info", &dev_context.dev, &sz);	// Failure here sometimes

// Write (in a helper function):
    ESP_ERROR_CHECK(nvs_open_from_partition(NVS_PARTITION_NAME, "device_info", NVS_READWRITE, &handle));
    ESP_ERROR_CHECK(nvs_set_blob(handle, "info", &dev_context.dev, sz));
    ESP_ERROR_CHECK(nvs_commit(handle));
    nvs_close(handle);

// Deinitialisation (only at the end of app_main()):
    nvs_flash_deinit_partition(NVS_PARTITION_NAME);
Anecdotally, the error may be associated with sleep and/or restarts. Is it possible the NVS is being corrupted?
Last edited by Charger on Thu Jun 04, 2020 8:31 am, edited 1 time in total.

nickname
Posts: 5
Joined: Sat Sep 28, 2019 2:37 am

Re: Random ESP_ERR_NVS_NOT_FOUND from nvs_get_blob() sometimes?

Postby nickname » Mon Jun 01, 2020 4:43 am

Hi Charger,

Could you try to use different handle instances for `nvs_get_blob()` and `nvs_set_blob` or alternatively close the handle after the first `nvs_set_blob()`? Not sure whether that's related though. The error `ESP_ERR_NVS_NOT_FOUND` may in fact occur in different situations than "Id namespace doesn't exist yet and mode is NVS_READONLY". I think that's a mistake in the documentation.

The way it's designed, NVS is supposed to not get into an inconsistent state even during sudden power losses. If it is indeed in an inconsistent state, it's a bug.

How large are the blobs you're trying to save?

Charger
Posts: 8
Joined: Mon Jun 01, 2020 12:25 am
Location: Australia

Re: Random ESP_ERR_NVS_NOT_FOUND from nvs_get_blob() sometimes?

Postby Charger » Mon Jun 01, 2020 5:23 am

The blob is 250 bytes in size ... not huge, but probably bigger than the original key:value intention for NVS. I use it to serialise a struct with all the persistent settings.

That's a great idea with the handles. I actually call open/get/close and open/set/commit/close every time. Might try also moving the nvs_flash_init_partition() and nvs_flash_deinit_partition() calls into the getter and setter.

Not sure if it's worth anything, but when I call nvs_get_stats() in this error condition, it reports that used_entries == 0 and namespace_count == 1 (presumably "device_info" from the open). As if the NVS had never been used. But reading the flash out shows all the writes up to the point of the error, in the first n blocks.

nickname
Posts: 5
Joined: Sat Sep 28, 2019 2:37 am

Re: Random ESP_ERR_NVS_NOT_FOUND from nvs_get_blob() sometimes?

Postby nickname » Mon Jun 01, 2020 6:22 am

Hi Charger,

this looks a bit more like an issue in nvs. used_entries shouldn't be 0 if you set values already. The partition should be large enough, in particular if you only save one of these entries on the whole partition.
Is it possible to read out the partition in this state and send it to us so we can analyze it? You can post it here directly. If you are concerned that it might leak too much information, you can also send it to me or ESP_igrr directly.

Charger
Posts: 8
Joined: Mon Jun 01, 2020 12:25 am
Location: Australia

Re: Random ESP_ERR_NVS_NOT_FOUND from nvs_get_blob() sometimes?

Postby Charger » Mon Jun 01, 2020 7:31 am

Sure! Thank you for offering to take a look. I've attached the bin file as copied from the 64 KiB NVS partition using esptool.
Attachments
nvs_backup_at_time_of_nvs_get_blob_failure.zip
(2.8 KiB) Downloaded 571 times

ESP_jakob
Posts: 49
Joined: Mon Jun 01, 2020 6:28 am

Re: Random ESP_ERR_NVS_NOT_FOUND from nvs_get_blob() sometimes?

Postby ESP_jakob » Wed Jun 03, 2020 9:52 am

Hi Charger,

This is nickname, but I changed my account name to reflect my relationship to Espressif a bit better...

did this nvs issue happen only on one device so far or on multiple devices?

Best,
Jakob

Charger
Posts: 8
Joined: Mon Jun 01, 2020 12:25 am
Location: Australia

Re: Random ESP_ERR_NVS_NOT_FOUND from nvs_get_blob() sometimes?

Postby Charger » Thu Jun 04, 2020 12:35 am

Hi Jacob! Unfortunately this happens on all three prototypes.

I've now rewritten our code so instead of serialising a struct it gets/sets each of the struct's members (as ints and blobs) individually. This will decrease the speed at which we use NVS, but I'm not sure if it will resolve the problem or just make it more rare. :?

ESP_jakob
Posts: 49
Joined: Mon Jun 01, 2020 6:28 am

Re: Random ESP_ERR_NVS_NOT_FOUND from nvs_get_blob() sometimes?

Postby ESP_jakob » Thu Jun 04, 2020 2:18 am

OK...

The reason why I'm asking is that there seems to be some kind of corruption of the NVS memory which triggers a delete of the blob because the checksum doesn't match anymore. These are the last 32 bytes of the blob data:

Code: Select all

b2e4 b9b6 b8ff ca00 0000 0000 0000 0000 0000 0000 0000 0000 0000 ffff ffff ffff
If you change

Code: Select all

ca00
to

Code: Select all

 ca01
then the checksum will match again. If we then also correct the entry table of this page to make the blob data entry and the blob data itself readable again (changing the entry state from 0b00 -> 0b10), then the read works again. I did exactly that with the partition in the attachment.

However, we're currently not sure what caused this corruption, so we'll continue investigating...

Please let us know if the other approach works fine, in particular if it doesn't use blobs anymore, as this would indicate some problem with the nvs blob code.

Best,
JaKob
Attachments
nvs_fixed.zip
(2.89 KiB) Downloaded 639 times

Charger
Posts: 8
Joined: Mon Jun 01, 2020 12:25 am
Location: Australia

Re: Random ESP_ERR_NVS_NOT_FOUND from nvs_get_blob() sometimes?

Postby Charger » Thu Jun 04, 2020 8:30 am

Thanks, Jakob (and apologies about the spelling before)! You've pinpointed what appears to be a race condition.

That bit in the blob corresponds to a struct member which is set immediately after the write helper function returns, i.e. the flow is:

Code: Select all

    ESP_ERROR_CHECK(nvs_set_blob(handle, "info", &dev_context.dev, sz));
    ESP_ERROR_CHECK(nvs_commit(handle));
    nvs_close(handle);
    // Helper function returns, after which:
    dev_context.dev.is_charging = is_on;
This indicates nvs_commit() returns before the data are successfully committed. nvs_commit()'s documentation states that it returns "ESP_OK if the changes have been written successfully", but the implementation in nvs_api.cpp has "no-op for now, to be used when intermediate cache is added", and indeed doesn't seem to perform a sync.

This solves both the mystery and our problem. :D

ESP_jakob
Posts: 49
Joined: Mon Jun 01, 2020 6:28 am

Re: [SOLVED] ESP_ERR_NVS_NOT_FOUND from nvs_get_blob() due to nvs_commit() race condition

Postby ESP_jakob » Thu Jun 04, 2020 9:34 am

Happy to hear that!

So you're doing an asynchronous write (with threads/FreeRTOS tasks), correct?
And yes, seems like our documentation needs some more clarity at this point.

Best,
Jakob

Who is online

Users browsing this forum: Bing [Bot] and 323 guests