Writing code robust to SD card failures

Postby dizcza » Mon Dec 13, 2021 12:05 pm

Hello ESP community,

I'd like to share my code and ideas on how to continuously read a sensor and write the measurements to an SD card in a robust way. Robustness means that:
1) sensor measurement delays are not affected by long SD card IO commands;
2) all measurements are guaranteed to be saved to an SD card in the correct order.

The post is based on the SD lib issue in arduino-esp32 https://github.com/espressif/arduino-esp32/issues/5998 although the code was originally written in ESP-IDF.

The first criterion obviously demands a thread-based implementation with one thread for reading (polling) the sensor and at least one thread for dumping the measurements to the SD card. Here is the framework I've outlined for doing this:
  • Start two threads: read_sensor (the sender) bound to core 0 and write_data (the receiver) bound to core 1. It would be better to swap the cores, because some routines hidden from the developer always run on core 0, but I found that some commands and libraries like Arduino Wire don't like to be run on core 1.
  • The "read_sensor" thread is sensor-specific, but if the data needs to be sampled at a high frequency (> 1000 Hz), you can't use the vTaskDelay() function in the thread, because its resolution is limited to one FreeRTOS tick (1 ms at a 1000 Hz tick rate). Instead, I
    • start a timer with the resolution I want (say, 500 us);
    • call "xTaskNotifyGive(read_sensor_task_handle)" when the timer is triggered;
    • wait on "ulTaskNotifyTake(pdTRUE, portMAX_DELAY)" inside the "read_sensor" task body.
    But this also won't work for very high frequencies (> 10 kHz) because "ulTaskNotifyTake" does not return immediately after the notification: I'm seeing delays of ~100 us. Maybe it's better to read the sensor directly in the timer function. I'm wondering how audio signals are sampled at a 30 kHz rate on the ESP32...?
    In either case, the measurements should be sent to the receiving thread ("write_data") via the "xQueueSend()" function.
  • The receiver, "write_data", waits for messages with the "xQueueReceive()" function and accumulates them in a temporary buffer array. Ideally, the size of this array should be a multiple of the SD card sector size, which is 512 bytes, but I found that an exact match is not crucial. Once the array is filled with the desired number of samples, a write command is issued to a previously opened file. This is where I spent a whole week trying different options to find what works best with the Arduino SD lib, and here is the pseudo-code I've come up with:

    
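    #include <stdio.h>
    #include <unistd.h>               // fsync()
    #include "freertos/FreeRTOS.h"
    #include "freertos/task.h"
    #include "esp_timer.h"
    #include "esp_log.h"
    #include "SD.h"
    
    // sometimes we can get away without a (long) SD card restart
    #define SD_WAIT_UNTIL_RESTART_MS 100
    
    typedef struct Record ...
    
    // Opens the next numbered file and blocks until this succeeds.
    // If opening keeps failing for SD_WAIT_UNTIL_RESTART_MS, the SD card is restarted.
    static FILE* open_file() {
        static int trial = 0;
        char fpath[128];
        snprintf(fpath, sizeof(fpath), "/sd/file-%03d.BIN", trial++);
        FILE *file = fopen(fpath, "w");
        int64_t t_last = esp_timer_get_time();
        while (file == NULL) {
            if (esp_timer_get_time() - t_last > SD_WAIT_UNTIL_RESTART_MS * 1000L) {
                // this is Arduino-specific code to restart the SD card
                SD.end();
                while (!SD.begin()) delay(10);
                // restart the timeout only after the (possibly long) SD restart
                t_last = esp_timer_get_time();
            }
            // a delay is needed to keep the watchdog happy
            // and to let other threads do their work
            vTaskDelay(pdMS_TO_TICKS(10));
            file = fopen(fpath, "w");
        }
        return file;
    }
    
    
    // FreeRTOS task body, hence the unused void* argument
    static void write_data(void *arg) {
        Record records[RECORDS_BUFFER_SIZE];
        FILE *file = open_file();
    
        while (1) {
            // fill in the records here with the xQueueReceive
            // ...
            size_t written_cnt = fwrite(records, sizeof(Record), RECORDS_BUFFER_SIZE, file);
            int fflush_res = fflush(file);
            int fsync_res = fsync(fileno(file));
            while (!(written_cnt == RECORDS_BUFFER_SIZE && fflush_res == 0 && fsync_res == 0)) {
                ESP_LOGE(TAG, "fwrite failed. Reopening...");
                fclose(file);
                file = open_file();
                // Rewrite the whole buffer into the new file: after a failed flush
                // or sync we cannot know how many records reached the card, so
                // duplicated records are preferred over lost ones.
                written_cnt = fwrite(records, sizeof(Record), RECORDS_BUFFER_SIZE, file);
                // Re-check flush and sync; with stale results the loop would never end.
                fflush_res = fflush(file);
                fsync_res = fsync(fileno(file));
            }
    
            vTaskDelay(pdMS_TO_TICKS(10));
        }
    }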
    
    
The full example code is here: https://github.com/dizcza/M5Core2_SDPSe ... rd_sdp.cpp. I hope someone finds this thread a useful source of ideas.
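
To make the remark about the 512-byte sector size concrete, here is a minimal sketch that computes how many records fit into a buffer whose byte size is a whole number of sectors. The `SampleRecord` layout and the helper names are hypothetical (the real `Record` is elided above); only the 512-byte sector size comes from the post.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SD_SECTOR_SIZE 512u

/* Hypothetical record layout, used only for illustration. */
typedef struct {
    int64_t time_us;  /* timestamp of the measurement */
    int16_t value;    /* raw sensor reading */
} SampleRecord;

/* Greatest common divisor (Euclid). */
static size_t gcd_size(size_t a, size_t b) {
    while (b != 0) {
        size_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Smallest record count whose total size is a multiple of the sector size:
 * lcm(record_size, sector) / record_size == sector / gcd(record_size, sector). */
static size_t records_per_aligned_block(size_t record_size) {
    assert(record_size > 0);
    return SD_SECTOR_SIZE / gcd_size(record_size, SD_SECTOR_SIZE);
}
```

For a 10-byte record this gives 256 records, that is 2560 bytes or exactly five sectors. Any multiple of the returned count keeps the buffer sector-aligned, so the value could be used as RECORDS_BUFFER_SIZE or scaled up.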

Suggestions? How are you handling file IO errors?

Best,
Danylo

Postby vanBassum » Mon Dec 13, 2021 3:50 pm

My approach would be something like this:
2 tasks, 1 timer.

Task 1,
Wait for semaphore, when semaphore is set, execute a measurement.
Measurement contains both data and timestamp.
Place the measurement into a message queue.

Task 2,
Receive item from message queue.
Write to SD card and retry if it fails.

Timer 1,
Set the semaphore at a specific interval.

I guess this matches your idea :)

Besides, I would write this to a file on the SD card, not as raw bytes to sectors on the card. This would allow someone to read the card with a PC.

Postby dizcza » Mon Dec 13, 2021 9:08 pm

vanBassum wrote:
My approach would be something like this:
2 tasks, 1 timer.

Task 1,
Wait for semaphore, when semaphore is set, execute a measurement.
Measurement contains both data and timestamp.
Place the measurement into a message queue.

Task 2,
Receive item from message queue.
Write to SD card and retry if it fails.

Timer 1,
Set the semaphore at a specific interval.

Yes, that is exactly what I'm doing, although I've read that direct task notifications are a bit faster than traditional semaphores https://www.freertos.org/RTOS_Task_Noti ... phore.html. The crucial part lies in "Write to SD card and retry if it fails." For example, I didn't know (1) whether I should just reopen the file and append data, (2) whether it is actually possible to reopen a corrupted file, or (3) whether it would make sense to add a checksum after each block written to the SD card. Answering these questions took a considerable amount of time. They were especially hard to test: I needed to trigger a situation where an IO error occurs. Needless to say, the behavior is non-deterministic!
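
As an illustration of question (3), here is a minimal sketch of what a per-block checksum could look like: each block is followed by a CRC-32 of its payload, so a reader can later tell intact blocks from damaged ones. This is not taken from the linked code; the function names and the on-disk layout (payload followed by a 4-byte little-endian CRC) are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320), slow but table-free. */
static uint32_t crc32_ieee(const void *data, size_t len) {
    const uint8_t *p = (const uint8_t *)data;
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= p[i];
        for (int bit = 0; bit < 8; bit++) {
            /* XOR in the polynomial only when the lowest bit is set */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

/* Writes the payload followed by its CRC-32 in little-endian byte order. */
static bool write_block_with_crc(FILE *f, const void *payload, size_t len) {
    assert(f != NULL && payload != NULL);
    uint32_t crc = crc32_ieee(payload, len);
    uint8_t tail[4] = {
        (uint8_t)crc, (uint8_t)(crc >> 8),
        (uint8_t)(crc >> 16), (uint8_t)(crc >> 24)
    };
    return fwrite(payload, 1, len, f) == len &&
           fwrite(tail, 1, sizeof tail, f) == sizeof tail;
}

/* Reads one block and returns true only if the stored CRC matches. */
static bool read_block_with_crc(FILE *f, void *payload, size_t len) {
    assert(f != NULL && payload != NULL);
    uint8_t tail[4];
    if (fread(payload, 1, len, f) != len ||
        fread(tail, 1, sizeof tail, f) != sizeof tail) {
        return false;
    }
    uint32_t stored = (uint32_t)tail[0] | ((uint32_t)tail[1] << 8) |
                      ((uint32_t)tail[2] << 16) | ((uint32_t)tail[3] << 24);
    return stored == crc32_ieee(payload, len);
}
```

The trade-off is four extra bytes per block and a file that is no longer a plain array of records, so the PC-side reader has to know the block length.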

But I'm still at a loss as to how people handle very high (>10 kHz) sampling rates. The approach you outlined with a built-in RTC timer sometimes gives a ~100 us delay from one measurement to another, and I don't know whether it's due to the imperfect RTC timer, FreeRTOS interrupts during sensor measurements, or some other unknown issue... This is perhaps a topic for a separate discussion thread.
vanBassum wrote:
Besides, I would write this to a file on the SD card, not as raw bytes to sectors on the card. This would allow someone to read the card with a PC.

Sorry for the confusion, I do write to files, not raw sectors. What I mean by adjusting the buffer size to 512 bytes is that it's efficient to write a medium-size data chunk at once

Record records[RECORDS_BUFFER_SIZE];
fwrite(records, sizeof(Record), RECORDS_BUFFER_SIZE, file);
rather than calling "fwrite" on each sensor measurement, which takes only ~10 bytes. Writing small portions (tens of bytes) to an SD card (1) is more time-consuming than buffering up to 500 or 1000 bytes before the write command and (2) will quickly wear out the SD card.
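
The effect of batching can be sketched with a small helper that collects records in RAM and only touches the file when the buffer is full. The `BatchWriter` type, its functions, and the 512-byte capacity are assumptions chosen for this example, not part of the code above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define BATCH_CAPACITY 512u  /* assumed: one SD sector */

typedef struct {
    FILE *file;
    unsigned char buf[BATCH_CAPACITY];
    size_t used;           /* bytes currently buffered */
    unsigned flush_count;  /* number of successful file writes so far */
} BatchWriter;

static void batch_init(BatchWriter *w, FILE *file) {
    w->file = file;
    w->used = 0;
    w->flush_count = 0;
}

/* Writes the buffered bytes in one fwrite call. On failure the buffer is
 * kept, so the caller can reopen the file and retry. */
static bool batch_flush(BatchWriter *w) {
    if (w->used == 0) {
        return true;
    }
    if (fwrite(w->buf, 1, w->used, w->file) != w->used ||
        fflush(w->file) != 0) {
        return false;
    }
    w->used = 0;
    w->flush_count++;
    return true;
}

/* Buffers one record, flushing first when it would not fit. */
static bool batch_append(BatchWriter *w, const void *rec, size_t len) {
    assert(len <= BATCH_CAPACITY);
    if (w->used + len > BATCH_CAPACITY && !batch_flush(w)) {
        return false;
    }
    memcpy(w->buf + w->used, rec, len);
    w->used += len;
    return true;
}
```

With 10-byte records, 100 calls to `batch_append` followed by one `batch_flush` result in two file writes instead of 100.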

Postby mikemoy » Tue Dec 14, 2021 3:18 am

Sounds like you should use a FRAM IC or the like to store all the data. When you're ready to put it on an SD card, copy the data from FRAM to the SD card. Writing to a uSD card that often is only going to cause you problems.

Postby dizcza » Tue Dec 14, 2021 7:59 am

mikemoy wrote:
Sounds like you should use a FRAM IC or the like to store all the data. When you're ready to put it on an SD card, copy the data from FRAM to the SD card. Writing to a uSD card that often is only going to cause you problems.

Why not use the available DRAM? Also, buffer arrays larger than 4092 bytes, which is the default maximum transfer size when DMA is used, won't bring any benefit, unless I'm misinterpreting the "spi_bus_config_t.max_transfer_sz" field needed to initialize the SPI bus for an SD peripheral.

Postby vanBassum » Tue Dec 14, 2021 12:32 pm

dizcza wrote:
But I'm still at a loss as to how people handle very high (>10 kHz) sampling rates. The approach you outlined with a built-in RTC timer sometimes gives a ~100 us delay from one measurement to another, and I don't know whether it's due to the imperfect RTC timer, FreeRTOS interrupts during sensor measurements, or some other unknown issue... This is perhaps a topic for a separate discussion thread.

Yeah, at these speeds this isn't an option anymore.
One option would be to do the measurement in an interrupt routine; another option would be to use the hardware, if that's possible.

I would also use a buffer: gather multiple measurements and write them all at once to the card. RAM is the easy solution, of course. Also, if you do your measurements from an ISR, your options are limited and RAM might be the only option.

Whether to close and reopen the files depends a bit on how long everything takes, and on whether you can assume the SD card isn't removed while the device is turned on. I would personally use multiple files and keep them around a few MB each. Then you can also delete the oldest one if you run out of space.
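
The multiple-files idea could be tracked with a little bookkeeping like the sketch below. It only decides when to start a new file and which file to delete; opening, closing, and removing files is left to the caller. The type and function names are hypothetical, and a maximum file count is used here as a simple stand-in for "out of space". The path pattern is the one from the first post.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

typedef struct {
    size_t max_file_bytes;  /* rotate when a file would exceed this size */
    unsigned max_files;     /* keep at most this many files on the card */
    unsigned oldest;        /* index of the oldest file still kept */
    unsigned current;       /* index of the file being written */
    size_t bytes;           /* bytes written to the current file */
} LogRotation;

typedef struct {
    bool open_new;          /* caller should close the file and open `current` */
    bool delete_oldest;     /* caller should remove file `delete_index` */
    unsigned delete_index;
} RotationStep;

static void rotation_init(LogRotation *rot, size_t max_file_bytes, unsigned max_files) {
    assert(max_file_bytes > 0 && max_files > 0);
    rot->max_file_bytes = max_file_bytes;
    rot->max_files = max_files;
    rot->oldest = 0;
    rot->current = 0;
    rot->bytes = 0;
}

/* Builds the file name for a given index. */
static void rotation_path(unsigned index, char *out, size_t out_size) {
    snprintf(out, out_size, "/sd/file-%03u.BIN", index);
}

/* Call before writing nbytes; tells the caller what to do first. */
static RotationStep rotation_advance(LogRotation *rot, size_t nbytes) {
    RotationStep step = { false, false, 0 };
    if (rot->bytes + nbytes > rot->max_file_bytes) {
        rot->current++;
        rot->bytes = 0;
        step.open_new = true;
        if (rot->current - rot->oldest + 1u > rot->max_files) {
            step.delete_oldest = true;
            step.delete_index = rot->oldest++;
        }
    }
    rot->bytes += nbytes;
    return step;
}
```

With limits of, say, 4 MB per file and 8 files, at most about 32 MB of the most recent data stays on the card, and an IO error costs one file at worst.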
