Protect counter variable by portENTER_CRITICAL vs <atomic>

chr_te · Postby **chr_te** » Thu Apr 11, 2024 11:41 pm

Hey everyone,

I have a counter variable (int type). This counter is incremented from within an ISR. Outside of an ISR that variable is read (for an <=int32 that’s atomic), but also decremented an amount of n (so that’s a read modify write).

From what I understand, the proper ESP-IDF FreeRTOS way is to protect the variable access by portENTER_CRITICAL(&spinLock), whenever I’m rmw-ing.
Now, I’ve stumbled upon the c++ <atomic> header. Is this feature properly implemented within the esp32? From what I understand, I don’t need a spinlock, or the critical section in this case, but does it also really protect the variable in the multiprocessing case?

I’ve created this simplified example, implementing both methods. The first one with protection by critical section and spinlock, the second one with protection by <atomic>.

Code: Select all

#include <Arduino.h>
#include "FunctionalInterrupt.h"
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "freertos/semphr.h"
#include "spinlock.h"
#include <atomic>

#define SER USBSerial // ESP32 S3 integrated usb serial


// TODO: use ARDUINO_ISR_ATTR or IRAM_ATTR to place the function in IRAM???

class MyHardware {
public:
    // worker binary semaphore
    SemaphoreHandle_t _workerSemaphore;
    void link();

    // Version 1: Spinlock and critical section protected
    uint8_t counter1Get();
    void counter1Reduce(uint8_t amount);

    // Version 2: Using atomic header
    uint8_t counter2Get();
    void counter2Reduce(uint8_t amount);

private:
    unsigned long _debounce_delay;

    // Version 1: Spinlock and critical section protected
    volatile uint8_t _counter1;
    unsigned long _last_debounce1;
    portMUX_TYPE _counter1_spinLock;
    void IRAM_ATTR _myInterrupt1();

    // Version 2: Using atomic header
    std::atomic_uint8_t _counter2; //TODO: volatile needed for atomic?
    unsigned long _last_debounce2;
    void IRAM_ATTR _myInterrupt2();
};


void MyHardware::link() {
    // Must be called within or after setup()
    _workerSemaphore = xSemaphoreCreateBinary();
    _debounce_delay = 30; // in ms, compared to millis()

    // Version 1: Spinlock and critical section protected
    _counter1 = 0;
    _last_debounce1 = 0;
    _counter1_spinLock = portMUX_INITIALIZER_UNLOCKED;
    pinMode(4, INPUT_PULLDOWN);
    attachInterrupt(4,
                    std::bind(&MyHardware::_myInterrupt1, this),
                    RISING);
    // Version 2: Using atomic header
    _counter2 = 0;
    _last_debounce2 = 0;
    pinMode(5, INPUT_PULLDOWN);
    attachInterrupt(5,
                    std::bind(&MyHardware::_myInterrupt2, this),
                    RISING);
}

/******** Version 1 member functions: spinlock and critical section ***********/
void IRAM_ATTR MyHardware::_myInterrupt1() {
    unsigned long current_time = millis();
    if (current_time - _last_debounce1 < _debounce_delay) {
        return;
    }
    _last_debounce1 = current_time;
    portENTER_CRITICAL_ISR(&_counter1_spinLock);
    _counter1++;
    portEXIT_CRITICAL_ISR(&_counter1_spinLock);
    if (_counter1 >= 10) {
        BaseType_t xHigherPriorityTaskWoken = pdFALSE;
        xSemaphoreGiveFromISR(_workerSemaphore, &xHigherPriorityTaskWoken);
        if (xHigherPriorityTaskWoken == pdTRUE) {
            portYIELD_FROM_ISR();
        }
    }
}

uint8_t MyHardware::counter1Get() {
    return _counter1;
}

void MyHardware::counter1Reduce(uint8_t amount) {
    portENTER_CRITICAL(&_counter1_spinLock);
    _counter1 -= amount;
    portEXIT_CRITICAL(&_counter1_spinLock);
}

/************* Version 2 member functions: Using atomic header ****************/
void IRAM_ATTR MyHardware::_myInterrupt2() {
    unsigned long current_time = millis();
    if (current_time - _last_debounce2 < _debounce_delay) {
        return;
    }
    _last_debounce2 = current_time;
    // _counter2++ would use memory_order_seq_cst by default (slower)
    _counter2.fetch_add(1, std::memory_order_relaxed);
    if (_counter2.load(std::memory_order_relaxed) >= 10) {
        BaseType_t xHigherPriorityTaskWoken = pdFALSE;
        xSemaphoreGiveFromISR(_workerSemaphore, &xHigherPriorityTaskWoken);
        if (xHigherPriorityTaskWoken == pdTRUE) {
            portYIELD_FROM_ISR();
        }
    }
}

uint8_t MyHardware::counter2Get() {
    // return _counter2 would use memory_order_seq_cst by default (slower)
    return _counter2.load(std::memory_order_relaxed);
}

void MyHardware::counter2Reduce(uint8_t amount) {
    _counter2.fetch_sub(amount, std::memory_order_relaxed);
}

/******************************************************************************/

void genericTask(void *pvParameters) {
    SER.println("Task started");
    MyHardware* my_hw = static_cast<MyHardware*>(pvParameters);
    uint8_t counter1, counter2;
    while (1) {
        if (xSemaphoreTake(my_hw->_workerSemaphore, portMAX_DELAY) == pdTRUE) {
            counter1 = my_hw->counter1Get();
            counter2 = my_hw->counter2Get();
            SER.print("\nCounter 1: "); SER.print(counter1);
            SER.print(", Counter 2: "); SER.println(counter2);
            if (counter1 >= 10) {
                // some work here
                my_hw->counter1Reduce(10);
                SER.println("Counter 1 reduced by 10");
            }
            if (counter2 >= 10) {
                // some work here
                my_hw->counter2Reduce(10);
                SER.println("Counter 2 reduced by 10");
            }
            SER.print("Counter 1: "); SER.print(my_hw->counter1Get());
            SER.print(", Counter 2: "); SER.println(my_hw->counter2Get());
        }
    }
}


MyHardware g_MY_HW;

void setup() {
    SER.begin(115200);
    g_MY_HW.link();

    // Create task, pin to core 1
    xTaskCreatePinnedToCore(
        genericTask,   // Task function
        "genericTask", // Name of the task (for debugging)
        2048,          // Stack size (bytes)
        &g_MY_HW,      // Task input parameter
        1,             // Priority of the task
        NULL,          // Task handle
        1              // Core
    );
}

void loop() {}

In the example high going pulses on pin 4 and 5 are counted, I’ve coded this out in an object oriented way that I would use, for IC-drivers connected to the ESP…
The program seems to work as intended, but that doesn’t mean anything certain.

Are both methods (multiprocessing) safe and correctly done like that? Is there a (performance?) downside to using <atomic>?
Also, do I need the volatile keyword for the atomic variable? Probably not... And on a side note, should I use ARDUINO_ISR_ATTR or IRAM_ATTR, or is it just the same?

Background:
I’m working with an ESP32-S3 and the Arduino core. I’ve just gotten started with FreeRTOS (I’m somewhat aware of ESP IDF’s FreeRTOS change due to SMP) and trying to figure out how to properly manage concurrency etc.

Thank you in advance!

Postby **ESP_Sprite** » Fri Apr 12, 2024 2:33 am

Atomics are supported, and as long as you use a 32-bit atomic, access is a lot faster as the hardware will take care of atomicity rather than needing code for critical sections.

chr_te · Postby **chr_te** » Fri Apr 12, 2024 9:04 am

Thank you! And I probably don't need the volatile keyword, right?
Another maybe naive question, but I just want to make sure. Same of course goes for 16-bit and 8-bit atomics, that they are handeled in hardware?

chr_te · Postby **chr_te** » Fri Apr 12, 2024 5:29 pm

To answer my own question: We can use the member function is_lock_free() on objects of <atomic> and it will tell us if the associated atomicity is implemented without any software locking etc.
This is the case for (u)int8, (u)int16 and (u)int32 types, as shown by the little example:

Code: Select all

#include <Arduino.h>
#include <atomic>
#define SER USBSerial

void setup() {
    // put your setup code here, to run once:
    SER.begin(9600);

    // Sleep (time for monitoring the serial output)
    delay(5000);

    SER.println("Atomic test:");

    std::atomic_uint8_t my_uint8{0};
    SER.print("my_uint is lock free: ");
    SER.println(my_uint8.is_lock_free());

    std::atomic_uint16_t my_uint16{0};
    SER.print("my_uint is lock free: ");
    SER.println(my_uint16.is_lock_free());

    std::atomic_uint32_t my_uint32{0};
    SER.print("my_uint is lock free: ");
    SER.println(my_uint32.is_lock_free());

    // It does not compile for uint64
    // std::atomic_uint64_t my_uint64{0};
    // SER.print("my_uint is lock free: ");
    // SER.println(my_uint64.is_lock_free());
}

void loop() {
  // put your main code here, to run repeatedly:
}

Compiling with e.g. uint64_t does not work, which was to be expected, as a int64 exceeds the esp32's native word width...

MicroController · Postby **MicroController** » Fri Apr 12, 2024 6:22 pm

And I probably don't need the volatile keyword, right?

No, you don't need volatile for the C++ atomics.

Same of course goes for 16-bit and 8-bit atomics, that they are handeled in hardware?

Apparently yes, gcc does use the relevant CPU instruction (S32C1I) for 8-, 16-, and 32-bit atomics.

Btw,

Code: Select all

    _counter2.fetch_add(1, std::memory_order_relaxed);
    if (_counter2.load(std::memory_order_relaxed) >= 10) {

involves two atomic operations, so _counter2 could have been concurrently modified between the add and the load.
If you don't explicitly want/need that, you can use

Code: Select all

if( (_counter2.fetch_add(1, std::memory_order_relaxed) + 1) >= 10) {

which also saves the cost of the additional load.

chr_te · Postby **chr_te** » Fri Apr 12, 2024 10:44 pm

Thank you, @MicroController!

Protect counter variable by portENTER_CRITICAL vs <atomic>

Protect counter variable by portENTER_CRITICAL vs <atomic>

Re: Protect counter variable by portENTER_CRITICAL vs <atomic>

Re: Protect counter variable by portENTER_CRITICAL vs <atomic>

Re: Protect counter variable by portENTER_CRITICAL vs <atomic>

Re: Protect counter variable by portENTER_CRITICAL vs <atomic>

Re: Protect counter variable by portENTER_CRITICAL vs <atomic>

Who is online

About Us

Extra

Information