attribute register or DRAM_ATTR

stoumk
Posts: 12
Joined: Tue May 24, 2022 10:54 am

Postby stoumk » Thu Jul 20, 2023 6:41 am

  1. register uint32_t var1 = 123;
  2. DRAM_ATTR uint32_t var2 = 123;
  3. IRAM_ATTR uint32_t var3 = 123;
I usually use "register", but on the ESP32, do I need to use DRAM_ATTR to get the same effect?
Can I use IRAM_ATTR on variables or structures?

MicroController
Posts: 1734
Joined: Mon Oct 17, 2022 7:38 pm
Location: Europe, Germany

Re: attribute register or DRAM_ATTR

Postby MicroController » Thu Jul 20, 2023 9:42 am

I usually use "register"
You most likely shouldn't.
to get the same effect?
The desired effect being...?

ESP_Sprite
Posts: 9766
Joined: Thu Nov 26, 2015 4:08 am

Re: attribute register or DRAM_ATTR

Postby ESP_Sprite » Thu Jul 20, 2023 9:54 am

They do different things. 'register' hints to the compiler that it may be a good idea to allocate a variable in a register (but note that modern compilers are very unlikely to care about the hint).

IRAM_ATTR and DRAM_ATTR change the memory type where the thing is allocated: in 32-bit-only memory or in 'standard' memory. DRAM_ATTR is the default, so there's no use in adding it; IRAM_ATTR generally is used for functions and I'm not even sure if it has any effect on variables. IRAM variables come with a bunch of gotchas, though; you'd only need them in very specific circumstances and only when you know what you're doing.

Generally, I'd agree with MicroController: unless you have a very specific and well-defined reason to, you shouldn't use any of these attributes on variables, period.
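Here is a minimal sketch of where these attributes are usually applied in ESP-IDF. For ordinary non-`const` variables DRAM is already the default, as said above. The case where `DRAM_ATTR` does change something is `const` data. Such data would otherwise be placed in flash, so it is unsafe to read from code that runs while the flash cache is disabled, such as an `IRAM_ATTR` interrupt handler. The names `k_scale_table` and `scale_sample` are hypothetical. The `#else` branch with empty macros is an assumption added only so the sketch also compiles on a desktop host.

```c
#include <assert.h>
#include <stdint.h>

#if defined(ESP_PLATFORM)
#include "esp_attr.h"   /* real IRAM_ATTR / DRAM_ATTR definitions */
#else
/* Host-only assumption: the attributes become no-ops off-target. */
#define IRAM_ATTR
#define DRAM_ATTR
#endif

/* A const table like this would normally be placed in flash (.rodata);
 * DRAM_ATTR moves it into internal data RAM so it stays readable while
 * the flash cache is disabled. */
static const DRAM_ATTR uint16_t k_scale_table[4] = {100, 200, 400, 800};
static_assert(sizeof(k_scale_table) / sizeof(k_scale_table[0]) == 4,
              "the index mask in scale_sample() assumes 4 entries");

/* IRAM_ATTR places the function's code in internal instruction RAM,
 * as is usually done for interrupt handlers. */
uint32_t IRAM_ATTR scale_sample(uint32_t raw, uint32_t range)
{
    return raw * k_scale_table[range & 3u];
}
```

Note that neither attribute makes the arithmetic itself faster. They only control where code and data live.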

stoumk
Posts: 12
Joined: Tue May 24, 2022 10:54 am

Re: attribute register or DRAM_ATTR

Postby stoumk » Tue Jul 25, 2023 4:53 am

My goal is to make the calculations as fast as possible.
I measure the period of a signal on one pin, do the calculation, and then raise another pin.

MicroController
Posts: 1734
Joined: Mon Oct 17, 2022 7:38 pm
Location: Europe, Germany

Re: attribute register or DRAM_ATTR

Postby MicroController » Tue Jul 25, 2023 3:42 pm

stoumk wrote:
Tue Jul 25, 2023 4:53 am
My goal: to get the fastest possible calculations
Ok.
As Sprite said, compilers today are pretty good, especially when it comes to register allocation. The compiler is well aware that accessing RAM is more expensive than plain register access, and it can and will reduce RAM accesses very efficiently.

If you can share your time-critical function code we may be able to give you a hint on potential manual optimizations, if there are any.

Also, I find it useful and instructive to use the Compiler Explorer to quickly experiment and see which variations of a piece of code generate what assembler output.

stoumk
Posts: 12
Joined: Tue May 24, 2022 10:54 am

Re: attribute register or DRAM_ATTR

Postby stoumk » Tue Jul 25, 2023 7:09 pm

The time-critical part is the delay calculation on line 104:
  1. #define GPIO_Set(x)             REG_WRITE(GPIO_OUT_W1TS_REG, 1UL << (x))
  2. #define GPIO_Clear(x)           REG_WRITE(GPIO_OUT_W1TC_REG, 1UL << (x))
  3. #define GPIO_IN_Read(x)         (REG_READ(GPIO_IN_REG) & (1UL << (x)))
  4. #define GPIO_IN_ReadAll()       REG_READ(GPIO_IN_REG)
  5. #define GPIO_IN_ReadAll2()      REG_READ(GPIO_IN1_REG)  
  6.  
  7. #define DEF_PIN_1 GPIO_NUM_16
  8. #define DEF_PIN_2 GPIO_NUM_17
  9.  
  10. #if CONFIG_ESP32_DEFAULT_CPU_FREQ_80
  11. #define NS_PER_CLK 12.5F
  12. #endif
  13. #if CONFIG_ESP32_DEFAULT_CPU_FREQ_160
  14. #define NS_PER_CLK 6.25F
  15. #endif
  16. #if CONFIG_ESP32_DEFAULT_CPU_FREQ_240
  17. #define NS_PER_CLK 4.166F
  18. #endif
  19.  
  20. typedef struct
  21. {
  22.     uint32_t _us;
  23.     uint32_t delay_ns;
  24.     float kef;
  25. } TimingsMath_t;
  26. volatile TimingsMath_t DRAM_ATTR TimingsMath[20];
  27.  
  28. typedef struct
  29. {
  30.     uint32_t _us;
  31.     uint32_t delay_ns;
  32.     float kef;
  33. } Delay_t;
  34.  
  35. void IRAM_ATTR Core1_( void* p) {
  36.  
  37.     portDISABLE_INTERRUPTS();  
  38.  
  39.     for (;;) {
  40.         uint32_t gpioValue = 0;
  41.         uint8_t sensor_1_status = 0;
  42.         uint8_t sensor_2_status = 0;
  43.          
  44.         Delay_t tablePoint[20];
  45.         for (size_t i = 0; i < 20; i++)
  46.         { // copy
  47.             tablePoint[i]._us = TimingsMath[i]._us;
  48.             tablePoint[i].delay_ns = TimingsMath[i].delay_ns;
  49.             tablePoint[i].kef = TimingsMath[i].kef;
  50.         }
  51.  
  52.         register uint32_t _dT_us_prev_sens1  = 120UL;
  53.         register uint32_t _dT_us_prev_sens2  = 120UL;
  54.         register uint32_t _dT_us_current  = 1000000;
  55.         register uint32_t _dT_us_first  = 120000;
  56.         register uint32_t _baseDelay_ns = 11111;
  57.         register uint32_t delay_us = 11111;
  58.         uint8_t _start = 4;
  59.          
  60.         register float kefK = 0;
  61.  
  62.         int8_t active_pin_ = -1;    
  63.         for (;;){  
  64.             gpioValue = GPIO_IN_ReadAll(); //10tik@160MHz(6,25ns) 7tik@80MHz(8,75ns)
  65.  
  66.             if( ((gpioValue >> 22) & 1) == 1){
  67.                 sensor_1_status = 5;
  68.             }
  69.             if(sensor_1_status && --sensor_1_status == 0){  
  70.  
  71.                 _dT_us_current = micros() - _dT_us_prev_sens1;
  72.                 _dT_us_prev_sens1 = micros();  
  73.                 active_pin_ = DEF_PIN_1;
  74.             }
  75.            
  76.  
  77.             if( ((gpioValue >> 23) & 1) == 1){ // HIGH
  78.                 sensor_2_status = 5;
  79.             }
  80.             if(sensor_2_status && --sensor_2_status == 0){
  81.            
  82.                 _dT_us_current = micros() - _dT_us_prev_sens2;
  83.                 _dT_us_prev_sens2 = micros();  
  84.                 active_pin_ = DEF_PIN_2  ;
  85.             }
  86.  
  87.  
  88.             if(_start > 0){
  89.                 --_start;  
  90.                 continue;
  91.             }
  92.  
  93.  
  94.             for (int8_t i = 19; i >= 0 ; i--)
  95.             {  
  96.                 if( _dT_us_current <= tablePoint[i]._us ){
  97.                     _dT_us_first = tablePoint[i]._us;
  98.                     _baseDelay_ns = tablePoint[i].delay_ns;
  99.                     kefK = tablePoint[i].kef;
  100.                     break;
  101.                 }
  102.             }
  103.            
  104.             delay_us = (_baseDelay_ns - (uint32_t)((float)(_dT_us_first - _dT_us_current) *kefK)) /1000 ;  
  105.  
  106.             // uint32_t ccnt_st_ = XTHAL_GET_CCOUNT(); // 6,25ns
  107.             // while( (XTHAL_GET_CCOUNT() - ccnt_st_ ) < delay_us/NS_PER_CLK ); // 1000ns/6,25=160clk  
  108.             esp_rom_delay_us( delay_us );
  109.              
  110.             if (active_pin_ >= 0) GPIO_Set(active_pin_);   // skip when no sensor fired (1 << -1 is undefined)
  111.            
  112.             esp_rom_delay_us(50);
  113.             if (active_pin_ >= 0) GPIO_Clear(active_pin_);
  114.             esp_rom_delay_us(150);
  115.             active_pin_ = -1;
  116.         }
  117.     }
  118.     vTaskDelete(NULL);
  119. }

ESP_Sprite
Posts: 9766
Joined: Thu Nov 26, 2015 4:08 am

Re: attribute register or DRAM_ATTR

Postby ESP_Sprite » Wed Jul 26, 2023 3:07 am

I'd try to get rid of those floats if you can. It's usually a fair bit faster to use fixed-point integer logic. Another hint: profile. Unless you know how fast a given iteration of your code runs, it's impossible to see if an optimization actually optimized stuff or made it worse.
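Here is a sketch of the fixed-point idea applied to the `delay_us` formula from the loop above. The float `kefK` is replaced by a Q16.16 integer factor, where 1.0 is stored as 65536. The sketch makes three assumptions. First, `kef` is non-negative and well below 65536. Second, `first_us >= current_us`, which the table lookup guarantees when a matching entry is found. Third, the correction never exceeds `base_ns`. The helper names and the choice of 16 fractional bits are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Q16.16 fixed point: the value 1.0 is stored as 65536. */
#define KEF_FRAC_BITS 16

/* Convert a table factor once, e.g. while copying TimingsMath[] into
 * tablePoint[], so no float math is left inside the critical loop.
 * Assumes 0 <= kef < 65535; rounds to the nearest 1/65536. */
static inline uint32_t kef_to_q16(float kef)
{
    assert(kef >= 0.0f && kef < 65535.0f);
    return (uint32_t)(kef * (float)(1u << KEF_FRAC_BITS) + 0.5f);
}

/* Integer-only counterpart of
 *   (base_ns - (uint32_t)((float)(first_us - current_us) * kef)) / 1000
 * Assumes first_us >= current_us and that the correction <= base_ns. */
static inline uint32_t delay_us_fixed(uint32_t base_ns, uint32_t first_us,
                                      uint32_t current_us, uint32_t kef_q16)
{
    uint64_t prod = (uint64_t)(first_us - current_us) * kef_q16;
    /* The shift truncates, like the (uint32_t) cast in the float version. */
    uint32_t correction_ns = (uint32_t)(prod >> KEF_FRAC_BITS);
    return (base_ns - correction_ns) / 1000u;
}
```

In the loop, each `tablePoint[]` entry would hold the precomputed `kef_q16` instead of `kef`. The calculation then becomes one 32-by-32-bit multiply with a 64-bit result, plus a shift. Results can differ from the float version by a small rounding amount when `kef` is not an exact multiple of 1/65536. The original ESP32 also has a hardware single-precision FPU, so whether this is actually faster should be confirmed by measuring.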

MicroController
Posts: 1734
Joined: Mon Oct 17, 2022 7:38 pm
Location: Europe, Germany

Re: attribute register or DRAM_ATTR

Postby MicroController » Wed Jul 26, 2023 9:54 am

Definitely get rid of floating point calculations in the critical loop.

Agreeing with Sprite again, you can use XTHAL_GET_CCOUNT() or esp_cpu_get_cycle_count() to measure the (average) time of an iteration (excluding esp_rom_delay_us(...) of course) and see if/how much it improves.
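Here is a minimal sketch of that kind of measurement, assuming ESP-IDF v5, where `esp_cpu_get_cycle_count()` is declared in `esp_cpu.h`. It reads the cycle counter once before and once after many iterations, then divides. Unsigned 32-bit arithmetic means a single counter wrap-around does not corrupt the result. `loop_body_without_delays()` is a hypothetical placeholder for your loop body minus the `esp_rom_delay_us()` calls. Only the arithmetic helper is built on a desktop host.

```c
#include <assert.h>
#include <stdint.h>

/* Average cycles per iteration between two 32-bit counter readings.
 * Unsigned subtraction is modulo 2^32, so the result is still correct
 * if the cycle counter wrapped once between the two readings. */
static inline uint32_t cycles_per_iter(uint32_t start, uint32_t end,
                                       uint32_t iters)
{
    assert(iters > 0);
    return (end - start) / iters;
}

#ifdef ESP_PLATFORM
#include "esp_cpu.h"   /* esp_cpu_get_cycle_count(), ESP-IDF v5+ */

/* Hypothetical stand-in for one pass of the time-critical loop,
 * with the esp_rom_delay_us() calls removed. */
extern void loop_body_without_delays(void);

uint32_t measure_loop_body(uint32_t iters)
{
    uint32_t start = (uint32_t)esp_cpu_get_cycle_count();
    for (uint32_t i = 0; i < iters; i++) {
        loop_body_without_delays();
    }
    uint32_t end = (uint32_t)esp_cpu_get_cycle_count();
    return cycles_per_iter(start, end, iters);
}
#endif
```

The average includes the loop and call overhead. It is therefore most useful for comparing two variants of the same body, not as an absolute figure.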

Some minor suggestions:
- Call micros() only once per iteration
- Check if you can pull the initial delay (_start) to before the actual processing loop.
- Use 32-bit integer variables instead of 8- or 16-bit
- Comparing to 0 may be faster than to non-zero values: if ( (x&1) == 1 ) -> if ( (x&1) != 0 )
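Applied to the sensor handling, the first, third, and fourth suggestions might look roughly like this sketch. The timestamp is taken once per iteration and passed in, all state is 32-bit, and the bit test compares against 0. The `sensor_t` struct, `sensor_init()` and `sensor_step()` are hypothetical names. The countdown behaviour mirrors the original loop: the counter is reloaded to 5 while the input bit is set, and the sensor fires once the bit has been clear for four consecutive iterations.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-sensor state; every field is a full 32-bit word. */
typedef struct {
    uint32_t bit;      /* input bit in GPIO_IN_REG, e.g. 22 or 23 */
    uint32_t out_pin;  /* output pin to pulse, e.g. 16 or 17 */
    uint32_t status;   /* countdown, reloaded to 5 while the input bit is set */
    uint32_t prev_us;  /* timestamp of the previous detection */
} sensor_t;

static inline void sensor_init(sensor_t *s, uint32_t bit, uint32_t out_pin,
                               uint32_t now_us)
{
    assert(bit < 32u);  /* GPIO_IN_REG only covers GPIO0..31 */
    s->bit = bit;
    s->out_pin = out_pin;
    s->status = 0;
    s->prev_us = now_us;
}

/* One loop iteration for one sensor. now_us is read once per iteration by
 * the caller and shared by both sensors. Returns 1 (and stores the period)
 * when the countdown reaches 0, otherwise 0. */
static inline int sensor_step(sensor_t *s, uint32_t gpio, uint32_t now_us,
                              uint32_t *period_us)
{
    if ((gpio & (1u << s->bit)) != 0) {  /* test against 0, not against 1 */
        s->status = 5;
    }
    if (s->status != 0 && --s->status == 0) {
        *period_us = now_us - s->prev_us;
        s->prev_us = now_us;
        return 1;
    }
    return 0;
}
```

In the loop, `now = micros();` would be read once at the top. After that would come `if (sensor_step(&s1, gpioValue, now, &dT)) active_pin = s1.out_pin;` and the same for the second sensor. The `_start` warm-up could also run as a separate short loop before the main one, so the main loop no longer needs that branch.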

stoumk
Posts: 12
Joined: Tue May 24, 2022 10:54 am

Re: attribute register or DRAM_ATTR

Postby stoumk » Thu Jul 27, 2023 5:27 pm

Thanks. I tried taking measurements this way, but I always get 1 tick, no matter what inputs I feed to the formula.
start = XTHAL_GET_CCOUNT();
delay_us = (_baseDelay_ns - (uint32_t)((float)(_dT_us_first - _dT_us_current) * kefK)) / 1000;
start2 = XTHAL_GET_CCOUNT();
startDiff = start2 - start;
portENABLE_INTERRUPTS();
printf("bench:\t%lu\n", startDiff);
portDISABLE_INTERRUPTS();
UPDATE:
Using portGET_RUN_TIME_COUNTER_VALUE instead of XTHAL_GET_CCOUNT gives 12-16 ticks.

MicroController
Posts: 1734
Joined: Mon Oct 17, 2022 7:38 pm
Location: Europe, Germany

Re: attribute register or DRAM_ATTR

Postby MicroController » Thu Jul 27, 2023 6:42 pm

Two thoughts:
1. XTHAL_GET_CCOUNT seems to be defined to do a direct register read, while portGET_RUN_TIME_COUNTER_VALUE (confusingly) maps to the ROM function xthal_get_ccount(); hence, you'd get more accurate results from XTHAL_GET_CCOUNT,
BUT
2. This kind of micro-benchmark may be foiled by the compiler: The compiler is allowed to re-order instructions/operations under certain conditions; in this case, the compiler may (and probably does) re-order your calculation to before or after the timed period - because it cannot see a dependency between start, start2 and delay_us. (There is no way the CPU could have actually done your calculation in 1 clock cycle...)

You can either try to time a bigger section of code, e.g. most of a full iteration, or use some 'hacks' to 'fool' the compiler, for example:


// Make the compiler believe that some volatile magic happens to "value"
// to prevent reordering of code/expression w.r.t. other "volatiles".
static uint32_t vltle(uint32_t value) {
    // This assembly is empty (does nothing), but tells the compiler that it may have changed ("+r") the contents of "value"
    asm volatile ( "\n" : "+r" (value) );
    return value;
}

...
start = XTHAL_GET_CCOUNT();
delay_us = vltle( (_baseDelay_ns - (uint32_t)((float)( vltle( _dT_us_first ) - _dT_us_current) *kefK)) /1000  );
start2 = XTHAL_GET_CCOUNT();
(vltle is used twice above to ensure that the calculation does not "start" before the first XTHAL_GET_CCOUNT and has produced its result before the second XTHAL_GET_CCOUNT.)
