It's always worth checking the disassembly to verify your assumptions about what's happening. Experience has taught me many times that the compiler is usually smarter than I am.
If you write the exact C code posted above, the compiler will warn "variable 'len' set but not used" and "control reaches end of non-void function". If you set it to ignore these warnings, the compiler will happily optimise out the unused variables and operations, and you end up testing two no-ops that just call gpio_set_level() and do nothing else.
Here's a main.c file which compiles without warnings:
Code:
#include "esp_attr.h"
#include "driver/gpio.h"

/* Multiply version: GPIO 25 is toggled around the operation under test. */
int IRAM_ATTR test1 ( uint32_t length )
{
    uint32_t len;
    gpio_set_level( (gpio_num_t)25, 1 );
    len = length * 4;
    gpio_set_level( (gpio_num_t)25, 0 );
    return len; /* returning the result stops it being optimised out */
}

/* Shift version of the same operation. */
int IRAM_ATTR test2 ( uint32_t length )
{
    uint32_t len;
    gpio_set_level( (gpio_num_t)25, 1 );
    len = length << 2;
    gpio_set_level( (gpio_num_t)25, 0 );
    return len;
}

void app_main(void)
{
    test1(33);
    test2(33);
}
Built with the default config (i.e. the default compiler optimisation level, which is -Og; if you want maximum performance, set this to Performance). Now let's look at the disassembly:
(Note I'm using the CMake build system here so main.c compiles to main.c.obj. If using GNU Make, main.c will compile to main.o in a different directory but you can still use objdump to disassemble it.)
Code:
$ xtensa-esp32-elf-objdump -d build/esp-idf/main/CMakeFiles/__idf_main.dir/main.c.obj
build/esp-idf/main/CMakeFiles/__idf_main.dir/main.c.obj: file format elf32-xtensa-le
Disassembly of section .iram1.0.literal:
00000000 <.iram1.0.literal>:
...
Disassembly of section .iram1.1.literal:
00000000 <.iram1.1.literal>:
...
Disassembly of section .literal.app_main:
00000000 <.literal.app_main>:
...
Disassembly of section .iram1.0:
00000000 <test1>:
0: 004136 entry a1, 32
3: 1b0c movi.n a11, 1
5: 9a1c movi.n a10, 25
7: 000081 l32r a8, fffc0008 <test1+0xfffc0008>
a: 0008e0 callx8 a8
d: 1122e0 slli a2, a2, 2
10: 0b0c movi.n a11, 0
12: 9a1c movi.n a10, 25
14: 000081 l32r a8, fffc0014 <test1+0xfffc0014>
17: 0008e0 callx8 a8
1a: f01d retw.n
Disassembly of section .iram1.1:
00000000 <test2>:
0: 004136 entry a1, 32
3: 1b0c movi.n a11, 1
5: 9a1c movi.n a10, 25
7: 000081 l32r a8, fffc0008 <test2+0xfffc0008>
a: 0008e0 callx8 a8
d: 1122e0 slli a2, a2, 2
10: 0b0c movi.n a11, 0
12: 9a1c movi.n a10, 25
14: 000081 l32r a8, fffc0014 <test2+0xfffc0014>
17: 0008e0 callx8 a8
1a: f01d retw.n
Disassembly of section .text.app_main:
00000000 <app_main>:
0: 004136 entry a1, 32
3: 21a0a2 movi a10, 33
6: 000081 l32r a8, fffc0008 <app_main+0xfffc0008>
9: 0008e0 callx8 a8
c: 1a2c movi.n a10, 33
e: 000081 l32r a8, fffc0010 <app_main+0xfffc0010>
11: 0008e0 callx8 a8
14: f01d retw.n
(Note: you can use objdump -S instead of objdump -d if you want to see the source code mixed with disassembly.)
The compiler has noticed that you're multiplying by a power of two and chosen to use an slli (shift left logical immediate) instruction in both functions. Clever compiler. "Use shift instead of multiply/divide by a power of two" was probably useful advice with 80s and maybe 90s compilers, but it's one of the first optimisations a modern C compiler will apply. Unfortunately the "advice" still sticks around.
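If you want to convince yourself that the two source forms really are interchangeable for uint32_t before trusting the compiler with the choice, a quick off-target check is enough. This is only a sketch: the function names (times_four_mul, times_four_shift, forms_agree) are made up for illustration and are not part of ESP-IDF.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: the "readable" form. */
uint32_t times_four_mul(uint32_t length)
{
    return length * 4;
}

/* Hypothetical helper: the hand-"optimised" form. */
uint32_t times_four_shift(uint32_t length)
{
    return length << 2;
}

/* Returns true if both forms give the same result for every value in
 * [0, limit). Unsigned arithmetic wraps modulo 2^32 in both cases. */
bool forms_agree(uint32_t limit)
{
    for (uint32_t v = 0; v < limit; v++) {
        if (times_four_mul(v) != times_four_shift(v)) {
            return false;
        }
    }
    return true;
}
```

Calling forms_agree() with any limit should return true, so the readable `length * 4` costs nothing and the choice can be made purely on clarity.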
Regarding using CCOUNT to measure cycles, here's an example:
Code:
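#include <stdio.h>
#include <inttypes.h>
#include "esp_attr.h"
#include "soc/cpu.h"

int IRAM_ATTR test1 ( uint32_t length )
{
    uint32_t start, end;
    uint32_t len;
    RSR(CCOUNT, start); /* read the cycle counter before the operation */
    len = length * 4;
    RSR(CCOUNT, end);   /* ...and again after it */
    /* PRIu32 matches uint32_t whichever underlying type the toolchain uses */
    printf("Operation executed in %" PRIu32 " cycles\n", end - start);
    return len;
}

void app_main(void)
{
    test1(33);
}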
Note that CCOUNT only works if the task is pinned to one core (each core has its own cycle counter). If the task may migrate between cores, use the esp_timer_get_time() function instead, which returns an int64_t timestamp in microseconds.
Note also that interrupts are still on in these tests, so sometimes a context switch interrupt or other interrupt will happen during the operation and skew the results. Better to run the test a number of times and average out the outliers.
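One way to deal with skewed runs is to collect many samples and summarise them afterwards. The sketch below uses the median rather than a plain average, since a single interrupt-inflated sample barely moves it. It is written as plain C so it can be tried off-target; cycles_elapsed and median_cycles are hypothetical helper names, and on the ESP32 you would fill the samples array from pairs of RSR(CCOUNT, ...) reads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Elapsed cycles between two CCOUNT readings. Unsigned subtraction still
 * gives the right answer if the 32-bit counter wrapped once in between. */
uint32_t cycles_elapsed(uint32_t start, uint32_t end)
{
    return (uint32_t)(end - start);
}

/* qsort comparator for uint32_t values (ascending). */
static int cmp_u32(const void *a, const void *b)
{
    uint32_t x = *(const uint32_t *)a;
    uint32_t y = *(const uint32_t *)b;
    return (x > y) - (x < y);
}

/* Sorts the samples in place and returns the median (the upper middle
 * element when n is even). n must be at least 1. */
uint32_t median_cycles(uint32_t *samples, size_t n)
{
    assert(n > 0);
    qsort(samples, n, sizeof samples[0], cmp_u32);
    return samples[n / 2];
}
```

For example, with samples of 7, 6, 950, 6 and 7 cycles the median is 7, whereas the mean would be dragged up to about 195 by the one run that was hit by an interrupt.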
***
ESP_Sprite wrote:
How do you figure the division may be the bottle neck there? Before you try to optimize what you think is the bottleneck, it's probably better to instrument first so you know for sure. You may be surprised with what you find.
This is good advice that ESP_Sprite gave you. If you can post the actual code you're trying to optimise and some performance measurements for it, someone can probably give you some useful tips.
Note that "inline IRAM_ATTR" is probably never what you want, because inline will incorporate the code into the caller - meaning the IRAM_ATTR is ignored as the code ends up in the same memory space as the caller is in (either IRAM or flash). Suggest either "static inline" if you want to incorporate into the caller, or "IRAM_ATTR" by itself if you want just this function to be in IRAM to avoid flash delays.
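To make the two suggested alternatives concrete, here is a minimal sketch. The function names are hypothetical, and the empty IRAM_ATTR fallback is only there as an assumption so the snippet also compiles off-target; in a real ESP-IDF project you would include "esp_attr.h" first and the fallback would never be used.

```c
#include <stdint.h>

/* IRAM_ATTR normally comes from "esp_attr.h". This empty fallback is an
 * assumption for off-target builds only. */
#ifndef IRAM_ATTR
#define IRAM_ATTR
#endif

/* Option 1: static inline. The body is merged into each caller, so it ends
 * up wherever that caller lives (IRAM or flash). */
static inline uint32_t len_to_bytes_inline(uint32_t length)
{
    return length * 4;
}

/* Option 2: IRAM_ATTR by itself. A separately callable function that the
 * linker places in IRAM, avoiding flash cache delays for this code only. */
uint32_t IRAM_ATTR len_to_bytes_iram(uint32_t length)
{
    return length * 4;
}
```

Both return the same value (132 for an input of 33); the difference is only in where the code ends up, so pick whichever matches the placement you actually need.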