Funny how one's C++ experience defines what they prefer. I like the inline assembly as there is no need to understand the inner workings of the GCC compiler in order to understand what's happening. ... it also looks like it's possible to use the .S extension with the ESP-IDF and compile direct assembly code. But this is not something you even mention, so I'm guessing it's not your preferred approach.
Well, except for the single inline assembly instruction I wrap (and __attribute__((always_inline))), everything is plain standard C++20.
Using standalone assembler files (.S) is the opposite of what I want. Writing algorithms in assembler, on top of manually implementing the ABI and managing register allocation, data types, optimizations, etc., is unnecessarily complicated, because it makes you replicate what the compiler would do for you automatically.
Take, for example,
rpt(cnt, [&src_p, &dest_p]() {
    vld_128_ip<0>(src_p);  // Load 16 bytes from RAM into q0, increment src_p
    vst_128_ip<0>(dest_p); // Store 16 bytes from q0 to RAM, increment dest_p
});
which lets the compiler do its thing so that I don't have to concern myself with how and where src_p and dest_p come from, or which registers are best to use at that location in the code; GCC can optimize that as it likes.
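To make the shape of that snippet concrete, here is a rough host-side sketch. It is not my actual implementation: on the real target each wrapper would boil down to a single inline assembly instruction, whereas here the q registers are emulated with a plain array so the code can run on a PC. The q_regs array, the function bodies and the copy_blocks helper are all assumptions made for illustration; only the names rpt, vld_128_ip and vst_128_ip come from the snippet above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Host-side stand-in for eight 128-bit q registers (assumption: on the
// real target these are hardware registers, not memory).
static std::array<std::array<std::uint8_t, 16>, 8> q_regs{};

// Emulated load: copy 16 bytes into qN, then advance the pointer by 16.
// A real wrapper would emit one inline assembly instruction instead.
template <unsigned N>
inline void vld_128_ip(const std::uint8_t*& src) {
    static_assert(N < 8, "only q0..q7 are modelled");
    std::memcpy(q_regs[N].data(), src, 16);
    src += 16;
}

// Emulated store: copy 16 bytes out of qN, then advance the pointer by 16.
template <unsigned N>
inline void vst_128_ip(std::uint8_t*& dest) {
    static_assert(N < 8, "only q0..q7 are modelled");
    std::memcpy(dest, q_regs[N].data(), 16);
    dest += 16;
}

// Emulated repeat: call body cnt times. Here it is an ordinary loop.
template <typename Body>
inline void rpt(std::size_t cnt, Body&& body) {
    for (std::size_t i = 0; i < cnt; ++i) {
        body();
    }
}

// Hypothetical helper: copy `blocks` 16-byte blocks from src_p to dest_p.
inline void copy_blocks(const std::uint8_t* src_p, std::uint8_t* dest_p,
                        std::size_t blocks) {
    assert(src_p != nullptr && dest_p != nullptr);
    rpt(blocks, [&src_p, &dest_p]() {
        vld_128_ip<0>(src_p);   // 16 bytes into q0, src_p += 16
        vst_128_ip<0>(dest_p);  // 16 bytes out of q0, dest_p += 16
    });
}
```

The point is that the call site stays the same whether the bodies are emulated like this or are one-instruction wrappers: the pointers are ordinary C++ variables, so the compiler decides where they live.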
My current version wraps the SIMD instructions in meaningful objects (i.e. vectors) with operator overloads that map naturally onto the SIMD instructions (vecA = vecB + vecC;).
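As a rough idea of what such a wrapper can look like, here is a minimal portable sketch. The type name vec_s16x8, its layout and the at() accessor are made up for this example, and the scalar loop only stands in for what would be a single SIMD add on the target. This version wraps on overflow; the real instruction may behave differently (e.g. saturate), so treat that as an assumption.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 128-bit vector: eight signed 16-bit lanes.
struct vec_s16x8 {
    std::array<std::int16_t, 8> lanes{};

    // Bounds-checked lane read, for inspection on the host.
    std::int16_t at(std::size_t i) const {
        assert(i < 8);
        return lanes[i];
    }
};

// Lane-wise addition. The loop is a portable stand-in for one SIMD add;
// the arithmetic is done unsigned so overflow wraps (assumption).
inline vec_s16x8 operator+(const vec_s16x8& a, const vec_s16x8& b) {
    vec_s16x8 r;
    for (std::size_t i = 0; i < 8; ++i) {
        const std::uint16_t sum = static_cast<std::uint16_t>(
            static_cast<std::uint16_t>(a.lanes[i]) +
            static_cast<std::uint16_t>(b.lanes[i]));
        r.lanes[i] = static_cast<std::int16_t>(sum);
    }
    return r;
}
```

With that in place, vecA = vecB + vecC; reads like ordinary arithmetic, and the choice of registers for the operands is again left to the compiler.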