https://godbolt.org/z/6WfPTdh96
Is there a way to force compiler to generate "always the same execution path" code without branch instructions for 64bit arithmetic. In that example, a14 register is always preloaded with value 1. Then comparation in between a10 and a9 is performed, telling should value of a14 has to stay at 1 by skipping next instruction or it should be 0 by executing next instruction. And that makes divergent paths of execution depending on input values, because for some pair of values that SKIP is going to be performed while for other pairs skip is not performed but one extra instruction is executed. That means, execution time of routine, depends on input values which is not acceptable in critical synchronization routines which involve 64bit math.
I know I can write "by hand" such a code by replacing that part with something like:
Code: Select all
sub a14, a10, a9 // if a10 is less than a9 highest bit in a14 is going to be set, otherwise it will be reset after subtraction
srl a14, a14, 31 // logical shift it to first bit position (funneling zeroes into higher bits)
// that is the full replacement leaving 0 or 1 in a14
Code: Select all
main:
entry sp, 48
l32r a8, .LC0
memw
l32i.n a12, sp, 0
memw
l32i.n a10, a8, 0 // modified, exchanged order of loading registers
memw
l32i.n a13, sp, 4 // modified, exchanged order of loading registers
sub a9, a10, a12 // so we can execute this here
memw
l32i.n a11, a8, 4
sub a14, a10, a9 // and here is our replacement (a9 is modified 3 instructions earlier so no stalls)
// movi.n a14, 1
// bltu a10, a9, .L2
// movi.n a14, 0
//.L2:
sub a12, a11, a13
srl a14, a14, 31 // and here is the shift
memw
s32i.n a9, a8, 0
sub a11, a12, a14 // here is the actual usage of a14
memw
s32i.n a11, a8, 4
memw
l32i.n a12, a8, 0
l32r a10, .LC2
memw
l32i.n a13, a8, 4
call8 printf
retw.n
(I wrote this as an example while writing this question just to illustrate the question. I didn't test it, however there is no reason of why it shouldn't work)
But, is there a way to force the compiler to generate such a non-branching code of int64 arithmetic keeping routine execution time constant and independent on input values?