Question re: sub-microsecond timing, performance tuning
Posted: Tue Jan 14, 2020 1:04 am
I'm working on code to interface with a motor controller that has 1us and 200ns setup and hold time requirements. If I time things with micros(), I'll waste a noticeable amount of time (probably 2-3 microseconds, due to resolution issues), so I decided to roll an alternative using the CPU cycle counter (ccount). But the routine itself seems to take 2-3us. So, my questions are: (1) Is there a better way? (2) Why do these take so long to run? Each inlined call should be a few dozen cycles at most, and the ESP32 is set to 240MHz.
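For scale, here is a rough sketch of the cycle budget those requirements imply, assuming the 240MHz clock mentioned above. The names kCpuHz and nsToCycles are made up for illustration and are not part of the routines in this post. The conversion rounds up so that a wait is never shorter than requested.

```cpp
#include <cstdint>

// Assumption: CPU clock of 240 MHz, as stated in the post.
constexpr uint32_t kCpuHz = 240u * 1000u * 1000u;

// Hypothetical helper (not from the post): convert nanoseconds to CPU cycles,
// rounding up. The 64-bit intermediate avoids overflow in ns * kCpuHz.
constexpr uint32_t nsToCycles(uint32_t ns) {
    return static_cast<uint32_t>(
        (static_cast<uint64_t>(ns) * kCpuHz + 999999999ull) / 1000000000ull);
}

// Compile-time checks of the two requirements from the post.
static_assert(nsToCycles(200) == 48, "200 ns is 48 cycles at 240 MHz");
static_assert(nsToCycles(1000) == 240, "1 us is 240 cycles at 240 MHz");
```

On these numbers, a call that takes 2-3us costs roughly 480-720 cycles, which is well beyond the few dozen expected for an inlined counter read and compare.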
The routines and the test code (run from the Arduino setup()) are listed below. In the test code, the "immediately" and "another" calls should return VERY fast, with the while loop dropping out almost immediately. Instead, I tend to see 3-4 microsecond delays. Hand-inlining suggests the while loop takes about half a microsecond. Moving the const computations earlier might help, but given that everything is constant and inlined, I'd expect them to be compiled away.
Thoughts?
P.S. The micros() calls are really quick, so they don't seem to be the issue.
The routines:
// Get an initial time, for use as a baseline
__attribute__((always_inline))
uint32_t getBaseTime()
{
    return xthal_get_ccount();
}

// Busy-wait until at least the given amount of time has elapsed since the baseline.
// If that amount of time has already elapsed, returns immediately.
// NOTE: won't work if more than about 17 seconds have passed since the baseline,
// due to counter roll-over.
__attribute__((always_inline))
void waitForElapsedNs(const uint32_t baseline, const uint32_t elapsedTime_ns)
{
    const uint32_t cyclesPerUs = XT_CLOCK_FREQ/(1000*1000);
    // Divide first for waits over 1 ms (avoids 32-bit overflow); multiply first
    // for shorter waits (keeps sub-microsecond precision).
    const uint32_t elapsedCcounts = (elapsedTime_ns > 1000*1000)
        ? (elapsedTime_ns / 1000 * cyclesPerUs)
        : ((elapsedTime_ns * cyclesPerUs) / 1000);
    while ((xthal_get_ccount()-baseline) < elapsedCcounts) {};
}
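The roll-over note in the comment above can be checked off-target. The sketch below is a minimal illustration, not the routine from this post: hasElapsed is a hypothetical helper that does the same unsigned comparison as the while loop, and the counter values are made up. Because 32-bit unsigned subtraction is modulo 2^32, the difference stays correct across a single counter wrap.

```cpp
#include <cstdint>

// Hypothetical helper (not from the post): true once at least `cycles` counter
// ticks have passed since `baseline`. Unsigned subtraction is modulo 2^32, so
// the result is still correct if the counter wrapped once in between.
constexpr bool hasElapsed(uint32_t now, uint32_t baseline, uint32_t cycles) {
    return static_cast<uint32_t>(now - baseline) >= cycles;
}

// Made-up values: baseline taken 16 ticks before the wrap, checked 32 ticks later.
constexpr uint32_t kBaseline = 0xFFFFFFF0u;
constexpr uint32_t kNow = kBaseline + 32u;  // wraps around to 0x00000010

static_assert(kNow == 0x10u, "counter wrapped past zero");
static_assert(hasElapsed(kNow, kBaseline, 32u), "32 ticks have elapsed");
static_assert(!hasElapsed(kNow, kBaseline, 33u), "33 ticks have not elapsed yet");
```

What the comparison cannot detect is a full lap of the counter: after 2^32 cycles (about 17.9 seconds at 240MHz) the difference starts again from zero, which is the limit the NOTE describes.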
The test code, in the Arduino setup():
uint32_t base = getBaseTime();
unsigned long startTime = micros();
waitForElapsedNs(base, 3000);
unsigned long plus3us = micros();
waitForElapsedNs(base, 3000*1000);
unsigned long plus3ms = micros();
waitForElapsedNs(base, 3000);
unsigned long immediately = micros();
waitForElapsedNs(base, 1000*1000*1000);
unsigned long plus1s = micros();
waitForElapsedNs(base, 3000);
unsigned long another = micros();
Serial.print("3us later "); Serial.println(plus3us - startTime);
Serial.print("3ms later "); Serial.println(plus3ms - startTime);
Serial.print("Immediately return "); Serial.println(immediately - startTime);
Serial.print("1 second "); Serial.println(plus1s - startTime);
Serial.print("Immediately "); Serial.println(another - startTime);