Core 0 is Slower
Posted: Fri Apr 14, 2023 9:43 pm
Good afternoon community...
I have set myself an ambitious project involving VGA and the ESP32 (using a library of course, maybe you know it), and I want to use both ESP32 cores at their true, maximum computing capacity.
Unfortunately, FreeRTOS is a big obstacle to getting there. I can tolerate the fact that it consumes a considerable amount of RAM, but what I don't like is that it is running SOMETHING behind the scenes. At this point I don't know what it is: it could be the watchdog, interrupts, or some lightweight background task.
From my research on the internet, I managed to use both cores without the watchdog running (or so I think) and without having to add any kind of delay. I don't know why all the dual-core examples need delays to keep the ESP32 from crashing, but I managed to avoid that.
But there is one detail: for some unknown reason, Core 0 is a little less efficient than Core 1. Did I miss deactivating something in FreeRTOS? Or is Core 0 perhaps less efficient at the hardware level?
Here are some benchmarks I ran. They exercise the I/O (to check whether VGA output would lose performance) and two different compute-heavy tasks (to check whether computing throughput is affected).
Code: https://gist.github.com/HiperDoo/c87624 ... dd003a73cd
1. Use Core 1 to calculate PI or prime numbers and Core 0 for GPIO output/input. (Values vary by ~1 ms.)
Write timing is slower and Read timing is slower.
PI timing is perfect and Prime timing is perfect.
2. Use Core 0 to calculate PI or prime numbers and Core 1 for GPIO output/input. (Values vary by ~1 ms.)
Write timing is perfect and Read timing is perfect.
PI timing is slower and Prime timing is slower.
3. Use Core 1 to calculate PI and Core 0 to calculate prime numbers. (Values vary by ~1 ms.)
PI timing is perfect and Prime timing is slower.
4. Use Core 0 to calculate PI and Core 1 to calculate prime numbers. (Values vary by ~1 ms.)
PI timing is slower and Prime timing is perfect.
5. Using both cores for READ and WRITE at the same time (even though they use different pins) randomly drops performance for both tasks (i.e. some timings are ~2000 ms apart!).
Results for scenario 1 (Core 1 computes PI or primes, Core 0 handles GPIO):
Code: Select all
// For the Core 0
>>> BENCHMARK WRITE <<< Core: 0
* digitalWrite(): 6853 ms
* gpio_set_level(): 5927 ms
* GPIO.out_w1ts/t: 1009 ms
* REG_WRITE(): 1009 ms
// OR
>>> BENCHMARK READ <<< Core: 0
* digitalWrite(): 2019 ms
* gpio_get_level(): 1639 ms
* GPIO.in: 505 ms
* REG_READ(): 504 ms
// For the Core 1
>>> BENCHMARK PI <<< Core: 1
* PI: 3.141593
* Time: 38244 ms
// OR
>>> BENCHMARK PRIME <<< Core: 1
* Prime: 25997
* Time: 94846 ms
Results for scenario 2 (Core 0 computes PI or primes, Core 1 handles GPIO):
Code: Select all
// For the Core 1
>>> BENCHMARK WRITE <<< Core: 1
* digitalWrite(): 6837 ms
* gpio_set_level(): 5914 ms
* GPIO.out_w1ts/t: 1007 ms
* REG_WRITE(): 1008 ms
// OR
>>> BENCHMARK READ <<< Core: 1
* digitalWrite(): 2013 ms
* gpio_get_level(): 1636 ms
* GPIO.in: 504 ms
* REG_READ(): 504 ms
// For the Core 0
>>> BENCHMARK PI <<< Core: 0
* PI: 3.141593
* Time: 40368 ms
// OR
>>> BENCHMARK PRIME <<< Core: 0
* Prime: 25997
* Time: 98676 ms
Results for scenario 3 (PI on Core 1, primes on Core 0):
Code: Select all
>>> BENCHMARK PI <<< Core: 1
* PI: 3.141593
* Time: 38244 ms
>>> BENCHMARK PRIME <<< Core: 0
* Prime: 25997
* Time: 95070 ms
Results for scenario 4 (PI on Core 0, primes on Core 1):
Code: Select all
>>> BENCHMARK PI <<< Core: 0
* PI: 3.141593
* Time: 38334 ms
>>> BENCHMARK PRIME <<< Core: 1
* Prime: 25997
* Time: 94846 ms
Results for scenario 5 (READ on Core 1 and WRITE on Core 0 at the same time):
Code: Select all
>>> BENCHMARK READ <<< Core: 1
* digitalWrite(): 2052 ms
* gpio_get_level(): 1701 ms
* GPIO.in: 503 ms
* REG_READ(): 504 ms
>>> BENCHMARK READ <<< Core: 1
* digitalWrite(): 2086 ms
* gpio_get_level(): 1690 ms
* GPIO.in: 503 ms
* REG_READ(): 504 ms
>>> BENCHMARK WRITE <<< Core: 0
* digitalWrite(): 7860 ms
* gpio_set_level(): 7942 ms
* GPIO.out_w1ts/t: 1009 ms
* REG_WRITE(): 1009 ms
>>> BENCHMARK READ <<< Core: 1
* digitalWrite(): 4076 ms
* gpio_get_level(): 1708 ms
* GPIO.in: 504 ms
* REG_READ(): 504 ms
>>> BENCHMARK READ <<< Core: 1
* digitalWrite(): 2051 ms
* gpio_get_level(): 1708 ms
* GPIO.in: 504 ms
* REG_READ(): 504 ms
>>> BENCHMARK READ <<< Core: 1
* digitalWrite(): 2074 ms
* gpio_get_level(): 1689 ms
* GPIO.in: 504 ms
* REG_READ(): 504 ms
>>> BENCHMARK WRITE <<< Core: 0
* digitalWrite(): 8868 ms
* gpio_set_level(): 6934 ms
* GPIO.out_w1ts/t: 1009 ms
* REG_WRITE(): 1009 ms
>>> BENCHMARK READ <<< Core: 1
* digitalWrite(): 2087 ms
* gpio_get_level(): 3698 ms
* GPIO.in: 504 ms
* REG_READ(): 504 ms
>>> BENCHMARK READ <<< Core: 1
* digitalWrite(): 2051 ms
* gpio_get_level(): 1708 ms
* GPIO.in: 504 ms
* REG_READ(): 504 ms
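The jitter in the scenario 5 runs above can be summarised as the spread between the slowest and the fastest run of each method. This is again just a host-side C++ sketch of the arithmetic, with a hypothetical helper name (`spread_ms`) that does not come from the gist; the inputs in the usage note are the READ timings listed above.

```cpp
#include <algorithm>
#include <vector>

// Difference between the slowest and the fastest run, in milliseconds.
// Returns 0 for an empty list. (Hypothetical helper for illustration.)
long spread_ms(const std::vector<long>& runs) {
    if (runs.empty()) {
        return 0;
    }
    const long fastest = *std::min_element(runs.begin(), runs.end());
    const long slowest = *std::max_element(runs.begin(), runs.end());
    return slowest - fastest;
}
```

For the seven READ runs, `spread_ms({2052, 2086, 4076, 2051, 2074, 2087, 2051})` (the first line of each READ block) and `spread_ms({1701, 1690, 1708, 1708, 1689, 3698, 1708})` (`gpio_get_level()`) both come out at roughly 2000 ms, while the direct register reads (`GPIO.in`, `REG_READ()`) stay within about 1 ms of each other. So in these runs the random slowdowns only show up in the function-call based methods, not in the raw register accesses.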
I will of course pay attention to your answers, and to any recommendations as well.
I know people usually recommend not deactivating the watchdog (although I haven't seen the reasons why), but a task failing to finish is the least of my problems (that would be a bug in my own logic).