Properly using the SPI peripheral to drive a screen

ivan_ca97
Posts: 2
Joined: Wed Nov 24, 2021 2:02 pm

Properly using the SPI peripheral to drive a screen

Postby ivan_ca97 » Fri Feb 02, 2024 7:34 pm

Hello everyone. I'm writing a driver for a 480x320 screen. I used the examples as a guide and got it working with surprisingly good results, but I was wondering if there's a better way to handle this. Right now I'm using the "spi_device_polling_transmit" function to send 10 rows of 480 pixels at a time, which takes about 1000 us per update. That makes sense: at 16 bits per pixel, 480 pixels per row, 10 rows per transaction and an 80 MHz clock, the total time is:

16 bits * 480 pixels * 10 rows / 80 MHz = 960 us per block of 10 rows

Now, I tested that with the gptimer and it actually gives me very close results: each block takes between 980 us and 1010 us (mostly close to 1000 us), which makes sense considering additional delays from the execution, including the handling of the timer itself. The timing worked like this (DebugTimer is an object of a hardware timer class I wrote with many functionalities):
```cpp
spi_transaction_t Transaction = {};

// Transaction struct config
Transaction.foo_ = foo;
Transaction.foo1_ = foo1;
Transaction.foo2_ = foo2;

spi_device_acquire_bus(...);

DebugTimer.Tic();
spi_device_polling_transmit(...);
printf("Time: %lu us\n", DebugTimer.TocUs());

spi_device_release_bus(...);
```
Then I wondered if I could make it more efficient somehow and read about the "spi_device_queue_trans" function. I simply replaced "spi_device_polling_transmit" with "spi_device_queue_trans" and found that each transaction now takes much less time, about 45 us per block; some take a bit longer, but every one stayed below the 80 us mark. However, if I start timing before calling "spi_device_acquire_bus", the first ones take the same time, but after a couple the timing goes to over 800 us, and then it goes back down again.

My conclusion is that "spi_device_queue_trans" buffers the transaction and the SPI peripheral is in charge of sending everything to the device. I read something about that in the ESP-IDF documentation, but I had assumed I needed to write code in an ISR to handle the transaction; this turned out to be surprisingly easy. And the reason it takes little time at first, then longer, and then little again when I start timing before acquiring the bus would be that some kind of buffer gets filled with transactions, so at some point I can't buffer any more and have to wait.

Now... there are some things I don't fully understand. I don't have any problem if the refresh of the screen takes a few ms longer, but I do have a problem if the CPU is blocked for a few ms trying to send data to the screen.

Is an ISR called by default by the ESP-IDF framework that does all transactions? If so, does that ISR block the execution of other FreeRTOS tasks for a long time, or is it short-lived? From what I read in the ESP-IDF docs, a polling transaction keeps the CPU busy, so I understand other tasks won't be executed until the polling ends.

This is proving to be a bit confusing to me. I don't mind waiting for the transaction to complete in the task that handles the transaction. But I want other tasks to be able to run while a transaction is ongoing, even if it means refreshing the screen takes longer.

Is using "spi_device_queue_trans" and making sure no two different tasks try to access the same SPI device all I have to do to achieve that? I'm sorry if I'm not being clear; I'll keep an eye on the replies. Thanks!

EDIT: I forgot to mention: if the reason the later transactions take longer when I also time "spi_device_acquire_bus" is that some buffer is full, and the function blocks the task until space in that buffer is freed, how does it know that the next transaction will fit in the buffer? Or, in that case, would "spi_device_queue_trans" be the function that blocks the task?

ESP_Sprite
Posts: 9730
Joined: Thu Nov 26, 2015 4:08 am

Re: Properly using the SPI peripheral to drive a screen

Postby ESP_Sprite » Sat Feb 03, 2024 8:52 am

The thing is that the SPI hardware works best with large transfers, so your end solution may simply be to send more than 10 rows at the same time (e.g. you can send the entire screen if you have the framebuffer in memory anyway).

What you're seeing is a tradeoff having to do with the fact that there's a cost to handling interrupts. At the end of a transfer, the CPU needs to set up the next transfer. That can happen in two ways: the CPU can actively poll the hardware to see if the transfer is done yet (which spi_device_polling_transmit does) or the CPU can go do something else and get an interrupt when the transfer is done (which spi_device_queue_trans does). Downside of the 2nd solution is that there's a fair amount of time between the transfer being done, and the CPU finishing handling the interrupt and handling the context switch. Downside of the 1st solution is that the CPU can't do anything else.

We have a solution in the works that on newer hardware can queue multiple transactions and handle them in hardware, but unfortunately that code is not in ESP-IDF yet.

MicroController
Posts: 1708
Joined: Mon Oct 17, 2022 7:38 pm
Location: Europe, Germany

Re: Properly using the SPI peripheral to drive a screen

Postby MicroController » Sat Feb 03, 2024 2:55 pm

Is an ISR called by default by the esp-idf framework that does all transactions? If so, is that ISR blocking the execution of other FreeRTOS tasks for a long time? Or are they short lived?
Yes, no, and yes :)
As the docs state, the _polling function is intended for small transactions, where the relative overhead of setup, context-switching, and ISR is too big. The docs include some timing estimates, but, generally, sending a stream of data to a display (i.e. hundreds+ of bytes per transaction) is not a small transaction and the DMA/background transfer (i.e. not polling) is very likely to improve system performance.
(For perspective, there are chips/devices which require you to only send/receive a few bytes, say 3-4, per transaction over SPI in a command-response style; these are the use cases for the _polling functions, where busy-waiting for a few microseconds for the transaction to finish is faster than 'blocking', i.e. context-switching the task out and back in again.)

So, for the display data, keep using DMA and the non-polling SPI functions so that the CPU is free to do other things while the SPI hardware is busy.
Or if that were to happen the function "spi_device_queue_trans" would block the task?
Yes, it would. That's why you have to pass it the 'ticks_to_wait' parameter; it will block for at most that amount of time and return an error if it fails to enqueue your transaction within that timeframe.
After sending off an SPI transaction via spi_device_queue_trans, you can also later use spi_device_get_trans_result to wait for the transaction to finish if you need to 'synchronize' your code with the transaction again.
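A minimal sketch of that queue-then-collect pattern, assuming an already-initialized spi_device_handle_t and DMA-capable pixel buffers (the names send_frame, BLOCKS and block are illustrative, not from this thread; this is hardware-bound code, not a complete driver):

```c
#include "driver/spi_master.h"

#define BLOCKS 32   /* 320 rows / 10 rows per transaction */

void send_frame(spi_device_handle_t spi, const uint16_t *block[BLOCKS])
{
    static spi_transaction_t trans[BLOCKS];

    spi_device_acquire_bus(spi, portMAX_DELAY);

    for (int i = 0; i < BLOCKS; i++) {
        trans[i] = (spi_transaction_t){
            .length    = 16 * 480 * 10,   /* bits in one 10-row block */
            .tx_buffer = block[i],
        };
        /* Blocks only while the driver's queue (queue_size transactions,
           set in spi_device_interface_config_t) is full. */
        ESP_ERROR_CHECK(spi_device_queue_trans(spi, &trans[i], portMAX_DELAY));
    }

    /* Collect the results: re-synchronize with the hardware before the
       transaction descriptors or pixel buffers are reused. */
    for (int i = 0; i < BLOCKS; i++) {
        spi_transaction_t *done;
        ESP_ERROR_CHECK(spi_device_get_trans_result(spi, &done, portMAX_DELAY));
    }

    spi_device_release_bus(spi);
}
```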
