SPI DMA Delays
Posted: Mon Oct 02, 2017 6:09 pm
Good evening everybody,
I'm currently writing a driver for my TFT display and have run into some problems.
The basic concept behind my code is the following:
I have an array with 12 instances of the "spi_transaction_t" struct as transmission buffers.
When drawing to the display, I first use the first 5 instances to hold the address window information and queue them to be sent via DMA.
After the address window, I use the remaining 7 to hold the color data and also queue them to be sent via DMA.
Each of them points to the same color buffer, allocated with "heap_caps_malloc(18 + 2 + 2*w*2 + 2, MALLOC_CAP_DMA)", which has enough storage for 2 display lines (at the moment I am experimenting with a fillRect function, so it's only one color).
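To make the size expression easier to follow, here is a minimal sketch of the buffer layout as it can be read off the posted code. The constant and function names are hypothetical and not part of the driver; bytes 18..19 appear to be unused padding.

```c
#include <stddef.h>
#include <stdint.h>

/* Offsets read off the posted code; all names here are hypothetical. */
enum {
    ADDR_WINDOW_BYTES = 18, /* bytes 0..17: DC control bytes and coordinates of the 5 address-window transactions */
    COLOR_CTRL_OFFSET = 20, /* bytes 20..21: DC control bytes of the color transactions */
    COLOR_DATA_OFFSET = 22  /* first byte that is actually clocked out as color data */
};

/* Bytes of pixel data in one 2-line transaction (16 bits per pixel). */
static size_t color_data_bytes(uint16_t w) {
    return (size_t)2 * w * 2;
}

/* Same expression as the heap_caps_malloc() argument. */
static size_t fill_buffer_bytes(uint16_t w) {
    return 18 + 2 + color_data_bytes(w) + 2;
}
```

With this reading, the color data ends exactly at the end of the allocation, since COLOR_DATA_OFFSET + color_data_bytes(w) equals fill_buffer_bytes(w).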
When all 12 buffers are in use, the driver waits until everything has been transmitted and then reuses the 12 now-free transmission buffers to transmit any remaining lines or color data.
So far everything works fine, but there are some minor problems with delays while queueing and while waiting for the data to be sent.
In the picture above you can clearly see what I mean.
The smaller delays (like the one with the yellow arrow) happen when the driver queues up transmissions using "spi_device_queue_trans(...)".
You can see that 7 transmissions are queued up before all transmission buffers are in use; the driver then waits until everything has been sent, which results in a larger delay (blue arrow). I guess the larger delay is caused by a task switch to the idle task or something similar.
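The queue-then-wait pattern can be modelled without any hardware to see how often the larger delay occurs. This is only a sketch that mirrors the slot accounting of the fillRect loop; the struct and function names are hypothetical, and nothing here calls the SPI driver.

```c
#include <stdint.h>

/* Hypothetical bookkeeping type: how many transactions were queued and
 * how many times the driver stopped to wait for all of them. */
typedef struct {
    uint32_t transactions;
    uint32_t drains;
} fill_stats_t;

/* Mirrors the slot accounting of the posted fillRect loop for a rectangle
 * that is h lines high: 12 slots, 5 of them taken by the address window
 * in the first batch, 2 lines per color transaction. */
static fill_stats_t count_fill_batches(uint16_t h) {
    const uint32_t max_slots = 12;
    uint32_t free_slots = 7;          /* 12 minus the 5 address-window slots */
    fill_stats_t s = { 5, 0 };        /* address window is already queued */

    while (h > 0) {
        while (free_slots > 0 && h >= 2) {   /* 2-line transactions */
            free_slots--;
            h -= 2;
            s.transactions++;
        }
        if (h == 1 && free_slots > 0) {      /* single leftover line */
            free_slots--;
            h--;
            s.transactions++;
        }
        s.drains++;                   /* wait until everything is sent */
        free_slots = max_slots;
    }
    return s;
}
```

For a height of 128 lines (an assumed value, not taken from the post), this gives 69 transactions and 6 full waits, so the larger delay would show up 6 times per fill.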
The question I have now is:
How can I reduce these delays (especially the larger ones) without having to send more than 2 lines in a single transmission?
It works, but even with 26 MHz SPI you can see the display flicker.
I think this must be because it takes too long to send everything because of these delays.
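As a rough reference point for how much of the total time the delays take up, the ideal transfer time with no gaps at all can be estimated from the pixel count and the SPI clock. This is a back-of-the-envelope sketch with a hypothetical function name; it assumes 16 bits per pixel and ignores the address window commands.

```c
#include <stdint.h>

/* Ideal time in microseconds to clock out a w x h rectangle of 16-bit
 * pixels, ignoring all gaps between transactions. */
static double ideal_fill_time_us(uint16_t w, uint16_t h, uint32_t spi_hz) {
    double bits = (double)w * (double)h * 16.0;
    return bits * 1e6 / (double)spi_hz;
}
```

For an assumed 128 x 128 fill at 26 MHz this comes to roughly 10.1 ms, so anything well above that on the logic analyzer is time spent in the gaps rather than on the bus.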
The code below is not executable on its own, but it contains the functions everything is based on, plus the fillRect function to show how it works.
I know it's not the best yet, but it's still in development.
Hope someone can help me or give me some tips.
If you need executable code to test, I can send my whole test project.
Code:
static void ili9163c_send_cmd_dma(ili9163c_t *display, const uint8_t transNr, uint8_t *buffer, const uint8_t cmd) {
    // first 2 bytes are used to control the DC pin
    buffer[0] = display->pinDC;
    buffer[1] = 0; // 0 for sending a command
    display->dmaBuffer[transNr].tx_data[0] = cmd;
    display->dmaBuffer[transNr].length = 8; // data length, in bits
    display->dmaBuffer[transNr].rxlength = 8;
    display->dmaBuffer[transNr].user = (void*)(buffer);
    display->dmaBuffer[transNr].flags = SPI_TRANS_USE_TXDATA;
    // ESP_ERROR_CHECK instead of assert(), so the call is not compiled out when NDEBUG is defined
    ESP_ERROR_CHECK(spi_device_queue_trans(display->spiDevice, &(display->dmaBuffer[transNr]), portMAX_DELAY));
}
static void ili9163c_send_data_dma(ili9163c_t *display, const uint8_t transNr, uint8_t *data, uint16_t len) {
    // first 2 bytes are used to control the DC pin
    data[0] = display->pinDC;
    data[1] = 1; // 1 for sending data
    display->dmaBuffer[transNr].tx_buffer = (data + 2);
    display->dmaBuffer[transNr].length = len*8; // data length, in bits
    display->dmaBuffer[transNr].rxlength = len*8;
    display->dmaBuffer[transNr].user = (void*)(data);
    display->dmaBuffer[transNr].flags = 0;
    // ESP_ERROR_CHECK instead of assert(), so the call is not compiled out when NDEBUG is defined
    ESP_ERROR_CHECK(spi_device_queue_trans(display->spiDevice, &(display->dmaBuffer[transNr]), portMAX_DELAY));
}
static void ili9163c_setAddr_dma(ili9163c_t *display, const uint8_t transNr, uint8_t *data, uint16_t x0, uint16_t y0, uint16_t x1, uint16_t y1) {
    // 0 byte offset
    ili9163c_send_cmd_dma(display, transNr, data, CMD_CLMADRS); // Column
    if (display->rotation == 0 || display->rotation > 1) {
        uint16_t *data16 = (uint16_t*)(data + 2);
        data16[1] = swapbyte(x0);
        data16[2] = swapbyte(x1);
        // +2 byte offset cause of userdata from command before
        ili9163c_send_data_dma(display, transNr + 1, (data + 2), sizeof(uint32_t));
    } else {
        uint16_t *data16 = (uint16_t*)(data + 2);
        data16[1] = swapbyte(x0 + __OFFSET);
        data16[2] = swapbyte(x1 + __OFFSET);
        // +2 byte offset cause of userdata from command before
        ili9163c_send_data_dma(display, transNr + 1, (data + 2), sizeof(uint32_t));
    }
    // +2 byte additional offset cause of userdata from data command before
    // +4 byte additional offset cause of data from data command before
    ili9163c_send_cmd_dma(display, transNr + 2, (data + (4 + sizeof(uint32_t))), CMD_PGEADRS); // Page
    if (display->rotation == 0) {
        uint16_t *data16 = (uint16_t*)(data + (6 + sizeof(uint32_t)));
        data16[1] = swapbyte(y0 + __OFFSET);
        data16[2] = swapbyte(y1 + __OFFSET);
        // +2 byte offset cause of userdata from command before
        ili9163c_send_data_dma(display, transNr + 3, (data + (6 + sizeof(uint32_t))), sizeof(uint32_t));
    } else {
        uint16_t *data16 = (uint16_t*)(data + (6 + sizeof(uint32_t)));
        data16[1] = swapbyte(y0);
        data16[2] = swapbyte(y1);
        // +2 byte offset cause of userdata from command before
        ili9163c_send_data_dma(display, transNr + 3, (data + (6 + sizeof(uint32_t))), sizeof(uint32_t));
    }
    // +2 byte additional offset cause of userdata from command before
    // +4 byte additional offset cause of data from data command before
    ili9163c_send_cmd_dma(display, transNr + 4, (data + (8 + 2*sizeof(uint32_t))), CMD_RAMWR); // Into RAM
}
void ili9163c_fillRect_dma(ili9163c_t *display, uint16_t x0, uint16_t y0, uint16_t w, uint16_t h, uint16_t color) {
    // allocate DMA-capable memory
    uint8_t *databuffer = (uint8_t*)heap_caps_malloc(18 + 2 + 2*w*2 + 2, MALLOC_CAP_DMA);
    if (databuffer == NULL) return; // allocation failed
    uint16_t *data16 = (uint16_t*)(databuffer + 18 + 2);
    color = swapbyte(color);
    // data16[0] is later overwritten by the 2 DC control bytes, so 2*w + 1 entries are filled
    for (int i = 0; i <= w*2; i++) data16[i] = color;
    // 5 transactions queued here
    ili9163c_setAddr_dma(display, 0, databuffer, x0, y0, x0 + w - 1, y0 + h - 1); // go home
    // 7 left
    uint8_t maxSlots = 12;
    uint8_t remainingSlots = 7;
    while (h > 0) {
        // queue 2-line transactions while slots are free
        while (remainingSlots > 0 && h >= 2) {
            ili9163c_send_data_dma(display, (maxSlots - remainingSlots), (databuffer + 20), 2*w*2);
            remainingSlots--; h -= 2;
        }
        // if there's a single line remaining
        if (h == 1 && remainingSlots > 0) {
            ili9163c_send_data_dma(display, (maxSlots - remainingSlots), (databuffer + 20), w*2);
            remainingSlots--; h--;
        }
        // wait for the results
        spi_transaction_t *rtrans;
        // ESP_ERROR_CHECK instead of assert(), so the call is not compiled out when NDEBUG is defined
        for (int i = 0; i < (maxSlots - remainingSlots); i++) ESP_ERROR_CHECK(spi_device_get_trans_result(display->spiDevice, &rtrans, portMAX_DELAY));
        remainingSlots = maxSlots;
    }
    free(databuffer);
}