The right way to check SPI DMA completion

rickyzhang
Posts: 6
Joined: Thu Jul 26, 2018 12:45 am

Postby rickyzhang » Thu Jul 26, 2018 1:03 am

I am experimenting with a dual frame buffer driver on the ESP32, which drives an ILI9341 LCD controller over SPI. I need to synchronize on SPI DMA completion before switching frame buffers.

There seem to be two ways to do this.

1. After calling spi_device_queue_trans a number of times to queue DMA transactions, call spi_device_get_trans_result the same number of times, with the maximum number of ticks to wait.

2. In struct spi_device_interface_config_t, set a callback function in the post_cb member.

I implemented method 1, but there are scars on the screen (see video: https://www.youtube.com/watch?v=ftyc87KdjmI). It seems that spi_device_get_trans_result returns before the DMA completes, so my code overwrites the buffer that is still being transferred by DMA.

I wonder if anyone could share their experience with SPI DMA. What's the right way to check SPI DMA completion?

ESP_Sprite
Posts: 9723
Joined: Thu Nov 26, 2015 4:08 am

Re: The right way to check SPI DMA completion

Postby ESP_Sprite » Thu Jul 26, 2018 4:09 am

That is strange; spi_device_get_trans_result triggers on the interrupt that specifically implies that the DMA operation is complete, so your plan should work just fine. Are you sure you are actually getting the result from the transaction you think you queued? If there's a transaction you queued but never got the result of, you'll always get a result from the previous transaction as soon as that one is done, leading to what you're seeing now.
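To illustrate the off-by-one situation described above, here is a small host-side model in plain C (it is not ESP-IDF code). The names fake_spi_t, fake_queue_trans, fake_get_trans_result and result_after_mismatch are hypothetical stand-ins invented for this sketch; the only assumption is that results come back in the order the transactions were queued.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PENDING 8

/* Hypothetical stand-in for the driver's result queue: a plain FIFO of
 * transaction ids. */
typedef struct {
    int ids[MAX_PENDING];
    size_t head;   /* index of the oldest unclaimed result */
    size_t count;  /* number of unclaimed results */
} fake_spi_t;

/* Models spi_device_queue_trans: the result of the transaction waits in
 * the FIFO until someone fetches it. Returns -1 if the FIFO is full. */
static int fake_queue_trans(fake_spi_t *spi, int id)
{
    if (spi->count == MAX_PENDING) {
        return -1;
    }
    spi->ids[(spi->head + spi->count) % MAX_PENDING] = id;
    spi->count++;
    return 0;
}

/* Models spi_device_get_trans_result: always hands back the OLDEST
 * result, which is not necessarily the most recently queued one. */
static int fake_get_trans_result(fake_spi_t *spi)
{
    assert(spi->count > 0);
    int id = spi->ids[spi->head];
    spi->head = (spi->head + 1) % MAX_PENDING;
    spi->count--;
    return id;
}

/* Queues transactions 0..queued-1, fetches only `fetched` results, then
 * queues one more transaction (id == queued) and fetches once.
 * Returns the id that this last fetch yields. */
static int result_after_mismatch(int queued, int fetched)
{
    fake_spi_t spi = {0};
    for (int i = 0; i < queued; i++) {
        fake_queue_trans(&spi, i);
    }
    for (int i = 0; i < fetched; i++) {
        fake_get_trans_result(&spi);
    }
    fake_queue_trans(&spi, queued);
    return fake_get_trans_result(&spi);
}
```

With matched calls, result_after_mismatch(3, 3) returns 3, the transaction that was just queued. With one result left uncollected, result_after_mismatch(3, 2) returns 2: the caller gets a stale result while transaction 3 may still be in flight.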

rickyzhang
Posts: 6
Joined: Thu Jul 26, 2018 12:45 am

Re: The right way to check SPI DMA completion

Postby rickyzhang » Thu Jul 26, 2018 1:20 pm

That's a good point.

My experiment is built on top of someone else's code. Before I added mine, it would crash when invoking spi_device_get_trans_result because the calls did not match those to spi_device_queue_trans. The crash happens in the free call within spi_device_get_trans_result https://github.com/espressif/esp-idf/bl ... ter.c#L866

Code: Select all

esp_err_t SPI_MASTER_ATTR spi_device_get_trans_result(spi_device_handle_t handle, spi_transaction_t **trans_desc, TickType_t ticks_to_wait)
{
    BaseType_t r;
    spi_trans_priv trans_buf;

    SPI_CHECK(handle!=NULL, "invalid dev handle", ESP_ERR_INVALID_ARG);
    r=xQueueReceive(handle->ret_queue, (void*)&trans_buf, ticks_to_wait);
    if (!r) {
        // The memory occupied by rx and tx DMA buffer destroyed only when receiving from the queue (transaction finished).
        // If timeout, wait and retry.
        // Every on-flight transaction request occupies internal memory as DMA buffer if needed.
        return ESP_ERR_TIMEOUT;
    }

    (*trans_desc) = trans_buf.trans;

    if ( (void*)trans_buf.buffer_to_send != &(*trans_desc)->tx_data[0] && trans_buf.buffer_to_send != (*trans_desc)->tx_buffer ) {
        free( trans_buf.buffer_to_send );
    }

    //copy data from temporary DMA-capable buffer back to IRAM buffer and free the temporary one.
    if ( (void*)trans_buf.buffer_to_rcv != &(*trans_desc)->rx_data[0] && trans_buf.buffer_to_rcv != (*trans_desc)->rx_buffer ) {
        if ( (*trans_desc)->flags & SPI_TRANS_USE_RXDATA ) {
            memcpy( (uint8_t*)&(*trans_desc)->rx_data[0], trans_buf.buffer_to_rcv, ((*trans_desc)->rxlength+7)/8 );
        } else {
            memcpy( (*trans_desc)->rx_buffer, trans_buf.buffer_to_rcv, ((*trans_desc)->rxlength+7)/8 );
        }
        free( trans_buf.buffer_to_rcv );
    }

    return ESP_OK;
}
In the case of a mismatch between spi_device_queue_trans and spi_device_get_trans_result, the condition (void*)trans_buf.buffer_to_send != &(*trans_desc)->tx_data[0] && trans_buf.buffer_to_send != (*trans_desc)->tx_buffer is true, because the two frame buffers are different. Freeing the buffer somehow caused heap corruption.

After I fixed the mismatch, the heap corruption went away.
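One way to read the condition quoted above, sketched as a host-side model in plain C: the driver remembers the buffer it actually sent and later compares it with the descriptor's current fields. The types fake_trans_t and fake_trans_priv_t are simplified, hypothetical stand-ins for spi_transaction_t and spi_trans_priv, and the scenario shown (a descriptor repointed at another frame buffer while still in flight) is only an assumption about how the condition could become true, not a confirmed diagnosis.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified, hypothetical stand-in for spi_transaction_t. */
typedef struct {
    const void *tx_buffer;
    uint8_t tx_data[4];
} fake_trans_t;

/* Simplified, hypothetical stand-in for spi_trans_priv. */
typedef struct {
    fake_trans_t *trans;
    const void *buffer_to_send;
} fake_trans_priv_t;

/* Snapshot taken at queue time, assuming the user buffer could be sent
 * directly (no temporary DMA-capable copy was allocated). */
static fake_trans_priv_t fake_queue(fake_trans_t *t)
{
    fake_trans_priv_t p = { .trans = t, .buffer_to_send = t->tx_buffer };
    return p;
}

/* Same shape as the check in the quoted function: the buffer is treated
 * as a temporary copy if it matches neither tx_data nor tx_buffer. */
static bool driver_would_free_tx(const fake_trans_priv_t *p)
{
    return p->buffer_to_send != (const void *)&p->trans->tx_data[0]
        && p->buffer_to_send != p->trans->tx_buffer;
}

/* Queues one transaction on frame0 and optionally repoints the
 * descriptor at frame1 before the result is collected. */
static bool free_after_descriptor_reuse(bool repoint_while_in_flight)
{
    static uint8_t frame0[16], frame1[16];
    fake_trans_t t = { .tx_buffer = frame0 };
    fake_trans_priv_t p = fake_queue(&t);
    if (repoint_while_in_flight) {
        t.tx_buffer = frame1; /* descriptor modified before get_result */
    }
    return driver_would_free_tx(&p);
}
```

In this model the check stays false as long as the descriptor is left alone until its result has been fetched, and turns true once tx_buffer is changed in between, at which point a buffer that never came from malloc would be passed to free.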

In summary, I'm pretty sure there should be no more mismatch between spi_device_queue_trans and spi_device_get_trans_result.

Now I wonder how you can be so sure that the DMA transmission completes before spi_device_get_trans_result returns. Perhaps I should poke around the FreeRTOS SPI code.

ESP_Sprite
Posts: 9723
Joined: Thu Nov 26, 2015 4:08 am

Re: The right way to check SPI DMA completion

Postby ESP_Sprite » Fri Jul 27, 2018 1:27 am

I'm mostly sure because of experience: I've used the SPI thing for displays often enough, and I never ran into what you're seeing right now :) If the hardware had a bug like that, I wouldn't expect you to be the first one to notice. Note that this doesn't mean there can't be a coding or hardware bug after all; do you perhaps have some source code we can look at to reproduce the issue?

rickyzhang
Posts: 6
Joined: Thu Jul 26, 2018 12:45 am

Re: The right way to check SPI DMA completion

Postby rickyzhang » Fri Jul 27, 2018 11:21 am

I shared my work-in-progress code in branch wip-improve-interlacing: https://github.com/rickyzhang82/go-play ... nterlacing. I am experimenting with dual buffering and interlacing for the ILI9341 over SPI at 40 MHz. The board is the odroid-go https://www.hardkernel.com/main/product ... 2875062626

Core 0 runs the NES emulator Nofrendo, while core 1 runs videotask exclusively to do frame interlacing, scaling and DMA to the LCD controller.

The main video driver is here: https://github.com/rickyzhang82/go-play ... _display.c. The legacy code used send_reset_drawing, send_continue_wait and send_continue_line. I implemented two new functions to fix the mismatch between spi_device_get_trans_result and spi_device_queue_trans. The two new functions, send_one_line_blocking and send_one_line_wait, are used exclusively in ili9341_write_frame_nes. I'm pretty sure that spi_device_get_trans_result and spi_device_queue_trans are matched in my new implementation now. Otherwise, I would get heap corruption.

https://github.com/rickyzhang82/go-play ... #L242-L250: this sends the frame buffer from core 0 to core 1. There is one and only one slot available in the vidQueue. Thus, xQueueSend should block until DMA has finished (i.e. until core 1 finishes xQueueReceive). After custom_blit returns, core 0 swaps the two frame buffers (https://github.com/rickyzhang82/go-play ... #L357-L370). I found that if I make a copy of the frame buffer bmp->line[0] here and send the copy to core 1, I no longer see those scars in my video (see the left side of the code diff at https://github.com/rickyzhang82/go-play ... b75a8639b5). Thus, I suspect that the DMA transmission synchronization is not working.

https://github.com/rickyzhang82/go-play ... #L254-L273: ili9341_write_frame_nes makes the LCD DMA driver calls. Ideally, it should return only when the DMA transmission completes, but it doesn't seem to work as I expect.

ESP_Sprite
Posts: 9723
Joined: Thu Nov 26, 2015 4:08 am

Re: The right way to check SPI DMA completion

Postby ESP_Sprite » Sat Jul 28, 2018 3:52 am

Note that ili9341_write_frame_nes doesn't send the entire framebuffer; it sends lines instead. If the DMA completion didn't work correctly, the thing you'd be seeing is that the next line is already being written while the line[]-buffer of that still is read out by DMA. You'd be seeing glitches that would show up as a line being partially overwritten with the previous line, without any offset. That's not what I'm seeing in your YT video.

If you want to be 100% sure your transmissions aren't desynchronized, you can check this directly instead of relying on circumstantial evidence:

Code: Select all

esp_err_t err = spi_device_get_trans_result(spi, &trans_desc, portMAX_DELAY);
assert(trans_desc == &trans[i]);
Also:
"There is one and only one slot available in the vidQueue. Thus, xQueueSend should block until DMA has finished (i.e. core 1 finish xQueueReceive)."
Not sure if you actually meant this, but one slot being available in the vidQueue means the xQueueSend will block until DMA *on the previous frame* has finished. This means given an idle situation, after you do xQueueSend for the first time, it will return immediately and the other core will start munching on the bitmap it got sent. If you now happily allow the Nofrendo code to calculate the next frame into the same bmp buffer (as you seem to be doing here), it's going to overwrite the data core 1 is crunching on. You can solve this by having two frame buffers; one for Core 1 to process into LCD data and one for Core 0 to put the next frame in.
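The hand-off described above can be sketched as a single-threaded host model in plain C. Everything here is hypothetical and invented for illustration: one_slot_queue_t, queue_try_send and queue_receive only mimic a FreeRTOS queue of length one (queue_try_send returns false where a real xQueueSend would block), and frame_a and frame_b stand for the two frame buffers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a queue with exactly one slot. */
typedef struct {
    const uint8_t *slot;
    bool full;
} one_slot_queue_t;

/* Models xQueueSend: returns false where the real call would block. */
static bool queue_try_send(one_slot_queue_t *q, const uint8_t *frame)
{
    if (q->full) {
        return false;
    }
    q->slot = frame;
    q->full = true;
    return true;
}

/* Models xQueueReceive: takes the frame and frees the slot.
 * Returns NULL if nothing is waiting. */
static const uint8_t *queue_receive(one_slot_queue_t *q)
{
    if (!q->full) {
        return NULL;
    }
    q->full = false;
    return q->slot;
}

/* The two frame buffers of the double-buffering scheme. */
static uint8_t frame_a[4], frame_b[4];

/* After handing `sent` to the consumer, the producer must draw the next
 * frame into the other buffer. */
static uint8_t *next_draw_buffer(const uint8_t *sent)
{
    return (sent == frame_a) ? frame_b : frame_a;
}
```

Starting from an empty queue, the first queue_try_send(&q, frame_a) succeeds immediately even though the consumer has not touched frame_a yet, so the producer has to continue in next_draw_buffer(frame_a), which is frame_b. A second send fails (blocks) until the consumer has called queue_receive. Note that in this model the slot is freed when the consumer takes the frame, not when it has finished processing it; which of the two applies in real code depends on when the consumer removes the item from the queue.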

rickyzhang
Posts: 6
Joined: Thu Jul 26, 2018 12:45 am

Re: The right way to check SPI DMA completion

Postby rickyzhang » Sat Jul 28, 2018 2:51 pm

The implementation has two dual buffer techniques:

1. The first dual buffer technique is for the emulator on core 0 (https://github.com/rickyzhang82/go-play ... #L357-L370). As you describe in your last paragraph, the first time the primary buffer is sent to core 1 for DMA, the call returns immediately. The emulator then swaps to the back buffer and goes on drawing. After the first time, the primary buffer and the back buffer swap only after custom_blit finishes, and custom_blit finishes only when xQueueSend stops blocking. Ideally, xQueueSend blocks until the DMA transmission finishes. So at any time, core 1 should **never** see core 0 overwriting the frame buffer it is sending by DMA (if the assumption that xQueueSend blocks until DMA transmission finishes is true).

2. The second dual buffer technique is for the LCD driver on core 1 (https://github.com/rickyzhang82/go-play ... 1085-L1088). There are two line buffers for DMA. They are swapped only after the DMA of one line has finished.

Regarding your assertion

Code: Select all

assert(trans_desc == &trans[i]);
I think it only guarantees that the transaction matches between queueing and getting the trans result. It doesn't guarantee that DMA has finished when spi_device_get_trans_result returns. Your argument in the first paragraph, quoted below, makes sense to me:
"Note that ili9341_write_frame_nes doesn't send the entire framebuffer; it sends lines instead. If the DMA completion didn't work correctly, the thing you'd be seeing is that the next line is already being written while the line[]-buffer of that still is read out by DMA. You'd be seeing glitches that would show up as a line being partially overwritten with the previous line, without any offset. That's not what I'm seeing in your YT video."
However, I want to point out that I'm doing interlacing, where I send only the odd or the even lines in each frame. The line buffer itself uses the dual buffer technique as well. If DMA has not finished when the trans result is returned, I'd see every other line get messed up. This seems to match what I see in the video.

I captured the new video. Please see here https://www.youtube.com/watch?v=CJHPjII7SpI

ESP_Sprite
Posts: 9723
Joined: Thu Nov 26, 2015 4:08 am

Re: The right way to check SPI DMA completion

Postby ESP_Sprite » Sun Jul 29, 2018 1:55 am

Yes, the assert is to be 100% sure the queue/get_result aren't offset; I still want to make 100% sure this is okay. Wrt the glitches you see: the longer I stare at it, the less sense it makes... what I see is that the offending lines start at a somewhat random address earlier in the framebuffer, but more than 1 or 2 lines earlier (see e.g. the clouds in your video); as the DMA thing does its work line-by-line, I still do not think that is the issue.

It makes me wonder if you're not running into a hardware issue; LCD_SPI_CLOCK_RATE is set awfully fast for the Odroid. Can you see what happens when you lower that a bit?

rickyzhang
Posts: 6
Joined: Thu Jul 26, 2018 12:45 am

Re: The right way to check SPI DMA completion

Postby rickyzhang » Wed Aug 01, 2018 11:43 pm

I added the assertions below, but none of them fails:

Code: Select all

        assert(trans_desc == &trans[x]);
        if (1 == x || 3 == x)
        {
            assert(trans_desc->tx_data[0] == trans[x].tx_data[0] &&
                   trans_desc->tx_data[1] == trans[x].tx_data[1] &&
                   trans_desc->tx_data[2] == trans[x].tx_data[2] &&
                   trans_desc->tx_data[3] == trans[x].tx_data[3]);
        }


https://github.com/rickyzhang82/go-play ... #L310-L317

Regarding the 40 MHz SPI clock rate, I don't see why a high clock rate would cause an issue. The clock rate seems necessary given the required bit rate:

320 (width) x 240 (height) x 2 (bytes per pixel) x 8 (bits) x 24 (fps) = 29,491,200 bps ≈ 29.5 Mbps.
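The arithmetic above can be checked with a few lines of C. The function name required_bits_per_second is made up for this sketch, and the calculation counts pixel data only; command and addressing overhead on the SPI bus is ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Raw pixel bit rate for a display of the given size, colour depth and
 * refresh rate. Hypothetical helper; ignores any protocol overhead. */
static uint64_t required_bits_per_second(uint32_t width, uint32_t height,
                                         uint32_t bits_per_pixel,
                                         uint32_t fps)
{
    return (uint64_t)width * height * bits_per_pixel * fps;
}
```

required_bits_per_second(320, 240, 16, 24) gives 29,491,200, matching the figure above. If only every other line is sent per frame, as with interlacing, the same formula with a height of 120 gives 14,745,600.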

ESP_Sprite
Posts: 9723
Joined: Thu Nov 26, 2015 4:08 am

Re: The right way to check SPI DMA completion

Postby ESP_Sprite » Thu Aug 02, 2018 1:51 am

Sure. However, the ILI9341 controller they use is not specified for 40 MHz: https://cdn-shop.adafruit.com/datasheets/ILI9341.pdf page 231 specifies a minimum clock cycle duration of 100 ns for write operations, which converts to a maximum speed of 10 MHz. Now, it's widely known this controller also works when clocked much higher in most circumstances, but the manufacturer obviously doesn't guarantee anything there. It may be that your changed approach of uploading a line at a time triggers a slower path that cannot handle the 40 MHz, giving you the glitches, hence my suggestion to try it at 10 MHz instead.
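As a quick check of the conversion, here is a minimal C sketch. The helper names period_ns_to_hz and max_full_frames_per_second are hypothetical, and the frame-rate estimate assumes a 320 x 240 frame at 16 bits per pixel with no protocol overhead.

```c
#include <assert.h>
#include <stdint.h>

/* Converts a minimum clock cycle time in nanoseconds to the matching
 * maximum clock frequency in Hz. Returns 0 for a period of 0. */
static uint64_t period_ns_to_hz(uint64_t period_ns)
{
    if (period_ns == 0) {
        return 0;
    }
    return 1000000000ULL / period_ns;
}

/* Upper bound on full frames per second at a given SPI clock, counting
 * pixel bits only (one bit per clock cycle). */
static double max_full_frames_per_second(uint64_t clock_hz, uint32_t width,
                                         uint32_t height,
                                         uint32_t bits_per_pixel)
{
    double bits_per_frame = (double)width * height * bits_per_pixel;
    return (double)clock_hz / bits_per_frame;
}
```

period_ns_to_hz(100) is 10,000,000 Hz, the 10 MHz mentioned above, while a 40 MHz clock corresponds to a 25 ns cycle. At 10 MHz, max_full_frames_per_second(10000000, 320, 240, 16) comes out a little above 8 frames per second, so a test at that clock rate would run visibly slower than the 24 fps target but would still show whether the glitches disappear.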
