Reduce time between two consecutive SPI transfers

Postby wjxway » Wed Aug 05, 2020 3:35 am

Hi, I'm trying to drive an external DAC over SPI from an ESP32, but I cannot reach the speed I need (>1M transfers/sec, 16 bits per transfer). The bottleneck is not the SPI clock itself, which can be set as high as 40 MHz, but the delay between two consecutive SPI transfers.

Sample code (Arduino API):


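#include <SPI.h>

void setup()
{
    Serial.begin(115200);
    SPI.begin();
    // 40 MHz clock, MSB first, SPI mode 0; the transaction stays open for the whole run.
    SPI.beginTransaction(SPISettings(40000000, MSBFIRST, SPI_MODE0));
}

void loop()
{
    unsigned long tstart = micros(), count = 0;

    // Count how many 16-bit transfers complete in one second.
    while (micros() - tstart < 1000000)
    {
        // Batch 9 transfers per timing check so the micros() call adds little overhead.
        for (byte i = 0; i < 9; i++)
            SPI.transfer16(0b0101010110101010);
        count += 9;
    }

    Serial.println(count);
}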

The output shows that the ESP32 manages approximately 694,800 transfers/sec, still well short of my 1M transfers/sec requirement. In theory, on a 40 MHz SPI bus each 16-bit transfer takes only 16/40 MHz = 0.4 us, which should allow more than 2M transfers/sec, so the difference must come from the delay between consecutive SPI transfers.
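
To put rough numbers on that gap, here is a small sketch of the arithmetic. It is plain C++ meant to run on a PC, not on the ESP32, and the helper names are made up for illustration. It only uses the figures quoted above (40 MHz clock, 16 bits, about 694,800 transfers/sec).

```cpp
#include <cassert>

// Time one transfer spends on the wire, in microseconds.
// clock_mhz: SPI clock in MHz; bits: bits per transfer.
constexpr double wireTimeUs(double clock_mhz, int bits) {
    return bits / clock_mhz;
}

// Average time per transfer implied by a measured rate (transfers/sec).
constexpr double measuredTimeUs(double transfers_per_sec) {
    return 1e6 / transfers_per_sec;
}

// Per-transfer software overhead = measured time minus wire time.
inline double overheadUs(double transfers_per_sec, double clock_mhz, int bits) {
    assert(transfers_per_sec > 0 && clock_mhz > 0 && bits > 0);
    return measuredTimeUs(transfers_per_sec) - wireTimeUs(clock_mhz, bits);
}
```

Calling overheadUs(694800.0, 40.0, 16) gives a little over 1 us, on top of the 0.4 us of wire time. Reaching 1M transfers/sec means fitting everything into 1 us per transfer, so the overhead would have to drop to about 0.6 us.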

I know these delays could be avoided by sending more data in a single transfer command. However, the DAC's interface requires me to pulse SYNC (set it high, then low) before each 16-bit transfer, which is not possible within a single SPI transfer command.


I have searched for solutions online and found two related problems:

The first, https://github.com/espressif/esp-idf/issues/368, suggests disabling all the MUTEX calls in esp32-hal-spi.c/h. That seems reasonable, but I see no effect, probably because the compiler had already optimized that code away.

The second, https://esp32.com/viewtopic.php?t=1383, suggests using RMT instead of SPI, but that approach suffers from timing issues.

I think there is still room for improvement, since I only need to output data and I access SPI from a single thread, so the solution does not need to be thread-safe.


So my question is: is there any way to speed up the SPI transfers?

Answers using either Arduino or ESP-IDF are welcome. Thanks a lot in advance!

Postby ESP_Sprite » Wed Aug 05, 2020 6:15 am

It's a hard problem in general: since software has to set up the start of each separate 16-bit transaction, there will be a lot of overhead. The only way I can imagine to do it in software would be to abuse the I2S hardware. Theoretically, you can put it into 16-bit mode and use the data output to pulse the SYNC line. That would involve poking the hardware registers directly (or modifying the existing I2S driver), however, as the current driver is not made to do this.

Postby wjxway » Wed Aug 05, 2020 7:30 am

I'm not familiar with the I2S hardware. Will I2S have less setup overhead than SPI? If so, I can take a look.

BTW, is it possible to reduce the time spent setting up each SPI transfer, for example by skipping some checks or disabling the SPI input? (I only need the output signal, so perhaps the input setup can be skipped.) I only need to cut the setup time by 30% or so. Do you think the SPI software can be tweaked to achieve this?

Thanks a lot!

Postby ESP_Sprite » Wed Aug 05, 2020 8:34 am

Whoops, I initially imagined you wanted to set up an ADC. A DAC may be even easier. Effectively, I2S allows you to set up a continuous SPI-like stream of data, read from a memory buffer without any CPU intervention; the only thing the CPU needs to do is make sure the memory buffer gets 'topped up' with new data once in a while. I suggest you look at the I2S documentation in the ESP32 technical reference manual and see whether that hardware can generate the waveforms your DAC needs.

Postby wjxway » Wed Aug 05, 2020 8:58 am

Yes, it's a DAC :D I'm using the DAC108S085 made by TI. The waveform needs to look something like this:

Image

Though it does not support the I2S protocol natively, the timing diagram is quite standard and the timing requirements are loose. I will take a look at I2S, thanks a lot!

I will come back and reply if I manage to get it working!

Postby wjxway » Wed Aug 05, 2020 11:30 am

Hi, I've tried the I2S API, but I still don't fully understand it.

I'm trying to use the L-R channel signal as the DAC's SYNC signal, but I'm not sure whether this will work (my oscilloscope is still on the way). A more important problem, however, is I2S performance.

When sample_rate <= 300000, it makes sample_rate transfers/sec, great! But when sample_rate = 500000, it only makes 250,000 transfers/sec, and performance gets even worse at higher sample rates. Am I hitting the maximum clock frequency of the I2S interface? I couldn't find anything in the documentation about the maximum I2S clock frequency, though.


#include "Arduino.h"
#include "driver/i2s.h" 

static const i2s_port_t i2s_num = I2S_NUM_0;

static const i2s_config_t i2s_config ={
    .mode = (i2s_mode_t)(I2S_MODE_MASTER | I2S_MODE_TX),
    .sample_rate = 100000,
    .bits_per_sample = I2S_BITS_PER_SAMPLE_16BIT,
    .channel_format = I2S_CHANNEL_FMT_RIGHT_LEFT,
    .communication_format = (i2s_comm_format_t)(I2S_COMM_FORMAT_I2S | I2S_COMM_FORMAT_I2S_MSB),
    .intr_alloc_flags = ESP_INTR_FLAG_LEVEL1,
    .dma_buf_count = 8,
    .dma_buf_len = 1024,
    .use_apll=0,
    .tx_desc_auto_clear= true,
    .fixed_mclk=-1
};

static const i2s_pin_config_t pin_config ={
    .bck_io_num = 27,
    .ws_io_num = 26,
    .data_out_num = 25,
    .data_in_num = I2S_PIN_NO_CHANGE
};

void setup()
{
    Serial.begin(115200);
    i2s_driver_install(i2s_num, &i2s_config, 0, NULL);
    i2s_set_pin(i2s_num, &pin_config);
}

void loop()
{
    // 18 16-bit words = 9 stereo frames; the even-indexed words are zero.
    static uint16_t Value16Bit[18]={ 0, 0b010, 0, 0b010101, 0, 0b10101010, 0, 0b010, 0, 0b010101, 0, 0b10101010, 0, 0b010, 0, 0b010101, 0, 0b10101010 };

    size_t BytesWritten;

    unsigned long tt = micros();
    unsigned long count = 0;

    // Count how many frames get queued in one second.
    while (micros() - tt < 1000000)
    {
        // Blocks until all 36 bytes have been copied into the DMA buffers.
        i2s_write(i2s_num, Value16Bit, sizeof(Value16Bit), &BytesWritten, portMAX_DELAY);
        count += 9;
    }
    Serial.println(count);
}

Postby ESP_Sprite » Wed Aug 05, 2020 3:02 pm

You may want to set use_apll to 1 and also use a larger dma_buf_len. It could be that the ISR gets called too often, or that the I2S driver has trouble generating the frequencies you need.
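
A rough way to judge the "ISR gets called too often" theory is to estimate the interrupt rate from the buffer settings. This is a sketch in plain C++ for a PC, with hypothetical helper names. It assumes that dma_buf_len counts frames and that one interrupt fires per completed DMA buffer, neither of which has been verified in this thread.

```cpp
#include <cassert>

// Interrupts per second, ASSUMING one interrupt per completed DMA buffer
// and dma_buf_len measured in frames.
inline double isrRateHz(double frames_per_sec, int dma_buf_len) {
    assert(dma_buf_len > 0);
    return frames_per_sec / dma_buf_len;
}

// How long the whole DMA ring lasts before it runs dry, in milliseconds.
inline double ringDurationMs(double frames_per_sec, int dma_buf_count, int dma_buf_len) {
    assert(frames_per_sec > 0);
    return 1000.0 * dma_buf_count * dma_buf_len / frames_per_sec;
}
```

With dma_buf_count = 8 and dma_buf_len = 1024 as posted, 1,000,000 frames/sec works out to roughly 977 interrupts/sec and about 8.2 ms of buffered data under those assumptions.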

Postby wjxway » Wed Aug 05, 2020 5:19 pm

Changing .use_apll to 1 did not improve performance.
Changing .dma_buf_len to 2048 or larger results in SW_CPU_RESET:

rst:0xc (SW_CPU_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)

:(

However, setting .fixed_mclk=40000000 does improve performance hugely, reaching the theoretical limit of 1,250,000 transfers/sec. I'm not sure yet whether the data is clocked out properly, though (I will check in a few days, once my oscilloscope arrives).
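
For reference, the 1,250,000 figure matches a 40 MHz bit clock carrying 32-bit stereo frames (2 x 16 bits). The sketch below is plain C++ for a PC with hypothetical helper names. It assumes the I2S bit clock really does end up at 40 MHz, which has not been confirmed on a scope here.

```cpp
#include <cassert>
#include <cstdint>

// Bit clock (Hz) needed to shift out frames_per_sec frames of
// `channels` slots, each bits_per_sample wide.
constexpr std::uint64_t requiredBckHz(std::uint64_t frames_per_sec, int bits_per_sample, int channels) {
    return frames_per_sec * bits_per_sample * channels;
}

// Frames per second that a given bit clock can carry.
inline std::uint64_t framesPerSec(std::uint64_t bck_hz, int bits_per_sample, int channels) {
    assert(bits_per_sample > 0 && channels > 0);
    return bck_hz / (static_cast<std::uint64_t>(bits_per_sample) * channels);
}
```

By the same arithmetic, sample_rate = 500000 needs a 16 MHz bit clock and 1,000,000 frames/sec needs 32 MHz.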

BTW, a new idea: would it be possible to use the QSPI interface for this? In my real application I prepare the data for 9 transfers (16 bits each) at the same time, so sending these 18 bytes in a single transmission may save some time. With two data lines, I could use one as the SYNC signal and the other for the actual data. The catch is that QSPI may be inherently much slower than SPI, in which case this won't work.
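
A minimal sketch of how the buffer for such a two-line scheme could be packed, written as plain C++ that runs on a PC. The function name and the SYNC pattern are hypothetical, and it assumes the peripheral shifts out two bits per clock, most significant pair first. Which physical line carries which bit of each pair would have to be checked against the ESP32 documentation, and the sketch does not model the SYNC high time the DAC needs between frames.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Interleave a 16-bit data word with a 16-bit SYNC pattern, MSB first.
// Each output bit pair is (sync_bit, data_bit). The mapping of the pair
// onto the two physical lines is an ASSUMPTION, not a verified fact.
inline std::array<std::uint8_t, 4> interleaveSyncData(std::uint16_t data, std::uint16_t sync) {
    std::array<std::uint8_t, 4> out{};
    for (int i = 15; i >= 0; --i) {
        const unsigned pair = (((sync >> i) & 1u) << 1) | ((data >> i) & 1u);
        const int pos = 15 - i;               // 0 = first pair on the wire
        const int byte_index = pos / 4;       // four pairs per byte
        const int shift = 6 - 2 * (pos % 4);  // first pair sits in bits 7..6
        assert(byte_index >= 0 && byte_index < 4);
        out[byte_index] = static_cast<std::uint8_t>(out[byte_index] | (pair << shift));
    }
    return out;
}
```

For example, interleaveSyncData(0xFFFF, 0x0000) yields four bytes of 0x55: the data bit is set and the sync bit is clear in every pair.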

Postby PeterR » Thu Aug 06, 2020 12:17 am

Hey,
Not sure about Arduino, but DMA under IDF adds a 20 us+ (ish) hit.
A key point is the construction of the SPI transaction. In IDF you can create this 'command' on the fly, or you can create a command once and then reuse it. The performance difference becomes very important as frequency increases...
I am not sure about the Arduino interface, but my IDF experience is that the SPI library is quite laggy. I have 3 transactions at peak 250 kHz and that (latency-wise) pushes a core to its limits. So I find it hard to sustain 750 kHz (few SPI bytes) SPI transactions using IDF. I live with this ATM and have not yet made the obvious improvement of saving/reusing a transfer object. The SPI handler is short and sweet, but latency still bites.
Adding DMA for short (<8 byte) transfers is not productive: too much added setup time in the ESP library.
So... with a quick glance at your code, I do not see any way for your library to return an 'SPI transaction' object. This suggests that the 'SPI transaction' object is being created on each request. ESP-IDF allows you to create the transaction and then reuse it. This does shave 10-20 us from each transaction.
It's not a great answer, but I would suggest IDF, creating your SPI transaction at startup. Forget DMA and (obviously) set up processing/interrupts on core 1. If you use Ethernet then you will be hit with 2 ms+ random latency. I would be interested in your experience there, as that is another issue I live with.
In short, ESP software library latency :(
PS
Abusing I2S is an interesting approach. I think, though, that you might just achieve 1.25 MHz under IDF with prepared requests, IRAM and proper core discipline. If you use Ethernet then you are screwed; you will get <2 ms random hits. Low frequency, so I guess it depends on noise immunity.
& I also believe that IDF CAN should be fixed.

Postby wjxway » Sun Aug 09, 2020 6:14 pm

It is unlikely that the SPI transaction object is created on every transaction if that would add ~10 us of latency: the basic SPI interface on Arduino can sustain a stable 700k transfers/sec at 16 bits per transfer. I am aware of the core 0/core 1 issue, so in my real application no tasks other than the SPI-related task run on core 1.
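
The reasoning can be checked with a quick bound. This is a sketch in plain C++ for a PC with a hypothetical helper name, treating each transfer as a fixed software cost plus 0.4 us on the wire (16 bits at 40 MHz).

```cpp
#include <cassert>

// Upper bound on transfers/sec when each transfer costs a fixed
// software overhead plus its time on the wire (both in microseconds).
inline double maxTransfersPerSec(double overhead_us, double wire_us) {
    assert(overhead_us >= 0 && wire_us > 0);
    return 1e6 / (overhead_us + wire_us);
}
```

maxTransfersPerSec(10.0, 0.4) comes out at roughly 96,000 transfers/sec, far below the roughly 700,000 measured, so a fixed 10 us cost per transfer does not fit the Arduino numbers.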
