SPI continuous transmission with DMA with circular linked list

Posted: Fri Oct 18, 2019 9:03 am
by electronsquare_chris
I have two shift registers that I update via SPI. I send two bytes over SPI and latch the data into the shift registers with the CS line. They need to be updated continuously at 300 Hz or higher, because I use them to strobe a 7-segment display array. This works using interrupts or timers, but I don't want to generate that many context switches or interrupts, so I'm looking for a DMA solution.
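To make the two-byte frame concrete, here is a minimal sketch of how the buffer for a three-digit display could be laid out. It mirrors the commented-out test code in my_vga.cpp further down. The byte order (digit-select byte first, segment byte second), the active-low digit select on the low three bits, and the names `kDigitCount` and `buildFrames` are assumptions for illustration only.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical: number of multiplexed digits (SEGMENT_COUNT in the post).
constexpr std::size_t kDigitCount = 3;

// Builds one two-byte frame per digit: [digit-select byte, segment byte].
// Assumption: the digit select is active-low on the low three bits.
std::array<std::uint8_t, 2 * kDigitCount> buildFrames(
    const std::array<std::uint8_t, kDigitCount>& segments) {
    std::array<std::uint8_t, 2 * kDigitCount> frames{};
    for (std::size_t i = 0; i < kDigitCount; ++i) {
        // Clear only bit i so that exactly one digit is selected.
        frames[2 * i + 0] = static_cast<std::uint8_t>(0x07u & ~(1u << i));
        // Segment pattern for digit i.
        frames[2 * i + 1] = segments[i];
    }
    return frames;
}
```

Each pair of bytes would then be the payload of one DMA descriptor, so the buffer length is always 2 * kDigitCount.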
I found an example that uses SPI to drive VGA signals: viewtopic.php?f=2&t=4011#p18107 . GitHub: https://github.com/t-mat/esp32-vga-experiment . I modified this example by adding a CLK and a CS signal and lowering the frequency.
I'm currently facing several issues with this:
  • The chip select signal is not being driven. It is set low and never goes high again.
  • The clock signal is driven before the first data is sent on MOSI. This shouldn't be an issue, since only the last 16 bits will be latched.
  • There is no pause between the two-byte transmissions. They are sent back to back, even if a descriptor in the linked list is marked as start of frame and/or end of frame.
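To put the back-to-back behaviour into numbers, here is a small arithmetic sketch. It assumes 16 bits per frame, three frames per display cycle, and the 1 MHz SPI clock set in the code below (`SpiDmaClockSpeedInHz = 1E6`). The function names `frameTimeNs` and `refreshRateHz` are made up for this illustration.

```cpp
#include <cstdint>

// Duration of one frame in nanoseconds for a given SPI clock.
constexpr std::uint64_t frameTimeNs(std::uint32_t bitsPerFrame,
                                    std::uint32_t spiClockHz) {
    return (static_cast<std::uint64_t>(bitsPerFrame) * 1000000000ULL) / spiClockHz;
}

// Refresh rate of the whole display in Hz when all frames are sent
// back to back with no pause in between.
constexpr double refreshRateHz(std::uint32_t bitsPerFrame,
                               std::uint32_t frameCount,
                               std::uint32_t spiClockHz) {
    return static_cast<double>(spiClockHz) /
           (static_cast<double>(bitsPerFrame) * static_cast<double>(frameCount));
}
```

With these assumptions one frame takes 16 µs and the three-digit cycle repeats roughly every 48 µs, which is far above the 300 Hz target.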
Here is my modification to the VGA example:

my_vga.cpp:

Code: Select all

#define SEGMENT_COUNT 3
static uint8_t spi_buffer_[2*SEGMENT_COUNT] __attribute__ ((aligned (4)));
static lldesc_t lldesc_[SEGMENT_COUNT]={};

	esp_err_t init(const myvga_init_params_t* initParams) {
		auto* ap = reinterpret_cast<uint8_t*>(this);
		ap += sizeof(ThisClass);

		{
			const auto& ip = *initParams;
			userVideo.width		= ip.video.width;
			userVideo.height	= ip.video.height;
			userVideo.stride	= ip.video.strideInBytes;
			userVideo.buffer	= static_cast<uint8_t*>(ip.video.buffer);

			spi.host			= ip.spi.host;
			spi.dmaChan			= ip.spi.dmaChan;
			spi.mosiGpioNum		= ip.spi.mosiGpioNum;
			spi.hw				= myspi_get_hw_for_host(ip.spi.host);

			rmt.hsyncChannel	= ip.rmt.hsyncChannel;
			rmt.vsyncChannel	= ip.rmt.vsyncChannel;
			rmt.hsyncGpioNum	= ip.rmt.hsyncGpioNum;
			rmt.vsyncGpioNum	= ip.rmt.vsyncGpioNum;
		}

		vsyncCallback.callback	= nullptr;

		blankLine	= reinterpret_cast<decltype(blankLine)>(ap);
		ap += blankLineBytes;
		{
			memset(blankLine, 0, blankLineBytes);
		}

		descs		= reinterpret_cast<decltype(descs)>(ap);
		ap += sizeof(lldesc_t) * 2 * VgaSignalHeightInLines;
		{
			for(int y = 0; y < VgaSignalHeightInLines; ++y) {
				const int videoY = y - (VgaVSyncSignalInLines + VgaVSyncBackPorchInLines);
				const bool isVideoEnable = (videoY >= 0 && videoY < userVideo.height);
				const bool isLast = (y == VgaSignalHeightInLines - 1);
				{
		            auto* dd = &descs[y * 2 + 0];
					auto* next = dd + 1;
					const int dmaChunkLen = userVideo.width / 8;
		            dd->size            = dmaChunkLen;
		            dd->length          = dmaChunkLen;
					uint8_t* buf = nullptr;
					if(isVideoEnable) {
						buf = &userVideo.buffer[userVideo.stride * videoY];
					}
					if(nullptr == buf) {
						buf = blankLine;
					}
					dd->buf				= buf;
		            dd->eof             = 0;
		            dd->sosf            = 0;
		            dd->owner           = 1;
		            dd->qe.stqe_next    = next;
				}
				{
		            auto* dd = &descs[y * 2 + 1];
					auto* next = dd + 1;
					if(isLast) {
						next = &descs[0];
					}
					const int dmaChunkLen = (VgaSignalWidthInPixels - userVideo.width) / 8;
		            dd->size            = dmaChunkLen;
		            dd->length          = dmaChunkLen;
					dd->buf				= blankLine;
		            dd->eof             = 0;
		            dd->sosf            = 0;
		            dd->owner           = 1;
		            dd->qe.stqe_next    = next;
				}
			}
		}



		const double SpiDmaClockSpeedInHz = 1E6;//VgaPixelFrequencyInHz;

		//put test data in buffer:

//	  for(int i = 0;i<SEGMENT_COUNT;++i)
//	  {
//	    spi_buffer_[i*2+0] = uint8_t(7 & (~(1<<i)));//anode
//	    spi_buffer_[i*2+1] = 0;//data
//	  }

//	  spi_buffer_[0*2+1] = 0xFF;
//    spi_buffer_[1*2+1] = 0xFB;
//    spi_buffer_[2*2+1] = 0xF7;

	  spi_buffer_[0]=129;
	  spi_buffer_[1]=3;
	  spi_buffer_[2]=7;
	  spi_buffer_[3]=15;
	  spi_buffer_[4]=31;
	  spi_buffer_[5]=63;
	  // spi_buffer_[6]=127; // removed: out of bounds, spi_buffer_ holds only 2*SEGMENT_COUNT = 6 bytes (indices 0..5)



	  for(int i = 0;i<SEGMENT_COUNT;++i)
	  {
	    lldesc_[i].size = 2;
	    lldesc_[i].length = 2;
	    lldesc_[i].offset = 0;
	    lldesc_[i].sosf = 1;
	    lldesc_[i].eof = 1;
	    lldesc_[i].owner = 1; // 1 = descriptor is owned by the DMA hardware, 0 = owned by the CPU

	    //point to data buffer:
	    lldesc_[i].buf = &spi_buffer_[i*2];


	    //create circular linked list:
	    lldesc_[i].qe.stqe_next = &lldesc_[(i+1)%SEGMENT_COUNT];
	  }

    gpio_num_t latch_pin=GPIO_NUM_4;
    gpio_num_t data_pin=GPIO_NUM_5;
    gpio_num_t clk_pin=GPIO_NUM_2;

		myspi_prepare_circular_buffer(
		      spi.host
		    , spi.dmaChan
		    , lldesc_//descs
		    , SpiDmaClockSpeedInHz
		    , data_pin
		    , clk_pin
		    , latch_pin
		    , SpiHSyncBackporchWaitCycle
		);

		intr_handle_t my_rmt_isr_handle;
		ESP_ERROR_CHECK(esp_intr_alloc(ETS_RMT_INTR_SOURCE, ESP_INTR_FLAG_SHARED, rmtIsr, this, &my_rmt_isr_handle));

		portDISABLE_INTERRUPTS();
		// Reset timers and begin SPI DMA transfer
		spi_dev_t* const spiHw = getSpiHw();

		// Here, we're waiting for completion of RMT TX.  When TX is completed,
		// RMT channel's internal counter becomes some constant value (maybe 0?).
		// Therefore, we can see stable behaviour of RMT channel.
		{
			auto& hsyncRmtConf1 = RMT.conf_ch[rmt.hsyncChannel].conf1;
			auto& vsyncRmtConf1 = RMT.conf_ch[rmt.vsyncChannel].conf1;

			hsyncRmtConf1.tx_conti_mode	= 0;
			vsyncRmtConf1.tx_conti_mode	= 0;

			const uint32_t mask = BIT(rmt.hsyncChannel * 3 + 0) | BIT(rmt.vsyncChannel * 3 + 0);
			for(;;) {
				const uint32_t int_raw = RMT.int_raw.val;
				if((int_raw & mask) == mask) {
					break;
				}
			}

			hsyncRmtConf1.ref_cnt_rst	= 1;	// RMT_REF_CNT_RST_CH	Setting this bit resets the clock divider of channel n. (R/W)
			vsyncRmtConf1.ref_cnt_rst	= 1;	// RMT_REF_CNT_RST_CH

			hsyncRmtConf1.mem_rd_rst	= 1;	// RMT_MEM_RD_RST_CHn	Set this bit to reset the read-RAM address for channel n by accessing the transmitter. (R/W)
			vsyncRmtConf1.mem_rd_rst	= 1;	// RMT_MEM_RD_RST_CHn
		}


		spiHw->dma_conf.dma_tx_stop		= 1;	// Stop SPI DMA
		//spiHw->ctrl2.val           		= 0;	// Reset timing
		spiHw->dma_conf.dma_tx_stop		= 0;	// Disable stop
		spiHw->dma_conf.dma_continue	= 1;	// Set contiguous mode
		spiHw->dma_out_link.start		= 1;	// Start SPI DMA transfer (1)

		ESP_ERROR_CHECK(rmt_set_tx_thr_intr_en(rmt.hsyncChannel, true, 1));
		ESP_ERROR_CHECK(rmt_set_tx_thr_intr_en(rmt.vsyncChannel, true, 7));

		clearIsrCounters();

		kickPeripherals(spiHw, rmt.hsyncChannel, rmt.vsyncChannel);
		portENABLE_INTERRUPTS();

		return ESP_OK;
	}
	

my_spi.cpp:

Code: Select all

static uint8_t getSpid_cs0_out_ByHost(
    spi_host_device_t host
) {
    switch(host) {
    case SPI_HOST:  return SPICS0_OUT_IDX;  break;
    case HSPI_HOST: return HSPICS0_OUT_IDX; break;
    case VSPI_HOST: return VSPICS0_OUT_IDX; break;
    default:        return InvalidIndex;  break;
    }
}

static uint8_t getSpid_cs0_in_ByHost(
    spi_host_device_t host
) {
    switch(host) {
    case SPI_HOST:  return SPICS0_IN_IDX;  break;
    case HSPI_HOST: return HSPICS0_IN_IDX; break;
    case VSPI_HOST: return VSPICS0_IN_IDX; break;
    default:        return InvalidIndex;  break;
    }
}

static uint8_t getSpi_clk_in_ByHost(
    spi_host_device_t host
) {
    switch(host) {
    case SPI_HOST:  return SPICLK_IN_IDX;   break;
    case HSPI_HOST: return HSPICLK_IN_IDX;  break;
    case VSPI_HOST: return VSPICLK_IN_IDX;  break;
    default:        return InvalidIndex;  break;
    }
}

static uint8_t getSpi_clk_out_ByHost(
    spi_host_device_t host
) {
    switch(host) {
    case SPI_HOST:  return SPICLK_OUT_IDX;   break;
    case HSPI_HOST: return HSPICLK_OUT_IDX;  break;
    case VSPI_HOST: return VSPICLK_OUT_IDX;  break;
    default:        return InvalidIndex;  break;
    }
}

esp_err_t myspi_prepare_circular_buffer(
      const spi_host_device_t   spiHostDevice
    , const int                 dma_chan
    , const lldesc_t*           lldescs
    , const double              dmaClockSpeedInHz
    , const gpio_num_t          mosi_gpio_num
    , const gpio_num_t          clk_gpio_num
    , const gpio_num_t          cs_gpio_num
    , const int                 waitCycle
) {
    const bool spi_periph_claimed = spicommon_periph_claim(spiHostDevice);
    if(! spi_periph_claimed) {
        return MY_ESP_ERR_SPI_HOST_ALREADY_IN_USE;
    }

    const bool dma_chan_claimed = spicommon_dma_chan_claim(dma_chan);
    if(! dma_chan_claimed) {
        spicommon_periph_free(spiHostDevice);
        return MY_ESP_ERR_SPI_DMA_ALREADY_IN_USE;
    }

    spi_dev_t* const spiHw = myspi_get_hw_for_host(spiHostDevice);
    const int Cs = 0;
	const int CsMask = 1 << Cs;

    //Use GPIO
    PIN_FUNC_SELECT(GPIO_PIN_MUX_REG[mosi_gpio_num], PIN_FUNC_GPIO);
    gpio_set_direction(mosi_gpio_num, GPIO_MODE_INPUT_OUTPUT);
    gpio_matrix_out(mosi_gpio_num, getSpidOutByHost(spiHostDevice), false, false);
    gpio_matrix_in(mosi_gpio_num, getSpidInByHost(spiHostDevice), false);

    PIN_FUNC_SELECT(GPIO_PIN_MUX_REG[clk_gpio_num], PIN_FUNC_GPIO);
    gpio_set_direction(clk_gpio_num, GPIO_MODE_INPUT_OUTPUT);
    gpio_matrix_out(clk_gpio_num, getSpi_clk_out_ByHost(spiHostDevice), false, false);
    gpio_matrix_in(clk_gpio_num, getSpi_clk_in_ByHost(spiHostDevice), false);

    PIN_FUNC_SELECT(GPIO_PIN_MUX_REG[cs_gpio_num], PIN_FUNC_GPIO);
    gpio_set_direction(cs_gpio_num, GPIO_MODE_INPUT_OUTPUT);
    gpio_matrix_out(cs_gpio_num, getSpid_cs0_out_ByHost(spiHostDevice), false, false);
    gpio_matrix_in(cs_gpio_num, getSpid_cs0_in_ByHost(spiHostDevice), false);

    //Select DMA channel.
    DPORT_SET_PERI_REG_BITS(
          DPORT_SPI_DMA_CHAN_SEL_REG
        , 3
        , dma_chan
        , (spiHostDevice * 2)
    );

    //Reset DMA
    spiHw->dma_conf.val        		|= SPI_OUT_RST|SPI_IN_RST|SPI_AHBM_RST|SPI_AHBM_FIFO_RST;
    spiHw->dma_out_link.start  		= 0;
    spiHw->dma_in_link.start   		= 0;
    spiHw->dma_conf.val        		&= ~(SPI_OUT_RST|SPI_IN_RST|SPI_AHBM_RST|SPI_AHBM_FIFO_RST);

    //Reset timing
    spiHw->ctrl2.val           		= 0;

    //Disable unneeded ints
    spiHw->slave.rd_buf_done   		= 0;
    spiHw->slave.wr_buf_done   		= 0;
    spiHw->slave.rd_sta_done   		= 0;
    spiHw->slave.wr_sta_done   		= 0;
    spiHw->slave.rd_buf_inten  		= 0;
    spiHw->slave.wr_buf_inten  		= 0;
    spiHw->slave.rd_sta_inten  		= 0;
    spiHw->slave.wr_sta_inten  		= 0;
    spiHw->slave.trans_inten   		= 0;
    spiHw->slave.trans_done    		= 0;

    //Set CS pin, CS options
	//spiHw->pin.master_ck_sel			&= ~CsMask;
	//spiHw->pin.master_cs_pol			&= ~CsMask;

  spiHw->pin.master_ck_sel      &= CsMask;
  spiHw->pin.master_cs_pol      &= CsMask;

	// Set SPI Clock:
  {
    const double	preDivider			= 4.0;
    const double	apbClockSpeedInHz	= APB_CLK_FREQ;
    const double	apbClockPerDmaCycle	= (apbClockSpeedInHz / preDivider / dmaClockSpeedInHz);

    const int32_t	clkdiv_pre	= ((int32_t) preDivider) - 1;
    const int32_t	clkcnt_n	= ((int32_t) apbClockPerDmaCycle) - 1;
    const int32_t	clkcnt_h	= (clkcnt_n + 1) / 2 - 1;
    const int32_t	clkcnt_l	= clkcnt_n;

    spiHw->clock.clk_equ_sysclk	= 0;
    spiHw->clock.clkcnt_n		= clkcnt_n;
    spiHw->clock.clkdiv_pre		= clkdiv_pre;
    spiHw->clock.clkcnt_h		= clkcnt_h;
    spiHw->clock.clkcnt_l		= clkcnt_l;
	}

    //Configure bit order
    spiHw->ctrl.rd_bit_order           = 0;    // MSb first
    spiHw->ctrl.wr_bit_order           = 0;    // MSb first

    //Configure polarity
    spiHw->pin.ck_idle_edge            = 0;
    spiHw->user.ck_out_edge            = 0;
    spiHw->ctrl2.miso_delay_mode       = 0;

    //configure dummy bits
    spiHw->user.usr_dummy              = 0;
    spiHw->user1.usr_dummy_cyclelen    = 0;

    //Configure misc stuff
    spiHw->user.doutdin                = 0;
    spiHw->user.sio                    = 0;

    //Configure CS timing
    spiHw->ctrl2.setup_time            = 0;
    spiHw->user.cs_setup               = 0;
    spiHw->ctrl2.hold_time             = 0;
    spiHw->user.cs_hold                = 0;

    //Configure CS pin
    spiHw->pin.cs0_dis                 = (Cs == 0) ? 0 : 1;
    spiHw->pin.cs1_dis                 = (Cs == 1) ? 0 : 1;
    spiHw->pin.cs2_dis                 = (Cs == 2) ? 0 : 1;

    //spiHw->pin.cs_keep_active = 1;

    //Reset SPI peripheral
    spiHw->dma_conf.val                |= SPI_OUT_RST|SPI_IN_RST|SPI_AHBM_RST|SPI_AHBM_FIFO_RST;
    spiHw->dma_out_link.start          = 0;
    spiHw->dma_in_link.start           = 0;
    spiHw->dma_conf.val                &= ~(SPI_OUT_RST|SPI_IN_RST|SPI_AHBM_RST|SPI_AHBM_FIFO_RST);
    spiHw->dma_conf.out_data_burst_en  = 1;

    //Set up QIO/DIO if needed
    spiHw->ctrl.val		&= ~(SPI_FREAD_DUAL|SPI_FREAD_QUAD|SPI_FREAD_DIO|SPI_FREAD_QIO);
    spiHw->user.val		&= ~(SPI_FWRITE_DUAL|SPI_FWRITE_QUAD|SPI_FWRITE_DIO|SPI_FWRITE_QIO);

    //DMA temporary workaround: let RX DMA work somehow to avoid the issue in ESP32 v0/v1 silicon
    spiHw->dma_in_link.addr            = 0;
    spiHw->dma_in_link.start           = 1;

    spiHw->user1.usr_addr_bitlen       = 0;
    spiHw->user2.usr_command_bitlen    = 0;
    spiHw->user.usr_addr               = 0;
    spiHw->user.usr_command            = 0;
    if(waitCycle <= 0) {
        spiHw->user.usr_dummy              = 0;
        spiHw->user1.usr_dummy_cyclelen    = 0;
    } else {
        spiHw->user.usr_dummy              = 1;
        spiHw->user1.usr_dummy_cyclelen    = (uint8_t) (waitCycle-1);
    }

    spiHw->user.usr_mosi_highpart      = 0;
    spiHw->user2.usr_command_value     = 0;
    spiHw->addr                        = 0;
    spiHw->user.usr_mosi               = 1;        // Enable MOSI
    spiHw->user.usr_miso               = 0;

    spiHw->dma_out_link.addr           = (int)(lldescs) & 0xFFFFF;

    spiHw->mosi_dlen.usr_mosi_dbitlen  = 0;		// works great! (there's no glitch in 5 hours)
    spiHw->miso_dlen.usr_miso_dbitlen  = 0;

    // Set circular mode
    //      https://www.esp32.com/viewtopic.php?f=2&t=4011#p18107
    //      > yes, in SPI DMA mode, SPI will alway transmit and receive
    //      > data when you set the SPI_DMA_CONTINUE(BIT16) of SPI_DMA_CONF_REG.
    spiHw->dma_conf.dma_continue       = 1;

    return ESP_OK;
}
Logic analyser capture:
20191018_spi_dma.png
Is it possible to let the DMA drive the CS signal for each 2-byte transmission?

Re: SPI continuous transmission with DMA with circular linked list

Posted: Fri Nov 01, 2019 8:25 am
by sheinz
Hi Chris,
I think that because you're running your DMA output in a circular loop, the SPI peripheral sees it as one transmission and therefore holds CS low for the entire transmission (i.e. forever). One trick is to use the MCPWM peripheral to run a PWM signal in parallel with your SPI output, and drive your latch pin from the MCPWM instead of using the SPI CS. The catch is that you have to start both simultaneously. They both run off the same clock, so they stay in sync. I used this approach to sync two MCPWM timers to the SPI, to do something similar:

Code: Select all

// this bit of code makes sure both timers and SPI transfer are started as close together as possible
portENTER_CRITICAL(&led_mux);
WRITE_PERI_REG( SPI_CMD_REG(3), 1 << SPI_USR_S ); // start SPI transfer
WRITE_PERI_REG( MCPWM_TIMER0_CFG1_REG(0), (1 << MCPWM_TIMER0_MOD_S) | (2 << MCPWM_TIMER0_START_S) ); // start timer 0
WRITE_PERI_REG( MCPWM_TIMER1_CFG1_REG(0), (1 << MCPWM_TIMER1_MOD_S) | (3 << MCPWM_TIMER1_START_S) ); // start timer 1
portEXIT_CRITICAL(&led_mux);
You will have to read the ESP32 technical reference manual a lot to understand all the MCPWM registers, but you can do a lot with the peripheral. You can even slightly adjust the phase in case it's not quite right:

Code: Select all

MCPWM0.timer[0].sync.timer_phase         = 8; // phase compensation to align timer 0 waveform
MCPWM0.timer[0].sync.sync_sw            ^= 1; // update phase
MCPWM0.timer[1].sync.timer_phase         = 9; // phase compensation to align timer 1 waveform
MCPWM0.timer[1].sync.sync_sw            ^= 1; // update phase
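For the PWM to stay aligned with the SPI frames, its period has to match the length of one frame exactly. Here is a minimal sketch of that arithmetic. The timer tick rate is passed in as a parameter because it depends on how you set the MCPWM prescalers. The struct `PwmPeriod` and the function `pwmPeriodForFrame` are hypothetical names, not part of any ESP-IDF API.

```cpp
#include <cstdint>

struct PwmPeriod {
    std::uint64_t ticks;  // timer ticks per SPI frame (rounded down)
    bool exact;           // true if one frame is a whole number of ticks
};

// Computes the PWM period, in timer ticks, that spans one SPI frame.
// If 'exact' is false the PWM would slowly drift against the SPI frames.
constexpr PwmPeriod pwmPeriodForFrame(std::uint32_t bitsPerFrame,
                                      std::uint32_t spiClockHz,
                                      std::uint32_t timerClockHz) {
    return PwmPeriod{
        (static_cast<std::uint64_t>(bitsPerFrame) * timerClockHz) / spiClockHz,
        (static_cast<std::uint64_t>(bitsPerFrame) * timerClockHz) % spiClockHz == 0};
}
```

As an example, a 16-bit frame at 1 MHz with an assumed 10 MHz timer tick gives a period of 160 ticks with no remainder. How that value maps onto the MCPWM period register is something to check in the technical reference manual.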
Another possibility could be using the quad output of the SPI. You would have to encode your data so that it appears in the right format only on output 1, then encode your latch as data that only appears on output 2.
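A rough sketch of what that encoding could look like, for a 16-bit frame followed by a one-clock latch pulse. Two things here are assumptions you should verify against the technical reference manual or a logic analyser: that the high nibble of each byte is clocked out first, and that nibble bit 0 and bit 1 map to the first and second output line. The names `kDataLine`, `kLatchLine` and `encodeQuadFrame` are made up for this example, and the SPI quad-mode configuration itself is not shown.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::uint8_t kDataLine  = 0x1;  // assumed: nibble bit 0 = serial data line
constexpr std::uint8_t kLatchLine = 0x2;  // assumed: nibble bit 1 = latch line

// Encodes one 16-bit frame (MSB first) into 9 bytes for quad output.
// Each SPI clock sends one nibble, so 16 data clocks plus one latch clock
// plus one idle clock need 18 nibbles.
std::array<std::uint8_t, 9> encodeQuadFrame(std::uint16_t frame) {
    std::array<std::uint8_t, 18> nibbles{};  // zero-initialised: all lines low
    for (std::size_t bit = 0; bit < 16; ++bit) {
        const bool level = ((frame >> (15 - bit)) & 1u) != 0;
        nibbles[bit] = level ? kDataLine : std::uint8_t{0};
    }
    nibbles[16] = kLatchLine;  // latch pulse after the last data bit
    nibbles[17] = 0;           // latch back low before the next frame

    std::array<std::uint8_t, 9> out{};
    for (std::size_t i = 0; i < out.size(); ++i) {
        // Assumed: the high nibble is sent before the low nibble.
        out[i] = static_cast<std::uint8_t>((nibbles[2 * i] << 4) | nibbles[2 * i + 1]);
    }
    return out;
}
```

The cost is that each two-byte frame grows to nine bytes in the DMA buffer, but the latch then comes out of the same DMA stream as the data.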

Hope this helps
_
Seb