Half-Duplex QSPI (Quad SPI/QIO) with DMA [IDFGH-3571]
Posted: Mon Jun 29, 2020 4:13 pm
Hi all, I'm trying to get QSPI working with DMA, but can't seem to get the read phase to work at all.
From the documentation, the phrasing is somewhat ambiguous:
Currently I'm connecting to an FPGA emulating a QSPI flash device. So in every transaction there will be a command, followed by address, then read or write phase.
Without DMA, everything works fine, but I am limited to 64 bytes per transaction.
With DMA, write works (I see the changed data being returned by the device on read using a logic analyser), but read only performs correctly for the first word (32-bits) of the first transaction.
To make it clear, each transaction is either read or write only (excluding the command and address fields, which are sent to the device every transaction, but which I assume does not count as being part of the "write phase")
To get a better sense of what's happening, I re-initialise the read buffer to a non-zero value after each transaction. It seems like the first word is always written to but everything else is untouched. Of all the transactions, only the first word of the first read transaction returns the correct value, further transactions return zero.
Thinking this may be a silicon bug (I read something about DMA bugs, but I don't think they're relevant), I also tested the same code on a 32E device with no difference in behaviour. Although I am not sure if v3.2.3 of the IDF currently bundled with ESP32 Arduino knows of the 32E silicon version.
I also have a logic analyser hooked up to the interface, and the QSPI flash device the FPGA is emulating is behaving correctly on both read and write. It might be worth noting that the flash device operates in mode 3.
This is the test program I have written, it uses the IOMUX pins for HSPI.
Can anyone tell me what seems to be the problem? Or are the assumptions I have made about DMA capability with half-duplex and/or command+address not counting as "write phase" erroneous?
The closest issue I can find is this: https://github.com/espressif/esp-idf/issues/598
Which does seem to suggest that "MISO phase after a MOSI phase" is the cause, and probably the origin of the quoted text (above) in the IDF documentation. Is this the issue I am facing? In the github issue, the phrasing seems to more strongly suggest read and write for a device instead of within a transaction, which worries me.
From the documentation, the phrasing is somewhat ambiguous:
I'm taking that to mean that it is not compatible when both read and write phases exist in a transaction as opposed to a device? And anyway since only half-duplex is possible in QSPI, this is the only choice I have.Half duplex mode is not compatible with DMA when both writing and reading phases exist.
Currently I'm connecting to an FPGA emulating a QSPI flash device. So in every transaction there will be a command, followed by address, then read or write phase.
Without DMA, everything works fine, but I am limited to 64 bytes per transaction.
With DMA, write works (I see the changed data being returned by the device on read using a logic analyser), but read only performs correctly for the first word (32-bits) of the first transaction.
To make it clear, each transaction is either read or write only (excluding the command and address fields, which are sent to the device every transaction, but which I assume does not count as being part of the "write phase")
To get a better sense of what's happening, I re-initialise the read buffer to a non-zero value after each transaction. It seems like the first word is always written to but everything else is untouched. Of all the transactions, only the first word of the first read transaction returns the correct value, further transactions return zero.
Thinking this may be a silicon bug (I read something about DMA bugs, but I don't think they're relevant), I also tested the same code on a 32E device with no difference in behaviour. Although I am not sure if v3.2.3 of the IDF currently bundled with ESP32 Arduino knows of the 32E silicon version.
I also have a logic analyser hooked up to the interface, and the QSPI flash device the FPGA is emulating is behaving correctly on both read and write. It might be worth noting that the flash device operates in mode 3.
This is the test program I have written, it uses the IOMUX pins for HSPI.
Code: Select all
// Include the libraries we need
#include "esp_heap_caps.h"
#include "driver/spi_master.h"
#define F_CLK 34
#define CRESET_B 32
#define F_3_SPI_P 14
#define F_3_SPI_N 12
#define F_10_SPI_P 13
#define F_10_SPI_N 4
#define F_26_SPI_P 2
#define F_26_SPI_N 15
//////// QSPI Master stuff
// In addr terms, not bytes
//#define SPI_TEST_LEN (SPI_MAX_DMA_LEN/2)
#define SPI_TEST_LEN 1024
//#define SPI_TEST_LEN 16
#define SPI_MOSI F_10_SPI_P
#define SPI_MISO F_3_SPI_N
#define SPI_WP F_26_SPI_P
#define SPI_HD F_10_SPI_N
#define SPI_CLK F_3_SPI_P
#define SPI_CS F_26_SPI_N
#define SPI_HOST HSPI_HOST
#define SPI_FREQ SPI_MASTER_FREQ_8M
//#define SPI_FREQ 10000
spi_device_handle_t dev;
// In addr terms, not bytes, +1 in case SPI_TEST_LEN is not even
#define SPI_BUFLEN (SPI_TEST_LEN+1)
#define SPI_QUEUELEN 6
// TX-RX DMA Buffers
uint16_t* spi_buf[SPI_QUEUELEN];
volatile uint8_t currbuf = 0;
// Data attached to transaction to specify which buffer it is using
uint8_t bufnum[SPI_QUEUELEN];
volatile bool spi_gotdata = false;
// Tracks number of transactions in flight
// Used to figure out when we have space for a new transaction
volatile uint8_t num_dat = 0;
//Called right before a transaction is started
void IRAM_ATTR SPI_pre_trans_cb(spi_transaction_t *trans) {
}
//Called after transaction is completed
void IRAM_ATTR SPI_post_trans_cb(spi_transaction_t *trans) {
// Signal to main thread if a read transaction has been completed
if (trans->rx_buffer != NULL) {
spi_gotdata = true;
}
}
bool queue_transaction(spi_transaction_t *t) {
num_dat++;
if (spi_device_queue_trans(dev, t, 0) != ESP_OK) {
num_dat--;
return false;
}
return true;
}
bool queue_write_transaction(uint8_t n, uint16_t addr, size_t len) {
spi_transaction_t* t = new spi_transaction_t;
// Ensure that data is a multiple of 4 and at least 8 bytes long
// Since len is counting number of 16-bit words, we compare 0b1 and 4
// instead of 0b11 and 8
while (len & 0b1 || len < 4) len++;
t->flags = SPI_TRANS_MODE_QIO | SPI_TRANS_MODE_DIOQIO_ADDR;
t->cmd = 0b00110010; //0x32
t->addr = addr;
t->length = len << 4;
t->rxlength = 0;
t->tx_buffer = spi_buf[n];
t->rx_buffer = NULL;
t->user = &(bufnum[n]);
return queue_transaction(t);
}
bool queue_read_transaction(uint8_t n, uint16_t addr, size_t len) {
spi_transaction_t* t = new spi_transaction_t;
len++;
// Ensure that data is a multiple of 4 and at least 8 bytes long
// Since len is counting number of 16-bit words, we compare 0b1 and 4
// instead of 0b11 and 8
while (len & 0b1 || len < 4) len++;
t->flags = SPI_TRANS_MODE_QIO | SPI_TRANS_MODE_DIOQIO_ADDR;
t->cmd = 0b01101011; //0x6B
t->addr = addr;
t->length = 0;
t->rxlength = len << 4;
t->tx_buffer = NULL;
t->rx_buffer = spi_buf[n];
t->user = &(bufnum[n]);
return queue_transaction(t);
}
uint16_t get_next_addr(bool advance) {
static uint16_t addr = 0;
uint16_t ret = addr;
if (advance) {
addr += SPI_TEST_LEN;
}
if (advance && !(ret & 0x1FFF)) {
Serial.println(F("\r\nWraparound!"));
}
return ret;
}
// Returns current pointer value and increment it (or warparound)
uint8_t increment_currbuf_pointer() {
uint8_t ret = currbuf;
currbuf++;
if (currbuf >= SPI_QUEUELEN) currbuf = 0;
return ret;
}
uint16_t* get_next_spi_buffer(uint8_t* n) {
*n = increment_currbuf_pointer();
return spi_buf[*n];
}
void setup_qspi() {
// Populate bufnum
for (int i = 0; i < SPI_QUEUELEN; i++) {
bufnum[i] = i;
}
// Configuration for the SPI bus
spi_bus_config_t buscfg;
memset(&buscfg, 0, sizeof(buscfg));
buscfg.mosi_io_num = (gpio_num_t)SPI_MOSI;
buscfg.miso_io_num = (gpio_num_t)SPI_MISO;
buscfg.sclk_io_num = (gpio_num_t)SPI_CLK;
buscfg.quadwp_io_num = (gpio_num_t)SPI_WP;
buscfg.quadhd_io_num = (gpio_num_t)SPI_HD;
buscfg.max_transfer_sz = SPI_BUFLEN * 2;
buscfg.flags = SPICOMMON_BUSFLAG_MASTER | SPICOMMON_BUSFLAG_QUAD | SPICOMMON_BUSFLAG_NATIVE_PINS;
buscfg.intr_flags = 0;
// Configuration of SPI device
spi_device_interface_config_t devcfg;
memset(&devcfg, 0, sizeof(devcfg));
devcfg.command_bits = 8;
devcfg.address_bits = 16;
devcfg.dummy_bits = 0; // Or 16 in QSPI?
devcfg.mode = 3;
devcfg.clock_speed_hz = SPI_FREQ;
devcfg.input_delay_ns = 0;
devcfg.spics_io_num = (gpio_num_t)SPI_CS;
devcfg.queue_size = SPI_QUEUELEN;
devcfg.flags = SPI_DEVICE_HALFDUPLEX;
devcfg.pre_cb = SPI_pre_trans_cb;
devcfg.post_cb = SPI_post_trans_cb;
pinMode(SPI_CS, OUTPUT);
digitalWrite(SPI_CS, HIGH);
pinMode(SPI_CLK, OUTPUT);
digitalWrite(SPI_CLK, HIGH);
pinMode(SPI_MOSI, INPUT_PULLUP);
pinMode(SPI_MISO, INPUT_PULLUP);
pinMode(SPI_WP, INPUT_PULLUP);
pinMode(SPI_HD, INPUT_PULLUP);
// Allocate DMA-capable (word-aligned DRAM) memory
for (int i = 0; i < SPI_QUEUELEN; i++) {
spi_buf[i] = (uint16_t*)heap_caps_malloc(SPI_BUFLEN * 2, MALLOC_CAP_DMA|MALLOC_CAP_8BIT|MALLOC_CAP_INTERNAL);
if (spi_buf[i] == NULL) {
Serial.println(F("Err, SPI buffer malloc failure! Restarting..."));
ESP.restart();
}
memset(spi_buf[i], 0xBB, SPI_BUFLEN * 2);
}
if (spi_bus_initialize(SPI_HOST, &buscfg, 2) != ESP_OK) {
Serial.println(F("Err, SPI failure! Restarting..."));
ESP.restart();
}
if (spi_bus_add_device(SPI_HOST, &devcfg, &dev) != ESP_OK) {
Serial.println(F("Err, SPI device add failure! Restarting..."));
ESP.restart();
}
// We acquire the bus forever
if (spi_device_acquire_bus(dev, portMAX_DELAY) != ESP_OK) {
Serial.println(F("Err, SPI not ready! Restarting..."));
ESP.restart();
}
}
void gen_const_data(uint16_t* dat, size_t len, uint16_t val) {
for (int i = 0; i < len; i++) {
dat[i] = SPI_SWAP_DATA_TX(val, 16);
}
}
void gen_test_data(uint16_t* dat, size_t len) {
for (int i = 0; i < len; i++) {
dat[i] = SPI_SWAP_DATA_TX(i + 1, 16);
}
}
void gen_rand_data(uint16_t* dat, size_t len) {
uint16_t randval;
for (int i = 0; i < len; i++) {
// Without radio, this is pseudo-random, but that's fine for testing
randval = esp_random() & 0xFFFF;
// We don't really need to swap for random data, but... eh
dat[i] = SPI_SWAP_DATA_TX(randval, 16);
}
}
void qspi_poll() {
uint8_t n;
int i;
spi_transaction_t *tx;
spi_transaction_t *rx;
if (spi_gotdata) {
spi_gotdata = 0;
if (spi_device_get_trans_result(dev, &tx, 0) != ESP_OK) {
Serial.println(F("Could not retrieve TX transaction?"));
return;
}
if (spi_device_get_trans_result(dev, &rx, 0) != ESP_OK) {
Serial.println(F("Could not retrieve RX transaction, unsynced! Reboot!"));
ESP.restart();
}
// Make sure data is correct
uint16_t* txd = (uint16_t*)(tx->tx_buffer);
uint16_t* rxd = (uint16_t*)(rx->rx_buffer);
size_t txl = (tx->length) >> 4;
size_t rxl = (rx->rxlength) >> 4;
// Skip the first received word during comparison as it is the dummy word
if (memcmp(txd, rxd+1, SPI_TEST_LEN * 2) != 0) {
Serial.print(F("\r\nFailed! "));
Serial.print(txl);
Serial.print(' ');
Serial.print(rxl);
Serial.print(F("\r\nTX: "));
for (i = 0; i < txl; i++) {
//for (i = 0; i < 10; i++) {
Serial.printf("%04X ", SPI_SWAP_DATA_RX(txd[i], 16));
}
Serial.print(F("\r\nRX: "));
for (i = 0; i < rxl; i++) {
//for (i = 1; i <= 10; i++) {
Serial.printf("%04X ", SPI_SWAP_DATA_RX(rxd[i], 16));
}
Serial.println();
} else {
Serial.print('.');
}
// Overwrite RX memory (not necessary if things work properly)
// We do this to show that DMA does not touch past the first word
memset(rx->rx_buffer, 0xCE, (rx->rxlength) >> 3);
delete tx;
// Decrease num_dat
num_dat--;
delete rx;
// Decrease num_dat
num_dat--;
}
if (num_dat != SPI_QUEUELEN) {
uint16_t* buf = get_next_spi_buffer(&n);
if (n & 0b1) {
// n is odd (read)
if (!queue_read_transaction(n, get_next_addr(true), SPI_TEST_LEN)) {
Serial.println(F("Err, unable to re-queue RX, rebooting..."));
ESP.restart();
}
} else {
// n is even (write)
// Generate new test data
//gen_const_data(buf, SPI_TEST_LEN, get_next_addr(false));
gen_const_data(buf, SPI_TEST_LEN, 0x5AA5);
//gen_test_data(buf, SPI_TEST_LEN);
if (!queue_write_transaction(n, get_next_addr(false), SPI_TEST_LEN)) {
Serial.println(F("Err, unable to re-queue TX, rebooting..."));
ESP.restart();
}
}
}
}
void setup() {
pinMode(F_CLK, INPUT);
pinMode(CRESET_B, INPUT_PULLUP);
//digitalWrite(CRESET_B, HIGH);
// SPI CLK
pinMode(SPI_CLK, OUTPUT);
digitalWrite(SPI_CLK, HIGH);
pinMode(SPI_MOSI, INPUT);
pinMode(SPI_MISO, INPUT);
pinMode(SPI_WP, INPUT);
pinMode(SPI_HD, INPUT);
// SPI CS
pinMode(SPI_CS, OUTPUT);
digitalWrite(SPI_CS, HIGH);
Serial.begin(115200);
Serial.println(esp_get_idf_version());
Serial.println(F("BOOT"));
// Wait for FPGA to initialise
delay(5000);
Serial.println(F("SPI"));
setup_qspi();
Serial.println(F("READY"));
}
void loop() {
qspi_poll();
}
The closest issue I can find is this: https://github.com/espressif/esp-idf/issues/598
Which does seem to suggest that "MISO phase after a MOSI phase" is the cause, and probably the origin of the quoted text (above) in the IDF documentation. Is this the issue I am facing? In the github issue, the phrasing seems to more strongly suggest read and write for a device instead of within a transaction, which worries me.