For background, I have successfully used the FT81x family of display/touchscreen controllers as an SPI slave in DMA, MOSI/MISO, full-duplex mode many times (I wrote the ESP32 LVGL graphics driver for it). To use the FT81x you need to both send and receive data (i.e. read and write). The FT81x also supports DIO and QIO modes, so I recently gave that a try using HALF_DUPLEX mode. I got it to work, but I see very strange results when using DMA SPI in DIO mode to read data from this device (writing works fine). Note that I am careful not to read and write in the same transaction, as the documentation requires for half-duplex DMA mode.
The FT81x essentially functions like a memory-mapped device. SPI transactions send an address for writing or reading (I use the transaction address field for this) followed by a stream of bytes, sometimes really large blocks of data up to 64k. Reading requires 8 dummy bits between the address and the data stream; I use the SPI transaction's variable dummy bit field for this when reading.
I'll also note that there are several reports, both here and on GitHub, of half-duplex mode dropping bytes while reading, so I don't think I am the first person to see this, but I have uncovered a lot of details.
On the logic analyzer everything looks perfect; however, reading data is wonky at best. What I found does not appear to be a hardware issue but rather an ESP32 SPI driver issue in HALF_DUPLEX mode. I spent a lot of time working through this and took careful notes and logic analyzer traces. It is 100% reproducible in all the scenarios documented below. There are several really strange things going on in HALF_DUPLEX mode:
- When I use rx_buffer as a pointer to a data buffer, no matter how many bytes I request to read (from 1 to 4), the driver always writes 4 bytes into the read buffer, padding it with zeros. It behaves as if it were using SPI_TRANS_USE_RXDATA mode, always writing 4 bytes and zeroing the extra bytes when fewer than 4 are requested. This is of course a problem if the buffer is not big enough, for instance when reading a single byte into a one-byte buffer.
- When the slave needs 8 dummy bits in DIO mode, you actually have to set the transaction's variable dummy bits to 4, or you get extra clock cycles (another full byte). However, rxlength is still specified as the actual data bit count. Dummy bits seem to be counted as DIO bit pairs. Maybe this is just a documentation issue, since it sort of makes sense even though it is inconsistent.
- The biggest issue, though: in DMA half-duplex DIO mode, I can only reliably read one byte at a time. If I request more than that, only the first byte (plus the padding zeros) is copied to the receive buffer, yet all the data appears in the logic analyzer traces. It seems that the driver just doesn't read or copy the rest of the data to the receive buffer UNLESS you use one really weird hack (details below).
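The dummy-bit and rxlength observations in the list above can be written down as two small helpers. This is only a sketch of what I observed, not ESP-IDF API: `dummy_field_value` and `rx_length_bits` are hypothetical names, and the QIO divisor is an extrapolation from the DIO result that I have not tested.

```c
#include <stddef.h>

/* Data lines driven per clock: standard SPI = 1, DIO = 2, QIO = 4. */
enum spi_lines { SPI_LINES_SINGLE = 1, SPI_LINES_DIO = 2, SPI_LINES_QIO = 4 };

/* Hypothetical helper (not part of ESP-IDF): converts the dummy bits the
 * slave expects on the wire into the value that, per the observations
 * above, has to go into the transaction's variable dummy bit field.
 * Observed for DIO only (8 wire bits -> 4); QIO is an untested guess. */
unsigned dummy_field_value(unsigned wire_dummy_bits, enum spi_lines lines)
{
    return wire_dummy_bits / (unsigned)lines;
}

/* Hypothetical helper: rxlength is NOT divided by the line count; it stays
 * the plain number of data bits requested. */
size_t rx_length_bits(size_t data_bytes)
{
    return data_bytes * 8u;
}
```

For the FT81x in DIO mode this gives a dummy field of 4 and an rxlength of 16 for a two-byte read, which matches the settings used in the scenarios below.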
Note: at boot the FT81x works in standard MOSI/MISO mode, and it takes a few SPI transactions to switch it to DIO (or QIO) mode. I do not tear down the bus or device at this transition, since DIO mode just adds flags to all following SPI transactions. I have also tried tearing it down, and it makes no difference. At this point sending (writing) data to the slave works just fine (I can send a lot really fast, in 32-64k chunks at 32 MHz), but I cannot read anything reliably beyond a single byte even though the slave clearly sends the data.
So here are a bunch of scenarios with logic traces, all using DMA, DIO, and HALF_DUPLEX:
Components/software
ESP32-WROVER-B
FT813 display controller
ESP-IDF version 4.1
SPI Bus config
DMA channel: 1
Max transfer: 64000
SPI Device config
Clock speed: 1,000,000 (slowed down for logic traces but it functions the same all the way up to 32,000,000)
Mode: 0
Queue size: 1
Flags: SPI_DEVICE_HALFDUPLEX (I have also tried adding SPI_DEVICE_NO_DUMMY and it does not appear to change anything, neither the scope display nor the bytes received in the read buffer)
Common to all DMA read transactions
RX buffer: ptr to uint8_t buffer
Address bits: 24
Command bits: 0
Flags: SPI_TRANS_VARIABLE_ADDR | SPI_TRANS_VARIABLE_DUMMY | SPI_TRANS_MODE_DIO | SPI_TRANS_MODE_DIOQIO_ADDR
Note: Because all of these SPI transactions start with a 24-bit address followed by 8 dummy bits, the data the slave returns start in the 5th decoded byte of each of the logic traces below.
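To make the "5th decoded byte" rule concrete, here is a minimal sketch that pulls the slave's payload out of a decoded trace. The function names are hypothetical, and it assumes the analyzer decodes address, dummy and data into one flat byte sequence, as in my traces.

```c
#include <stddef.h>
#include <stdint.h>

/* Zero-based index of the first data byte in a decoded trace:
 * 24 address bits + 8 dummy bits = 4 bytes, so data starts at index 4
 * (the 5th decoded byte). Bit counts here are wire bits. */
size_t first_data_index(unsigned address_bits, unsigned dummy_bits)
{
    return (address_bits + dummy_bits) / 8u;
}

/* Copies the payload bytes that follow the address and dummy phases into
 * out (at most out_cap bytes) and returns how many were copied. */
size_t extract_payload(const uint8_t *decoded, size_t decoded_len,
                       unsigned address_bits, unsigned dummy_bits,
                       uint8_t *out, size_t out_cap)
{
    size_t start = first_data_index(address_bits, dummy_bits);
    size_t n = 0;
    while (start + n < decoded_len && n < out_cap) {
        out[n] = decoded[start + n];
        n++;
    }
    return n;
}
```

For example, a decoded trace of 0x30, 0x20, 0x00 (address), 0x00 (dummy), 0x7c yields the single payload byte 0x7c.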
Failed DMA read Transaction for a single byte (expected 0x7c)
What seems to be happening here is that 16 bits are clocked as dummy instead of 8, so the single 0x7c data byte from the slave is skipped. The required dummy byte is the 4th decoded block in the trace (0x00), after the 24-bit address 0x302000.
Variable Dummy bits: 8
RX Length: 8
Slave data transmitted: 0x7c + an extra 0x00 byte at end due to the extra dummy clocks
Transaction data written to receive buffer: 0, 0, 0, 0 (note: 4 bytes written to buffer – all zeros)
Successful DMA read Transaction for a single byte
Using ½ as many dummy bits (i.e. 4) seems to correctly read the single 0x7c data byte from the slave (so in DIO mode dummy bits seem to be counted as DIO bit pairs, not as the total bit count). Maybe this is just a documentation error?
Variable Dummy bits: 4
RX Length: 8
Slave data transmitted: 0x7c
Transaction data written to receive buffer: 0x7c, 0, 0, 0 (note: data + 3 extra zero bytes written)
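Since even the successful read writes a full 4 bytes, one way to protect a smaller destination is to read into a word-sized scratch area and copy out only what was requested. The following is a host-side sketch under assumptions: `raw_read_fn` is a hypothetical stand-in for the code that fills in and submits the SPI transaction, and `fake_raw_read` only simulates the behavior I observed (0x7c plus three zeros). On the target the scratch word would also have to be DMA-capable memory.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical raw read: may write up to 4 bytes into dst even when len
 * is smaller, which is the behavior described in this post. */
typedef int (*raw_read_fn)(uint32_t addr, uint8_t *dst, size_t len);

/* Simulation of the observed driver behavior for a 1-byte read:
 * the data byte followed by three zero bytes. Not real driver code. */
int fake_raw_read(uint32_t addr, uint8_t *dst, size_t len)
{
    (void)addr;
    (void)len;
    dst[0] = 0x7c;
    dst[1] = dst[2] = dst[3] = 0;
    return 0;
}

/* Reads 1..4 bytes via a word-aligned scratch word, then copies only the
 * requested bytes so nothing beyond out[len - 1] is touched. */
int read_small(raw_read_fn raw_read, uint32_t addr, uint8_t *out, size_t len)
{
    uint32_t scratch_word = 0;
    uint8_t *scratch = (uint8_t *)&scratch_word;

    if (len == 0 || len > sizeof scratch_word) {
        return -1;
    }
    int err = raw_read(addr, scratch, len);
    if (err != 0) {
        return err;
    }
    memcpy(out, scratch, len);
    return 0;
}
```

This only sidesteps the overrun from the 4-byte write; it does nothing for the multi-byte reads that lose data.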
Failed DMA read Transaction for two bytes (expected 0x04, 0x03)
In this case, only the first byte is correctly read. The second is not read at all; it simply does not make it into the read buffer. Note that address 0x000004 was used this time, and one correct dummy byte still follows it.
Variable Dummy bits: 4 bits
RX Length: 16
Slave data transmitted: 0x04, 0x03
Transaction data written to receive buffer: 0x04, 0, 0, 0 (note: data + 3 extra zero bytes written)
Sometimes successful DMA read Transaction for two bytes (expected 0x04, 0x03)
This is a case I found by accident. If I increase the RX length by between 1 and 7 extra bits (i.e. specifying 17 when I really want 16), then I get the 2nd byte, but only on every other read, despite the scope looking exactly the same for each read cycle. Furthermore, if I instead increase RX length by 8 or more extra bits (i.e. specifying 24 or more when I really want 16), then it goes back to only receiving the 1st byte. I can find no setting that makes it read both bytes every single time.
Variable Dummy bits: 4 bits
RX Length: 17 (16 plus 1 extra bit; anything from 17 to 23 behaves the same)
Slave data transmitted: 0x04, 0x03
1st Transaction data written to receive buffer: 0x04, 0, 0, 0 (note: 1st byte + 3 extra zero bytes written)
2nd Transaction data written to receive buffer: 0x04, 0x03, 0, 0 (note: 1st + 2nd + 2 extra zero bytes written)
3rd Transaction data written to receive buffer: 0x04, 0, 0, 0 (note: 1st + 3 extra zero bytes written)
4th Transaction data written to receive buffer: 0x04, 0x03, 0, 0 (note: 1st + 2nd + 2 extra zero bytes written)
Etc… - continues to flip back and forth, with both bytes valid only on every other read.
Failed DMA read Transaction for 4 bytes (expected 0x04, 0x03, 0x02, 0x01)
In this case, again only the first byte is correctly read; the second, third, and fourth bytes are not read at all. This is very similar to the case when reading 2 bytes. I believe this holds for reading anything more than a single byte: only the first byte is returned if the exact length is specified.
Variable Dummy bits: 4 bits
RX Length: 32
Slave data transmitted: 0x04, 0x03, 0x02, 0x01
Transaction data written to receive buffer: 0x04, 0, 0, 0 (note: data + 3 extra zero bytes written)
Increasingly weird patterns for DMA read Transaction for 4 bytes (0x04, 0x03, 0x02, 0x01)
Again, if I use the proper RX Length (32 for 4 bytes), then I only receive the 1st byte despite the slave sending all 4 bytes. However, if I increase the RX length by 1 extra bit, then I get another rotating pattern.
Variable Dummy bits: 4 bits
RX Length: 33
Slave data transmitted: 0x04, 0x03, 0x02, 0x01
1st Transaction data written to receive buffer: 0x04, 0, 0, 0 (note: 1st + 3 extra zero bytes written)
2nd Transaction data written to receive buffer: 0x04, 0x03, 0x02, 0x01 (note: all 4 correct bytes written)
3rd Transaction data written to receive buffer: 0x04, 0x03, 0x02, 0 (note: 1, 2, 3 correct + 1 extra zero byte written)
4th Transaction data written to receive buffer: 0x04, 0x03, 0, 0 (note: 1, 2 + 2 extra zero bytes written)
5th Transaction data written to receive buffer: 0x04, 0, 0, 0 (note: 1st + 3 extra zero bytes written)
Etc… - continues to rotate through pattern.
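One simple way to log a rotating pattern like this is to count, for each read, how many leading bytes match what the slave sent. A minimal sketch, with `matching_prefix` being a hypothetical helper name:

```c
#include <stddef.h>
#include <stdint.h>

/* Returns how many leading bytes of got match expected (0..len). */
size_t matching_prefix(const uint8_t *expected, const uint8_t *got, size_t len)
{
    size_t n = 0;
    while (n < len && expected[n] == got[n]) {
        n++;
    }
    return n;
}
```

Applied to the five buffers listed above, the count goes 1, 4, 3, 2, 1, which makes the rotation easy to spot in a log.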
It seems that if you set it up correctly you only ever get the first byte of the data, no matter the size requested. However, if you get it slightly wrong (the one weird trick), then you can get the driver to read all the data, but not consistently. If I had to guess, somewhere there is a circular buffer that is not locked in step with the number of bytes read, or something similar, that produces the odd rotating byte patterns. Yet I can find no combination of settings that makes it work reliably.
Does anyone have any idea what is really going on here? Is this a configuration issue? The code this is a part of is quite large, but I can try to work up an isolated case, or at least show the SPI code if needed. It is pretty standard stuff, and I believe it is correct because I can write data all day long. I just can't read data, which in the end only means setting an rxlength on the transaction.