SPI Rx chunk size in DMA mode
-
- Posts: 21
- Joined: Sun May 19, 2024 12:58 pm
SPI Rx chunk size in DMA mode
hi,
looking at the SPI Master driver API, i see the remark "Written by 4 bytes-unit if DMA is used" next to the description of the "void *rx_buffer" structure member.
in practice, if rx_buffer is declared e.g. as 'unsigned char Rx[2]', data is NOT transferred into it from the MISO line, but 'unsigned char Rx[4]' does the trick, even if only a single Rx byte is required.
is it an SPI driver issue or a DMA hardware limitation? for example, STM32 DMA is capable of byte-by-byte transfers.
the point is that there are structures whose fields are laid out exactly in the order the SPI MISO data arrives, so it is only natural to say: "do DMA from SPI into RAM starting at that structure's address and be happy".
but... the structure size is not always a multiple of 4 bytes.
so what now: declare even 1-byte structures with a size that is a multiple of 4 to enable DMA on them? or is an intermediate buffer with a multiple-of-4 size a must for DMA Rx? (btw, does malloc align data in a way that lets DMA work fine even on 1-, 2-, 3-, 5- or 7-byte buffers?)
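The intermediate-buffer option asked about here can be sketched in plain C. This is only a host-side illustration, not driver code: `round_up4`, `sensor_frame_t` and `frame_from_bounce` are hypothetical names, and the 3-byte frame layout is an assumption made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Round a byte count up to the next multiple of 4 (the DMA write unit). */
static size_t round_up4(size_t n)
{
    return (n + 3u) & ~(size_t)3u;
}

/* Hypothetical 3-byte frame, fields in the order the bytes arrive on MISO. */
typedef struct {
    uint8_t status;
    uint8_t value_hi;
    uint8_t value_lo;
} sensor_frame_t;

/* Copy only the bytes that belong to the frame out of a padded bounce buffer.
 * The bounce buffer is expected to hold round_up4(sizeof(sensor_frame_t)) bytes. */
static void frame_from_bounce(sensor_frame_t *dst, const uint8_t *bounce)
{
    assert(dst != NULL && bounce != NULL);
    memcpy(dst, bounce, sizeof(*dst));
}
```

The struct keeps its natural 3-byte size, and only the bounce buffer carries the padding, at the cost of one extra copy per transfer.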
-
- Posts: 9709
- Joined: Thu Nov 26, 2015 4:08 am
Re: SPI Rx chunk size in DMA mode
What I think that remark says is that DMA always writes out 32 bits, so e.g. if you receive 8 bits in a buf[4], whatever was in buf[1..3] will also be overwritten even if only buf[0] contains the actual received data.
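The overwrite described in that remark can be imitated on a host machine. The sketch below is a simulation only: `word_granular_write` is a hypothetical helper, and filling the padding with zeros is an assumption, since real hardware may leave other values in those bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simulates a receiver that can only store whole 32-bit words:
 * rx_bytes valid bytes arrive, but the write always covers a multiple
 * of 4 bytes. The padding bytes are zeroed here (assumption). */
static void word_granular_write(uint8_t *dst, const uint8_t *rx, size_t rx_bytes)
{
    assert(dst != NULL && rx != NULL);
    size_t padded = (rx_bytes + 3u) & ~(size_t)3u;
    memset(dst, 0, padded);      /* the whole word range is touched */
    memcpy(dst, rx, rx_bytes);   /* only the first rx_bytes carry data */
}
```

Receiving one byte into a `buf[4]` through this helper leaves the received byte in `buf[0]` and replaces whatever was in `buf[1..3]`.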
Aside from that, what you say doesn't make much sense to me - the DMA subsystem has no way of knowing how you happened to declare your buffer. Can you post some code to illustrate what you mean?
-
- Posts: 21
- Joined: Sun May 19, 2024 12:58 pm
Re: SPI Rx chunk size in DMA mode
ESP_Sprite wrote: ↑Sun Jun 09, 2024 7:01 am
"What I think that remark says is that DMA always writes out 32 bits, so e.g. if you receive 8 bits in a buf[4], whatever was in buf[1..3] will also be overwritten even if only buf[0] contains the actual received data.
Aside from that, what you say doesn't make much sense to me - the DMA subsystem has no way of knowing how you happened to declare your buffer. Can you post some code to illustrate what you mean?"
well, i try and try, but still cannot reproduce the initial behavior.
and what i see now:
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include "driver/spi_master.h"
#include "esp_log.h"

/**
 * Transmits and receives through SPI. OVERWRITES *data with Rx values,
 * starting from data[0]. REQUIRES room for 4-byte chunks in the data buffer
 * to Rx successfully through DMA (to Rx 1 byte, the buffer must have room for 4).
 * @param spiDevice SPI device handle, preconfigured by spi_bus_add_device(...).
 * @param data      Data buffer. Note the 4-byte chunk size when DMA is used.
 * @param rxSize    Actual number of bytes to receive
 * @param totalSize Total number of bytes to transmit/receive
 * @return SPI IO result (ESP_OK == 0, etc.)
 */
static int SPI_IO(spi_device_handle_t spiDevice, unsigned char *data, unsigned int rxSize, unsigned int totalSize){
    int result;
    spi_transaction_t t;
    memset(&t, 0, sizeof(t));   // Zero-fill the transaction
    t.length = 8 * totalSize;   // lengths are given in bits
    t.tx_buffer = data;
    if(rxSize){
        t.rxlength = 8 * rxSize;
        t.rx_buffer = data;     // the same buffer is used for Tx and Rx
    }
    result = spi_device_polling_transmit(spiDevice, &t);
    return result;
}

/**
 * Read a single register of the BME280
 * @param hwIdx Device index in the BME_Dev[x] array.
 * @param reg   Register address
 * @param data  1-byte variable to read into
 * @return SPI IO result
 */
int BME_reg_read(unsigned char hwIdx, unsigned char reg, unsigned char *data){
    struct {
        unsigned char ioBuffer[4]; // DMA Rx requires 4-byte chunks, may fail!!!
        int8_t result;
    } wc;
    wc.ioBuffer[0] = BME_register(reg, false);
    wc.result = 0xFF;              // reads back as -1; not assigned again below, so the log shows whether it was overwritten
    SPI_IO((spi_device_handle_t)BME_Dev[hwIdx].ioId, wc.ioBuffer, 1, 2);
    ESP_LOGI("SPI Rx test", "Rx buffer: %d %d", wc.ioBuffer[0], wc.ioBuffer[1]);
    ESP_LOGI("SPI Rx test", "Result: %d", wc.result);
    if(wc.result == ESP_OK){
        data[0] = wc.ioBuffer[1];
    }
    return wc.result;
}
Log output:
I (15043) SPI Rx test: Rx buffer: 0 96
I (15043) SPI Rx test: Result: -1
Log output:
I (15043) SPI Rx test: Rx buffer: 0 96
I (15043) SPI Rx test: Result: 0
i.e. as you say, 'wc.result' value gets destroyed by DMA, writing 4-bytes chunk to 'wc.ioBuffer' address.
so, there is still a question remaining: if we malloc(2), is the block DWORD-aligned so DMA can handle it, and is there enough spare space after it to do DMA safely?
and one more question: when we use the same buffer for Tx and Rx, does DMA destroy the remaining 3 bytes of the chunk after transmitting/receiving the first one? e.g. we are transmitting '10 2F EE 04': is it overwritten with the received 4-byte result only after the 4th byte has been transmitted, or does it turn into '00 00 00 00' right after the 1st byte's Tx/Rx?
Re: SPI Rx chunk size in DMA mode
Hi
it makes no sense to use DMA mode when the data length is less than 32 bits.
you will incur unnecessary overhead for DMA initialization, plus DMA always reads/writes memory in 32-bit units.
SPI always works through its internal buffer.
set the flags SPI_TRANS_USE_TXDATA and SPI_TRANS_USE_RXDATA, and the driver itself will align the data internally and give you exactly the number of bytes you requested, without changing the high bytes in the word.
the transaction processing speed will decrease,
the CPU load will not change because everything will be processed in the SPI buffer.
-
- Posts: 21
- Joined: Sun May 19, 2024 12:58 pm
Re: SPI Rx chunk size in DMA mode
ok-home wrote: ↑Sun Jun 09, 2024 12:56 pm
"Hi
it makes no sense to use dma mode when the data length is less than 32 bits
you will incur unnecessary overhead on dma initialization plus dma reads/writes memory always says 32 bits
spi always works through its internal buffer
set the flags SPI_TRANS_USE_TXDATA , SPI_TRANS_USE_RXDATA and the driver itself will align the data internally and give you exactly the number of bytes you requested without changing the high bytes in the word,
the transaction processing speed will decrease,
the CPU load will not change because everything will be processed in the spi buffer"
indeed, thanks
the thing is, it would be great to have a single piece of code for SPI IO of any size, used everywhere in the app: the only place where SPI IO can break and the only place that can contain SPI-related bugs, if any.
i was wondering: if there is an internal buffer, why does the driver write only in 4-byte chunks...
well, that was until i saw this internal-buffer magic in spi_master.h:
struct spi_transaction_t {
    uint32_t flags;
    uint16_t cmd;
    uint64_t addr;
    size_t length;
    size_t rxlength;
    void *user;
    union {
        const void *tx_buffer; ///< Pointer to transmit buffer, or NULL for no MOSI phase
        uint8_t tx_data[4];    ///< If SPI_TRANS_USE_TXDATA is set, data set here is sent directly from this variable.
    };
    union {
        void *rx_buffer;       ///< Pointer to receive buffer, or NULL for no MISO phase. Written by 4 bytes-unit if DMA is used.
        uint8_t rx_data[4];    ///< If SPI_TRANS_USE_RXDATA is set, data is received directly to this variable
    };
};
and why am i not surprised...
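How the two unions make short transfers possible without an external buffer can be shown with a reduced host-side mock. Everything named `mock_` or `MOCK_` below is hypothetical: the struct copies only part of the layout quoted above, and the flag values are placeholders, not the real values of SPI_TRANS_USE_TXDATA and SPI_TRANS_USE_RXDATA.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Placeholder flag values for this host-side mock only. */
#define MOCK_USE_RXDATA (1u << 0)
#define MOCK_USE_TXDATA (1u << 1)

/* Reduced copy of the quoted layout; not the real driver type. */
typedef struct {
    uint32_t flags;
    size_t length;      /* total bits clocked */
    size_t rxlength;    /* bits to receive */
    union {
        const void *tx_buffer;
        uint8_t tx_data[4];   /* shares storage with the pointer */
    };
    union {
        void *rx_buffer;
        uint8_t rx_data[4];   /* shares storage with the pointer */
    };
} mock_transaction_t;

/* Prepare a 2-byte full-duplex register read that carries its data inline. */
static void prepare_inline_read(mock_transaction_t *t, uint8_t reg_cmd)
{
    assert(t != NULL);
    memset(t, 0, sizeof(*t));
    t->flags = MOCK_USE_TXDATA | MOCK_USE_RXDATA;
    t->length = 8 * 2;
    t->rxlength = 8 * 2;
    t->tx_data[0] = reg_cmd;  /* second Tx byte stays 0 as a dummy */
}
```

With the inline members, the four data bytes live inside the transaction object itself, so nothing next to a user buffer can be touched; the price is a 4-byte limit per direction.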
-
- Posts: 9709
- Joined: Thu Nov 26, 2015 4:08 am
Re: SPI Rx chunk size in DMA mode
powerbroker wrote: ↑Sun Jun 09, 2024 11:15 am
"i.e. as you say, 'wc.result' value gets destroyed by DMA, writing 4-bytes chunk to 'wc.ioBuffer' address."
Yeah, that sounds like it; it seems your struct squashes multiple fields into one 32-bit word.
"so, there is a question still remaining: if we malloc(2), is it DWORD-alligned so DMA can handle, and is there enough blank space after it to do DMA safely?"
Yes. Malloc always allocates on a 32-bit boundary, and the size you get back is rounded up to the next multiple of 4 bytes.
"and one more question: when we use same buffer for Tx and Rx, doesn't DMA destroy the remaining 3 bytes in chunk after transmitting/receiving first one, does it? e.g we transmitting '10 2F EE 04' - is it overwritten after 4-th byte Tx with received 4-bytes result, or turns into '00 00 00 00' after 1-st byte Tx/Rx"
I'd generally advise against using the same buffer for Tx and Rx. In practice, DMA has a bunch of FIFOs and buffering, so I think it'd only start writing data out after it has already read up to 16 words or something into its internal buffers, so it would work out, but I don't think we'd guarantee that it works the same way in all future chips.
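For code that would rather not depend on the allocator's rounding, the padding can be requested explicitly. This is a host-side sketch with hypothetical helper names (`is_word_aligned`, `alloc_rx_padded`); on the target, a DMA-capable allocation such as the heap_caps_aligned_alloc call with MALLOC_CAP_DMA mentioned later in the thread would presumably take the place of plain malloc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns 1 when ptr sits on a 4-byte boundary, 0 otherwise. */
static int is_word_aligned(const void *ptr)
{
    return ((uintptr_t)ptr & 3u) == 0;
}

/* Allocate room for n payload bytes, padded to whole 32-bit words so that a
 * word-granular write cannot run past the end of the allocation.
 * The caller releases the block with free(). */
static uint8_t *alloc_rx_padded(size_t n)
{
    size_t padded = (n + 3u) & ~(size_t)3u;
    uint8_t *p = (uint8_t *)malloc(padded ? padded : 4u);
    assert(p == NULL || is_word_aligned(p));
    return p;
}
```

Using separate padded blocks for Tx and Rx also sidesteps the shared-buffer question entirely.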
-
- Posts: 21
- Joined: Sun May 19, 2024 12:58 pm
Re: SPI Rx chunk size in DMA mode
ESP_Sprite wrote: ↑Mon Jun 10, 2024 12:37 am
"Malloc always allocates on a 32-bit boundary and you get a size back that is rounded up to the next 32-bit-aligned byte.
I'd generally advice against using the same buffer for Tx and Rx. In practice, DMA has a bunch of FIFOs and buffering, so I think it'd only start out writing data after it already read up to 16 words or something into its internal buffers, so it would work out, but I don't think we'd guarantee that works the same in all future chips."
thanks: malloc, i.e. allocating an intermediate buffer, is a lifesaver here.
regarding future chips: it would be absolutely terrific to have byte- and word-length capable DMA as well (like e.g. STM32 has), so users don't need to care about all this DWORD alignment and the damage to nearby data caused by the minimal chunk size.
in many cases it saves as much as one intermediate memory copy operation.
-
- Posts: 1692
- Joined: Mon Oct 17, 2022 7:38 pm
- Location: Europe, Germany
Re: SPI Rx chunk size in DMA mode
I am 100% positively not sure what you're actually saying while throwing several unrelated things in. (What do the unions declared inside struct spi_transaction_t have to do with anything?)
Are you saying that the SPI driver writes more data to the user-provided RX memory than the documentation says it should? If so, I guess that'd potentially be a bug in the driver (or documentation), but I am under the impression that the SPI driver will handle all cases correctly, or at least as documented. Can't tell though because, well, your posts are confusing and lack essential information (e.g. whether you're using half- or full-duplex mode).
If you want a certain memory alignment, you should explicitly ask for it, e.g.
a) uint8_t my_buffer[BUF_SIZE] __attribute__((aligned( 4 )))
b) uint8_t* my_buffer = (uint8_t*) aligned_alloc(4, BUF_SIZE)
c) uint8_t* my_buffer = (uint8_t*) heap_caps_aligned_alloc(4, BUF_SIZE, MALLOC_CAP_DMA)
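Applied to a struct like the `wc` one posted earlier, explicit alignment plus padding might look as follows. This is a sketch in standard C11 (`alignas` instead of the GCC attribute); `reg_read_ctx_t`, `RX_BYTES` and `RX_PADDED` are hypothetical names introduced for the example.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define RX_BYTES  2u                        /* payload bytes actually wanted */
#define RX_PADDED ((RX_BYTES + 3u) & ~3u)   /* rounded up to whole words */

typedef struct {
    alignas(4) uint8_t io[RX_PADDED];  /* word-aligned, whole word reserved */
    int8_t result;                     /* starts after the reserved word */
} reg_read_ctx_t;

/* Compile-time checks: a 4-byte write at io cannot reach result. */
static_assert(offsetof(reg_read_ctx_t, result) >= 4, "result overlaps Rx word");
static_assert(alignof(reg_read_ctx_t) >= 4, "struct is not word-aligned");
```

The static assertions turn the layout assumption into a build error instead of a silently overwritten field.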
Re: SPI Rx chunk size in DMA mode
Are you replying to what ESP_Sprite says? I think he was quite clear.
Anyway, I don't think it makes sense to start up DMA for just two (or four) bytes. The overhead will be huge and most of the handling will be done by the SPI module anyway. I remember having seen a table in the documentation with break-even points for the amount of data between handling the data directly from/to the module and using DMA, and I also remember the break-even point wasn't very small, something around 128 bytes.
Also, ESP_Sprite already said that malloc (and its variants) returns aligned memory blocks. This is a requirement of malloc by design: it will always return an address aligned to the largest alignment requirement of the processor.
If you want a static or automatic (stack) variable to be aligned, you should indeed use the alignment attributes OR use the dirty trick of declaring a "large" type (like uint32_t) and casting it to a char or small char array [sizeof(uint32_t)].
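A variant of that trick which avoids the cast is a union with a uint32_t member. This is a swapped-in technique rather than the exact one described above, and `word_buf_t` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* The uint32_t member gives the union 4-byte alignment and a size of one
 * whole word; the byte array is the view used for the actual data. */
typedef union {
    uint32_t word;                    /* present only to force alignment */
    uint8_t  bytes[sizeof(uint32_t)];
} word_buf_t;

static_assert(sizeof(word_buf_t) == sizeof(uint32_t), "unexpected padding");
```

A `word_buf_t` declared as a static or stack variable can then be handed out through its `bytes` member wherever a 4-byte, word-aligned receive area is needed.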
-
- Posts: 21
- Joined: Sun May 19, 2024 12:58 pm
Re: SPI Rx chunk size in DMA mode
MicroController wrote: ↑Mon Jun 10, 2024 10:37 am
"I am 100% positively not sure what you're actually saying while throwing several unrelated things in. (What do the unions declared inside struct spi_transaction_t have to do with anything?)
Are you saying that the SPI driver writes more data to the user-provided RX memory than what documentation says it should? If so, I guess that'd potentially be a bug in the driver (or documentation), but I am under the impression that the SPI driver will handle all cases correctly, or at least as doumented. Can't tell though because, well, your posts are confusing and lacking essential information. (E.g. if you're using half- or full-duplex mode.)
If you want a certain memory alignment, you should explicitly ask for it, e.g.
a) uint8_t my_buffer[BUF_SIZE] __attribute__((aligned( 4 )))
b) uint8_t* my_buffer = (uint8_t*) aligned_alloc(4, BUF_SIZE)
c) uint8_t* my_buffer = (uint8_t*) heap_caps_aligned_alloc(4, BUF_SIZE, MALLOC_CAP_DMA)"
about these unions inside spi_transaction_t: our colleague mentioned some intermediate buffer(s) that the SPI driver uses when the user-provided ones don't fit dword alignment. well... if the driver uses e.g. the storage of the rx_buffer pointer as a 4-byte data buffer, it most likely relies on structure member alignment, and most likely doesn't care whether the user aligned the buffer area in the rx_buffer-as-pointer case (AFAIK there is a DMA failure code indicating buffer misalignment, so it's align-it-yourself again).
the only kind of alignment i can imagine here is dropping the data received during the transmission stage: e.g. when the SPI master transmits a 1-byte address or command and the slave starts responding at the second byte, the first received byte is meaningless and the data can be shifted one byte to the left. that is not what happens here either.
let us check the correctness of the SPI driver behavior here together: 13984.
in the case of 9-byte full-duplex I/O, only 8 bytes of the user rx buffer are populated. i could believe this is correct behavior, but the hardware DOES clock 9 bytes (the scope has only 2 channels, so in full-duplex there is no MOSI line on screen, sorry): so where is the 9th byte, which is processed by the hardware too?
i think it's getting lost somewhere inside static inline void spi_ll_read_buffer(...) of spi_ll.h, near its for(...) loop, isn't it?
and all these explicit alignments are really useful, thanks
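For the 9-byte case raised above, the word arithmetic alone can be written down on a host. This sketch is not the driver's code and makes no claim about where the byte actually goes missing; `words_for_bits` and `copy_bits_from_words` are hypothetical helpers that only show a 72-bit transfer spanning three 32-bit words, with one valid byte in the last word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Number of 32-bit words needed to hold bitlen bits, rounding up. */
static size_t words_for_bits(size_t bitlen)
{
    return (bitlen + 31u) / 32u;
}

/* Copy bitlen bits from a word-organised source into a byte buffer,
 * taking only the valid bytes from the last, partially filled word. */
static void copy_bits_from_words(uint8_t *dst, const uint32_t *src, size_t bitlen)
{
    assert(dst != NULL && src != NULL);
    size_t bytes = (bitlen + 7u) / 8u;
    size_t words = words_for_bits(bitlen);
    for (size_t i = 0; i < words; i++) {
        size_t remaining = bytes - i * 4u;
        size_t chunk = remaining < 4u ? remaining : 4u;
        uint32_t w = src[i];
        memcpy(dst + i * 4u, &w, chunk);  /* never writes past bytes */
    }
}
```

A loop bound computed with truncating division (72 / 32 = 2) would stop after 8 bytes, which is one way a tail byte can disappear in code of this shape.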