PSRAM caching, CPU / DMA etc.
I use an ESP32-S3 for audio processing and wonder whether I can improve performance. As far as I can tell, the bottleneck right now is the caching mechanism for PSRAM. Can I do anything about it from within Arduino? Can I turn caching off and handle all the memory management in the application myself, since I operate block-wise anyway? Or is that not possible at all? I have read through the Espressif docs, but much remains unclear.
Re: PSRAM caching, CPU / DMA etc.
Minimize the use of dynamic memory allocation (e.g., malloc, free) and prefer static allocation wherever possible.
Re: PSRAM caching, CPU / DMA etc.
I allocate the buffers using malloc(), but only once during global init, so I doubt that this is the issue. After that I use pointer arithmetic to access the arrays. The point is that when I use heap_caps_malloc(BUF_SIZE_BYTES, MALLOC_CAP_INTERNAL), it works fast enough, but this obviously limits the size of the buffers and therefore the data block size, which makes the peripherals less efficient. Initially I wanted to use MALLOC_CAP_SPIRAM, but it seems that OPI PSRAM is not fast enough. To confirm this conclusion, I wanted to test whether tuning the cache could improve my setup.
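For reference, the two allocation variants described above could be combined roughly like this: a large sample store requested with MALLOC_CAP_SPIRAM and a small working block requested with MALLOC_CAP_INTERNAL. This is only a sketch, not the actual project code. The struct and function names (audio_buffers_t, audio_buffers_init, audio_buffers_free) and the macros are hypothetical, and the #else branch is a plain malloc() stand-in so the snippet also compiles off-target.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#ifdef ESP_PLATFORM
#include "esp_heap_caps.h"
/* On the ESP32-S3: bulk data goes to PSRAM, the working block to internal RAM. */
#define ALLOC_BULK(n) heap_caps_malloc((n), MALLOC_CAP_SPIRAM | MALLOC_CAP_8BIT)
#define ALLOC_FAST(n) heap_caps_malloc((n), MALLOC_CAP_INTERNAL | MALLOC_CAP_8BIT)
#define FREE_BUF(p)   heap_caps_free(p)
#else
/* Host stand-in (assumption): no capability-based heap, so use plain malloc/free. */
#define ALLOC_BULK(n) malloc(n)
#define ALLOC_FAST(n) malloc(n)
#define FREE_BUF(p)   free(p)
#endif

/* Hypothetical container for the two buffers. */
typedef struct {
    int16_t *bulk;        /* large sample store (PSRAM on the device) */
    int16_t *work;        /* small working block (internal RAM on the device) */
    size_t bulk_samples;
    size_t work_samples;
} audio_buffers_t;

/* Allocate both buffers once at init. Returns 0 on success, -1 on failure. */
int audio_buffers_init(audio_buffers_t *b, size_t bulk_samples, size_t work_samples)
{
    b->bulk = (int16_t *)ALLOC_BULK(bulk_samples * sizeof(int16_t));
    b->work = (int16_t *)ALLOC_FAST(work_samples * sizeof(int16_t));
    if (b->bulk == NULL || b->work == NULL) {
        /* Release whichever allocation succeeded; freeing NULL is a no-op. */
        FREE_BUF(b->bulk);
        FREE_BUF(b->work);
        b->bulk = NULL;
        b->work = NULL;
        b->bulk_samples = 0;
        b->work_samples = 0;
        return -1;
    }
    b->bulk_samples = bulk_samples;
    b->work_samples = work_samples;
    return 0;
}

/* Release both buffers and reset the struct. */
void audio_buffers_free(audio_buffers_t *b)
{
    FREE_BUF(b->bulk);
    FREE_BUF(b->work);
    b->bulk = NULL;
    b->work = NULL;
    b->bulk_samples = 0;
    b->work_samples = 0;
}
```

With this split, only the small working block has to fit into internal RAM, while the total amount of buffered audio is limited by PSRAM.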
Re: PSRAM caching, CPU / DMA etc.
AFAIK, the caching is part of the memory mapping, which is how the PSRAM is addressed. If you want to do your own caching, I'm afraid you will have to micro-manage it yourself.
Re: PSRAM caching, CPU / DMA etc.
Some ideas:
1) Try and refactor your algorithm to make better use of the cache, i.e. fully process one block of data in one go, then head to the next block, so that ideally each block only needs to be loaded/stored from/to PSRAM once.
2) See if you can limit the size of data blocks so that a whole block fits into the cache.
3) Use DMA to transfer one block from PSRAM to internal RAM or vice versa while the CPU is processing another block. Alternatively, check if you can leverage prefetching (see e.g. here) w/o DMA.
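The three ideas above can be combined into a ping-pong scheme, sketched below under a few assumptions. BLOCK_SAMPLES, process_block, fetch_block, store_block and process_stream are hypothetical names, and the per-block processing is a placeholder. The transfers are plain memcpy() calls, so nothing actually overlaps here; on the device, fetch_block and store_block are the places where an asynchronous DMA copy could be started instead, with a wait before the buffer is reused.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SAMPLES 256 /* hypothetical block size, small enough for internal RAM */

/* Placeholder processing step (assumption): halve the amplitude of each sample. */
static void process_block(int16_t *blk, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        blk[i] = (int16_t)(blk[i] / 2);
    }
}

/* Synchronous stand-ins for the PSRAM <-> internal RAM transfers. */
static void fetch_block(int16_t *dst, const int16_t *src, size_t n)
{
    memcpy(dst, src, n * sizeof(int16_t));
}

static void store_block(int16_t *dst, const int16_t *src, size_t n)
{
    memcpy(dst, src, n * sizeof(int16_t));
}

/* Number of samples in the block starting at offset off. */
static size_t block_len(size_t total, size_t off)
{
    size_t left = total - off;
    return left < BLOCK_SAMPLES ? left : BLOCK_SAMPLES;
}

/*
 * Process `total` samples of `bulk` (PSRAM on the device) in place, using two
 * working blocks `work[0]` and `work[1]` (internal RAM on the device).
 * Each block is loaded once, fully processed, and stored once.
 */
void process_stream(int16_t *bulk, size_t total, int16_t work[2][BLOCK_SAMPLES])
{
    size_t nblocks = (total + BLOCK_SAMPLES - 1) / BLOCK_SAMPLES;
    if (nblocks == 0) {
        return;
    }

    /* Prime the pipeline with the first block. */
    fetch_block(work[0], bulk, block_len(total, 0));

    for (size_t k = 0; k < nblocks; k++) {
        size_t off = k * BLOCK_SAMPLES;
        size_t cur = k & 1; /* which working block holds block k */

        /* Start loading block k+1 into the other working block.
         * With an asynchronous transfer this would run while block k is processed. */
        if (k + 1 < nblocks) {
            size_t next_off = off + BLOCK_SAMPLES;
            fetch_block(work[cur ^ 1], bulk + next_off, block_len(total, next_off));
        }

        process_block(work[cur], block_len(total, off));

        /* Write the finished block back. With an asynchronous store, this must
         * complete before work[cur] is refilled two iterations later. */
        store_block(bulk + off, work[cur], block_len(total, off));
    }
}
```

Whether this beats simply letting the cache do its work depends on the algorithm's access pattern and on how much time the CPU spends per block, so it is worth measuring both variants.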