ESP_Sprite wrote: Thu Mar 31, 2022 1:17 am
Cache is described in section 3 of the TRM, but here's the Cliffs Notes: there's a certain amount (configurable; I think 32 or 16 KB) of cache between the CPU and the PSRAM/flash. The idea is that if you access a byte in PSRAM, the cache loads not only that byte but an entire cache line (32 or 64 bytes) into its RAM. If you then need to modify this byte, re-read it, or read the bytes next to it (e.g. in a memcpy loop), the cache can handle these requests and you don't need to wait for the relatively slow flash or PSRAM bus.
OK, I read section 3. I set the instruction cache to 32 KB, 8 ways, 32-byte lines, and the data cache to 64 KB, 8 ways, 64-byte lines.
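To make the cache-line behaviour concrete, here is a minimal timing sketch (not from the original exchange; the buffer size and function names are mine). It assumes an ESP32-S3 target with PSRAM enabled and the 64-byte data-cache line configured above:
[code]
/* Minimal sketch (assumptions: ESP32-S3, PSRAM enabled, 64-byte data-cache
 * lines as configured above). Sequential reads mostly hit the line the
 * cache already fetched; reads strided by the line size miss every time,
 * so both passes cause a similar number of PSRAM fetches even though the
 * first one reads 64x as many bytes. */
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>
#include "esp_heap_caps.h"
#include "esp_timer.h"

#define BUF_SIZE (256 * 1024) /* several times the 64 KB data cache */

static void time_reads(const volatile uint8_t *buf, size_t stride)
{
    volatile uint32_t sum = 0;
    int64_t t0 = esp_timer_get_time();
    for (size_t i = 0; i < BUF_SIZE; i += stride) {
        sum += buf[i]; /* each access either hits a cached line or fetches one */
    }
    int64_t t1 = esp_timer_get_time();
    printf("stride %u: %ld us\n", (unsigned)stride, (long)(t1 - t0));
}

void cache_demo(void)
{
    uint8_t *buf = heap_caps_malloc(BUF_SIZE, MALLOC_CAP_SPIRAM);
    if (buf == NULL) {
        return;
    }
    time_reads(buf, 1);  /* 256 K reads, roughly 1 miss per 64 bytes */
    time_reads(buf, 64); /* 4 K reads, every one a miss */
    heap_caps_free(buf);
}
[/code]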
As cache is not of unlimited size, if it's full and you want to read new data, it needs to 'push out' old data. For flash data (which is seen as read-only), the cache simply throws it away, but for PSRAM data that has been modified (the term is that the cache line is 'dirty'), it needs to write back the cache line to PSRAM entirely.
OK.
Now, this mostly works entirely transparently; if all you use is the CPU to access PSRAM, you never have to worry about the cache. While there may be a discrepancy between what the CPU sees and what is actually in the PSRAM, the CPU can't access the PSRAM directly anyway, so who cares, right?
OK.
This changes when you start doing DMA from PSRAM. DMA accesses PSRAM directly, not through the cache. While this may initially sound stupid, there's a good reason for it: for instance, if you were to use PSRAM to store audio data to send to an I2S device and that access happened through the cache, the wave data would need to be read into the cache only to be used exactly once, so the cache doesn't give you any real benefit... but at the same time, it would 'push out' data that probably would be useful to have in cache, so the net performance would suffer a lot.
OK.
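As an aside on setting such a buffer up: since the cache operations discussed below work on whole cache lines, it's common to align a DMA buffer in PSRAM to the cache-line size so that a cache operation on the buffer can never touch unrelated data sharing a line with it. A sketch, with names of my own choosing and assuming the 64-byte line size from above:
[code]
/* Sketch (not from the thread): allocate a PSRAM buffer for DMA to read,
 * aligned and padded to the data-cache line size so that later cache
 * writebacks cover exactly this buffer and nothing that happens to share
 * a cache line with it. Assumes the 64-byte line size configured above. */
#include <stdint.h>
#include <stddef.h>
#include "esp_heap_caps.h"

#define DCACHE_LINE 64

static uint8_t *alloc_psram_dma_buf(size_t len)
{
    /* Round the length up to a whole number of cache lines. */
    size_t padded = (len + DCACHE_LINE - 1) & ~(size_t)(DCACHE_LINE - 1);
    return heap_caps_aligned_alloc(DCACHE_LINE, padded, MALLOC_CAP_SPIRAM);
}
[/code]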
So what you probably ran into is the following: you write a chunk of data that is many times the size of the cache. This means the cache fills up a few times, and once more than the size of the cache has been written, every new write triggers a writeback of old data. That happens all the way up to the final byte, at which point everything except what's currently in the cache (the last 5%, as you said?) has been written back. Then you start up DMA, and all the data is read correctly... except for the last 5%, as it's still hanging around in the cache and never made it back to the PSRAM.
The problem is not quite that 5%; I already reduced the size of the buffer and it even got worse. I also decreased the clock of the i8080 bus to 1 MHz and the problem persists.
So, as I said, Cache_WriteBack_Addr solves this, as it forces a writeback of all data in a certain range. You call it with the starting address and the size of the buffer you wrote, and then the cache goes to work and sees if there's anything in that range that needs to be written back. From what I can tell, the call is blocking, so after it returns you can be sure the data made it to the PSRAM.
I am calling Cache_WriteBack_Addr() on the entire buffer after I write to it.
Are you sure this function is blocking?
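For reference, the sequence being described boils down to something like the sketch below. The function and its parameters are mine, not from the thread; it assumes an ESP32-S3, where the prototype comes from the ROM header esp32s3/rom/cache.h:
[code]
/* Sketch of the write-then-writeback-then-DMA sequence described above.
 * Assumes an ESP32-S3; Cache_WriteBack_Addr() is a ROM function whose
 * prototype lives in the header below and may differ on other chips. */
#include <stdint.h>
#include <stddef.h>
#include <string.h>
#include "esp32s3/rom/cache.h"

void send_frame(uint8_t *psram_buf, const uint8_t *pixels, size_t len)
{
    /* 1. CPU writes go through the cache; some lines stay 'dirty'. */
    memcpy(psram_buf, pixels, len);

    /* 2. Force every dirty line in this range out to PSRAM. After the
     *    call returns, PSRAM actually holds what the CPU wrote. */
    Cache_WriteBack_Addr((uint32_t)psram_buf, len);

    /* 3. Only now kick off the DMA transfer that reads psram_buf
     *    directly, e.g. the i8080 LCD transfer from this thread
     *    (not shown here). */
}
[/code]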
The reason that all the calls in that file are marked as 'do not use in the SDK' is that caching functions generally are a bit finicky with regard to interrupts and multicore operation. While writeback is pretty innocuous, you also have functions like 'clean' (act as if a cache line has never been written to) and 'invalidate' (throw away a cache line without writing it back). You can probably imagine that those ops can lead to data loss if one core does them while the other core is working on something in the same memory region. We still need to figure out how we can make these caching functions more accessible (as they're pretty useful if you want to speed up operations). For now, feel free to assume at least the writeback function is safe.
OK. I am using this function.
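For completeness, the 'invalidate' direction looks something like the sketch below, purely as an illustration of what the operation does given the warning above. The function name comes from the same ROM header, and it assumes an ESP32-S3 with nothing else touching this memory range in the meantime:
[code]
/* Illustration only, given the caveats above: the opposite direction.
 * When DMA has *written* into a PSRAM buffer, stale copies of that range
 * may still sit in the cache, so they must be invalidated before the CPU
 * reads. Assumes an ESP32-S3 and that no other core or interrupt is
 * working on this memory region at the same time. */
#include <stdint.h>
#include <stddef.h>
#include "esp32s3/rom/cache.h"

void read_dma_result(const uint8_t *psram_buf, size_t len)
{
    /* Throw away any cached (now stale) lines covering the buffer... */
    Cache_Invalidate_Addr((uint32_t)psram_buf, len);

    /* ...so these reads fetch the fresh, DMA-written data from PSRAM. */
    for (size_t i = 0; i < len; i++) {
        (void)psram_buf[i]; /* process the fresh byte here */
    }
}
[/code]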
Does that make it understandable why you're getting the results you're getting?
Yes. Thank you for the explanation.