ESP32-S3 - esp_async_memcpy not working with PSRAM using GDMA

NZ Gangsta
Posts: 16
Joined: Wed Jul 20, 2022 8:32 am

Re: ESP32-S3 - esp_async_memcpy not working with PSRAM using GDMA

Postby NZ Gangsta » Tue Nov 21, 2023 9:49 pm

MicroController wrote:
Tue Nov 21, 2023 3:17 pm
useful benchmark of different memcpy techniques.
For anyone who might find this interesting, the code we built for this is on GitHub here and the benchmarks are below. It works with an ESP32-S3-WROOM-1U-N8R8 (Octal SPI PSRAM). It should work with any ESP32-S3 configuration that has external RAM, but the PSRAM results will depend on the width of the SPI bus to the RAM.
I (394) Memory Copy:

memory copy version 1.

I (404) Memory Copy: Allocating 2 x 100kb in IRAM, alignment: 32 bytes
I (464) Memory Copy: 8-bit for loop copy IRAM->IRAM took 819922 CPU cycles = 28.59 MB/s
I (514) Memory Copy: 16-bit for loop copy IRAM->IRAM took 205776 CPU cycles = 113.90 MB/s
I (564) Memory Copy: 32-bit for loop copy IRAM->IRAM took 103383 CPU cycles = 226.71 MB/s
I (614) Memory Copy: 64-bit for loop copy IRAM->IRAM took 77682 CPU cycles = 301.71 MB/s
I (664) Memory Copy: memcpy IRAM->IRAM took 64323 CPU cycles = 364.37 MB/s
I (714) Memory Copy: async_memcpy IRAM->IRAM took 408520 CPU cycles = 57.37 MB/s
I (764) Memory Copy: PIE 128-bit (16 byte loop) IRAM->IRAM took 19498 CPU cycles = 1202.05 MB/s
I (814) Memory Copy: PIE 128-bit (32 byte loop) IRAM->IRAM took 13095 CPU cycles = 1789.81 MB/s
I (864) Memory Copy: DSP AES3 IRAM->IRAM took 15813 CPU cycles = 1482.17 MB/s

I (914) Memory Copy: Freeing 100kb from IRAM
I (914) Memory Copy: Allocating 100kb in PSRAM, alignment: 32 bytes
I (964) Memory Copy: 8-bit for loop copy IRAM->PSRAM took 1075498 CPU cycles = 21.79 MB/s
I (1014) Memory Copy: 16-bit for loop copy IRAM->PSRAM took 461778 CPU cycles = 50.75 MB/s
I (1064) Memory Copy: 32-bit for loop copy IRAM->PSRAM took 404325 CPU cycles = 57.97 MB/s
I (1114) Memory Copy: 64-bit for loop copy IRAM->PSRAM took 413871 CPU cycles = 56.63 MB/s
I (1164) Memory Copy: memcpy IRAM->PSRAM took 413294 CPU cycles = 56.71 MB/s
I (1214) Memory Copy: async_memcpy IRAM->PSRAM took 465457 CPU cycles = 50.35 MB/s
I (1264) Memory Copy: PIE 128-bit (16 byte loop) IRAM->PSRAM took 403440 CPU cycles = 58.09 MB/s
I (1314) Memory Copy: PIE 128-bit (32 byte loop) IRAM->PSRAM took 403638 CPU cycles = 58.07 MB/s
I (1364) Memory Copy: DSP AES3 IRAM->PSRAM took 405830 CPU cycles = 57.75 MB/s

I (1414) Memory Copy: Swapping source and destination buffers
I (1464) Memory Copy: 8-bit for loop copy PSRAM->IRAM took 1037131 CPU cycles = 22.60 MB/s
I (1514) Memory Copy: 16-bit for loop copy PSRAM->IRAM took 621710 CPU cycles = 37.70 MB/s
I (1564) Memory Copy: 32-bit for loop copy PSRAM->IRAM took 603621 CPU cycles = 38.83 MB/s
I (1614) Memory Copy: 64-bit for loop copy PSRAM->IRAM took 603466 CPU cycles = 38.84 MB/s
I (1664) Memory Copy: memcpy PSRAM->IRAM took 605957 CPU cycles = 38.68 MB/s
I (1714) Memory Copy: async_memcpy PSRAM->IRAM took 447733 CPU cycles = 52.35 MB/s
I (1764) Memory Copy: PIE 128-bit (16 byte loop) PSRAM->IRAM took 605494 CPU cycles = 38.71 MB/s
I (1814) Memory Copy: PIE 128-bit (32 byte loop) PSRAM->IRAM took 605790 CPU cycles = 38.69 MB/s
I (1864) Memory Copy: DSP AES3 PSRAM->IRAM took 607982 CPU cycles = 38.55 MB/s

I (1914) Memory Copy: Freeing 100kb from IRAM
I (1914) Memory Copy: Allocating 100kb in PSRAM, alignment: 32 bytes
I (1974) Memory Copy: 8-bit for loop copy PSRAM->PSRAM took 1412578 CPU cycles = 16.59 MB/s
I (2034) Memory Copy: 16-bit for loop copy PSRAM->PSRAM took 1052370 CPU cycles = 22.27 MB/s
I (2094) Memory Copy: 32-bit for loop copy PSRAM->PSRAM took 1046370 CPU cycles = 22.40 MB/s
I (2154) Memory Copy: 64-bit for loop copy PSRAM->PSRAM took 1046215 CPU cycles = 22.40 MB/s
I (2214) Memory Copy: memcpy PSRAM->PSRAM took 1045637 CPU cycles = 22.41 MB/s
I (2274) Memory Copy: async_memcpy PSRAM->PSRAM took 887275 CPU cycles = 26.42 MB/s
I (2334) Memory Copy: PIE 128-bit (16 byte loop) PSRAM->PSRAM took 1054866 CPU cycles = 22.22 MB/s
I (2394) Memory Copy: PIE 128-bit (32 byte loop) PSRAM->PSRAM took 1053534 CPU cycles = 22.25 MB/s
I (2454) Memory Copy: DSP AES3 PSRAM->PSRAM took 1055945 CPU cycles = 22.20 MB/s
I (2504) main_task: Returned from app_main()
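
In case it helps with reading the numbers: the MB/s column appears to follow from the cycle counts if you assume a 240 MHz CPU clock, 100 KiB (102400-byte) buffers, and "MB" meaning 1024 x 1024 bytes. None of those three are stated in the log, so treat them as assumptions that happen to reproduce the logged figures. A minimal sketch of the conversion (the function name is hypothetical and is not taken from the benchmark code):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: convert a CPU cycle count into throughput.
 * Assumption: "MB" in the log means MiB (1024 * 1024 bytes).
 */
static double throughput_mib_per_s(uint32_t bytes, uint32_t cycles, uint32_t cpu_hz)
{
    /* A zero cycle count or clock rate would make the result meaningless. */
    assert(cycles > 0 && cpu_hz > 0);

    double seconds = (double)cycles / (double)cpu_hz;  /* elapsed time */
    double mib = (double)bytes / (1024.0 * 1024.0);    /* data copied, in MiB */
    return mib / seconds;
}
```

With those assumptions, throughput_mib_per_s(100 * 1024, 64323, 240000000) comes out at about 364.37, which matches the memcpy IRAM->IRAM line.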

DrMickeyLauer
Posts: 163
Joined: Sun May 22, 2022 2:42 pm

Re: ESP32-S3 - esp_async_memcpy not working with PSRAM using GDMA

Postby DrMickeyLauer » Thu Mar 07, 2024 7:22 am

@MicroController: Now this is a crazy thread with lots of deep Xtensa knowledge. I must confess I didn't get all of it, but I would really love to see your C++ SIMD library when it's ready! I love the integrated ASM approach you're using. Good speed.
