How to reduce CPU load by tuning I2S DMA buffers

drewbharris
Posts: 1
Joined: Tue Dec 05, 2023 1:55 pm

Postby drewbharris » Tue Dec 05, 2023 2:15 pm

Hello,

I'm building a polyphonic synth and am hitting CPU constraints. I haven't dug into optimizing my DSP yet, as I expected to be able to increase my buffer size to avoid the problem: my current buffer sizes give me really good latency (better than I need, in fact - I have probably 6 ms of latency headroom). I'm using DMA buffer settings of count 2, size 32, and doing my DSP in a loop with a buffer size of 32 samples. I can run 6 voices with my DSP, but when I try 7, the audio breaks up.

I've read enough that I thought just increasing the buffering would solve this problem. If I increase the buffers, to say count 10 and size 32, what I'm seeing is that the CPU is never blocked by i2s_write, the watchdog timer complains, and I'm getting a different periodic audio breakup (it sounds different from the buffer-too-small issue). I haven't profiled my code yet, but I sort of expected that just increasing the DMA buffer sizes would "work". I also tried adjusting my own loop buffer size - BUFFER_LENGTH - but it didn't really seem to make much of a difference. Given that the watchdog timer is firing, it seems like I'm spending 100% of my time in the loop and am never blocked on i2s_write.

Here's the important part of my code:
#define BUFFER_LENGTH (32)
#define NUM_VOICES (8)

void init_i2s() {
  i2s_port_t i2s_port = (i2s_port_t)0;

  static const i2s_config_t i2s_config = {
      .mode = (i2s_mode_t)(I2S_MODE_MASTER | I2S_MODE_TX),
      .sample_rate = 44100,
      .bits_per_sample = (i2s_bits_per_sample_t)16,
      .channel_format = I2S_CHANNEL_FMT_RIGHT_LEFT,
      .communication_format = I2S_COMM_FORMAT_STAND_MSB,
      .intr_alloc_flags = 0, // default interrupt priority
      .dma_buf_count = 2,
      .dma_buf_len = 32,
      .use_apll = true};

  i2s_driver_install(i2s_port, &i2s_config, 0, NULL);

  static const i2s_pin_config_t pin_config = {.bck_io_num = GPIO_NUM_4,
                                              .ws_io_num = GPIO_NUM_5,
                                              .data_out_num = GPIO_NUM_18,
                                              .data_in_num = I2S_PIN_NO_CHANGE};

  i2s_set_pin(i2s_port, &pin_config);
}

void i2s_task(void *data) {
  while (1) {
    int16_t buffer[BUFFER_LENGTH];

    for (unsigned int j = 0; j < BUFFER_LENGTH; j += 2) {
      int16_t summed = 0;

      for (unsigned int k = 0; k < NUM_VOICES; k++) {
        int16_t current_sample = voices[k].Process();
        summed += (current_sample / 5);
      }

      buffer[j] = summed;
      buffer[j + 1] = summed;
    }

    size_t i2s_bytes_write = 0;
    i2s_write((i2s_port_t)0, buffer, sizeof(buffer), &i2s_bytes_write, 100);
  }
}

extern "C" void app_main() {
  init_i2s();
  xTaskCreatePinnedToCore(i2s_task, "i2s_task", 4096, NULL, 19, NULL, 1);
}
What's the best way to approach this problem? Thanks!

ESP_Sprite
Posts: 9729
Joined: Thu Nov 26, 2015 4:08 am

Re: How to reduce CPU load by tuning I2S DMA buffers

Postby ESP_Sprite » Wed Dec 06, 2023 12:38 am

The I2S driver needs to handle an interrupt at the end of every DMA buffer, so rather than increasing the number of buffers, you likely want to increase the size of each buffer.
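
To put rough numbers on that trade-off, here is a small sketch of the arithmetic. It assumes dma_buf_len is counted in stereo frames (one left plus one right sample) and that exactly one interrupt fires per completed buffer; the helper names are made up for illustration and are not part of the I2S driver.

```cpp
#include <cstdint>

// Worst-case output latency in microseconds when every DMA buffer is full.
// Assumption: dma_buf_len counts stereo frames, not bytes.
constexpr uint32_t TotalLatencyMicros(uint32_t sample_rate,
                                      uint32_t dma_buf_count,
                                      uint32_t dma_buf_len) {
  return static_cast<uint32_t>(
      (1000000ULL * dma_buf_count * dma_buf_len) / sample_rate);
}

// End-of-buffer interrupts per second (rounded down).
// Depends only on the buffer length, not on the buffer count.
constexpr uint32_t InterruptsPerSecond(uint32_t sample_rate,
                                       uint32_t dma_buf_len) {
  return sample_rate / dma_buf_len;
}
```

Under those assumptions, at 44100 Hz the settings count 2, size 32 come to about 1.45 ms of buffering and roughly 1378 interrupts per second. Count 10, size 32 raises the buffering to about 7.3 ms but leaves the interrupt rate unchanged, while count 2, size 128 gives about 5.8 ms with only around 344 interrupts per second.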

MicroController
Posts: 1706
Joined: Mon Oct 17, 2022 7:38 pm
Location: Europe, Germany

Re: How to reduce CPU load by tuning I2S DMA buffers

Postby MicroController » Wed Dec 06, 2023 1:32 pm

On another note (no pun :D ),

Code:

int16_t current_sample = voices[k].Process();
summed += (current_sample / 5);
is likely less efficient than it could be: for each of the 44,100 samples per second, this makes one call to every voice's Process(), and the fact that Process() takes no arguments indicates that every voice also has to update its internal state once per sample. You might save a lot of CPU time by changing Process() to AddTo(int32_t* sampleBuffer, uint32_t numSamples), or AddTo(uint32_t currentSampleIndex, int32_t* sampleBuffer, uint32_t numSamples).
And, obviously(?), avoid using floating point operations or, worse, trigonometry functions (looking at you, sin(...)) in sample generation.
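
A minimal sketch of what such a block-based AddTo could look like, using integer-only math. SawVoice and RenderBlock are hypothetical names, and the naive sawtooth is only a stand-in for the real per-voice DSP, which the original post does not show. Dividing the mix by the voice count is likewise an assumption, chosen so the result always fits in int16_t.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical voice: naive sawtooth from a 32-bit phase accumulator.
struct SawVoice {
  uint32_t phase = 0;      // wraps naturally at 2^32
  uint32_t phase_inc = 0;  // freq * 2^32 / sample_rate

  void SetFrequency(uint32_t freq_hz, uint32_t sample_rate) {
    phase_inc = static_cast<uint32_t>(
        (static_cast<uint64_t>(freq_hz) << 32) / sample_rate);
  }

  // Adds num_samples of this voice into a 32-bit mix buffer.
  // State is loaded once and stored once per block, not once per sample.
  void AddTo(int32_t* mix, size_t num_samples) {
    uint32_t p = phase;
    const uint32_t inc = phase_inc;
    for (size_t i = 0; i < num_samples; ++i) {
      // Top 16 bits of the phase, recentred to the signed 16-bit range.
      mix[i] += static_cast<int32_t>(p >> 16) - 32768;
      p += inc;
    }
    phase = p;
  }
};

// Mixes all voices block-wise, then writes interleaved stereo int16 frames.
// mix must hold `frames` entries, out must hold 2 * frames entries.
void RenderBlock(SawVoice* voices, size_t num_voices,
                 int32_t* mix, int16_t* out, size_t frames) {
  for (size_t i = 0; i < frames; ++i) mix[i] = 0;
  for (size_t k = 0; k < num_voices; ++k) voices[k].AddTo(mix, frames);

  const int32_t divisor = num_voices > 0 ? static_cast<int32_t>(num_voices) : 1;
  for (size_t i = 0; i < frames; ++i) {
    const int16_t s = static_cast<int16_t>(mix[i] / divisor);
    out[2 * i] = s;      // left
    out[2 * i + 1] = s;  // right
  }
}
```

The idea is that each voice is called once per block instead of once per sample, so its state stays in registers for the whole inner loop, and the 32-bit mix buffer leaves headroom while summing. The output of RenderBlock could then be handed to i2s_write in place of the per-sample loop.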
