ESP32 Forum

Posted: **Tue Aug 27, 2019 6:50 am**

I'd like to get feedback on an attempt to pull audio from a single MEMS I2S microphone on one channel and run it through the ESP-DSP FFT library...

The MEMS microphone produces I2S digital audio: 32-bit words per channel, with 24 bits of MSB-aligned data per word, of which 18 bits are significant. The data line is tri-stated during the unconnected channel's slot; adding a pull-down resistor to the data line clears up the unconnected channel, which makes it easier to tell which channel is which.
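To make the word layout concrete, here is a minimal sketch of pulling the 18 significant bits out of one raw 32-bit word. The function name `decode_sample18` is hypothetical, and the sketch assumes the compiler converts `uint32_t` to `int32_t` modularly and uses an arithmetic right shift on signed values (as GCC does).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: extract the signed 18-bit sample from one raw I2S word.
// Layout assumed from the description above: data is MSB-aligned, so the
// 18 significant bits sit in bits 31..14 of the 32-bit word.
static std::int32_t decode_sample18(std::uint32_t raw_word) {
    // Reinterpret as signed first so the right shift sign-extends.
    const std::int32_t value = static_cast<std::int32_t>(raw_word) >> 14;
    // An 18-bit two's complement value must lie in [-2^17, 2^17 - 1].
    assert(value >= -131072 && value <= 131071);
    return value;
}
```

Note that the result spans 18 bits, so it does not fit in an `int16_t` without a further shift or clipping.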

Here's the I2S config:

const i2s_config_t i2s_config = {
                .mode = i2s_mode_t( I2S_MODE_MASTER | I2S_MODE_RX ),    // Receive as master
                .sample_rate = 22050,                                   // 22.05 kHz sampling gives an 11.025 kHz maximum audio frequency (Nyquist)
                .bits_per_sample = I2S_BITS_PER_SAMPLE_32BIT,           // 32 clock pulses per channel
                .channel_format = I2S_CHANNEL_FMT_ONLY_LEFT,            // Only capture the left channel
                .communication_format = i2s_comm_format_t( I2S_COMM_FORMAT_I2S | I2S_COMM_FORMAT_I2S_MSB ), // Philips format
                .intr_alloc_flags = ESP_INTR_FLAG_LEVEL1,               // Interrupt level 1
                .dma_buf_count = 9,                                     // (9-1)*256 = 2048 samples per read (1 buffer in use and unavailable during read)
                .dma_buf_len = 256,
                .use_apll = false,
                .tx_desc_auto_clear = false,
                .fixed_mclk = 0 };

The I2S appears to be behaving as one would expect, with values increasing in amplitude with sound pressure. I read the uint32_t values, right-shift them by 14 bits, and cast them to an int with a 16-bit range. I therefore decided to try the ESP-DSP sc16 FFT, following the logic of the ESP-DSP 32-bit float FFT basic_math demo. My FFT code looks like this:
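One point that may be worth checking is which 16 bits of each word end up in the FFT input. The sketch below is only an illustration under stated assumptions: the helper names are hypothetical, and it assumes a little-endian core such as the ESP32, where reading a 32-bit word through a `uint16_t*` returns the low half of the word rather than the MSB-aligned audio bits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: keep the top 16 bits of the MSB-aligned word.
// Assumes modular unsigned-to-signed conversion and arithmetic right shift.
static std::int16_t high_half(std::uint32_t raw_word) {
    return static_cast<std::int16_t>(static_cast<std::int32_t>(raw_word) >> 16);
}

// What a uint16_t* read of the same word yields on a little-endian machine:
// bits 15..0, which hold at most the two lowest significant bits plus padding.
static std::uint16_t low_half(std::uint32_t raw_word) {
    return static_cast<std::uint16_t>(raw_word & 0xFFFFu);
}

// Hypothetical helper: narrow a block of raw words to signed 16-bit samples.
static void narrow_to_int16(const std::uint32_t* raw, std::int16_t* out, std::size_t n) {
    assert(raw != nullptr && out != nullptr);
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = high_half(raw[i]);
    }
}
```

If the low half is used, the values fed to the FFT would barely track the sound level, since almost all of the audio information sits in the upper bits.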

            // Create the complex vector
            for ( int i = 0; i < 2048; i++ )
            {
                int val = ( *(uint16_t*) &m_samples [ i ] );  // m_samples are the raw 32-bit words
                y_cf [ i * 2 + 0 ] = val;    // Real part; no windowing, although the basic_math demo applies one
                y_cf [ i * 2 + 1 ] = 0;      // Imaginary part
            }

            dsps_fft2r_sc16( data, length );          // The FFT calculation
            dsps_bit_rev_sc16_ansi( data, length );   // Bit reverse, as used in the 32-bit float demo; I do not know what this does or why
            dsps_cplx2reC_sc16( data, length );       // Convert one complex vector to two complex vectors; I do not know what this does or why
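On the bit-reverse step: a radix-2 FFT of this kind typically leaves its output bins in bit-reversed index order, and the bit-reverse call puts them back into natural order. The following is a plain C++ sketch of that permutation for interleaved (re, im) data. It is not the ESP-DSP implementation, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Reverse the lowest `bits` bits of `index`, e.g. with 3 bits: 0b001 -> 0b100.
static std::size_t reverse_bits(std::size_t index, unsigned bits) {
    std::size_t reversed = 0;
    for (unsigned b = 0; b < bits; ++b) {
        reversed = (reversed << 1) | (index & 1u);
        index >>= 1;
    }
    return reversed;
}

// Reorder n interleaved complex values (re, im, re, im, ...) in place so that
// the element at index i swaps with the element at the bit-reversed index.
static void bit_reverse_complex(std::vector<std::int16_t>& data, std::size_t n) {
    assert(n > 0 && (n & (n - 1)) == 0);  // n must be a power of two
    assert(data.size() == 2 * n);
    unsigned bits = 0;
    while ((static_cast<std::size_t>(1) << bits) < n) {
        ++bits;
    }
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t j = reverse_bits(i, bits);
        if (j > i) {  // swap each pair only once
            std::swap(data[2 * i], data[2 * j]);
            std::swap(data[2 * i + 1], data[2 * j + 1]);
        }
    }
}
```

For n = 8, bins 1 and 4 trade places, as do bins 3 and 6, while bins 0, 2, 5 and 7 stay put. As far as I understand the ESP-DSP documentation, the complex-to-real conversion step is meant for the case where two real signals were packed into the real and imaginary parts, which is not the case when the imaginary part is filled with zeros.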

From here I reduce the y_cf array to 1024 processed values (half of the 2048 raw data values) using the formula in the demo, where the processed value for bucket i is:

10 * log10f(
              ( y_cf [ i * 2 + 0 ] * y_cf [ i * 2 + 0 ]
               + y_cf [ i * 2 + 1 ] * y_cf [ i * 2 + 1 ] ) / 2048 )
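For reference, a self-contained sketch of the same per-bucket formula, written for interleaved 16-bit (re, im) output and converting to float before dividing. The function names and the small power floor are assumptions added for illustration: the floor only avoids taking the logarithm of zero, and the second helper maps a bucket index to its frequency.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Power of complex bucket `i` in dB, from interleaved int16 (re, im) values,
// normalised by the FFT length n as in the formula above.
static float bin_power_db(const std::int16_t* spectrum, std::size_t i, std::size_t n) {
    assert(spectrum != nullptr && n > 0);
    const float re = static_cast<float>(spectrum[2 * i + 0]);
    const float im = static_cast<float>(spectrum[2 * i + 1]);
    const float power = (re * re + im * im) / static_cast<float>(n);
    const float power_floor = 1e-12f;  // hypothetical floor, avoids log10(0)
    return 10.0f * std::log10(std::max(power, power_floor));
}

// Frequency in Hz at the start of bucket `i` for an n-point FFT.
static float bin_frequency_hz(std::size_t i, float sample_rate_hz, std::size_t n) {
    assert(n > 0);
    return static_cast<float>(i) * sample_rate_hz / static_cast<float>(n);
}
```

With a 22050 Hz sample rate and 2048 points, each bucket is about 10.77 Hz wide, and bucket 1024 corresponds to the 11025 Hz Nyquist limit.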

Looking at the resulting data, the values range between 0 and 25, with a mean of around 15, but when I apply sound pressure I see no variation. As the raw I2S data does vary, I'm wondering whether my use of the FFT is flawed.

Any thoughts appreciated!


MEMS I2S Audio and ESP-DSP FFT
