EMAC and I2S interfering with each other?
Posted: Mon Nov 11, 2024 11:17 pm
I'm building a firmware for the ESP32 (specifically ESP32-WROOM-32D) to drive WS2812 LEDs (and similar).
I have a board on which the ESP32 is hooked up to a LAN8720 via RMII (external 50 MHz clock on GPIO0), and generally, all is well.
ESP-IDF version is v4.4.7 at the moment.
The board receives pixel data via ethernet and pushes it out of the I2S peripheral in LCD mode (much like the FastLED project). The I2S peripheral is supplied with data via DMA buffers that are cycled/refreshed on EOF interrupts. There are currently 3 DMA buffers in play (1 being drained, 1 full, 1 being replenished), so there is a certain amount of jitter buffering available.
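Roughly, the refill logic looks like this (a simplified sketch - the buffer sizes and the fill_next_frame_chunk() helper are placeholders, and the actual I2S/DMA descriptor setup and interrupt hookup are omitted):

```c
#include <stdint.h>
#include <stddef.h>
#include "esp_attr.h"

#define NUM_BUFS  3    // 1 being drained, 1 full, 1 being replenished
#define BUF_WORDS 256  // assumed chunk size, in 32-bit words

static uint32_t dma_buf[NUM_BUFS][BUF_WORDS];
static volatile int draining = 0;  // buffer the I2S DMA is currently reading

// Hypothetical encoder that writes the next chunk of pixel data as the
// bit pattern I2S LCD mode needs for the WS2812 timing.
extern void fill_next_frame_chunk(uint32_t *dst, size_t words);

// Called from the I2S EOF interrupt: the DMA has finished one buffer,
// so advance the ring and replenish the buffer that just came free.
static void IRAM_ATTR on_i2s_eof(void)
{
    int freed = draining;
    draining = (draining + 1) % NUM_BUFS;
    fill_next_frame_chunk(dma_buf[freed], BUF_WORDS);
}
```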
The problem occurs when the board is dealing with a lot of pixels. The limit for the WS2812 protocol is roughly 800 pixels at 40 fps, but we started seeing artefacts in the pixel data stream once we got to about 650-700 px. It turns out that the data source starts transmitting the next frame's worth of pixel data at that point.
For a while, I assumed the CPU was getting overloaded with receiving and processing the pixel data, but a deeper dive shows this isn't the case. I instrumented some functions (toggling unused GPIOs) and proved that the data stream corruption occurs shortly before we process the incoming network packets. Instrumenting some functions within the emac driver reveals that the corruption occurs just before the emac interrupt (emac_isr_default_handler) fires.
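For clarity, the instrumentation is just raw GPIO register writes, so the markers are cheap enough to call from ISR context - something like this (the pin number is a placeholder for a spare pin on the board; the w1ts/w1tc trick only works for GPIOs 0-31):

```c
#include "driver/gpio.h"
#include "soc/gpio_struct.h"

#define PROBE_PIN 13  // assumed-unused pin on this board

static inline void probe_high(void) { GPIO.out_w1ts = 1UL << PROBE_PIN; }
static inline void probe_low(void)  { GPIO.out_w1tc = 1UL << PROBE_PIN; }

void probe_init(void)
{
    gpio_reset_pin(PROBE_PIN);
    gpio_set_direction(PROBE_PIN, GPIO_MODE_OUTPUT);
}

// Usage, e.g. wrapped around the body of emac_isr_default_handler:
//   probe_high();  ...original handler code...  probe_low();
```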
As far as I'm aware, the two interrupts execute on separate CPU cores (emac on CPU0, I2S on CPU1). Happy to check and confirm this if it helps, though.
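If useful, I'd confirm it roughly like this (a sketch of how I'd instrument it, not existing driver code): record xPortGetCoreID() inside each handler and print the results from a normal task afterwards:

```c
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "esp_log.h"

static volatile int emac_isr_core = -1;
static volatile int i2s_isr_core  = -1;

// Drop one of these into each handler:
//   emac_isr_default_handler:  emac_isr_core = xPortGetCoreID();
//   I2S EOF handler:           i2s_isr_core  = xPortGetCoreID();

void report_isr_cores(void)
{
    ESP_LOGI("cores", "emac ISR last ran on core %d, I2S ISR on core %d",
             emac_isr_core, i2s_isr_core);
}
```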
This seems to imply that it's the PHY transferring the data to the EMAC that causes the glitch. I know this transfer is performed via DMA, so is there any chance it's interfering with the DMA transfers feeding the I2S peripheral?
I noticed that receiving lots of small packets seemed to cause fewer problems than a few larger packets. I guessed the transfer might be less disruptive if broken into smaller pieces, so I adjusted CONFIG_ETH_DMA_BUFFER_SIZE to 256. This does seem to resolve the issue with larger packets, but now I'm seeing packet loss when receiving lots of small packets, so it isn't a great solution for me. In the final version I need to handle both scenarios for different incoming protocols, and I have no control over the source of the ethernet data.
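For reference, the sdkconfig fragment for that experiment (the RX buffer count line is something I'm considering to absorb bursts of small packets - an untested assumption, not a recommendation):

```
CONFIG_ETH_DMA_BUFFER_SIZE=256
CONFIG_ETH_DMA_RX_BUFFER_NUM=20
```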
I have attached an image showing what I'm seeing - It's a screengrab from PulseView, as follows:
- The first line shows the emac interrupt activity - The GPIO calls are made at the start and end of the emac_isr_default_handler function in esp_eth_mac_esp.c
- The second line shows the emac_rx task activity - The GPIO calls are made in the function emac_esp32_rx_task. The line goes high just after the call to "ulTaskNotifyTake(pdTRUE, portMAX_DELAY)", and low after the while loop (outside "} while (emac->frames_remain);")
- The third line is also in emac_esp32_rx_task, but the GPIO calls are wrapped around the call to emac_hal_receive_frame.
- The fourth line is the activity of the I2S interrupt. There is a little jitter on the interrupt behaviour, but nothing that should cause data stream issues.
- The fifth line is the WS2812 data stream, and below it is the PulseView decode. You can see I've added two arrows where the bitstream is weirdly elongated just before the emac interrupt fires - this data stream should repeat perfectly at even intervals. The data stream content is intentionally all zeroes (no LED output), to make the erroneous flashing easier to see. While PulseView still decodes the data successfully, these over-long bits corrupt the data stream for physical LED chips.
(And apologies if none of this makes any sense - More than happy to answer questions, etc )