ESP32-C3 Persistent Data Corruption in Custom Serial Protocol
Posted: Wed Apr 10, 2024 12:42 pm
Hello,
I have been struggling for a few weeks with the implementation of a custom serial protocol on the ESP32-C3. Unfortunately, I'm still experiencing packet loss and data corruption, which I suspect are caused by jitter in interrupts. I do not know how to proceed further.
Bit timing: 104us per bit, with no more than 6 consecutive identical bits in a row.
Baud rate: 9600bps.
Despite the low baud rate, I can't achieve consistent results, which is puzzling, especially when I see the protocol running with zero fail rate on a 16MHz ATmega32 but not on the 160MHz ESP32-C3, even with WiFi disabled.
Attempts to resolve the issue:
Attempt 1:
Attached an interrupt for pin change on RX and set the HW timer ISR period to 104us.
When a pin-change interrupt occurs, the timer counter is set to 52us so that the RX state is captured in the middle of the bit, and sampling then continues at 104us intervals until the next pin-change interrupt.
Result: Works perfectly on the ATmega but results in over 50% loss on the ESP due to massive jitter. Totally useless when WiFi and SPIFFS are active.
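To make the timing in Attempt 1 concrete, here is a minimal sketch of the sampling schedule in plain C. It is not the actual ISR code; the helper names sample_time_us and bit_index_at are hypothetical and only illustrate the arithmetic. It shows that a sample scheduled at mid-bit still lands in the correct bit only if it is taken less than 52us late.

```c
#include <stdint.h>

#define BIT_US      104u            /* nominal bit time at 9600 bps */
#define HALF_BIT_US (BIT_US / 2u)   /* 52 us offset to reach mid-bit */

/* Hypothetical helper: scheduled time of the n-th sample after an edge
 * that was seen at edge_us (n = 0 is the first mid-bit sample). */
static inline uint32_t sample_time_us(uint32_t edge_us, uint32_t n)
{
    return edge_us + HALF_BIT_US + n * BIT_US;
}

/* Hypothetical helper: index of the bit (counted from the edge) that a
 * sample actually taken at t_us falls into. */
static inline uint32_t bit_index_at(uint32_t edge_us, uint32_t t_us)
{
    return (t_us - edge_us) / BIT_US;
}
```

With an edge at 1000us, sample 3 is scheduled for 1364us. Taken 51us late it still reads bit 3; taken 52us late it reads bit 4, which is the kind of error interrupt latency would produce.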
Attempt 2:
Similar to Attempt 1, but the timer interrupt handlers and all variables they use were moved into IRAM, the priority was set to LEVEL3, and low-level access with direct register manipulation was implemented.
Result: Significant improvement, fail rate dropped to 15% "only".
Attempt 3:
Enhanced Attempt 2 with dynamic compensation for timing based on the delay of the timer interrupt relative to the expected interval.
Result: further improvement, fail rate dropped below 10%.
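The compensation in Attempt 3 can be sketched roughly as follows. This is a simplified illustration in plain C, not the real ISR; the function name next_period_us and the clamping policy are assumptions. The idea is to shorten the next timer period by however late the current interrupt fired, so the sampling grid stays anchored to the expected schedule instead of drifting.

```c
#include <stdint.h>

#define BIT_US 104u   /* nominal bit time at 9600 bps */

/* Hypothetical helper: period to load for the next timer interrupt.
 * expected_us is when the ISR should have run, actual_us is when it
 * actually ran (both from a free-running 32-bit microsecond counter). */
static inline uint32_t next_period_us(uint32_t expected_us, uint32_t actual_us)
{
    /* Unsigned subtraction handles counter wraparound; the cast makes an
     * early firing show up as a negative lateness. */
    int32_t late_us = (int32_t)(actual_us - expected_us);
    int32_t period  = (int32_t)BIT_US - late_us;

    /* Assumed policy: if a full bit time or more was lost, fire again as
     * soon as possible rather than loading a zero or negative period. */
    if (period < 1) {
        period = 1;
    }
    return (uint32_t)period;
}
```

Note that this only corrects the samples that follow. The sample taken by the late interrupt itself has already been read late, which may explain why the fail rate improved but did not reach zero.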
Attempt 4:
Relied solely on pin-change interrupts, measuring the time elapsed between two interrupts and dividing it by the bit time to get the number of bits (empirically, a divisor of 97us works best).
Used a timer to handle the end of the frame.
Result: Achieved a fail rate of 0.4% with WiFi ON, but still 0.14% loss with no other activities on the ESP.
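The interval-to-bit-count step of Attempt 4 can be sketched like this in plain C. Whether the real code truncates or rounds is not stated above, so both variants are shown as assumptions; the function names are hypothetical. The comparison illustrates one possible reason a divisor of 97us behaves better than 104us: with truncating division, a slightly short interval divided by 104 loses a bit, while rounding to the nearest bit tolerates roughly half a bit of error in either direction.

```c
#include <stdint.h>

#define BIT_US 104u   /* nominal bit time at 9600 bps */

/* Number of bit periods in dt_us, rounded to the nearest whole bit. */
static inline uint32_t bits_rounded(uint32_t dt_us)
{
    return (dt_us + BIT_US / 2u) / BIT_US;
}

/* Number of bit periods in dt_us using truncating integer division by
 * a caller-chosen divisor (e.g. 104 or the empirically tuned 97). */
static inline uint32_t bits_truncated(uint32_t dt_us, uint32_t divisor_us)
{
    return dt_us / divisor_us;
}
```

For example, an edge-to-edge interval measured as 100us gives 0 bits with truncation by 104, but 1 bit with truncation by 97 or with rounding.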
The ATmega runs the Attempt 1 approach with zero loss, even while forwarding all data over UART.
Why is this happening and what can be done to resolve this?
I've already moved most of the data handling out of the ISR into a queue for processing, and I see no further modifications that would improve performance. Because of this asynchronous processing, I have already lost the ability to reply to a message with an ACK.
It is frustrating, especially considering the low complexity of a protocol from the early '90s and the 160MHz clock of the ESP32-C3...