GPIO interrupts lost (hardware race condition)
Posted: Fri Apr 12, 2019 7:17 pm
Hi all,
I've found a serious problem with GPIO interrupts. My hardware generates 3 independent IRQs (about 200 per second each) on GPIO 34, 36 and 39, configured as rising-edge. Quite rarely (once in a couple of hours or so), one of the IRQ signals does not get latched in the GPIO interrupt status register, so it never generates an interrupt and that channel gets stuck.
My code uses the driver's gpio_intr_service, but that does not really matter. I used a logic analyzer plus a sync signal generated in software (an LED pin) to track this condition. It happens when a new interrupt arrives within ~100 ns after I toggle the LED right before returning from my application IRQ handler; the very next thing the driver does is write the corresponding GPIO bit to GPIO.status1_w1tc to clear the IRQ. See the loop in gpio_intr_service (esp-idf/components/driver/gpio.c), quoted below.
That most likely means that writing to GPIO.status1_w1tc to clear the previous IRQ races in hardware with another IRQ arriving at the same moment, which prevents the latter from being latched. The w1tc (write-1-to-clear) register definitely helps with software races of the same kind, but the hardware side is unfortunately still vulnerable.
Relevant loop of gpio_intr_service (esp-idf/components/driver/gpio.c):
do {
    if (gpio_num < 32) {
        if (gpio_intr_status & BIT(gpio_num)) { //gpio0-gpio31
            if (gpio_isr_func[gpio_num].fn != NULL) {
                gpio_isr_func[gpio_num].fn(gpio_isr_func[gpio_num].args);
            }
            GPIO.status_w1tc = BIT(gpio_num);
        }
    } else {
        if (gpio_intr_status_h & BIT(gpio_num - 32)) {
            if (gpio_isr_func[gpio_num].fn != NULL) {
                gpio_isr_func[gpio_num].fn(gpio_isr_func[gpio_num].args); // my handler is called here
            }
            GPIO.status1_w1tc.intr_st = BIT(gpio_num - 32); // that's the problematic write
        }
    }
} while (++gpio_num < GPIO_PIN_COUNT);
It took me more than a week to figure this out (an issue like this is the last thing you expect; one looks for signal integrity problems, bugs in one's own code, etc.). Depending on the actual hardware that generates the interrupts there may be workarounds, but I guess this should be fixed in silicon.