GPIO interrupts lost (hardware race condition)

ybuyankin
Posts: 9
Joined: Wed May 30, 2018 9:13 pm

GPIO interrupts lost (hardware race condition)

Postby ybuyankin » Fri Apr 12, 2019 7:17 pm

Hi all,

I've found a serious problem with the GPIO interrupts. My hardware generates 3 independent IRQs (about 200 per second each) on GPIO 34, 36 and 39, configured as rising edge. Quite rarely (once in a couple of hours or so) one of the IRQ signals does not get latched in the GPIO IRQ status register and therefore does not generate an interrupt, so that channel gets stuck.

The code uses gpio_intr_service, but that does not really matter. I used a logic analyzer and a sync signal (an LED pin) generated in software to track this condition. I found that it happens when a new interrupt arrives right after (within ~100 ns of) the moment I toggle the LED just before exiting the app IRQ handler; the next thing that happens is the write of the corresponding GPIO bit to GPIO.status1_w1tc to clear the IRQ. See the gpio_intr_service function in esp-idf/components/driver/gpio.c:

Code: Select all

    do {
        if (gpio_num < 32) {
            if (gpio_intr_status & BIT(gpio_num)) { //gpio0-gpio31
                if (gpio_isr_func[gpio_num].fn != NULL) {
                    gpio_isr_func[gpio_num].fn(gpio_isr_func[gpio_num].args);
                }
                GPIO.status_w1tc = BIT(gpio_num);
            }
        } else {
            if (gpio_intr_status_h & BIT(gpio_num - 32)) {
                if (gpio_isr_func[gpio_num].fn != NULL) {
                    gpio_isr_func[gpio_num].fn(gpio_isr_func[gpio_num].args); // my handler is called here
                }
                GPIO.status1_w1tc.intr_st = BIT(gpio_num - 32); // that's the problematic write
            }
        }
    } while (++gpio_num < GPIO_PIN_COUNT);
That most likely means that writing to GPIO.status1_w1tc to clear the previous IRQ causes a hardware race condition with another IRQ coming in at the same moment, preventing the latter from latching. The w1tc register definitely helps with software issues of the same kind, but the hardware side is unfortunately still vulnerable.
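
To illustrate the software-side benefit of a write-1-to-clear register mentioned above, here is a minimal host-side sketch in plain C. It is not ESP-IDF code: `fake_status_reg_t`, `fake_w1tc_write`, `fake_rmw_clear` and `late_irq_survives` are hypothetical names made up for this model, and it deliberately does not model the suspected hardware race, only the software ordering problem that w1tc solves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical software model of a latched interrupt status register.
 * Not ESP-IDF code; it does not model the suspected hardware race. */
typedef struct {
    uint32_t status;
} fake_status_reg_t;

/* W1TC semantics: only bits set in `mask` are cleared, others are untouched. */
static void fake_w1tc_write(fake_status_reg_t *reg, uint32_t mask)
{
    reg->status &= ~mask;
}

/* Naive alternative: write back a stale snapshot with the handled bits removed. */
static void fake_rmw_clear(fake_status_reg_t *reg, uint32_t snapshot, uint32_t handled)
{
    reg->status = snapshot & ~handled;
}

/* Returns true if an interrupt on `late_bit`, latched after the ISR took its
 * snapshot, is still pending once `handled_bit` has been cleared. */
static bool late_irq_survives(bool use_w1tc, uint32_t handled_bit, uint32_t late_bit)
{
    assert(handled_bit != late_bit);

    fake_status_reg_t reg = { .status = handled_bit };
    uint32_t snapshot = reg.status;   /* ISR reads the status register */
    reg.status |= late_bit;           /* another pin latches while the handler runs */

    if (use_w1tc) {
        fake_w1tc_write(&reg, snapshot);
    } else {
        fake_rmw_clear(&reg, snapshot, handled_bit);
    }
    return (reg.status & late_bit) != 0;
}
```

In this model the late interrupt survives the w1tc clear but is wiped out by the read-modify-write clear. The case reported in this thread is different: the new edge arrives at the same moment as the clear write itself, which no software ordering can protect against.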

It took me more than a week to actually figure this out (stuff like this is the last thing you would expect; one looks for signal integrity issues, bugs in one's own code, etc.). There could be some workarounds depending on the actual hardware that uses the interrupts, but this should be fixed in the silicon, I guess.
Thanks,
--yuri

WiFive
Posts: 3529
Joined: Tue Dec 01, 2015 7:35 am

Re: GPIO interrupts lost (hardware race condition)

Postby WiFive » Sat Apr 13, 2019 2:56 am

There is a new version of that function in master; it might help. Otherwise, if your interrupts are going to be that close together, you may want to use RMT to sample the line at 12.5 ns and count the number of edges.

ybuyankin
Posts: 9
Joined: Wed May 30, 2018 9:13 pm

Re: GPIO interrupts lost (hardware race condition)

Postby ybuyankin » Mon Apr 15, 2019 1:57 pm

I've checked master, but there were no related changes to this function; it was just optimized. It's a bit tricky to compensate for a hardware bug in software, but I'll probably do that by comparing the actual GPIO state before and after clearing the IRQ and raising an IRQ bit if they differ.

As for your second suggestion, the interrupts are actually not that frequent (~100-200 Hz), and I don't need to count or otherwise measure them; I need to fetch data from an external source that generates the interrupt when data arrives. The problem arises when an interrupt signal arrives at exactly the same time as another channel's interrupt request gets cleared. So it could happen to absolutely anyone who has several independent GPIO IRQs, regardless of interrupt rate, purely by coincidence. That's really dangerous.
Thanks,
--yuri

WiFive
Posts: 3529
Joined: Tue Dec 01, 2015 7:35 am

Re: GPIO interrupts lost (hardware race condition)

Postby WiFive » Tue Apr 16, 2019 1:22 am

ybuyankin wrote:
Mon Apr 15, 2019 1:57 pm
I've checked master, but there were no related changes to this function; it was just optimized.
Well the w1tc is only written once now rather than once per set bit.

With the edge interrupt type, if the edge and the clear happen on the same clock cycle, you could be correct. Using a level interrupt could be a workaround.
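
For readers comparing the two versions of the service routine, here is a rough host-side sketch of the "clear once" pattern: snapshot the pending bits, call every handler, then acknowledge all of them with a single write. Everything here (`fake_gpio_bank_t`, `fake_dispatch`, `FAKE_PIN_COUNT`, and so on) is a hypothetical model written for illustration, not the actual driver code. Fewer clear writes presumably means fewer chances to coincide with an incoming edge, but it does not remove the window.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FAKE_PIN_COUNT 8   /* hypothetical bank width */

typedef void (*fake_isr_t)(void *arg);

typedef struct {
    fake_isr_t fn;
    void *arg;
} fake_isr_slot_t;

typedef struct {
    uint32_t status;        /* latched pending bits */
    unsigned clear_writes;  /* how many times the clear register was written */
} fake_gpio_bank_t;

/* Model of a write-1-to-clear register that also counts its writes. */
static void fake_w1tc(fake_gpio_bank_t *bank, uint32_t mask)
{
    bank->status &= ~mask;
    bank->clear_writes++;
}

/* Call the handler of every pending pin, then acknowledge them all at once. */
static void fake_dispatch(fake_gpio_bank_t *bank, const fake_isr_slot_t *slots)
{
    assert(bank != NULL && slots != NULL);

    const uint32_t pending = bank->status;   /* single snapshot */
    if (pending == 0) {
        return;
    }
    for (int pin = 0; pin < FAKE_PIN_COUNT; pin++) {
        if ((pending & (1u << pin)) && slots[pin].fn != NULL) {
            slots[pin].fn(slots[pin].arg);
        }
    }
    fake_w1tc(bank, pending);                /* one clear write per ISR entry */
}

/* Example handler: counts invocations through its argument. */
static void fake_count_handler(void *arg)
{
    (*(unsigned *)arg)++;
}
```

With two pending pins, both handlers run and the clear register is written exactly once, instead of once per pin as in the older loop quoted in the first post.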

costaud
Posts: 55
Joined: Wed Dec 30, 2015 5:09 pm

Re: GPIO interrupts lost (hardware race condition)

Postby costaud » Tue Apr 16, 2019 4:07 am

Does this only happen on GPIO34/36/39 ?

ESP_houwenxiang
Posts: 118
Joined: Tue Jun 26, 2018 3:09 am

Re: GPIO interrupts lost (hardware race condition)

Postby ESP_houwenxiang » Tue Apr 16, 2019 12:13 pm

Hi, ybuyankin,

There is a similar issue on GitHub: https://github.com/espressif/esp-idf/issues/3132. Please take a look at it and run a test. In the meantime, we will try to reproduce this issue. Can you share the code you use to configure the GPIOs?

thanks !!
wookooho

ESP_houwenxiang
Posts: 118
Joined: Tue Jun 26, 2018 3:09 am

Re: GPIO interrupts lost (hardware race condition)

Postby ESP_houwenxiang » Thu Apr 18, 2019 5:00 am

Hi,
Can you share your progress?

thanks !!
wookooho

ybuyankin
Posts: 9
Joined: Wed May 30, 2018 9:13 pm

Re: GPIO interrupts lost (hardware race condition)

Postby ybuyankin » Thu Apr 18, 2019 1:31 pm

First, I have managed to fix the problem. Thanks for your suggestions.

My original code used the basic gpio_install_isr_service / gpio_isr_handler_add combo. IRQs were set up in a loop like this:

Code: Select all

gpio_set_direction(GPIO_Pin, GPIO_MODE_INPUT);    
gpio_set_intr_type(GPIO_Pin, GPIO_PIN_INTR_POSEDGE);
gpio_isr_handler_add(GPIO_Pin, my_gpio_isr_handler, (void*) ch);
I modified the latest gpio.c code so that it checks for a 0->1 transition and calls the handler if the transition was not latched in the IRQ status register. Also, I'm only using the higher pins:

Code: Select all

static void IRAM_ATTR gpio_intr_service(void* arg)
{

    //read status to get interrupt status for GPIO0-31 - none in use yet; the workaround below would have to be added here as well
/*    const uint32_t gpio_intr_status = GPIO.status;
    if (gpio_intr_status) {
        gpio_isr_loop(gpio_intr_status, 0);
        GPIO.status_w1tc = gpio_intr_status;
    }*/

    //read status1 to get interrupt status for GPIO32-39
    uint32_t gpio_intr_status_h = GPIO.status1.intr_st;
    if (gpio_intr_status_h) {
        gpio_isr_loop(gpio_intr_status_h, 32);
        uint32_t gpio_state_before = GPIO.in1.val;
        GPIO.status1_w1tc.intr_st = gpio_intr_status_h; // could cause IRQ loss
        uint32_t gpio_state_after = GPIO.in1.val;
        uint32_t gpio_state = (gpio_state_before ^ gpio_state_after) & gpio_state_after; // anything new has arrived?
        gpio_intr_status_h = GPIO.status1.intr_st; //read the new IRQ state
        if(!(gpio_state & gpio_intr_status_h)) // was IRQ lost during status clear?
        {
            gpio_isr_loop(gpio_state, 32); // it's tempting to just set IRQ bit and leave but it could cause exactly the same problem again! so just call the handler function
        }
    }
}

I have to add that it does not work exactly as expected: I'm getting these software-generated interrupts (I've also added some debug code to count them) much more often than the issue occurred before the fix, maybe because of propagation delay in the IRQ pipeline or something similar. Anyway, it does the job. I'll probably look deeper into that later.
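
The lost-edge check in the handler above can also be expressed per pin. The sketch below is a host-side variant under stated assumptions, not the code that was actually run: `rising_between`, `lost_edge_candidates` and `enabled_mask` are hypothetical names. It differs from the posted condition in that each pin is evaluated separately, so an edge missed on one pin is still reported when another pin rose in the same window and did get latched.

```c
#include <assert.h>
#include <stdint.h>

/* Bits that went 0 -> 1 between the two input samples taken around the clear. */
static uint32_t rising_between(uint32_t in_before, uint32_t in_after)
{
    return (in_before ^ in_after) & in_after;
}

/* Pins that rose around the clear but whose status bit is not set afterwards.
 * `enabled_mask` is a hypothetical mask of pins with edge interrupts enabled;
 * the GPIO32-39 bank has 8 pins, so only the low 8 bits are meaningful. */
static uint32_t lost_edge_candidates(uint32_t in_before, uint32_t in_after,
                                     uint32_t status_after, uint32_t enabled_mask)
{
    assert((enabled_mask & ~0xFFu) == 0);

    const uint32_t rose = rising_between(in_before, in_after) & enabled_mask;
    return rose & ~status_after;   /* rose, but not latched */
}
```

GPIO 34, 36 and 39 map to bits 2, 4 and 7 of the upper bank. If bits 2 and 4 both rise around the clear and only bit 2 ends up latched, this returns bit 4 alone, so only that handler would be called directly. Pins that were latched normally are left to the next regular interrupt.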

Thanks,
--yuri

ESP_houwenxiang
Posts: 118
Joined: Tue Jun 26, 2018 3:09 am

Re: GPIO interrupts lost (hardware race condition)

Postby ESP_houwenxiang » Fri Apr 19, 2019 3:17 am

Hi, ybuyankin

Are you seeing this problem in the situation shown in the attachment, and are you running dual core or single core?
Attachments
GPIOdebug.png
wookooho

ESP_houwenxiang
Posts: 118
Joined: Tue Jun 26, 2018 3:09 am

Re: GPIO interrupts lost (hardware race condition)

Postby ESP_houwenxiang » Tue May 21, 2019 1:10 pm

Hi, ybuyankin

We can use a level-type interrupt to work around this issue.

Code: Select all

const uint64_t INPUT_PIIN_MASK = (1ULL<<GPIO_NUM_34) | (1ULL<<GPIO_NUM_36) | (1ULL<<GPIO_NUM_39);

typedef struct io_isr_param_ {
    uint32_t io_tagle;
    uint32_t io_number;
    uint32_t isr_cnt;
} io_isr_param_t;

uint32_t isr_mode[2] = {GPIO_INTR_HIGH_LEVEL, GPIO_INTR_LOW_LEVEL};

io_isr_param_t io34_param = {
    .io_tagle = 0x1,
    .io_number = 34,
};

io_isr_param_t io36_param = {
    .io_tagle = 0x1,
    .io_number = 36,
};

io_isr_param_t io39_param = {
    .io_tagle = 0x1,
    .io_number = 39,
};

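IRAM_ATTR void gpio_isr(void *param)
{
    io_isr_param_t *io_param = (io_isr_param_t *)param;
    // io_tagle == 1: the high level fired (treated as the rising edge)
    // io_tagle == 0: the low level fired (pin has returned low), nothing to do
    if(io_param->io_tagle) {
        //do something here;
    }
    // arm the opposite level so the level that is still present does not retrigger
    gpio_set_intr_type(io_param->io_number, isr_mode[io_param->io_tagle]);
    io_param->io_tagle ^= 0x1;
}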

void io_init()
{
    gpio_config_t io_conf = {
        .intr_type = GPIO_PIN_INTR_DISABLE,
        .mode = GPIO_MODE_INPUT,
        .pin_bit_mask = INPUT_PIIN_MASK,
        .pull_down_en = 0,
        .pull_up_en = 1, // note: GPIO34-39 have no internal pull-up/pull-down resistors
    };
    //configure GPIO with the given settings
    gpio_config(&io_conf);
    if(gpio_install_isr_service( 0 ) != ESP_OK) {
        printf("ISR install fail\n");
        return;
    }
    //hook isr handler for specific gpio pin
    gpio_isr_handler_add(GPIO_NUM_34, gpio_isr, (void*) &io34_param);
    gpio_isr_handler_add(GPIO_NUM_36, gpio_isr, (void*) &io36_param);
    gpio_isr_handler_add(GPIO_NUM_39, gpio_isr, (void*) &io39_param);
    gpio_set_intr_type(GPIO_NUM_34, GPIO_INTR_HIGH_LEVEL);
    gpio_intr_enable(GPIO_NUM_34);
    gpio_set_intr_type(GPIO_NUM_36, GPIO_INTR_HIGH_LEVEL);
    gpio_intr_enable(GPIO_NUM_36);
    gpio_set_intr_type(GPIO_NUM_39, GPIO_INTR_HIGH_LEVEL);
    gpio_intr_enable(GPIO_NUM_39);
}
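
The alternating-level idea in the code above can be checked on a host with a small model. This is only a sketch under simplifying assumptions: the pin is represented as an array of sampled levels, the interrupt is assumed to fire whenever the level matches the currently armed trigger, and `fake_pin_t`, `fake_level_isr` and `fake_run` are hypothetical names that do not exist in ESP-IDF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { TRIG_HIGH_LEVEL, TRIG_LOW_LEVEL } fake_trigger_t;

typedef struct {
    fake_trigger_t trigger;  /* level the interrupt is currently armed for */
    unsigned work_count;     /* number of times the "rising edge" work ran */
} fake_pin_t;

/* One ISR invocation: do the work on the high phase, then arm the opposite level. */
static void fake_level_isr(fake_pin_t *pin)
{
    if (pin->trigger == TRIG_HIGH_LEVEL) {
        pin->work_count++;               /* stands in for "do something here" */
        pin->trigger = TRIG_LOW_LEVEL;
    } else {
        pin->trigger = TRIG_HIGH_LEVEL;  /* pin is low again, re-arm for high */
    }
}

/* Feed a sequence of sampled pin levels; the ISR fires whenever the level
 * matches the armed trigger. Returns how many times the work was done. */
static unsigned fake_run(const bool *levels, size_t n)
{
    assert(levels != NULL || n == 0);

    fake_pin_t pin = { .trigger = TRIG_HIGH_LEVEL, .work_count = 0 };
    for (size_t i = 0; i < n; i++) {
        const bool armed_for_high = (pin.trigger == TRIG_HIGH_LEVEL);
        if (levels[i] == armed_for_high) {
            fake_level_isr(&pin);
        }
    }
    return pin.work_count;
}
```

In this model the work runs once per high phase, no matter how long the pin stays high, which is the behaviour the edge interrupt was meant to provide. One difference from a true edge trigger: a pin that is already high when the interrupt is first armed fires immediately.
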
thanks !!
wookooho
