I2C corruption when using ESP CAN (IDFGH-3307)

PeterR
Posts: 621
Joined: Mon Jun 04, 2018 2:47 pm

Postby PeterR » Tue May 19, 2020 6:07 pm

Moving i2c_driver_install() to core 1 seems to resolve, or at least reduce, the spurious GPIO issue. I cleared core 1 of everything else; I would normally have SPI there.
I am going to do a clean build to confirm, then see what I can add back onto core 1.

1) CAN overrun is a moderate issue. Disabling Ethernet improves it and may even fix it. I drop overrun frames ATM. My fix may not be perfect but (2) is much worse.
2) CAN frame corruption on termination resistor change is a major issue. I have not found a way to detect this yet.
3) CAN RX locked after a FATFS write is a major issue. Only noted when I have a high frame rate. Afterwards I always receive the same frame content. can_initiate_recovery() does not recover. Power off/on is needed.
& I also believe that IDF CAN should be fixed.

Postby PeterR » Wed May 20, 2020 6:03 pm

Hi,
The I2C/GPIO button issue seems resolved by moving the I2C ISR to core 1. I have also re-enabled a GPIO interrupt (around 2 kHz) on core 1 and the I2C GPIO readings stayed solid. The ISR is lightweight though; it just wakes a thread.
It is not clear if the I2C issue is an ESP or GPIO chip fault. One would imagine that with all the bit bashers out there it's not likely to be the GPIO chip, and that this might be worth further investigation.
Another suggestion is that it may be an idea to look at the 4.1 Ethernet driver. I am clear that Ethernet is a cause of CAN overflows and allows the I2C issue to happen. As you say, the ISR lockout time must be around 2 ms to explain this, which seems excessive.

This leaves the IDF CAN corrections open. I did quickly try the errata changes but to no effect. I will double check my code. May I ask when the CAN errata fixes will be made available?
Presently I cannot recover the CAN driver following a flash write. I also might stop receiving valid frames when the CPU has been heavily loaded. The two sound similar and related to the errata.

Postby PeterR » Thu May 21, 2020 11:58 pm

Hi,
Weak github force runs in me.
Please add links to the CAN driver ticket and repo branch such that I may follow the update. Would also be cool to understand the timetable.
I am attempting the errata hint but am hitting SJA amateur issues (errors between screen & keyboard). If I can see where you are going then that would help me get the CAN driver off the floor.
For a production fix I'm sure that you might want to place can_ in front of hal_ then hal_ll. I just need a short term hack ;) So your register insights etc. would help a lot. We only have one ESP CAN after all & I have my own high level abstraction for cross platform.
I can use your structured/finessed solution later.
PM if you need. Just needing some love ATM.
EDIT:
1) Termination resistor issues fit within the errata definition. My counter report logic was wrong. So your errata fix should help.
2) Flash writes cause frame ID & data corruption. Some CAN issue posts discuss overruns (the errata discusses bit errors). So perhaps the overrun solution resolves this, but the errata documentation is not clear. I would expect overruns when flashing as I am not sure that all driver logic is in IRAM. So let's hope.

Zevver
Posts: 14
Joined: Sun Oct 17, 2021 5:18 pm

Postby Zevver » Tue Dec 14, 2021 12:18 pm

Sorry to revive an old thread like this, but it seems that I am hit by this - or a very similar - issue.

Symptoms are occasional "corrupted" reads from I2C peripherals while CAN communication is enabled in our application. Inspection of the I2C bus shows clean signals, sharp edges, no problems there.

I cannot find any references to IDFGH-3307 other than this very thread. My questions:

- Has this issue been fixed in idf, and if so, what version?
- If it has not been fixed, is there an officially recommended workaround for this?

Thank you very much,

ESP_Dazz
Posts: 308
Joined: Fri Jun 02, 2017 6:50 am

Postby ESP_Dazz » Tue Dec 14, 2021 12:53 pm

The original issue in this thread was caused by the following chain of events:
  • I2C had stuck bus issues (see 680, 922, 2494). The stuck bus prevents the pending interrupts of other drivers (such as CAN) from running at the same time.
  • If a CAN interrupt does not run in due time, its RX FIFO fills up. This case was previously not handled by the driver. Furthermore, there is an RX FIFO overrun errata that was also not handled.
The I2C stuck bus issues should all be fixed by now. Likewise, the CAN driver handles FIFO overruns and works around the HW errata as of this commit.
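
As a rough check on how long a CAN interrupt can be held off before the RX FIFO fills, the arithmetic can be sketched as below. This is only an estimate under stated assumptions, none of which come from this thread: a 64-byte receive FIFO as on SJA1000-style controllers, 11 FIFO bytes per standard frame with 8 data bytes, 111 bits on the wire per such frame (ignoring bit stuffing), and frames arriving back to back. The function name and parameters are hypothetical.

```c
#include <stdint.h>

/* Estimate, in microseconds, how long an RX FIFO can go undrained
 * before it is full. Every parameter is an assumption supplied by
 * the caller; nothing here is read from hardware. */
uint64_t fifo_fill_time_us(uint32_t fifo_bytes,
                           uint32_t bytes_per_frame,
                           uint32_t bits_per_frame,
                           uint32_t bitrate_bps)
{
    if (bytes_per_frame == 0 || bitrate_bps == 0) {
        return 0; /* avoid division by zero on nonsensical input */
    }
    /* Only whole frames fit in the FIFO. */
    uint64_t frames = fifo_bytes / bytes_per_frame;
    /* Time on the wire for that many back-to-back frames. */
    return frames * bits_per_frame * 1000000ULL / bitrate_bps;
}
```

With those assumed numbers, `fifo_fill_time_us(64, 11, 111, 500000)` gives 1110 µs and `fifo_fill_time_us(64, 11, 111, 250000)` gives 2220 µs, so an interrupt lockout in the low-millisecond range would be enough to fill the FIFO on a fully loaded bus.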

Postby Zevver » Tue Dec 14, 2021 12:59 pm

Thanks for the blazing fast reply!

The issues you pointed to all seem to be much older than idf 4.2, so I'd expect these to be fixed in this version already. Still, I'll try to get our app running on the latest v4.3 to see if this changes the behavior. I'll report back with results later.

Thanks!

Postby Zevver » Tue Dec 14, 2021 2:49 pm

We upgraded idf to v4.3.1, and after running for about one hour we unfortunately still saw this happen twice. The problem is that I have not yet figured out how to consistently trigger this, and the very low frequency of occurrences makes it a /very/ slow operation to minimize and pinpoint the problem.

We poll an I2C device 10 times a second, and transfer about 50 bytes of data every 250 msec, all from one single thread. We see faulty bytes returned on the reads only once or twice an hour. On the scope the signals look good, although there is some consistent clock stretching happening in all the cases where the error happens (but there is also sometimes clock stretching when there is *no* problem with the data).

In the thread it is mentioned that the problem was fixed by installing the i2c driver on core1; I'd like to give this a try as well, how would I do this?

dizcza
Posts: 56
Joined: Tue Sep 07, 2021 6:59 pm

Postby dizcza » Tue Dec 14, 2021 5:54 pm

Zevver wrote:
Tue Dec 14, 2021 12:18 pm

Symptoms are occasional "corrupted" reads from I2C peripherals while CAN communication is enabled in our application. Inspection of the I2C bus shows clean signals, sharp edges, no problems there.
Could it also be related to this issue https://github.com/espressif/esp-idf/issues/7781?

Postby Zevver » Tue Dec 14, 2021 6:18 pm

Status update: I probably need to wait for another 24-hour test to be sure, but from the looks of it, calling `i2c_driver_install()` from a thread pinned to core 1 solved my problem.
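
For anyone wanting to try the same workaround, a minimal sketch of one way to do it follows. It is not the code used in this thread. It relies on ESP-IDF allocating an interrupt on the core that calls the install function, so the install is done from a short-lived task pinned to core 1. The pin numbers, clock speed, stack size, priority and the names `i2c_install_task` and `i2c_install_on_core` are all assumptions for illustration. The ESP-IDF part is wrapped in `#ifdef ESP_PLATFORM` so it only builds inside an IDF project.

```c
#define I2C_INSTALL_CORE 1      /* core that should own the I2C ISR */
#define I2C_SDA_GPIO     21     /* hypothetical pin, adjust to your board */
#define I2C_SCL_GPIO     22     /* hypothetical pin, adjust to your board */
#define I2C_CLK_HZ       100000 /* assumed bus speed */

#ifdef ESP_PLATFORM
#include <stdbool.h>
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "driver/i2c.h"

/* Runs on I2C_INSTALL_CORE; installs the driver there, then exits. */
static void i2c_install_task(void *arg)
{
    TaskHandle_t caller = (TaskHandle_t)arg;
    i2c_config_t conf = {
        .mode = I2C_MODE_MASTER,
        .sda_io_num = I2C_SDA_GPIO,
        .scl_io_num = I2C_SCL_GPIO,
        .sda_pullup_en = GPIO_PULLUP_ENABLE,
        .scl_pullup_en = GPIO_PULLUP_ENABLE,
        .master.clk_speed = I2C_CLK_HZ,
    };
    ESP_ERROR_CHECK(i2c_param_config(I2C_NUM_0, &conf));
    /* Master mode needs no slave buffers; default interrupt flags. */
    ESP_ERROR_CHECK(i2c_driver_install(I2C_NUM_0, conf.mode, 0, 0, 0));
    xTaskNotifyGive(caller); /* tell the caller the driver is ready */
    vTaskDelete(NULL);
}

/* Call from any task (e.g. app_main); blocks until the install is done. */
bool i2c_install_on_core(void)
{
    if (xTaskCreatePinnedToCore(i2c_install_task, "i2c_install", 4096,
                                xTaskGetCurrentTaskHandle(), 5, NULL,
                                I2C_INSTALL_CORE) != pdPASS) {
        return false;
    }
    ulTaskNotifyTake(pdTRUE, portMAX_DELAY);
    return true;
}
#endif /* ESP_PLATFORM */
```

After this returns, I2C transfers can still be issued from tasks on either core; only the interrupt handler stays on core 1.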

If this is the case, I guess the original IDFGH-3307 has not actually been solved, and moving the ISR to another core is a viable workaround for a bug that still exists?

Not sure if this is related to #7781 yet, as I have not seen corrupt I2C transactions on the bus: the main symptom is still only that reads occasionally return data other than what was actually on the wire.

Postby Zevver » Wed Dec 15, 2021 9:10 am

The overnight test showed no more occurrences of the problem in the last 17 hours, so I consider the workaround sufficient for production. Conclusions so far:

- There is likely a bug in idf causing spurious I2C read corruption when used in conjunction with other peripherals, including at least CAN
- The problem surfaced for us in idf 4.2 but also shows in latest release idf 4.3.1
- I can make the problem go away by installing the i2c interrupt handler on core1
- The problem is hard to consistently reproduce or minimize, making it a PITA to pinpoint where the problem lies.
