ESP32 IDF CAN issues

PeterR
Posts: 621
Joined: Mon Jun 04, 2018 2:47 pm

ESP32 IDF CAN issues

Post by PeterR » Sat May 16, 2020 6:05 pm

I have received reports that our ESP32 product's CAN fault tolerance is significantly worse than that of other devices.

I have performed some basic tests and find that, when faced with a marginal CAN bus, the ESP is likely to reject more frames than other devices (it refuses frames which other devices accept). It also allows frames which other devices reject and/or corrupts a valid frame. The end result is that invalid data enters the ESP application.

CAN overflows are not presently handled in the IDF. An overflow typically results in 0x88 in the frame's last data byte. The overflow flag must be manually reset if you are to detect the condition and ignore the bad frame (otherwise, once you have had one overflow, you will ignore all subsequent frames).
It appears that you may recover with:

can_ll_set_cmd_clear_data_overrun(can_context.dev);

Next, and particularly when receiving at a high frame rate: if you add and remove the CAN termination resistor, as if you had a loose CAN connector in a car driving along an English road (bumpy and full of holes), then you may quite 'reliably' cause spurious data to appear in the 'valid' frames passed to you by the IDF.
In my IDF 4.1 build I see the values 0x00, 0x35 and 0x40 (EDIT: these are the corruptions), but at random positions in the frame, so that in some frames I may only see 0x00 or 0x40.
In an earlier IDF build I have seen other values but lacked the overrun patch above. I am fairly certain that the pattern is IDF/hardware determined as I have made quite a few diagnostic changes and the pattern remains the same. Not random pointer stuff.
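
The overrun recovery described earlier (detect the flag, discard the frame, clear the flag) can be sketched on a host machine as follows. This is a minimal sketch under stated assumptions, not IDF code: `mock_can_dev_t`, `rx_stats_t`, `mock_clear_data_overrun()` and `rx_frame_checked()` are hypothetical names, the mock clear function merely stands in for `can_ll_set_cmd_clear_data_overrun()`, and the way the real driver exposes the overrun flag is assumed rather than taken from the IDF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the controller state; NOT the IDF register layout. */
typedef struct {
    bool    overrun_flag;  /* assumed to latch when the RX FIFO overflows */
    uint8_t rx_data[8];    /* payload of the frame being read out */
    uint8_t rx_dlc;        /* number of valid bytes in rx_data */
} mock_can_dev_t;

/* Hypothetical counters so the application can see how often frames are dropped. */
typedef struct {
    uint32_t accepted;
    uint32_t dropped_overrun;
} rx_stats_t;

/* Stand-in for can_ll_set_cmd_clear_data_overrun(): releases the latched flag. */
static void mock_clear_data_overrun(mock_can_dev_t *dev)
{
    dev->overrun_flag = false;
}

/* Returns true and copies the payload only when no overrun is flagged.
 * On overrun the frame is discarded and the flag is cleared, so that the
 * following frames are not rejected as well. */
static bool rx_frame_checked(mock_can_dev_t *dev, uint8_t out[8],
                             uint8_t *out_dlc, rx_stats_t *stats)
{
    assert(dev->rx_dlc <= 8);
    if (dev->overrun_flag) {
        mock_clear_data_overrun(dev);
        stats->dropped_overrun++;
        return false;
    }
    memcpy(out, dev->rx_data, dev->rx_dlc);
    *out_dlc = dev->rx_dlc;
    stats->accepted++;
    return true;
}
```

The point of clearing the flag inside the drop path is the one made above: without it, a single overflow would cause every later frame to be ignored.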

Third-party CAN bus monitoring systems did not spot the erroneous data; all values were as expected. EDIT: Third-party monitors do not show corrupt frames, so it's not clear whether this is a corrupt frame getting through or a reaction of the ESP/IDF to an earlier corrupt frame. The third-party monitors (and I have quite a few types) reported lower frame error counts than the ESP. The ESP seemed periodically 'fazed' and blanked out whilst the others continued. I don't want to go too deep on that point, as the main goal is just to have valid frames.

Late the other night I found a cryptic note in the CAN driver along the lines that there were issues in dealing with overrun because of the hardware. I forgot to note the reference, but I am sure your ticket system will take you there.

So the question is: how does one reliably receive CAN frames from the ESP?
Failing that, what triggers would you suggest? 0x00 0x35 0x40 must mean something to someone.

PS Same result on an EVB.
& I also believe that IDF CAN should be fixed.

WiFive
Posts: 3529
Joined: Tue Dec 01, 2015 7:35 am

Re: ESP32 IDF CAN issues

Post by WiFive » Sun May 17, 2020 7:46 am


PeterR
Posts: 621
Joined: Mon Jun 04, 2018 2:47 pm

Re: ESP32 IDF CAN issues

Post by PeterR » Sun May 17, 2020 11:10 am

Thanks, interesting. Couple of points:
1) Frame corruptions are possible even without overrun
2) Not all overruns cause a frame corruption
3) Enabling Ethernet increases the overrun rate

Wiggling the termination resistor causes the corrupt frames I mentioned. I do not always see overruns when I get the termination resistor frame error(s) however. Third party bus monitors do not see the invalid frame.
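
One way to put numbers on points 1) and 2) is to tally each received frame by whether an overrun was flagged and whether the payload differs from what the test sender is known to transmit. The following is only a sketch for a test rig where the expected payload is known; `overrun_corruption_tally_t` and `tally_frame()` are hypothetical names and nothing here is IDF API.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* count[overrun][corrupt]: a 2x2 table, indexed 0 = no, 1 = yes. */
typedef struct {
    uint32_t count[2][2];
} overrun_corruption_tally_t;

/* Classify one received frame against the payload the sender is known to
 * transmit, and record whether an overrun was flagged for that frame. */
static void tally_frame(overrun_corruption_tally_t *t, bool overrun_seen,
                        const uint8_t *rx, const uint8_t *expected, size_t len)
{
    bool corrupt = memcmp(rx, expected, len) != 0;
    t->count[overrun_seen ? 1 : 0][corrupt ? 1 : 0]++;
}
```

A non-zero `count[0][1]` would correspond to point 1) (corruption without overrun) and a non-zero `count[1][0]` to point 2) (overrun without corruption).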

Did anyone find the cause of the overruns? An overrun must involve an excessively long critical section and/or another interrupt blocking the CAN ISR. I get the feeling that the ISR blocking is also the cause of other issues. I am seeing >2 ms timeouts, including I2C transaction clock stretching. I also have I2C frame corruptions.
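
To get a feel for how long the CAN interrupt would have to be held off before an overrun can occur, here is a rough back-of-envelope calculation. It is a sketch resting on assumptions that are not stated in this thread: a 64-byte SJA1000-style RX FIFO, back-to-back data frames with no stuff bits (the shortest possible frames), and a bit rate chosen by the caller. All function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ASSUMPTION: 64-byte SJA1000-style receive FIFO. */
#define RX_FIFO_BYTES 64u

/* FIFO bytes per frame: 1 frame-info byte + 2 (standard) or 4 (extended)
 * identifier bytes + the data bytes. */
static uint32_t fifo_bytes_per_frame(bool extended, uint32_t dlc)
{
    assert(dlc <= 8u);
    return 1u + (extended ? 4u : 2u) + dlc;
}

/* Shortest on-wire length of a data frame in bits: 44 (standard) or 64
 * (extended) overhead bits, plus 3 bits of interframe space, plus data.
 * Stuff bits are ignored, so real frames are the same length or longer. */
static uint32_t min_frame_bits(bool extended, uint32_t dlc)
{
    assert(dlc <= 8u);
    return (extended ? 67u : 47u) + 8u * dlc;
}

/* Number of whole frames the FIFO can hold before the next one overflows it. */
static uint32_t frames_until_fifo_full(bool extended, uint32_t dlc)
{
    return RX_FIFO_BYTES / fifo_bytes_per_frame(extended, dlc);
}

/* Shortest time, in microseconds, for back-to-back frames to fill an empty
 * FIFO. The receive path must be blocked for at least this long before an
 * overrun becomes possible. */
static uint32_t min_blocked_time_us(bool extended, uint32_t dlc, uint32_t bitrate)
{
    assert(bitrate > 0u);
    uint64_t bits = (uint64_t)frames_until_fifo_full(extended, dlc)
                  * min_frame_bits(extended, dlc);
    return (uint32_t)(bits * 1000000u / bitrate);
}
```

Under these assumptions, standard 8-byte frames at 500 kbit/s give `frames_until_fifo_full(false, 8) == 5` and `min_blocked_time_us(false, 8, 500000) == 1110`, i.e. roughly 1.1 ms of blocking on a fully loaded bus, which is the same order as the >2 ms stalls mentioned above.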

ESP: How does one escalate an issue? A CAN driver which returns corrupt frames is not fit for purpose. Similarly, an I2C driver which returns invalid results will not pass muster. Both issues seem related to the 4.1 Ethernet changes. The termination resistor issue is long standing, is different from overrun and does not appear to be related to Ethernet.
Of these I would prefer tackling the root cause of the overrun (Ethernet??), as I believe that it is implicated in the I2C corruptions.
I did not get I2C corruptions in v4.0-dev-562-g2b301f53e, and CAN overflow was very much reduced. If I turn Ethernet off in 4.1 then overflow is very much reduced, and I believe the I2C corruptions are also removed or very much reduced.
Now I suppose I could downgrade, but without knowing the root cause I may just be reducing the frequency.
