BNO055 I2C Intermittent Problems
Posted: Wed Dec 13, 2017 7:24 pm
I have been fighting with an I2C-based IMU, the BNO055 from Adafruit. After running into random issues reading results in a larger application, I decided to create a test application and board just for the BNO055 object I'm writing. That has allowed me to zero in on two problems that seem to recur at random intervals. One seems like it may be external, while the other definitely seems to be internal to the Arduino I2C library interface.
The issue that I'm most concerned about is the second. After minutes to hours of stable operation, reads from the device begin to fail. I was finally able to add enough error checking and instrumentation to close in on what was going wrong. My code checks the result of the register write and the number of bytes read, and stores a read or write error code based on what went wrong.
The results indicate that my read() function sees a failure to read the number of bytes specified and returns an error code of -18 to indicate this. The logic analyzer shows no activity on the bus: SCL never falls from high to low. In this condition the bytes returned from read() are always 0xFF. As I have no code that would alter the bytes on error, this must be happening in the library code.
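For readers following along, here is a minimal sketch of the kind of checked read described above. It is not the poster's code. `I2cBus`, `readRegisters`, and the `kErrWrite` value are hypothetical names and numbers chosen for illustration; only the -18 short-read code comes from the post. The bus is abstracted behind two callbacks so the logic can run off-target. On an Arduino they would presumably wrap Wire.beginTransmission()/Wire.write()/Wire.endTransmission() and Wire.requestFrom()/Wire.read().

```cpp
#include <stddef.h>
#include <stdint.h>
#include <algorithm>
#include <functional>
#include <vector>

constexpr int kOk = 0;
constexpr int kErrWrite = -17;      // hypothetical value; the post does not give the write error code
constexpr int kErrShortRead = -18;  // from the post: fewer bytes returned than requested

// Stand-in for the I2C bus (an assumption made for this sketch).
struct I2cBus {
    // Sends the register address; returns 0 on success.
    std::function<int(uint8_t reg)> writeRegister;
    // Reads up to len bytes into buf; returns the number of bytes actually received.
    std::function<size_t(uint8_t* buf, size_t len)> requestBytes;
};

// Reads len bytes starting at reg. The caller's buffer is written only on
// success, so filler bytes from a dead bus never reach the caller as data.
int readRegisters(const I2cBus& bus, uint8_t reg, uint8_t* out, size_t len) {
    if (bus.writeRegister(reg) != 0) return kErrWrite;
    std::vector<uint8_t> tmp(len);
    if (bus.requestBytes(tmp.data(), len) != len) return kErrShortRead;
    std::copy(tmp.begin(), tmp.end(), out);
    return kOk;
}
```

Staging the bytes in a temporary buffer is a design choice of this sketch: it keeps the previous values intact when a read fails, which makes the failure visible through the error code rather than through suspicious data.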
Sometimes this condition clears with a reset, but often it does not. Wire.reset() has no effect and writes nothing to the bus while the condition persists. Often, only a power cycle will clear it.
The -18 in the Err column is the error code recorded in the IMU object; it indicates that Wire.requestFrom() did not return the number of bytes requested. The sensor is read 20 times per second and a line is printed every 5 seconds. A line is also printed whenever the total delta of the IMU fields exceeds 1.0 degrees, which lets me see the moment the read fails: it is the 1434.53^ line in the output.
The 1434.53^ line shows the first error code coming back: the IMU object's update() method handles the error by zeroing the object's data values, which in turn triggers the delta output.
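The delta-triggered print can be sketched roughly as follows. This is an illustration under assumptions, not the poster's implementation: the "total delta" is taken here to be the sum of the absolute changes in heading, pitch, and roll between two consecutive samples, with no special handling of heading wraparound. `Euler`, `totalDelta`, and `exceedsThreshold` are hypothetical names.

```cpp
#include <cmath>

// One orientation sample in degrees.
struct Euler {
    float heading;
    float pitch;
    float roll;
};

// Sum of absolute per-field changes between two samples (assumed definition).
float totalDelta(const Euler& a, const Euler& b) {
    return std::fabs(a.heading - b.heading) +
           std::fabs(a.pitch - b.pitch) +
           std::fabs(a.roll - b.roll);
}

// True when the change is large enough to warrant an extra print.
bool exceedsThreshold(const Euler& prev, const Euler& cur, float thresholdDeg = 1.0f) {
    return totalDelta(prev, cur) > thresholdDeg;
}
```

With this definition, a sample of H 346.81 P 0.94 R -0.19 followed by values zeroed on error changes by far more than 1.0 degrees, which is consistent with the extra line appearing at the moment of failure.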
The startup output after the reset shows that the bus is dead: all reads return 0xFF for all bytes. This happens because the register get/set methods do not check for errors from sys_imu.read() before they return the result, so the 0xFF values come directly from the library. The -18 error code continues to be reported, indicating that Wire.requestFrom() read an incorrect number of bytes.
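One way a register getter could avoid passing 0xFF filler through as data is to return the error alongside the value. The sketch below is only an illustration of that idea. `RegisterReader`, `RegisterResult`, and `getRegister16` are hypothetical names, and the reader callback stands in for whatever sys_imu.read() actually looks like, which the post does not show. It assumes a 16-bit register pair stored LSB first.

```cpp
#include <stddef.h>
#include <stdint.h>
#include <functional>

// Hypothetical reader: returns 0 on success or a negative error code
// (for example the -18 from the post).
using RegisterReader = std::function<int(uint8_t reg, uint8_t* buf, size_t len)>;

struct RegisterResult {
    int error;       // 0 on success, otherwise the reader's error code
    uint16_t value;  // meaningful only when error == 0
};

// Reads a 16-bit value stored LSB first at lsbReg and lsbReg + 1.
// On error the raw bytes are discarded instead of being returned.
RegisterResult getRegister16(const RegisterReader& read, uint8_t lsbReg) {
    uint8_t raw[2] = {0, 0};
    const int err = read(lsbReg, raw, sizeof(raw));
    if (err != 0) return {err, 0};
    return {0, static_cast<uint16_t>(raw[0] | (raw[1] << 8))};
}
```

A caller would then print something like "read failed (-18)" instead of a software revision of 65535 when the bus is dead.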
After a power cycle the device may then go on to operate for anywhere from 10 minutes to 4 hours or more. It's been extremely frustrating to debug, since I can never capture exactly the moment when the bus stops working and the moments preceding that event. I have some ideas now, though, that I may try since I can detect this error. I can perhaps trigger the logic analyzer with a free GPIO used to flag detection of the error. That might give some insight, or it might show nothing more than everything working fine until it isn't.
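The GPIO trigger idea could look something like the sketch below. It is a hedged illustration, not tested on hardware: `ErrorTrigger` is a hypothetical class, and the `setPin` callback is a stand-in for something like digitalWrite() on a free pin. The output is latched so that it goes high once, on the first non-zero error code, and stays high.

```cpp
#include <functional>
#include <utility>

// Raises a trigger output once, on the first non-zero error code.
class ErrorTrigger {
public:
    // setPin(true) drives the trigger pin high, setPin(false) drives it low.
    explicit ErrorTrigger(std::function<void(bool)> setPin)
        : setPin_(std::move(setPin)) {
        setPin_(false);  // start with the trigger line low
    }

    // Call after every sensor read. Returns true only on the call that fired.
    bool update(int errorCode) {
        if (fired_ || errorCode == 0) return false;
        fired_ = true;
        setPin_(true);
        return true;
    }

    bool fired() const { return fired_; }

private:
    std::function<void(bool)> setPin_;
    bool fired_ = false;
};
```

Because the pin rises only after the failed read has been detected, the bus activity leading up to the failure is visible only if the analyzer keeps a pre-trigger buffer.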
I'll post some of the code next. The captured output follows.
This is the output showing the moment the condition occurs:
Secs.... Err..Heading..Pitch..Roll.....Temp
Code:
1394.59 0 H 346.81 P 0.94 R -0.19 T 28
1399.73 0 H 346.81 P 0.94 R -0.19 T 28
1404.87 0 H 346.81 P 0.94 R -0.19 T 28
1410.01 0 H 346.81 P 0.94 R -0.19 T 28
1415.15 0 H 346.81 P 0.94 R -0.19 T 28
1420.29 0 H 346.81 P 0.94 R -0.19 T 28
1425.43 0 H 346.81 P 0.94 R -0.19 T 28
1430.57 0 H 346.81 P 0.94 R -0.19 T 28
1434.53^ -18 H 0 P 0 R 0
1435.68 -18 H 0.00 P 0.00 R 0.00 T -1
1440.68 -18 H 0.00 P 0.00 R 0.00 T -1
1445.68 -18 H 0.00 P 0.00 R 0.00 T -1
1450.68 -18 H 0.00 P 0.00 R 0.00 T -1
1455.68 -18 H 0.00 P 0.00 R 0.00 T -1
1460.68 -18 H 0.00 P 0.00 R 0.00 T -1
1465.68 -18 H 0.00 P 0.00 R 0.00 T -1
1470.68 -18 H 0.00 P 0.00 R 0.00 T -1
1475.68 -18 H 0.00 P 0.00 R 0.00 T -1
At this point the logic analyzer shows no bus activity at all. The CLK line is never toggled again.
Now a reset is attempted:
Code:
rst:0x10 (RTCWDT_RTC_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:1
load:0x3fff0018,len:4
load:0x3fff001c,len:812
load:0x40078000,len:0
load:0x40078000,len:11392
entry 0x40078a9c
Bosh BNO055 test
Ping: 0
BNO055 Present: 0
Software Rev: 65535
Boot Rev: 255
Self Test: FF
Calibration: FF
Error Status: 255
System Status: 255
Units Register: FF
Data Source: 0
Operation Mode: 255
Start loop.
1.70 -18 H 0.00 P 0.00 R 0.00 T -1
6.70 -18 H 0.00 P 0.00 R 0.00 T -1
11.70 -18 H 0.00 P 0.00 R 0.00 T -1
16.70 -18 H 0.00 P 0.00 R 0.00 T -1
21.70 -18 H 0.00 P 0.00 R 0.00 T -1
26.70 -18 H 0.00 P 0.00 R 0.00 T -1
31.70 -18 H 0.00 P 0.00 R 0.00 T -1
After a power cycle and reset:
Code:
ets Jun 8 2016 00:22:57
rst:0x10 (RTCWDT_RTC_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:1
load:0x3fff0018,len:4
load:0x3fff001c,len:812
load:0x40078000,len:0
load:0x40078000,len:11392
entry 0x40078a9c
Ping: 1
BNO055 Present: 1
Software Rev: 785
Boot Rev: 21
Self Test: F
Calibration: 3C
Error Status: 0
System Status: 5
Units Register: 80
Data Source: 0
Operation Mode: 8
Start loop.
1.69 0 H 0.00 P 0.00 R 0.00 T 28
1.75^ 0 H 313 P 1 R 0
6.83 0 H 313.44 P 0.62 R -0.19 T 28