Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failed orientation sensor communication no longer hangs/reboots system #166

Merged
merged 1 commit into from
Nov 6, 2020

Conversation

BjarneBitscrambler
Copy link
Contributor

Summary

Issue #151 dealt with a failed BME280 sensor causing the system to fail to start. A similar situation exists with the NXP FXOS8700 / FXAS21002 combination orientation sensor: if I2C communications with the sensor fails, the system hangs. It is preferred that the SensESP program continues to run, to avoid halting data from possible other attached sensors.

Changes Made

Only two files have changed: sensor_NXP_FXOS8700_FXAS21002.h and sensor_NXP_FXOS8700_FXAS21002.cpp

  • SensorNXP_FXOS8700_FXAS21002::connect() changed so failure to init calibration helper (which retrieves cal values stored in EEPROM) does not halt program, but rather allows it to continue with warnings displayed on serial console.
  • `SensorNXP_FXOS8700_FXAS21002::initSensors() now checks to see whether communications with the sensor have previously been working. If so, it merely 'pings' the sensors to confirm that comms still work. If comms have previously been reported as failed, or if they have never been initialized, then initSensors() tries to initialize them. In either case, it returns true if comms are currently working, and false if not.
  • SensorNXP_FXOS8700_FXAS21002::gatherOrientationDataOnce() always calls initSensors() before attempting to retrieve data from the sensors, and skips the data retrieval if comms have failed.

New Program Flow

Each time data collection is attempted from the orientation sensor, comms are checked. A failure is flagged, which causes future data retrievals to be skipped until the failure condition is corrected. Data retrieval then continues normally until the next comms failure.

Testing Performed

  • Disconnecting the orientation sensor's I2C SCL line or VCC power before the system is powered up does not hang or reboot the system. Instead, unchanging orientation values are conveyed in the SignalK traffic, and the serial console displays a message that the orientation data are invalid. Other SignalK parameters (e.g. uptime) continue to update at the expected rate.
  • Attaching the orientation sensor any time afterwards causes the orientation data to start refreshing at the expected rate, and the serial console warning ceases.
  • Disconnecting the sensor any time during normal operation freezes the present orientation data and starts the console warnings.
  • Checking the SignalK sensesp.freemem during repeated disconnects/reconnects does not seem to show any memory leak: it is a little difficult to be absolutely certain as the freemem count bounces around a bit, but a dozen simulated failures over several minutes didn't show any unexpected behaviour.

Future Possible Work

Does SignalK define a way to flag invalid data? Currently the orientation data freeze at whatever value they had just prior to the sensor comms failure, and this might not be noticed for a while. I imagine a SignalK alert could be sent out...

@BjarneBitscrambler
Copy link
Contributor Author

@ba58smith - since you brought up the issue, and fixed it in the BME280 code, would you mind reviewing my orientation sensor fixes for the same problem? Thanks a bunch!

@ba58smith
Copy link
Collaborator

It appears that your code is doing what your EXCELLENT description says. Since you've tested it, and it works as expected - please resolve the one TODO comment above, and I'll merge it. Nice work. I need to start describing my PR's as well as you do.

@BjarneBitscrambler
Copy link
Contributor Author

@ba58smith - back to you. The TODO is DONE, and there are no adverse effects.

Thanks for your kind words re: documentation. I've noticed over the years that my memory is short enough that when I return to something a few months later I start cussing my past self if I haven't written stuff down :-) That, plus my work in the field of medical device design, motivates me to try for thoroughness...

Copy link
Collaborator

@mairas mairas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this file is related to the PR topic?

@mairas
Copy link
Collaborator

mairas commented Sep 13, 2020

Please remove the first three unrelated commits from the PR.

You can squash the extra away by interactive rebasing (i.e. history editing), git rebase -i on the command line or (probably easier) by using some graphical user interface tool. If that proves difficult, you can clone a fresh copy of the repo and reapply the new changes.

To avoid creating bogus merge commits, use rebasing instead of merging when pulling new changes (git pull --rebase on command line, the respective option in a GUI).

@BjarneBitscrambler
Copy link
Contributor Author

@mairas - sorry, I've made a hash of this. I've tried rebasing using both command-line git and Github Desktop, but I can't seem to squash all the commits into one on my remote repository. Rebasing locally seems to work, but then when I try pushing that to my Github repository it complains that my current branch is behind its remote counterpart.

The closest I could come was to revert the unwanted commits, but while that gets rid of the unwanted changes, it also adds yet another commit to the history.

When you suggest "...clone a fresh copy of the repo and reapply the new changes" could you describe in more detail how I go about that? Is that the same as closing this PR and starting a new one?

@ba58smith
Copy link
Collaborator

@BjarneBitscrambler - I think what he means is for you to completely delete your local copy of the SensESP files - the ones that are your Git files. Don't just delete the files, delete the folder. Also, delete your .pio folder. under your PlatformIO SensESP project(s).

Now, in your Git "root" directory (the directory in which all your Git projects live - for me it's Documents/GitHubProjects), do git clone https://github.com/YOURGITHUBDIRECTORY/SensESP.git, which will not only give you the current version of the SensESP source files, it will also create a new, clean local git repo, that doesn't know anything about anything you've ever done with SensESP. (This may not be exactly right - Google "cloning my own github fork".)

Then you need to create your git remotes - typically called "origin" and "upstream", which point to your fork of SensESP on GitHub, and to the main SensESP repo on GitHub. (The "origin" remote might be created by the "clone" command - I don't remember. Google how to set up "upstream" and "origin" remotes.)

At this point, everything should be clean, and in sync with the main SensESP repo, and you can start making commits that shouldn't know anything about any previous commits. I think. Something like that, anyway.

@mairas
Copy link
Collaborator

mairas commented Sep 13, 2020

@BjarneBitscrambler Brian's advice is spot on if you want to start clean; however, if you managed to locally rebase, that's great! Now, when you try to push to the remote branch, yes, Git will complain because you'll be rewriting history. But when you're doing that on your own branch, that's perfectly fine! You just have to "force-push", that is, git push -f etc etc. That will update the PR and sync it with your local history.

However, be warned though; I made a PR to remove the magnetometer code. That is temporary only; there were dependency issues due to conflicting library versions, aggravated by the coinciding PlatformIO version update. Once the dependency issues have been resolved, your code must be taken back in, of course. But while this is under progress, this PR should remain on hold.

@BjarneBitscrambler
Copy link
Contributor Author

BjarneBitscrambler commented Sep 13, 2020

OK - thanks! The advice from @mairas to force the push was successful, but I appreciate the explanation of the how to do a clean restart from @ba58smith.

The extraneous commits have now been 'forgotten' and the two sensor files seem to be in the desired state.

Never mind about the following problem - I did another rebase to remove Brian's commit, and it seems fixed now. Rebasing and force-pushing are pretty powerful tools... :-)

There is one thing I am still concerned about: looking at the comment history for this PR I see a commit by Brian listed between Matti's change request and my reply. It's labelled Correct erroneous Debug message (#162) (a07dcb8) Any idea how that crept into the history for this PR? Was it caused when I merged upstream SensESP into my local repository? It's causing this PR to list 3 changed files, rather than two, on account of src/net/networking.cpp being affected by that commit.

@BjarneBitscrambler
Copy link
Contributor Author

@mairas - no worries about holding off on the orientation sensor until after the library dependencies are sorted out.

-also renamed some variables to conform with project coding style
-moved global instances of sensors from .cpp to inside class
-fleshed out setupSensors() to make gyro setting appropriate to sampling rate
Comment on lines -5 to -16
#include <Adafruit_FXAS21002C.h>
#include <Adafruit_FXOS8700.h>

//#define AHRS_DEBUG_OUTPUT //output diagnostics from sensor read and filter
#define MIN_PRINT_INTERVAL_MS \
(200) // when result printing is requested, it won't happen
// any more frequently than this,
// to avoid overloading the serial channel

// can the following be moved inside class? or into main.cpp?
Adafruit_FXOS8700 fxos = Adafruit_FXOS8700(0x8700A, 0x8700B);
Adafruit_FXAS21002C fxas = Adafruit_FXAS21002C(0x0021002C);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved the fxos and fxas objects inside class (see .h file) and instantiated in SensorNXP_FXOS8700_FXAS21002 constructor.

Comment on lines -28 to +33
// TODO: set ranges/sensitivity. For now use ICs' default values.
bool SensorNXP_FXOS8700_FXAS21002::connect(uint8_t pin_i2c_sda,
uint8_t pin_i2c_scl) {
// Ranges/sensitivity use ICs' default values, then setupSensors() adjusts.
bool SensorNXP_FXOS8700_FXAS21002::connect(int pin_i2c_sda,
int pin_i2c_scl) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed TODO, as connect() calls setupSensors() which now does adjust some sensor parameters.

Comment on lines -32 to +42
if (!cal.begin()) {
debugE("Failed to initialize calibration helper");
while (1) yield();
}
if (!cal.loadCalibration()) {
if (!cal_.begin()) {
debugE("Failed to initialize calibration helper");
is_calibrated_ = false;
}else {
if (!cal_.loadCalibration()) {
debugI("No calibration loaded/found...will start with defaults");
isCalibrated = false;
} else {
is_calibrated_ = false;
} else {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of entering infinite loop if calibration setup fails, we set flag saying that data is uncalibrated. Orientation readings will be made, but the debug console will warn that they are uncalibrated.

Comment on lines -45 to +50
Wire.begin(pin_i2c_sda, pin_i2c_scl);
Wire.setClock(400000); // 400KHz I2C
cal_.printSavedCalibration();
is_calibrated_ = true;
}
}//end of loading calibration

if (!initSensors()) {
Wire.begin(pin_i2c_sda, pin_i2c_scl, 400000 );
if (!startSensors()) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Combined calls to setup I2C and set I2C speed into single call.

Comment on lines -53 to +55
setupSensors(); // TODO - can set ranges, etc
printSensorDetails();
setupSensors(); //set ranges, etc
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

printSensorDetails() removed, as it duplicates what was already done in main.cpp

Comment on lines -61 to +64
accelerometer->printSensorDetails();
gyroscope->printSensorDetails();
magnetometer->printSensorDetails();
accelerometer_->printSensorDetails();
gyroscope_->printSensorDetails();
magnetometer_->printSensorDetails();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed class variables to closer match Google coding style. Also seen in following lines of code (without a comment).

Comment on lines +132 to +146
//confirm we are connected to sensors, attempt reconnect if not
if( !startSensors() ) {
debugW("No connection with orientation sensors!");
//assign orientation value that in NMEA2000 means invalid/unavailable
gx_ = N2K_INVALID_FLOAT;
gy_ = N2K_INVALID_FLOAT;
gz_ = N2K_INVALID_FLOAT;
roll_ = N2K_INVALID_FLOAT;
pitch_ = N2K_INVALID_FLOAT;
heading_ = N2K_INVALID_FLOAT;
accel_event_.acceleration.x = N2K_INVALID_FLOAT;
accel_event_.acceleration.y = N2K_INVALID_FLOAT;
accel_event_.acceleration.z = N2K_INVALID_FLOAT;
return;
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

each time we call for an updated sensor value, we call startSensors(), which confirms that the sensor is still responding to an I2C ping. If not, then it attempts to re-initialize the connection. If no connection is made, the orientation variables are set to special values that NMEA2000 recognizes as "N/A" or "Undefined". This reduces chance that a failed sensor will not be recognized (as opposed to the displays just freezing at the most recent valid value).

Comment on lines -280 to +325
bool SensorNXP_FXOS8700_FXAS21002::initSensors() {
if (!fxos.begin() || !fxas.begin()) {
return false;
//If not already connected to sensors,
//try to connect and configure them.
//If already connected, check connection and
// return true if still responding.
bool SensorNXP_FXOS8700_FXAS21002::startSensors() {

if( !is_connected_ ) {
if( fxos_.begin() && fxas_.begin() ) {
//the Adafruit begin() routines setup the sensors with these defaults:
// gyro: 100 Hz data rate, +/-250 dps full-scale, HPF off, active state, no FIFO,
// default LPF is 32Hz for ODR=100Hz
is_connected_ = true;
accelerometer_ = fxos_.getAccelerometerSensor();
magnetometer_ = fxos_.getMagnetometerSensor();
gyroscope_ = &fxas_;
setupSensors();
}
}else { //already connected, we believe
//ping each sensor to ensure it's still working/attached
if( !pingFXAS21002() || !pingFXAS8700() ) {
is_connected_ = false; //ping unsuccessful: assume no connection
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed name of initSensors() to startSensors() (to better reflect function). Made the routine more intelligent: if not connected, it calls the sensor begin() methods as before. If already connected (based on results from previous call to fetch orientation data) it pings the sensors to confirm that they are still responding. This way is_connected_ always reflects the most current sensor status, and they can be unplugged/replugged without crashing SensESP. When is_connected is false, the orientation values provided to SensESP are the NMEA2000 values for not-available.

Comment on lines 331 to +332
void SensorNXP_FXOS8700_FXAS21002::setupSensors(void) {
//TODO: we could set the g range for accelerometer here, for example
//Sets ranges for sensors, when defaults are not optimal.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

setupSensors() now sets the gyro data rate to correspond with the sensor's update rate, which means the internal low-pass filter is appropriate for the sampling rate. The default power-on setting is OK for 100 Hz sampling, but can lead to digital aliasing at lower sampling rates.

Comment on lines +371 to +372
//checks I2C bus to see whether FXAS21002 is connected and responding
bool SensorNXP_FXOS8700_FXAS21002::pingFXAS21002( void ) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added new functions to ping the sensors (these weren't available in the AdaFruit libraries unfortunately), so we can quickly check whether the sensor is responding before each reading of the data.

Comment on lines -38 to +40
#include "Adafruit_Sensor_Calibration.h"
#include <Adafruit_FXAS21002C.h>
#include <Adafruit_FXOS8700.h>
#include <Adafruit_Sensor_Calibration.h>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved these two defines into .h from .cpp as they are needed for the Adafruit_FXOS8700 and Adafruit_FXAS21002C objects (which used to be globally variables created in the .cpp file)

Comment on lines -44 to +46
bool connect(uint8_t pin_i2c_sda, uint8_t pin_i2c_scl);
bool connect(int pin_i2c_sda, int pin_i2c_scl);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed a type mismatch with what the I2C calls in connect() are expecting.

Comment on lines +70 to +71
Adafruit_FXOS8700 fxos_; // the combined magnetometer + accelerometer
Adafruit_FXAS21002C fxas_; // the gyroscope
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved these here from the .cpp file, where they were global variables.

@BjarneBitscrambler
Copy link
Contributor Author

OK @ba58smith - I've implemented the changes needed to keep SensESP running during a sensor outage. Testing as described in the first comment of this PR has been done, and it functions OK on a ESP-Wrover board. It also builds OK for the d1_mini and esp32dev targets, but I don't have those boards to test on.

In addition to the changes needed for accounting for sensor failures, I also made 2 additional types of changes:

  • variable name changes to more closely adhere to our coding style guide. I didn't do all the variables, as I didn't want to affect any external files.
  • added content to setupSensors(), which used to be an empty method. It now adjusts the internal sampling rate of the gyro to match whatever the SensESP reporting rate is set to.

I've added comments to this PR inline with the code changes to hopefully help explain the changes. Let me know of any concerns you have! Cheers.

@tedenda
Copy link
Contributor

tedenda commented Oct 22, 2020

@BjarneBitscrambler , I got my NXP FX87 sensor in the mail yesterday, I can set it up with a Wemos D1-mini and see that it works. Can do that during weekend.

@ba58smith
Copy link
Collaborator

It all looks good to me, but again, it's so technical, I can't comment on how well it will work. I think it would be good to wait until you test it with the D1 Mini before merging, so after you test, update here, please. Then, given the recent difficulties we've been having with all sorts of newly merged changes - I'll leave it up to @mairas to do the merge.

@tedenda
Copy link
Contributor

tedenda commented Oct 22, 2020

@BjarneBitscrambler I have now pulled your code and are running it on a Wemos D1-mini on my desk with success. I see that my heading is drifting a lot at a constant rate, have done the mag calibration. But I don't know if that is something that is related to this PR at all. Maybe we can connect on Slack and discuss what I can test?
Drifting heading

@BjarneBitscrambler
Copy link
Contributor Author

@tedenda : That's great you have it running on your d1_mini! @mairas can now merge this PR, at his convenience.

The drifting heading is something I've seen before, and it's not related to this or any existing PR/Issue. It's happened twice in the last 2 weeks - the rest of the time the heading is quite steady when the sensor is stationary. I haven't tracked down what's causing it yet - my current theory is that there is a zero offset in the gyro data that makes the filter algorithm think there's motion when there isn't. There may also be a scaling issue, whereby the units of the gyro data aren't matching what the filter expects: I believe this because in my testing I also see overshooting, where a sudden motion causes the reported heading to go past the new resting position and then take several seconds to recover. Hopefully with two people now using the sensor we can figure it out quickly :-) I'm not on Slack, but if you want to discuss this over email I think that would be productive and great. You can reach me at bhansen.victoria@gmail.com

@tedenda
Copy link
Contributor

tedenda commented Oct 22, 2020

@BjarneBitscrambler Email sent, now you have my email too!

@DanielG86
Copy link

I received my FXOS8700 sensor as well last week and build it yesterday. I have it connected to a Lolin D32 and compiling went as a charm, apart from some of my beginner's mistakes. Calibration also very easy, and the sensor works fine.
The only thing I've noticed is that there is some overshoot in readings, when the sensor makes significant moves.
The above changes are not included in the code yet, as far as I can see. Furthermore, I had to change the SDA and SCL PIN numbers for my board, but that's easy.
Thanks for the great work sofar!

@tedenda
Copy link
Contributor

tedenda commented Oct 24, 2020

Good that you got it up and running, we are also one more user that can test an improve code for this sensor.
@BjarneBitscrambler is working on documentation and test setups for this sensor, please visit him on GitHub.

@tedenda
Copy link
Contributor

tedenda commented Oct 24, 2020

@DanielG86 Please checkout @BjarneBitscrambler code (this PR) and run that on your board and report back here, if it works fine we can merge this PR and move on with attitude and other things regarding this sensor.

@DanielG86
Copy link

DanielG86 commented Oct 24, 2020 via email

@DanielG86
Copy link

Just tested it on my board and it all seems fine. Works the same as before. Very stable outputs, except some overshoot on sudden moves, before the data settles on the correct number

@tedenda
Copy link
Contributor

tedenda commented Oct 25, 2020

@DanielG86 Thanks for testing this PR!
@mairas You can merge this PR when you have the time!

@DanielG86
Copy link

After initial excellent operation, I did some further testing, which showed me some unexpected behavior:

  1. The Lolin device stops supplying deltas to the server after approx 45 minutes. Exact timing not sure, but I've seen this now a couple of times. The Lolin is still there on the network, and I can access the web-page with configuration options, but the server doesn't receive updates anymore. A "Restart device" brings the updates back.
  2. The LED on the Lolin is flashing the first time after some down-time, but after restarts not anymore. There is no correlation to whether the device sends updates to the server. In other words: the LED can be off while updates are still sent. Not really an issue, but this way I can't check whether the device sends updates based on the LED signal
  3. The sensor sends saw-tooth signals like @tedenda saw, albeit with a period of approx 3 minutes.This could be due to calibration, because I re-arranged the breadboard and device. Will check whether re-calibration will do the trick.

@DanielG86
Copy link

Update: sawtooth behavior depends on proximity of USB cable. And recalibration does the trick, so forget about item 3

@tedenda
Copy link
Contributor

tedenda commented Oct 27, 2020

@DanielG86 I should mount my sensor away from the desk and recalibrate and see if that fixes my sawtooth heading. There is probably some interference that causes it.

Take a look on @BjarneBitscrambler guthub, there is a lot to read there about this sensor and maybe you can help with something.

@DanielG86
Copy link

DanielG86 commented Oct 27, 2020 via email

@BjarneBitscrambler
Copy link
Contributor Author

@mairas - do you have time to merge this PR? I'd hate to see it get stale again and overtaken by other changes to SensESP...

Just to summarize: this modification has been tested on the ESP32-WROVER, d1_mini, and LOLIN D32 boards.

@mairas
Copy link
Collaborator

mairas commented Nov 6, 2020

Sorry, I had failed to follow the PR closely enough! Will merge.

I've got an important-ish question, related to Issue #239, though: after this PR, are there any global variables left in the magnetometer code? They are always linked in regardless of their use, and any SensESP program was using 3 KiB more memory after the initial magnetometer code got merged. I saw that you moved two of the globals inside a class, which probably helps a lot, but it's essential that there are none.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants