method for recovering I2C bus #1025

Closed
dave-prosee opened this issue Nov 15, 2015 · 22 comments

@dave-prosee
Contributor

dave-prosee commented Nov 15, 2015

When a reset occurs while a slave is transmitting to the ESP8266 over I2C, SDA might hang low.
This situation is known and described e.g. in the AN10216 publication on page 17 / slide 42.
In my case it's the DS1307 realtime clock providing the error situation.

The described procedure in AN10216 (clocking 8 times for remaining incoming bits, then sending a NACK and a STOP) makes the slave release the SDA line.
I have done this with a simple program (see below), but a subsequent inline restart of Wire in various variants (with/without STOP, with/without pins) hasn't helped so far. On an oscilloscope both lines remained high.
After this code:

  • Restarting the program via the reset line is successful
  • An ESP.reset(), ESP.restart() or system_restart()
    are not successful.
    So a software reset left the SDA and CLK lines high but the communication on the I2C didn't start.

To complement the above: the I2C communication works fine, as well as the called I2C_scan. It is only after a reset that SDA might hang low and that I can't recover fully from an SDA line stuck low by software. I get the line high again but cannot get I2C working again without pulling the reset line low.

Hence I would appreciate any hint in getting this resolved inline.
Maybe the recovery could also be added to the library?

Thanks in advance for any support.

Code entered when hanging I2C is found:

  Serial.println("Starting I2C bus recovery");
  delay(2000);
  //try i2c bus recovery at 100kHz = 5uS high, 5uS low
  pinMode(SDAPIN, OUTPUT);//keeping SDA high during recovery
  digitalWrite(SDAPIN, HIGH);
  pinMode(CLKPIN, OUTPUT);
  for (int i = 0; i < 10; i++) { //9th cycle acts as NACK
    digitalWrite(CLKPIN, HIGH);
    delayMicroseconds(5);
    digitalWrite(CLKPIN, LOW);
    delayMicroseconds(5);
  }

  //a STOP signal (SDA from low to high while CLK is high)
  digitalWrite(SDAPIN, LOW);
  delayMicroseconds(5);
  digitalWrite(CLKPIN, HIGH);
  delayMicroseconds(2);
  digitalWrite(SDAPIN, HIGH);
  delayMicroseconds(2);
  //bus status is now : FREE

  Serial.println("bus recovery done, starting scan in 2 secs");
  //return to power up mode
  pinMode(SDAPIN, INPUT);
  pinMode(CLKPIN, INPUT);
  delay(2000);
  //pins + begin advised in https://github.com/esp8266/Arduino/issues/452
  Wire.pins(SDAPIN, CLKPIN); //this changes default values for sda and clock as well
  Wire.begin(SDAPIN, CLKPIN);
  //only pins: no signal on clk and sda
  //only begin: no signal on clk, no signal on sda


  //no further processing in case of error
  while(true)
  {
    i2c_scan(); 
  }
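
For anyone who wants to check the recovery logic away from the hardware, here is a minimal desktop sketch of the sequence discussed above: release both lines, clock SCL until the slave lets go of SDA, then finish with a STOP. The names `I2cLines`, `recoverBus()` and `FakeStuckSlave` are hypothetical, made up for this illustration, and are not part of the Wire library. On a real board the callbacks would wrap the pin calls and `delayMicroseconds(5)`. Two assumptions differ from the code above: the sketch never drives a line high (it relies on the pull-ups), and it issues a START before the STOP.

```cpp
#include <cassert>
#include <functional>

// Hypothetical line abstraction (not part of Wire). "true" means the line is
// released and pulled high by the pull-up, "false" means it is driven low.
struct I2cLines {
    std::function<void(bool)> setScl;
    std::function<void(bool)> setSda;
    std::function<bool()> readScl;
    std::function<bool()> readSda;
    std::function<void()> halfPeriod;  // e.g. a 5 us delay at 100 kHz
};

enum class RecoverResult { Ok, SclStuckLow, SdaStuckLow };

// Clock SCL until the slave releases SDA, then send START + STOP.
RecoverResult recoverBus(const I2cLines& bus, int maxClocks = 9) {
    assert(maxClocks > 0);
    bus.setSda(true);
    bus.setScl(true);
    bus.halfPeriod();
    if (!bus.readScl()) return RecoverResult::SclStuckLow;  // cannot clock the bus

    for (int i = 0; i < maxClocks && !bus.readSda(); ++i) {
        bus.setScl(false);
        bus.halfPeriod();
        bus.setScl(true);
        bus.halfPeriod();
    }
    if (!bus.readSda()) return RecoverResult::SdaStuckLow;

    bus.setSda(false);  // START: SDA falls while SCL is high
    bus.halfPeriod();
    bus.setSda(true);   // STOP: SDA rises while SCL is high
    bus.halfPeriod();
    return RecoverResult::Ok;
}

// Simulated slave that holds SDA low until it has seen `bits` rising SCL edges.
struct FakeStuckSlave {
    int bitsLeft;
    bool scl = true;        // level the master last put on SCL
    bool masterSda = true;  // level the master last put on SDA
    int clocksSeen = 0;

    explicit FakeStuckSlave(int bits) : bitsLeft(bits) {}

    I2cLines lines() {
        I2cLines l;
        l.setScl = [this](bool high) {
            if (high && !scl) {  // rising edge
                ++clocksSeen;
                if (bitsLeft > 0) --bitsLeft;
            }
            scl = high;
        };
        l.setSda = [this](bool high) { masterSda = high; };
        l.readScl = [this] { return scl; };
        // Wired-AND: SDA is high only if neither side pulls it low.
        l.readSda = [this] { return masterSda && bitsLeft == 0; };
        l.halfPeriod = [] {};  // no real timing in the simulation
        return l;
    }
};
```

With a `FakeStuckSlave` that still has 5 bits to send, `recoverBus()` sends exactly 5 clock pulses and reports `Ok`. With more pending bits than `maxClocks` it gives up and reports `SdaStuckLow`.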

@drmpf

drmpf commented Nov 15, 2015

I wrote a library for I2C recovery,
Reliable Startup for I2C
http://www.forward.com.au/pfod/ArduinoProgramming/I2C_ClearBus/index.html
which looks similar.

@dave-prosee
Contributor Author

@drmpf
Thanks for the hint!
Your code is more extensive in that it deals with CLK being low also and a little different by using START. I will try it later today and will let you know (here).

I see it is written for Atmel but apart from the line below I think everything runs on ESP8266 as well.

 TWCR &= ~(_BV(TWEN)); //Disable the Atmel 2-Wire interface so we can control the SDA and SCL pins directly

Note that I did get SDA high again (CLK was already high) but could not get any signal out of CLK and SDA after Wire.pins() and/or Wire.begin(). To me it seems something in the OS.

Cheers.

@dave-prosee
Contributor Author

@igrr Thanks for adopting it as an enhancement. If you want me to try something else, please let me know. I could even tweak libraries but would need some guidance.

@igrr
Member

igrr commented Nov 16, 2015

At the moment I have no idea why the Wire library would stop operating after this restart. Once this is determined and fixed, we can proceed to add this recovery code to the library itself.

If you have capacity to investigate this, Wire library source is here and software I2C implementation is here.

@dave-prosee
Contributor Author

@drmpf I tried your code (without that single TWCR line) and it worked like a charm.
@igrr I need to investigate what the difference really is. I will come back to this at the end of the week.
To both: THANKS.

@drmpf

drmpf commented Nov 16, 2015

Did some more testing yesterday.
That code is sub-optimal for the ESP, as the ESP supports true OUTPUT_OPEN_DRAIN GPIO modes.
I tested the

pinMode(pinNo, OUTPUT_OPEN_DRAIN);

mode on GPIO0,1,2,3 and it worked on all of them

So the multiple lines like

  pinMode(SDA, INPUT); // remove output low
  pinMode(SDA, INPUT_PULLUP); // and make SDA high i.e. send I2C STOP control.

should be updated to use the OUTPUT_OPEN_DRAIN mode.
I'm actually surprised the code worked so well, because those sequences depend heavily on the way the Atmel AVR control registers double up on INPUT_PULLUP settings and output state.

Would need to do some detailed testing on switching from OUTPUT_OPEN_DRAIN to INPUT to ensure no glitches.

Perhaps @igrr could advise on how the mode/state GPIO registers work when transitioning from OUTPUT_OPEN_DRAIN to INPUT and vice versa.
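
The open-drain idea can be illustrated without a board. This is a minimal sketch under stated assumptions: `FakePin`, `OpenDrainLine` and `everDrivenHigh()` are hypothetical names invented here, and the fake pin only records what was requested instead of touching real GPIO registers. It shows the rule an I2C master should follow: a logical high means "release the line and let the pull-up raise it", and only a logical low is actively driven.

```cpp
#include <cassert>
#include <vector>

enum class Drive { Released, Low, High };

// Fake pin that records how it was driven (stand-in for real GPIO calls).
struct FakePin {
    std::vector<Drive> history;
    // On hardware: switch the pin to an input so the pull-up raises the line.
    void release()  { history.push_back(Drive::Released); }
    // On hardware: configure the pin to sink current and pull the line low.
    void driveLow() { history.push_back(Drive::Low); }
};

// Open-drain line: "high" releases the pin, "low" drives it low.
// There is deliberately no way to drive the pin high.
template <typename Pin>
class OpenDrainLine {
public:
    explicit OpenDrainLine(Pin& pin) : pin_(pin) { pin_.release(); }

    void write(bool high) {
        if (high) pin_.release();
        else      pin_.driveLow();
    }

private:
    Pin& pin_;
};

// True if the recorded history contains an actively driven high level.
inline bool everDrivenHigh(const FakePin& pin) {
    for (Drive d : pin.history) {
        if (d == Drive::High) return true;
    }
    return false;
}
```

Writing low and then high through `OpenDrainLine` leaves the history as Released, Low, Released, so the pin is never pushed high against a slave that may still be holding the line low.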

@drmpf

drmpf commented Nov 16, 2015

Updated
http://www.forward.com.au/pfod/ArduinoProgramming/I2C_ClearBus/index.html
so that it compiles for both ESP8266 and UNO etc.

@dave-prosee
Contributor Author

@drmpf Super work.
Since I have only one master I found it curious that cycling the clock and sending a START/STOP didn't work with pinMode OUTPUT (although I am now aware that OUTPUT is not the right mode for a pulled-up line). I did get the same IO pattern as your code, but it was still not working. Then I realised I have a level shifter, as shown below.
Although the reasoning is not there yet, I have the feeling that the different pinModes have a different effect and hence keep some I2C devices in the wrong mode. I was measuring the signal on the +5V side, where the RTC chip is.
@igrr Anyway, my first impression that the wifi.begin() (and OS reset) sometimes didn't function should be abandoned. The code from drmpf works flawlessly and the signal quality is much better than with the code I used (I saw glitches on mine). The handling of clock stretching and a second master is also elegant. Thumbs up for adding drmpf's code in the next release.

[Image: bidirectional level shifter]

@dave-prosee
Contributor Author

I am not familiar with forking and pulling so I didn't want to go that way right now.
Sorry to say this on GitHub, but as I am using the Arduino IDE I think I need to work out how that relates first. For now I have been looking into Wire.cpp and hacking si2c.c and twi.h, and I want to share my results.

In core_esp8266_si2c I added subroutine:

int twi_mediate(){
    if (SCL_READ()==0)     return 1;       //SCL held low by another device, no procedure available to recover
    int clockCount = 20;

    while (SDA_READ()==0 && clockCount>0){ //if SDA is low, clock out the bits the slave still has to send, up to a maximum
        clockCount--;                      //count each clock so the loop ends even if SDA is never released
        twi_read_bit();
        if (SCL_READ()==0) return 2;       //I2C bus error. SCL held low beyond slave clock stretch time
    }

    if (SDA_READ()==0)     return 3;       //I2C bus error. SDA line held low by slave/another_master after n bits.

    if(!twi_write_start()) return 4;       //line busy. SDA again held low by another device. 2nd master?
    else                   return 0;       //all ok
}

It follows the logic of @drmpf and uses the macros and subroutines found in there.
I skipped the delay because the user can add it in their own program before calling.

To me it seems logical to add this to twi_init, which seems to be called by all the variants of Wire.begin() and Wire.pins(). It would be nice to change the return value to an integer (this needs a change in twi.h), but that would also need a subsequent change in Wire to finally deliver it to the main program.

int twi_init(unsigned char sda, unsigned char scl){
  ets_printf("twi_init\n"); 
  twi_sda = sda;
  twi_scl = scl;
  pinMode(twi_sda, INPUT_PULLUP);
  pinMode(twi_scl, INPUT_PULLUP);
  twi_setClock(100000);
  return twi_mediate();
}

and twi.h

int  twi_init(unsigned char sda, unsigned char scl);

I did not change Wire in my setup. I tested everything and it works well.

Maybe @igrr can use this for 2.1.0.
Regards

@Humancell

I just wanted to add to this thread some things I noticed today. I have been using an old stable version (v1.6.5?) for a long time with no I2C problems. Just today I updated to the current "stable" version (2.0?) and now pretty much all of my I2C master software breaks within a very small amount of time of running ... usually in < 15 minutes. My master is polling a device every second, and requesting 15 bytes.

All of this was working flawlessly in the older version - running on ESP-12F modules - and now after ~5-15 minutes all of the Wire.requestFrom() requests begin to return 0 bytes.

I have not been able to find a way around this yet ... besides a full power cycle. I also noticed that sometimes I'll get back a partial/damaged payload ... some bytes, and then the rest FF.

I've now commented my connectWiFi() function to see if there is a difference without the WiFi operational.

BTW, WiFi.status() == WL_CONNECTED returns true even when no WiFi calls have been made to initialize it (no call to WiFi.begin(), etc.)?

@Humancell

Not sure if this will help or not, but I wanted to add more testing results:

  1. It did not matter if Wifi was running or not ... the I2C bus issues occur either way.
  2. I added the I2C_ClearBus() code (from the link above) to my project and so far that is taking care of the problem! Each time it has been called, it properly resets the I2C bus and subsequent calls return properly.

What I did was put a counter in the loop() when I get a Wire.requestFrom() that returns 0 bytes. If this occurs 3 times, then I call the I2C_ClearBus() code, and it appears to clear the issue, reset the bus, and allow my code to continue to run. I might drop this to 2, as I noticed that several times the issue seems to clear after the first failure without having to call I2C_ClearBus().
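
The counting scheme described above can be sketched as a small helper. This is only an illustration: `BusWatchdog` and `report()` are hypothetical names, and the recovery callback stands in for whatever bus-clearing routine you use. It counts consecutive zero-byte reads, resets the count on any successful read, and fires the recovery once the threshold is reached.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Counts consecutive failed reads and triggers a recovery callback
// once `threshold` failures in a row have been seen.
class BusWatchdog {
public:
    BusWatchdog(int threshold, std::function<void()> recover)
        : threshold_(threshold), recover_(std::move(recover)) {
        assert(threshold_ > 0);
    }

    // Call once per poll with the number of bytes the read returned.
    // Returns true if recovery was triggered by this call.
    bool report(int bytesReceived) {
        if (bytesReceived > 0) {
            failures_ = 0;  // a good read clears the streak
            return false;
        }
        if (++failures_ < threshold_) return false;
        failures_ = 0;      // start counting afresh after recovering
        recover_();
        return true;
    }

private:
    int threshold_;
    std::function<void()> recover_;
    int failures_ = 0;
};
```

In a sketch this would be fed from the polling loop, for example by passing the byte count returned by Wire.requestFrom() to `report()`. Because a single good read resets the streak, isolated failures that clear on their own never trigger the recovery.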

I hope this might be of some value.

@mtnbrit

mtnbrit commented Feb 12, 2016

Is there any insight as to what is broken here? I have the same experience: I2C used to work fine, but since I'm now using 1.6.7 and the git version, I get the same failure mode as described above, where only power cycling the chip will get it working again.

@dave-prosee
Contributor Author

I had dreadful times with I2C and other stuff. Now I am running stable apps (I have had one running for 6 weeks now, with WiFi, serial, I2C and SPI all mixed). My lessons finally came down to just two:

  • Have a really stable power supply. The specs advise staying below 25 mV ripple, and that is my experience as well.
  • I used the I2C bus so heavily that any power cycle or interrupt could land in the middle of an I2C transaction. The AN10216 document gave me the insight, and drmpf's code the right answer, to recover from some of these situations. So you have to understand all the states the I2C bus can be in.
    Hope it helps.

@mtnbrit

mtnbrit commented Feb 16, 2016

My impression is that something broke or changed in the SDK or core at some point, causing previously stable sketch code to now have problems with I2C hanging up and requiring the unblocking code, which, although it works just fine, feels like a nasty hack. I'd rather have code that didn't hang up the I2C bus, like before.

@igrr igrr modified the milestones: 2.2.0, 2.1.0 Feb 27, 2016
@igrr igrr modified the milestones: 2.2.0, 2.3.0 Apr 18, 2016
@dave-prosee
Contributor Author

dave-prosee commented May 14, 2016

@igrr It took me some time, but I made a pull request today with the above-mentioned solution. I have been working with it for several months and have never had a problem.
Please notify me if there's something wrong as it happens to be my first pull request ever ;-)

@igrr igrr added waiting for feedback, staged-for-release and removed waiting for feedback labels Jun 2, 2016
@igrr igrr closed this as completed Jun 23, 2016
@lichtheini

lichtheini commented Jul 31, 2017

Well, it's an old issue, but I had the same problem with a stuck I2C bus on startup (it hangs at Wire.endTransmission). The SCL line was high, but SDA was being held low. I tried the solutions here and ended up with error 3 ("I2C bus error. Could not clear. SDA data line held low").

I finally found a solution and want to share it here, in the first Google result: I connect SCL in parallel to a GPIO and check whether the error condition occurs. If it does, I just pull the GPIO (and with it SCL) low for some time.
However, changing the SCL pin itself to output and writing LOW does not work on my Feather nRF52832.

//Serial.print(digitalRead(PIN_SCL));    //should be HIGH
//Serial.println(digitalRead(PIN_SDA));  //should be HIGH, is LOW on stuck I2C bus

if (digitalRead(PIN_SCL) == HIGH && digitalRead(PIN_SDA) == LOW) {
  Serial.println("reset");
  pinMode(15, OUTPUT);      // is connected to SCL
  digitalWrite(15, LOW);
  delay(2000);              // maybe too long
  pinMode(15, INPUT);       // reset pin
  delay(50);
  //Serial.print(digitalRead(PIN_SCL));    // HIGH
  //Serial.println(digitalRead(PIN_SDA));  // HIGH
}

Hopefully, other people find this solution useful!

@drmpf

drmpf commented Aug 1, 2017

Thanks for that. I have added this note to the webpage. (http://www.forward.com.au/pfod/ArduinoProgramming/I2C_ClearBus/index.html)
Let me know if you want a more explicit attribution.

@lichtheini

That's more attention than I thought 👍 I've added a soft reset after this piece of code, sometimes it needs to restart up to 4 times, but always recovers the I2C bus.

@espkh4

espkh4 commented Aug 16, 2017

Hi lichtheini,

What do you mean by soft reset? reset of the master or reset of the I2C bus?

I am currently trying to debug my I2C communication. I have several SRF10 sonars connected to an Arduino Uno with the necessary pull-up resistors. After a while (sometimes as long as a day) the I2C bus will hang, with both SDA and SCL high. The Arduino is still running, as I can see other non-I2C devices communicating with it. The values I get back are either "0" or "512" from some or all of the SRF10 sonars. Somehow reflashing the Arduino gets things going again, hence I suspect it's a software issue. The clock rate is 100 kHz.

Any ideas or suggestions are welcomed.

@lichtheini

Hi, I'm afraid I'm not able to help you. Soft reset is a reboot of the master (on the NRF it's NVIC_SystemReset(); or like pressing the reset button). The method I discovered is very specific to the NRF-Adafruit combination, I think. The issue has not occurred with the same sensor and an RFduino as master. And judging by the Adafruit forums, the whole Wire library should be improved.

Good luck!

@klucsik
Contributor

klucsik commented Feb 22, 2018

I'll leave a warning here.
Do not attempt to use this method with a 5V MCU without proper protection! I learned this the hard way. Besides this, this small piece of code is super useful, thank you for sharing it!

@johnty

johnty commented Feb 23, 2018

Here's a good way to do level shifting. You just need two 10k resistors and a MOSFET!
