I2C/Wire/TwoWire can lock up or cease working in some conditions #349

Closed
Curclamas opened this issue May 4, 2017 · 24 comments

Comments

@Curclamas
Contributor

Bug description

Steps to reproduce

Given the following minimal code example:

#include "Wire.h"

void setup() {
  Wire.begin(21,22);
  Serial.begin(115200);
  Serial.println("Start i2c-test"); 
}

void loop() {
  byte error;

  // 0x20 is the address of a pcf8574 GPIO expander
  Serial.println("Try to contact 0x20"); 

  Wire.beginTransmission(0x20);
  error = Wire.endTransmission();

  Serial.print("Error code is:");
  Serial.println(error);

  delay(1000);
}

We've run this on a Nano32, WROOM32 and on a custom board. Run it:

  • on a properly pulled-up (10k) I2C bus with no devices on it
  • on a bus that has a PCF8574 on it
  • on a bus that is only a wire with no pullups in place
  • on a bus where SCL is pulled up and SDA is pulled down

Also try changing the test conditions while the code is running (e.g. remove a pullup or a cable on a breadboard).

Observed behavior

  1. Initially we see error code 0 (no error, i.e. device ACKed) or 2 (NACK on address). After removing a wire or pullup we see error code 3 (NACK on data (⚠️ we have not even sent data here ⚠️)). When we then put the missing component back, the reported error code stays 3 until we reset the ESP32.

  2. If we pull SDA down, code execution stops: nothing more is printed to the console.

  3. On an oscilloscope we see in the first case (removed pullup) that, once the faulty code (3) has occurred for the first time, the I2C bus stays idle (high) on both SCL and SDA.
    In the second case (SDA pulled down) we observed a steady 100 kHz oscillation on SCL until we reset the device and removed the pulldown on SDA. Removing the pulldown alone is not enough.

  4. We have observed an error condition as in (1.) on a known-good board after several days of polling a value from a PCF8574 every 20 ms. It looks as if, once a request fails, it fails over and over again until we reboot.

Expected behavior

  • When we have a faulty connection or device on the I2C bus the Wire commands MUST return a meaningful error code and MUST NOT freeze the ESP32/code execution (maybe timeout or throw an exception if inevitable?).
  • Also when we have removed the faulty hardware condition and retry the command it MUST return a 0 (if we have an i2c device) or 2 (if no device is present) error code. I.e. it MUST be resilient against faulty conditions without a reboot.
@me-no-dev
Member

Excellent bug report!

@Curclamas
Contributor Author

I've done a little bit more research on that using a DSO.
While doing that I've noticed quite some strange things.

First Screenshot: Spurious signals

Still using the example code above, on a properly terminated bus with no other devices, the first thing we see is:
image

What we see here: (Yellow is SCL and blue is SDA)

  • A spurious pulse on SDA and then SCL
  • Then we have an I2C transmission that looks like a READ to addr 0x00 (what is that, the general call address? why?) with a NACK at the end
  • Then comes what looks like the expected transfer: first the address 0b0100000 (0x20 in 7 bits), then 0b0 (the R/W bit; 0 means write), then 0b1 (NACK), and then the 0b1 from the STOP condition (P)

This bothers me a little since I don't understand where the first pulse and the first strange package come from. I know that spurious signals can fuck up the state machines of some I2C sensors, but I'm certainly no I2C expert. Also, on a production board I routinely run an I2C scan on boot, and sometimes it shows devices which are clearly not there; maybe that has something to do with this?

Here are the two packages up close:
image
image

Second Screen: Lockup if SDA gets pulled low

In the following screenshot we see what happens if we pull SDA low (the spike comes from contact bounce) between two packages that are 1000 ms apart. You see the first package (just as a spike because of the zoom), then the second message. Afterwards SDA gets pulled low (AFAIK that can happen in I2C if the bus is "busy").
After the bus has been pulled low, no more activity can be seen on SCL. The same is true for both SCL and SDA once the pulldown is removed and the bus should be back to normal operation. The system is then locked up.
image

Third screen: What happens with Wire.begin() in loop()?

I was curious to know what happens if you put Wire.begin() in loop(), because sometimes the system would not lock up but rather report error 3 all the time.
What happens then is that we get those two strange packages from the first screenshot every time (the single pulse only occurs on the first package after boot).
image

If we have a faulty condition, the device will lock up or, if it does not lock up, will still report error 3 and not attempt any more activity on the I2C bus:
image

@me-no-dev
Member

Yes I know about that first pulse. It comes from the hardware. I have tried many things to make it not happen, but all unsuccessful. I think I put a bit of delay there to help the device differentiate it from actual data, but that is as much as I can do about it. As for the slave pulling SDA low... that usually means that the slave has stuff to do and will release the data line once done. Used in many devices.

@me-no-dev
Member

Teo (Espressif's boss) just gave me an idea :) I'll give it a go and report. If you do not hear from me in the next 3 days, please bug me to do it. I am not in my country currently and will return tomorrow, so there is a chance that I will forget.

@CarlosGS

I've had similar problems with I2C. In my case the device locks up when SDA stays pulled low.

This also happened with ESP8266, maybe the recovery code could be reused: esp8266/Arduino#1025
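
For anyone unfamiliar with what such recovery code usually does, here is a rough sketch of the bus-clear procedure from the I2C-bus specification: toggle SCL up to nine times until the slave releases SDA, then issue a STOP. It is not the code from the linked ESP8266 issue. BusPins and recoverBus are hypothetical names, and the pin hooks would have to be wired to open-drain GPIO calls on real hardware.

```cpp
#include <functional>

// Hypothetical pin-access hooks (not part of any real driver). On hardware
// they would wrap open-drain GPIO operations on the SDA and SCL pins.
struct BusPins {
  std::function<void(bool)> setScl;       // true = release (high), false = drive low
  std::function<void(bool)> setSda;       // true = release (high), false = drive low
  std::function<bool()> readSda;          // true = SDA line reads high
  std::function<void()> halfPeriodDelay;  // e.g. 5 us for a 100 kHz clock
};

// Tries to free a bus whose SDA line is held low by a slave.
// Returns true if SDA reads high afterwards.
inline bool recoverBus(const BusPins& pins, int maxPulses = 9) {
  pins.setSda(true);
  pins.setScl(true);
  pins.halfPeriodDelay();

  // Clock SCL until the slave lets go of SDA, at most maxPulses times.
  for (int i = 0; i < maxPulses && !pins.readSda(); ++i) {
    pins.setScl(false);
    pins.halfPeriodDelay();
    pins.setScl(true);
    pins.halfPeriodDelay();
  }
  if (!pins.readSda()) {
    return false;  // still stuck: probably a hard fault (short, dead device)
  }

  // Generate a STOP condition: SDA rises while SCL is high.
  pins.setScl(false);
  pins.halfPeriodDelay();
  pins.setSda(false);
  pins.halfPeriodDelay();
  pins.setScl(true);
  pins.halfPeriodDelay();
  pins.setSda(true);
  pins.halfPeriodDelay();
  return pins.readSda();
}
```

A false return means the line never came back within nine clocks, which would be the case for a permanent pulldown like the one in the original report.
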

@me-no-dev
Member

Looking at my code, I have already come up with the same idea... that is that strange transmission at init. A transmission that aims to reset the bus.

@lonerzzz
Contributor

I have been fighting with this problem myself for almost a week now. What I have observed is the following:

  • when the error occurs, the bus stays in a busy state, regardless of whether the SDA and SCL lines are recovered, preventing subsequent messages from being sent. The busy state is from the bus_busy parameter of the status_reg in this file: https://github.com/espressif/esp-idf/blob/master/components/soc/esp32/include/soc/i2c_struct.h
  • single byte reads following after single byte writes most often lead to the problem
  • conversely a single byte write followed by a five byte read results in the problem happening less frequently for the same overall number of messages
  • separating the read from the preceding write by a delay of 3 microseconds is enough to drop the frequency of the error considerably (this has me wondering if we have a variable that is maybe not volatile so not being read directly all the time?)
  • when the I2C gets into the stuck state, WiFi often cannot be established after a reboot (so I am wondering whether flash or a peripheral is sharing the first I2C bus and contributing to the problems)

@me-no-dev
Member

What if we give it a timeout and then manually set bus_busy to 0?

@lonerzzz
Contributor

Either or both of those are valid options for testing. I am willing to test anything you want to try. I am getting familiar with the esp32-hal-* files but haven't ventured into the idf code very deep just yet. Let me know how you wish to proceed.

@me-no-dev
Member

@lonerzzz I went through the code and I do not see me checking after write if bus_busy is 0. It is done only in case of error. I have the feeling that I did this on purpose but my memory of the case is not that good. Maybe we need to make sure that the bus is not busy at the end of write, wait a bit and set it to 0 if it fails (then return some error I guess).
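
The "wait a bit, then give up with an error" part could look roughly like the sketch below. It is only an illustration, not the HAL code: waitForBusIdle and WaitResult are hypothetical names, and the busy flag and the clock are passed in as callbacks so the logic does not depend on the register layout.

```cpp
#include <cstdint>
#include <functional>

enum class WaitResult { Idle, TimedOut };

// Polls busBusy() until it reports false or timeoutMs has elapsed.
// nowMs() is any millisecond tick source; the unsigned subtraction keeps the
// comparison correct across counter wrap-around.
inline WaitResult waitForBusIdle(const std::function<bool()>& busBusy,
                                 const std::function<std::uint32_t()>& nowMs,
                                 std::uint32_t timeoutMs) {
  const std::uint32_t start = nowMs();
  while (busBusy()) {
    if (nowMs() - start >= timeoutMs) {
      return WaitResult::TimedOut;  // caller should report an error here
    }
  }
  return WaitResult::Idle;
}
```

In the HAL, busBusy would presumably read i2c->dev->status_reg.bus_busy and nowMs would wrap millis(). The sketch covers only the timeout; whether the flag can actually be cleared from software is a separate question.
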

@lonerzzz
Contributor

@me-no-dev I have been trying a number of things, but I am not able to reset the state such that the flag i2c->dev->status_reg.bus_busy gets set back to 0. I have also observed that in a number of situations, this flag is already set before i2cInitFix is even called just after starting to boot. I added code in i2cInitFix to check bus_busy because the i2cInitFix code was sitting in the tight loop at the end of the method after a reasonable percentage of reboots and preventing system startup. I am using the ESP32 as an I2C master and nothing is using the bus in my application prior to the Wire.begin() call.

@muktillc

I am not able to generate the I2C clock on the pin. I don't know why even the plain clock is not present. I have the following code.

Hardware:

Board: ESP32 Dev Module
Core Installation/update date: 11/jul/2017
IDE name: Arduino IDE
Flash Frequency: 40MHz
Upload Speed: 115200

Description:

I am not able to generate the I2C clock. I am using an STMicroelectronics accelerometer that has an I2C interface, so I just wanted to see if I2C is working.

I have seen many people running into the same problem but could not find a solution for this issue. No errors generated when compiling the code.

Please find my code below.

Sketch:

//Change the code below by your sketch

#include "Wire.h"

void setup() {
  Wire.begin(21, 22);     // SDA = GPIO21, SCL = GPIO22
  Serial.begin(115200);
  Wire.setClock(400000);  // choose 400 kHz I2C rate
  Serial.println("Start i2c-test");
}

void loop() {
  byte error;             // only used by the commented-out probe below
  uint8_t data = 0;

  // NOTE: Wire expects 7-bit addresses. If 0x3A/0x3B are the 8-bit
  // write/read addresses from the datasheet, the 7-bit address is 0x1D.
  Serial.println("Try to contact 0x3B");
  // Wire.beginTransmission(0x3A);
  // error = Wire.endTransmission();
  Wire.beginTransmission(0x3B);  // start filling the Tx buffer
  Wire.write(0x0F);              // put the WHO_AM_I register address in the Tx buffer
  Wire.endTransmission(false);   // send the Tx buffer, ending with a repeated START instead of a STOP
  Wire.requestFrom(0x3B, 1);     // read one byte from the device
  while (Wire.available()) {
    data = Wire.read();          // fetch the received byte from the Rx buffer
  }
  Serial.print("WHO_AM_I is:");
  Serial.println(data);

  delay(1000);
}

@kharar

kharar commented Jan 23, 2018

Something similar happens when I run the following code on my ESP-WROOM-32, however the problem seems to disappear when I comment out the WiFi stuff in setup()...

I had an oscilloscope hooked up to the clock line and got the same readings as OP describes in Observed Behavior step 3 second case: steady oscillation on clock line and nothing on the data line (except that I did not pull down the data line, at least not manually)

I noticed that for each reset/run it takes a different amount of time before the fault appears.
But if I uncomment the WiFi stuff in loop() it will almost certainly lock up I2C as soon as a GET request is received from an external browser.

Can anyone replicate this using different I2C peripherals?

#include <WiFi.h>
#include <Wire.h>
#include <Adafruit_MMA8451.h>
#include <Adafruit_Sensor.h>
#include <VL53L0X.h>

const char* ssid     = "mySSID";
const char* password = "myPASSWORD";

Adafruit_MMA8451 mma = Adafruit_MMA8451();
VL53L0X sensor;
WiFiServer server(80);

void setup()
{
  Serial.begin(115200);
  pinMode(2, OUTPUT);      // set the LED pin mode

  delay(10);

  Serial.print("Connecting to SSID:");
  Serial.print(ssid);
  WiFi.begin(ssid, password);
  while(WiFi.status() != WL_CONNECTED) {
    delay(500);
    Serial.print(".");
  }

  Serial.print(" - WiFi connected at IP address: ");
  Serial.println(WiFi.localIP());
  server.begin();

  Serial.println();
  Serial.println();
  Serial.print("Connecting to ");
  Serial.println(ssid);

  WiFi.begin(ssid, password);

  while (WiFi.status() != WL_CONNECTED) {
    delay(500);
    Serial.print(".");
  }

  Serial.println("");
  Serial.println("WiFi connected.");
  Serial.println("IP address: ");
  Serial.println(WiFi.localIP());

  server.begin();

  Wire.begin();

  Serial.print("Setup VL53L0X...\n");
  sensor.init();
  sensor.setTimeout(500);

  Serial.print("Setup MMA8451...");
  if (! mma.begin(0x1C)) {
    Serial.println("Couldnt start MMA8451");
  }
  mma.setRange(MMA8451_RANGE_2_G);
  Serial.print("Range = "); Serial.print(2 << mma.getRange());
  Serial.println("G");
}

int value = 0;
int MMA8451_Z = 0;
unsigned int distance_reading = 0;

void loop(){
  distance_reading = sensor.readRangeSingleMillimeters();
  if (sensor.timeoutOccurred()) {
    Serial.print(" VL53L0X TIMEOUT!\n");
  }

  mma.read();
  MMA8451_Z = mma.z;

  Serial.print(MMA8451_Z);
  Serial.print(" ");
  Serial.print(distance_reading);
  Serial.println();
  
  
/*
 WiFiClient client = server.available();   // listen for incoming clients

  if (client) {                             // if you get a client,
    Serial.println("New Client.");           // print a message out the serial port
    String currentLine = "";                // make a String to hold incoming data from the client
    while (client.connected()) {            // loop while the client's connected
      if (client.available()) {             // if there's bytes to read from the client,
        char c = client.read();             // read a byte, then
        Serial.write(c);                    // print it out the serial monitor
        if (c == '\n') {                    // if the byte is a newline character

          // if the current line is blank, you got two newline characters in a row.
          // that's the end of the client HTTP request, so send a response:
          if (currentLine.length() == 0) {
            // HTTP headers always start with a response code (e.g. HTTP/1.1 200 OK)
            // and a content-type so the client knows what's coming, then a blank line:
            client.println("HTTP/1.1 200 OK");
            client.println("Content-type:text/html");
            client.println();

            // the content of the HTTP response follows the header:
            client.print("Click <a href=\"/H\">here</a> to turn the LED on pin 5 on.<br>");
            client.print("Click <a href=\"/L\">here</a> to turn the LED on pin 5 off.<br>");

            // The HTTP response ends with another blank line:
            client.println();
            // break out of the while loop:
            break;
          } else {    // if you got a newline, then clear currentLine:
            currentLine = "";
          }
        } else if (c != '\r') {  // if you got anything else but a carriage return character,
          currentLine += c;      // add it to the end of the currentLine
        }

        // Check to see if the client request was "GET /H" or "GET /L":
        if (currentLine.endsWith("GET /H")) {
          digitalWrite(2, HIGH);               // GET /H turns the LED on
        }
        if (currentLine.endsWith("GET /L")) {
          digitalWrite(2, LOW);                // GET /L turns the LED off
        }
      }
    }
    // close the connection:
    client.stop();
    Serial.println("Client Disconnected.");
  }
*/
}

@ESP32DE

ESP32DE commented Feb 3, 2018

@Curclamas
try this
#1073

@DjordjeMandic

DjordjeMandic commented Mar 21, 2018

I don't know if this is closed, but I have a problem where my bus freezes with SDA high and SCL clocking constantly at a 50% duty cycle. All devices fail to read and the whole bus is frozen. Shorting SDA to ground and then releasing it fixes the problem, but after some random time it happens again. I have 7 devices on the bus, and 4 of them (modules) have pullups. My problem is explained here. Briefly shorting SDA to GND is only a temporary fix.

Here it is explained, if you're too lazy to open that link:

Hi,

I am encountering a problem with I2C communication in dsPIC6014A with 3 I2C slaves, found work around but root cause is unknown. I want to understand why I2C is behaving like this and what can cause it to behave like that.

Issue: I2C will be working normally. After some time (random interval) I2C is not working and never recovers. In this state, reading of all 3 slaves is failing. When I probe the I2C lines, SDA is always HIGH and the clock is coming continuously on SCL.

Time out for I2C communication is handled, so software is not stuck in any loop and other functionalities are working fine.

When I pull down the SDA momentarily by shorting SDA to GND, I2C comes back to working condition.

Thanks & Regards,
Venkatesh

So I was wondering if this is maybe part of this issue?

Edit:
Now I found out that if I short the stuck-high SDA to ground WHILE SCL is outputting the 50% duty clock, SCL just goes high as well and I have to restart the ESP. Does anybody know a command to restart I2C from software?

@stickbreaker
Contributor

@DjordjeMandic You might want to try my fork. stickbreaker/arduino-esp32

You can just grab the Release V0.2.0 Zip file, it only contains the modified i2c files, just overwrite the existing files of the same name.

The Release files are here: Release V0.2.0 Stickbreaker/Arduino-ESP32

Chuck.

@DjordjeMandic

DjordjeMandic commented Mar 21, 2018

@stickbreaker Thank you, you made it so simple to identify the problem and recover the bus. I tried shorting either of the I2C pins and it reports a busy error, which is logical, and right after I let go it recovered by itself!!! Thanks a lot!

Shorting SDA to ground gives

I2C Error!!
Read of (4) bytes read 0 bytes
Failed lastError=5, text=BUSY
0 28 FF B6 
[E][esp32-hal-i2c.c:1127] i2cProcQueue(): I2C exitCode=0x112

Shorting SCL to ground gives

I2C Error!!
Read of (8) bytes read 0 bytes
Failed lastError=5, text=BUSY
0 10 48 BD 0 0 0 1 
I2C Error!!
Read of (4) bytes read 0 bytes
Failed lastError=5, text=BUSY
0 10 48 BD .....

I wrote it to print the received data as hex, not chars, but it works like a charm!!

@stickbreaker
Contributor

@DjordjeMandic It is kind of nice when the hardware and software work. Thanks for the compliment.

I should be releasing a new version later this week. I'm testing multi-Master transaction arbitration. The current V0.2.0 assumes a Bus_Busy (BUSY) error is a hardware fault, so it immediately goes into a hardware reset and stimulation sequence. But in a multi-Master environment, BUSY is just notification that the other Master is using the bus. If a hardware reset and stimulation sequence is initiated, it corrupts the 'other' Master's transaction. So it is taking some thought and testing to distinguish a hardware fault from bus sharing.

Chuck.

@DjordjeMandic

@stickbreaker What does [E][esp32-hal-i2c.c:1127] i2cProcQueue(): I2C exitCode=0x112 mean? Is this a kernel message? Should I be worried if I see it in the future?

@stickbreaker
Contributor

@DjordjeMandic That message is an internal status value that was passed back from the ISR handler to the application HAL layer. 0x112 breaks out as: (these error values are defined in esp32-hal-i2c.h)
EVENT_ERROR 0x02
EVENT_DONE 0x10
EVENT_ERROR_ARBITRATION 0x100

The reason it emitted that message is that my code may not have handled the event correctly. My code is still in development, and I am not yet comfortable with how I have coded the response. So I have the code emit this type of message so that users will give me feedback and describe the circumstances in which the message appeared.

Interpreting that error:

  • EVENT_DONE - the ISR exited normally. It detected that it should exit and posted the EventGroup semaphore (xEventGroupSetBitsFromISR(i2c->i2c_event, exitCode, &HPTaskAwoken);). This bit is only set as the last step in updating the exit code, so it marks the exit code as valid.
  • EVENT_ERROR - This code means that the exit was abnormal. Not all queued transactions were completely processed. I have designed the ISR to handle multi task transactions. I am planning on setting up a 'central' dispatcher that packages and submits transactions from multiple concurrent overlapping tasks.
    My requirement for task-level I2C functionality stems from my personal needs: I use 20x4 LCDs with 4x4 keypads as user interfaces. I want all interactions with these devices to be prompt and reactive, and I dislike having to insert mk.scan() calls into my foreground application loops to keep the keypads responsive. I already have my keypad hardware issuing an interrupt when pad activity is detected, but calling Wire() to service this interrupt from an interrupt context is very, VERY dangerous, and usually results in fatal events.
    So, this EVENT_ERROR alerts the i2cProcessQueue() that it needs to inspect each queue entry and update their individual error codes. When this status EVENT_ERROR is emitted, the current queue entry is the error source, all prior queue entries successfully completed, and all successor queue entries were never processed. The i2cProcessQueue() dispatcher may resubmit unprocessed queue entries. The successful queue items will be released with xEventGroupSetBits(i2c->dq[b].queueEvent,EVENT_DONE);
    The failing queue item will receive: xEventGroupSetBits(i2c->dq[b].queueEvent,exitCode);
    Currently the unprocessed queue Items receive: xEventGroupSetBits(i2c->dq[b].queueEvent,exitCode | EVENT_ERROR_PREV);
    I hope to complete this complex idea before I finish this code.
  • EVENT_ERROR_ARBITRATION was the actual error. This is a 'new' error from me. The I2C driver code has reached a stability point where I can now actually try to handle multi-Master configurations. Prior to this, a BUS_BUSY status created an unrecoverable machine state. Now I can clear a BUS_BUSY if it was generated by a glitch. An ARBITRATION status will create a legitimate BUS_BUSY that is NOT indicative of a hardware fault; it just means the 'other' Master won the race and now has control of the bus. This Master just needs to politely wait for its turn. My current code (V0.2.0) does not understand this situation. It blindly assumes (ass u me) that the BUS_BUSY is a hardware glitch that requires intervention (reset the peripheral, manually stimulate the I2C bus). When these actions are taken in a multi-Master environment, havoc ensues. So, this is what I am working on now: I am testing and tweaking my code to handle ARBITRATION errors.
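
To make the breakdown concrete, here is a small stand-alone sketch that decodes an exit code into the three flags listed above and picks a reaction along the lines just described. It is an illustration only, not the driver code: decodeExitCode, chooseAction, Action and the k-prefixed constants are hypothetical names, and only the three flag values quoted in this comment are known to it.

```cpp
#include <cstdint>
#include <string>

// Flag values as quoted above (EVENT_ERROR, EVENT_DONE,
// EVENT_ERROR_ARBITRATION); renamed here to avoid clashing with the header.
constexpr std::uint32_t kEventError = 0x02;
constexpr std::uint32_t kEventDone = 0x10;
constexpr std::uint32_t kEventErrorArbitration = 0x100;

// Builds a readable "A | B | C" string from the set bits of an exit code.
inline std::string decodeExitCode(std::uint32_t code) {
  struct Flag { std::uint32_t bit; const char* name; };
  const Flag flags[] = {
      {kEventErrorArbitration, "EVENT_ERROR_ARBITRATION"},
      {kEventDone, "EVENT_DONE"},
      {kEventError, "EVENT_ERROR"},
  };
  std::string out;
  for (const Flag& f : flags) {
    if (code & f.bit) {
      if (!out.empty()) out += " | ";
      out += f.name;
      code &= ~f.bit;  // clear the bit so leftovers can be detected
    }
  }
  if (code != 0) {  // bits this sketch does not know about
    if (!out.empty()) out += " | ";
    out += "other bits";
  }
  return out;
}

enum class Action { None, WaitAndRetry, ResetBus };

// Arbitration loss means another Master owns the bus, so waiting is the
// polite reaction; any other error is treated as a possible hardware fault.
inline Action chooseAction(std::uint32_t code) {
  if ((code & kEventError) == 0) return Action::None;
  if (code & kEventErrorArbitration) return Action::WaitAndRetry;
  return Action::ResetBus;
}
```

For the logged value, decodeExitCode(0x112) yields the three flag names joined by " | ", and chooseAction(0x112) selects WaitAndRetry rather than a bus reset.
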

Is this a kernel message? Should I be worried if I see it in the future?

Does your I2C bus have multiple Master devices? If it does, then the V0.2.0 code will cause havoc. Every time a bus collision occurs and the ESP32 loses, it will attempt to regain control of the bus by resetting its hardware and YELLING at the top of its lungs (MINE, MINE, MINE). If the other Master is polite, this obstreperous ESP32 will grab the bus, use it, and then back off, allowing the polite Master to 'share' it.

If your hardware does not include multiple Master I2C devices, then yes, you should be concerned, because the error should not exist. Something is interfering with SDA (holding it low).

Chuck.

@stickbreaker
Contributor

@Curclamas Have you tried the current main branch code?
Chuck.

@Curclamas
Contributor Author

@stickbreaker thanks a lot for your I2C driver! This seems to work much better and has so far recovered from all spurious/provoked faults. I will therefore close this ticket.

@BorisBrock

Is this modification already part of the main ESP32 Arduino project?

@atanisoft
Collaborator

@Vankurt Yes it should be since at least 1.0.2 and has been pretty stable from a core I2C perspective. Libraries which use it though may not be 100% stable due to various reasons.
