Skip to content

Random hang at SERCOM.cpp:526 #222

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
kb- opened this issue Mar 28, 2017 · 13 comments
Open

Random hang at SERCOM.cpp:526 #222

kb- opened this issue Mar 28, 2017 · 13 comments

Comments

@kb-
Copy link

kb- commented Mar 28, 2017

I'm trying to set up multi-master I2C communication on board running an ATSAMD21G18 (2 masters and slave so far) with this code: https://github.com/kb-/Arduino-I2C-2way-anydata/tree/dd5597835c9a8a89ff23cc17605bb6f59faa5699

One of the masters randomly hangs after a large number of transfers (around 70 000 on average), always at line 526 of SERCOM.cpp

The same behavior occurred on 5 different boards I tried, 2 different models.

I'm getting this in the backtrace but can't figure out what's wrong :

(gdb) bt
#0 0x0000331a in SERCOM::sendDataMasterWIRE (
this=this@entry=0x20000300 , data=)
at /home/oe/.arduino15/packages/misfittech/hardware/samd/1.0.0/cores/arduino/SERCOM.cpp:526
#1 0x0000275a in TwoWire::endTransmission (this=this@entry=0x2000018c ,
stopBit=stopBit@entry=true)
at /home/oe/.arduino15/packages/misfittech/hardware/samd/1.0.0/libraries/Wire/Wire.cpp:134
#2 0x00002784 in TwoWire::endTransmission (this=this@entry=0x2000018c )
at /home/oe/.arduino15/packages/misfittech/hardware/samd/1.0.0/libraries/Wire/Wire.cpp:151
#3 0x0000224c in loop ()
at /home/oe/sketchbook/Master_test_i2c_data2/Master_test_i2c_data2.ino:81
#4 0x00003846 in main ()
at /home/oe/.arduino15/packages/misfittech/hardware/samd/1.0.0/cores/arduino/main.cpp:46

It doesn't occur with one master only (4 000 000 successful transfers).

@kb-
Copy link
Author

kb- commented May 31, 2017

Does anyone have an idea of how to prevent this?

@lupalby
Copy link

lupalby commented Dec 14, 2017

From my experience with a multi-master environment, the problem is caused by conflict in the communication between the 2 masters. At some point the 2 masters eventually start communicating at the same time and one of the 2 loses arbitration of the bus. It is a normal scenario, but the library as it is right now does not handle loss of arbitration. I fixed it a couple of months ago in my local copy and used the system without any more hanging issues. I recently opened a pull request here to share the fix. Check out my very simple modification and try it on your system. Let me know if it fixes the problem.
#284

@kb-
Copy link
Author

kb- commented Apr 2, 2018

Hi, I finally got to try your fix. I'm still getting random hangs unfortunately, but this time at line 491 of SERCOM.cpp (infinite while loop)
Both Masters hang at the same time, but I only have the hardware to check where one of them stopped.

0x0000324c in SERCOM::startTransmissionWIRE (this=0x200002fc , address=24 '\030',
flag=flag@entry=WIRE_WRITE_FLAG)
at C:\Users\Olivier\AppData\Local\Arduino15\packages\SparkFun\hardware\samd\1.4.0\cores\arduino/SERCOM.cpp:491
+bt
#0 0x0000324c in SERCOM::startTransmissionWIRE (this=0x200002fc , address=24 '\030',
flag=flag@entry=WIRE_WRITE_FLAG)
at C:\Users\Olivier\AppData\Local\Arduino15\packages\SparkFun\hardware\samd\1.4.0\cores\arduino/SERCOM.cpp:491
#1 0x000026e0 in TwoWire::endTransmission (this=this@entry=0x20000188 , stopBit=stopBit@entry=true)
at C:\Users\Olivier\AppData\Local\Arduino15\packages\SparkFun\hardware\samd\1.4.0\libraries\Wire/Wire.cpp:124
#2 0x00002738 in TwoWire::endTransmission (this=this@entry=0x20000188 )
at C:\Users\Olivier\AppData\Local\Arduino15\packages\SparkFun\hardware\samd\1.4.0\libraries\Wire/Wire.cpp:151
#3 0x00002228 in loop ()
at C:\Users\Olivier\Documents\GitHub\Arduino-I2C-2way-anydata\Master_test_i2c_data2/Master_test_i2c_data2.ino:77
#4 0x000037de in main ()
at C:\Users\Olivier\AppData\Local\Arduino15\packages\SparkFun\hardware\samd\1.4.0\cores\arduino/main.cpp:51

485 bool SERCOM::startTransmissionWIRE(uint8_t address, SercomWireReadWriteFlag flag)
486 {
487 // 7-bits address + 1-bits R/W
488 address = (address << 0x1ul) | flag;
489
490 // Wait idle or owner bus mode
491 while ( !isBusIdleWIRE() && !isBusOwnerWIRE() );
492
493 // Send start and address
494 sercom->I2CM.ADDR.bit.ADDR = address;

@lupalby
Copy link

lupalby commented Apr 5, 2018

When it hangs, try to manually retrieve the value of the registers checked by isBusIdleWIRE() and by isBusOwnerWIRE().
Check the datasheet of the microcontroller (Atmel SAMD21). What bits are active and what do they mean? It should tell you what status the microcontroller get stuck into, and that can help you understand why it got there.
Let me know what you find

@kb-
Copy link
Author

kb- commented Apr 8, 2018

I've finally been able to do more debugging.
At hang time, sercom->I2CM.STATUS.bit.BUSSTATE == 3, meaning bus BUSY

  // Wait idle or owner bus mode
  while ( !isBusIdleWIRE() && !isBusOwnerWIRE() );  //!false&&!false at hang time

bool SERCOM::isBusIdleWIRE( void )
{
  return sercom->I2CM.STATUS.bit.BUSSTATE == WIRE_IDLE_STATE; //WIRE_IDLE_STATE==1
}
bool SERCOM::isBusOwnerWIRE( void )
{
  return sercom->I2CM.STATUS.bit.BUSSTATE == WIRE_OWNER_STATE;  //WIRE_OWNER_STATE==2
}

p454 from http://ww1.microchip.com/downloads/en/DeviceDoc/40001882A.pdf

When the bus is IDLE it is ready for a new transaction. If a start condition is issued on the bus by another I2C master in a multi-master setup, the bus becomes BUSY (0b11). The bus will re-enter IDLE either when a stop condition is detected, or when a time-out occurs (inactive bus time-out needs to be configured).
If a start condition is generated internally by writing the Address bit group in the Address register
(ADDR.ADDR) while IDLE, the OWNER state (0b10) is entered. If the complete transaction was
performed without interference, i.e., arbitration was not lost, the I2C master can issue a stop condition,
which will change the bus state back to IDLE.
However, if a packet collision is detected while in OWNER state, the arbitration is assumed lost and the
bus state becomes BUSY until a stop condition is detected. A repeated start condition will change the bus
state only if arbitration is lost while issuing a repeated start.
Regardless of winning or losing arbitration, the entire address will be sent. If arbitration is lost, only 'ones'
are transmitted from the point of losing arbitration and the rest of the address length.
Note: Violating the protocol may cause the I2C to hang. If this happens it is possible to recover from this
state by a software reset (CTRLA.SWRST='1').

There should be a way to set a time-out
p487

Bits 29:28 – INACTOUT[1:0]: Inactive Time-Out
If the inactive bus time-out is enabled and the bus is inactive for longer than the time-out setting, the bus
state logic will be set to idle. An inactive bus arise when either an I2C master or slave is holding the SCL
low.

Normal master1 to slave timings:
normal

At hang time:
at hang

@kb-
Copy link
Author

kb- commented Apr 8, 2018

Adding this prevents it from hanging at line 491 (while ( !isBusIdleWIRE() && !isBusOwnerWIRE() );

void SERCOM::enableWIRE() { // Set inactive time-out (prevent bus hang in multi-master set-up) sercom->I2CM.CTRLA.bit.INACTOUT = 3 ;

But now it hangs in SERCOM::sendDataMasterWIRE again, at while(!sercom->I2CM.INTFLAG.bit.MB), while the other master continues transmitting

0x000032bc in SERCOM::sendDataMasterWIRE (this=this@entry=0x200002fc , data=)
at C:\Users\Olivier\AppData\Local\Arduino15\packages\SparkFun\hardware\samd\1.4.0\cores\arduino\SERCOM.cpp:548
548 while(!sercom->I2CM.INTFLAG.bit.MB) {
+bt
#0 0x000032bc in SERCOM::sendDataMasterWIRE (this=this@entry=0x200002fc , data=)
at C:\Users\Olivier\AppData\Local\Arduino15\packages\SparkFun\hardware\samd\1.4.0\cores\arduino\SERCOM.cpp:548
#1 0x0000270e in TwoWire::endTransmission (this=this@entry=0x20000188 , stopBit=stopBit@entry=true)
at C:\Users\Olivier\AppData\Local\Arduino15\packages\SparkFun\hardware\samd\1.4.0\libraries\Wire\Wire.cpp:134
#2 0x00002738 in TwoWire::endTransmission (this=this@entry=0x20000188 ) at C:\Users\Olivier\AppData\Local\Arduino15\packages\SparkFun\hardware\samd\1.4.0\libraries\Wire\Wire.cpp:151
#3 0x00002228 in loop () at C:\Users\Olivier\Documents\GitHub\Arduino-I2C-2way-anydata\Master_test_i2c_data/Master_test_i2c_data.ino:77
#4 0x000037ea in main () at C:\Users\Olivier\AppData\Local\Arduino15\packages\SparkFun\hardware\samd\1.4.0\cores\arduino\main.cpp:51

At hang time:
at hang2

(my bus is pretty occupied to see if the code can recover from collisions)
zoom out

@lupalby
Copy link

lupalby commented Apr 8, 2018

Do you mind explaining more in details the architecture of your system? How many masters? How many slaves? What are their addresses? Who is talking to who? How often?
Are you using all Arduinos? Are you using any modified version of libraries or drivers?
Sharing the entire project code would definitely help, if you can.

From the second screenshot you sent, you can see that after the end of the first message there is no repeated start or stop condition, but another byte is sent that looks like the address byte. If some device (including the current owner of the bus) is trying to begin a new message, that is done in violation of the protocol, because there is no stop bit that defines the end of the previous message. The reason why SDA is low is probably because the slave is giving the ack to the master, but the master never checks it (only 8 SCL pulses at the end, vs 9 of all the other bytes). The master never releases the bus so everything gets stuck.

@kb-
Copy link
Author

kb- commented Apr 8, 2018

There are 2 Masters on the bus (Sparkfun SAMD21 Mini). I replaced the shipped Wire and Sercom versions (which were hanging too) by the version in your Git with the fix + the line from my last post. I didn't set an address (the communication didn't work with 1 year ago's libraries if I set it, I don't know now)

There is 1 Slave on the bus (Nano Zero stepper, a SAMD21 based board; shipped libraries; address 12). I also tried with an Arduino Mega (non modified Arduino libraries). I never noticed hangs on the Slave with any of them.

Both Masters are talking to the Slave every 10 ms (I tried longer values but it just takes longer to hang), at 100 kHz bus clock. The Slave responds each time it gets a message from one of the Masters. The response is read by both Masters, who check if they're getting the correct data (around 99% of the transfers)

I uploaded the current version. Its only purpose is testing the Multimaster connection (to reuse in another project if I get it to be reliable). Arduino-I2C-2way-anydata 2-masters-1-slave

I guess communication collisions led to the situation in my screenshot. But shouldn't the libraries handle that? The SAMD21 manual tells about a way to recover from violations of the protocol. But I don't know how to use it. Setting sercom->I2CM.CTRLA.bit.SWRST=1 with the debugger didn't recover from the hang.

Violating the protocol may cause the I2C to hang. If this happens it is possible to recover from this
state by a software reset (CTRLA.SWRST='1').

@kb-
Copy link
Author

kb- commented Apr 24, 2018

I'm making a little progress but I'm not there yet...

I made 2 more modifications to the Sercom library:

The first to recover from loss of arbitration during data transfer from 1 master to the slave. It reduced the hangs from every 1000 transfers on average to 30000
kb-@8b79ff5#diff-1dc5324c3227aa55a3428c25dc0651a3 line 563

The second to recover from loss of arbitration during Transmitting Address Packets. It reduced the hangs from every 30000 transfers on average to 170000-210000.
kb-@b8f4af6#diff-1dc5324c3227aa55a3428c25dc0651a3

There's still an issue left... both masters hung after 170000 to 210000 transfers. The code hangs in startTransmissionWIRE at while( !sercom->I2CM.INTFLAG.bit.MB )
At that time, INTFLAG.ERROR==1 despite LENERR, SEXTTOUT, MEXTTOUT, LOWTOUT, ARBLOST, and BUSERR == 0. This doesn't match the chapter "Bit 7 – ERROR: Error" at p494 of the SAMD21 manual. BUSSTATE is busy at that time.

0x000032b0 in SERCOM::startTransmissionWIRE (this=0x20000534 , address=24 '\030', flag=flag@entry=WIRE_WRITE_FLAG)
at C:\Users\Olivier\AppData\Local\Arduino15\packages\SparkFun\hardware\samd\1.4.0\cores\arduino\SERCOM.cpp:510
warning: Source file is more recent than executable.
510 while( !sercom->I2CM.INTFLAG.bit.MB )
+l
505 sercom->I2CM.ADDR.bit.ADDR = address;
506
507 // Address Transmitted
508 if ( flag == WIRE_WRITE_FLAG ) // Write mode
509 {
510 while( !sercom->I2CM.INTFLAG.bit.MB )
511 {
512 // Wait transmission complete
513 }
514 // Check for loss of arbitration (multiple masters starting communication at the same time)
+bt
#0 0x000032b0 in SERCOM::startTransmissionWIRE (this=0x20000534 , address=24 '\030', flag=flag@entry=WIRE_WRITE_FLAG)
at C:\Users\Olivier\AppData\Local\Arduino15\packages\SparkFun\hardware\samd\1.4.0\cores\arduino\SERCOM.cpp:510
#1 0x00002752 in TwoWire::endTransmission (this=this@entry=0x20000188 , stopBit=stopBit@entry=true)
at C:\Users\Olivier\AppData\Local\Arduino15\packages\SparkFun\hardware\samd\1.4.0\libraries\Wire\Wire.cpp:124
#2 0x000027b6 in TwoWire::endTransmission (this=this@entry=0x20000188 ) at C:\Users\Olivier\AppData\Local\Arduino15\packages\SparkFun\hardware\samd\1.4.0\libraries\Wire\Wire.cpp:151
#3 0x00002228 in loop () at C:\Users\Olivier\Documents\GitHub\Arduino-I2C-2way-anydata\Master_test_i2c_data/Master_test_i2c_data.ino:77
#4 0x00003ad6 in main () at C:\Users\Olivier\AppData\Local\Arduino15\packages\SparkFun\hardware\samd\1.4.0\cores\arduino\main.cpp:51
+p sercom->I2CS.ADDR.reg
$1 = 24
+p sercom->USART.CTRLA.reg
$2 = 1891631126
+p sercom->I2CM.STATUS.bit.BUSSTATE
$3 = 3
+p sercom->USART.INTFLAG.bit.ERROR
$4 = 1 '\001'
+p sercom->I2CM.STATUS.bit.ARBLOST
$5 = 0
+p sercom->I2CM.STATUS.bit.BUSERR
$6 = 0
+p sercom->I2CM.STATUS.bit.LENERR
$7 = 0
+p sercom->I2CM.STATUS.bit.LOWTOUT
$8 = 0
+p sercom->I2CM.STATUS.bit.SEXTTOUT
$9 = 0
+p sercom->I2CM.STATUS.bit.MEXTTOUT
$10 = 0
+p sercom->I2CM.STATUS.bit.RXNACK
$11 = 0
+p sercom->I2CM.STATUS.bit.CLKHOLD
$12 = 0
+p sercom->I2CM.INTFLAG.bit.MB
$13 = 0 '\000'
+p sercom->I2CM.INTFLAG.bit.SB
$14 = 0 '\000'
+p sercom->I2CS.ADDR.bit.TENBITEN
$15 = 0
+p sercom->I2CM.CTRLA.bit.INACTOUT
$16 = 3
+p sercom->I2CM.CTRLA.bit.LOWTOUTEN
$17 = 1
+p sercom->I2CM.CTRLA.bit.SEXTTOEN
$18 = 1
+p sercom->I2CM.CTRLA.bit.MEXTTOEN
$19 = 1
+p sercom->I2CM.CTRLB.bit.CMD
$20 = 0

I tried removing while( !sercom->I2CM.INTFLAG.bit.MB ). It stops hanging when I do this, but the data isn't transferred properly to the slave. My logic analyzer shows that the first bit after the address is missing.

Maybe the state change of I2CM.INTLAG.bit.MB sometimes isn't recognized because it's not checked at the right time in the sercom library?
In the examples codes from Atmel SERCOM guide for the SAMD21, this check occurs in an interrupt:
/* SERCOM2 I2C handler */ void SERCOM2_Handler(void) { /* Slave on bus interrupt checking */ if (SERCOM2->I2CM.INTFLAG.bit.SB) {...

@kb-
Copy link
Author

kb- commented Feb 3, 2019

I never got this to work.

It might be linked to the bug described here: http://misfittech.net/blog/samd20-i2c-slave-bug/

@sampscl
Copy link

sampscl commented Feb 8, 2019

I have very similar problems (see issue 9 at platformio), in particular my logic analyzer capture. I think there are 2 problems here: first, the SERCOM code assumes that it doesn't need to check the ERROR flag and second, the Wire and SERCOM code do not deal with misbehaving I2C devices and so the bus can enter a state that is valid for the MCU but not addressed in the library code.

I should have some work towards a working patch next week.

@sampscl
Copy link

sampscl commented Feb 12, 2019

I ended up with code that checks the bus state before any read or write. Like this:

if(SERCOM3->I2CM.STATUS.bit.BUSSTATE == WIRE_BUSY_STATE) { reset_i2c_bus(); }

In effect, if the bus is busy but should not be, reset (calls end() and begin()). This will not work correctly with multiple masters though it did prevent the deadlock I was experiencing. I'll make a pull request to the Arduino project to incorporate this more cleanly -- you shouldn't have to dig into registers.

@cheetor5923
Copy link

I'm getting the problem too, except on a much simpler configuration. Its on the IndIO board of the Industruino - https://static.industruino.com/downloads/diagrams/Industruino_INDIO_D21G_pinout_map_April'17_S.pdf

boseji pushed a commit to go-ut/combined-ArduinoCore-samd that referenced this issue May 30, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants