Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

periph_common/init.c: unblock i2c bus #20787

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

DanielLockau-MLPA
Copy link
Contributor

@DanielLockau-MLPA DanielLockau-MLPA commented Jul 15, 2024

EDITED on 2024-08-13 (changed description)
EDITED on 2024-08-27 (patch file upload for testing)

Contribution description

If a transaction is interrupted an external i2c device can remain in an unwanted state where it drives the bus to low
and will not release it without action. This can happen whenever a MCU has a software reset but no power cycle
is done on any part of the board.

This PR suggests a bus recovery for the i2c peripheral of sam0_common, during i2c_init or `_i2c_start.

The basic mechanism for unblocking the bus is to "clock" it free until the state machine of the
corresponding device goes to idle and releases the bus. According to this reference (Section "2-wire software reset"),
a maximum of 9 clock cycles should be able to do this. The additional start and stop conditions might not be
required as other sources like this one
do not mention it. As they interfere with the error condition where a device has control over SDA by pulling it low,
this PR chooses only to use SDA as an input and intends to make a blocking i2c device release the bus.

Testing procedure

Reproduction was possible using a SAME54-XPRO, using tests/drivers/at24cxxx with the following changes applied:
i2c_unblock_test.zip (patch file in archive due to upload restrictions)

With the test patch applied, the program will reset during a write transaction on the EEPROM. For testing the recovery during initialization has been removed by the patch and console output activated to initially report on the bus state before initialization. Recovery will still be executed after error detection in the first call of _i2c_start. For this reason the first test in the output below will fail for the first test. Activating recovery during initialization will make all tests pass at all times. Enabling more debug output will make it improbable to actually reproduce the error. This test has been tailored to one board and the current software changes.

The output after a POR can be found below. In the first iteration the bus is not blocked, in following iterations it is blocked after reset. Executing the tests requires a console input of s<Enter> to avoid an endless loop.

2024-08-13 16:41:52,171 # i2c.c: i2c bus #0 with configuration at 0x00004b74 is ok
2024-08-13 16:41:52,183 # i2c.c: i2c bus #1 with configuration at 0x00004b88 is ok
2024-08-13 16:41:52,199 # main(): This is RIOT! (Version: 2024.10-devel-66-ga1c53d-dl/riot/20240715__periph_init__unblock_i2c)
s
2024-08-13 16:42:01,383 # STARTING
2024-08-13 16:42:01,386 # Starting tests for module at24cxxx
2024-08-13 16:42:01,388 # EEPROM size: 256 byte
2024-08-13 16:42:01,389 # Page size  : 16 byte
2024-08-13 16:42:01,391 # [SUCCESS] at24cxxx_init
2024-08-13 16:42:01,396 # [at24cxxx] i2c_write_regs(): 0; polls: 6
2024-08-13 16:42:01,398 # [SUCCESS] at24cxxx_write_byte
2024-08-13 16:42:01,402 # [at24cxxx] i2c_read_regs(): 0; polls: 6
2024-08-13 16:42:01,405 # [SUCCESS] at24cxxx_read_byte
2024-08-13 16:42:01,407 # [SUCCESS] write_byte/read_byte
2024-08-13 16:42:01,411 # [at24cxxx] i2c_write_regs(): 0; polls: 6
2024-08-13 16:42:01,416 # [at24cxxx] i2c_write_regs(): 0; polls: 6
2024-08-13 16:42:01,418 # [SUCCESS] at24cxxx_write
2024-08-13 16:42:01,424 # [at24cxxx] i2c_read_regs(): 0; polls: 6
2024-08-13 16:42:01,425 # [SUCCESS] at24cxxx_read
2024-08-13 16:42:01,426 # [SUCCESS] write/read
2024-08-13 16:42:01,441 # i2c.c: i2c bus #0 with configuration at 0x00004b74 is ok
2024-08-13 16:42:01,455 # i2c.c: i2c bus hangup detected for bus #1 with configuration at 0x00004b88
2024-08-13 16:42:01,471 # main(): This is RIOT! (Version: 2024.10-devel-66-ga1c53d-dl/riot/20240715__periph_init__unblock_i2c)
s
2024-08-13 16:42:03,639 # STARTING
2024-08-13 16:42:03,642 # Starting tests for module at24cxxx
2024-08-13 16:42:03,644 # EEPROM size: 256 byte
2024-08-13 16:42:03,645 # Page size  : 16 byte
2024-08-13 16:42:03,648 # [SUCCESS] at24cxxx_init
2024-08-13 16:42:03,661 # [at24cxxx] i2c_write_regs(): -116; polls: 6
2024-08-13 16:42:03,664 # [FAILURE] at24cxxx_write_byte: -116
2024-08-13 16:42:03,668 # [at24cxxx] i2c_read_regs(): 0; polls: 6
2024-08-13 16:42:03,670 # [SUCCESS] at24cxxx_read_byte
2024-08-13 16:42:03,673 # [SUCCESS] write_byte/read_byte
2024-08-13 16:42:03,678 # [at24cxxx] i2c_write_regs(): 0; polls: 6
2024-08-13 16:42:03,682 # [at24cxxx] i2c_write_regs(): 0; polls: 6
2024-08-13 16:42:03,684 # [SUCCESS] at24cxxx_write
2024-08-13 16:42:03,688 # [at24cxxx] i2c_read_regs(): 0; polls: 6
2024-08-13 16:42:03,690 # [SUCCESS] at24cxxx_read
2024-08-13 16:42:03,692 # [SUCCESS] write/read
2024-08-13 16:42:03,705 # i2c.c: i2c bus #0 with configuration at 0x00004b74 is ok
2024-08-13 16:42:03,719 # i2c.c: i2c bus hangup detected for bus #1 with configuration at 0x00004b88
2024-08-13 16:42:03,735 # main(): This is RIOT! (Version: 2024.10-devel-66-ga1c53d-dl/riot/20240715__periph_init__unblock_i2c)
s
2024-08-13 16:42:20,735 # STARTING
2024-08-13 16:42:20,738 # Starting tests for module at24cxxx
2024-08-13 16:42:20,739 # EEPROM size: 256 byte
2024-08-13 16:42:20,741 # Page size  : 16 byte
2024-08-13 16:42:20,743 # [SUCCESS] at24cxxx_init
2024-08-13 16:42:20,757 # [at24cxxx] i2c_write_regs(): -116; polls: 6
2024-08-13 16:42:20,760 # [FAILURE] at24cxxx_write_byte: -116
2024-08-13 16:42:20,764 # [at24cxxx] i2c_read_regs(): 0; polls: 6
2024-08-13 16:42:20,766 # [SUCCESS] at24cxxx_read_byte
2024-08-13 16:42:20,769 # [SUCCESS] write_byte/read_byte
2024-08-13 16:42:20,774 # [at24cxxx] i2c_write_regs(): 0; polls: 6
2024-08-13 16:42:20,777 # [at24cxxx] i2c_write_regs(): 0; polls: 6
2024-08-13 16:42:20,780 # [SUCCESS] at24cxxx_write
2024-08-13 16:42:20,784 # [at24cxxx] i2c_read_regs(): 0; polls: 6
2024-08-13 16:42:20,786 # [SUCCESS] at24cxxx_read
2024-08-13 16:42:20,789 # [SUCCESS] write/read
2024-08-13 16:42:20,802 # i2c.c: i2c bus #0 with configuration at 0x00004b74 is ok
2024-08-13 16:42:20,816 # i2c.c: i2c bus hangup detected for bus #1 with configuration at 0x00004b88
2024-08-13 16:42:20,831 # main(): This is RIOT! (Version: 2024.10-devel-66-ga1c53d-dl/riot/20240715__periph_init__unblock_i2c)

Issues/PRs references

None.

@github-actions github-actions bot added the Area: drivers Area: Device drivers label Jul 15, 2024
@benpicco benpicco requested review from dylad and maribu July 15, 2024 15:45
@benpicco benpicco added CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR Type: enhancement The issue suggests enhanceable parts / The PR enhances parts of the codebase / documentation labels Jul 15, 2024
@maribu
Copy link
Member

maribu commented Jul 15, 2024

Both the already upstream ESP soft I2C bus as well as my PR for a soft I2C bus for based on GPIO LL have a recovery function to unblock the bus. They detect a locked bus at and recover it during normal operation, failing only the transfer that was affected by the issue.

IMO this is the better approach, as a bus can lock up during normal operation as well when the peripheral misses a clock cycle or the controller misses clock stretching.

IMO if a periph_i2c does not detect and recover from a locked bus, that is a bug in that driver and needs to be fixed there.

* we do not aim to mitigate here.
*/
gpio_init(i2c_config[dev].sda_pin, GPIO_IN);
gpio_init(i2c_config[dev].scl_pin, GPIO_OUT);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid this will not work on all platforms as sda_pin/scl_pin is not used everywhere.

You can use i2c_pin_sda()/i2c_pin_scl() instead, but this requires the periph_i2c_reconfigure feature.

@riot-ci
Copy link

riot-ci commented Jul 15, 2024

Murdock results

✔️ PASSED

b0e02ff address comments

Success Failures Total Runtime
10192 0 10193 14m:43s

Artifacts

Copy link
Member

@dylad dylad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice addition !

@@ -73,6 +122,7 @@ void periph_init(void)
/* initialize configured I2C devices */
#ifdef MODULE_PERIPH_INIT_I2C
for (unsigned i = 0; i < I2C_NUMOF; i++) {
_i2c_sync_devices(i);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not move this into i2c_init() ?

There may be weird cases where users manually initialize their i2C peripherals without relying on auto_init

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True. I'll check that. It will however require touching the i2c_init files of all cpus. Maybe it makes more sense to include it in this auto_init but also provide it in the form of a function whenever periph_i2c_reconfigure is available so that people can deliberately use it at any time if they want to manually set up things?

" recovering...\n", dev);
}
else {
DEBUG("periph_init: i2c bus #%u is ok\n", dev);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the bus is "OK" Aren't we suppose to skip the next clock ?

Copy link
Contributor Author

@DanielLockau-MLPA DanielLockau-MLPA Jul 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The clock is intended to be skipped for the gpio_read returning 1 in the while loop below. This section is only for purposes of debug output.

Maybe I am not getting correctly what you refer to as "next clock".

Copy link
Member

@dylad dylad Jul 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I didn't get my morning coffee yet...
I mean "the next BLOCK".
If the bus is OK, couldn't we skip entirely the whole loop that comes after ?

maribu
maribu previously requested changes Jul 16, 2024
Copy link
Member

@maribu maribu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's step back a bit and reconsider the approach. I think this is motivated by a broken SAM I2C driver or broken SAM I2C peripheral.

Other I2C peripherals can detect a locked bus or at least have a timeout, that allows to unblock a blocked bus when it happens.

Let's not add a work around needed for broken SAM hardware to common code.

@DanielLockau-MLPA
Copy link
Contributor Author

@benpicco Where should I properly document this added behavior?

@dylad
Copy link
Member

dylad commented Jul 16, 2024

The whole idea can fit in a dedicated pseudomodule if we don't want to impact all MCUs.

@DanielLockau-MLPA
Copy link
Contributor Author

Other I2C peripherals can detect a locked bus or at least have a timeout, that allows to unblock a blocked bus when it happens.

@maribu Just to verify that we're talking about the same thing here: It is not the SAM I2C peripheral which is broken here. The state machine of the remote EEPROM is locked in a state where it pulls the bus low. But if there are some i2c peripherals for other MCUs which automatically recover from this behavior, I agree with your comment. Where would you put it then? board_init will only run after cpu_init, so periph_init is already done at that point in time.

We could add it as a pseudo-module and put it in the dependencies of selected CPUs and additionally make it available as a function if periph_i2c_reconfigure is used.

@maribu
Copy link
Member

maribu commented Jul 16, 2024

The state machine of the remote EEPROM is locked in a state where it pulls the bus low

That can happen with every peripheral I2C device, if it missing a clock transition. That most likely happens when the MCU is resetting in the middle of a transfer, but also at any point in time.

As a result, the MCU controller must check for a locked bus and recover. That may look like an arbitration lost event at first, but if SDA stays low, the bus is locked.

So: Either the I2C peripheral needs to monitor SDA after an arbitration lost even, or the driver should. This cannot be done in common code.

@benpicco
Copy link
Contributor

benpicco commented Jul 16, 2024

Looking at it, we never check the ERROR bit in INTFLAG - would that already tell us about the condition in _i2c_start()?

@DanielLockau-MLPA
Copy link
Contributor Author

I will rewrite this PR and include the unblocking into the driver in the sam0_common driver. As @benpicco indicates, the buserror interrupt/flag is present but currently not used.

@DanielLockau-MLPA DanielLockau-MLPA force-pushed the dl/riot/20240715__periph_init__unblock_i2c branch from c06da25 to a1c53d9 Compare August 13, 2024 14:35
@github-actions github-actions bot added Platform: ARM Platform: This PR/issue effects ARM-based platforms Area: cpu Area: CPU/MCU ports and removed Area: drivers Area: Device drivers labels Aug 13, 2024
@DanielLockau-MLPA
Copy link
Contributor Author

DanielLockau-MLPA commented Aug 13, 2024

@maribu @dylad @benpicco

I brought the unlock mechanism to cpu/sam0_common/periph/i2c.c and attempt a recovery just before initializing the peripheral or on an error in _i2c_start.

Touching the i2c driver I noticed that part of the implementation was not in sync with the register description of the data sheet. In the current state of the peripheral, the implementation seemed to make sense after examining the registers on the device. I still chose to alter it in _i2c_start for more clarity. If you disagree with this, please drop me a line and I'll try to stay closer to the original implementation.

UPDATE: for testing, please note that I also altered the instructions in the description of this PR

@maribu maribu dismissed their stale review August 13, 2024 18:26

point of critic no longer applies

Copy link
Member

@maribu maribu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a quick peek, this looks good to me. I dismissed my prior review, as my concerns have been addressed.

cpu/sam0_common/periph/i2c.c Outdated Show resolved Hide resolved
cpu/sam0_common/periph/i2c.c Outdated Show resolved Hide resolved
DEBUG_PUTS("i2c.c: STATUS_ERR_TIMEOUT");
unblock_bus = true;
}
else if (dev->STATUS.bit.BUSSTATE == BUSSTATE_BUSY) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do like this verbose checking for readability.

I haven't checked, but I'm almost certain the compiler would under normal circumstances optimizes the more verbose code quite well, so that there is no performance penalty compared to less readable and more dense C code. However, these are not normal circumstances because SercomI2cm::STATUS is volatile and the compiler will be forces to re-load the memory on every access and not combine checks.

How about just reading out the status register into a local variable and check on that? I assume that this would yield optimal or near-optimal machine code. In addition, @dylad will be happy if we cease using the bitfield types (SERCOM_I2CM_STATUS_Type), as the CMSIS header files provided by Microchip are no longer generated with bitfield types.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to address your comments in my last commit. However, I also read the data sheet (same54/dt51 and saml21) more thoroughly and found a section on address transmission. I reorganized a few of the checks to adhere to that part. The original implementation on master does not completely conform to that part of the data sheet.

Comment on lines 445 to 465
bool any_error = dev->STATUS.reg & (
SERCOM_I2CM_STATUS_SEXTTOUT /* XXX: do we need this? */ |
SERCOM_I2CM_STATUS_ARBLOST | SERCOM_I2CM_STATUS_BUSERR
/* not included as always active (hardware bug?) in the
* current configuration:
* SERCOM_I2CM_STATUS_MEXTTOUT, SERCOM_I2CM_STATUS_LOWTOUT
*
* not included as only useful with DMA:
* SERCOM_I2CM_STATUS_LENERR
*/
) || (
/* bus state should not be unknown at this point */
(dev->STATUS.reg & SERCOM_I2CM_STATUS_BUSSTATE_Msk) == BUSSTATE_UNKNOWN
) || (
/* we don't handle multi-master layouts and treat this as error */
(dev->STATUS.reg & SERCOM_I2CM_STATUS_BUSSTATE_Msk) == BUSSTATE_BUSY
) ||
(
/* slave busy or address not on bus */
dev->STATUS.reg & SERCOM_I2CM_STATUS_RXNACK
) || res;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Due to dev->STATUS being volatile, this will force the compiler to re-load that register for every check. Saving that into a temporary variable would allow the compiler to allocate this in a register and keep accessing that register for the bit checks without re-loading the register.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my last commit I copy the register for read access.

@benpicco
Copy link
Contributor

If this fixes the bug for you now, please squash!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: cpu Area: CPU/MCU ports CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR Platform: ARM Platform: This PR/issue effects ARM-based platforms Type: enhancement The issue suggests enhanceable parts / The PR enhances parts of the codebase / documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants