Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[201911][patch] mlxsw: i2c: Prevent transaction execution for special chip st… #278

Merged
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
125 changes: 125 additions & 0 deletions patch/0080-TMP-mlxsw-i2c-Prevent-transaction-execution-for-spec.patch
Original file line number Diff line number Diff line change
@@ -0,0 +1,125 @@
From 392127480c91a26dae2449288229d4957213dcc1 Mon Sep 17 00:00:00 2001
From: Vadim Pasternak <vadimp@nvidia.com>
Date: Tue, 31 May 2022 10:27:38 +0300
Subject: [PATCH v4.9 ISSU WA 1/1] TMP: mlxsw: i2c: Prevent transaction
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the TMP tag?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the TMP tag needed (as in, will this bug fix be removed in the future), or is it expected to be permanent?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is expected to be permanent.
We want to mark it as downstream patch.
There are a lot of not up-streamed patches in Sonic kernel, But there is no convention in Sonic to tag such patches.
"TMP" in patch name could be replaced with "DS" or "Downstream" or with something like that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather have "Sonic-specific" or "Sonic-only" in the patch rather than TMP.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Sonic-specific" sounds better to me. Can we close with this name?
I suggest to use this tag for all new-coming patches (not only from Nvidia), which are not going to be up-streamed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe such a tag is unnecessary. The README suggests, that patches are not wanted, that are not going to be upstreamed. The only exception is:

Platform specific kernel modules which are impossible or very difficult to be built out of kernel tree.

Why were the patches accepted to be in SONiC in the first place. I really thought these were only backports, and it’s not made easy for reviewers to find information about the feature (ISSU), and why it’s not going to be upstreamed.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no ISSU feature in up-stream driver, so this patch can't be accepted.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that there are a number of patches here that should be upstreamed. However, there can still be patches that are not useful to be upstreamed, such as patches that would break the general usage of the Linux kernel, or some hack that really shouldn't be upstreamed.

In this case, since this is functionality that is not supported by the upstream driver, it wouldn't make sense to commit this bug fix for functionality that's not even present in the upstream Linux kernel. If this changes (i.e. ISSU/warmboot/equivalent functionality) becomes available in the upstream kernel, then this should be upstreamed as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, but I still do not understand. Does SONiC add a special module/patch to support ISSU in it’s Linux kernel build? Where are those patches? I could apply this patch here (for current SONiC branch) to the upstream Linux kernel source just fine.

$ git am ~/src/sonic-linux-kernel/patch/0064-TMP-mlxsw-i2c-Prevent-transaction-execution-for.patch # nvidia-stepan/202106-prevent-transaction
Wende an: TMP: mlxsw: i2c: Prevent transaction execution for special chip states

MLXSW_CMD_STATUS_FW_ISSU sounds to me like it’s support in the firmware and independent from the Linux kernel, so should be submitted upstream.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since upstream doesn’t support ISSU, it means nobody can set FW to ISSU and FW will never get MLXSW_CMD_STATUS_FW_ISSU. That’s we it can’t be up-streamed.

execution for special chip states

Do not run transaction in cases chip is in reset or in-service update
states.
In such case firmware is not accessible and will reject transaction
with the relevant status "RUNNING_RESET" or "FW_ISSU_ONGOING".
In case transaction is failed do to one of these reasons, stop sending
transactions. In such case driver is about to be removed since it
cannot continue running after reset or in-service update. And
re-probed again after reset or in-service update is completed.

Copy link
Contributor

@paulmenzel paulmenzel Jun 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that commit upstream? Please explain the TMP.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is not and it cannot be upstream. this is a way for bug fix and we wont be able to upstream it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not understand. Please elaborate.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Upstream mlxsw driver doesn't support ISSU, so we can't submit ISSU related filtering to upstream.

Copy link
Contributor

@paulmenzel paulmenzel Jun 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only 4.19(?) or all versions? The policy of SONiC is that long term all patches have to be upstream to lessen the maintenance burden especially when updating to newer Linux kernel releases.

Can you please add that to the patch description?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the repository, I did git grep -i issu, and nothing showed up regarding ISSU. Where can I find more information?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Within SONiC, ISSU may be called warm reboot or warmboot. Some more details are here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you: ISSU = In-Service-Software-Upgrade

Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
---
drivers/net/ethernet/mellanox/mlxsw/cmd.h | 4 ++++
drivers/net/ethernet/mellanox/mlxsw/i2c.c | 29 ++++++++++++++++++++---
2 files changed, 30 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/cmd.h b/drivers/net/ethernet/mellanox/mlxsw/cmd.h
index 0772e4339b33..6329ebb09523 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/cmd.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/cmd.h
@@ -149,6 +149,8 @@ enum mlxsw_cmd_status {
MLXSW_CMD_STATUS_BAD_NVMEM = 0x0B,
/* Device is currently running reset */
MLXSW_CMD_STATUS_RUNNING_RESET = 0x26,
+ /* FW ISSU ongoing. */
+ MLXSW_CMD_STATUS_FW_ISSU = 0x27,
/* Bad management packet (silently discarded). */
MLXSW_CMD_STATUS_BAD_PKT = 0x30,
};
@@ -180,6 +182,8 @@ static inline const char *mlxsw_cmd_status_str(u8 status)
return "BAD_NVMEM";
case MLXSW_CMD_STATUS_RUNNING_RESET:
return "RUNNING_RESET";
+ case MLXSW_CMD_STATUS_FW_ISSU:
+ return "FW_ISSU_ONGOING";
case MLXSW_CMD_STATUS_BAD_PKT:
return "BAD_PKT";
default:
diff --git a/drivers/net/ethernet/mellanox/mlxsw/i2c.c b/drivers/net/ethernet/mellanox/mlxsw/i2c.c
index e04d521d9376..1205c6111b0b 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/i2c.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/i2c.c
@@ -62,6 +62,7 @@
* @core: switch core pointer;
* @bus_info: bus info block;
* @block_size: maximum block size allowed to pass to under layer;
+ * @status: status to indicate chip reset or in-service update;
*/
struct mlxsw_i2c {
struct {
@@ -75,6 +76,7 @@ struct mlxsw_i2c {
struct mlxsw_core *core;
struct mlxsw_bus_info bus_info;
u16 block_size;
+ u8 status;
};

#define MLXSW_I2C_READ_MSG(_client, _addr_buf, _buf, _len) { \
@@ -221,6 +223,19 @@ static int mlxsw_i2c_write_cmd(struct i2c_client *client,
return 0;
}

+static bool
+mlxsw_i2c_cmd_status_verify(struct device *dev, struct mlxsw_i2c *mlxsw_i2c,
+ u8 status)
+{
+ if (status == MLXSW_CMD_STATUS_FW_ISSU ||
+ status == MLXSW_CMD_STATUS_RUNNING_RESET) {
+ mlxsw_i2c->status = status;
+ dev_info(dev, "FW status=%x(%s)): Access to device is not allowed in this state\n", status, mlxsw_cmd_status_str(status));
+ return true;
+ }
+ return false;
+}
+
/* Routine posts initialization command to ASIC through mail box. */
static int
mlxsw_i2c_write_init_cmd(struct i2c_client *client,
@@ -404,6 +419,10 @@ mlxsw_i2c_cmd(struct device *dev, u16 opcode, u32 in_mod, size_t in_mbox_size,

WARN_ON(in_mbox_size % sizeof(u32) || out_mbox_size % sizeof(u32));

+ /* Do not run transaction if chip is in reset or in-service update state. */
+ if (mlxsw_i2c->status)
+ return 0;
+
if (in_mbox) {
reg_size = mlxsw_i2c_get_reg_size(in_mbox);
num = reg_size / mlxsw_i2c->block_size;
@@ -478,6 +497,8 @@ mlxsw_i2c_cmd(struct device *dev, u16 opcode, u32 in_mod, size_t in_mbox_size,

cmd_fail:
mutex_unlock(&mlxsw_i2c->cmd.lock);
+ if (mlxsw_i2c_cmd_status_verify(&client->dev, mlxsw_i2c, *status))
+ err = 0;
return err;
}

@@ -607,14 +628,16 @@ static int mlxsw_i2c_probe(struct i2c_client *client,
/* Wait until go bit is cleared. */
err = mlxsw_i2c_wait_go_bit(client, mlxsw_i2c, &status);
if (err) {
- dev_err(&client->dev, "HW semaphore is not released");
+ if (!mlxsw_i2c_cmd_status_verify(&client->dev, mlxsw_i2c, status))
+ dev_err(&client->dev, "HW semaphore is not released");
goto errout;
}

/* Validate transaction completion status. */
if (status) {
- dev_err(&client->dev, "Bad transaction completion status %x\n",
- status);
+ if (!mlxsw_i2c_cmd_status_verify(&client->dev, mlxsw_i2c, status))
+ dev_err(&client->dev, "Bad transaction completion status %x\n",
+ status);
err = -EIO;
goto errout;
}
--
2.20.1
1 change: 1 addition & 0 deletions patch/series
Original file line number Diff line number Diff line change
Expand Up @@ -118,6 +118,7 @@ linux-4.13-thermal-intel_pch_thermal-Fix-enable-check-on.patch
0077-mlxsw-core-Increase-critical-threshold-for-ASIC-ther.patch
0078-mlxsw-core-Add-validation-of-transceiver-temperature.patch
0079-mlxsw-core-Remove-critical-trip-point-from-thermal-z.patch
0080-TMP-mlxsw-i2c-Prevent-transaction-execution-for-spec.patch
linux-4.16-firmware-dmi-handle-missing-DMI-data-gracefully.patch
mellanox-backport-introduce-psample-a-new-genetlink-channel.patch
mellanox-backport-introduce-tc-sample-action.patch
Expand Down