-
Notifications
You must be signed in to change notification settings - Fork 177
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[201911][patch] mlxsw: i2c: Prevent transaction execution for special chip st… #278
Changes from 1 commit
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,125 @@ | ||
From 392127480c91a26dae2449288229d4957213dcc1 Mon Sep 17 00:00:00 2001 | ||
From: Vadim Pasternak <vadimp@nvidia.com> | ||
Date: Tue, 31 May 2022 10:27:38 +0300 | ||
Subject: [PATCH v4.9 ISSU WA 1/1] TMP: mlxsw: i2c: Prevent transaction | ||
execution for special chip states | ||
|
||
Do not run transaction in cases chip is in reset or in-service update | ||
states. | ||
In such case firmware is not accessible and will reject transaction | ||
with the relevant status "RUNNING_RESET" or "FW_ISSU_ONGOING". | ||
In case transaction is failed do to one of these reasons, stop sending | ||
transactions. In such case driver is about to be removed since it | ||
cannot continue running after reset or in-service update. And | ||
re-probed again after reset or in-service update is completed. | ||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is that commit upstream? Please explain the TMP. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. it is not and it cannot be upstream. this is a way for bug fix and we wont be able to upstream it. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I do not understand. Please elaborate. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Upstream mlxsw driver doesn't support ISSU, so we can't submit ISSU related filtering to upstream. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Only 4.19(?) or all versions? The policy of SONiC is that long term all patches have to be upstream to lessen the maintenance burden especially when updating to newer Linux kernel releases. Can you please add that to the patch description? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In the repository, I did There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Within SONiC, ISSU may be called warm reboot or warmboot. Some more details are here. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thank you: ISSU = In-Service-Software-Upgrade |
||
Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> | ||
--- | ||
drivers/net/ethernet/mellanox/mlxsw/cmd.h | 4 ++++ | ||
drivers/net/ethernet/mellanox/mlxsw/i2c.c | 29 ++++++++++++++++++++--- | ||
2 files changed, 30 insertions(+), 3 deletions(-) | ||
|
||
diff --git a/drivers/net/ethernet/mellanox/mlxsw/cmd.h b/drivers/net/ethernet/mellanox/mlxsw/cmd.h | ||
index 0772e4339b33..6329ebb09523 100644 | ||
--- a/drivers/net/ethernet/mellanox/mlxsw/cmd.h | ||
+++ b/drivers/net/ethernet/mellanox/mlxsw/cmd.h | ||
@@ -149,6 +149,8 @@ enum mlxsw_cmd_status { | ||
MLXSW_CMD_STATUS_BAD_NVMEM = 0x0B, | ||
/* Device is currently running reset */ | ||
MLXSW_CMD_STATUS_RUNNING_RESET = 0x26, | ||
+ /* FW ISSU ongoing. */ | ||
+ MLXSW_CMD_STATUS_FW_ISSU = 0x27, | ||
/* Bad management packet (silently discarded). */ | ||
MLXSW_CMD_STATUS_BAD_PKT = 0x30, | ||
}; | ||
@@ -180,6 +182,8 @@ static inline const char *mlxsw_cmd_status_str(u8 status) | ||
return "BAD_NVMEM"; | ||
case MLXSW_CMD_STATUS_RUNNING_RESET: | ||
return "RUNNING_RESET"; | ||
+ case MLXSW_CMD_STATUS_FW_ISSU: | ||
+ return "FW_ISSU_ONGOING"; | ||
case MLXSW_CMD_STATUS_BAD_PKT: | ||
return "BAD_PKT"; | ||
default: | ||
diff --git a/drivers/net/ethernet/mellanox/mlxsw/i2c.c b/drivers/net/ethernet/mellanox/mlxsw/i2c.c | ||
index e04d521d9376..1205c6111b0b 100644 | ||
--- a/drivers/net/ethernet/mellanox/mlxsw/i2c.c | ||
+++ b/drivers/net/ethernet/mellanox/mlxsw/i2c.c | ||
@@ -62,6 +62,7 @@ | ||
* @core: switch core pointer; | ||
* @bus_info: bus info block; | ||
* @block_size: maximum block size allowed to pass to under layer; | ||
+ * @status: status to indicate chip reset or in-service update; | ||
*/ | ||
struct mlxsw_i2c { | ||
struct { | ||
@@ -75,6 +76,7 @@ struct mlxsw_i2c { | ||
struct mlxsw_core *core; | ||
struct mlxsw_bus_info bus_info; | ||
u16 block_size; | ||
+ u8 status; | ||
}; | ||
|
||
#define MLXSW_I2C_READ_MSG(_client, _addr_buf, _buf, _len) { \ | ||
@@ -221,6 +223,19 @@ static int mlxsw_i2c_write_cmd(struct i2c_client *client, | ||
return 0; | ||
} | ||
|
||
+static bool | ||
+mlxsw_i2c_cmd_status_verify(struct device *dev, struct mlxsw_i2c *mlxsw_i2c, | ||
+ u8 status) | ||
+{ | ||
+ if (status == MLXSW_CMD_STATUS_FW_ISSU || | ||
+ status == MLXSW_CMD_STATUS_RUNNING_RESET) { | ||
+ mlxsw_i2c->status = status; | ||
+ dev_info(dev, "FW status=%x(%s)): Access to device is not allowed in this state\n", status, mlxsw_cmd_status_str(status)); | ||
+ return true; | ||
+ } | ||
+ return false; | ||
+} | ||
+ | ||
/* Routine posts initialization command to ASIC through mail box. */ | ||
static int | ||
mlxsw_i2c_write_init_cmd(struct i2c_client *client, | ||
@@ -404,6 +419,10 @@ mlxsw_i2c_cmd(struct device *dev, u16 opcode, u32 in_mod, size_t in_mbox_size, | ||
|
||
WARN_ON(in_mbox_size % sizeof(u32) || out_mbox_size % sizeof(u32)); | ||
|
||
+ /* Do not run transaction if chip is in reset or in-service update state. */ | ||
+ if (mlxsw_i2c->status) | ||
+ return 0; | ||
+ | ||
if (in_mbox) { | ||
reg_size = mlxsw_i2c_get_reg_size(in_mbox); | ||
num = reg_size / mlxsw_i2c->block_size; | ||
@@ -478,6 +497,8 @@ mlxsw_i2c_cmd(struct device *dev, u16 opcode, u32 in_mod, size_t in_mbox_size, | ||
|
||
cmd_fail: | ||
mutex_unlock(&mlxsw_i2c->cmd.lock); | ||
+ if (mlxsw_i2c_cmd_status_verify(&client->dev, mlxsw_i2c, *status)) | ||
+ err = 0; | ||
return err; | ||
} | ||
|
||
@@ -607,14 +628,16 @@ static int mlxsw_i2c_probe(struct i2c_client *client, | ||
/* Wait until go bit is cleared. */ | ||
err = mlxsw_i2c_wait_go_bit(client, mlxsw_i2c, &status); | ||
if (err) { | ||
- dev_err(&client->dev, "HW semaphore is not released"); | ||
+ if (!mlxsw_i2c_cmd_status_verify(&client->dev, mlxsw_i2c, status)) | ||
+ dev_err(&client->dev, "HW semaphore is not released"); | ||
goto errout; | ||
} | ||
|
||
/* Validate transaction completion status. */ | ||
if (status) { | ||
- dev_err(&client->dev, "Bad transaction completion status %x\n", | ||
- status); | ||
+ if (!mlxsw_i2c_cmd_status_verify(&client->dev, mlxsw_i2c, status)) | ||
+ dev_err(&client->dev, "Bad transaction completion status %x\n", | ||
+ status); | ||
err = -EIO; | ||
goto errout; | ||
} | ||
-- | ||
2.20.1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the TMP tag?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the TMP tag needed (as in, will this bug fix be removed in the future), or is it expected to be permanent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is expected to be permanent.
We want to mark it as downstream patch.
There are a lot of not up-streamed patches in Sonic kernel, But there is no convention in Sonic to tag such patches.
"TMP" in patch name could be replaced with "DS" or "Downstream" or with something like that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rather have "Sonic-specific" or "Sonic-only" in the patch rather than TMP.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Sonic-specific" sounds better to me. Can we close with this name?
I suggest to use this tag for all new-coming patches (not only from Nvidia), which are not going to be up-streamed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe such a tag is unnecessary. The README suggests, that patches are not wanted, that are not going to be upstreamed. The only exception is:
Why were the patches accepted to be in SONiC in the first place. I really thought these were only backports, and it’s not made easy for reviewers to find information about the feature (ISSU), and why it’s not going to be upstreamed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no ISSU feature in up-stream driver, so this patch can't be accepted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that there are a number of patches here that should be upstreamed. However, there can still be patches that are not useful to be upstreamed, such as patches that would break the general usage of the Linux kernel, or some hack that really shouldn't be upstreamed.
In this case, since this is functionality that is not supported by the upstream driver, it wouldn't make sense to commit this bug fix for functionality that's not even present in the upstream Linux kernel. If this changes (i.e. ISSU/warmboot/equivalent functionality) becomes available in the upstream kernel, then this should be upstreamed as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, but I still do not understand. Does SONiC add a special module/patch to support ISSU in it’s Linux kernel build? Where are those patches? I could apply this patch here (for current SONiC branch) to the upstream Linux kernel source just fine.
MLXSW_CMD_STATUS_FW_ISSU
sounds to me like it’s support in the firmware and independent from the Linux kernel, so should be submitted upstream.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since upstream doesn’t support ISSU, it means nobody can set FW to ISSU and FW will never get MLXSW_CMD_STATUS_FW_ISSU. That’s we it can’t be up-streamed.