Skip to content

Commit

Permalink
mlxsw: core: Use variable timeout for EMAD retries
Browse files Browse the repository at this point in the history
The driver sends Ethernet Management Datagram (EMAD) packets to the
device for configuration purposes and waits for up to 200ms for a reply.
A request is retried up to 5 times.

When the system is under heavy load, replies are not always processed in
time and EMAD transactions fail.

Make the process more robust to such delays by using exponential
backoff. First wait for up to 200ms, then retransmit and wait for up to
400ms and so on.

Fixes: caf7297 ("mlxsw: core: Introduce support for asynchronous EMAD register access")
Reported-by: Denis Yulevich <denisyu@nvidia.com>
Tested-by: Denis Yulevich <denisyu@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  • Loading branch information
idosch authored and kuba-moo committed Nov 18, 2020
1 parent fb738b9 commit 1f492ea
Showing 1 changed file with 2 additions and 1 deletion.
3 changes: 2 additions & 1 deletion drivers/net/ethernet/mellanox/mlxsw/core.c
Original file line number Diff line number Diff line change
Expand Up @@ -571,7 +571,8 @@ static void mlxsw_emad_trans_timeout_schedule(struct mlxsw_reg_trans *trans)
if (trans->core->fw_flash_in_progress)
timeout = msecs_to_jiffies(MLXSW_EMAD_TIMEOUT_DURING_FW_FLASH_MS);

queue_delayed_work(trans->core->emad_wq, &trans->timeout_dw, timeout);
queue_delayed_work(trans->core->emad_wq, &trans->timeout_dw,
timeout << trans->retries);
}

static int mlxsw_emad_transmit(struct mlxsw_core *mlxsw_core,
Expand Down

0 comments on commit 1f492ea

Please sign in to comment.