Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Address Dell issue#46 : Adding MUX reset logic to fix probe failures #2356

Merged
merged 2 commits into from
Jan 8, 2019
Merged

Address Dell issue#46 : Adding MUX reset logic to fix probe failures #2356

merged 2 commits into from
Jan 8, 2019

Conversation

padmanarayana
Copy link
Contributor

- What I did

Address Dell issue #46 : sonic-net/sonic-platform-modules-dell#46

- How I did it

After adding a mux, perform extra sanity checks to verify if the channels do get created. If not, retry with an increasing delay per retry upto 2 secs.

- How to verify it

Kept a mux under reset to attempt to add mux which results in a similar symptoms as was seen when the probe failed. Verified the mux reset during the retry as well as the sanity checks post every mux creation.

- Description for the changelog

  1. Added a new function in dell_i2c_utils.sh:i2c_mux_create() which is used for mux creation. This function, after creating a mux device also sanity checks if the channels are created. There is a retry mechanism (with increasing delay per retry) along with a mux reset as part of retry.
  2. S6100 needed to expose the reset register (to reset the IOM based muxes)
  3. io_rd_wr.py is added to Z9100 (and is made common) since LPC access is necessary to reset the CPU & switchboard muxes on both platforms.

- A picture of a cute animal (not mandatory but encouraged)

@lguohan
Copy link
Collaborator

lguohan commented Dec 8, 2018

@stcheng, any update on this pr?

Copy link
Contributor

@stcheng stcheng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please check the comments, thanks!

# "echo [mux driver] <i2c-mux-address> > <i2c-bus/operation>" i2c-channel-first
# where operation = "new_device"
# i2c-channel-first is the first of the 8 channels that this mux should create
i2c_mux_create() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for me, this piece of code is a little bit hard to understand. since the i2c tree is fixed on each platform, it would be better to have some more predefined numbers instead of number manipulation here.

the function could be simpler like i2c_mux_create <root> <sub_root> <first>
so that in the script you only need to call the function e.g. i2c_mux_create 4 0x71 10 or i2c_mux_create 7 0x71 50 or i2c_mux_create 7 0x72 58

the echo command and the details of the retry could be inside the function

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed this.

i2c_bus_num=`echo $(basename $i2c_bus) | cut -d'-' -f 2`
i2c_mux=`echo "$1" | cut -d' ' -f 3 | sed 's/^0x//'`
i2c_mux_channel_first=$i2c_bus/i2c-$2
i2c_mux_channel_last=$i2c_bus/i2c-$(expr $2 + $MAX_MUX_CHANNELS - 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the above logic is too complex to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed

# where operation = "new_device"
# i2c-channel-first is the first of the 8 channels that this mux should create
i2c_mux_create() {
local MAX_MUX_CHANNEL_RETRY=5
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel 5 is too large, could you change to 3.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed this

break;
else
# If the channel did not get created, remove the mux, reset the mux tree and retry
i2c_config "echo 0x$i2c_mux > /sys/bus/i2c/devices/i2c-$devnum/i2c-$i2c_bus_num/delete_device"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you rename the function i2c_config to i2c_mux_remove? and make the function symmetric with i2c_mux_remove?

else
# If the channel did not get created, remove the mux, reset the mux tree and retry
i2c_config "echo 0x$i2c_mux > /sys/bus/i2c/devices/i2c-$devnum/i2c-$i2c_bus_num/delete_device"
reset_muxes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

before resetting the mux, could you add one line specific log indicating which mux failed to be initialized? what's the root?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added debug with mux details.

@@ -191,6 +195,28 @@ xcvr_presence_interrupts() {
esac
}

# Reset the mux tree
reset_muxes() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for resetting the muxes, is it possible to reset only one mux per the failure? or do we need to reset all the muxes? if we reset all the muxes, do we need to re-build the whole i2c tree?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is possible to reset only the specific mux that are part of this channel - however the complexity of that logic is not warranted. Resetting the muxes does not necessitate rebuilding the kernel's i2c tree.

@stcheng
Copy link
Contributor

stcheng commented Dec 12, 2018

By the way, could you also change the title of the pull request to indicate what is the change of the pull request? e.g. Adding MUX reset logic to fix probe failure issue.

@lguohan
Copy link
Collaborator

lguohan commented Dec 15, 2018

agree with shuotian.

@padmanarayana padmanarayana changed the title Address Dell issue#46 Address Dell issue#46 : Adding MUX reset logic to fix probe failures Dec 19, 2018
Copy link
Contributor

@stcheng stcheng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you change the title of the commit to remove the "Address Dell issue#46 :" part?
you could add some more details in the commit message.

@lguohan lguohan merged commit 14a0297 into sonic-net:master Jan 8, 2019
vivekrnv pushed a commit to vivekrnv/sonic-buildimage that referenced this pull request Aug 26, 2022
What I did
This PR is to fix the issue of adding mux_acl_rule into IngressTableDrop.
The error log is

 Jun 25 08:02:37.159020 svcstr-7050-acs-4 ERR swss#orchagent: :- validateAclRuleMatch: Match SAI_ACL_ENTRY_ATTR_FIELD_IN_PORTS in rule mux_acl_rule is not supported by table IngressTableDrop
PR sonic-net#2341 added support for different matching field in different stage (INGRESS/EGRESS). For example, SAI_ACL_ENTRY_ATTR_FIELD_IN_PORTS is only supported at INGRESS stage.

However, PR sonic-net#2341 only handled one path for creating ACL table, that is by CONFIG_DB entry.
There is a case that addAclTable is directly called from other orch, such as MuxOrch. In that case, the stage dependent matcing field is not added. As a resule, we will see the above error logs.
To address the issue, I moved the call of addStageMandatoryMatchFields from doAclTableTask to addAclTable to ensure addStageMandatoryMatchFields is always called.
Please be noted that addMandatoryActions is called from both doAclTableTask and addAclTable to ensure the validation of ACL table is passing.

Why I did it
To fix ACL rule issue for mux.

How I verified it

Verified by running test_pfcwd
Verified by checking syslog

Signed-off-by: bingwang <wang.bing@microsoft.com>
yxieca added a commit to yxieca/sonic-buildimage that referenced this pull request Sep 9, 2022
…x-kernel] advance submodule head

linkmgrd:
* ab5b2c1 2022-09-02 | Fix mux config (sonic-net#128) (HEAD -> 202205, github/202205) [Longxiang Lyu]

utilities:
* 7de9305 2022-09-07 | [generate dump]Added error message when saisdkdump fails (sonic-net#2356) (HEAD -> 202205, github/202205) [Sudharsan Dhamal Gopalarathnam]
* c5b0a6d 2022-09-07 | [counterpoll]Fixing counterpoll show for tunnel and acl stats (sonic-net#2355) [Sudharsan Dhamal Gopalarathnam]
* 1452b44 2022-09-05 | [GCU] Fix missing backend in dry run (sonic-net#2347) [jingwenxie]
* bc7b845 2022-09-04 | Add Password Hardening CLI support (sonic-net#2338) [davidpil2002]
* 55e8948 2022-09-06 | [fast-reboot]Avoid stopping masked services during fast-reboot (sonic-net#2335) [Sudharsan Dhamal Gopalarathnam]
* f7d69d4 2022-08-30 | Replace cmp in acl_loader with operator.eq (sonic-net#2328) [Zhaohui Sun]
* 4054ebb 2022-09-05 | Add verification for override (sonic-net#2305) [jingwenxie]
* 729d811 2022-05-30 | Fix sonic-installer and 'show version' command crash when database docker not running issue. (sonic-net#2183) [Hua Liu]

platform-daemons:
* 36ba7c0 2022-09-07 | [ycable] cleanup logic for creating grpc future ready (sonic-net#289) (HEAD -> 202205) [vdahiya12]
* 2a9db73 2022-09-01 | [ycabled] fix insert events from xcvrd;cleanup some mux toggle logic (sonic-net#287) [vdahiya12]

platform-common:
* d7c990d 2022-09-03 | [CMIS] 'get_transceiver_info' should return 'None' when CMIS cable EEPROM is not ready  (sonic-net#305) (HEAD -> 202205) [Kebo Liu]

linux-kernel:
* 25ea052 2022-08-31 | [patch]: Add accpt_untracked_na kernel param (sonic-net#292) (HEAD -> 202205) [Lawrence Lee]

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
yxieca added a commit that referenced this pull request Sep 10, 2022
…x-kernel] advance submodule head (#12025)

linkmgrd:
* ab5b2c1 2022-09-02 | Fix mux config (#128) (HEAD -> 202205, github/202205) [Longxiang Lyu]

utilities:
* 7de9305 2022-09-07 | [generate dump]Added error message when saisdkdump fails (#2356) (HEAD -> 202205, github/202205) [Sudharsan Dhamal Gopalarathnam]
* c5b0a6d 2022-09-07 | [counterpoll]Fixing counterpoll show for tunnel and acl stats (#2355) [Sudharsan Dhamal Gopalarathnam]
* 1452b44 2022-09-05 | [GCU] Fix missing backend in dry run (#2347) [jingwenxie]
* bc7b845 2022-09-04 | Add Password Hardening CLI support (#2338) [davidpil2002]
* 55e8948 2022-09-06 | [fast-reboot]Avoid stopping masked services during fast-reboot (#2335) [Sudharsan Dhamal Gopalarathnam]
* f7d69d4 2022-08-30 | Replace cmp in acl_loader with operator.eq (#2328) [Zhaohui Sun]
* 4054ebb 2022-09-05 | Add verification for override (#2305) [jingwenxie]
* 729d811 2022-05-30 | Fix sonic-installer and 'show version' command crash when database docker not running issue. (#2183) [Hua Liu]

platform-daemons:
* 36ba7c0 2022-09-07 | [ycable] cleanup logic for creating grpc future ready (#289) (HEAD -> 202205) [vdahiya12]
* 2a9db73 2022-09-01 | [ycabled] fix insert events from xcvrd;cleanup some mux toggle logic (#287) [vdahiya12]

platform-common:
* d7c990d 2022-09-03 | [CMIS] 'get_transceiver_info' should return 'None' when CMIS cable EEPROM is not ready  (#305) (HEAD -> 202205) [Kebo Liu]

linux-kernel:
* 25ea052 2022-08-31 | [patch]: Add accpt_untracked_na kernel param (#292) (HEAD -> 202205) [Lawrence Lee]

Signed-off-by: Ying Xie <ying.xie@microsoft.com>

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
dgsudharsan added a commit to dgsudharsan/sonic-buildimage that referenced this pull request Sep 14, 2022
Update sonic-utilities submodule pointer to include the following:
* 4d377a6 [subinterface]Added additional checks in portchannel and subinterface commands ([sonic-net#2345](sonic-net/sonic-utilities#2345))
* bbcdf2e disk_check: Publish event  for RO state ([sonic-net#2320](sonic-net/sonic-utilities#2320))
* 3fd537b Support the bandit check by GitHub Action ([sonic-net#2358](sonic-net/sonic-utilities#2358))
* 491d3d3 [generate dump]Added error message when saisdkdump fails ([sonic-net#2356](sonic-net/sonic-utilities#2356))
* 6830e01 [counterpoll]Fixing counterpoll show for tunnel and acl stats ([sonic-net#2355](sonic-net/sonic-utilities#2355))
* 3be2ad7 [fast-reboot]Avoid stopping masked services during fast-reboot ([sonic-net#2335](sonic-net/sonic-utilities#2335))
* 0e1b0cf [GCU] Fix missing backend in dry run ([sonic-net#2347](sonic-net/sonic-utilities#2347))
* 676c31b Add verification for override ([sonic-net#2305](sonic-net/sonic-utilities#2305))
* 48997c2 Add Password Hardening CLI support ([sonic-net#2338](sonic-net/sonic-utilities#2338))
* 414e239 update unit tests for swap ([#locato](https://github.com/Azure/sonic-utilities/pull/locato))
* a91a492 consider swap checking memory in ([#stalle](https://github.com/Azure/sonic-utilities/pull/stalle))
* f0ce586 [route_check]: Ignore standalone tunnel routes ([sonic-net#2325](sonic-net/sonic-utilities#2325))

Signed-off-by: dgsudharsan <sudharsand@nvidia.com>
dprital added a commit to dprital/sonic-buildimage that referenced this pull request Sep 21, 2022
Update sonic-utilities submodule pointer to include the following:
* 0a7557b [minigraph] add option to specify golden path in load_minigraph ([sonic-net#2350](sonic-net/sonic-utilities#2350))
* 322aefc [GCU]Remove GCU unique lane check for duplicate lanes platforms ([sonic-net#2343](sonic-net/sonic-utilities#2343))
* 7099fff [fastboot] fastboot enhancement: Use warm-boot infrastructure for fast-boot ([sonic-net#2286](sonic-net/sonic-utilities#2286))
* 09026ed [warm-reboot] fix warm-reboot when /tmp/cache is missing ([sonic-net#2367](sonic-net/sonic-utilities#2367))
* a3c404c Fix typo in platform_sfputil_helper.is_rj45_port ([sonic-net#2374](sonic-net/sonic-utilities#2374))
* 637d834 Vnet_route_check Vxlan tunnel route update. ([sonic-net#2281](sonic-net/sonic-utilities#2281))
* 29a3e51 Added support for tunnel route status in show vnet routes all. ([sonic-net#2341](sonic-net/sonic-utilities#2341))
* 1ac584b Use 'default' VRF when VRF name is not provided ([sonic-net#2368](sonic-net/sonic-utilities#2368))
* 4d377a6 [subinterface]Added additional checks in portchannel and subinterface commands ([sonic-net#2345](sonic-net/sonic-utilities#2345))
* bbcdf2e disk_check: Publish event  for RO state ([sonic-net#2320](sonic-net/sonic-utilities#2320))
* 3fd537b Support the bandit check by GitHub Action ([sonic-net#2358](sonic-net/sonic-utilities#2358))
* 491d3d3 [generate dump]Added error message when saisdkdump fails ([sonic-net#2356](sonic-net/sonic-utilities#2356))
* 6830e01 [counterpoll]Fixing counterpoll show for tunnel and acl stats ([sonic-net#2355](sonic-net/sonic-utilities#2355))
* 3be2ad7 [fast-reboot]Avoid stopping masked services during fast-reboot ([sonic-net#2335](sonic-net/sonic-utilities#2335))
* 0e1b0cf [GCU] Fix missing backend in dry run ([sonic-net#2347](sonic-net/sonic-utilities#2347))
* 676c31b Add verification for override ([sonic-net#2305](sonic-net/sonic-utilities#2305))
* 48997c2 Add Password Hardening CLI support ([sonic-net#2338](sonic-net/sonic-utilities#2338))
* 414e239 update unit tests for swap ([#locato](https://github.com/sonic-net/sonic-utilities/pull/locato))
* a91a492 consider swap checking memory in ([#stalle](https://github.com/sonic-net/sonic-utilities/pull/stalle))
* f0ce586 [route_check]: Ignore standalone tunnel routes ([sonic-net#2325](sonic-net/sonic-utilities#2325))

Signed-off-by: dprital <drorp@nvidia.com>
@wen587 wen587 mentioned this pull request Sep 23, 2022
7 tasks
qiluo-msft pushed a commit that referenced this pull request Oct 1, 2022
0a7557bd9 [minigraph] add option to specify golden path in load_minigraph (#2350)
322aefc37 [GCU]Remove GCU unique lane check for duplicate lanes platforms (#2343)
7099fffa7 [fastboot] fastboot enhancement: Use warm-boot infrastructure for fast-boot (#2286)
09026edbb [warm-reboot] fix warm-reboot when /tmp/cache is missing (#2367)
a3c404c74 Fix typo in platform_sfputil_helper.is_rj45_port (#2374)
637d834ce Vnet_route_check Vxlan tunnel route update. (#2281)
29a3e5180 Added support for tunnel route status in show vnet routes all. (#2341)
1ac584bb3 Use 'default' VRF when VRF name is not provided (#2368)
4d377a620 [subinterface]Added additional checks in portchannel and subinterface commands (#2345)
bbcdf2ed7 disk_check: Publish event  for RO state (#2320)
3fd537b0a Support the bandit check by GitHub Action (#2358)
491d3d380 [generate dump]Added error message when saisdkdump fails (#2356)
6830e01ec [counterpoll]Fixing counterpoll show for tunnel and acl stats (#2355)
3be2ad7de [fast-reboot]Avoid stopping masked services during fast-reboot (#2335)
0e1b0cf20 [GCU] Fix missing backend in dry run (#2347)
676c31bd0 Add verification for override (#2305)
48997c266 Add Password Hardening CLI support (#2338)
414e239ea update unit tests for swap allocator
a91a4922f consider swap checking memory in installer
f0ce58635 [route_check]: Ignore standalone tunnel routes (#2325)
roberthong-qct pushed a commit to QuantaSwitch/sonic-buildimage that referenced this pull request Nov 18, 2022
0a7557bd9 [minigraph] add option to specify golden path in load_minigraph (sonic-net#2350)
322aefc37 [GCU]Remove GCU unique lane check for duplicate lanes platforms (sonic-net#2343)
7099fffa7 [fastboot] fastboot enhancement: Use warm-boot infrastructure for fast-boot (sonic-net#2286)
09026edbb [warm-reboot] fix warm-reboot when /tmp/cache is missing (sonic-net#2367)
a3c404c74 Fix typo in platform_sfputil_helper.is_rj45_port (sonic-net#2374)
637d834ce Vnet_route_check Vxlan tunnel route update. (sonic-net#2281)
29a3e5180 Added support for tunnel route status in show vnet routes all. (sonic-net#2341)
1ac584bb3 Use 'default' VRF when VRF name is not provided (sonic-net#2368)
4d377a620 [subinterface]Added additional checks in portchannel and subinterface commands (sonic-net#2345)
bbcdf2ed7 disk_check: Publish event  for RO state (sonic-net#2320)
3fd537b0a Support the bandit check by GitHub Action (sonic-net#2358)
491d3d380 [generate dump]Added error message when saisdkdump fails (sonic-net#2356)
6830e01ec [counterpoll]Fixing counterpoll show for tunnel and acl stats (sonic-net#2355)
3be2ad7de [fast-reboot]Avoid stopping masked services during fast-reboot (sonic-net#2335)
0e1b0cf20 [GCU] Fix missing backend in dry run (sonic-net#2347)
676c31bd0 Add verification for override (sonic-net#2305)
48997c266 Add Password Hardening CLI support (sonic-net#2338)
414e239ea update unit tests for swap allocator
a91a4922f consider swap checking memory in installer
f0ce58635 [route_check]: Ignore standalone tunnel routes (sonic-net#2325)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants