[syncd]: Fix reload flow for Mellanox platforms #2386

Merged Dec 15, 2018

volodymyrsamotiy

  • Perform stop/start of Mellanox driver tools for all types of reboot
  • Don't set Mellanox FAST_BOOT option for "cold" reboot
  • Don't send "syncd_request_shutdown" event for "cold" reboot on Mellanox platforms

Signed-off-by: Volodymyr Samotiy <volodymyrs@mellanox.com>

- What I did
Fixed reload flow for syncd on Mellanox platforms.
- How I did it
Currently the syncd stop/start flow is broken on Mellanox platforms. This causes errors during switch/config reload, and as a result the switch is not initialized correctly.

Below is the list of changes made to fix the problem.
Note: All changes are relevant only to Mellanox and do not change behavior on other platforms.

  • On Mellanox platforms, stop/start of the driver tools should be executed for all types of reboot, so the syncd.sh script was changed accordingly.
  • On Mellanox platforms, the FAST_BOOT option should be set only for fast/warm start and not for cold reboot, so the syncd.sh script was changed accordingly.
  • For now, the remove_switch API in Mellanox SAI is not fully supported and returns an error on cold reboot (currently all config must be removed before calling switch remove). The syncd.sh stop flow was changed so that it does not send the syncd_request_shutdown event for cold reboot.
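
The three rules above can be sketched as a small decision table in shell. This is only an illustration, not the actual syncd.sh code: the function names (plan_start, plan_stop), the printed step labels, and the boot type strings ("cold", "fast", "warm") are hypothetical. The ordering of steps is also an assumption, as is the idea that other platforms send the shutdown request for every boot type.

```shell
#!/bin/bash
# Hypothetical sketch of the Mellanox-specific decisions described above.
# None of these names come from syncd.sh; they only label the steps.

# Print the steps taken when starting syncd.
# $1 = platform (e.g. "mellanox"), $2 = boot type ("cold", "fast" or "warm")
plan_start() {
    local platform="$1" boot_type="$2"
    if [ "$platform" = "mellanox" ]; then
        # Driver tools are started for every type of reboot.
        echo "start-driver-tools"
        # FAST_BOOT is set only for fast/warm start, never for cold reboot.
        if [ "$boot_type" = "fast" ] || [ "$boot_type" = "warm" ]; then
            echo "set-FAST_BOOT"
        fi
    fi
    echo "start-syncd"
}

# Print the steps taken when stopping syncd.
plan_stop() {
    local platform="$1" boot_type="$2"
    # On Mellanox the shutdown request is skipped for cold reboot;
    # sending it in all other cases is an assumption of this sketch.
    if [ "$platform" != "mellanox" ] || [ "$boot_type" != "cold" ]; then
        echo "send-syncd_request_shutdown"
    fi
    echo "stop-syncd"
    if [ "$platform" = "mellanox" ]; then
        # Driver tools are stopped for every type of reboot.
        echo "stop-driver-tools"
    fi
}
```

Calling plan_stop mellanox cold lists only the syncd and driver-tools stop steps, which mirrors the third bullet: no shutdown request is sent in that case.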

- How to verify it
Deploy an image and verify that each of the following works without errors during the shutdown/start flow:

  • reboot
  • config load_minigraph
  • config reload
  • systemctl restart swss
  • fast-reboot
  • warm-reboot

- Description for the changelog
[syncd]: Fix reload flow for Mellanox platforms

* Perform stop/start of Mellanox driver tools for all types of reboot
* Don't set Mellanox FAST_BOOT option for "cold" reboot
* Don't send "syncd_request_shutdown" event for "cold" reboot on Mellanox platforms

Signed-off-by: Volodymyr Samotiy <volodymyrs@mellanox.com>
@lguohan lguohan merged commit b506241 into sonic-net:master Dec 15, 2018
@@ -52,6 +52,24 @@ function wait_for_database_service()
done
}

function getBootType()
{
case "$(cat /proc/cmdline | grep -o 'SONIC_BOOT_TYPE=\S*' | cut -d'=' -f2)" in
SONIC_BOOT_TYPE=

We should handle both forms for backward compatibility with 201803:

fast-reboot
SONIC_BOOT_TYPE=fast-reboot

Otherwise we cannot fast-reboot from 201803 into 201811.
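
One way to accept both forms could look like the sketch below. It is not the code that was merged: the function name classify_boot_type and its return values ("warm", "fast", "cold") are assumptions made for illustration, and the command line is passed in as an argument so the logic can be tried without reading /proc/cmdline.

```shell
#!/bin/bash
# Hypothetical sketch: classify the boot type from a kernel command line,
# accepting both the bare "fast-reboot" token and "SONIC_BOOT_TYPE=...".

# $1 = kernel command line text (normally the contents of /proc/cmdline)
classify_boot_type() {
    # Pad with spaces so every token, including the first and last,
    # is delimited by a space on both sides.
    local cmdline=" $1 "
    case "$cmdline" in
        *" SONIC_BOOT_TYPE=warm"*) echo "warm" ;;
        *" SONIC_BOOT_TYPE=fast"*) echo "fast" ;;
        # Bare token form mentioned in the comment for 201803.
        *" fast-reboot "*)         echo "fast" ;;
        *)                         echo "cold" ;;
    esac
}
```

On a running system this would be invoked as classify_boot_type "$(cat /proc/cmdline)". Anything that matches neither form is treated as a cold boot, which is a design choice of this sketch.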

vivekrnv added a commit to vivekrnv/sonic-buildimage that referenced this pull request Oct 20, 2022
aedc05ecf [QoS] Support dynamic headroom calculation for Barefoot platforms (sonic-net#2306)
7f4da26f2 [app_ext] [auto-ts] Add available_mem_threshold option (sonic-net#2423)
b25070176 YANG Validation for ConfigDB Updates: Fix Decorator Bug (sonic-net#2405)
f62d1e596 [watermarkstat] Add new warning message for the 'q_shared_multi' counters (sonic-net#2408)
25fda264e [chassis]Add fabric counter cli commands (sonic-net#1860)
ae97e597e Update sonic command doc to add CLIs relative to SONiC fips (sonic-net#2377)
abd5eba49 [generate_dump]: Enhance show techsupport for cisco-8000 platform (sonic-net#2403)
ee15b74a2 Include configuring laser frequency and tx power (sonic-net#2437)
70be50cdc Add a subcommand to display a hexdump of transceiver EEPROM page (sonic-net#2379)
c246801ba Filter port invalid MTU configuration (sonic-net#2378)
362ec9bd7 [show] vnet advertised-route command (sonic-net#2390)
2372e2983 [show priority-group drop counters] Remove backup with cached PG drop counters after 'config reload' (sonic-net#2386)

Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
liat-grozovik pushed a commit that referenced this pull request Oct 23, 2022
aedc05ecf [QoS] Support dynamic headroom calculation for Barefoot platforms (#2306)
7f4da26f2 [app_ext] [auto-ts] Add available_mem_threshold option (#2423)
b25070176 YANG Validation for ConfigDB Updates: Fix Decorator Bug (#2405)
f62d1e596 [watermarkstat] Add new warning message for the 'q_shared_multi' counters (#2408)
25fda264e [chassis]Add fabric counter cli commands (#1860)
ae97e597e Update sonic command doc to add CLIs relative to SONiC fips (#2377)
abd5eba49 [generate_dump]: Enhance show techsupport for cisco-8000 platform (#2403)
ee15b74a2 Include configuring laser frequency and tx power (#2437)
70be50cdc Add a subcommand to display a hexdump of transceiver EEPROM page (#2379)
c246801ba Filter port invalid MTU configuration (#2378)
362ec9bd7 [show] vnet advertised-route command (#2390)
2372e2983 [show priority-group drop counters] Remove backup with cached PG drop counters after 'config reload' (#2386)

Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
yxieca added a commit to yxieca/sonic-buildimage that referenced this pull request Oct 25, 2022
…rm-common] advance submodule head

linkmgrd:
* d7d6635 2022-10-21 | Fix link prober state event report twice issue (sonic-net#149) (HEAD -> 202205) [Longxiang Lyu]
* 0ef3296 2022-10-21 | [active-active] Add support to send/handle mux probe request (sonic-net#147) [Longxiang Lyu]
* a66fa34 2022-10-17 | [active-active] Fix config reload (sonic-net#145) [Longxiang Lyu]
* 7e1c820 2022-10-11 | [Active-Standby] avoid posting mux metrics event when receiving unsolicited mux state notification  (sonic-net#142) [Jing Zhang]
* 237cfd2 2022-10-07 | [Active-Active] Update default route shutdown heartbeat logic (sonic-net#141) [Jing Zhang]

utilities:
* 415d30e 2022-10-23 | [techsupport] Adding FRR EVPN dumps (sonic-net#2442) (HEAD -> 202205) [Sudharsan Dhamal Gopalarathnam]
* b3ffe45 2022-10-21 | [show][muxcable] add support for show mux firmware version all (sonic-net#2441) [vdahiya12]
* 7d68534 2022-10-19 | [app_ext] [auto-ts] Add available_mem_threshold option (sonic-net#2423) [Vivek]
* 52b9c16 2022-10-07 | [muxcable][config] add CLI support for mux mode detach (sonic-net#2425) [Jing Zhang]
* 14646ff 2022-10-10 | [show priority-group drop counters] Remove backup with cached PG drop counters after 'config reload' (sonic-net#2386) [Andriy Yurkiv]
* dffcc53 2022-10-11 | Add a subcommand to display a hexdump of transceiver EEPROM page (sonic-net#2379) [mihirpat1]
* 86175c2 2022-10-17 | [chassis]Add fabric counter cli commands (sonic-net#1860) [Maxime Lorrillere]

swss:
* 6fe0afd 2022-10-25 | [portsorch] remove port OID from saiOidToAlias map on port deletion (sonic-net#2483) (HEAD -> 202205, github/202205) [Stepan Blyshchak]
* 7290d66 2022-10-07 | [vlanmgr] Disable `arp_evict_nocarrier` for vlan host intf (sonic-net#2469) [Longxiang Lyu]
* d074001 2022-10-05 | [chassis][voq]Collect counters for fabric links (sonic-net#1944) [Maxime Lorrillere]
* 3a0353a 2022-10-18 | [counters][202205] Improve performance by polling only configured ports buffer queue/pg counters (sonic-net#2474) [Vadym Hlushko]
* 2feb39d 2022-10-14 | [202205] [crm] Fix issue with continues EXCEEDED and CLEAR logs for ACL group/table counters (sonic-net#2482) [Volodymyr Samotiy]

sairedis:
* 326b630 2022-10-21 | [gbsyncd] Add asic db prefix for channel NOTIFICATIONS (sonic-net#1129) (HEAD -> 202205) [Junhua Zhai]

platform-daemon:
* 6dbda9b 2022-10-25 | [ycabled] fix no port/state returned by grpc server (sonic-net#308) (HEAD -> 202205) [vdahiya12]
* 3d1228a 2022-10-20 | Fix xcvrd to support 400G ZR optic (sonic-net#293) [Bohan Yang]

platform-common:
* c04d710 2022-09-29 | Read CMIS data path state duration (sonic-net#312) (HEAD -> 202205) [Bohan Yang]

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
yxieca added a commit that referenced this pull request Oct 27, 2022
…rm-common] advance submodule head (#12492)

linkmgrd:
* d7d6635 2022-10-21 | Fix link prober state event report twice issue (#149) (HEAD -> 202205) [Longxiang Lyu]
* 0ef3296 2022-10-21 | [active-active] Add support to send/handle mux probe request (#147) [Longxiang Lyu]
* a66fa34 2022-10-17 | [active-active] Fix config reload (#145) [Longxiang Lyu]
* 7e1c820 2022-10-11 | [Active-Standby] avoid posting mux metrics event when receiving unsolicited mux state notification  (#142) [Jing Zhang]
* 237cfd2 2022-10-07 | [Active-Active] Update default route shutdown heartbeat logic (#141) [Jing Zhang]

utilities:
* 415d30e 2022-10-23 | [techsupport] Adding FRR EVPN dumps (#2442) (HEAD -> 202205) [Sudharsan Dhamal Gopalarathnam]
* b3ffe45 2022-10-21 | [show][muxcable] add support for show mux firmware version all (#2441) [vdahiya12]
* 7d68534 2022-10-19 | [app_ext] [auto-ts] Add available_mem_threshold option (#2423) [Vivek]
* 52b9c16 2022-10-07 | [muxcable][config] add CLI support for mux mode detach (#2425) [Jing Zhang]
* 14646ff 2022-10-10 | [show priority-group drop counters] Remove backup with cached PG drop counters after 'config reload' (#2386) [Andriy Yurkiv]
* dffcc53 2022-10-11 | Add a subcommand to display a hexdump of transceiver EEPROM page (#2379) [mihirpat1]
* 86175c2 2022-10-17 | [chassis]Add fabric counter cli commands (#1860) [Maxime Lorrillere]

swss:
* 6fe0afd 2022-10-25 | [portsorch] remove port OID from saiOidToAlias map on port deletion (#2483) (HEAD -> 202205, github/202205) [Stepan Blyshchak]
* 7290d66 2022-10-07 | [vlanmgr] Disable `arp_evict_nocarrier` for vlan host intf (#2469) [Longxiang Lyu]
* d074001 2022-10-05 | [chassis][voq]Collect counters for fabric links (#1944) [Maxime Lorrillere]
* 3a0353a 2022-10-18 | [counters][202205] Improve performance by polling only configured ports buffer queue/pg counters (#2474) [Vadym Hlushko]
* 2feb39d 2022-10-14 | [202205] [crm] Fix issue with continues EXCEEDED and CLEAR logs for ACL group/table counters (#2482) [Volodymyr Samotiy]

sairedis:
* 326b630 2022-10-21 | [gbsyncd] Add asic db prefix for channel NOTIFICATIONS (#1129) (HEAD -> 202205) [Junhua Zhai]

platform-daemon:
* 6dbda9b 2022-10-25 | [ycabled] fix no port/state returned by grpc server (#308) (HEAD -> 202205) [vdahiya12]
* 3d1228a 2022-10-20 | Fix xcvrd to support 400G ZR optic (#293) [Bohan Yang]

platform-common:
* c04d710 2022-09-29 | Read CMIS data path state duration (#312) (HEAD -> 202205) [Bohan Yang]

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
@volodymyrsamotiy volodymyrsamotiy deleted the mellanox_reload_fix branch February 14, 2023 15:03