[201803] [services] Restart SwSS service upon unexpected critical process exit #2546

Merged
merged 6 commits into from
Feb 26, 2019

Contributor

@jleveque jleveque commented Feb 10, 2019

- What I did
Restart SwSS service (and also restart dependent services) if orchagent process exits abnormally.

NOTE: Will need to port this patch to the master branch in the near future, as well.

- How I did it

  • Add supervisor-proc-exit-listener event listener plugin for Supervisor running in SwSS Docker container which is set to listen for abnormal orchagent exits
  • Configure swss.service to always restart automatically if it stops, after a delay of 30 seconds. Also set a rate limit of 3 restarts within 20 minutes (1200 seconds); if this limit is exceeded, systemd stops attempting to restart the service. To restart the service after it enters this state, one must first run systemctl reset-failed (we should probably call this command in config load_minigraph before restarting services)
  • Add dependency between teamd and SwSS to ensure teamd also gets restarted along with SwSS
  • Also change the way the DHCP relay Docker container waits for interfaces to be ready before starting the relay agent process. Rather than using ip commands, now check STATE_DB for interface entries with "state" == "ok"
  • Also add "WantedBy=swss.service" option to unit files of services which need to be started with SwSS (currently teamd, snmp, dhcp_relay and radv). The "Requires=swss.service" option causes the dependent services to stop and restart along with SwSS (when calling systemctl stop swss.service and systemctl restart swss.service). However this will not cause them to start with SwSS (when calling systemctl start swss.service). This functionality is enabled with the addition of the "WantedBy=" option.
  • supervisor-proc-exit-listener script resides in files/scripts/ so that the same script can be installed in multiple Docker containers. To add this solution to another container, one simply needs to do the following:
    1. Add the script to the container's "_FILES" variable in the container's Makefile, and ensure it gets copied into the container in the container's Dockerfile
    2. Add a /etc/supervisor/critical_processes file to the container specifying all critical processes, one per line
    3. Add the event listener as a process to the container's supervisor config file
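
As a rough illustration of step 2, the critical_processes file could be read as sketched below. This is not the code from this PR: the function names `parse_critical_processes` and `load_critical_processes` are hypothetical, and skipping blank lines is an assumption about how the file should be treated.

```python
def parse_critical_processes(lines):
    """Return process names from an iterable of lines, one name per line.

    Assumption: surrounding whitespace and blank lines are ignored.
    """
    return [line.strip() for line in lines if line.strip()]


def load_critical_processes(path="/etc/supervisor/critical_processes"):
    """Read the per-container list of critical processes from disk."""
    with open(path) as handle:
        return parse_critical_processes(handle)
```

Keeping the list in a per-container file lets the same listener script be installed unchanged in several containers.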

- How it Works

  • If orchagent process crashes/exits abnormally, supervisor-proc-exit-listener will send a SIGTERM signal to Supervisor, causing it to exit also
  • Since Supervisor is running as PID 1 within the Docker container, when Supervisor process exits, it will cause the container to stop
  • When the SwSS Docker container stops, systemd will consider the swss.service to have stopped unexpectedly. Systemd will wait 30 seconds, then stop dependent services, restart SwSS, and lastly restart dependent services (currently teamd, snmp, dhcp_relay and radv).
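
The first step above can be sketched as a small Supervisor event listener. This is a minimal sketch under stated assumptions, not the supervisor-proc-exit-listener script from this PR: the names `parse_tokens`, `should_terminate`, `run_listener` and `terminate_supervisor` are hypothetical, the streams are assumed to carry ASCII text, and the listener is assumed to be subscribed to PROCESS_STATE_EXITED events in the supervisor config.

```python
import os
import signal


def parse_tokens(line):
    """Parse Supervisor's 'key:value key:value' token format into a dict."""
    return dict(token.split(":", 1) for token in line.split())


def should_terminate(event_name, payload, critical_processes):
    """True when a critical process exited and Supervisor did not expect it."""
    if event_name != "PROCESS_STATE_EXITED":
        return False
    fields = parse_tokens(payload)
    return (fields.get("expected") == "0"
            and fields.get("processname") in critical_processes)


def run_listener(stdin, stdout, critical_processes, terminate):
    """Handle events until stdin closes, using the event listener protocol."""
    while True:
        # Tell Supervisor we are ready to receive the next event.
        stdout.write("READY\n")
        stdout.flush()
        header_line = stdin.readline()
        if not header_line:
            return
        headers = parse_tokens(header_line)
        # The header's 'len' token gives the size of the payload that follows.
        payload = stdin.read(int(headers["len"]))
        if should_terminate(headers.get("eventname"), payload, critical_processes):
            terminate()
        # Acknowledge the event: "RESULT <length>\n" followed by the result body.
        stdout.write("RESULT 2\nOK")
        stdout.flush()


def terminate_supervisor():
    """Send SIGTERM to the parent process, which is Supervisor for a listener."""
    os.kill(os.getppid(), signal.SIGTERM)
```

In a container this might be started with `run_listener(sys.stdin, sys.stdout, critical, terminate_supervisor)`; passing `terminate` as a parameter keeps the decision logic testable without signalling a real process.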

- How to verify it

Send a signal to orchagent to cause it to appear to exit abnormally (e.g., pkill -11 orchagent). Ensure the swss, syncd, teamd, snmp, dhcp_relay and radv services get restarted per the above details.

- A picture of a cute animal (not mandatory but encouraged)

               )\._.,--....,'``.       
 .b--.        /;   _.. \   _\  (`._ ,. 
`=,-,-'~~~   `----(,_..'--(,_..'`-.;.'

@jleveque
Contributor Author

Retest this please

Contributor

@renukamanavalan renukamanavalan left a comment

@ e87b9ba#diff-527fc13fdccd498aa46471bac1c51a2d

supervisor-proc-exit-listener python script:
Is there any additional info in the payload that could be logged to give more insight?

@jleveque
Contributor Author

@renukamanavalan: More info with regards to what? The reason the process exited? If so, Supervisor already logs this info upon process exit. For example, INFO exited: orchagent (terminated by SIGSEGV (core dumped); not expected).

@renukamanavalan
Contributor

The changes look good to me, as far as my limited knowledge of the system goes.

Personally, I would need to understand more at a higher level. For example, the script wait_for_intf.sh (both old and current) waits forever until the interface is up. I wonder what would happen if interface init has failed and is reporting an error that needs some explicit action to fix at the system level. For instance, what logs get emitted, and what auto-mitigation happens or is possible in such a case?

@jleveque
Contributor Author

jleveque commented Feb 11, 2019

Please do not close pull request. Please use the "Review changes" feature to add a review, either approving or requesting changes. PR owner or repo maintainer will merge PR once approved.

@jleveque jleveque reopened this Feb 11, 2019
@renukamanavalan
Contributor

Sorry for closing, by mistake.

I don't have any request for any code changes. I will leave approval to Guohan, as I am still learning the system.

@jleveque
Contributor Author

Regarding the wait_for_intf.sh script, this is to ensure all interfaces required by the processes in the container are up and ready before starting them to ensure they do not get into a bad state. Auto-mitigation should be handled outside of this container, by orchagent or a related process.
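
For illustration only, the waiting behavior described here could look roughly like the Python sketch below. It is not the wait_for_intf.sh script from this PR: `wait_for_interfaces` is a hypothetical name, and `get_state` is an assumed caller-supplied function that returns the "state" field of an interface's STATE_DB entry (or None if the entry does not exist yet).

```python
import time


def wait_for_interfaces(interfaces, get_state, poll_interval=1.0, sleep=time.sleep):
    """Block until every interface reports state "ok".

    get_state is a hypothetical lookup into STATE_DB supplied by the caller.
    Like the script discussed above, this waits indefinitely if an
    interface never becomes ready.
    """
    pending = set(interfaces)
    while pending:
        # Keep only the interfaces that are still not ready.
        pending = {name for name in pending if get_state(name) != "ok"}
        if pending:
            sleep(poll_interval)
```

Because there is no timeout, any mitigation for an interface that never comes up has to happen elsewhere, as noted above.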

if expected == 0 and processname in critical_processes:
    MSG_FORMAT_STR = "Process {} exited unexpectedly. Terminating supervisor..."
    msg = MSG_FORMAT_STR.format(payload_headers['processname'])
    syslog.syslog(syslog.LOG_INFO, msg)
Collaborator

Have you considered also printing the message to users' terminals, using the wall command or something similar?

Contributor Author

@jleveque jleveque Feb 11, 2019

No, this was not considered. Thanks for the suggestion.

@lguohan: What do you think about also broadcasting this message to all logged-in users' terminals?

@jleveque jleveque changed the title [201803] [services] Restart SwSS service upon abnormal orchagent process exit [201803] [services] Restart SwSS service upon unexpected critical process exit Feb 15, 2019
'vlanmgrd',
'intfmgrd',
'buffermgrd'
]
Collaborator

We should add this script as a SCRIPT so it can be shared by all Docker containers. The process list should then be moved to a separate file instead of being hard-coded in the script.

Contributor Author

@lguohan: Where do you suggest we store the event listener script in the repo? We currently don't have files which are shared by multiple containers.

We could potentially add the script to the docker-base image and have it source a file containing the list of critical processes, but this might not be very straightforward.

Collaborator

Check CONFIGDB_LOAD_SCRIPT in rules/script.mk; it is shared by multiple containers.

Contributor Author

Done in commit b4c0ab2

@jipanyang
Collaborator

Are you planning to make similar change for master/201811 branches? We may even consider unplanned warm restart support beyond this.

@jleveque
Contributor Author

@jipanyang: Yes, a similar change will be made to the master/201811 branches. The changes are a bit more complex due to warm reboot. After this PR is merged, I will begin work on a solution for those branches.

@lguohan lguohan merged commit 2a8af27 into sonic-net:201803 Feb 26, 2019
@jleveque jleveque deleted the restart_swss_201803 branch February 26, 2019 19:27
lguohan added a commit that referenced this pull request Apr 6, 2019
vivekrnv added a commit to vivekrnv/sonic-buildimage that referenced this pull request Dec 17, 2022
6185324 dereg acl-rule counters during acl-table del (sonic-net#2574)
b865352 Align watermark flow with port configuration (sonic-net#2525)
f2d2fb3 L3 / L3 V6  Egress ACL table creation failure (sonic-net#2561)
577f696 [muxorch] Skip programming ACL for standby `active-active` ports (sonic-net#2569)
242ee11 [muxorch] Skip programming SoC IP kernel tunnel route (sonic-net#2557)
6695113 [gearbox] Support setting tx taps on gearbox ports (sonic-net#2158)
872f7bf [portinit] Do not call GET on SAI_PORT_ATTR_SPEED when AUTONEG is enabled (sonic-net#2484)
6afefe1 [vstest][virtual chassis] Removed dvs.runcmd using click commands (sonic-net#2214)
b8521cc [p4orch]: PINS Extension tables support (sonic-net#2506)
d0419dc sonic-swss: Fix orchagent crash in generateQueueMapPerPort. (sonic-net#2552)
bd652a0 [muxorch] Adding case for maintaining current state (sonic-net#2280)
6b6dda6 [Centec]for support mclag of centec to configure port isolate-group sonic-net#2529
ec507a4 [ACL] Support ACTION_COUNTER action in custom ACL table type (sonic-net#2550)
1a74604 Use github code scanning instead of LGTM (sonic-net#2546)
bc3c894 [dual-tor] add missing SAI attribte in order to create IPNIP tunnel (sonic-net#2503)
dca78d8 (origin/202211) [Fdbsyncd] Bug Fix for remote MAC move to local MAC and Fix for Static MAC advertisement in EVPN. (sonic-net#2521)

Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
liat-grozovik pushed a commit that referenced this pull request Dec 20, 2022
6185324 dereg acl-rule counters during acl-table del (#2574)
b865352 Align watermark flow with port configuration (#2525)
f2d2fb3 L3 / L3 V6  Egress ACL table creation failure (#2561)
577f696 [muxorch] Skip programming ACL for standby `active-active` ports (#2569)
242ee11 [muxorch] Skip programming SoC IP kernel tunnel route (#2557)
6695113 [gearbox] Support setting tx taps on gearbox ports (#2158)
872f7bf [portinit] Do not call GET on SAI_PORT_ATTR_SPEED when AUTONEG is enabled (#2484)
6afefe1 [vstest][virtual chassis] Removed dvs.runcmd using click commands (#2214)
b8521cc [p4orch]: PINS Extension tables support (#2506)
d0419dc sonic-swss: Fix orchagent crash in generateQueueMapPerPort. (#2552)
bd652a0 [muxorch] Adding case for maintaining current state (#2280)
6b6dda6 [Centec]for support mclag of centec to configure port isolate-group #2529
ec507a4 [ACL] Support ACTION_COUNTER action in custom ACL table type (#2550)
1a74604 Use github code scanning instead of LGTM (#2546)
bc3c894 [dual-tor] add missing SAI attribte in order to create IPNIP tunnel (#2503)
dca78d8 (origin/202211) [Fdbsyncd] Bug Fix for remote MAC move to local MAC and Fix for Static MAC advertisement in EVPN. (#2521)
yxieca added a commit to yxieca/sonic-buildimage that referenced this pull request Jan 5, 2023
…atform-common][py-swsssdk] advance submodule head

linkmgrd:
* bf75a93 2022-11-30 | Use github code scanning instead of LGTM (sonic-net#157) (HEAD -> 202205, github/202205) [Liu Shilong]

utilities:
* c1fa31d 2022-11-30 | Use github code scanning instead of LGTM (sonic-net#2530) (HEAD -> 202205) [Liu Shilong]
* 9990208 2022-05-19 | Add sonic-delayed.target to Application Extension .timer file generator (sonic-net#2176) [noaOrMlnx]

swss:
* 7b3170a 2023-01-05 | Revert "sonic-swss: Fix orchagent crash in generateQueueMapPerPort. (sonic-net#2552)" (HEAD -> 202205) [Ying Xie]
* 4897e93 2023-01-05 | Revert "[bufferorch] : Support for buffer profiles for VoQ on chassis (sonic-net#2465)" (HEAD -> 202205, github/202205) [Ying Xie]
* bbccc68 2023-01-04 | [vstest] Only collect stdout of orchagent_restart_check in vstest (sonic-net#2597) (HEAD -> 202205, github/202205) [bingwang-ms]
* f7a7c05 2023-01-04 | Avoid aborting orchagent when setting TUNNEL attributes (sonic-net#2591) [Stephen Sun]
* 84064fa 2022-12-20 | Fixed a bug causing error state of same configuration is applied twice. (sonic-net#2580) [siqbal1986]
* 4851bef 2022-12-20 | Only collect stdout of orchagent_restart_check in vstest (sonic-net#2578) [bingwang-ms]
* 2904d95 2022-12-05 | sonic-swss: Fix orchagent crash in generateQueueMapPerPort. (sonic-net#2552) [Sambath Kumar Balasubramanian]
* ac84e41 2022-11-30 | Use github code scanning instead of LGTM (sonic-net#2546) [Liu Shilong]
* 502bd69 2022-12-20 | Fix `test_vlan.py` (sonic-net#2541) [Longxiang Lyu]
* 1e37d0e 2022-12-19 | [voq][chassis]Add show fabric counters port/queue commands (sonic-net#2522) [jfeng-arista]
* 17cdad3 2022-12-20 | [bufferorch] : Support for buffer profiles for VoQ on chassis (sonic-net#2465) [vmittal-msft]
* 5345338 2023-01-04 | Disable `arp_evict_nocarrier` for vlan host intf  (sonic-net#2590) [Longxiang Lyu]

swss-common:
* 9616287 2023-01-04 | Added customer monitoring  tables in app db and state db (sonic-net#725) (HEAD -> 202205) [siqbal1986]
* d03b95d 2022-11-30 |  Use github code scanning instead of LGTM (sonic-net#718) [Liu Shilong]
* 8a276c6 2022-12-28 | Fix sonic-slave docker image environment issue. (sonic-net#728) (github/202205) [Liu Shilong]
* 8fee1b4 2022-11-14 | Fix memory leak issue in ConfigDBConnector. (sonic-net#655) (sonic-net#706) [Hua Liu]

sairedis:
* 5387602 2022-11-30 | Use github code scanning instead of LGTM (#1160) (HEAD -> 202205) [Liu Shilong]

platform-daemons:
* b499412 2022-11-30 | Use github code scanning instead of LGTM (sonic-net#316) (HEAD -> 202205, github/202205) [Liu Shilong]

platform-common:
* d11e983 2022-11-30 | Use github code scanning instead of LGTM (sonic-net#328) (HEAD -> 202205) [Liu Shilong]

py-swsssdk:
* b654e91 2022-11-30 | Use github code scanning instead of LGTM (sonic-net#131) (HEAD -> 202205) [Liu Shilong]

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
yxieca added a commit that referenced this pull request Jan 5, 2023
…atform-common][py-swsssdk] advance submodule head (#13260)

linkmgrd:
* bf75a93 2022-11-30 | Use github code scanning instead of LGTM (#157) (HEAD -> 202205, github/202205) [Liu Shilong]

utilities:
* c1fa31d 2022-11-30 | Use github code scanning instead of LGTM (#2530) (HEAD -> 202205) [Liu Shilong]
* 9990208 2022-05-19 | Add sonic-delayed.target to Application Extension .timer file generator (#2176) [noaOrMlnx]

swss:
* bbccc68 2023-01-04 | [vstest] Only collect stdout of orchagent_restart_check in vstest (#2597) (HEAD -> 202205, github/202205) [bingwang-ms]
* f7a7c05 2023-01-04 | Avoid aborting orchagent when setting TUNNEL attributes (#2591) [Stephen Sun]
* 84064fa 2022-12-20 | Fixed a bug causing error state of same configuration is applied twice. (#2580) [siqbal1986]
* 4851bef 2022-12-20 | Only collect stdout of orchagent_restart_check in vstest (#2578) [bingwang-ms]
* 2904d95 2022-12-05 | sonic-swss: Fix orchagent crash in generateQueueMapPerPort. (#2552) [Sambath Kumar Balasubramanian]
* ac84e41 2022-11-30 | Use github code scanning instead of LGTM (#2546) [Liu Shilong]
* 502bd69 2022-12-20 | Fix `test_vlan.py` (#2541) [Longxiang Lyu]
* 1e37d0e 2022-12-19 | [voq][chassis]Add show fabric counters port/queue commands (#2522) [jfeng-arista]
* 17cdad3 2022-12-20 | [bufferorch] : Support for buffer profiles for VoQ on chassis (#2465) [vmittal-msft]
* 5345338 2023-01-04 | Disable `arp_evict_nocarrier` for vlan host intf  (#2590) [Longxiang Lyu]

swss-common:
* 9616287 2023-01-04 | Added customer monitoring  tables in app db and state db (#725) (HEAD -> 202205) [siqbal1986]
* d03b95d 2022-11-30 |  Use github code scanning instead of LGTM (#718) [Liu Shilong]
* 8a276c6 2022-12-28 | Fix sonic-slave docker image environment issue. (#728) (github/202205) [Liu Shilong]
* 8fee1b4 2022-11-14 | Fix memory leak issue in ConfigDBConnector. (#655) (#706) [Hua Liu]

sairedis:
* 5387602 2022-11-30 | Use github code scanning instead of LGTM (#1160) (HEAD -> 202205) [Liu Shilong]

platform-daemons:
* b499412 2022-11-30 | Use github code scanning instead of LGTM (#316) (HEAD -> 202205, github/202205) [Liu Shilong]

platform-common:
* d11e983 2022-11-30 | Use github code scanning instead of LGTM (#328) (HEAD -> 202205) [Liu Shilong]

py-swsssdk:
* b654e91 2022-11-30 | Use github code scanning instead of LGTM (#131) (HEAD -> 202205) [Liu Shilong]

Signed-off-by: Ying Xie <ying.xie@microsoft.com>

StormLiangMS added a commit that referenced this pull request Jan 14, 2023
#13202

advance sonic-utilities submodule for 202211 branch

34428157 - (HEAD, origin/202211) Revert "Optimize the execution time of the 'show techsupport' script to 5-10%, (Qos config change #2504)" (6 days ago) [stormliang]
c3bd01f6 - Revert "[generate_dump] Optimize the execution time of 'show techsupport' CLI by parallel function execution ([201811][Devices] Add new device CIG CS6436-56P #2512)" (6 days ago) [stormliang]
5a326d8b - [Mellanox] Change severity to NOTICE in Mellanox buffer migrator when unable to fetch DEVICE_METADATA due to empty CONFIG_DB during initialization ([warm boot] cherry-pick PR #2538 and advance related sub-modules in 201811 branch #2569) (2 weeks ago) [Stephen Sun]
50b36ef3 - Fix issue: unconfigured PGs are displayed in watermarkstat ([docker-lldp]: fix several issues in lldpd docker #2556) (2 weeks ago) [Stephen Sun]
a9fd2a79 - [Command Ref] Add doc for syslog rate limit ([sub module] move sairedis and swss to 201811 branch #2508) (2 weeks ago) [Junchao-Mellanox]
80546ff3 - [generate_dump] Optimize the execution time of 'show techsupport' CLI by parallel function execution ([201811][Devices] Add new device CIG CS6436-56P #2512) (2 weeks ago) [Vadym Hlushko]
6649ca8a - [timer.unit.j2] use wanted-by in timer unit ([201803] [services] Restart SwSS service upon unexpected critical process exit #2546) (2 weeks ago) [Stepan Blyshchak]
dd23d0ef - Fixes [Sub-If|VRF] Unbind sub-interface from VRF is failed #12170: Delete subinterface and recreate the subinterface in ([VLAN] "show mac" doesn't work when interface added to vlan as tagged member #2513) (2 weeks ago) [Preetham]
236749d3 - [db_migrator] Fix migration of Loopback data: handle all Loopback interfaces (DellEMC S6000 xcvrd support #2560) (2 weeks ago) [Vaibhav Hemant Dixit]
5762d814 - Optimize the execution time of the 'show techsupport' script to 5-10%, (Qos config change #2504) (2 weeks ago) [Vadym Hlushko]
d3c3e368 - [muxcable][show] update show mux tunnel-route to separate ASIC and kernel into two columns (build errors on branch 201811 for centec platform #2553) (2 weeks ago) [Jing Zhang]
c98648a1 - [show]Fix show route return code on error (Dell SMF driver hwmon number reorder fix for Dell S6100/Z9100 #2542) (2 weeks ago) [Sudharsan Dhamal Gopalarathnam]
01374673 - [route_check]: Ignore ASIC only SOC IPs (Added new SN3700/SN3700C Mellanox platforms #2548) (2 weeks ago) [Lawrence Lee]
d2967805 - YANG Validation for ConfigDB Updates: WARM_RESTART, SFLOW_SESSION, SFLOW, VXLAN_TUNNEL, VXLAN_EVPN_NVO, VXLAN_TUNNEL_MAP, MGMT_VRF_CONFIG, CABLE_LENGTH, VRF tables ([submodule 201811] advance sairedis and swss submodule for 201811 branch #2526) (2 weeks ago) [isabelmsft]
88b01ffd - [db_migrator] Remove import of swsssdk as it is not supported in master ([build]: apply proxy setting to curl. #2544) (2 weeks ago) [Vaibhav Hemant Dixit]
4ae970c6 - Support syslog rate limit configuration for containers and host (Move FRR from 4.0 to 6.0.2 and make new frr version and pkg compile #2454) (2 weeks ago) [Junchao-Mellanox]
608ed147 - [generate_dump] [Mellanox] Fix the duplicate dfw dump collection problem by adding symlinks ('show vlan config' is not displaying the VLAN members, after the clear config and reload with default l2 configuration. #2536) (2 weeks ago) [Vivek]
bdc2599f - [config] Add check in config interface ip command to block if the interface is portchannel member ([sub module] advance sonic-swss sub module #2539) (2 weeks ago) [Sudharsan Dhamal Gopalarathnam]
cff4fed5 - [system-health] Improve code structure of system health CLIs ([sub-module] advance sonic-swss sub-module #2453) (2 weeks ago) [Junchao-Mellanox]
488e5714 - Transceiver eeprom dom CLI modification to show output from TRANSCEIVER_DOM_THRESHOLD table (Fix for KeyError: 'DEVICE_NEIGHBOR' when executing 'show interfaces neighbor expected' command #2535) (2 weeks ago) [mihirpat1]
07ca5def - sonic-utilities: Update config reload() to verify formatting of an input file ([ntp]: Do not disable reader for error ENOBUFS #2529) (2 weeks ago) [Caitlin Choate]
f0f083a2 - [GCU] Add RemoveCreateOnlyDependency Validator/Generator (Enabling Fast-reboot command in s6100 loaded with T0 topo getting Failed unmounting /host error #2500) (2 weeks ago) [jingwenxie]
eca0253c - [QoS] Introduce delay to the qos reload flow (Config reload/load_minigraph not clearing State DB #2503) (2 weeks ago) [DavidZagury]
35158ee0 - Use github code scanning instead of LGTM ([sub module] sub module sonic-swss-common tracking 201811 branch #2530) (2 weeks ago) [Liu Shilong]
682b5cee - Change show kube command default value of insecure key to True ([submodule] update sonic-snmpagent #2517) (2 weeks ago) [lixiaoyuner]
ce19e631 - Add db_migrator_constants.py script to setup.py (Add device data for Arista 7060PX/DX4-32 #2534) (2 weeks ago) [Vaibhav Hemant Dixit]
0d0c2693 - [drop counters] Fix CLI script for unconfigured PGs ([config] Do not fail for minigraphs which do not have neighbors listed in <Devices> section #2518) (2 weeks ago) [Lior Avramov]
2c69d0fd - Update vrf add, del commands for duplicate/non-existing VRFs (solve package build dependency issue #2467) (2 weeks ago) [Muhammad Danish]
efc09280 - Port 202012 DB migration changes to newer branches ([vs]: Force10-S6000 buffer settings for virtual switch #2515) (2 weeks ago) [Vaibhav Hemant Dixit]
70a15aaa - [VXLAN]Fixing traceback in show remotemac when mac moves during command execution ([build] When generating image version, handle case where current commit has no reachable tags #2506) (2 weeks ago) [Sudharsan Dhamal Gopalarathnam]
qiluo-msft pushed a commit that referenced this pull request Jan 18, 2023
#### Why I did it

Update for following swss commits:

96180bf - 2023-01-13 : [202012] Bfd default multiplier change  (#2615) [siqbal1986]
07506ac - 2023-01-11 : Add missing parameter to on_switch_shutdown_request method. (#2567) [Hua Liu]
3253cc8 - 2022-11-30 : Use github code scanning instead of LGTM (#2546) [Liu Shilong]
f4df524 - 2023-01-11 : [orchagent]:add local_discriminator to state_db (#2587) [Baorong Liu]
f3cd02d - 2022-12-05 : [202012][muxorch] Adding case for maintaining current state (#2500) [Nikola Dancejic]
StormLiangMS added a commit that referenced this pull request Feb 17, 2023
Why I did it
Submodule advances:
sonic-utilities

8e8e6088 - [202211][dhcp_relay] Remove add field of vlanid to DHCP_RELAY table while adding vlan ([201811 sub-module] advance sub-modules: utilities, swss, swss-common #2679) (16 hours ago) [Yaqiang Zhu]
1400fb94 - [GCU] Ignore bgpraw in GCU applier (Fix sfputil indexing for 7170-Q59S20 #2623) (15 hours ago) [jingwenxie]
f76a6364 - [vlan] Refresh dhcpv6_relay config while adding/deleting a vlan ([sonic-py-swsssdk] Update submodule #2660) (15 hours ago) [Yaqiang Zhu]
7849e18d - [db_migrator] make LOG_LEVEL_DB migration more robust (Mellanox platform: attach queues 2 and 6 to lossy profile using generic buffer template #2651) (16 hours ago) [Stepan Blyshchak]
c7df6dfa - Fixed a bug in "show vnet routes all" causing screen overrun. (Add hook to allow customizing link cable lengths #2644) (16 hours ago) [siqbal1986]
a5505f02 - show logging CLI support for logs stored in tmpfs (Traceback error seen while issuing show interface commands with if_names #2641) (16 hours ago) [mihirpat1]
bbacb91a - [system-health] Fix issue: show system-health CLI crashes (Updating deb package for platform and sai #2635) (16 hours ago) [Junchao-Mellanox]
8d724024 - [sai_failure_dump]Invoking dump during SAI failure ([dockers]: Upgrade LLDP docker to stretch build #2633) (16 hours ago) [Sudharsan Dhamal Gopalarathnam]
3c3be526 - Add transceiver info CLI support to show output from TRANSCEIVER_INFO for ZR ([submodule]: Update sonic-sairedis pointer #2630) (16 hours ago) [mihirpat1]
37f41666 - [show] add support for gRPC show commands for active-active ([bitmap-vnet]: Bitmap vnet test image [DO NOT MERGE] #2629) (16 hours ago) [vdahiya12]
b06d7fe4 - [show_bfd] add local discriminator in show bfd command ([Pmon] Selectively load pmon container daemons #2625) (16 hours ago) [Baorong Liu]
6adcd3e8 - [GCU] Ignore bgpraw table in GCU operation ([Mellanox] Fix SAI version #2628) (16 hours ago) [jingwenxie]
c65bdc35 - [muxcable][config] Add support to enable/disable ceasing to be an advertisement interface when radv service is stopped (Add knob in ConfigDB to enable/disable telemetry container #2622) (16 hours ago) [Jing Zhang]
91e9457f - Add Transceiver PM basic CLI support to show output from TRANSCEIVER_PM table for ZR ([201803] Restart SwSS, syncd and dependent services if a critical process in syncd container exits #2615) (16 hours ago) [longhuan-cisco]
54cc8c5a - Remove TODO comment which is no longer relevant (Warm-reboot: teamd warm restart caused neighbor deleted and learned again.  #2600) (16 hours ago) [Lior Avramov]
6891b4fb - Making 'show feature autorestart' more resilient to missing auto_restart config in CONFIG_DB ([submodule] update mellanox hw-mgmgt pointer (V.2.0.0061) #2592) (16 hours ago) [kartik-arista]
1e8bea37 - [storyteller] add link prober state change to story teller ([sonic-buildimage] New feature managementVRF(L3mdev) #2585) (16 hours ago) [Jing Zhang]
7481a20f - Extend fast-reboot STATE_DB entry timer ([submodule]: update sonic-swss-common, sonic-py-swsssdk, sonic-snmpagent #2577) (16 hours ago) [Aryeh Feigin]
0e08701c - [sonic_installer] use /etc/resolv.conf from the host when migrating packages (Set a rate limit on syslog messages from all Docker containers #2573) (16 hours ago) [Stepan Blyshchak]
06096780 - Fixed admin state config CLI for Backport interfaces (Prior to install a new ONIE SONiC image, delete all partitions except EFI/ONIE #2557) (16 hours ago) [anamehra]
9f1f13e4 - [show] Add bgpraw to show run all (Fixed typo on paragraph #40 #2537) (16 hours ago) [jingwenxie]
98bc8bd2 - [chassis][voq] Add "show fabric reachability" command. ([ntp]: Build 4.2.6 locally. #2528) (16 hours ago) [jfeng-arista]
3a50b63f - Preserve copp tables through DB migration ([docker-radvd]: upgrade docker radvd to stretch based #2524) (16 hours ago) [Aryeh Feigin]
28f6b127 - [masic] 'show interfaces counters' reminds to use '-d all' option to check for internal links (solve dependency issue #2466) (16 hours ago) [wenyiz2021]
15026e14 - suppport multi asic for show queue counter ([dockers] Prevent old supervisord messages from gettting re-logged to syslog #2439) (16 hours ago) [zhixzhu]
2d773e17 - [masic support] 'show run bgp' support for multi-asic (lo address not synced to the asic #2427) (16 hours ago) [wenyiz2021]
sonic-swss

4f304bc - [EVPN]Handling race condition when remote VNI arrives before tunnel map entry ([sonic-quagga] Function defect, do NOT cancel route while connect IP down #2642) (15 hours ago) [Sudharsan Dhamal Gopalarathnam]
34fc615 - [sai_failure_dump]Invoking dump during SAI failure (Add hook to allow customizing link cable lengths #2644) (15 hours ago) [Sudharsan Dhamal Gopalarathnam]
b817695 - [autoneg]Fixing adv interface types to be set when AN is disabled (Fix issue with platform file path name #2638) (15 hours ago) [Sudharsan Dhamal Gopalarathnam]
ab36bd4 - [bfdorch] add local discriminator to state DB ([bitmap-vnet]: Bitmap vnet test image [DO NOT MERGE] #2629) (15 hours ago) [Baorong Liu]
6343471 - Remove TODO comments that are no longer relevant (Add knob in ConfigDB to enable/disable telemetry container #2622) (15 hours ago) [Lior Avramov]
2b1869c - [refactor]Refactoring sai handle status (Rollback kernel submodule update. #2621) (15 hours ago) [Sudharsan Dhamal Gopalarathnam]
c41a1b7 - Fix issue ARP entry is out of sync between kernel and APPL_DB after warm reboot if the ARP entry is updated more than once during warm reboot in PFC watchdog warm reboot test #13341 ARP entry can be out of sync between kernel and APPL_DB if multiple updates are received from RTNL ([sub module] advance sonic-utilities sub module for 201811 branch #2619) (15 hours ago) [Stephen Sun]
da0cf7a - Changed the BFD default detect multiplier to 10x ("failed to load plugin io.containerd.snapshotter..." seen during linux boot up #2614) (15 hours ago) [siqbal1986]
13b5adf - [vstest] Only collect stdout of orchagent_restart_check in vstest ([submodules] update swss and utilities pointers #2597) (15 hours ago) [bingwang-ms]
2b9d94d - Avoid aborting orchagent when setting TUNNEL attributes (build failing for PLATFORM=p4 #2591) (15 hours ago) [Stephen Sun]
99b7d3b - Only collect stdout of orchagent_restart_check in vstest ( [saibcm-modules]: import new bcm modules #2578) (15 hours ago) [bingwang-ms]
5209c42 - dereg acl-rule counters during acl-table del ([201803] Set a rate limit on syslog messages from all Docker containers #2574) (15 hours ago) [Vivek]
ae68054 - Fixed set mtu for deleted subintf due to late notification ([vs]: Add option to specify platform name for DVS orchagent #2571) (15 hours ago) [EdenGri]
ab13dfa - Remove TODO comments which are no longer needed (support set timezone in ConfigDB #2568) (15 hours ago) [Junchao-Mellanox]
a3545cf - Modify coppmgr mergeConfig to support preserving copp tables through reboot. (Added new SN3700/SN3700C Mellanox platforms #2548) (15 hours ago) [Aryeh Feigin]
be16e79 - Use github code scanning instead of LGTM ([201803] [services] Restart SwSS service upon unexpected critical process exit #2546) (15 hours ago) [Liu Shilong]
63c0234 - Updated handling of VRF_VNI mapping and VLAN_VNI mapping for same VNI ID (Move warm_restart enable/disable config to stateDB WARM_RESTART_ENABL… #2538) (15 hours ago) [Tapash Das]
4844111 - Fix potential risks ([mlnx] Fix sai xml path for boxer platform #2516) (15 hours ago) [Liran-Ar]
6420808 - [p4orch]: PINS Extension tables support ([build] When generating image version, handle case where current commit has no reachable tags #2506) (15 hours ago) [svshah-intel]
sonic-swss-common

1badd46 - Increase the netlink buffer size from 3MB to 16MB. (arp_update doesn't sleep 300 between each execution #739) (14 hours ago) [KISHORE KUNAL]
6555057 - Refactor eventpublisher deinit ([acl] Add default deny rule for l3 table #734) (14 hours ago) [Zain Budhwani]
f4d6de7 - Use github code scanning instead of LGTM ([sonic-quagga]:update submodule #718) (14 hours ago) [Liu Shilong]
sonic-linux-kernel

74f9a8f - Update linux kernel for hw-mgmt V.7.0020.4104 (Move template files to /usr/share/sonic/templates #305) (14 hours ago) [Stephen Sun]
6365701 - Fixes for emmc unreliability ([build_debian.sh]: Integrate system dump script #270) (14 hours ago) [Samuel Angebault]
How I did it
How to verify it
mihirpat1 pushed a commit to mihirpat1/sonic-buildimage that referenced this pull request Jun 14, 2023