saidump on T2 results lua script to take more than 5 sec #13561

Closed
abdosi opened this issue Jan 31, 2023 · 10 comments
Labels: Issue for 202205, P0 (Priority of the issue), Triaged (this issue has been triaged)

Comments

@abdosi
Contributor

abdosi commented Jan 31, 2023

On T2, with higher route scale, we see that the Lua script https://github.com/sonic-net/sonic-swss-common/blob/master/common/table_dump.lua, invoked as part of the saidump command, can take more than 5 sec. While it runs, other clients get no reply or an incorrect reply from Redis, so all other processes start failing.

Command
generate_dump -s yesterday

Jan 30 23:22:32.637122 str2-xxxx-lc1-3 INFO acms#start.py: start: main: Waiting for bootstrap cert
Jan 30 23:23:00.293090 str2-xxxx-lc1-3 DEBUG syncd0#saidump: :> main: enter
Jan 30 23:23:00.307986 str2-xxxx-lc1-3 INFO syncd0#saidump: :- loadRedisScript: lua script local keys = redis.call("KEYS", KEYS[1] .. ":*")#012local res = {}#012#012for i,k in pairs(keys) do#012   local sres={}#012#012   local flat_map = redis.call('HGETALL', k)#012   for j = 1, #flat_map, 2 do#012       sres[flat_map[j]] = flat_map[j + 1]#012   end#012#012   res[k] = sres#012end#012#012return cjson.encode(res)#012 loaded, sha: 654245aafba722f7b601e1fc414c3647058c053c
Jan 30 23:23:05.329309 str2-xxxx-lc1-3 INFO database0#supervisord: redis 39:M 30 Jan 2023 23:23:05.328 # Lua slow script detected: still in execution after 5021 milliseconds. You can try killing the script using the SCRIPT KILL command. Script SHA1 is: 654245aafba722f7b601e1fc414c3647058c053c
Jan 30 23:23:05.329391 str2-xxxxx-lc1-3 ERR gbsyncd0#GBSAI[18]: :- checkReplyType: Expected to get redis type 5 got type 6, err: BUSY Redis is busy running a script. You can only call SCRIPT KILL or SHUTDOWN NOSAVE.
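The Lua script quoted in the log above runs `KEYS` over the whole table and then `HGETALL` on every match inside a single server-side call, so Redis serves no other commands until it finishes (after about 5 seconds it answers other clients with the `BUSY` error seen in the log). A minimal sketch of one alternative, assuming a redis-py style client (`scan_iter`, `pipeline(transaction=False)`, `hgetall`, `execute`): iterate keys client-side with `SCAN` and fetch hashes in pipelined batches, so other clients are served between batches. This is not what saidump or sonic-swss-common does; `dump_table` and the default batch size are hypothetical.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List


def _batched(items: Iterable[Any], size: int) -> Iterator[List[Any]]:
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def dump_table(client: Any, table: str, batch: int = 500) -> Dict[Any, Dict[Any, Any]]:
    """Dump every hash under `<table>:*` using SCAN plus pipelined HGETALL.

    `client` is assumed to behave like a redis-py client: scan_iter(match=, count=)
    and pipeline(transaction=False) returning an object with hgetall() and execute().
    """
    result: Dict[Any, Dict[Any, Any]] = {}
    # Same key pattern as the Lua script: KEYS[1] .. ":*"
    keys = client.scan_iter(match=f"{table}:*", count=batch)
    for chunk in _batched(keys, batch):
        # One round trip per batch; Redis can serve other clients between batches.
        pipe = client.pipeline(transaction=False)
        for key in chunk:
            pipe.hgetall(key)
        # execute() returns one reply per queued command, in order.
        for key, fields in zip(chunk, pipe.execute()):
            result[key] = fields  # a key returned twice by SCAN just overwrites
    return result
```

For example, `dump_table(redis.Redis(db=1), "ASIC_STATE")`, assuming ASIC_DB is database 1 and its table prefix is `ASIC_STATE` (check the target's database config). The trade-off is that the result is not an atomic snapshot: keys can change between batches, and SCAN may return a key more than once, which the dict assignment absorbs.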

@abdosi
Contributor Author

abdosi commented Jan 31, 2023

cc @judyjoseph @arlakshm @anamehra for viz.

@zhenggen-xu
Collaborator

@abdosi do you have a sense of how many routes would be too much? We could hit the issue in other layers too if this is related to route scale.

gechiang added the Triaged label on Feb 1, 2023
@gechiang
Collaborator

gechiang commented Feb 1, 2023

@kcudnik please help investigate this. Thanks!

kenneth-arista pushed a commit to kenneth-arista/sonic-utilities that referenced this issue Feb 10, 2023
yxieca pushed a commit to sonic-net/sonic-utilities that referenced this issue Feb 10, 2023
#2671)

To address sonic-net/sonic-buildimage#13561, skip saidump on T2 platforms for the time being.

Co-authored-by: abdosi <58047199+abdosi@users.noreply.github.com>
rlhui added the P0 label on Feb 25, 2023
isabelmsft pushed a commit to isabelmsft/sonic-utilities that referenced this issue Mar 23, 2023
isabelmsft added a commit to isabelmsft/sonic-utilities that referenced this issue Mar 23, 2023
commit 1d54781a1f90bda156b06b0734805babfba88b6d
Merge: 460c7f39 c704b71c
Author: isabelmsft <isabel.li@microsoft.com>
Date:   Thu Mar 23 07:32:32 2023 +0000

    Merge branch 'mux_mclag' of https://github.com/isabelmsft/sonic-utilities into mux_mclag

commit 460c7f390d352b1a0090708fde2f5ca2ace99209
Author: isabelmsft <isabel.li@microsoft.com>
Date:   Thu Mar 23 07:22:54 2023 +0000

    fix UT

commit d3e7f22a806d238b20e7e9db1cdfb1afc5d04ae1
Author: isabelmsft <isabel.li@microsoft.com>
Date:   Thu Mar 23 05:32:03 2023 +0000

    fix UT

commit e2660efe7f6de2531d966a8bf207b04456747374
Author: isabelmsft <isabel.li@microsoft.com>
Date:   Thu Mar 23 04:37:26 2023 +0000

    add UT

commit 68cc589f4d20e60461bf76cbe67cad931f10c7c2
Author: isabelmsft <isabel.li@microsoft.com>
Date:   Thu Mar 23 00:55:15 2023 +0000

    add UT

commit f55ea00bb1fd1d4827c67110498de6d49990d4d1
Author: Mai Bui <maibui@microsoft.com>
Date:   Tue Mar 21 00:25:39 2023 -0400

    Revert "Replace pickle by json (#2636)" (#2746)

    This reverts commit 54e26359fccf45d2e40800cf5598a725798634cd.
    Due to https://github.com/sonic-net/sonic-buildimage/issues/14089
    Signed-off-by: Mai Bui <maibui@microsoft.com>

commit 3b842c1b215020b24e5934b618d8cb51542e4088
Author: abdosi <58047199+abdosi@users.noreply.github.com>
Date:   Fri Mar 17 16:27:48 2023 -0700

    Fix the `show interface counters` throwing exception on device with no external interfaces (#2703)

    Fix the `show interface counters` throwing exception
    issue where devices do not have any external ports and all links are internal (Ethernet or fabric), which is possible in a chassis

commit ce9245d90a3ccdf903d34ba6966224b29de5d15b
Author: Stepan Blyshchak <38952541+stepanblyschak@users.noreply.github.com>
Date:   Fri Mar 17 09:10:47 2023 +0200

    [route_check] remove check-frr_patch mock (#2732)

    The test fails with python3.7 (works in 3.9) when stopping a patch that hasn't been started. We can always mock the check_output call and, if FRR_ROUTES is not defined, return an empty dictionary from the mock.

    #### What I did

    Removed check_frr_patch mock to fix UT running on python3.7

    #### How I did it

    Removed the mock

    #### How to verify it

    Run unit test in stretch env

commit 370aa30fc3f51918d4d0c36c9dc2c79f54214e67
Author: Neetha John <nejo@microsoft.com>
Date:   Thu Mar 16 17:31:49 2023 -0700

    Revert "Update load minigraph to load backend acl (#2236)" (#2735)

    This reverts commit 1518ca92df1e794222bf45100246c8ef956d7af6.

commit e4415b5ed4ea3100580ee9aaf8060587b8f96611
Author: Vivek <vivekreddykarri98@gmail.com>
Date:   Tue Mar 14 17:55:40 2023 -0700

    Update the ref guide to reflect the vlan brief output (#2731)

    What I did
    show vlan brief will only show DHCPv4 addresses, not DHCPv6 destinations

    Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>

commit 093c964c576e28188ddb0181af1fcc6b7a3adfc5
Author: Aryeh Feigin <101218333+arfeigin@users.noreply.github.com>
Date:   Tue Mar 14 22:13:51 2023 +0200

    Fix fast-reboot DB migration (#2734)

    Fix DB migrator logic for migrating fast-reboot table, fixing #2621 db_migrator.

    How I did it
    Checking if fast-reboot table exists in DB.

    How to verify it
    Verified manually, migrating after fast-reboot and after cold/warm reboot.

commit 16baa1a1ddac85ab1db559a27d47b566b65d78e8
Author: Stephen Sun <5379172+stephenxs@users.noreply.github.com>
Date:   Tue Mar 14 21:01:52 2023 +0800

    Enhance the logic to wait for all buffer tables to be removed in _clear_qos (#2720)

    - What I did
    This is an enhancement of PR #2503

    - How I did it
    On top of waiting for BUFFER_POOL_TABLE to be cleared from APPL_DB, we need to wait for KEY_SET and DEL_SET as well.
    KEY_SET and DEL_SET are designed to accommodate the APPL_DB entries that were updated by manager daemons but have not yet been handled by the orchagent.
    In this case, even if the buffer tables are empty, entries in KEY_SET or DEL_SET will be in the buffer tables later on. So, we need to wait for key set tables as well.
    Do not delay for traditional buffer manager because it does not remove any buffer table.
    Provide a CLI option to print the detailed message if there is any table item which still exists
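The waiting logic described above can be sketched as a polling loop over the two key patterns that this commit's `--verbose` output reports. This is a hypothetical helper, not the code from #2720: `list_keys` stands in for whatever returns APPL_DB keys matching a glob pattern, and the timeout and interval values are assumptions.

```python
import time
from typing import Callable, List


def wait_for_buffer_tables_cleared(
    list_keys: Callable[[str], List[str]],
    timeout: float = 60.0,
    interval: float = 1.0,
    verbose: bool = False,
) -> bool:
    """Poll until no BUFFER_*_TABLE:* entries and no BUFFER_*_SET keys remain.

    Returns True once everything is gone, False if `timeout` seconds elapse first.
    """
    # KEY_SET/DEL_SET entries will land in the buffer tables later, so wait for them too.
    patterns = ["BUFFER_*_TABLE:*", "BUFFER_*_SET"]
    deadline = time.monotonic() + timeout
    while True:
        leftovers = {p: list_keys(p) for p in patterns}
        if not any(leftovers.values()):
            return True
        if verbose:
            for pattern, keys in leftovers.items():
                if keys:
                    print(f"Some entries matching {pattern} still exist: {keys[0]}")
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```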

    - How to verify it
    Manually test and unit test

    - Previous command output (if the output of a command-line utility has changed)
    Running command: /usr/local/bin/sonic-cfggen  -d --write-to-db -t /usr/share/sonic/device/x86_64-mlnx_msn2410-r0/ACS-MSN2410/buffers_dynamic.json.j2,config-db -t /usr/share/sonic/device/x86_64-mlnx_msn2410-r0/ACS-MSN2410/qos.json.j2,config-db -y /etc/sonic/sonic_version.yml

    - New command output (if the output of a command-line utility has changed)
    New output appears only with the --verbose option. Without the option, the output is unchanged.

    admin@mtbc-sonic-01-2410:~$ sudo config qos  reload --verbose
    Some entries matching BUFFER_*_TABLE:* still exist: BUFFER_QUEUE_TABLE:Ethernet108:0-2
    Some entries matching BUFFER_*_SET still exist: BUFFER_PG_TABLE_KEY_SET
    Some entries matching BUFFER_*_TABLE:* still exist: BUFFER_QUEUE_TABLE:Ethernet108:0-2
    Some entries matching BUFFER_*_SET still exist: BUFFER_PG_TABLE_KEY_SET
    Some entries matching BUFFER_*_TABLE:* still exist: BUFFER_QUEUE_TABLE:Ethernet108:0-2
    Running command: /usr/local/bin/sonic-cfggen  -d --write-to-db -t /usr/share/sonic/device/x86_64-mlnx_msn2410-r0/ACS-MSN2410/buffers_dynamic.json.j2,config-db -t /usr/share/sonic/device/x86_64-mlnx_msn2410-r0/ACS-MSN2410/qos.json.j2,config-db -y /etc/sonic/sonic_version.yml

commit 81b4fcaa7f79976fdd5da07077e21902679579fb
Author: Aryeh Feigin <101218333+arfeigin@users.noreply.github.com>
Date:   Fri Mar 10 18:41:30 2023 +0200

    Remove timer from FAST_REBOOT STATE_DB entry and use finalizer (#2621)

    This should come along with sonic-buildimage PR (sonic-net/sonic-buildimage#13484) implementing fast-reboot finalizing logic in finalize-warmboot script and other submodules PRs utilizing the change.

    This PR should come along with the following PRs as well:
    sonic-net/sonic-swss-common#742
    sonic-net/sonic-platform-daemons#335
    sonic-net/sonic-sairedis#1196

    This set of PRs solves the issue sonic-net/sonic-buildimage#13251

    What I did
    Remove the timer used to clear fast-reboot entry from state-db, instead it will be cleared by fast-reboot finalize function implemented inside finalize-warmboot script (which will be invoked since fast-reboot is using warm-reboot infrastructure).

    In addition, instead of using "1" as the value of the fast-reboot entry in state-db and deleting it when done, the entry is now set to enable/disable according to the context.

    All scripts that read this entry should also be updated to handle the new values.

    How I did it
    Removed the timer usage in the fast-reboot script and added fast-reboot finalize logic to warm-reboot in the linked PR.
    Use "enable/disable" instead of "1" as the entry value.

    How to verify it
    Run fast-reboot and check that the state-db entry for fast-reboot is being deleted after finalizing fast-reboot and not by an expiring timer.

commit 9693c990191143605c74fe98c5a0f099598238fe
Author: Stepan Blyshchak <38952541+stepanblyschak@users.noreply.github.com>
Date:   Fri Mar 10 04:07:25 2023 +0200

    [route_check] fix IPv6 address handling (#2722)

    * In case a user has configured an IPv6 address on an interface in CONFIG DB in non-simplified form, like 2000:31:0:0::1/64, it is present in simplified form in ASIC_DB. This leads to a route_check failure, since route_check just compares strings.
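The fix for #2722 comes down to comparing addresses by value instead of by string. A minimal sketch using Python's standard `ipaddress` module; the helper name is hypothetical, and this is not the route_check code itself.

```python
import ipaddress


def same_interface_address(a: str, b: str) -> bool:
    """Compare two interface addresses (address/prefix) by value, not by spelling."""
    return ipaddress.ip_interface(a) == ipaddress.ip_interface(b)


print(same_interface_address("2000:31:0:0::1/64", "2000:31::1/64"))  # True
print("2000:31:0:0::1/64" == "2000:31::1/64")  # False: plain string comparison
```

`ip_interface` also normalizes on output: `str(ipaddress.ip_interface("2000:31:0:0::1/64"))` gives `2000:31::1/64`, so either side can be canonicalized before a string comparison.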

commit e65ffce059fc4164a59c17774346764232f2c10d
Author: jhli-cisco <93410383+jhli-cisco@users.noreply.github.com>
Date:   Wed Mar 8 18:03:50 2023 -0800

    update fast-reboot (#2728)

commit 4f24b1137a00f596bf520fdf159ac8c4c6bb63c6
Author: jingwenxie <jingwenxie@microsoft.com>
Date:   Thu Mar 9 09:12:19 2023 +0800

    [GCU] Add vlanintf-validator (#2697)

    What I did
    Fix the bug of GCU vlan interface modification. It should call ip neigh flush dev after removing interface ip.
    The fix follows the existing config CLI behavior.

    How I did it
    Add vlanintf service validator to check if extra step of ip neigh flush is needed.

    How to verify it
    GCU E2E test in dualtor testbed.

commit 40f4254c87f33145c121fc182601702df7fceced
Author: Liu Shilong <shilongliu@microsoft.com>
Date:   Thu Mar 9 06:57:05 2023 +0800

    Check SONiC dependencies before installation. (#2716)

    #### What I did
    SONiC-related packages shouldn't be installed from PyPI.
    This is a security compliance requirement.
    Check SONiC-related packages when using setup.py.

commit 793b14ac75042e86f9f38852b9c2eafdf981ab18
Author: bingwang-ms <66248323+bingwang-ms@users.noreply.github.com>
Date:   Wed Mar 8 13:28:59 2023 -0800

    Improve show acl commands (#2667)

    * Add status for ACL_TABLE and ACL_RULE in STATE_DB

commit 3d24b00fcf0159e77eab656f793e9267f323fcbb
Author: isabelmsft <67024108+isabelmsft@users.noreply.github.com>
Date:   Wed Mar 8 00:19:03 2023 -0800

    [GCU] Add PFC_WD RDMA validator (#2619)

commit dcccec9df35cd76045f0c623d058d0c87fcc3fe6
Author: vdahiya12 <67608553+vdahiya12@users.noreply.github.com>
Date:   Tue Mar 7 15:19:53 2023 -0800

    [show][muxcable] increase timeout for displaying HW_STATUS (#2712)

    What I did
    Probing the mux direction does not always return success.

    Sample output of: while [ 1 ]; do date; show mux hwmode muxdirection; show mux status; sleep 1; done

    Mon 27 Feb 2023 03:12:25 PM UTC
    Port         Direction    Presence
    -----------  -----------  ----------
    Ethernet16   unknown      True

    PORT         STATUS    HEALTH    HWSTATUS      LAST_SWITCHOVER_TIME
    -----------  --------  --------  ------------  ---------------------------
    Ethernet16   standby   healthy   inconsistent  2023-Feb-25 07:55:18.269177
    Increasing the timeout for getting the values back from ycabled to 0.5 secs removes the inconsistency and displays consistent values, because while telemetry is running, getting the actual mux value takes significantly longer than 0.1 seconds.

    PORT         STATUS    HEALTH    HWSTATUS      LAST_SWITCHOVER_TIME
    -----------  --------  --------  ------------  ---------------------------
    Ethernet16   standby   healthy   consistent    2023-Feb-25 07:55:18.269177
    How I did it
    How to verify it
    Manually run changes on setup
    The worst-case CLI return time could be 16 seconds for 32 ports. While telemetry is running, each port takes 200 ms on average, but on average the show command returns in under 1 second for all 32 ports.

    Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>

commit 75bb60fe4f22b2c0831e7b31e5675df0cd01ff7d
Author: isabelmsft <67024108+isabelmsft@users.noreply.github.com>
Date:   Tue Mar 7 14:42:50 2023 -0800

    YANG validation for ConfigDB Updates: MIRROR_SESSION use case (#2430)

commit cf3f0ce86b3fd4f7b7548331aab8cc3337663e5d
Author: kellyyeh <42761586+kellyyeh@users.noreply.github.com>
Date:   Tue Mar 7 10:47:13 2023 -0800

    Fix non-zero status exit on non secure boot system (#2715)

    What I did
    Warm-reboot fails on KVM due to a non-zero exit status from the command
    bootctl status 2>/dev/null | grep -c "Secure Boot: enabled"

    How I did it
    Added || true to return 0 when previous command fails.
    Added CHECK_SECURE_UPGRADE_ENABLED to check output of previous command
    Added debug logs

    How to verify it
    Run warm-reboot on KVM and on a physical device with increased verbosity. The debug log is expected to indicate secure/non-secure boot, and the warm reboot should succeed.

commit 74d6d77c3ae6cc255bf18755bd902ff7d86ace67
Author: Stepan Blyshchak <38952541+stepanblyschak@users.noreply.github.com>
Date:   Tue Mar 7 20:23:07 2023 +0200

    [route_check] implement a check for FRR routes not marked offloaded (#2531)

    * [route_check] implement a check for FRR routes not marked offloaded
    * Implemented a route_check functionality that checks the "show ip route json" output from FRR and ensures that all routes are marked as offloaded. If some routes are not offloaded for 15 sec, this is considered an issue and mitigation logic is invoked.
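A minimal sketch of the offload check, assuming the FRR `show ip route json` layout maps each prefix to a list of route entries carrying boolean `selected` and `offloaded` fields. The helper name is hypothetical, and the 15-second grace period and mitigation logic from #2531 are not modeled here.

```python
import json
from typing import List


def routes_not_offloaded(frr_json: str) -> List[str]:
    """Return prefixes whose selected FRR route entry is not marked offloaded.

    Assumed input shape: {prefix: [entry, ...]}, each entry a dict that may
    contain "selected" and "offloaded" booleans.
    """
    missing = []
    for prefix, entries in json.loads(frr_json).items():
        for entry in entries:
            if entry.get("selected") and not entry.get("offloaded"):
                missing.append(prefix)
                break  # report each prefix at most once
    return missing
```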

commit 36e98b3ddf584790a4f7e343c4fbe0895ef9bc85
Author: Vaibhav Hemant Dixit <vaibhav.dixit@microsoft.com>
Date:   Mon Mar 6 10:56:51 2023 -0800

    [warm/fast-reboot] Backup logs from tmpfs to disk during fast/warm shutdown (#2714)

    Goal: Preserve logs during TOR upgrades and shutdown

    Need:

    Below PRs moved logs from disk to tmpfs for specific hwskus.
    Due to these changes, shutdown path logs are now lost.
    The logs in shutdown path are crucial for debug purposes.

    sonic-net/sonic-buildimage#13805
    sonic-net/sonic-buildimage#13587
    sonic-net/sonic-buildimage#13587

    How I did it
    Check if logs are on tmpfs. If yes, back up logs from /var/log.
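The tmpfs check can be sketched by parsing `/proc/mounts`, whose third column is the filesystem type. The helpers are hypothetical and assume `/var/log` is itself a mount point; the actual script in #2714 may detect tmpfs differently.

```python
from typing import Optional


def mount_fstype(mounts_text: str, mount_point: str) -> Optional[str]:
    """Return the filesystem type mounted at `mount_point`, parsed from /proc/mounts text."""
    fstype = None
    for line in mounts_text.splitlines():
        # /proc/mounts columns: device, mount point, fstype, options, dump, pass
        fields = line.split()
        if len(fields) >= 3 and fields[1] == mount_point:
            fstype = fields[2]  # a later mount at the same point shadows earlier ones
    return fstype


def logs_on_tmpfs(mounts_text: str, log_dir: str = "/var/log") -> bool:
    return mount_fstype(mounts_text, log_dir) == "tmpfs"
```

On a device this would be called as `with open("/proc/mounts") as f: print(logs_on_tmpfs(f.read()))`.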

    How to verify it
    Verified on a physical device: logs on tmpfs are backed up for the past 30 minutes.

commit a1c3bd55eea983aae197282e10ac8099492a6194
Author: Vaibhav Hemant Dixit <vaibhav.dixit@microsoft.com>
Date:   Fri Mar 3 12:45:40 2023 -0800

    [db_migrator] Add missing attribute 'weight' to route entries in APPL DB (#2691)

    Fixes: 201911 to 202205 warm upgrade failure in fpmsyncd reconciliation due to missing weight attr in routes. (sonic-net/sonic-buildimage#12625)

    How I did it
    Check for the missing attribute weight in APPL_DB route entries. If it is missing, the attribute is added with an empty value.
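A minimal sketch of this migration step, using a plain dict as a stand-in for APPL_DB (keys like `ROUTE_TABLE:<prefix>` mapped to field/value hashes). The function name and in-memory model are assumptions; the real db_migrator works against the Redis database.

```python
from typing import Dict


def add_missing_route_weight(appl_db: Dict[str, Dict[str, str]]) -> int:
    """Add an empty 'weight' field to ROUTE_TABLE entries that lack it.

    Returns the number of route entries that were fixed.
    """
    fixed = 0
    for key, fields in appl_db.items():
        if key.startswith("ROUTE_TABLE:") and "weight" not in fields:
            fields["weight"] = ""  # empty value, as described in the commit message
            fixed += 1
    return fixed
```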

    How to verify it
    Verified on physical device. 201911 to 202205 upgrade worked fine.

commit 696da1878f2e275d8cf2fbb17881d63ca01df32a
Author: Liu Shilong <shilongliu@microsoft.com>
Date:   Thu Mar 2 15:36:57 2023 +0800

    [ci] Fix pipeline issue caused by sonic-slave-* change. (#2709)

    What I did
    These 3 packages may be purged by default. Do not block the pipeline.
    Download deb/whl packages only to accelerate download process.
    How I did it
    How to verify it

commit bf24267fddc95e8d83ef5908e0eab30ddd6c3ac1
Author: Yaqiang Zhu <yaqiangzhu@microsoft.com>
Date:   Wed Mar 1 10:05:04 2023 +0800

    [dhcp_relay] Fix dhcp_relay restart error while add/del vlan (#2688)

    Why I did
    On devices that don't have the dhcp_relay service, restarting dhcp_relay after adding/deleting a VLAN fails.

    How I did it
    Add a check for whether the device supports the dhcp_relay service.

    How to verify it
    1. Unit test
    2. Build and install in device

    Signed-off-by: Yaqiang Zhu <yaqiangzhu@microsoft.com>

commit 484f5943931eef5ac1bd22467eca648aacbeabd3
Author: isabelmsft <67024108+isabelmsft@users.noreply.github.com>
Date:   Mon Feb 27 23:49:01 2023 -0800

    [GCU] Add Sample Unit Test for RDMA Headroom Pool Size Tuning (#2692)

    * add rdma gcu unit test

    * fix comment

    * clean unused code

    * clean format

    * extend to mock patchapplier, in place of changeapplier

    * replace tabs with spaces

commit fa291e1078be3676130c99bcec840c88c221bf8e
Author: Junchao-Mellanox <57339448+Junchao-Mellanox@users.noreply.github.com>
Date:   Mon Feb 27 17:49:34 2023 +0800

    Add begin logs to config reload/config minigraph/warm-reboot/fast-reboot (#2694)

    - What I did
    Add more logs for config reload/config minigraph/warm-reboot/fast-reboot to identify in the log (notice level) which executed command could have caused a service impact.

    - How I did it
    Add more logs for config reload/config minigraph/warm-reboot/fast-reboot.

    - How to verify it
    Manual test

commit d58c4fbcbb5dd3b1be004926bf0584c2594049d7
Author: StormLiangMS <89824293+StormLiangMS@users.noreply.github.com>
Date:   Mon Feb 27 11:14:54 2023 +0800

    Revert "Secure upgrade (#2337)" (#2675)

    This reverts commit 6fe8599216afb1c302e77c52235c4849be6042b2.

commit 15a59c93093e779479a47e79f8bd4d5772d1fbdd
Author: vdahiya12 <67608553+vdahiya12@users.noreply.github.com>
Date:   Fri Feb 24 12:46:36 2023 -0800

    [show][muxcable] add some new commands health, reset-cause, queue_info support for muxcable (#2414)

    This PR adds support for some utility commands for muxcable.
    This includes commands for health, operationtime, queueinfo, resetcause

    vdahiya@sonic:~$ show mux health Ethernet4
    PORT          ATTR               HEALTH
    ---------     ---------------   --------
    Ethernet4     health_check       Ok
    vdahiya@sonic:~$ show mux health Ethernet4 --json
    {
        "health_check": "Ok"
    }

    vdahiya@sonic:~$ show mux operation Ethernet4 --json
    {
        "operation_time": "22:22"
    }
    vdahiya@sonic:~$ show mux operation Ethernet4
    PORT       ATTR              OPERATION_TIME
    ---------  --------------  ----------------
    Ethernet4  operation_time                 22:22
    vdahiya@sonic:~$

    vdahiya@sonic:~$ show mux resetcause Ethernet4
    PORT       ATTR           RESETCAUSE
    ---------  -----------  ------------
    Ethernet4  reset_cause             0

    vdahiya@sonic:~$ show mux resetcause Ethernet4 --json
    {
        "reset_cause": "0"
    }

    vdahiya@sonic:~$ show mux queueinfo Ethernet4 --json
    {
        "Remote": "{'VSC': {'r_ptr': 0, 'w_ptr': 0, 'total_count': 0, 'free_count': 0, 'buff_addr': 0, 'node_size': 0}, 'UART1': {'r_ptr': 0, 'w_ptr': 0, 'total_count': 0, 'free_count': 0, 'buff_addr': 209870, 'node_size': 1682183}, 'UART2': {'r_ptr': 13262, 'w_ptr': 3, 'total_count': 0, 'free_count': 0, 'buff_addr': 12, 'node_size': 0}}",
        "Local": "{'VSC': {'r_ptr': 0, 'w_ptr': 0, 'total_count': 0, 'free_count': 0, 'buff_addr': 0, 'node_size': 0}, 'UART1': {'r_ptr': 0, 'w_ptr': 0, 'total_count': 0, 'free_count': 0, 'buff_addr': 209870, 'node_size': 1682183}, 'UART2': {'r_ptr': 13262, 'w_ptr': 3, 'total_count': 0, 'free_count': 0, 'buff_addr': 12, 'node_size': 0}}"
    }

commit 07675feb09544f095e9a867634a16d1dee825a69
Author: Mai Bui <maibui@microsoft.com>
Date:   Fri Feb 24 12:26:32 2023 -0500

    Replace pickle by json (#2636)

    Signed-off-by: maipbui <maibui@microsoft.com>
    #### What I did
    `pickle` can lead to code execution vulnerabilities. It is recommended to serialize the relevant data as JSON instead.
    #### How I did it
    Replace `pickle` by `json`
    #### How to verify it
    Pass UT
    Manual test
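A minimal sketch of the swap, assuming a hypothetical on-disk cache file: `json.load` only rebuilds plain data (dicts, lists, strings, numbers, booleans, `None`), so loading a tampered file cannot execute code the way unpickling can. The function names and file layout are illustrative, not the ones touched by #2636.

```python
import json
from typing import Any


def save_cache(path: str, data: Any) -> None:
    """Serialize with JSON instead of pickle."""
    with open(path, "w") as f:
        json.dump(data, f)


def load_cache(path: str) -> Any:
    """Load plain data back; json.load never executes code from the file."""
    with open(path) as f:
        return json.load(f)
```

The trade-off is fidelity: tuples come back as lists and non-string dict keys become strings, so data that relied on pickle's type preservation may need converting.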

commit 56a9d69bc79eda9d67953ed21fd42221b58ee04d
Author: Yaqiang Zhu <yaqiangzhu@microsoft.com>
Date:   Thu Feb 16 02:31:01 2023 +0800

    [dhcp_relay] Remove add field of vlanid to DHCP_RELAY table while add vlan (#2678)

    What I did
    Stop adding the vlanid field to the DHCP_RELAY table when adding a VLAN, because it conflicts with the YANG model.

    How I did it
    Removed the code that adds the vlanid field to the DHCP_RELAY table when adding a VLAN.

    How to verify it
    By unit tests

    Signed-off-by: Yaqiang Zhu <yaqiangzhu@microsoft.com>

commit 8f7f8bd1810328fc0faa85b23f2033aa3fc61191
Author: davidpil2002 <91657985+davidpil2002@users.noreply.github.com>
Date:   Tue Feb 14 11:38:53 2023 +0200

    Add support of secure warm-boot (#2532)

    - What I did
    Add support of secure warm-boot to SONiC.
    Basically, warm-boot supports loading a new kernel without doing a full/cold boot.
    It does this by loading the new kernel and executing it with the Linux kexec command. As a result, even when the Secure Boot feature is enabled, a user (or a malicious user) could still load an unsigned kernel; to prevent that, we added support for secure warm boot.
    More Description about this feature can be found in the Secure Boot HLD: sonic-net/SONiC#1028

    - How I did it
    Linux already supports this in general, so I enabled it with the following steps:

    I added some special flags to the Linux kernel when the user builds sonic-buildimage with the Secure Boot feature enabled.
    I added a flag "-s" to the kexec command
    Note: more details in the HLD above.

    - How to verify it
    * Good flow:
    Manually install a new secure image with sonic-installer (a SONiC image that was built with the Secure Boot flag enabled).
    after the secure image is installed, do:
    warm-reboot
    Check now that the new kernel is really loaded and switched.
    * Bad flow:
    Do the same steps 1-2 as in the good flow, but with an insecure image (a SONiC image that was built without Secure Boot enabled).
    After the insecure image is installed, and triggered warm-boot you should get an error that the new unsigned kernel from the unsecured image was not loaded.
    Automation test - TBD

commit a05ce562e37463a7ff8d8c012aca347c8bb45e03
Author: Yaqiang Zhu <yaqiangzhu@microsoft.com>
Date:   Tue Feb 14 09:18:37 2023 +0800

    [doc] Add docs for dhcp_relay show/clear cli (#2649)

    What I did
    Add docs for dhcp_relay show/clear cli

    How I did it
    Add docs for dhcp_relay show/clear cli

    Signed-off-by: Yaqiang Zhu <yaqiangzhu@microsoft.com>

commit 3228979b2aa0de90444f385a8f6f1c8c66fd0e09
Author: wenyiz2021 <91497961+wenyiz2021@users.noreply.github.com>
Date:   Mon Feb 13 11:04:58 2023 -0800

    [portstat CLI] don't print reminder if use json format (#2670)

    * no print if use json format
    * add print for chassis

commit b741628f5f30283b40b75b784e1daf57671ae6d8
Author: Vadym Hlushko <62022266+vadymhlushko-mlnx@users.noreply.github.com>
Date:   Mon Feb 13 13:03:12 2023 +0200

    [generate_dump] Revert "Revert generate_dump optimization PR's #2599", add fixes for empty /dump folder and symbolic links (#2645)

    - What I did
    0ee19e5 Revert Revert the show-techsupport optimization PR's #2599
    c8940ad Add a fix for the empty /dump folder inside the final tar archive generated by the show techsupport CLI command.
    8a8668c Add a fix to not follow the symbolic links to avoid duplicate files inside the final tar archive generated by the show techsupport CLI command.

    - How I did it
    Modify the scripts/generate_dump script.

    - How to verify it
    1. Manual verification
    do the show techsupport CLI command and save output original.tar.gz (with original generate_dump script)
    do the show techsupport CLI command and save output fixes.tar.gz (with the generate_dump script modified by this PR)
    unpack both archives original.tar.gz and fixes.tar.gz
    compare both directories with ncdu & diff --brief --recursive original fixes Linux utilities
    2. Run the community tests
    sonic-mgmt/tests/show_techsupport

    Signed-off-by: vadymhlushko-mlnx <vadymh@nvidia.com>

commit 96d5c2d5fcc1967b0f5f517ccc490e3b95be3585
Author: Yaqiang Zhu <yaqiangzhu@microsoft.com>
Date:   Fri Feb 10 17:49:38 2023 +0800

    [vlan] Refresh dhcpv6_relay config while adding/deleting a vlan (#2660)

    What I did
    Currently, add/del a vlan doesn't change related dhcpv6_relay config, which is incorrect.

    How I did it
    1. Add dhcp_relay table init entry while adding vlan
    2. Delete dhcp_relay related config while deleting vlan
    3. Add unit test

    How to verify it
    1. By unit test
    2. install whl and run cli

    Signed-off-by: Yaqiang Zhu <yaqiangzhu@microsoft.com>

commit a090523a9ef07eaab176893b7eaa660930fa5dbf
Author: jingwenxie <jingwenxie@microsoft.com>
Date:   Fri Feb 10 09:13:51 2023 +0800

    [GCU] protect loopback0 from deletion (#2638)

    What I did
    Refer to sonic-net/sonic-buildimage#11171, protect loopback0 from deletion

    How I did it
    Add a patch checker that fails validation when a patch removes loopback0
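GCU patches are JSON Patch (RFC 6902) documents, so a checker like the one described can scan the operations for a `remove` that targets Loopback0. A simplified, hypothetical sketch: it assumes ConfigDB JSON paths of the form `/LOOPBACK_INTERFACE/<key>` and only checks the interface key, its parent table, and the document root (`""`); the real checker in #2638 may cover more cases.

```python
from typing import Dict, List

# Hypothetical protected paths in JSON Pointer syntax; "" is the document root.
PROTECTED_PATHS = ("", "/LOOPBACK_INTERFACE", "/LOOPBACK_INTERFACE/Loopback0")


def removes_loopback0(patch: List[Dict[str, object]]) -> bool:
    """Return True if any `remove` operation targets Loopback0 or one of its parents."""
    return any(
        op.get("op") == "remove" and op.get("path") in PROTECTED_PATHS
        for op in patch
    )
```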

    How to verify it
    Unit test

commit 18a3d00ad160fd7d890c3f8061cc84b96374f7a3
Author: Stepan Blyshchak <38952541+stepanblyschak@users.noreply.github.com>
Date:   Thu Feb 9 05:20:11 2023 +0200

    [config/show] Add command to control pending FIB suppression (#2495)

    * [config/show] Add command to control pending FIB suppression

    What I did
    I added a command config suppress-pending-fib that will allow user to enable/disable this feature.
    Once it is enabled, BGP will wait for route to be programmed to HW before announcing the route to the peers.

    I also added a corresponding show command that prints the status of this feature.

commit 5244e3b5cbc5d6708f56401219a4257d47b4b0f7
Author: mihirpat1 <112018033+mihirpat1@users.noreply.github.com>
Date:   Wed Feb 8 16:39:00 2023 -0800

    Add transceiver info CLI support to show output from TRANSCEIVER_INFO for ZR (#2630)

    * Add transceiver info CLI support to show output from TRANSCEIVER_INFO for ZR

    Signed-off-by: Mihir Patel <patelmi@microsoft.com>

    * Added test case for info CLI

    * Updated command reference

    * Resolved merged conflicts

    * Made convert_sfp_info_to_output_string generic for CMIS and non CMIS and added test case to address PR comment

    * Resolved test_multi_asic_interface_status_all failure

    * Addressed PR comments

    ---------

    Signed-off-by: Mihir Patel <patelmi@microsoft.com>

commit 05aedd558dbe901b873e2e2c8e11afc15a67db85
Author: vdahiya12 <67608553+vdahiya12@users.noreply.github.com>
Date:   Tue Feb 7 12:30:18 2023 -0800

    [show] add support for gRPC show commands for `active-active` (#2629)

    Signed-off-by: vaibhav-dahiya vdahiya@microsoft.com
    This PR adds support for show mux hwmode muxdirection as well as
    show mux grpc muxdirection to show the state of the gRPC connections to the SoCs for the 'active-active' cable type

    vdahiya@sonic:~$ show mux grpc muxdirection
    Port       Direction    Presence    PeerDirection    ConnectivityState
    ---------  -----------  ----------  ---------------  -------------------
    Ethernet0  active       False       active           READY
    vdahiya@sonic:~$
    vdahiya@sonic:~$ show mux grpc muxdirection --json
    {
        "HWMODE": {
            "Ethernet0": {
                "Direction": "active",
                "Presence": "False",
                "PeerDirection": "active",
                "ConnectivityState": "READY"
            }
        }
    }

    What I did
    Added support for the commands.

    How I did it
    How to verify it
    UT and running the changes on Testbed

commit 9512ccd2d2863d7bcb5e7f42cf60b0be39c61c70
Author: Sudharsan Dhamal Gopalarathnam <dgsudharsan@users.noreply.github.com>
Date:   Tue Feb 7 12:14:49 2023 -0800

    [sai_failure_dump]Invoking dump during SAI failure (#2633)

    * Added logic in techsupport script to collect SAI failure dump

commit 4971b7b71067e86c7f86591efc86993aa0c0ce1d
Author: Stepan Blyshchak <38952541+stepanblyschak@users.noreply.github.com>
Date:   Tue Feb 7 18:07:52 2023 +0200

    [db_migrator] make LOG_LEVEL_DB migration more robust (#2651)

    It could be that LOG_LEVEL_DB includes some invalid data and/or a KEY_SET that was not cleaned up due to an issue; for example, we observed the _gearsyncd_KEY_SET set included in LOG_LEVEL_DB and preserved across warm reboot. However, this key is not of type hash, which leads to an exception and migration failure. The migration logic should be more robust, allowing users to upgrade even when some daemon left stale entries in LOG_LEVEL_DB or wrote invalid data.

    - What I did
    Fix a migration issue that leads to the device configuration being lost.

    - How I did it
    Wrap the logic in try/except/finally.
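One way to make such a migration tolerant of leftovers, sketched with hypothetical callables in place of the database accessors: check each key's type and skip anything that is not a hash or that fails to read, instead of aborting the whole migration. This per-key variant is an illustration, not the exact try/except/finally structure of #2651.

```python
from typing import Callable, Dict, Iterable, List


def migrate_log_levels(
    keys: Iterable[str],
    key_type: Callable[[str], str],
    get_all: Callable[[str], Dict[str, str]],
    write: Callable[[str, Dict[str, str]], None],
) -> List[str]:
    """Copy each hash-typed key via `write`; return the keys that were skipped."""
    skipped: List[str] = []
    for key in keys:
        try:
            if key_type(key) != "hash":
                # e.g. a leftover *_KEY_SET set: not a logger entry, so skip it
                skipped.append(key)
                continue
            write(key, get_all(key))
        except Exception:
            # Unreadable or malformed entry: skip it instead of aborting the migration
            skipped.append(key)
    return skipped
```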

    - How to verify it
    202205 -> 202211/master upgrade.

    Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>

commit d80ec9722880d7b8a6786a27696bff97ae30b903
Author: siqbal1986 <shahzad.iqbal@microsoft.com>
Date:   Mon Feb 6 12:00:09 2023 -0800

    Fixed a bug in "show vnet routes all" causing screen overrun. (#2644)

    Signed-off-by: siqbal1486 <shahzad.iqbal@microsoft.com>

commit 6b567168bc971cac112681596d828c919d252bc8
Author: mihirpat1 <112018033+mihirpat1@users.noreply.github.com>
Date:   Wed Feb 1 13:48:57 2023 -0800

    show logging CLI support for logs stored in tmpfs (#2641)

    * show logging CLI support for logs stored in tmpfs

    Signed-off-by: Mihir Patel <patelmi@microsoft.com>

    * Fixed testcase failures

    * Reverted unwanted change in a file

    * Added testcase for syslog.1 in log.tmpfs directory

    * mend

    ---------

    Signed-off-by: Mihir Patel <patelmi@microsoft.com>

commit 38e5caadb7caebedb9237a9cd87c927bd6637fe5
Author: jfeng-arista <98421150+jfeng-arista@users.noreply.github.com>
Date:   Wed Feb 1 11:29:49 2023 -0800

    [chassis][voq] Add asic id for linecards so "show fabric counters queue/port" can work. (#2499)

    * Add asic id for linecards so "show fabric counters queue/port" can work.
    * Add test coverage

    ---------

    Signed-off-by: Jie Feng <jfeng@arista.com>

commit 78e5f179772fc951732a33191865efabea77c965
Author: longhuan-cisco <84595962+longhuan-cisco@users.noreply.github.com>
Date:   Wed Feb 1 11:12:41 2023 -0800

    Add Transceiver PM basic CLI support to show output from TRANSCEIVER_PM table for ZR (#2615)

    * Transceiver PM basic CLI support to show output from TRANSCEIVER_PM table

    * Fix alert typo

    * Fix display format and add cd short link

    * Add doc for pm

    * Update Command-Reference.md

commit 8a7609930cae97934719609b42d61ad153c3350d
Author: wenyiz2021 <91497961+wenyiz2021@users.noreply.github.com>
Date:   Wed Feb 1 09:33:14 2023 -0800

    [masic support] 'show run bgp' support for multi-asic (#2427)

    Support 'show run bgp' for multi-asics
    Add mock tables and UTs for single-asic, multi-asic, bgp not running cases

commit 370fe81229f3fbea29d5bf5b9ee2347824056d80
Author: kartik-arista <61531803+kartik-arista@users.noreply.github.com>
Date:   Tue Jan 31 10:19:26 2023 -0800

    Making 'show feature autorestart' more resilient to missing auto_restart config in CONFIG_DB (#2592)

    Fixes BUG 762723

commit e6d880a0249f1f2e0b9d4ef2412e84e9a31b45a2
Author: Yaqiang Zhu <yaqiangzhu@microsoft.com>
Date:   Mon Jan 30 21:07:12 2023 -0800

    [doc] Update docs for dhcp_relay config cli (#2598)

    What I did
    Updated docs about dhcp_relay config cli

    How I did it
    Updated docs about dhcp_relay config cli

    Signed-off-by: Yaqiang Zhu <yaqiangzhu@microsoft.com>

commit 9865dda9b7075bc9c788cba893cba329a0548e24
Author: abdosi <58047199+abdosi@users.noreply.github.com>
Date:   Mon Jan 30 17:52:50 2023 -0800

    Skip saidump for Spine Router as this can take more than 5 sec (#2637)

    To address sonic-net/sonic-buildimage#13561, skip saidump on T2 platforms for the time being.

commit 56d41f2581157c31a09da365515ac9df9ebb540b
Author: ycoheNvidia <99744138+ycoheNvidia@users.noreply.github.com>
Date:   Mon Jan 30 23:28:15 2023 +0200

    Secure upgrade (#2337)

    #### What I did
    Added support for secure upgrade

    #### How I did it
    It includes image signing during build (in sonic buildimage repo) and verification during image install (in sonic-utilities).
    HLD can be found in the following PR: https://github.com/sonic-net/SONiC/pull/1024
    #### How to verify it
    The feature ensures that the image has not been modified since it was built by the vendor. During installation, the image can be verified using a signature attached to it.
    For image verification, the image must be signed. The signing key and certificate must be provided during the build (paths in SECURE_UPGRADE_DEV_SIGNING_KEY and SECURE_UPGRADE_DEV_SIGNING_CERT in rules/config). During image install, the secure boot flag must be enabled in the BIOS, and signing_certificate must be available in the BIOS.

    #### Feature dependencies
    For this feature to work smoothly, the secure boot feature must be implemented as well.
    The Secure boot feature will be merged in the near future.

    sonic-buildimage PR: https://github.com/sonic-net/sonic-buildimage/pull/11862

commit 0744b19b7321aa33269ee7a76937f21e44c2750c
Author: Junchao-Mellanox <57339448+Junchao-Mellanox@users.noreply.github.com>
Date:   Tue Jan 31 02:15:01 2023 +0800

    [system-health] Fix issue: show system-health CLI crashes (#2635)

    - What I did
    Fix issue: show system-health CLI crashes

    root@switch:/home/admin# show system-health summary
    Traceback (most recent call last):
      File "/usr/local/bin/show", line 8, in <module>
        sys.exit(cli())
      File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 764, in __call__
        return self.main(*args, **kwargs)
      File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 717, in main
        rv = self.invoke(ctx)
      File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 1137, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
      File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 1137, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
      File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 956, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 555, in invoke
        return callback(*args, **kwargs)
      File "/usr/local/lib/python3.9/dist-packages/show/system_health.py", line 113, in summary
        _, chassis, stat = get_system_health_status()
      File "/usr/local/lib/python3.9/dist-packages/show/system_health.py", line 10, in get_system_health_status
        if os.environ["UTILITIES_UNIT_TESTING"] == "1":
      File "/usr/lib/python3.9/os.py", line 679, in __getitem__
        raise KeyError(key) from None
    KeyError: 'UTILITIES_UNIT_TESTING'

    - How I did it
    Use dict.get instead of [] operator.
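    A small illustration of the fix follows. The helper name `is_unit_testing` is hypothetical. `os.environ[...]` raises `KeyError` when the variable is unset, but `os.environ.get(...)` returns `None` instead.

```python
import os


def is_unit_testing():
    # .get() returns None for an unset variable instead of raising KeyError.
    return os.environ.get("UTILITIES_UNIT_TESTING") == "1"
```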

    - How to verify it
    Manual test

commit c3c4905bb1dac2fd201f4647b730b29424e20013
Author: anamehra <54692434+anamehra@users.noreply.github.com>
Date:   Mon Jan 30 10:05:10 2023 -0800

    Fixed admin state config CLI for Backport interfaces (#2557)

    Fixed admin state config CLI for Backport interfaces
    Fixes sonic-net/sonic-buildimage#13057

commit 3d53f9930084c87bec12c498d8af625ae04a2a05
Author: zhixzhu <44230426+zhixzhu@users.noreply.github.com>
Date:   Tue Jan 31 02:02:33 2023 +0800

    support multi asic for show queue counter (#2439)

    Added option -n for both "show queue counter" and "queuestat", using multi_asic module in queuestat to query database of specified namespace.
    Removed function get_queue_port() to reduce the number of database connections.

commit 5556fafc85edaa1d16276e02e7b34959033ffb29
Author: Baorong Liu <96146196+baorliu@users.noreply.github.com>
Date:   Fri Jan 27 11:19:23 2023 -0800

    [show_bfd] add local discriminator in show bfd command (#2625)

commit 17609919fd461521090113d1de6a77d5062905c9
Author: jingwenxie <jingwenxie@microsoft.com>
Date:   Fri Jan 27 15:48:15 2023 +0800

    [GCU] Ignore bgpraw table in GCU operation (#2628)

    What I did
    After the previous fix #2623, GCU still fails in the rollback operation. The bgpraw table should be discarded in all GCU operations.
    Thus, I change get_config_db_as_json function to crop out "bgpraw" table.

    How I did it
    Pop "bgpraw" table if exists.
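    A sketch of cropping the table is shown below. It assumes the CONFIG_DB dump is a plain dict. The function name is hypothetical, and the real get_config_db_as_json may differ.

```python
import copy


def crop_bgpraw(config_db_as_json):
    """Return a copy of the CONFIG_DB dump without the "bgpraw" entry."""
    config = copy.deepcopy(config_db_as_json)
    # pop() with a default is a no-op when the key is absent.
    config.pop("bgpraw", None)
    return config
```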

    How to verify it
    Unittest

commit 3db8c009a87e246f6f2e16e5e9f22aca264d4c51
Author: Dante (Kuo-Jung) Su <dante.su@broadcom.com>
Date:   Thu Jan 26 01:30:55 2023 +0800

    Add interface link-training command into the CLI doc (#2257)

    * LT Admin/Oper: Use 'N/A' when the data is unavailable

    Signed-off-by: Dante Su <dante.su@broadcom.com>

    * fix test failure

    Signed-off-by: Dante Su <dante.su@broadcom.com>

    * fix coverage failure

    Signed-off-by: Dante Su <dante.su@broadcom.com>

    * [doc]: Update Command-Reference.md (#2257)

    Add interface link-training command into the CLI doc
    Use 'N/A' if link-training attribute is not supported in the SAI.

    Signed-off-by: Dante Su <dante.su@broadcom.com>

commit 28b255afedb04a9214ca8a7bf10c38c5c64d4c48
Author: jingwenxie <jingwenxie@microsoft.com>
Date:   Wed Jan 25 08:51:16 2023 +0800

    [GCU] Ignore bgpraw in GCU applier (#2623)

    What I did
    The show run all output includes bgpraw for business needs. The GCU IPv6 test updates the BGP_NEIGHBOR table, which changes the bgpraw content and makes the apply-patch operation fail. The solution is to add bgpraw to the ignored tables.

    How I did it
    Add the newly added bgpraw table to the ignored backend tables.

    How to verify it
    Existing Unit test and local E2E GCU test.

commit ff5167a1c4f2289b1c7b5cf23c802fa3ccde673a
Author: Jing Zhang <zhangjing@microsoft.com>
Date:   Mon Jan 23 15:49:32 2023 -0800

    [muxcable][config] Add support to enable/disable ceasing to be an advertisement interface when `radv` service is stopped (#2622)

    This PR is to add CLI support to enable or disable the feature to send out a good-bye packet when radv service is stopped on active-active dualtor devices.

    sign-off: Jing Zhang zhangjing@microsoft.com

commit ed1d3c99b60bf8547342a1f98f349eac264fe887
Author: jfeng-arista <98421150+jfeng-arista@users.noreply.github.com>
Date:   Mon Jan 23 13:23:31 2023 -0800

    [chassis][voq] Add "show fabric reachability" command. (#2528)

    What I did
    Added "show fabric reachability" command.

    The output of this command :

      Local Link    Remote Module    Remote Link    Status
    ------------  ---------------  -------------  --------
               0              304            171        up
               1              304            156        up
               2              304            147        up

    Added test for the change at tests/fabricstat_test.py. The test is at sonic-net/sonic-mgmt#6620

commit 049bacf95babe50d32d90c68cf7b4825f5a64b46
Author: Vadym Hlushko <62022266+vadymhlushko-mlnx@users.noreply.github.com>
Date:   Mon Jan 23 17:39:58 2023 +0200

    Revert (#2599)

    b34a540c [generate_dump] Fix for deletion flow for all secret files from show-techsupport dump (#2571)
    258ffa09 [generate_dump] Optimize the execution time of 'show techsupport' CLI by parallel function execution (#2512)
    572c8cff Optimize the execution time of the 'show techsupport' script to 5-10%, (#2504)

    This reverts commits
    b34a540cca5555ab3aa74e19e81f24c2a20d311b
    258ffa0928ce2c74ebdc180e13c6476dc2534983
    572c8cffdddb7683e158d36067398600a71512ea

commit fafb0dfef95607b5b7dc2da0307ebb2bcd4508bf
Author: Saikrishna Arcot <sarcot@microsoft.com>
Date:   Thu Jan 19 14:42:14 2023 -0800

    [warm-reboot] Use kexec_file_load instead of kexec_load when available (#2608)

    On some dev VMs, warm reboot on a VS image fails. Specifically, after
    kexec is called and the new kernel starts, the new kernel tries to load
    the initramfs, but fails to do so for whatever reason. There may be
    messages about gzip decompression failing and that it's corrupted.

    After some experimentation, it was found that when first loading the new
    kernel and initramfs into memory, using the `kexec_file_load` syscall
    (`-s` flag in kexec) worked fine, whereas using the default `kexec_load`
    syscall resulted in a failure. It's unknown why `kexec_file_load` worked
    fine when `kexec_load` didn't; there shouldn't be any difference for
    non-secure boot kernels, as far as I can tell. What was seen, however,
    was that when taking a KVM dump in the failure case, the memory that
    stored the initramfs had differences compared to what was on disk. It's
    unknown what caused these differences.

    As a workaround (and as a bit of a feature enhancement), use the `-a`
    flag with kexec, which tells it to use `kexec_file_load` if available,
    and `kexec_load` if it's not available or otherwise fails. armhf doesn't
    support `kexec_file_load`, whereas arm64 gained support for
    `kexec_file_load` in the 5.19 kernel (we're currently on 5.10). `amd64`
    has supported `kexec_file_load` since 3.17. This also makes it possible
    to do kexec on secure boot systems, where the kernel image must be
    loaded via `kexec_file_load`.

    Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>

commit 954d9e9f7b1678cc794af34ef1ef782bec8e2ee4
Author: pettershao-ragilenetworks <81281940+pettershao-ragilenetworks@users.noreply.github.com>
Date:   Fri Jan 20 06:17:18 2023 +0800

    fix show techsupport error (#2597)

    *Modify the order of the "--allow-process-stop" option; it belongs to 'generate_dump'.

commit 3c8a9309e5a409dd008b84159ea3924209dbf0bf
Author: isabelmsft <67024108+isabelmsft@users.noreply.github.com>
Date:   Thu Jan 19 14:01:17 2023 -0600

    [GCU] Prohibit removal of PFC_WD POLL_INTERVAL field (#2545)

commit bde706b846e0c47e748ed3491177b3d5ad054175
Author: Stepan Blyshchak <38952541+stepanblyschak@users.noreply.github.com>
Date:   Thu Jan 19 17:33:38 2023 +0200

    [techsupport] include APPL_STATE_DB dump (#2607)

    - What I did
    I added APPL_STATE_DB to techsupport dump

    - How I did it
    Added a call to save APPL_STATE_DB

    - How to verify it
    Run techsupport and verify dump/APPL_STATE_DB.json

    Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>

commit cb3d462db82894eb38f1f3f6edd7f39f5a09a060
Author: isabelmsft <67024108+isabelmsft@users.noreply.github.com>
Date:   Tue Jan 17 15:31:46 2023 -0600

    YANG Validation for ConfigDB Updates: RADIUS_SERVER (#2604)

    #### What I did
    Add YANG validation using GCU for writes to RADIUS_SERVER table in ConfigDB
    #### How I did it
    Using same method as https://github.com/sonic-net/sonic-utilities/pull/2190/files, extend to RADIUS table
    #### How to verify it
    verified testing on virtual switch CLI, unit tests

commit b01737974e227040e5c3f0e1c48a4b4e8839c4e3
Author: Lior Avramov <73036155+liorghub@users.noreply.github.com>
Date:   Tue Jan 17 18:37:54 2023 +0200

    Remove TODO comment which is no longer relevant (#2600)

commit 521ecfd54317291014c584ecf7c11997381ab7c8
Author: jingwenxie <jingwenxie@microsoft.com>
Date:   Sat Jan 14 09:34:36 2023 +0800

    [show] Add bgpraw to show run all (#2537)

    #### What I did
    Add bgpraw output to `show runningconfiguration all`
    ```
    Requirements:
    1. current `show runningconfig` will print all the ConfigDB in a json format, we need to add a new key-value into the json output "bgpraw" with a long string value
    2. The long string value should be the output of `vtysh -c "show run"`. It is normally a multiline string and may include special characters like \". Need to make sure it is escaped properly
    3. We do not need to insert the key-value into ConfigDB if it does not exist there
    4. If ConfigDB already has the key-value, we do not need to override it with the vtysh command output
    5. Do not break multi-asic use
    ```
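    Requirements 2 to 4 above can be sketched in Python as follows. The sketch assumes the running config is a dict and the vtysh output is already available as a string. `add_bgpraw` is a hypothetical name and the BGP text is made up. `json.dumps` handles the escaping of newlines and quotes.

```python
import json


def add_bgpraw(config, vtysh_output):
    """Return a copy of the running config with "bgpraw" added if missing.

    The input dict (the CONFIG_DB view) is not modified, and an existing
    "bgpraw" value is kept rather than overridden.
    """
    result = dict(config)
    result.setdefault("bgpraw", vtysh_output)
    return result


raw = 'Building configuration...\n router bgp 65100\n  description "T0"\nend\n'
# json.dumps escapes the newlines and quotes in the multiline string.
print(json.dumps(add_bgpraw({"PORT": {}}, raw), indent=4))
```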
    #### How I did it
    Generate the bgpraw output, then append it to the output of `show runningconfiguration all`
    #### How to verify it
    Manual test
    #### Previous command output (if the output of a command-line utility has changed)
    ```
    admin@vlab-01:~$ show run all
    {
        "ACL_TABLE": {
    ......
        "WRED_PROFILE": {
            "AZURE_LOSSLESS": {
                "ecn": "ecn_all",
                "green_drop_probability": "5",
                "green_max_threshold": "2097152",
                "green_min_threshold": "1048576",
                "red_drop_probability": "5",
                "red_max_threshold": "2097152",
                "red_min_threshold": "1048576",
                "wred_green_enable": "true",
                "wred_red_enable": "true",
                "wred_yellow_enable": "true",
                "yellow_drop_probability": "5",
                "yellow_max_threshold": "2097152",
                "yellow_min_threshold": "1048576"
            }
        }
    }
    ```
    #### New command output (if the output of a command-line utility has changed)
    ```
    admin@vlab-01:~$ show run all
    {
        "ACL_TABLE": {
    ......
        "WRED_PROFILE": {
            "AZURE_LOSSLESS": {
                "ecn": "ecn_all",
                "green_drop_probability": "5",
                "green_max_threshold": "2097152",
                "green_min_threshold": "1048576",
                "red_drop_probability": "5",
                "red_max_threshold": "2097152",
                "red_min_threshold": "1048576",
                "wred_green_enable": "true",
                "wred_red_enable": "true",
                "wred_yellow_enable": "true",
                "yellow_drop_probability": "5",
                "yellow_max_threshold": "2097152",
                "yellow_min_threshold": "1048576"
            }
        },
        "bgpraw": "Building configuration...\n\nCurrent configuration......end\n"
    }
    ```

commit 83295189cab227d640839c0079207bf17b6442d8
Author: Aryeh Feigin <101218333+arfeigin@users.noreply.github.com>
Date:   Fri Jan 13 05:47:22 2023 +0200

    Extend fast-reboot STATE_DB entry timer (#2577)

    *Due to an issue where fast-reboot falls back to cold boot when an upgrade with fast-reboot is combined with a FW upgrade, a short-term solution is to extend the timer. A long-term solution, which replaces the timer with a fast-reboot finalizer, is in the works.

commit 68a11e77212c09d87d98b4a4724f57e06e6442da
Author: Aryeh Feigin <101218333+arfeigin@users.noreply.github.com>
Date:   Wed Jan 11 10:18:07 2023 +0200

    Preserve copp tables through DB migration (#2524)

    This PR should be merged together with sonic-net/sonic-swss#2548 and is required for 202205 and 202211.
    This PR implements [fastboot] Preserve CoPP table HLD to improve fastboot flow (sonic-net/SONiC#1107).

    - What I did
    Preserve COPP table contents through DB migration. (Mellanox only)

    - How I did it
    Skipped deleting of COPP tables in DB migrator.
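    A hedged sketch of the skip is below. It assumes the migrator previously removed `COPP_TABLE:*` keys from APPL_DB, and it models APPL_DB as a dict. The `preserve_copp` flag is hypothetical and stands in for the Mellanox-only condition.

```python
def migrate_copp_table(app_db, preserve_copp):
    """Delete COPP_TABLE keys from a dict-modelled APPL_DB unless preserved."""
    if preserve_copp:
        # Keep CoPP entries so traps are available early in the fastboot flow.
        return app_db
    # Collect keys first: deleting while iterating a dict is an error.
    for key in [k for k in app_db if k.startswith("COPP_TABLE:")]:
        del app_db[key]
    return app_db
```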

    - How to verify it
    Observe COPP table contents are preserved right after reboot.

commit c236b83a7afea4c5479c7cb18555f301847f080c
Author: CliveNi <clive.ni@cloudlight.com.hk>
Date:   Tue Jan 10 01:11:19 2023 +0800

    [sfputil] Firmware download/upgrade CLI support for QSFP-DD (#1947) (#2349)

    * [sfputil] Firmware download/upgrade CLI support for QSFP-DD (#1947)

    - Description
    Checking that the running image is switched or not after CDB_run during firmware upgrade process.

    - Motivation and Context
    CDB_run may cause several seconds of NACKs or clock stretching on the I2C bus, depending on the module vendor's implementation. Checking the status after CDB_run keeps the flow compatible with different implementations.

    * Update unit tests for sfputil.

    Test : Creating "is_fw_switch_done" test. This function is expected to return 1 when 'status' == True and the running image ('result'[1, 5]) differs from the committed one ('result'[2, 6]); otherwise it returns -1.

    * [sfputil] Firmware download/upgrade CLI support for QSFP-DD (#1947)

    - Description
    Adding error judgements in "is_fw_switch_done" function.
    Update unit tests for "is_fw_switch_done".

    - Motivation and Context
    Checking status of images to avoid committing image with a wrong status.

    * [sfputil] Firmware download/upgrade CLI support for QSFP-DD (#1947)

    Fixing : Comparing error code with a wrong variable.
    Refactor : Renaming variables to better reflect their purpose.
    Refactor : Removing an if case that has little to do with the function's purpose.
    Feat : Adding "echo" to display detail result.

    * Update unit tests for sfputil.

    * [sfputil] Firmware download/upgrade CLI support for QSFP-DD (#1947)

    Feat : Reducing frequency of check during "is_fw_switch_done".
    Refactor : Removing a repeated line.
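    The check described above follows the return convention of is_fw_switch_done: 1 on success and -1 otherwise. It can be sketched as a bounded polling loop. This is a simplified, hypothetical interface. `poll_fw_switch` and its `get_status` callable, which returns `(status_ok, running_image, committed_image)`, do not belong to the real sfputil code.

```python
import time


def poll_fw_switch(get_status, retries=5, interval=1.0, sleep=time.sleep):
    """Poll until the running image differs from the committed image.

    Returns 1 if the switch completed, -1 if it never did.
    """
    for attempt in range(retries):
        status_ok, running, committed = get_status()
        if status_ok and running != committed:
            return 1
        if attempt < retries - 1:
            # Give the module time: it may NACK or stretch I2C after CDB_run.
            sleep(interval)
    return -1
```

    A few spaced-out checks, rather than a tight loop, reflect the "reducing frequency of check" change.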

commit 5ac55f06fc3efcfc02450ff33410b1df2e290ddd
Author: Qi Luo <qiluo-msft@users.noreply.github.com>
Date:   Fri Jan 6 17:37:51 2023 -0800

    Revert "sonic-utilities: Update config reload() to verify formatting of an input file (#2529)" (#2586)

    This reverts commit 42f51c26d1d0017f3211904ca19c023b5d784463.

    Reverts sonic-net/sonic-utilities#2529

    Reason: There are use cases like `config reload /dev/stdin`, for example [L2 Switch mode · sonic-net/SONiC Wiki (github.com)](https://github.com/sonic-net/SONiC/wiki/L2-Switch-mode). The original PR would read input file twice, so /dev/stdin does not work.
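    The constraint behind the revert is that the input must be read only once. A stream like /dev/stdin cannot be rewound, so validation and loading have to share one buffer. The sketch below is a hypothetical illustration, not the config reload code. It uses an in-memory stream in place of stdin.

```python
import io
import json


def load_config(stream):
    """Read the input exactly once, validate it, and return the parsed result.

    A pipe such as /dev/stdin cannot be rewound: a second read would see
    no data, so validation and loading must share the same buffer.
    """
    text = stream.read()
    return json.loads(text)  # raises ValueError on malformed JSON


# An in-memory stream standing in for /dev/stdin.
stream = io.StringIO('{"DEVICE_METADATA": {"localhost": {"hostname": "sonic"}}}')
config = load_config(stream)
```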

commit 2dc17968b6fa95289aa98fa30ff57eb87afaf231
Author: wenyiz2021 <91497961+wenyiz2021@users.noreply.github.com>
Date:   Fri Jan 6 15:24:02 2023 -0800

    [masic] 'show interfaces counters' reminds to use '-d all' option to check for internal links (#2466)

    Print reminder to check internal links on multi-asic platforms
    Signed-off-by: Wenyi Zhang <wenyizhang@microsoft.com>

commit 551836f524504cbcf7e9066bfa64104912a545c1
Author: Jing Zhang <zhangjing@microsoft.com>
Date:   Fri Jan 6 13:28:14 2023 -0800

    [storyteller] add link prober state change to story teller (#2585)

    What I did
    Add linkprober category to story teller. It will reflect dualtor heartbeat events.

    sign-off: Jing Zhang zhangjing@microsoft.com

    How to verify it
    Tested on dualtor device, was able to grep link prober state change events.

commit bfe85fdbd6f4244a0c4d5903a3e6cf75e87f68e6
Author: Vadym Hlushko <62022266+vadymhlushko-mlnx@users.noreply.github.com>
Date:   Tue Jan 3 11:21:52 2023 +0200

    [generate_dump] Fix for deletion flow for all secret files from show-techsupport dump (#2571)

    - What I did
    Fixed a deletion flow for all secret files in the tech support dump.

    - How I did it
    Delete files by using the find and rm Linux utilities.

    - How to verify it
    Run the show_techsupport/test_techsupport_no_secret.py

    Signed-off-by: Vadym Hlushko <vadymh@nvidia.com>

commit 80162b0bf02d6dff88c503a7c7310a7b0a287531
Author: Stepan Blyshchak <38952541+stepanblyschak@users.noreply.github.com>
Date:   Mon Jan 2 15:01:09 2023 +0200

    [sonic_installer] use /etc/resolv.conf from the host when migrating packages (#2573)

    - What I did
    SONiC package migration has been failing due to the lack of DNS configuration for registries domain names.
    I used /etc/resolv.conf from host OS when migrating.

    - How I did it
    Copy /etc/resolv.conf into new image filesystem during migration, then, restore it back.
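    The copy-then-restore step might look like the sketch below. It is a hypothetical helper, not the sonic_installer code. The `.orig` backup name is an assumption. The restore happens in `finally`, so it runs even if the migration raises.

```python
import os
import shutil


def with_host_resolv_conf(new_root, action, host_resolv="/etc/resolv.conf"):
    """Run action() with the host resolv.conf installed under new_root.

    The image's own resolv.conf (if any) is moved aside first and put back
    afterwards, even if action() raises.
    """
    target = os.path.join(new_root, "etc", "resolv.conf")
    backup = target + ".orig"  # hypothetical backup name
    had_original = os.path.lexists(target)  # lexists: it may be a symlink
    if had_original:
        os.rename(target, backup)
    os.makedirs(os.path.dirname(target), exist_ok=True)
    shutil.copyfile(host_resolv, target)
    try:
        return action()
    finally:
        os.remove(target)
        if had_original:
            os.rename(backup, target)
```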

    - How to verify it
    Run sonic-installer install.

    Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>

commit f22d6b0067d570b550b43ed98693cf23bf82a35b
Author: Stephen Sun <5379172+stephenxs@users.noreply.github.com>
Date:   Thu Dec 29 15:37:38 2022 +0800

    [Mellanox] Change severity to NOTICE in Mellanox buffer migrator when unable to fetch DEVICE_METADATA due to empty CONFIG_DB during initialization (#2569)

    - What I did
    It is expected that db_migrator is not able to fetch DEVICE_METADATA when it is invoked before the CONFIG_DB is initialized. In this case, we should not use ERROR to log the message since it's not an error.
    Change the severity to NOTICE

    - How I did it
    Change the severity.

    - How to verify it
    Manually test.

    Signed-off-by: Stephen Sun <stephens@nvidia.com>

commit 18f9ae1b0e1b4a02646f389b696553074867dcbc
Author: Stephen Sun <5379172+stephenxs@users.noreply.github.com>
Date:   Mon Dec 26 16:00:31 2022 +0800

    Fix issue: unconfigured PGs are displayed in watermarkstat (#2556)

    - What I did
    All the PGs between the minimal and maximal indexes are displayed, regardless of whether they are configured.
    Originally, watermark counters were enabled for all PGs, so there was no issue.
    Now, watermark counters are enabled only for PGs with buffer configured. For example, if PGs 0/2/3/4/6 are configured, PGs 0-6 are displayed, which is confusing because it gives users the impression that PG 7 is missing.

    - How I did it
    Display valid PGs only
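    The idea is to build columns from the configured PG set, not from `range(min, max + 1)`. The sketch below illustrates it. The function names and the index-to-value `watermarks` layout are hypothetical.

```python
def pg_headers(configured_pgs):
    """Column headers for configured PGs only (not every index from min to max)."""
    return ["Port"] + ["PG{}".format(pg) for pg in sorted(set(configured_pgs))]


def pg_row(port, watermarks, configured_pgs):
    """watermarks maps PG index -> counter value (hypothetical layout)."""
    return [port] + [watermarks.get(pg, "N/A") for pg in sorted(set(configured_pgs))]
```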

    - How to verify it
    Manually test and unit test.

    - Previous command output (if the output of a command-line utility has changed)
           Port    PG0    PG1    PG2    PG3    PG4
    -----------  -----  -----  -----  -----  -----
     Ethernet0      0      0      0      0      0
     Ethernet2      0      0      0      0      0
     Ethernet8      0      0      0      0      0
    Ethernet10      0      0      0      0      0
    Ethernet16      0      0      0      0      0
    Ethernet18      0      0      0      0      0
    Ethernet32      0      0      0      0      0

    - New command output (if the output of a command-line utility has changed)
    PG1 won't be displayed if it is not configured

           Port    PG0    PG3    PG4
    -----------  -----  -----  -----
     Ethernet0      0      0      0
     Ethernet2      0      0      0
     Ethernet8      0      0      0
    Ethernet10      0      0      0
    Ethernet16      0      0      0
    Ethernet18      0      0      0
    Ethernet32      0      0      0

    Signed-off-by: Stephen Sun <stephens@nvidia.com>

commit 78566674edd15dd8aa618fcde520a7b170452840
Author: Junchao-Mellanox <57339448+Junchao-Mellanox@users.noreply.github.com>
Date:   Tue Dec 20 17:05:23 2022 +0800

    [Command Ref] Add doc for syslog rate limit (#2508)

    - What I did
    Add command reference doc for syslog rate limit feature

    - How I did it
    Add command reference doc for syslog rate limit feature

    - How to verify it
    Manual check

    Previous command output (if the output of a command-line utility has changed)
    New command output (if the output of a command-line utility has changed)
      admin@sonic:~$ show syslog rate-limit-container
      SERVICE         INTERVAL    BURST
      --------------  ----------  -------
      bgp             0           0
      database        300         20000
      lldp            300         20000
      mgmt-framework  300         20000
      pmon            300         20000
      radv            300         20000
      snmp            300         20000
      swss            300         20000
      syncd           300         20000
      teamd           300         20000
      telemetry       300         20000
      admin@sonic:~$ show syslog rate-limit-container bgp
      SERVICE         INTERVAL    BURST
      --------------  ----------  -------
      bgp             0           0

commit 5181264f203e1fa5f74ca069ca2a58f4e192d718
Author: Vadym Hlushko <62022266+vadymhlushko-mlnx@users.noreply.github.com>
Date:   Tue Dec 20 11:04:02 2022 +0200

    [generate_dump] Optimize the execution time of 'show techsupport' CLI by parallel function execution (#2512)

    - What I did
    Optimize the execution time of the 'show techsupport' script.

    - How I did it
    The show techsupport CLI command calls the generate_dump bash script. The script contains many functions that follow this pattern:

    1. Run some CLI command
    2. Save output from step 1 to the temporary file
    3. Append the temporary file from step 2 to the `/var/dump/sonic_dump_XXXX.tar` file
    4. Delete the temporary file from step 2
    This PR runs these functions in parallel, while limiting the number of spawned processes so that they do not consume all the CPU time.

    - How to verify it
    First test scenario

    Run the `time show techsupport` CLI command and compare the execution time to the original script (with no parallelism), the execution time will be decreased by 10-20%.
    Second test scenario

    1. Make the FW get stuck by using the following commands
    	a. mcra /dev/mst/mt52100_pci_cr0 0xa01e4 0x10
    	b. mcra /dev/mst/mt52100_pci_cr0 0xa05e4 0x10
    	c. mcra /dev/mst/mt52100_pci_cr0 0xa07e4 0x10
    	d. mcra /dev/mst/mt52100_pci_cr0 0xa09e4 0x10
    	e. mcra /dev/mst/mt52100_pci_cr0 0xa0be4 0x10
    	f. mcra /dev/mst/mt52100_pci_cr0 0xa0de4 0x10
    	g. mcra /dev/mst/mt52100_pci_cr0 0xa0fe4 0x10
    2. Run the `time show techsupport` CLI command and compare the execution time to the original script (with no parallelism), the execution time will be decreased by up to 50% because inside the script we launch CLI commands with `timeout --foreground 5m`.
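    generate_dump itself is bash. The bounded-parallelism idea can still be sketched in Python, since this repository's CLI is Python. The sketch is hypothetical: commands are argv lists, and concurrency is capped by a thread pool sized to the CPU count.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_parallel(commands, max_workers=None):
    """Run each command (an argv list) with at most max_workers in flight."""
    if max_workers is None:
        # Cap concurrency at the CPU count so the dump does not starve the box.
        max_workers = os.cpu_count() or 1

    def run(argv):
        # Threads are enough here: each one just waits on a child process.
        return subprocess.run(argv, capture_output=True, text=True).stdout

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, whatever the finish order.
        return list(pool.map(run, commands))
```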

    Signed-off-by: Vadym Hlushko <vadymh@nvidia.com>

commit 1ca3fedc4575c04b6578e6c5c66dac353be27072
Author: Stepan Blyshchak <38952541+stepanblyschak@users.noreply.github.com>
Date:   Mon Dec 19 07:32:18 2022 +0200

    [timer.unit.j2] use wanted-by in timer unit (#2546)

    Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>

commit b3e7f1c07d18542d700b8162c164fbd24544f505
Author: Preetham <51771885+preetham-singh@users.noreply.github.com>
Date:   Fri Dec 16 23:38:03 2022 +0530

    Fixes #12170: Delete subinterface and recreate the subinterface  in (#2513)

    * Fixes #12170: Delete the subinterface and recreate it in default-vrf when unbinding the subinterface from a user-defined VRF.

commit 44c5d1c23a1632cfb316ae1d93b3e4cbeeb3934e
Author: Vaibhav Hemant Dixit <vaibhav.dixit@microsoft.com>
Date:   Thu Dec 15 23:21:58 2022 -0800

    [db_migrator] Fix migration of Loopback data: handle all Loopback interfaces (#2560)

    Fix the issue where cross-branch upgrades (base DB version 1_0_1) lead to an OA crash because a duplicate IP2ME route is added when there is more than one Loopback interface.

    The issue happens because the current implementation hardcodes replacing lo with Loopback0.
    When the base image's APP DB has more than one IP assigned to the lo interface, migration assigns all the IPs to the same interface, Loopback0. This is incorrect, because newer images assign different IPs to distinct Loopback interfaces.

    How to verify it
    Verified on a physical testbed that this change fixes the OA crash issue.
    Also added a unit test to catch this issue in PR tests.

commit ccefd454dd53c6332815b28a58fa20fd24215fdc
Author: Vadym Hlushko <62022266+vadymhlushko-mlnx@users.noreply.github.com>
Date:   Thu Dec 15 09:55:08 2022 +0200

    Optimize the execution time of the 'show techsupport' script to 5-10%, (#2504)

    - What I did
    Reduce the execution time of the 'show techsupport' script by 5-10%.

    - How I did it
    The show techsupport CLI command calls the generate_dump bash script. The script contains many functions that follow this pattern:

    1. Run some CLI command
    2. Save output from step 1 to the temporary file
    3. Append the temporary file from step 2 to the `/var/dump/sonic_dump_XXXX.tar` file
    4. Delete the temporary file from step 2
    This PR removes steps 3 and 4 from those functions and creates a new function, save_to_tar(). This function adds the whole directory of temporary files to the .tar archive at once, so it does not spawn a tar -v -rhf ... process for each temporary file.

    - How to verify it
    Run the time show techsupport CLI command and compare the execution time to the original script, the execution time will be decreased by 5-10%.
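    The "one archive call per directory" idea can be sketched with Python's `tarfile`. This is a hypothetical analogue of the bash save_to_tar(), and the `arcname` default is an assumption. Mode `"a"` appends to an uncompressed tar, and `dereference=True` mirrors tar's `-h` flag.

```python
import tarfile


def save_to_tar(tmp_dir, tar_path, arcname="dump"):
    """Append a whole directory of temporary files to tar_path in one pass."""
    # Mode "a" appends to an uncompressed tar and creates it if missing;
    # dereference=True follows symlinks, like tar's -h flag.
    with tarfile.open(tar_path, "a", dereference=True) as tar:
        tar.add(tmp_dir, arcname=arcname)  # recursive by default
```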

    Signed-off-by: Vadym Hlushko <vadymh@nvidia.com>

commit 4f825d9849a7def94fa3dfe9be8f22a88f50aa1f
Author: Jing Zhang <zhangjing@microsoft.com>
Date:   Wed Dec 14 10:13:43 2022 -0800

    [muxcable][show] update `show mux tunnel-route` to separate ASIC and kernel into two columns (#2553)

    Stemming from sonic-net/sonic-swss#2557.

    This PR is to update show mux tunnel-route command to show status of both ASIC and kernel tunnel routes.

    sign-off: Jing Zhang zhangjing@microsoft.com

    Previous command output (if the output of a command-line utility has changed)
    Only the kernel route was checked. If the tunnel route for server_ipv4 is removed from the kernel, it does not show up in the command output:

    zhangjing@********************:~$ show mux tunnel-route Ethernet4
    PORT       DEST_TYPE    DEST_ADDRESS
    ---------  -----------  --------------------------------
    Ethernet4  server_ipv6  2603:10b0:d11:8614::a32:9112/128
    New command output (if the output of a command-line utility has changed)
    Check both ASIC and APP DB for tunnel route status

    zhangjing@********************:~$ show mux tunnel-route Ethernet4
    PORT       DEST_TYPE    DEST_ADDRESS                      kernel    asic
    ---------  -----------  --------------------------------  --------  ------
    Ethernet4  server_ipv4  10.50.145.18/32                   -         added
    Ethernet4  server_ipv6  2603:10b0:d11:8614::a32:9112/128  added     added

commit d5465ed5b22ead0101ad2aaabf44050773968cfd
Author: Sudharsan Dhamal Gopalarathnam <dgsudharsan@users.noreply.github.com>
Date:   Tue Dec 13 23:27:57 2022 -0800

    [show]Fix show route return code on error (#2542)

    - What I did
    Fix the show route command to return an error code in failure cases. The parameter return_cmd=True in run_command suppresses the return code and returns success even in error scenarios.

    - How I did it
    When run_command is called with return_cmd=True, its return value now includes the return code, which the caller can use to determine whether an error occurred.

    - How to verify it
    Added UT to verify it

    - Previous command output (if the output of a command-line utility has changed)
    root@sonic:/home/admin# show ip route 123
    % Unknown command: show ip route 123

    root@sonic:/home/admin# echo $?
    0

    - New command output (if the output of a command-line utility has changed)
    root@sonic:/home/admin# show ip route 123
    % Unknown command: show ip route 123

    root@sonic:/home/admin# echo $?
    1
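    The return-code change can be sketched as follows. This is a hypothetical simplification, not the real sonic-utilities `run_command`, whose signature and behaviour differ. With `return_cmd=True` the helper returns the output together with the exit code, so the caller can propagate a failure to the shell instead of always exiting 0.

```python
import subprocess
import sys
from typing import List, Tuple, Union


def run_command(cmd: List[str], return_cmd: bool = False
                ) -> Union[Tuple[str, int], None]:
    """Hypothetical sketch: with return_cmd=True, return (output, exit code)."""
    proc = subprocess.run(cmd, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, universal_newlines=True)
    if return_cmd:
        return proc.stdout, proc.returncode
    sys.stdout.write(proc.stdout)
    return None


def show_route(cmd: List[str]) -> int:
    """Hypothetical caller: print the output and hand back the exit code,
    which a CLI entry point would pass to sys.exit() so `echo $?` sees it."""
    output, rc = run_command(cmd, return_cmd=True)
    sys.stdout.write(output)
    return rc
```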

commit 14936d7ef6a46745dd9d8b6c07e0de476695cd6e
Author: Lawrence Lee <lawlee@microsoft.com>
Date:   Mon Dec 12 17:07:27 2022 -0800

    [route_check]: Ignore ASIC only SOC IPs (#2548)

    * [tests]: Improve route check test

    - Split test into separate methods based on functionality being tested
    - Parametrize main test method for better granularity when viewing results/running test cases
    - Add config DB mocking support
    - Move some setup/teardown code to fixtures for better consistency
    - Extract test data to separate file
    - Ignore routes for SOC IPs that are only present in the ASIC
    - Add test case to cover ASIC only SOC IPs

    Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
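    A minimal sketch of "ignore routes for SOC IPs that are only present in the ASIC" follows. It assumes the SoC IPs are read from a CONFIG_DB `MUX_CABLE`-style table with a `soc_ipv4` field. The function name and table shape are illustrative assumptions, not the actual route_check code.

```python
from typing import Dict, Iterable, List


def ignore_asic_only_soc_ips(asic_only_routes: Iterable[str],
                             mux_cable: Dict[str, Dict[str, str]]) -> List[str]:
    """Hypothetical: drop ASIC-only routes that are SoC IPs.

    mux_cable is assumed to mirror CONFIG_DB's MUX_CABLE table, e.g.
    {"Ethernet4": {"soc_ipv4": "192.168.0.3/32", ...}}.
    """
    soc_ips = {entry["soc_ipv4"] for entry in mux_cable.values()
               if "soc_ipv4" in entry}
    # Anything left is still reported as an unexpected ASIC-only route.
    return [route for route in asic_only_routes if route not in soc_ips]
```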

commit 609f18fed063cf5c299328e2f6ca36c907cc1883
Author: isabelmsft <67024108+isabelmsft@users.noreply.github.com>
Date:   Thu Dec 8 11:12:50 2022 -0600

    YANG Validation for ConfigDB Updates: WARM_RESTART, SFLOW_SESSION, SFLOW, VXLAN_TUNNEL, VXLAN_EVPN_NVO, VXLAN_TUNNEL_MAP, MGMT_VRF_CONFIG, CABLE_LENGTH, VRF tables (#2526)

commit 83aa5fb9e7671ee9871af4ccee764ad5ef84cf0f
Author: isabelmsft <isabel.li@microsoft.com>
Date:   Fri Mar 17 19:15:47 2023 +0000

    UT coverage

commit da9db448985ce138059e8c18b986a8c8e70035d1
Author: isabelmsft <isabel.li@microsoft.com>
Date:   Tue Feb 7 02:17:25 2023 +0000

    add sflow collector

commit c4caaf80adec48a2010c3be3e8fb0fedf696da9b
Author: isabelmsft <isabel.li@microsoft.com>
Date:   Mon Feb 6 21:54:44 2023 +0000

    fix UT

commit cda464a32738d37e71589dc0ac982be43908734d
Author: isabelmsft <isabel.li@microsoft.com>
Date:   Sat Feb 4 09:06:35 2023 +0000

    fix UT

commit 1251c805a6adc370c0aeb37dd06d3927a712468b
Author: isabelmsft <isabel.li@microsoft.com>
Date:   Wed Feb 1 02:38:19 2023 +0000

    fix UT

commit c704b71c893f17062f2b7eab54ebc9ad65fa2a6c
Author: isabelmsft <isabel.li@microsoft.com>
Date:   Thu Mar 23 07:22:54 2023 +0000

    fix UT

commit f83abb274c4e02ee31e3df0353cbbc784d9601a0
Author: isabelmsft <isabel.li@microsoft.com>
Date:   Thu Mar 23 05:32:03 2023 +0000

    fix UT

commit 966a0e0fe75cc542c6ca4f3dcca54f9b5d54e25b
Author: isabelmsft <isabel.li@microsoft.com>
Date:   Thu Mar 23 04:37:26 2023 +0000

    add UT

commit 92a5dc20feb6e19a0515c1ce7e41c75efe5281ae
Author: isabelmsft <isabel.li@microsoft.com>
Date:   Thu Mar 23 00:55:15 2023 +0000

    add UT

commit 34a61f3bb7c25fb2b3c8aedceeb311f5c3c8ef84
Merge: 582bac06 10f31ea6
Author: isabelmsft <isabel.li@microsoft.com>
Date:   Wed Mar 22 22:11:09 2023 +0000

    Merge remote-tracking branch 'origin/master' into mux_mclag

commit 10f31ea6fb0876f913cfcfce8c95011e675a99f6
Author: Mai Bui <maibui@microsoft.com>
Date:   Tue Mar 21 00:25:39 2023 -0400

    Revert "Replace pickle by json (#2636)" (#2746)

    This reverts commit 54e26359fccf45d2e40800cf5598a725798634cd.
    Due to https://github.com/sonic-net/sonic-buildimage/issues/14089
    Signed-off-by: Mai Bui <maibui@microsoft.com>

commit 05fa7513355cf333818c480fade157bdff969811
Author: abdosi <58047199+abdosi@users.noreply.github.com>
Date:   Fri Mar 17 16:27:48 2023 -0700

    Fix the `show interface counters` throwing exception on device with no external interfaces (#2703)

    Fix the `show interface counters` command throwing an exception
    on devices that have no external ports and only internal links (Ethernet or fabric), which is possible in a chassis.

commit 582bac065ee067db6ad06ca71296fc70a4ebcb57
Author: isabelmsft <isabel.li@microsoft.com>
Date:   Fri Mar 17 19:15:47 2023 +0000

    UT coverage

commit f27dea0cfdefbdcfc03d19136e4ae47ea72fd51f
Author: Stepan Blyshchak <38952541+stepanblyschak@users.noreply.github.com>
Date:   Fri Mar 17 09:10:47 2023 +0200

    [route_check] remove check-frr_patch mock (#2732)

    The test fails with python3.7 (works in 3.9) when stopping a patch that hasn't been started. We can always mock the check_output call and have the mock return an empty dictionary if FRR_ROUTES is not defined.
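    A minimal sketch of the "always mock check_output" approach follows. It assumes the code under test queries FRR through `subprocess.check_output` running vtysh. `fake_check_output`, `get_frr_routes`, and `run_case` are hypothetical names, and the `FRR_ROUTES` handling is only illustrative; none of this is the actual route_check test code. Because the mock is always installed and removed by the `with` block, no conditional patch ever has to be stopped.

```python
import json
import subprocess
from unittest import mock

# Hypothetical per-test data; None means the test case defines no FRR routes.
FRR_ROUTES = None


def fake_check_output(cmd, *args, **kwargs):
    """Return the test's FRR routes as JSON, or {} if FRR_ROUTES is not defined."""
    return json.dumps(FRR_ROUTES if FRR_ROUTES is not None else {})


def get_frr_routes():
    """Hypothetical code under test that queries FRR through vtysh."""
    return json.loads(subprocess.check_output(
        ["vtysh", "-c", "show ip route json"]))


def run_case(routes):
    """Run one test case with the mock active for its whole duration."""
    global FRR_ROUTES
    FRR_ROUTES = routes
    with mock.patch("subprocess.check_output", side_effect=fake_check_output):
        return get_frr_routes()
```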

    #### What I did

    Removed check_frr_patch mock to fix UT running on python3.7

    #### How I did it

    Removed the mock

    #### How to verify it

    Run unit test in st…
@kcudnik
Contributor

kcudnik commented May 11, 2023

How big is the route table on this? And does it contain IPv6 prefixes? It could be related to this: redis/redis#8077.
Is this during warm reboot? This is T2, so we don't do warm reboot there? So these 5 seconds should not impact anything.

@rlhui
Contributor

rlhui commented Jun 23, 2023

How big is the route table on this? And does it contain IPv6 prefixes? It could be related to this: redis/redis#8077. Is this during warm reboot? This is T2, so we don't do warm reboot there? So these 5 seconds should not impact anything.

@kcudnik, there is no warm boot for T2. The route table size is 32k IPv4 and 32k IPv6 routes.

malletvapid23 added a commit to malletvapid23/Sonic-Utility that referenced this issue Aug 3, 2023
@judyjoseph
Contributor

Looks like https://github.com/sonic-net/sonic-swss-common/blob/master/common/table_dump.lua is the Lua script getting called, which in turn goes through all entries in ASIC_DB (for example, all routes) and dumps them.

So on T2, where ASIC DB has more route entries, this takes more CPU time and swss times out.

Can we use the redis-db SAVE option to save a snapshot of the DB each time and recover it later, instead of looping through each entry in the table and saving it? @kcudnik I can investigate a bit more in this direction.
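To see why the dump scales badly, here is a Python emulation of the logic in table_dump.lua as it appears in the saidump log. It is a sketch over an in-memory dict, not Redis itself. `fnmatch` approximates Redis glob matching, and every key is assumed to be a hash.

Redis runs a Lua script as one atomic call. Every HGETALL below therefore happens while other clients wait. Once `lua-time-limit` (5 seconds by default in Redis versions before 7.0) is exceeded, Redis logs the "slow script" warning and answers other clients with BUSY errors. With roughly 64k route keys plus other objects, the loop easily crosses that limit.

```python
import fnmatch
import json
from typing import Dict, List


def hgetall_flat(store: Dict[str, Dict[str, str]], key: str) -> List[str]:
    """Emulate HGETALL as seen from Lua: a flat [f1, v1, f2, v2, ...] list."""
    flat: List[str] = []
    for field, value in store[key].items():
        flat.extend([field, value])
    return flat


def table_dump(store: Dict[str, Dict[str, str]], table: str) -> str:
    """Emulation of table_dump.lua: KEYS table:* then one HGETALL per key.

    In Redis the whole loop runs inside a single blocking script call, so
    its cost grows with the number of keys and fields in the table.
    """
    keys = [k for k in store if fnmatch.fnmatchcase(k, table + ":*")]
    res = {}
    for k in keys:
        flat = hgetall_flat(store, k)
        # Lua: for j = 1, #flat_map, 2 do sres[flat_map[j]] = flat_map[j + 1] end
        res[k] = {flat[j]: flat[j + 1] for j in range(0, len(flat), 2)}
    return json.dumps(res)
```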

@kcudnik
Contributor

kcudnik commented Aug 17, 2023

Yes, the dump will take all objects, including routes. How many of them are there? And why is this a problem if it takes 5 sec? This is not T0.

@rlhui rlhui assigned mlok-nokia and unassigned kcudnik Aug 18, 2023
@mlok-nokia
Contributor

@kcudnik @judyjoseph Any progress on this issue?

@kcudnik
Contributor

kcudnik commented Aug 22, 2023

Please provide a dump of all the objects I requested.

JunhongMao added a commit to JunhongMao/sonic-sairedis that referenced this issue Sep 1, 2023
•	Saidump for DNX-SAI sonic-net/sonic-buildimage#13561

Solution and modification:
To use the redis-db SAVE option to save the snapshot of DB each time and recover later, instead of looping through each entry in the table and saving it.

(1) Updated sonic-buildimage/build_debian.sh to install the Python library rdbtools on the host.
(2) Updated sonic-buildimage/src/sonic-sairedis/saidump/saidump.cpp to add a new option, -r, which converts the rdbtools output JSON file into the saidump format.
(3) Added a new script file, files/scripts/saidump.sh, which performs the steps below.
  For each ASIC, such as ASIC0:

  1. Save the Redis data.
  sudo sonic-db-cli -n asic$1 SAVE > /dev/null

  2. Move the dump file to /var/run/redisX/.
  docker exec database$1 sh -c "mv /var/lib/redis/dump.rdb /var/run/redis$1/"

  3. Run the rdb command to convert the dump file into a JSON file.
  sudo python /usr/local/bin/rdb --command json /var/run/redis$1/dump.rdb | sudo tee /var/run/redis$1/dump.json > /dev/null

  4. Run saidump -r to convert the JSON file into the same format as the previous saidump output; the result is printed to standard output.
  docker exec syncd$1 sh -c "saidump -r /var/run/redis$1/dump.json"

  5. Clean up.
  sudo rm -f /var/run/redis$1/dump.rdb
  sudo rm -f /var/run/redis$1/dump.json

(4) Updated sonic-buildimage/src/sonic-utilities/scripts/generate_dump to replace saidump with saidump.sh.
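Step 4 depends on `saidump -r` reshaping the rdbtools output into what the Lua-based saidump used to produce. Below is a hypothetical Python sketch of that reshaping; it is not the C++ code in saidump.cpp. It assumes rdbtools' `--command json` output is a JSON list with one object per Redis database, with hashes written as `{field: value}` objects. It also assumes ASIC_DB objects can be recognised by the `ASIC_STATE:` key prefix.

```python
import json
from typing import Any, Dict, List


def rdb_json_to_saidump(rdb_json: str,
                        prefix: str = "ASIC_STATE:") -> Dict[str, Dict[str, str]]:
    """Hypothetical: keep only ASIC_STATE hashes from an rdbtools JSON dump.

    The result has the same key -> {attr: value} shape that table_dump.lua
    returned, but it is built offline from the SAVE snapshot, so Redis is
    not blocked while the objects are walked.
    """
    databases: List[Dict[str, Any]] = json.loads(rdb_json)
    result: Dict[str, Dict[str, str]] = {}
    for db in databases:
        for key, value in db.items():
            # Skip non-hash values (e.g. plain string counters).
            if key.startswith(prefix) and isinstance(value, dict):
                result[key] = value
    return result
```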
JunhongMao added a commit to JunhongMao/sonic-utilities that referenced this issue Sep 1, 2023
•	Saidump for DNX-SAI sonic-net/sonic-buildimage#13561

Solution and modification:
To use the redis-db SAVE option to save the snapshot of DB each time and recover later, instead of looping through each entry in the table and saving it.

(1) Updated sonic-buildimage/build_debian.sh to install the Python library rdbtools on the host.
(2) Updated sonic-buildimage/src/sonic-sairedis/saidump/saidump.cpp to add a new option, -r, which converts the rdbtools output JSON file into the saidump format.
(3) Added a new script file, files/scripts/saidump.sh, which performs the steps below.
  For each ASIC, such as ASIC0:

  1. Save the Redis data.
  sudo sonic-db-cli -n asic$1 SAVE > /dev/null

  2. Move the dump file to /var/run/redisX/.
  docker exec database$1 sh -c "mv /var/lib/redis/dump.rdb /var/run/redis$1/"

  3. Run the rdb command to convert the dump file into a JSON file.
  sudo python /usr/local/bin/rdb --command json /var/run/redis$1/dump.rdb | sudo tee /var/run/redis$1/dump.json > /dev/null

  4. Run saidump -r to convert the JSON file into the same format as the previous saidump output; the result is printed to standard output.
  docker exec syncd$1 sh -c "saidump -r /var/run/redis$1/dump.json"

  5. Clean up.
  sudo rm -f /var/run/redis$1/dump.rdb
  sudo rm -f /var/run/redis$1/dump.json

(4) Updated sonic-buildimage/src/sonic-utilities/scripts/generate_dump to replace saidump with saidump.sh.
JunhongMao added a commit to JunhongMao/sonic-utilities that referenced this issue Sep 1, 2023
•	Saidump for DNX-SAI sonic-net/sonic-buildimage#13561

Solution and modification:
To use the redis-db SAVE option to save the snapshot of DB each time and recover later, instead of looping through each entry in the table and saving it.

(1) Updated sonic-buildimage/build_debian.sh to install the Python library rdbtools on the host.
(2) Updated sonic-buildimage/src/sonic-sairedis/saidump/saidump.cpp to add a new option, -r, which converts the rdbtools output JSON file into the saidump format.
(3) Added a new script file, files/scripts/saidump.sh, which performs the steps below.
  For each ASIC, such as ASIC0:

  1. Save the Redis data.
  sudo sonic-db-cli -n asic$1 SAVE > /dev/null

  2. Move the dump file to /var/run/redisX/.
  docker exec database$1 sh -c "mv /var/lib/redis/dump.rdb /var/run/redis$1/"

  3. Run the rdb command to convert the dump file into a JSON file.
  sudo python /usr/local/bin/rdb --command json /var/run/redis$1/dump.rdb | sudo tee /var/run/redis$1/dump.json > /dev/null

  4. Run saidump -r to convert the JSON file into the same format as the previous saidump output; the result is printed to standard output.
  docker exec syncd$1 sh -c "saidump -r /var/run/redis$1/dump.json"

  5. Clean up.
  sudo rm -f /var/run/redis$1/dump.rdb
  sudo rm -f /var/run/redis$1/dump.json

(4) Updated sonic-buildimage/src/sonic-utilities/scripts/generate_dump to replace saidump with saidump.sh.
JunhongMao added a commit to JunhongMao/sonic-sairedis that referenced this issue Sep 1, 2023
JunhongMao added a commit to JunhongMao/sonic-buildimage that referenced this issue Sep 1, 2023
•	Saidump for DNX-SAI sonic-net#13561

Solution and modification:
To use the redis-db SAVE option to save the snapshot of DB each time and recover later, instead of looping through each entry in the table and saving it.

(1) Updated sonic-buildimage/build_debian.sh to install the Python library rdbtools on the host.
(2) Updated sonic-buildimage/src/sonic-sairedis/saidump/saidump.cpp to add a new option, -r, which converts the rdbtools output JSON file into the saidump format.
(3) Added a new script file, files/scripts/saidump.sh, which performs the steps below.
  For each ASIC, such as ASIC0:

  1. Save the Redis data.
  sudo sonic-db-cli -n asic$1 SAVE > /dev/null

  2. Move the dump file to /var/run/redisX/.
  docker exec database$1 sh -c "mv /var/lib/redis/dump.rdb /var/run/redis$1/"

  3. Run the rdb command to convert the dump file into a JSON file.
  sudo python /usr/local/bin/rdb --command json /var/run/redis$1/dump.rdb | sudo tee /var/run/redis$1/dump.json > /dev/null

  4. Run saidump -r to convert the JSON file into the same format as the previous saidump output; the result is printed to standard output.
  docker exec syncd$1 sh -c "saidump -r /var/run/redis$1/dump.json"

  5. Clean up.
  sudo rm -f /var/run/redis$1/dump.rdb
  sudo rm -f /var/run/redis$1/dump.json

(4) Updated sonic-buildimage/src/sonic-utilities/scripts/generate_dump to replace saidump with saidump.sh.
JunhongMao added a commit to JunhongMao/sonic-utilities that referenced this issue Sep 5, 2023
    •       Saidump for DNX-SAI sonic-net/sonic-buildimage#13561

    Solution and modification:
    To use the redis-db SAVE option to save the snapshot of DB each time and recover later, instead of looping through each entry in the table and saving it.

    (1) Updated sonic-buildimage/build_debian.sh to install the Python library rdbtools on the host.
    (2) Updated sonic-buildimage/src/sonic-sairedis/saidump/saidump.cpp to add a new option, -r, which converts the rdbtools output JSON file into the saidump format.
    (3) Added a new script file, files/scripts/saidump.sh, which performs the steps below.
      For each ASIC, such as ASIC0:

      1. Save the Redis data.
      sudo sonic-db-cli -n asic$1 SAVE > /dev/null

      2. Move the dump file to /var/run/redisX/.
      docker exec database$1 sh -c "mv /var/lib/redis/dump.rdb /var/run/redis$1/"

      3. Run the rdb command to convert the dump file into a JSON file.
      sudo python /usr/local/bin/rdb --command json /var/run/redis$1/dump.rdb | sudo tee /var/run/redis$1/dump.json > /dev/null

      4. Run saidump -r to convert the JSON file into the same format as the previous saidump output; the result is printed to standard output.
      docker exec syncd$1 sh -c "saidump -r /var/run/redis$1/dump.json"

      5. Clean up.
      sudo rm -f /var/run/redis$1/dump.rdb
      sudo rm -f /var/run/redis$1/dump.json

    (4) Updated sonic-buildimage/src/sonic-utilities/scripts/generate_dump to replace saidump with saidump.sh.
JunhongMao added a commit to JunhongMao/sonic-sairedis that referenced this issue Sep 8, 2023
JunhongMao added a commit to JunhongMao/sonic-sairedis that referenced this issue Sep 8, 2023
@rlhui
Contributor

rlhui commented Sep 9, 2023

Yes, the dump will take all objects, including routes. How many of them are there? And why is this a problem if it takes 5 sec? This is not T0.

it takes much longer than 5 sec

JunhongMao added a commit to JunhongMao/sonic-sairedis that referenced this issue Sep 9, 2023
JunhongMao added a commit to JunhongMao/sonic-utilities that referenced this issue Sep 19, 2023
sonic-net#2972
SAI DUMP based on the route table size

* [saidump]
• Saidump for DNX-SAI sonic-net/sonic-buildimage#13561

Solution and modification:
To use the Redis-db SAVE option to save the snapshot of DB each time and recover later, instead of looping through each entry in the table and saving it.

(1) Updated platform/broadcom/docker-syncd-brcm-dnx/Dockerfile.j2 to install the Python library rdbtools into the syncd container.
(2) Updated sonic-buildimage/src/sonic-sairedis/saidump/saidump.cpp to add a new option, -r, which converts the rdbtools output JSON file into the saidump format.
(3) Updated sonic-buildimage/build_debian.sh to add a new script file, files/scripts/saidump.sh, to the host. This shell script performs the steps below:
  For each ASIC, such as ASIC0:

  1. Save the Redis data.
  sudo sonic-db-cli -n asic$1 SAVE > /dev/null

  2. Move the dump file to /var/run/redisX/.
  docker exec database$1 sh -c "mv /var/lib/redis/dump.rdb /var/run/redis$1/"

  3. Run the rdb command to convert the dump file into a JSON file.
  docker exec syncd$1 sh -c "rdb --command json /var/run/redis$1/dump.rdb | tee /var/run/redis$1/dump.json > /dev/null"

  4. Run saidump -r to convert the JSON file into the same format as the previous saidump output; the result is printed to standard output.
  docker exec syncd$1 sh -c "saidump -r /var/run/redis$1/dump.json -m 100"

  5. Clean up.
  sudo rm -f /var/run/redis$1/dump.rdb
  sudo rm -f /var/run/redis$1/dump.json

(4) Update sonic-buildimage/src/sonic-utilities/scrip
JunhongMao added a commit to JunhongMao/sonic-utilities that referenced this issue Sep 19, 2023
sonic-net#2972
SAI DUMP based on the route table size

* [saidump]
• Saidump for DNX-SAI sonic-net/sonic-buildimage#13561

Solution and modification:
To use the Redis-db SAVE option to save the snapshot of DB each time and recover later, instead of looping through each entry in the table and saving it.

(1) Updated platform/broadcom/docker-syncd-brcm-dnx/Dockerfile.j2 to install the Python library rdbtools into the syncd container.
(2) Updated sonic-buildimage/src/sonic-sairedis/saidump/saidump.cpp to add a new option, -r, which converts the rdbtools output JSON file into the saidump format.
(3) Updated sonic-buildimage/build_debian.sh to add a new script file, files/scripts/saidump.sh, to the host. This shell script performs the steps below:
  For each ASIC, such as ASIC0:

  1. Save the Redis data.
  sudo sonic-db-cli -n asic$1 SAVE > /dev/null

  2. Move the dump file to /var/run/redisX/.
  docker exec database$1 sh -c "mv /var/lib/redis/dump.rdb /var/run/redis$1/"

  3. Run the rdb command to convert the dump file into a JSON file.
  docker exec syncd$1 sh -c "rdb --command json /var/run/redis$1/dump.rdb | tee /var/run/redis$1/dump.json > /dev/null"

  4. Run saidump -r to convert the JSON file into the same format as the previous saidump output; the result is printed to standard output.
  docker exec syncd$1 sh -c "saidump -r /var/run/redis$1/dump.json -m 100"

  5. Clean up.
  sudo rm -f /var/run/redis$1/dump.rdb
  sudo rm -f /var/run/redis$1/dump.json

(4) Updated sonic-buildimage/src/sonic-utilities/scripts/generate_dump to check the ASIC DB size: if it is larger than xxx entries, use Redis SAVE; otherwise, use the old method of looping through each entry of the Redis DB.
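The size check in (4) can be sketched as a small hypothetical helper; it is not the actual generate_dump logic, which is a shell script. `threshold` stands in for the entry limit the commit message leaves unspecified ("xxx"). Invoking saidump.sh with the ASIC index and running the original saidump via `docker exec syncd<N>` are assumptions based on the script steps above.

```python
from typing import List


def choose_saidump_command(asic_id: str, asic_db_entries: int,
                           threshold: int) -> List[str]:
    """Hypothetical: pick the dump method based on the ASIC DB size.

    threshold is a placeholder for the unspecified entry limit.
    """
    if asic_db_entries > threshold:
        # Large DB: SAVE snapshot + rdbtools + `saidump -r` (saidump.sh flow).
        return ["saidump.sh", asic_id]
    # Small DB: keep the original Lua-based saidump inside syncd.
    return ["docker", "exec", f"syncd{asic_id}", "saidump"]
```

The design keeps the old path for small tables, where the Lua loop finishes well under the time limit. It avoids the extra disk I/O of a SAVE snapshot when it is not needed.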
JunhongMao added a commit to JunhongMao/sonic-sairedis that referenced this issue Sep 19, 2023
JunhongMao added a commit to JunhongMao/sonic-sairedis that referenced this issue Sep 21, 2023
JunhongMao added a commit to JunhongMao/sonic-buildimage that referenced this issue Sep 21, 2023
Install rdbtools into syncd docker and add saidump.sh into host.

* [saidump]
• Saidump for DNX-SAI sonic-net#13561

Solution and modification:
To use the Redis-db SAVE option to save the snapshot of DB each time and recover later, instead of looping through each entry in the table and saving it.

(1) Updated platform/broadcom/docker-syncd-brcm-dnx/Dockerfile.j2 to install the Python library rdbtools into the syncd container.
(2) Updated sonic-buildimage/src/sonic-sairedis/saidump/saidump.cpp to add a new option, -r, which converts the rdbtools output JSON file into the saidump format.
(3) Updated sonic-buildimage/build_debian.sh to add a new script file, files/scripts/saidump.sh, to the host. This shell script performs the steps below:
  For each ASIC, such as ASIC0:

  1. Save the Redis data.
  sudo sonic-db-cli -n asic$1 SAVE > /dev/null

  2. Move the dump file to /var/run/redisX/.
  docker exec database$1 sh -c "mv /var/lib/redis/dump.rdb /var/run/redis$1/"

  3. Run the rdb command to convert the dump file into a JSON file.
  docker exec syncd$1 sh -c "rdb --command json /var/run/redis$1/dump.rdb | tee /var/run/redis$1/dump.json > /dev/null"

  4. Run saidump -r to convert the JSON file into the same format as the previous saidump output; the result is printed to standard output.
  docker exec syncd$1 sh -c "saidump -r /var/run/redis$1/dump.json -m 100"

  5. Clean up.
  sudo rm -f /var/run/redis$1/dump.rdb
  sudo rm -f /var/run/redis$1/dump.json

(4) Updated sonic-buildimage/src/sonic-utilities/scripts/generate_dump to check the ASIC DB size: if it is larger than xxx entries, use Redis SAVE; otherwise, use the old method of looping through each entry of the Redis DB.
JunhongMao added a commit to JunhongMao/sonic-sairedis that referenced this issue Sep 21, 2023
kcudnik pushed a commit to sonic-net/sonic-sairedis that referenced this issue Sep 25, 2023
…file and displays/format the right output (#1288)

Why I did it
Fix issue: sonic-net/sonic-buildimage#13561
The existing saidump uses the https://github.com/sonic-net/sonic-swss-common/blob/master/common/table_dump.lua script, which loops through ASIC_DB for more than 5 seconds and blocks other processes from accessing it.

This solution uses the redis-db SAVE option to save the snapshot of DB each time and recover later, instead of looping through each entry in the table.
Related PRs:
sonic-net/sonic-utilities#2972
sonic-net/sonic-buildimage#16466
kcudnik pushed a commit to sonic-net/sonic-sairedis that referenced this issue Oct 10, 2023
…he sairedis repo (#1298)

To fix the issue: sonic-net/sonic-buildimage#13561
The existing saidump uses the https://github.com/sonic-net/sonic-swss-common/blob/master/common/table_dump.lua script, which loops through ASIC_DB for more than 5 seconds and blocks other processes from accessing it.

This solution uses the Redis SAVE command to save the snapshot of DB each time and recover later, instead of looping through each entry in the table.
@lguohan lguohan closed this as completed in 4da5099 Nov 8, 2023
judyjoseph pushed a commit to sonic-net/sonic-utilities that referenced this issue Nov 15, 2023
…aidump_by_route_size (#2972)

* * [saidump]
•	Saidump for DNX-SAI sonic-net/sonic-buildimage#13561

Solution and modification:
To use the redis-db SAVE option to save the snapshot of DB each time and recover later, instead of looping through each entry in the table and saving it.

(1) Updated sonic-buildimage/build_debian.sh to install the Python library rdbtools on the host.
(2) Updated sonic-buildimage/src/sonic-sairedis/saidump/saidump.cpp to add a new option, -r, which reformats the JSON files produced by rdbtools.
(3) Added a new script, files/scripts/saidump.sh, which performs the steps below
  For each ASIC, such as ASIC0,

  #1. Save the Redis data.
  sudo sonic-db-cli -n asic$1 SAVE > /dev/null

  #2. Move dump files to /var/run/redisX/
  docker exec database$1 sh -c "mv /var/lib/redis/dump.rdb /var/run/redis$1/"

  #3. Run the rdb command to convert the dump file into a JSON file
  sudo python /usr/local/bin/rdb --command json /var/run/redis$1/dump.rdb | sudo tee /var/run/redis$1/dump.json > /dev/null

  #4. Run saidump -r to convert the JSON file into the same format as the earlier saidump output. The saidump result is then printed to standard output.
  docker exec syncd$1 sh -c "saidump -r /var/run/redis$1/dump.json"

  #5. Clean up the temporary files.
  sudo rm -f /var/run/redis$1/dump.rdb
  sudo rm -f /var/run/redis$1/dump.json

(4) Update sonic-buildimage/src/sonic-utilities/scripts/generate_dump, replacing saidump with saidump.sh.
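The five per-ASIC steps above can be sketched as one bash function. This is a minimal sketch, not the contents of files/scripts/saidump.sh: the helper names run and saidump_asic and the DRY_RUN switch are hypothetical, added so the command sequence can be inspected without a live device. Step #3 uses sudo sh -c with a redirect instead of the sudo tee pipeline; both write the rdb output to dump.json as root.

```bash
#!/bin/bash
# Minimal sketch of steps #1-#5 for one ASIC.
# Hypothetical: the names run/saidump_asic and the DRY_RUN switch.
# The commands themselves follow the steps listed above.

run() {
    # With DRY_RUN=1, print the command to stderr (like `set -x`) instead of running it.
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '+ %s\n' "$*" >&2
    else
        "$@"
    fi
}

saidump_asic() {
    local asic=$1                  # ASIC index, e.g. 0 for ASIC0
    local dir=/var/run/redis$asic  # step #4 reads this path from inside syncd$asic
    local status

    # 1. Save the Redis data (SAVE is synchronous).
    run sudo sonic-db-cli -n "asic$asic" SAVE > /dev/null || return 1
    # 2. Move the dump file out of the database container's data directory.
    run docker exec "database$asic" sh -c "mv /var/lib/redis/dump.rdb $dir/" || return 1
    # 3. Convert the RDB file to JSON with rdbtools.
    run sudo sh -c "python /usr/local/bin/rdb --command json $dir/dump.rdb > $dir/dump.json" || return 1
    # 4. Reformat into the saidump output format on standard output.
    run docker exec "syncd$asic" sh -c "saidump -r $dir/dump.json"
    status=$?
    # 5. Remove the temporary files whether or not step 4 succeeded.
    run sudo rm -f "$dir/dump.rdb" "$dir/dump.json"
    return "$status"
}
```

For example, DRY_RUN=1 saidump_asic 0 prints the five commands for ASIC0 to stderr without executing anything.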
* * [saidump]
•	Saidump for DNX-SAI sonic-net/sonic-buildimage#13561
mssonicbld pushed a commit to mssonicbld/sonic-buildimage that referenced this issue Nov 19, 2023
…rs. (sonic-net#16466)

Fix sonic-net#13561

The existing saidump uses the https://github.com/sonic-net/sonic-swss-common/blob/master/common/table_dump.lua script, which loops over ASIC_DB for more than 5 seconds and blocks other processes from accessing Redis.

This solution uses the Redis SAVE command to save a snapshot of the DB each time and recover the data from it later, instead of looping through each entry in the table.

Related PRs:
sonic-net/sonic-utilities#2972
sonic-net/sonic-sairedis#1288
sonic-net/sonic-sairedis#1298

How did I do it?
Use the Redis SAVE command to save a snapshot of the DB each time and recover the data from it later, instead of looping through each entry in the table and saving it.

1. Updated dockers/docker-base-bullseye/Dockerfile.j2 to install the Python library rdbtools into all docker-base-bullseye containers.

2. Updated sonic-buildimage/src/sonic-sairedis/saidump/saidump.cpp to add a new option, -r, which reformats the JSON files produced by rdbtools.

3. Added a new script, syncd/scripts/saidump.sh, to the sairedis repo. This shell script performs the following steps:

  For each ASIC, such as ASIC0,

  3.1. Configure the Redis persistence directory.
  redis-cli -h $hostname -p $port CONFIG SET dir $redis_dir > /dev/null

  3.2. Save the Redis data.
  redis-cli -h $hostname -p $port SAVE > /dev/null

  3.3. Run the rdb command to convert the dump file into a JSON file.
  rdb --command json $redis_dir/dump.rdb | tee $redis_dir/dump.json > /dev/null

  3.4. Run saidump -r to convert the JSON file into the same format as the earlier saidump output. The saidump result is then printed to standard output.
  saidump -r $redis_dir/dump.json -m 100

  3.5. Remove the temporary files.
  rm -f $redis_dir/dump.rdb
  rm -f $redis_dir/dump.json

4. Updated sonic-buildimage/src/sonic-utilities/scripts/generate_dump to check the ASIC DB size: if it is larger than ROUTE_TAB_LIMIT_DIRECT_ITERATION entries (default 24000), use Redis SAVE; otherwise, use the old method of looping through each entry of the Redis DB.
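Item 4's size check and steps 3.1-3.5 can be combined into a small bash sketch. It is not the code from generate_dump or syncd/scripts/saidump.sh: the function names choose_dump_method and dump_via_save, their argument order, and the REDIS_CLI/RDB/SAIDUMP overrides (which let the function be exercised with stubs) are hypothetical. How generate_dump counts the ASIC DB entries is not described here, so the count is passed in as an argument.

```bash
#!/bin/bash
# Sketch of item 4 (method selection) plus steps 3.1-3.5 (SAVE-based dump).
# Hypothetical: function names, argument order, and the REDIS_CLI/RDB/SAIDUMP overrides.

ROUTE_TAB_LIMIT_DIRECT_ITERATION=${ROUTE_TAB_LIMIT_DIRECT_ITERATION:-24000}
REDIS_CLI=${REDIS_CLI:-redis-cli}
RDB=${RDB:-rdb}
SAIDUMP=${SAIDUMP:-saidump}

# Prints "save" when the entry count is larger than the limit, else "iterate".
choose_dump_method() {
    if [ "$1" -gt "$ROUTE_TAB_LIMIT_DIRECT_ITERATION" ]; then
        echo save
    else
        echo iterate
    fi
}

# Steps 3.1-3.5 for one Redis instance. The body runs in a subshell, so the
# EXIT trap (step 3.5) removes the temporary files even if a step fails.
dump_via_save() (
    hostname=$1
    port=$2
    redis_dir=$3
    trap 'rm -f "$redis_dir/dump.rdb" "$redis_dir/dump.json"' EXIT

    # 3.1 Write RDB snapshots into redis_dir. This changes the server's "dir"
    #     setting; the sketch does not restore the previous value.
    $REDIS_CLI -h "$hostname" -p "$port" CONFIG SET dir "$redis_dir" > /dev/null || exit 1
    # 3.2 Synchronous snapshot; assumes the default dbfilename, dump.rdb.
    $REDIS_CLI -h "$hostname" -p "$port" SAVE > /dev/null || exit 1
    # 3.3 Convert the RDB file to JSON with rdbtools.
    $RDB --command json "$redis_dir/dump.rdb" > "$redis_dir/dump.json" || exit 1
    # 3.4 Print the result in the saidump output format.
    $SAIDUMP -r "$redis_dir/dump.json" -m 100
)
```

A caller would run dump_via_save only when choose_dump_method returns save. SAVE is itself synchronous, so other clients still wait while the snapshot is being written.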

How to verify it
On a T2 setup with more than 96K routes, run the CLI command generate_dump.
No errors should be shown.
Download the generate_dump result and verify the saidump file after unpacking it.
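The last verification step can be partly automated. A minimal sketch, assuming the saidump output inside the generate_dump archive has "saidump" in its file name (the exact name is not given here); check_saidump_in_archive is a hypothetical helper that fails if no such file exists or if any of them is empty.

```bash
#!/bin/bash
# Hypothetical helper: unpack a generate_dump archive into a temporary
# directory and check that its saidump file(s) exist and are non-empty.
# Assumption: the saidump output files have "saidump" in their names.
check_saidump_in_archive() {
    local archive=$1 workdir f found=0 bad=0
    workdir=$(mktemp -d) || return 1
    if ! tar -xzf "$archive" -C "$workdir"; then
        rm -rf "$workdir"
        return 1
    fi
    while IFS= read -r f; do
        if [ -s "$f" ]; then
            echo "ok: ${f#"$workdir"/}"
            found=1
        else
            echo "empty: ${f#"$workdir"/}"
            bad=1
        fi
    done < <(find "$workdir" -type f -name '*saidump*')
    rm -rf "$workdir"
    # Succeed only if at least one saidump file was found and none was empty.
    [ "$found" = 1 ] && [ "$bad" = 0 ]
}
```

Pass it the .tar.gz archive produced by generate_dump; it prints one ok/empty line per matching file.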
StormLiangMS pushed a commit to sonic-net/sonic-utilities that referenced this issue Nov 19, 2023
…aidump_by_route_size (#2972)

StormLiangMS pushed a commit to sonic-net/sonic-sairedis that referenced this issue Nov 19, 2023
…file and displays/format the right output (#1288)

StormLiangMS pushed a commit to sonic-net/sonic-sairedis that referenced this issue Nov 19, 2023
…he sairedis repo (#1298)

mssonicbld pushed a commit to mssonicbld/sonic-buildimage that referenced this issue Nov 21, 2023
…rs. (sonic-net#16466)

mssonicbld pushed a commit to mssonicbld/sonic-buildimage that referenced this issue Nov 21, 2023
…rs. (sonic-net#16466)

mssonicbld pushed a commit that referenced this issue Nov 21, 2023
…rs. (#16466)

mssonicbld pushed a commit that referenced this issue Nov 21, 2023
…rs. (#16466)
