forked from sonic-net/sonic-buildimage
201911 #205
Open
bbinxie
wants to merge
806
commits into
SONIC-DEV:201911
from
sonic-net:201911
See the error below:

+ sudo https_proxy= LANG=C chroot ./fsroot easy_install pip==20.3.3
Searching for pip==20.3.3
Reading https://pypi.python.org/simple/pip/
Couldn't find index page for 'pip' (maybe misspelled?)
Scanning index of all packages (this may take a while)
Reading https://pypi.python.org/simple/
No local packages or working download links found for pip==20.3.3
error: Could not find suitable distribution for Requirement.parse('pip==20.3.3')

How I fixed it:
- Install python-pip via apt-get
- Pin the version to 20.3.3

Master has the same changes.

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
209b7ddec109587ddeb90071ca23ae6a288b1442 (HEAD -> 201911, origin/201911) Fixed the possibility of using uninitialized variable in route_check.py (#1551)
e30387cbebaaccbf9385059b1e501955c40be338 route_check: Fix hanging & logging level (#1520)
3c8de6950615a4608a80e3d47ea678f8e8487186 Add self timeout and crash if exceeded. (#1502)
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
Fix show interface status Ethernet* (#1559) Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
Fix Bad Merge Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
…tainers. (#7340)
Signed-off-by: Yong Zhao yozhao@microsoft.com
Why I did it
This PR aims to monitor critical processes in the router advertiser and dhcp_relay containers by Monit.
How I did it
The router advertiser container only runs on a T0 device, and the T0 device must have at least one VLAN interface configured with an IPv6 address. In addition, the router advertiser container does not run on devices whose deployment type is 8. As such, I created a service which dynamically generates the Monit configuration file of the router advertiser from a template. Similarly, the Monit configuration file of dhcp_relay is also generated from a template, since the number of dhcrelay processes in the dhcp_relay container depends on the number of VLANs.
How to verify it
I verified this implementation on a DuT.
…o poll mode (#7334)
#### Why I did it
- An xcvrd crash was seen in the latest 201811 images.
- For Dell S6100, API 2.0 uses poll mode while 1.0 was still using interrupt mode.
#### How I did it
- Modified get_transceiver_change_event in 1.0 to poll mode in all the related branches.

Backport of #7309 to the 201911 branch
…dhcp_relay (#7378)
#### Why I did it
Since there will be multiple `dhcrelay` processes if there are different VLANs in the table `VLAN_INTERFACE` of `CONFIG_DB`, we should use a unique service name for each `dhcrelay` process in the Monit configuration file. Otherwise, the Monit service will fail to work.
#### How I did it
I append the VLAN name to the end of each service name so that the names are unique.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
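The naming idea can be illustrated with a minimal Python sketch. It is not the template code from this commit: the function name `monit_service_names`, the base name `dhcp_relay|dhcrelay`, and the underscore separator are all assumptions chosen for illustration.

```python
def monit_service_names(vlan_names, base="dhcp_relay|dhcrelay"):
    """Return one service name per VLAN by appending the VLAN name.

    `base` and the "_" separator are hypothetical; the commit only says
    that the VLAN name is appended to the end of each service name.
    """
    return ["{}_{}".format(base, vlan) for vlan in sorted(vlan_names)]


names = monit_service_names(["Vlan2000", "Vlan1000"])
# Every generated name is distinct as long as the VLAN names are distinct.
assert len(names) == len(set(names))
```

Because VLAN names are unique within the table, appending them is enough to keep the generated service names from colliding.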
[techsupport] Update show ip interface command (#1562) Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
The issue is that get-pip.py has moved to pip 21.1 (https://github.com/pypa/get-pip/commits/main), which is not compatible with Python 3.6. The pip issue itself is fixed in 21.1.1 by the pip community (pypa/pip#9835). However, get-pip.py has not yet been updated to the latest pip. Also, get-pip.py does not explicitly support Python 3.6 (pypa/get-pip#88).

Step 15/29 : RUN curl https://bootstrap.pypa.io/get-pip.py | python3.6
---> Running in bece31f49267
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 1891k 100 1891k 0 0 9564k 0 --:--:-- --:--:-- --:--:-- 9600k
Traceback (most recent call last):
  File "<stdin>", line 24298, in <module>
  File "<stdin>", line 139, in main
  File "<stdin>", line 115, in bootstrap
  File "<stdin>", line 96, in monkeypatch_for_cert
  File "/tmp/tmp5fnxrz0a/pip.zip/pip/_internal/commands/__init__.py", line 9, in <module>
  File "/tmp/tmp5fnxrz0a/pip.zip/pip/_internal/cli/base_command.py", line 12, in <module>
  File "/tmp/tmp5fnxrz0a/pip.zip/pip/_internal/cli/cmdoptions.py", line 30, in <module>
  File "/tmp/tmp5fnxrz0a/pip.zip/pip/_internal/utils/hashes.py", line 2, in <module>
ImportError: cannot import name 'NoReturn'
The command '/bin/sh -c curl https://bootstrap.pypa.io/get-pip.py | python3.6' returned a non-zero code: 1

How I did it:
Got the file from https://github.com/pypa/get-pip/tree/21.0 and added it to the buildimage to pin pip to the previous release, 21.0.1. (Something similar is done in other public repos, e.g. grpc/grpc-java#8115.)

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
New features and fixes in the new SDK/FW:
SN4600C | AN/LT support
SN2700 | AN/LT bug fixes
WJH | FID_MISS support
Signed-off-by: Kebo Liu <kebol@nvidia.com>
Signed-off-by: Yong Zhao yozhao@microsoft.com
Why I did it
This PR aims to monitor the critical processes in the PMon container by Monit in the 201911 branch.
How I did it
I created a Monit configuration template; the service generate_monit_config.service renders it to generate the Monit configuration file of the PMon container.
How to verify it
I verified this on a Mellanox device str-msn2700-03 and an Arista device str-a7050-acs-1.
Which release branch to backport (provide reason below if selected)
201811 [x ] 201911 202006 202012
- Fix ACL ANY debug counter to correctly track ACL drops
- Add VXLAN source port hard coded range, controlled by K/V
Signed-off-by: Dror Prital <drorp@nvidia.com>
…ly (#7501) Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
20e1589 [Mellanox] [201911] backport kernel patches for hw-management 7.0100.2303 (#210)
* Set monitoring VLAN hostif up by default (for VNET ping tool) Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>
- Update hw-mgmt pointer
- Remove unused patches
- Fix existing patch to make sure it applies successfully
…ollect process (#7308) Recently, we found that on some of our testbeds the entropy collection process finishes more than 60 seconds after the system has started. As a result, swss sporadically fails to start. Installing haveged accelerates the entropy collection process. Signed-off-by: Stephen Sun <stephens@nvidia.com>
Enable VXLAN src port range configuration via SAI profile
[201911]: add show bgp neigh/network support for multi asic (#1587) Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
Why I did it
- Added soft-reboot plugin support.
- Added SSD version s16425cq check.
- Added an error message to display in console/SSH in case reboot is called on faulty/non-upgraded devices.
dd01491e4d167993b3a80517f737188151443a75 (HEAD -> 201911, origin/201911) [Monitor Vlan] Fix a typo in hostif (#1722) Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
d898b03e4ec91f964f0e1fcba535ea33a78c838e (HEAD -> 201911, origin/201911) Create mappings using existing tunnel (#1593) Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
…7536)
#### Why I did it
MSN4700 A1/A0 use different sensor chips but keep the existing platform name *x86_64-mlnx_msn4700-r0*; this is a workaround to replace the sensor conf on MSN4700 A1/A0.
#### How I did it
Use a shell script to get the sensor conf path and copy that file to /etc/sensors.d/sensors.conf
Signed-off-by: Yong Zhao yozhao@microsoft.com
Why I did it
The service file generate_monit_config.service is used to generate the Monit configuration file from a template. This service file also needs to be installed and enabled.
How I did it
I appended this service file name to the end of /etc/sonic/generated_services.conf.
How to verify it
I verified this on the device str2-7260cx3-acs-1.
Which release branch to backport (provide reason below if selected)
201811 [x ] 201911 202006 202012
Upgrade systemd to fix timer elapsed issue.

#### Why I did it
On the 201911 release, snmp.timer enters the elapsed state and snmp.service is not triggered by snmp.timer:

● snmp.service - SNMP container
   Loaded: loaded (/usr/lib/systemd/system/snmp.service; static; vendor preset: enabled)
   Active: inactive (dead)

● snmp.timer - Delays snmp container until SONiC has started
   Loaded: loaded (/usr/lib/systemd/system/snmp.timer; enabled; vendor preset: enabled)
   Active: active (elapsed) since Wed 2022-08-03 18:12:59 UTC; 2 months 17 days ago

This issue is caused by a systemd bug: https://github.com/systemd/systemd/pull/10778/files

The issue can be reproduced with the following steps:
1. Reboot the system.
2. Continuously run the following commands until the timer elapses:
   systemctl status snmp.timer
   sudo systemctl daemon-reload

#### How I did it
Install the latest systemd version from the official backport source.

#### How to verify it
Pass all test cases. Manually run the reproduction steps and verify that the issue is fixed.

#### Description for the changelog
Upgrade systemd to fix timer elapsed issue.
…#12158)
* Add Celestica Silverstone-X platform deb dependency files
* Optimized Celestica Silverstone-X platform deb dependency files indentation
Modified the skip check to use greater than or equal to, where it previously used equal to
Signed-off-by: Prince George <prgeor@microsoft.com>
* Fix to improve hostname handling: if config_db.json is missing the hostname entry, hostname-config.sh ends up deleting the existing entry too, and the hostname changes to the default 'localhost'
* Default the hostname to 'sonic' if it is missing from the config file
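The fallback behaviour can be sketched in Python as follows. The actual change lives in the shell script hostname-config.sh; the function `resolve_hostname` is a hypothetical illustration, and the `DEVICE_METADATA` / `localhost` / `hostname` lookup path is assumed from the usual config_db.json layout rather than taken from this commit.

```python
DEFAULT_HOSTNAME = "sonic"


def resolve_hostname(config):
    """Return the configured hostname, or 'sonic' when the entry is missing.

    `config` is the parsed config_db.json content as a dict. The lookup
    path below is an assumption about the file layout.
    """
    metadata = config.get("DEVICE_METADATA", {}).get("localhost", {})
    # An empty string is treated the same as a missing entry.
    return metadata.get("hostname") or DEFAULT_HOSTNAME
```

With this shape, a config file without a hostname entry yields a stable default instead of wiping the existing value.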
#### Why I did it
The GPG key used for Jessie's official repos has since expired, which means building 201911 images no longer works.
#### How I did it
Fake the time to be before the expiry date.
* Create Vxlan and Vnet default configs
- Why I did it
Added BIOS upgrade infra
- How I did it
Added new make target
- How to verify it
Copy msn3800_bios.tar.gz to platform/mellanox/bios
make configure PLATFORM=mellanox
make target/files/stretch/msn3800_bios.tar.gz
Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
#### Why I did it
To allow SSH connections from IPv6 addresses.
Resolves #7668
#### How I did it
In build_debian.sh, modify the sshd_config file to enable listening for IPv6 connections.
* Fix a typo introduced as part of #13403
Why I did it
docker.com's GPG key is valid starting from 2023-02-23, while debian.org's GPG key expired in 2022-11. We used a workaround for the security check of Debian GPG keys; now we need to exclude docker.com's GPG key from it.
How I did it
Update docker.com's GPG key without faketime. Update the other GPG keys with faketime '2022-11'.
How to verify it
Change to use the snapshot mirror http://packages.trafficmanager.net/snapshot. Warning: The Jessie distribution is EOL; please avoid using it if you can. The snapshot mirror will also be removed in the near future.
Why I did it
Some products might experience an occasional IO failure in the communication between the CPU and the SSD. Based on some research, it could be attributable to some devices not handling ATA NCQ (Native Command Queuing) correctly.
This issue currently affects the following products:
- DCS-7170-32C*
- DCS-7170-64C
- DCS-7060DX4-32
- DCS-7260CX3-64
- DCS-7050CX3-32S
How I did it
This change disables NCQ on the affected drive for a small set of products.
How to verify it
When the fix is applied, these 2 patterns can be found in dmesg:
ata[0-9]+.00: FORCE: horkage modified (noncq)
NCQ (not used)
Test results using:
fio --direct=1 --rw=randrw --bs=64k --ioengine=libaio --iodepth=64 --runtime=120 --numjobs=4
with NCQ (ata1.00: 61865984 sectors, multi 1: LBA48 NCQ (depth 32), AA)
READ: bw=33.9MiB/s (35.6MB/s), 33.9MiB/s-33.9MiB/s (35.6MB/s-35.6MB/s), io=4073MiB (4270MB), run=120078-120078msec
WRITE: bw=34.1MiB/s (35.8MB/s), 34.1MiB/s-34.1MiB/s (35.8MB/s-35.8MB/s), io=4100MiB (4300MB), run=120078-120078msec
without NCQ (ata1.00: 61865984 sectors, multi 1: LBA48 NCQ (not used))
READ: bw=31.7MiB/s (33.3MB/s), 31.7MiB/s-31.7MiB/s (33.3MB/s-33.3MB/s), io=3808MiB (3993MB), run=120083-120083msec
WRITE: bw=31.9MiB/s (33.4MB/s), 31.9MiB/s-31.9MiB/s (33.4MB/s-33.4MB/s), io=3830MiB (4016MB), run=120083-120083msec
Which release branch to backport (provide reason below if selected)
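The verification step above can be sketched as a small Python check over captured dmesg text. This is only an illustration: the function name `ncq_disabled` is hypothetical, and the patterns are the two quoted above with the literal dot and parentheses escaped.

```python
import re

# Patterns taken from the verification notes; "." and "(" ")" are escaped
# so that they match literally.
FORCE_PATTERN = re.compile(r"ata[0-9]+\.00: FORCE: horkage modified \(noncq\)")
NOT_USED_PATTERN = re.compile(r"NCQ \(not used\)")


def ncq_disabled(dmesg_text):
    """Return True only when both expected patterns appear in the dmesg text."""
    return bool(FORCE_PATTERN.search(dmesg_text)) and bool(
        NOT_USED_PATTERN.search(dmesg_text)
    )
```

The text passed in would typically be the captured output of the `dmesg` command on the device under test.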
… of squashfs (#14270) 202211 and above use a different squashfs compression type that the 201911 kernel cannot handle. Therefore, with this change we avoid mounting squashfs altogether.
Upgrade BRCM SAI to Debian package SAI 3.7.6.1-3.
[Build] Fix the stretch/jessie mirror removed issue.
ISSU version check fails due to inability to mount squashfs from 202211 on 201911
This PR makes two changes: - Store Jinja2 cache in LOGLEVEL DB instead of STATE DB - Store bytecode cache encoded in base64 Tested with the following command: "redis-dump -d 3 -k JINJA2_CACHE" Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
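As a rough illustration of the second change, the sketch below round-trips binary bytecode through base64 so that it can be stored as plain text. It does not use Jinja2 or Redis: the dict `fake_db` merely stands in for the database, the helper names are hypothetical, and the key name is borrowed from the test command above.

```python
import base64


def encode_bytecode(raw):
    """Encode binary bytecode as ASCII text suitable for a string-valued store."""
    return base64.b64encode(raw).decode("ascii")


def decode_bytecode(text):
    """Recover the original bytes from the base64 text."""
    return base64.b64decode(text.encode("ascii"))


# A plain dict stands in for the database in this sketch.
fake_db = {}
payload = bytes([0x00, 0xFF, 0x6A, 0x32, 0x0A])  # arbitrary binary data
fake_db["JINJA2_CACHE"] = encode_bytecode(payload)
assert decode_bytecode(fake_db["JINJA2_CACHE"]) == payload
```

A likely motivation, though the PR does not state it, is that base64 text survives storage layers and dump tools that do not handle arbitrary binary values cleanly.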
Why I did it
Fix #16086: the faketime package URL expired, which breaks the 201911 build. Update the package URL.
Work item tracking
Microsoft ADO (number only): 24930879
… script (#16393) Monit changes to enable script to monitor SAI_PORT_STAT_IF_IN_ERRORS & SAI_PORT_STAT_IF_OUT_ERRORS on internal (backend) ports of multi-asic device.
Why I did it
Backport #6478 and #6519 to the 201911 branch.
Work item tracking
Microsoft ADO (number only): 24978836
How I did it
Add a check of the connection between zebra and bgp during bgpd start.
How to verify it
Modify start.h, add debug logs, and check the syslog.

Sep 22 02:41:29.716356 str-a7060cx-acs-10 INFO bgp#root: ####: start zebra
Sep 22 02:41:30.815341 str-a7060cx-acs-10 INFO bgp#root: ####: start check connection
Sep 22 02:41:30.868784 str-a7060cx-acs-10 INFO bgp#root: ####: It took 0.029979 seconds to wait for zebra to be ready to accept connections
Sep 22 02:41:30.873685 str-a7060cx-acs-10 INFO bgp#root: ####: start bgpd
Sep 22 02:41:35.270569 str-a7060cx-acs-10 INFO bgp#root: ####: done

Sep 22 03:28:02.423438 str-a7060cx-acs-10 INFO bgp#root: ####: start zebra
Sep 22 03:28:03.731320 str-a7060cx-acs-10 INFO bgp#root: ####: start check connection
Sep 22 03:28:33.749152 str-a7060cx-acs-10 INFO bgp#root: ####: Error: zebra is not ready to accept connections
Sep 22 03:28:33.752490 str-a7060cx-acs-10 INFO bgp#root: ####: start bgpd
Sep 22 03:28:34.259735 str-a7060cx-acs-10 INFO bgp#root: ####: start bgpd done
Sep 22 03:28:34.755538 str-a7060cx-acs-10 INFO bgp#root: ####: start bgpcfgd
Sep 22 03:28:35.800906 str-a7060cx-acs-10 INFO bgp#root: ####: done
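The wait-then-proceed pattern visible in these logs can be sketched in Python. This is not the code from the backported PRs: the function `wait_for_listener` is hypothetical, probing a TCP port is an assumption about how readiness is detected, and the 30 second default is inferred from the gap between the "start check connection" and "Error" log lines in the second run.

```python
import socket
import time


def wait_for_listener(host, port, timeout=30.0, interval=0.1):
    """Poll until something accepts TCP connections on (host, port).

    Returns the number of seconds waited on success, or None if the
    timeout expires first. The caller decides whether to continue anyway,
    as the second log excerpt does.
    """
    start = time.monotonic()
    deadline = start + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError while nothing is listening.
            with socket.create_connection((host, port), timeout=interval):
                return time.monotonic() - start
        except OSError:
            time.sleep(interval)
    return None
```

A caller would log the elapsed time on success and an error on `None`, then start the dependent daemon in either case, which matches the two log sequences shown.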
…16907)
Fix a Monit false alarm in process_checker, which missed the "disk-sleep" status check; as a result, some 201911 SONiC boxes sporadically report a "pmon|sensord" error.

#### Why I did it
Currently the psutil library returns the following process statuses:
- running: The process is currently running.
- sleeping: The process is sleeping or waiting for an event to occur.
- disk-sleep: The process is waiting for I/O operations to complete.
- stopped: The process has been stopped (e.g. via the SIGSTOP signal).
- zombie: The process has terminated but is still listed in the process table.
- dead: The process has terminated and has been removed from the process table.

We should treat running/sleeping/disk-sleep as normal and not raise an alert in Monit. Currently, when disk-sleep occurs during a Monit cycle, the syslog below is emitted and paged, so this change also gets rid of that syslog output.

syslog.2.gz:Feb 24 06:12:17.394619 MEL23-0101-0301-04T1 ERR monit[6040]: 'pmon|sensord' status failed (1) -- '/usr/sbin/sensord -f daemon' is not running in host
syslog.2.gz:Feb 24 06:13:17.932531 MEL23-0101-0301-04T1 ERR monit[6040]: 'pmon|sensord' status failed (1) -- '/usr/sbin/sensord -f daemon' is not running in host
syslog.2.gz:Feb 24 06:14:18.502505 MEL23-0101-0301-04T1 ERR monit[6040]: 'pmon|sensord' status failed (1) -- '/usr/sbin/sensord -f daemon' is not running in host

I then tried to reproduce the issue by triggering process_checker for sensord frequently and observed that the process is in the "disk-sleep" status whenever the alert is raised.

##### Work item tracking
- Microsoft ADO **(number only)**: 17663589

#### How I did it
Fix the process_checker script by adding handling for the "disk-sleep" case.

#### How to verify it
Verified on a local DUT.
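The status rule described in this commit can be sketched as below. It is an illustration, not the process_checker code itself: the helper `is_process_healthy` is hypothetical and works on plain status strings, so it runs without psutil installed.

```python
# Statuses that should not trigger a Monit alert, per the description above.
HEALTHY_STATUSES = frozenset(["running", "sleeping", "disk-sleep"])


def is_process_healthy(status):
    """Return True for statuses that count as normal operation."""
    return status in HEALTHY_STATUSES


# "disk-sleep" is the case the original check missed.
assert is_process_healthy("disk-sleep")
assert not is_process_healthy("zombie")
```

With psutil available, the string would come from `psutil.Process(pid).status()`, whose return values include the six statuses listed in the commit message.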
8b9cab7 2023-10-26 [201911] Fix IfHighSpeed UT issue on 201911 (#299)
622b771 2023-10-13 | Fix backup port rfc2863 UT to 202012 branch issue (#298) [Hua Liu]
fa94798 2023-10-11 | Add ifhighspeed UT (#296) [Hua Liu]
41789ca 2023-09-14 | Support interface speed for PortChannels (#262) [Lukas Stockner]
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>