forked from sonic-net/sonic-mgmt
update 201811 from azure #39
Open: bbinxie wants to merge 219 commits into SW-CSA:201811 from sonic-net:201811.
The existing script does not restore the LAG rate setting on the VMs when it fails. This improvement restores the LAG setting on the VMs if the test fails. Signed-off-by: Xin Wang <xinw@mellanox.com>
…842) After acltb_test_rules_part_1.json is applied, BGP sessions may go down before acltb_test_rules_part_2.json (which contains the BGP ACL forward rules) is applied. This results in BGP flapping during the PTF test run. It is safer to apply the BGP ACL forward rules first to avoid BGP flapping. Signed-off-by: Stepan Blyschak <stepanb@mellanox.com>
…cted (#844) PR #831 does not fully fix the issue introduced by PR #822. Ansible's include_vars module cannot override a variable value previously defined by set_fact, so variables in vars/run_config_test_vars.yml may still have their old values. This change avoids include_vars: the variables defined in run_config_test_vars.yml are moved into the script run_command_with_log_analyzer.yml, and the vars files are deleted. The same change is made to other scripts that use the same pattern. Signed-off-by: Xin Wang <xinw@mellanox.com>
Using the apt_key module to fetch Docker's official GPG key causes a certificate validation issue. Replace the apt_key module with the 'curl' command recommended on the official Docker documentation site.
… PTF container (#836) The PTF container is destroyed when testbed-cli.sh remove-topo is executed. Running testbed-cli.sh add-topo adds a new PTF container, which usually has a new MAC address. If add-topo is executed immediately after remove-topo, the ARP tables of neighbor switches and hosts may still hold an entry for the old PTF MAC address. This causes connectivity issues to the new PTF container until the old PTF MAC address expires. As a workaround, send an ARPing from the PTF container querying mgmt_gw after the new PTF container is deployed and attached to the network. The ARPing request is broadcast to all neighbors on the same LAN and refreshes their ARP tables with the MAC address of the new PTF container. Signed-off-by: Xin Wang <xinw@mellanox.com>
Otherwise, if two systems have names where one is a prefix of the other, parsing the shorter name returns two lines. Signed-off-by: Ying Xie <ying.xie@microsoft.com>
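A minimal Python sketch of the prefix pitfall, assuming (hypothetically; the commit does not show the real format) that each parsed line starts with the system name as its first whitespace-separated field. A prefix match on `sonic-dut-1` also hits `sonic-dut-10`, while comparing the whole field returns a single line.

```python
def lines_for_prefix(lines, name):
    # Buggy approach: a prefix match also hits longer names that start with `name`.
    return [line for line in lines if line.startswith(name)]


def lines_for_exact(lines, name):
    # Safer approach: compare the whole first field, so "sonic-dut-1"
    # never matches "sonic-dut-10".
    return [line for line in lines if line.split()[:1] == [name]]


# Hypothetical inventory lines: "<system name> <management IP>".
inventory = [
    "sonic-dut-1   10.0.0.1",
    "sonic-dut-10  10.0.0.10",
]

print(len(lines_for_prefix(inventory, "sonic-dut-1")))  # 2: the bug
print(len(lines_for_exact(inventory, "sonic-dut-1")))   # 1
```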
sonic-net/sonic-utilities#504. This makes all the commands backward compatible. Signed-off-by: Shu0T1an ChenG <shuche@microsoft.com>
…#823)
* [ptf_runner] Save ptf log to script executing host in case of failure
  The PTF log and pcap files are useful for debugging when a PTF script fails. However, these files are in the PTF container and could be lost when the PTF container is re-deployed. This improvement saves the log and pcap files to the script-executing host when the PTF script fails. Signed-off-by: Xin Wang <xinw@mellanox.com>
* [ptf_runner] Add option for specifying whether to save ptf log
  The previous commit changed the default behavior. This change adds an option for specifying whether to save the ptf log in case of failure. For example: `ansible-playbook <some_test>.yml ... -e save_ptf_log=yes`
…hange (#866) PFC_WD_TABLE --> PFC_WD Signed-off-by: Ying Xie <ying.xie@microsoft.com>
Signed-off-by: Qi Luo <qiluo-msft@users.noreply.github.com>
* [fast/warm reboot] improve new image installation code
  - Allow new_sonic_image to be defined as an empty string, which skips image installation.
  - Rename new_image_location to a generic name.
  - Display the defined new image URL.
  Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* [fast/warm reboot] allow DUT to stay in the warm/fast reboot target release
  This feature is needed to test upgrade paths, where we might upgrade from one version to another, and more. We want the system to stay in the target release for the next steps. Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* Address review comments, test issues and some minor touch-ups Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* [fast/warm reboot] add knob to clean up old images on DUT before warm/fast reboot
  When a new image is specified for fast/warm reboot, the new image is installed. However, if the specified image is already installed on the target DUT, sonic_install fails and fast/warm reboot reboots into the current image. Add a knob to clean up old images so that installing the new image has a better chance to succeed. Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* address review issue
…les (#865) ntpd may generate 'ERR ntpd' in syslog and cause unnecessary test case failures. The previous PR #816 added an 'ERR ntpd' matching pattern to the loganalyzer ignore files to ignore the ntpd error messages. However, ntpd may generate error messages in two formats, and the previously added pattern matches only one of them. This change updates the pattern to match both formats. Signed-off-by: Xin Wang <xinw@mellanox.com>
* Add support for many test cases to t0-56
* Fix bgp_speaker for t0-56
…g warm reboot (#890)
* [advanced-reboot] move Arista class to a separate module Signed-off-by: Stepan Blyschak <stepanb@mellanox.com>
* [advanced-reboot] use lock to synchronize fast data plane and reachability_watcher threads Signed-off-by: Stepan Blyschak <stepanb@mellanox.com>
* [advanced-reboot] stabilize test while fast data plane send is running
  * Apply a filter on the socket before sending fast data plane IO
  * Save sniffed packets after the traffic test is done
  Signed-off-by: Stepan Blyschak <stepanb@mellanox.com>
* [advanced-reboot] refactor fast data plane generator code
  * reuse from_t1 and from_vlan_server generated packets in generate_bidirectional
  * use tcp instead of udp in generate_bidirectional
  Signed-off-by: Stepan Blyschak <stepanb@mellanox.com>
* [advanced-reboot] add space back Signed-off-by: Stepan Blyschak <stepanb@mellanox.com>
Some test setups have delay in the IO path, so the original sniff wait time does not guarantee that all 36000 packets are received. Increase the sniffing wait time by 30 seconds. Signed-off-by: Ying Xie <ying.xie@microsoft.com>
…put (#907) * Modified sensors data for S6100/Z9100 according to latest output
* [advanced-reboot] start watcher thread after initializing Event objects Signed-off-by: Stepan Blyschak <stepanb@mellanox.com>
* [advanced-reboot] improve error messages when DUT is not ready for test Signed-off-by: Stepan Blyschak <stepanb@mellanox.com>
…ot (#904) Signed-off-by: Stepan Blyschak <stepanb@mellanox.com>
```
Exception in thread Thread-2:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "ptftests/advanced-reboot.py", line 769, in peer_state_check
    self.fails[ip], self.info[ip], self.cli_info[ip], self.logs_info[ip] = ssh.run()
  File "ptftests/arista.py", line 157, in run
    log_data = self.parse_logs(log_lines)
  File "ptftests/arista.py", line 218, in parse_logs
    result_bgp, initial_time_bgp = self.extract_from_logs(bgp_r, data)
  File "ptftests/arista.py", line 205, in extract_from_logs
    raw_data.append((datetime.datetime.strptime(m.group(1), "%b %d %X"), m.group(2), m.group(3)))
AttributeError: 'module' object has no attribute '_strptime'
```
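This `AttributeError` is a known CPython 2 race: `datetime.strptime` lazily imports the `_strptime` module, and simultaneous first calls from several threads can see a half-initialized module. A common workaround, sketched here and not necessarily the fix this commit applied, is to import `_strptime` in the main thread before any worker thread starts. The `"%b %d %X"` fields come from the traceback; the timestamps and the prepended year are assumptions, since syslog stamps omit the year.

```python
import _strptime  # noqa: F401  pre-load in the main thread (CPython 2 race workaround)
import datetime
import threading

# Hypothetical Arista-style log timestamps.
STAMPS = ["Jun 26 01:17:14", "Jun 26 01:17:15", "Jun 26 01:17:16"]
YEAR = 2020  # assumption: syslog omits the year, so one is supplied explicitly


def parse_stamp(stamp, year=YEAR):
    # Same "%b %d %X" fields as ptftests/arista.py, with an explicit year prepended.
    return datetime.datetime.strptime("%d %s" % (year, stamp), "%Y %b %d %X")


def parse_all_in_threads(stamps):
    """Parse each stamp in its own thread, as concurrent peer checks would."""
    results = [None] * len(stamps)

    def worker(i, stamp):
        results[i] = parse_stamp(stamp)

    threads = [threading.Thread(target=worker, args=(i, s)) for i, s in enumerate(stamps)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```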
* [deploy] Wait for vEOS to come back after restart
* [deploy] Replace handlers with tasks to ensure execution sequence
* [deploy] Replace ping cmd with wait_for module
* [deploy] Remove unused handlers
* Improved error handling when not all interfaces are up
* Fixed PR 853.
…nout (#909) Signed-off-by: Stepan Blyschak <stepanb@mellanox.com>
* Moved image processing from advanced-reboot.yml to a separate file, reboot-image-handle.yml
* If the core-storage secret key is available, add it to /etc/sonic/core_analyzer.rc.json and enable and start the core_uploader service. If https_proxy is provided, update /etc/sonic/core_analyzer.rc.json.
* Check the entire dict path before dereferencing.
* Improved regex per comments.
* Fixed syntax error.
* Add a sample file for newly introduced ansible facts.
* Removed a redundant empty line.
Co-authored-by: Ubuntu <remanava@remanava-kube-1.hblknyhzkmnujibhxvn3dmavjb.xx.internal.cloudapp.net>
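The "check the entire dict path" item can be illustrated with a small Python helper that walks a nested dict and returns a default when any hop is missing, instead of raising `KeyError`. This is a sketch only: the key names (`secrets`, `core_storage`, `sas_key`) are hypothetical, since the commit message does not show the real structure of the Ansible facts.

```python
def get_path(data, path, default=None):
    """Walk nested dicts along `path`; return `default` if any hop is missing."""
    node = data
    for key in path:
        # Stop early if the current node is not a dict or lacks the key.
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


# Hypothetical facts: only some deployments provide the core-storage secret.
facts = {"secrets": {"core_storage": {"sas_key": "example-key"}}}
print(get_path(facts, ["secrets", "core_storage", "sas_key"]))  # example-key
print(get_path(facts, ["secrets", "https_proxy"]))              # None
```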
The default 10-second timeout is not enough for some devices to complete the reboot process. This PR increases the timeout to the reboot task timeout. Signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
Signed-off-by: Nazarii Hnydyn <nazariig@mellanox.com>
Signed-off-by: Neetha John <nejo@microsoft.com>
Make sure pgrep matches only syncd.
… image (#1722) When minigraph.xml exists, config_db.json can be generated from it. In this case, before booting into a new image, remove config_db.json from /host/old_config to force the new image to load the minigraph. This is needed when the nightly testbed moves from a higher version to a lower version. Signed-off-by: Ying Xie <ying.xie@microsoft.com>
Hence, add a check. Apparently, the fast-reboot test from an old image failed because this file was missing.
…shared buffer (#1663) * [qos] Support designating the packet size when testing water mark of shared buffer
Signed-off-by: Neetha John <nejo@microsoft.com>
If a DUT already has the target image installed, there will be no /host/old_config/config_db.json after installation. Add the -f option to ignore the file-not-found error. Signed-off-by: Ying Xie <ying.xie@microsoft.com>
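`rm -f` succeeds whether or not the file exists. A rough Python equivalent, sketched for scripts that need the same tolerance, suppresses only `FileNotFoundError`, so other failures such as permission errors still surface. The path comes from the commit message; the function name is illustrative.

```python
import contextlib
import os


def remove_if_present(path):
    """Delete `path`, ignoring only the 'file does not exist' case (like `rm -f`)."""
    with contextlib.suppress(FileNotFoundError):
        os.remove(path)


# On a DUT, the call would target the file named in the commit message:
# remove_if_present("/host/old_config/config_db.json")
```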
…llanox fanout (#1764) Signed-off-by: Volodymyr Samotiy <volodymyrs@mellanox.com>
There is a check that the PFC WD poll_time <= pfc_wd_detection/restoration_time. So, in the test script, stop PFC WD (if it is enabled by default) before setting the poll interval, because the default detection/restoration time can be less than the poll interval, which makes the script fail.
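The ordering problem can be sketched as a validation check, assuming (per the commit message) that a poll interval is rejected when it exceeds any running port's detection or restoration time. The functions and timer values are hypothetical; this models the constraint, not the SONiC CLI code.

```python
def poll_interval_ok(poll_ms, ports):
    """True if poll_ms <= every port's detection and restoration time.

    `ports` maps a port name to (detection_ms, restoration_ms).
    """
    return all(poll_ms <= det and poll_ms <= rest for det, rest in ports.values())


def can_set_poll_interval(poll_ms, ports, pfcwd_running):
    # Model assumption: once PFC WD is stopped, no per-port timers are
    # active, so there is nothing for the new poll interval to violate.
    if not pfcwd_running:
        return True
    return poll_interval_ok(poll_ms, ports)


# Hypothetical default config: PFC WD already started with short timers.
defaults = {"Ethernet0": (200, 200), "Ethernet4": (200, 200)}
print(can_set_poll_interval(400, defaults, pfcwd_running=True))   # False
print(can_set_poll_interval(400, defaults, pfcwd_running=False))  # True
```

Stopping PFC WD first, as the commit does, corresponds to the `pfcwd_running=False` branch.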
… to timeout (#1820) With the recent changes to caclmgrd, this check was inconsistent and could fail with the following error:

```
TASK [test : Ensure the SSH port on the DuT becomes closed to us] ********************************************************************************************************************
Friday 26 June 2020 01:17:14 +0000 (0:00:05.323) 0:01:29.829 ***********
fatal: [sonic-dut-1]: FAILED! => {"msg": "The conditional check ''Timeout when waiting for search string OpenSSH' not in result.msg' failed. The error was: error while evaluating conditional ('Timeout when waiting for search string OpenSSH' not in result.msg): Unable to look up a name or access an attribute in template string ({% if 'Timeout when waiting for search string OpenSSH' not in result.msg %} True {% else %} False {% endif %}).\nMake sure your variable name does not contain invalid characters like '-': argument of type 'AnsibleUndefined' is not iterable"}
```

`result` could be `AnsibleUndefined`. This change makes the logic match that of the new pytest, which is a more appropriate way to check that we can no longer SSH to the device and no longer relies on parsing an error message (see https://github.com/Azure/sonic-mgmt/blob/master/tests/cacl/test_control_plane_acl.py#L34). Also remove an unnecessary sleep, which further aligns it with the pytest version.
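The pytest approach linked above checks reachability directly instead of parsing an Ansible error string. A minimal Python sketch of that idea, assuming a plain TCP connect is an adequate probe: poll the port until connections stop succeeding or a deadline passes. `port_is_open` and `wait_for_port_closed` are illustrative names, not the helpers used in test_control_plane_acl.py.

```python
import socket
import time


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out (socket.timeout is an OSError subclass).
        return False


def wait_for_port_closed(host, port, deadline_s=30.0, interval_s=1.0):
    """Poll until (host, port) stops accepting connections; False on deadline."""
    end = time.monotonic() + deadline_s
    while time.monotonic() < end:
        if not port_is_open(host, port):
            return True
        time.sleep(interval_s)
    return False
```

A call such as `wait_for_port_closed(dut_ip, 22)` (with a hypothetical `dut_ip`) would then replace the string match on `result.msg`.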
* Fix wrong DIP on packets: BGP, SNMP, and SSH IP2ME packets could not be received on t1-lag because copp_test.py configured the wrong DIP on the packets.
* Fix comment indentation
Co-authored-by: Ying Xie <yxieca@users.noreply.github.com>
A test server may host VMs for multiple test setups. The existing tool set can start the first N VMs and remove all VMs on the server. This change adds support for starting and stopping only the subset of VMs used by a specified test setup. Signed-off-by: Xin Wang <xiwang5@microsoft.com>
Verified with Dell 6000 Platforms.
…mmand line (#1908) Add the `-f` flag to pkill so that it sends the signal to processes where "exabgp" appears anywhere in the command line. Without this flag, it only sends the signal to processes where "exabgp" is the actual file being executed, which leaves two `sh exabgp/start.sh` processes running. This change ensures that all "exabgp" processes, as well as the `sh exabgp/start.sh` processes, are stopped. Also update the comment to state precisely which signal is sent; if we need to be more forceful in the future, we could send SIGKILL instead.
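A simplified Python model of the difference, assuming procps semantics: without `-f`, `pkill`/`pgrep` match the pattern (a regular expression) against the process name only; with `-f`, they match against the full command line. The process table is hypothetical, and the model ignores details such as the 15-character limit on process names.

```python
import re

# Hypothetical process table: (process name, full command line).
PROCS = [
    ("exabgp", "exabgp /tmp/exabgp.conf"),
    ("sh", "sh exabgp/start.sh"),
    ("sshd", "/usr/sbin/sshd -D"),
]


def pkill_targets(pattern, procs, full=False):
    """Return command lines that `pkill [-f] pattern` would signal in this model."""
    regex = re.compile(pattern)
    # Without -f only the process name is searched; with -f, the whole command line.
    return [cmd for name, cmd in procs if regex.search(cmd if full else name)]


print(pkill_targets("exabgp", PROCS))             # misses the `sh exabgp/start.sh` wrapper
print(pkill_targets("exabgp", PROCS, full=True))  # includes it
```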
…ncd (#1917) Signed-off-by: Danny Allen <daall@microsoft.com>
a) Instead of using teamdctl to add/remove port-channel members, we should use the config port-channel command, because `show interface status` is not updated when teamdctl bypasses config DB.
b) One of the changes I made as part of PR #1893 was incomplete because I forgot to update the buffer size in swap_syncd.yml. Fixed now.
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
…ion (#2024) Sample old output:

```
      Interface            Lanes    Speed    MTU         Alias             Vlan    Oper    Admin             Type    Asym PFC
---------------  ---------------  -------  -----  ------------  ---------------  ------  -------  ---------------  ----------
      Ethernet0            77,78      50G   9100   Ethernet1/1            trunk    down       up  QSFP28 or later         off
```

Sample new output:

```
      Interface            Lanes    Speed    MTU    FEC         Alias             Vlan    Oper    Admin             Type    Asym PFC
---------------  ---------------  -------  -----  -----  ------------  ---------------  ------  -------  ---------------  ----------
     Ethernet48      57,58,59,60     100G   9100     rs  Ethernet13/1  PortChannel0001      up       up  QSFP28 or later         off
```

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
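One way for a test to tolerate a new column such as FEC is to locate columns by header name, using the dash row for the column boundaries, rather than by fixed position. This is a Python sketch under that assumption, not the code this commit changed: the same function reads both the old and the new sample layouts, and multi-word cells such as "QSFP28 or later" stay intact because cells are sliced by position.

```python
def parse_status_table(text):
    """Parse a tabulate-style table into a list of {header: cell} dicts.

    Assumes line 1 is the header, line 2 is the dash row, and each run of
    dashes spans exactly one column, as in the sample output.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    header, dashes, rows = lines[0], lines[1], lines[2:]

    # Find the [start, end) span of every run of dashes.
    spans, start = [], None
    for i, ch in enumerate(dashes + " "):  # trailing space closes the last run
        if ch == "-" and start is None:
            start = i
        elif ch != "-" and start is not None:
            spans.append((start, i))
            start = None

    names = [header[a:b].strip() for a, b in spans]
    return [{n: row[a:b].strip() for n, (a, b) in zip(names, spans)} for row in rows]
```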
* minigraph: Add the ability to set a per-port speed in the minigraph (#3527)
* minigraph: Add the ability to set a per-port speed in the minigraph (cherry picked from commit cef1f77)
* minigraph: Fix with_dict syntax in the playbook. That entry needs to be specified as referring to a variable. Signed-off-by: Saikrishna Arcot <saiarcot895@gmail.com>
* [topo] Add test topology for 7050QX-32S-S4Q31 (#3568) (cherry picked from commit 6d1720b)
…for populating the speed (#3584) When filling in the port speed from `device_conn`, the SONiC port name, not the alias, must be used as the key into the dict. Also, indexing by hostname isn't required for 201811, since the returned structure drops that "layer" if the hostname exists in the data structure. Signed-off-by: Saikrishna Arcot <saiarcot895@gmail.com>
Description of PR
Summary:
Fixes # (issue)
Type of change
Approach
How did you do it?
How did you verify/test it?
Any platform specific information?
Supported testbed topology if it's a new test case?
Documentation