Improved error handling when not all Interfaces are up #853

andriymoroz-mlnx · 2019-04-02T10:58:52Z

Recreated PR #815, which was merged and then reverted.

Description of PR

Summary:
When a test that calls interface.yml finds that some Interfaces/PortChannels are not up, it no longer just fails: it also tries to gather interface status from the Fanout and VMs.

Type of change

[] Bug fix
Testbed and Framework(new/improvement)
Test case(new/improvement)

Approach

How did you do it?

Modified interface.yml to include check_testbed_interfaces.yml when the 'Verify interfaces are up' step fails.
check_testbed_interfaces.yml gathers interface status from:

DUT
Fanout (for MLNX only, filtered by check_interfaces_status tag)
Testbed server (by calling testbed_vm_status.yml playbook)
relevant VMs

The testbed_vm_status.yml playbook connects to the Testbed server and gathers VM status.
show_int_portchannel_status.j2 connects to each relevant VM and gathers Port-Channel status.
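
The flow described above could be expressed with an Ansible block/rescue pair. This is only a minimal sketch, not the code from this PR: the variable ansible_interface_link_down_ports, the rescue task names, and the use of include_tasks (older Ansible releases would use include) are assumptions made for illustration.

```yaml
# Sketch only. The variable name and the rescue task names are assumptions,
# not taken from this PR.
- block:
    - name: Verify interfaces are up
      assert:
        that:
          - ansible_interface_link_down_ports | length == 0

  rescue:
    # Runs only when a task in the block above fails.
    - name: Gather interface status from DUT, Fanout, Testbed server and VMs
      include_tasks: check_testbed_interfaces.yml

    # Fail explicitly so the calling test still fails as before.
    - name: Fail after collecting diagnostics
      fail:
        msg: "Not all Interfaces are up"
```

Failing again at the end of the rescue section keeps the behaviour the description promises: the test still fails, but only after the extra status has been collected.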

How did you verify/test it?

Ran tests that call interface.yml in the following cases:

Normal flow, when all interfaces are up: nothing changes.
Fanout ports deliberately brought down: the test fails as usual, but also gathers interface status from the Fanout.
Port-Channel on a relevant VM deliberately brought down: the test fails as usual, but also gathers VM status from the Testbed server and Port-Channel status from the VMs.

Any platform specific information?

The current implementation can log in to the Fanout switch only if the Fanout is of MLNX type.
The 'Check Fanout interfaces' step in check_testbed_interfaces.yml calls fanout.yml (in effect, the fanout role) with the tag 'check_interfaces_status'.
To support another Fanout type, modify roles/fanout/tasks/main.yml to execute a Fanout-specific step with:
when: peer_hwsku == "<specific_fanout_type>"
tags: check_interfaces_status

Example:

 ###################################################################
 # Check Fanout interfaces status                                  #
 ###################################################################
- block:
  - name: Check Fanout interfaces status
    action: apswitch template=roles/fanout/templates/mlnx_interfaces_status.j2
    connection: switch
    register: fanout_interfaces
    args:
      login: "{{ switch_login['MLNX-OS'] }}"

  - debug:
      msg: "{{ fanout_interfaces.stdout.split('\n') }}"

  when: peer_hwsku == "MLNX-OS"
  tags: check_interfaces_status

Supported testbed topology if it's a new test case?

Documentation

qiluo-msft · 2019-04-03T15:58:53Z

What has changed relative to #815?
Is it possible to provide an incremental commit on top of the old commit?

liat-grozovik · 2019-04-10T13:36:43Z

The original commit had review comments, but Roman was not able to update it, so he created #815, which was merged and then reverted (wrongly, as I thought it also required the first one).
Andriy has now put things in order and recreated #815, which is exactly the same as this one.

liat-grozovik · 2019-05-02T12:47:01Z

This PR was verified on top of sonic-mgmt master today and found to be working fine.
Extra information is now collected when port channels are down, to allow offline debugging.

Improved error handling when not all Interfaces are up

ba022c7

Fixed PR 853.

ea0ff3b

liat-grozovik approved these changes May 22, 2019

liat-grozovik added the Request_for_201811_branch label May 22, 2019

liat-grozovik requested review from pavel-shirshov and maggiemsft May 22, 2019 08:11

pavel-shirshov approved these changes May 24, 2019

liat-grozovik merged commit 71b6c80 into sonic-net:master May 26, 2019

yxieca pushed a commit that referenced this pull request May 28, 2019

Improved error handling when not all Interfaces are up (#853)

7c4b00c

* Improved error handling when not all Interfaces are up * Fixed PR 853.

yxieca added the Included in 201811 branch label May 28, 2019
