Stop ptf container before removing it (#6131)
What is the motivation for this PR?
The testbed-cli.sh script supports an operation that restarts
the ptf docker container. In our lab, we ran into an issue where the test
server got stuck with the error below after the ptf docker container
was removed:

Aug 9 04:25:58 server1 kernel: [329566.985211] watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [swapper/1:0]

When the ptf docker container is removed, we use the "force_kill" option. It is not
clear whether this is too abrupt and causes the issue on the server. The issue
happens about once a week. It is hard to reproduce and very disruptive.

How did you do it?
This change adds a task that stops the ptf container before removing it. We need to
observe for some time to see whether this change fixes the issue.
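
The stop-then-remove sequence described above can be sketched as two Ansible tasks. This is a minimal illustration, not the committed code: the `stop_timeout` value, the `when` guard on the stop task, and the `state: absent` and `force_kill` lines on the removal task are assumptions, since the diff does not show them.

```yaml
# Sketch only: values marked "assumed" are illustrative and do not come from the commit.
- name: Stop ptf container ptf_{{ vm_set_name }}
  docker_container:
    name: ptf_{{ vm_set_name }}
    state: stopped
    stop_timeout: 30          # assumed grace period (seconds) before SIGKILL is sent
  become: yes
  when: ptf_docker_info.exists  # assumed guard, reusing the condition seen in the diff context

- name: Remove ptf container ptf_{{ vm_set_name }}
  docker_container:
    name: ptf_{{ vm_set_name }}
    state: absent             # assumed; the removal task body is truncated in the diff
    force_kill: yes           # assumed from the "force_kill" option mentioned in the description
  become: yes
```

Stopping first gives the container's processes a chance to exit on SIGTERM, so the later removal acts on an already stopped container instead of killing a running one.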

How did you verify/test it?
Tested using "testbed-cli.sh restart-ptf".

Signed-off-by: Xin Wang <xiwang5@microsoft.com>
wangxin authored Aug 11, 2022
1 parent 85c02f1 commit 2e40f50
Showing 1 changed file with 6 additions and 0 deletions.
6 changes: 6 additions & 0 deletions ansible/roles/vm_set/tasks/renumber_topo.yml
@@ -50,6 +50,12 @@
echo "-----------------------------" >> /tmp/ptf_network_{{ vm_set_name }}.log
when: ptf_docker_info.exists

- name: Stop ptf container ptf_{{ vm_set_name }}
  docker_container:
    name: ptf_{{ vm_set_name }}
    state: stopped
  become: yes

- name: Remove ptf container ptf_{{ vm_set_name }}
  docker_container:
    name: ptf_{{ vm_set_name }}