Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[test_system_health.py] Adjust test cases of system health #4562

Merged
merged 2 commits into from
Nov 17, 2021

Conversation

Junchao-Mellanox
Copy link
Contributor

@Junchao-Mellanox Junchao-Mellanox commented Oct 26, 2021

Depends on sonic-net/sonic-buildimage#9068

Change-Id: I01710a47112b483fbedc5d0ff83b70bed5c44c70

Description of PR

Summary:
Command monit summary -B can no longer display the status for each critical process, system-health should not depend on it and need find a way to monitor the status of critical processes. The PR is to adjust the test case and cover the change.

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • Test case(new/improvement)

Back port request

  • 201911

Approach

What is the motivation for this PR?

Command monit summary -B can no longer display the status for each critical process, system-health should not depend on it and need find a way to monitor the status of critical processes. The PR is to adjust the test case and cover the change.

How did you do it?

  1. Adjust test case to check critical process without monit
  2. Stop a critical process and check system-health status

How did you verify/test it?

Run the regression

Any platform specific information?

N/A

Supported testbed topology if it's a new test case?

N/A

Documentation

Change-Id: I01710a47112b483fbedc5d0ff83b70bed5c44c70
Change-Id: Idd1ca2133593c45ce18de0e82a40281ab557561d
@wangxin
Copy link
Collaborator

wangxin commented Oct 27, 2021

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@yxieca
Copy link
Collaborator

yxieca commented Oct 27, 2021

@lguohan should we fix "monit summary -B"?

@Junchao-Mellanox
Copy link
Contributor Author

@lguohan should we fix "monit summary -B"?

Hi @yxieca , I am adding you to a mail thread about this topic.

@qiluo-msft
Copy link
Contributor

qiluo-msft commented Oct 28, 2021

import json

Should we add this test case into tests/kvmtest.sh ?


Refers to: tests/system_health/test_system_health.py:1 in 51e4f99. [](commit_id = 51e4f99, deletion_comment = False)


critical_process = random.sample(running_critical_process, 1)[0]
with ProcessExitContext(duthost, container, critical_process):
time.sleep(DEFAULT_INTERVAL)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DEFAULT_INTERVAL

What is the expected max time for redis table to take effect? Is 60s too long?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

System-health poll critical process status every 60 seconds, so it has to wait 60 second here.

@@ -348,3 +369,18 @@ def __exit__(self, exc_type, exc_val, exc_tb):
:return:
"""
self.dut.command('mv -f {} {}'.format(self.backup_config, self.origin_config))


class ProcessExitContext:
Copy link
Contributor

@qiluo-msft qiluo-msft Oct 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ProcessExitContext

Super concise and useful, thanks! #Closed

@liat-grozovik
Copy link
Collaborator

@qiluo-msft could you please approve and sign off?

@wangxin wangxin merged commit 69daa5f into sonic-net:master Nov 17, 2021
AntonHryshchuk pushed a commit to AntonHryshchuk/sonic-mgmt that referenced this pull request Jan 4, 2022
…#4562)

What is the motivation for this PR?
Command monit summary -B can no longer display the status for each critical process, system-health should not depend on it and need find a way to monitor the status of critical processes. The PR is to adjust the test case and cover the change.

How did you do it?
Adjust test case to check critical process without monit
Stop a critical process and check system-health status

How did you verify/test it?
Run the regression
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants