Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test for Graceful node shutdown #8233

Merged

Conversation

avd-sagare
Copy link
Contributor

Test to verify that data is accessible and uncorrupted as well as
operational cluster and osd are up after graceful nodes shutdown

Copy link

@ocs-ci ocs-ci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR validation on existing cluster

Cluster Name: asagare-414aws
Cluster Configuration:
PR Test Suite: system_test
PR Test Path: tests/e2e/system_test/test_graceful_nodes_shutdown.py
Additional Test Params:
OCP VERSION: 4.14
OCS VERSION: 4.14
tested against branch: master

Job UNSTABLE (some or all tests failed).

Copy link

@ocs-ci ocs-ci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR validation on existing cluster

Cluster Name: asagare-414aws
Cluster Configuration:
PR Test Suite: system_test
PR Test Path: tests/e2e/system_test/test_graceful_nodes_shutdown.py
Additional Test Params:
OCP VERSION: 4.14
OCS VERSION: 4.14
tested against branch: master

Job UNSTABLE (some or all tests failed).

@avd-sagare
Copy link
Contributor Author

Blocked Validation due to https://bugzilla.redhat.com/show_bug.cgi?id=2227835

Copy link

@ocs-ci ocs-ci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR validation on existing cluster

Cluster Name: asagare-odf-hugep
Cluster Configuration:
PR Test Suite: system_test
PR Test Path: tests/e2e/system_test/test_graceful_nodes_shutdown.py
Additional Test Params:
OCP VERSION: 4.14
OCS VERSION: 4.14
tested against branch: master

Job FAILED (installation failed, tests not executed).

Copy link

@ocs-ci ocs-ci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR validation on existing cluster

Cluster Name: asagare-odf-hugepages
Cluster Configuration:
PR Test Suite: system_test
PR Test Path: tests/e2e/system_test/test_graceful_nodes_shutdown.py
Additional Test Params:
OCP VERSION: 4.14
OCS VERSION: 4.14
tested against branch: master

Job UNSTABLE (some or all tests failed).

Copy link

@ocs-ci ocs-ci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR validation on existing cluster

Cluster Name: asagare-odf-hugepages
Cluster Configuration:
PR Test Suite: system_test
PR Test Path: tests/e2e/system_test/test_graceful_nodes_shutdown.py
Additional Test Params:
OCP VERSION: 4.14
OCS VERSION: 4.14
tested against branch: master

Job UNSTABLE (some or all tests failed).

# S3 bucket
self.setup_s3_bucket(mcg_obj=mcg_obj, bucket_factory=bucket_factory)

def setup_amq_kafka_notification(self, bucket_factory):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see if you can use existing amq common function instead of writing new one here.

Comment on lines 671 to 686
# (
# "OC",
# {
# "interface": "OC",
# "namespace_policy_dict": {
# "type": "Cache",
# "ttl": 3600,
# "namespacestore_dict": {"aws": [(1, None)]},
# },
# "placement_policy": {
# "tiers": [
# {"backingStores": [constants.DEFAULT_NOOBAA_BACKINGSTORE]}
# ]
# },
# },
# ),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please do not comment the core functions, instead handle it in the test script.

from ocs_ci.helpers.helpers import default_storage_class
from ocs_ci.ocs.bucket_utils import s3_put_object, retrieve_verification_mode

# from ocs_ci.ocs.resources.fips import check_fips_enabled
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can remove the comment code

from ocs_ci.ocs.utils import get_pod_name_by_pattern
from ocs_ci.helpers.helpers import default_storage_class
from ocs_ci.ocs.bucket_utils import s3_put_object, retrieve_verification_mode

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can delete this empty line

# S3 bucket
self.setup_s3_bucket(mcg_obj=mcg_obj, bucket_factory=bucket_factory)

def setup_amq_kafka_notification(self, bucket_factory):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are not calling it in setup

tests/e2e/system_test/test_graceful_nodes_shutdown.py Outdated Show resolved Hide resolved
multi_obc_lifecycle_factory(num_of_obcs=20, bulk=True, measure=False)

# OCP Workloads
start_ocp_workload(workloads_list=["registry", "monitoring"], run_in_bg=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you will have to handle the teardown of ocp workloads too

tests/e2e/system_test/test_graceful_nodes_shutdown.py Outdated Show resolved Hide resolved
@PrasadDesala PrasadDesala added the team/e2e E2E team related issues/PRs label Sep 21, 2023
tree_output = ct_pod.exec_ceph_cmd(ceph_cmd="ceph osd tree")
logger.info("ceph osd tree output:")
logger.info(tree_output)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pull-request-size pull-request-size bot added size/L PR that changes 100-499 lines size/XXL and removed size/XL size/L PR that changes 100-499 lines labels Nov 9, 2023
@pull-request-size pull-request-size bot added size/L PR that changes 100-499 lines and removed size/XXL labels Nov 17, 2023
tests/e2e/system_test/test_graceful_nodes_shutdown.py Outdated Show resolved Hide resolved
Comment on lines 54 to 60
"""
nodes = get_nodes()
for node in nodes:
assert (
node.get()["status"]["allocatable"]["hugepages-2Mi"] == "64Mi"
), f"Huge pages is not applied on {node.name}"
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this code is not needed, please remove it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is needed. On testing cluster it was not enabled so for testing purpose its commented.


def teardown():
logger.info("cleanup the environment")
""" cleanup logging workload """
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a single line comment. pls consider adding with hash

logger.info("Logging is configured")
uninstall_cluster_logging()

""" cleanup monitoring workload """
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above


request.addfinalizer(teardown)

logger.info("Starting the test setup")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add the setup related func to a setup func?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if we are not achieving optimisation or resource saving thne its ok to be as it is.

Comment on lines 158 to 162
##################################### non encrypted pvc (ne_pvc)
create + clone
non-encrypted block/fs pvc
Returns:
pvc object
pod object
file_name
origin md5sum
snapshot
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please fix doc string format issues

Comment on lines 333 to 335
if not check_if_monitoring_stack_exists():
assert "monitoring workload not started after reboot"
if not check_if_registry_stack_exists():
assert "registry workload not started after reboot"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the workload exists does not mean that the workload is running fine without any issues after reboot

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. what additional steps you suggest to check?

Sanity().health_check(tries=60)

self.validate_data_integrity()
self.validate_snapshot_restore(snapshot_restore_factory)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where are we checking the below step-
PVC expansion is possible on the restored snapshot- validate_pvc_expansion

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

calling validate_pvc_expansion inside validate_snapshot_restore


self.validate_data_integrity()
self.validate_snapshot_restore(snapshot_restore_factory)
validate_mcg_bg_features(skip_any_features=["expiration", "rgw kafka", "nsfs"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where are checking for the below steps-

  • get operation on the bucket gets the data that was stored before shutdown
  • delete operation on the bucket can delete the data that was stored before shutdown
  • put and list operations work correctly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

S3 operations on NSFS bucket on files is not implemented yet.

Avdhoot and others added 16 commits July 22, 2024 16:01
Signed-off-by: Avdhoot <avd@Avdhoots-MacBook-Pro.local>
Signed-off-by: Avdhoot <asagare@redhat.com>
Signed-off-by: Avdhoot <avd@Avdhoots-MacBook-Pro.local>
Signed-off-by: Avdhoot <asagare@redhat.com>
Signed-off-by: Avdhoot <avd@Avdhoots-MacBook-Pro.local>
Signed-off-by: Avdhoot <asagare@redhat.com>
Signed-off-by: Avdhoot <avd@Avdhoots-MacBook-Pro.local>
Signed-off-by: Avdhoot <asagare@redhat.com>
Signed-off-by: Avdhoot <avd@Avdhoots-MacBook-Pro.local>
Signed-off-by: Avdhoot <asagare@redhat.com>
Signed-off-by: Avdhoot <avd@Avdhoots-MacBook-Pro.local>
Signed-off-by: Avdhoot <asagare@redhat.com>
Signed-off-by: Avdhoot <asagare@redhat.com>
Signed-off-by: Avdhoot <asagare@redhat.com>
Signed-off-by: Avdhoot <asagare@redhat.com>
Signed-off-by: Avdhoot <asagare@redhat.com>
Signed-off-by: Avdhoot <asagare@redhat.com>
Signed-off-by: Avdhoot <asagare@redhat.com>
Signed-off-by: Avdhoot <asagare@redhat.com>
Signed-off-by: Avdhoot <asagare@redhat.com>
Signed-off-by: Avdhoot <asagare@redhat.com>
Signed-off-by: Avdhoot <asagare@redhat.com>
@avd-sagare avd-sagare force-pushed the System_test-Shutdown_Restart branch from 39a0e2b to 4fa1e16 Compare July 22, 2024 10:32
Signed-off-by: Avdhoot <asagare@redhat.com>
@pull-request-size pull-request-size bot added size/L PR that changes 100-499 lines and removed size/XL labels Jul 22, 2024
@avd-sagare
Copy link
Contributor Author

@avd-sagare
Copy link
Contributor Author

Copy link

openshift-ci bot commented Jul 24, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: avd-sagare, mashetty330, PrasadDesala

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@PrasadDesala PrasadDesala merged commit ba411e7 into red-hat-storage:master Jul 24, 2024
5 of 6 checks passed
amr1ta pushed a commit to amr1ta/ocs-ci that referenced this pull request Jul 25, 2024
Signed-off-by: Avdhoot <asagare@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lgtm size/L PR that changes 100-499 lines team/e2e E2E team related issues/PRs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants