Add smoke test framework for opensearch bundle #5185

zelinh · 2024-11-14T01:56:50Z

Description

Add smoke test framework for opensearch bundle

The smoke test workflow can be started with either local bundle artifacts or artifacts from CI.
Example command:
./test.sh smoke-test manifests/2.18.0/opensearch-2.18.0-test.yml --paths opensearch=/test/tar
or
./test.sh smoke-test manifests/2.18.0/opensearch-2.18.0-test.yml --paths opensearch=https://ci.opensearch.org/ci/dbc/distribution-build-opensearch/2.18.0/10479/linux/x64/tar/
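
For illustration, here is a minimal sketch of how a `--paths` value like the two above could be classified as a local directory or a remote URL. The helper name `classify_path_argument` is hypothetical and does not come from this PR; the actual workflow may resolve paths differently.

```python
from urllib.parse import urlparse


def classify_path_argument(argument: str) -> tuple:
    """Split 'name=location' and report whether the location is remote or local.

    Hypothetical helper for illustration only; not part of the PR.
    """
    name, separator, location = argument.partition("=")
    if not separator or not name or not location:
        raise ValueError(f"expected name=location, got {argument!r}")
    # Anything with an http(s) scheme is treated as a CI URL, the rest as a local path.
    scheme = urlparse(location).scheme
    kind = "remote" if scheme in ("http", "https") else "local"
    return name, kind, location
```

Under these assumptions, `opensearch=/test/tar` is classified as local and the ci.opensearch.org URL as remote.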

The essential function of this smoke test workflow is to deploy a staging bundle cluster using the distribution classes from the integ test workflow, run some basic API requests, and validate each response schema against the opensearch-api-specification.

Issues Resolved

#5164

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Zelin Hao <zelinhao@amazon.com>

codecov · 2024-11-14T02:00:49Z

Codecov Report

Attention: Patch coverage is 81.81818% with 38 lines in your changes missing coverage. Please review.

Project coverage is 91.81%. Comparing base (c56153c) to head (f568b7a).
Report is 13 commits behind head on main.

Files with missing lines	Patch %	Lines
src/run_smoke_test.py	0.00%	16 Missing ⚠️
src/test_workflow/smoke_test/smoke_test_runners.py	0.00%	9 Missing ⚠️
...orkflow/smoke_test/smoke_test_runner_opensearch.py	89.83%	6 Missing ⚠️
...rkflow/smoke_test/smoke_test_cluster_opensearch.py	94.11%	4 Missing ⚠️
src/test_workflow/smoke_test/smoke_test_runner.py	94.73%	3 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #5185      +/-   ##
==========================================
- Coverage   92.12%   91.81%   -0.31%     
==========================================
  Files         197      202       +5     
  Lines        6817     7026     +209     
==========================================
+ Hits         6280     6451     +171     
- Misses        537      575      +38

zelinh · 2024-11-14T02:05:18Z

Looking into these check failures.

Signed-off-by: Zelin Hao <zelinhao@amazon.com>

Signed-off-by: Peter Zhu <zhujiaxi@amazon.com>

peterzhuamazon · 2024-11-14T21:37:06Z

Seems like a bunch of test cases are missing.

zelinh · 2024-11-14T21:45:05Z

Seems like a bunch of test cases are missing.

Not sure why it's not updating. I added the respective tests after this comment, but it still shows 0%.

peterzhuamazon · 2024-11-15T02:14:01Z

Pending #5187

peterzhuamazon · 2024-11-15T18:41:16Z

Ready for review.

zelinh · 2024-11-15T22:20:44Z

Ready for review.

Thanks for the help fixing the container.

zelinh · 2024-11-15T22:56:59Z

Tested on this docker image opensearchstaging/ci-runner:ci-runner-al2-opensearch-build-v1 with ./test.sh smoke-test manifests/2.18.0/opensearch-2.18.0-test.yml --paths opensearch=https://ci.opensearch.org/ci/dbc/distribution-build-opensearch/2.18.0/10479/linux/x64/tar/ command.
Confirmed it's running as expected.

2024-11-15 22:41:27 INFO     | opensearch           | / GET                | PASS  |
2024-11-15 22:41:27 INFO     | opensearch           | /_bulk POST          | PASS  |
2024-11-15 22:41:27 INFO     | opensearch           | /_cat/indices GET    | PASS  |
2024-11-15 22:41:27 INFO     | opensearch           | /_cat/plugins GET    | PASS  |

gaiksaya

A few questions and suggestions:

How about reusing the current integration test cluster set up instead of installing a cluster for smoke testing? Let me know if the set up is different.
Would we have a new test-report.yml for smoke testing or will the data be appended to current report.yml?

Adding @dblock to take a look from open-api perspective.

Thanks!

gaiksaya · 2024-11-18T20:46:03Z

src/test_workflow/smoke_test/smoke_test_cluster_opensearch.py

@@ -0,0 +1,99 @@
+# Copyright OpenSearch Contributors


I believe we are duplicating the work of setting up clusters here. Why not use the integ-test cluster setup instead of creating a new framework for smoke testing? Check the setup at https://github.com/opensearch-project/opensearch-build/tree/main/src/test_workflow/integ_test and see if you can use the same classes to set up a cluster for smoke testing as well. From a high level, I do not see a difference in the setup.

Reusing the entire integ-test cluster might overcomplicate the smoke test: to keep the smoke test cluster lightweight, we don't want it to accept any customized configuration. I also want the smoke test workflow to depend less on the integ tests and remain a standalone test workflow.

For APIs to work at the component level, we might need additional-config to be added.
For example: you cannot check the system templates plugin if these configs are not enabled.
The current setup for integ-test is lightweight as well. Reusing the existing codebase might be the way to go instead of reinventing the wheel for a different use case. Also, the per-distribution code will keep growing, the way it does today for integ-test.

Is there an example API for the system templates plugin you are referring to that I can test? In any case, I think that if a component requires specific configuration for smoke tests, it's not ready to be onboarded to the smoke test framework, because we deploy the cluster only once and run all API request checks against it. We can't be sure whether that specific configuration would affect other components.
We are reusing the distribution classes from the integ test workflow to install and start the cluster. Any future distribution type can be easily adopted by this smoke test workflow.

Agree with reusing the cluster setup from integ test, the code is pretty modular to be reused here. To be able to even run an api of system-templates you need those configurations added to opensearch.yml.
Think about adding smoke tests for features that need to be enabled explicitly.

I still think we shouldn't allow any customized configurations, as the set might grow much bigger. By design, we provide a generic framework with all default configurations for the cluster. If a component needs specific configuration to be operational, it shouldn't be added to the smoke tests and can go into the integ tests instead.

gaiksaya · 2024-11-18T20:47:48Z

src/test_workflow/smoke_test/smoke_tests_spec/opensearch.yml

@@ -0,0 +1,30 @@
+# yamllint disable


How about storing these files under manifests folder per version?

manifests/2.18.0/smoke_test/opensearch.yml

At this point, I don't feel it's necessary to keep these files distinct by version. So far the API requests in this file are generic and not version-specific. In addition, any valid API path can be filtered using the API-spec file, which records the first version in which it was introduced.

For new APIs that are being added how do you propose to manage that?

The version introduced will be specified in the API-spec file and we can check the version and compare it with the current version.

Sorry for not being clear. What I meant was: if we introduce new APIs in, say, 2.19.0, would we be using the same spec file that we used in 2.18.0?
Wondering how the api-spec repo is managing that today.

So for new x-version-add (2.9, 2.10, 2.11), what would the spec look like?

This component spec will still be the same. We check on the test workflow side.
The workflow extracts the API path from this component spec, compares the current cluster version with the x-version-add from the api-spec, and runs the checks if the current version is later than the added version.
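
A minimal sketch of the version gate described above, assuming versions are plain dotted numeric strings such as `2.18.0` or `2.9`. The function names are hypothetical, and treating "same version" as runnable is an assumption rather than something this thread confirms.

```python
def parse_version(version: str) -> tuple:
    """Turn '2.18.0' or '2.9' into a tuple of ints so 2.10 sorts after 2.9."""
    return tuple(int(part) for part in version.split("."))


def should_run_check(cluster_version: str, version_added) -> bool:
    """Return True unless the API was added after the cluster's version.

    Hypothetical helper; version_added stands for the x-version-add value
    mentioned in the discussion and may be None when the spec omits it.
    """
    if version_added is None:
        return True
    current = parse_version(cluster_version)
    added = parse_version(str(version_added))
    # Pad with zeros so (2, 9) compares equal to (2, 9, 0).
    width = max(len(current), len(added))
    current += (0,) * (width - len(current))
    added += (0,) * (width - len(added))
    return current >= added
```

Comparing integer tuples rather than raw strings matters here because a string comparison would rank `2.9` above `2.10`.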

If not differentiated by version, it will become a problem when 3.0 is being prepared for release and has breaking changes in existing APIs.
You don't have to separate at each minor version, but at least at 2.x and 3.x. Minor versions are backward compatible, so it is safe to assume there will be no breaking changes in the APIs.
We don't have to worry about introducing new APIs in minor versions, as smoke tests are expected to run for versions to be released and not for already released versions.

I'm okay if openapi handles the version check and returns an appropriate response.
Say version 3.0 made a change to the _cat/indices API request/response and you run this workflow for both 2.x and 3.x for a new release. How will that be handled?

Those changes should also be handled in the openapi spec files.

zelinh · 2024-11-18T21:59:22Z

The cluster we deploy for smoke tests is intended to be universal and won't accept any customized configuration. Using the integ test cluster might overcomplicate things here and be a little difficult to maintain in the future.
Generating test-report.yml is a separate step from this workflow. We will want to trigger the report workflow to create test-report.yml. I think we could accommodate that once we onboard the smoke test.

gaiksaya · 2024-11-18T22:51:16Z

src/test_workflow/smoke_test/smoke_test_cluster_opensearch.py

+        url = "https://localhost:9200/"
+        logging.info(f"Pinging {url}")
+        try:
+            request = requests.get(url, verify=False, auth=("admin", "myStrongPassword123!"))


Would it always be with-security?

I would recommend creating an issue for without-security, since security is an optional plugin and smoke tests, just like integ tests, need to run smoothly irrespective of the security plugin.
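
One way the readiness check could handle both modes is sketched below. The function `cluster_endpoint` and its `security_enabled` flag are hypothetical and not part of this PR; the sketch assumes a cluster without the security plugin serves plain HTTP on the same port without credentials.

```python
from typing import Optional, Tuple


def cluster_endpoint(security_enabled: bool) -> Tuple[str, Optional[Tuple[str, str]]]:
    """Return the URL and basic-auth pair to use when pinging the cluster.

    Hypothetical helper: the credentials mirror the ones in the diff above,
    and None means no authentication is sent.
    """
    if security_enabled:
        return "https://localhost:9200/", ("admin", "myStrongPassword123!")
    return "http://localhost:9200/", None
```

The returned pair can be passed straight to `requests.get(url, verify=False, auth=auth)`, since `requests` accepts `auth=None`.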

rishabh6788 · 2024-11-20T22:00:41Z

src/test_workflow/smoke_test/smoke_test_runner.py

+            logging.info("Initiating smoke tests.")
+            test_cluster = SmokeTestClusterOpenSearch(self.args, os.path.join(work_dir.path), self.test_recorder)
+            test_cluster.__start_cluster__(os.path.join(work_dir.path))
+            for i in range(10):


So it iterates 10 times to check if the cluster is up and running.
What happens when the cluster is not up after 10 retries?

It would hard fail the workflow anyway as the cluster is not available.

Why hard-fail when you can put a check and gracefully terminate the workflow?
Please add a graceful termination check.

Added another check before running the checks so if the cluster is not ready after 10 attempts, the workflow will exit gracefully.

rishabh6788 · 2024-11-20T22:14:24Z

src/test_workflow/smoke_test/smoke_test_runner_opensearch.py

+        }
+        # self.openapi = openapi_core.OpenAPI.from_file_path(spec_file)
+
+    def validate_request_swagger(self, request: Any) -> None:


Where is this being used?

It's not used yet, as there are currently some issues validating requests. I think we may be able to use it eventually; I'm okay with removing it for now.

rishabh6788 · 2024-11-20T22:19:16Z

src/test_workflow/smoke_test/smoke_test_runner_opensearch.py

+        logging.info("Request is validated.")
+
+    def validate_response_swagger(self, response: Any) -> None:
+        request = RequestsOpenAPIRequest(response.request)


Why not validate the request as well?

There seems to be an issue in the tool with request validation, as the string types are different.

Can you add more details on it? If there is an issue on the openapi repo, maybe reference that.

I created an issue here: opensearch-project/opensearch-api-specification#656
The implementation of the openapi tool that we are using assumes the requestBodies type is a JSON array, whereas what we get from the response is a String.
The Array type specified in the openapi spec is also used to generate client code, so it's complicated to modify.

Besides, the validation tool seems to use byte strings while some of our responses return strings, which causes errors as well.

rishabh6788 · 2024-11-20T22:20:54Z

src/test_workflow/smoke_test/smoke_test_runner_opensearch.py

+    def validate_response_swagger(self, response: Any) -> None:
+        request = RequestsOpenAPIRequest(response.request)
+        response = RequestsOpenAPIResponse(response)
+        validate_response(response=response, spec=self.spec_, request=request)


Add a try/except here and return.

rishabh6788 · 2024-11-20T22:22:10Z

src/test_workflow/smoke_test/smoke_test_runner_opensearch.py

+                        header = api_details.get(method).get("header", self.mimetype)
+                        logging.info(f"Parameter is {parameters_data} and type is {type(parameters_data)}")
+                        logging.info(f"header is {header}")
+                        status = 0


Instead of this, wouldn't it be better to have a try/except in the called method, return True/False, and take appropriate action?

This status is an integer and is used by the TestResult class to classify PASS/FAIL.

Agree with the recommendation to use a boolean instead of an integer.

The TestResult class will mark PASS only when status == 0. We may have to use an integer here.

opensearch-build/src/test_workflow/test_result/test_result.py

Lines 18 to 25 in 0931ccc

    def __init__(self, component: str, config: dict, status: int) -> None:
        self.component = component
        self.config = config
        self.status = status

    @property
    def __test_result(self) -> str:
        return "PASS" if self.status == 0 else "FAIL"
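
The two positions can be reconciled with a thin adapter, sketched here. `SketchResult` is a simplified stand-in written for this example, not the repository's class, and `to_status` is a hypothetical helper: the called method returns a boolean, and the conversion to the integer convention happens in one place.

```python
class SketchResult:
    """Stand-in that mirrors the status convention quoted above (0 means PASS)."""

    def __init__(self, component: str, config: str, status: int) -> None:
        self.component = component
        self.config = config
        self.status = status

    @property
    def outcome(self) -> str:
        return "PASS" if self.status == 0 else "FAIL"


def to_status(passed: bool) -> int:
    """Convert a boolean check outcome into the integer status (0 = pass, 1 = fail)."""
    return 0 if passed else 1
```

With this shape, validation code can stay boolean while result recording keeps its existing integer contract.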

rishabh6788 · 2024-11-20T22:22:31Z

src/test_workflow/smoke_test/smoke_test_runner_opensearch.py

+                            logging.info(f"Response is {response.text}")
+                            self.validate_response_swagger(response)
+                        except:
+                            status = 1


Maybe log the exception for debugging purposes.
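
A minimal sketch of what that could look like, assuming each check is wrapped in a callable. The name `run_check` is hypothetical; `logging.exception` is the standard-library call that logs at ERROR level and appends the active traceback.

```python
import logging
from typing import Callable


def run_check(description: str, check: Callable[[], None]) -> int:
    """Run one check and return 0 on success or 1 on failure.

    Hypothetical wrapper: it replaces a bare `except:` with
    `except Exception` and records the failure before returning.
    """
    try:
        check()
    except Exception:
        # Logs the message plus the traceback of the exception being handled.
        logging.exception("Smoke test check failed: %s", description)
        return 1
    return 0
```

Catching `Exception` instead of using a bare `except:` also lets `KeyboardInterrupt` and `SystemExit` propagate, so the workflow can still be interrupted.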

rishabh6788 · 2024-11-20T22:23:15Z

src/test_workflow/smoke_test/smoke_test_runner_opensearch.py

+                        except:
+                            status = 1
+                        finally:
+                            test_result = TestResult(component.name, ' '.join([api_requests, method]), status)  # type: ignore


What will happen here if the cluster failed to start and no tests were run?

Added another check to prevent this from happening. Thanks.

rishabh6788 · 2024-11-20T22:36:36Z

src/test_workflow/smoke_test/smoke_test_runner_opensearch.py

+        validate_request(request=request, spec=self.spec_)
+        logging.info("Request is validated.")
+
+    def validate_response_swagger(self, response: Any) -> None:


These can be moved into the base class since they will be the same for each component, right?
You are passing the spec, request and response for each component spec that is running, or is my understanding wrong?
Given that all the validation is done by openapi, would this method signature change when it is running for OpenSearch and SQL?

This class method is used for all OpenSearch components. I have only verified that it works here and am not sure it will still work if the scope grows, so I am keeping it in this runner_opensearch class.

Signed-off-by: Zelin Hao <zelinhao@amazon.com>

zelinh · 2024-11-26T22:32:02Z

Feel free to take a look at this smoke test workflow. @reta @dblock Thanks!

gaiksaya · 2024-11-26T23:41:30Z

manifests/2.18.0/opensearch-2.18.0-test.yml

+  - name: opensearch
+    smoke-test:
+      test-spec: opensearch.yml


Maybe move to 2.19 now?

We haven't started the release cycle for 2.19, so there is no complete distribution bundle for testing yet; I used 2.18 to demonstrate the correct command to start the workflow. I'm OK with changing this to 2.19, and we should also update the manifest workflow to add the smoke test entry when creating the test manifest.

Moved to 2.19 test manifest.

Confirmed new command ./test.sh smoke-test manifests/2.19.0/opensearch-2.19.0-test.yml --paths opensearch=https://ci.opensearch.org/ci/dbc/distribution-build-opensearch/2.19.0/10545/linux/x64/tar/ is running correctly.

Signed-off-by: Zelin Hao <zelinhao@amazon.com>

reta · 2024-11-27T06:19:48Z

Feel free to take a look at this smoke test workflow. @reta @dblock Thanks!

Thanks a lot @zelinh !

rishabh6788 · 2024-11-27T19:14:41Z

src/test_workflow/smoke_test/smoke_test_runner.py

+                else:
+                    time.sleep(10)
+            try:
+                if test_cluster.__check_cluster_ready__():


This is redundant given you are already checking above if the cluster is ready or not.

Updated in the latest commit.

rishabh6788 · 2024-11-27T19:18:42Z

src/test_workflow/smoke_test/smoke_test_runner.py

+            for i in range(10):
+                logging.info(f"Attempt {i} of 10 to check cluster.")
+                if test_cluster.__check_cluster_ready__():
+                    break


A cleaner approach would be to add a method in this file called is_cluster_ready().
Move the test_cluster.__check_cluster_ready__() logic to that method and return a boolean.
Add an elif after if test_cluster.__check_cluster_ready__(): to check if i == 9 (10th attempt) and, if so, return False.

Catch that bool value and take action accordingly.

I updated the logic here to use a boolean variable so that we don't have to do another redundant cluster readiness check.
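
The polling pattern under discussion can be sketched as follows. This is not the code merged in the PR: `wait_for_cluster` and its parameters are hypothetical, and the readiness probe and sleep function are passed in so the loop can be exercised without a real cluster.

```python
import time
from typing import Callable


def wait_for_cluster(
    is_ready: Callable[[], bool],
    attempts: int = 10,
    delay_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll is_ready up to `attempts` times and report whether it succeeded."""
    for attempt in range(1, attempts + 1):
        if is_ready():
            return True
        # Skip the pause after the final attempt; there is nothing left to wait for.
        if attempt < attempts:
            sleep(delay_seconds)
    return False
```

The caller can then branch once on the returned boolean: run the checks when it is True, or log the failure and exit gracefully when it is False, with no second readiness probe.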

Signed-off-by: Zelin Hao <zelinhao@amazon.com>

Add smoke test framework for opensearch bundle

f568b7a

Signed-off-by: Zelin Hao <zelinhao@amazon.com>

zelinh requested review from dblock, peterzhuamazon, gaiksaya, rishabh6788, prudhvigodithi, Divyaasm and tianleh as code owners November 14, 2024 01:56

github-actions bot added the distinguished-contributor label Nov 14, 2024

zelinh and others added 3 commits November 13, 2024 18:36

Update tests

b8090cf

Signed-off-by: Zelin Hao <zelinhao@amazon.com>

Fix windows tests

ede0d2c

Signed-off-by: Zelin Hao <zelinhao@amazon.com>

Fix Pipfile

34730d2

Signed-off-by: Peter Zhu <zhujiaxi@amazon.com>

Merge remote-tracking branch 'upstream/main' into smoke-test-workflow

b70cf5e

gaiksaya requested changes Nov 18, 2024

gaiksaya reviewed Nov 18, 2024

rishabh6788 reviewed Nov 20, 2024

Add cluster check before running tests

11175c1

Signed-off-by: Zelin Hao <zelinhao@amazon.com>

gaiksaya reviewed Nov 26, 2024

Update the test manifest to 2.19.0

6adc380

Signed-off-by: Zelin Hao <zelinhao@amazon.com>

rishabh6788 reviewed Nov 27, 2024

Update check cluster readiness

03c7e58

Signed-off-by: Zelin Hao <zelinhao@amazon.com>
