-
Notifications
You must be signed in to change notification settings - Fork 275
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add smoke test framework for opensearch bundle #5185
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Zelin Hao <zelinhao@amazon.com>
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #5185 +/- ##
==========================================
- Coverage 92.12% 91.81% -0.31%
==========================================
Files 197 202 +5
Lines 6817 7026 +209
==========================================
+ Hits 6280 6451 +171
- Misses 537 575 +38 ☔ View full report in Codecov by Sentry. |
Looking into these check failures. |
Signed-off-by: Zelin Hao <zelinhao@amazon.com>
Signed-off-by: Zelin Hao <zelinhao@amazon.com>
Signed-off-by: Peter Zhu <zhujiaxi@amazon.com>
Seems like missing bunch of test cases. |
No sure why it's not updating. I attached respective tests after this comments but it still shows 0% |
Pending #5187 |
Ready for review. |
Thanks for help on fixing the container. |
Tested on this docker image
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few questions and suggestions:
- How about reusing the current integration test cluster set up instead of installing a cluster for smoke testing? Let me know if the set up is different.
- Would we have a new test-report.yml for smoke testing or will the data be appended to current report.yml?
Adding @dblock to take a look from open-api perspective.
Thanks!
@@ -0,0 +1,99 @@ | |||
# Copyright OpenSearch Contributors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we are duplicating the work of setting up clusters here. Why not just use the integ-test cluster set up instead of creating a new framework for smoke testing? Check the set ups https://github.com/opensearch-project/opensearch-build/tree/main/src/test_workflow/integ_test and see if you can use the same classes to set up a cluster for smoke testing as well. From high level I do not see a difference in the set up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reusing the entire integ-test cluster might be overcomplicate for smoke test as we don't want to accept any customized configuration for smoke tests cluster to keep it lightweight. I also want to make smoke test workflow less dependent on integ tests as a standalone test workflow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For APIs to work at component level we might need additional-config to be added.
For example: You cannot check system templates plugin if these configs are not enabled.
The current set up for integ-test is light weight as well. Reusing the existing codebase might be the way to go instead of reinventing the wheel for different usecase. Also per dustribution it will go on increasing like the way we have today for integ-test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any example API that I can test for system templates plugin you are referring? Anyway I think if certain component requires specific configuration for smoke test, it's not ready to be onboarded to smoke tests framework. Because we only deploy the cluster once and run all API request checks against it. We are not sure whether that specific configuration would affect others.
We are reusing the distributions
classes from integ tests workflow to install and start the cluster. For any future distribution type, it can be easily adopted by this smoke test workflow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with reusing the cluster setup from integ test, the code is pretty modular to be reused here. To be able to even run an api of system-templates you need those configurations added to opensearch.yml.
Think about adding smoke tests for features that need to be enabled explicitly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think we shouldn't allow any customized configurations as it might grow much bigger. We provide generic framework with all default configurations for the cluster by design. If any of the component needs specific configuration to be operational, it shouldn't be added to the smoke tests and can go into the integ tests.
@@ -0,0 +1,30 @@ | |||
# yamllint disable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about storing these files under manifests folder per version?
manifests/2.18.0/smoke_test/opensearch.yml
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At this point, i don't feel it's necessary to keep these files distinct by version. So far the API requests in this file are generic and not version specific. In addition, any of the valid API path can be filtered in the API-spec file for the first version it was introduced.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For new APIs that are being added how do you propose to manage that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The version introduced will be specified in the API-spec file and we can check the version and compare it with the current version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for not being clear, what I meant was if we introduce new APIs in say 2.19.0 would we be using same spec file that we used in 2.18.0?
Wondering how api-spec repo is managing that today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So for new x-version-add (2.9, 2.10, 2.11), how would the spec look like?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This component spec will still be the same. We check on the test workflow side.
The workflow extracts the api path from this component spec, checks current cluster version vs the x-version-add from api-spec and run checks if current version is later than added version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if not differentiated on version it will become a problem when 3.0 is being prepared to release and has breaking changes in existing apis.
You don't have to separate at each minor but atleast at 2.x and 3.x. Minor versions are backward compatible so it is safe to assume there will be no breaking changes in the apis.
We don't have to worry about introducing new apis in minor versions as smoke tests are expected to run for versions to be released and not already released versions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm okay if openapi handles the version check and returns appropriate response.
Say, version 3.0 made a change to _cat/indices
api request/response and you run this workflow for both 2.x and 3.x for a new release, how will that be handled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those changes should also be handled in the openapi spec files.
|
url = "https://localhost:9200/" | ||
logging.info(f"Pinging {url}") | ||
try: | ||
request = requests.get(url, verify=False, auth=("admin", "myStrongPassword123!")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it always be with-security?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would recommend to create an issue for without-security as security is an optional plugin and smoke tests just like integ-test need to run smoothly irrespective of security plugin
logging.info("Initiating smoke tests.") | ||
test_cluster = SmokeTestClusterOpenSearch(self.args, os.path.join(work_dir.path), self.test_recorder) | ||
test_cluster.__start_cluster__(os.path.join(work_dir.path)) | ||
for i in range(10): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it iterates 10 times to check if the cluster is up and running.
What happens when the cluster is not up after 10 retries?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would hard fail the workflow anyway as the cluster is not available.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why hard-fail when you can put a check and gracefully terminate the workflow?
Please add a graceful termination check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added another check before running the checks so if the cluster is not ready after 10 attempts, the workflow will exit gracefully.
} | ||
# self.openapi = openapi_core.OpenAPI.from_file_path(spec_file) | ||
|
||
def validate_request_swagger(self, request: Any) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is this being used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not used yet as currently there are some issue validating requests. I'm thinking we may be able to use it eventually; I'm okay to remove it for now.
logging.info("Request is validated.") | ||
|
||
def validate_response_swagger(self, response: Any) -> None: | ||
request = RequestsOpenAPIRequest(response.request) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not validate request as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There seem to be some issue on the tool about request validation as the string types are different.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add more details on it? Is there an issue on openapi repo, may be reference that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created an issue here opensearch-project/opensearch-api-specification#656
The implementation of the openapi tool that we using will assume the requestBodies type as JSON array however what we got from the response will return String.
The Array return type specified in the openapi will also be used to generate client code so it's complicated to modify.
Besides, the validation tool seems to use byte string while some of our response will return string, which cause error as well.
def validate_response_swagger(self, response: Any) -> None: | ||
request = RequestsOpenAPIRequest(response.request) | ||
response = RequestsOpenAPIResponse(response) | ||
validate_response(response=response, spec=self.spec_, request=request) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
try/catch here and return.
header = api_details.get(method).get("header", self.mimetype) | ||
logging.info(f"Parameter is {parameters_data} and type is {type(parameters_data)}") | ||
logging.info(f"header is {header}") | ||
status = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of this, better have try/catch in the called method and return True/False
and take appropriate action?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This status is integer type and used for TestResults class to classify PASS/FAIL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with the recommendation to use boolean instead of integer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The TestResults class will mark PASS only when status == 0. We may have to use integer here.
opensearch-build/src/test_workflow/test_result/test_result.py
Lines 18 to 25 in 0931ccc
def __init__(self, component: str, config: dict, status: int) -> None: | |
self.component = component | |
self.config = config | |
self.status = status | |
@property | |
def __test_result(self) -> str: | |
return "PASS" if self.status == 0 else "FAIL" |
logging.info(f"Response is {response.text}") | ||
self.validate_response_swagger(response) | ||
except: | ||
status = 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May be log the exception for debugging purpose.
except: | ||
status = 1 | ||
finally: | ||
test_result = TestResult(component.name, ' '.join([api_requests, method]), status) # type: ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What will happen here if the cluster failed to start and no tests were run?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added another check to avoid these from happening. Thanks.
validate_request(request=request, spec=self.spec_) | ||
logging.info("Request is validated.") | ||
|
||
def validate_response_swagger(self, response: Any) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These can be moved into the base class since they will be same for each component, right?
You are passing the spec, request and response for each component spec that is running or is my understanding is wrong?
Give all the validation is being done by openapi, would this method signature change when it is running for OpenSerch and SQL?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class method is used for all OpenSearch components. I only verified it's working here but not sure if it still works if the scope grows so I keep it in this runner_opensearch class.
Signed-off-by: Zelin Hao <zelinhao@amazon.com>
- name: opensearch | ||
smoke-test: | ||
test-spec: opensearch.yml |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe move to 2.19 now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We haven't started the release cycle for 2.19 so there is no complete distribution bundle for testing purpose yet so I used 2.18 to demonstrate the correct command to start the workflow. I'm OK to change this to 2.19; and we should also update the manifest workflow to attach the smoke tests entry into the test manifest creation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved to 2.19 test manifest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Confirmed new command ./test.sh smoke-test manifests/2.19.0/opensearch-2.19.0-test.yml --paths opensearch=https://ci.opensearch.org/ci/dbc/distribution-build-opensearch/2.19.0/10545/linux/x64/tar/
is running correctly.
Signed-off-by: Zelin Hao <zelinhao@amazon.com>
else: | ||
time.sleep(10) | ||
try: | ||
if test_cluster.__check_cluster_ready__(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is redundant given you are already checking above if the cluster is ready or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated in the latest commit.
for i in range(10): | ||
logging.info(f"Attempt {i} of 10 to check cluster.") | ||
if test_cluster.__check_cluster_ready__(): | ||
break |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cleaner approach would be add a method in this file called is_cluster_ready()
.
Move the test_cluster.__check_cluster_ready__():
logic to that method and return boolean.
Add elif after if test_cluster.__check_cluster_ready__():
to check if i==9 (10th attempt) and if yes, return False.
Catch that bool value and take action accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated the logic here to have a boolean variable so that we don't have to do another redundant cluster readiness check.
Signed-off-by: Zelin Hao <zelinhao@amazon.com>
Description
Add smoke test framework for opensearch bundle
The smoke test workflow can be started with both local bundle artifacts or from CI.
Example command:
./test.sh smoke-test manifests/2.18.0/opensearch-2.18.0-test.yml --paths opensearch=/test/tar
or
./test.sh smoke-test manifests/2.18.0/opensearch-2.18.0-test.yml --paths opensearch=https://ci.opensearch.org/ci/dbc/distribution-build-opensearch/2.18.0/10479/linux/x64/tar/
The essential functionalities of this smoke test workflow is to deploy a staging bundle cluster by using
distribution
class form integ test workflow, run some basic API requests and validate its response schema against theopensearch-api-specification
.Issues Resolved
#5164
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.