Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add smoke test framework for opensearch bundle #5185

Open
wants to merge 8 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion Pipfile
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@ types-PyYAML = "~=6.0.1"
# TODO: The 'requests' package stays on 2.28 until we deprecate CentOS7.
# As newer version requires openssl1.1.1 where CentOS7 only provides openssl1.1.0.
# https://github.com/opensearch-project/opensearch-build/issues/3554
requests = "<=2.28.1"
requests = "==2.31.0"
types-requests = "~=2.25"
pre-commit = "~=2.15.0"
isort = "~=5.9"
Expand Down Expand Up @@ -44,6 +44,7 @@ types-urllib3 = "~=1.26.25.14"
charset-normalizer = "~=2.1.1"
beautifulsoup4 = "~=4.12.3"
lxml = "~=5.3.0"
openapi-core = "~=0.19.4"

[dev-packages]

Expand Down
443 changes: 377 additions & 66 deletions Pipfile.lock

Large diffs are not rendered by default.

3 changes: 3 additions & 0 deletions manifests/2.19.0/opensearch-2.19.0-test.yml
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,9 @@ ci:
name: opensearchstaging/ci-runner:ci-runner-al2-opensearch-build-v1
args: -e JAVA_HOME=/opt/java/openjdk-21
components:
- name: opensearch
smoke-test:
test-spec: opensearch.yml
- name: alerting
integ-test:
test-configs:
Expand Down
36 changes: 36 additions & 0 deletions src/run_smoke_test.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,36 @@
#!/usr/bin/env python
# Copyright OpenSearch Contributors
# SPDX-License-Identifier: Apache-2.0
#
# The OpenSearch Contributors require contributions made to
# this file be licensed under the Apache-2.0 license or a
# compatible open source license.

import sys

from manifests.test_manifest import TestManifest
from system import console
from test_workflow.smoke_test.smoke_test_runners import SmokeTestRunners
from test_workflow.test_args import TestArgs


def main() -> int:
args = TestArgs()

# Any logging.info call preceding to next line in the execution chain will make the console output not displaying logs in console.
console.configure(level=args.logging_level)

test_manifest = TestManifest.from_path(args.test_manifest_path)

all_results = SmokeTestRunners.from_test_manifest(args, test_manifest).run()

all_results.log()

if all_results.failed():
return 1
else:
return 0


if __name__ == "__main__":
sys.exit(main())
8 changes: 8 additions & 0 deletions src/test_workflow/smoke_test/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
# Copyright OpenSearch Contributors
# SPDX-License-Identifier: Apache-2.0
#
# The OpenSearch Contributors require contributions made to
# this file be licensed under the Apache-2.0 license or a
# compatible open source license.
#
# This page intentionally left blank.
99 changes: 99 additions & 0 deletions src/test_workflow/smoke_test/smoke_test_cluster_opensearch.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,99 @@
# Copyright OpenSearch Contributors
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we are duplicating the work of setting up clusters here. Why not just use the integ-test cluster set up instead of creating a new framework for smoke testing? Check the set ups https://github.com/opensearch-project/opensearch-build/tree/main/src/test_workflow/integ_test and see if you can use the same classes to set up a cluster for smoke testing as well. From high level I do not see a difference in the set up.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reusing the entire integ-test cluster might be overcomplicate for smoke test as we don't want to accept any customized configuration for smoke tests cluster to keep it lightweight. I also want to make smoke test workflow less dependent on integ tests as a standalone test workflow.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For APIs to work at component level we might need additional-config to be added.
For example: You cannot check system templates plugin if these configs are not enabled.
The current set up for integ-test is light weight as well. Reusing the existing codebase might be the way to go instead of reinventing the wheel for different usecase. Also per dustribution it will go on increasing like the way we have today for integ-test.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any example API that I can test for system templates plugin you are referring? Anyway I think if certain component requires specific configuration for smoke test, it's not ready to be onboarded to smoke tests framework. Because we only deploy the cluster once and run all API request checks against it. We are not sure whether that specific configuration would affect others.
We are reusing the distributions classes from integ tests workflow to install and start the cluster. For any future distribution type, it can be easily adopted by this smoke test workflow.

Copy link
Collaborator

@rishabh6788 rishabh6788 Nov 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with reusing the cluster setup from integ test, the code is pretty modular to be reused here. To be able to even run an api of system-templates you need those configurations added to opensearch.yml.
Think about adding smoke tests for features that need to be enabled explicitly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think we shouldn't allow any customized configurations as it might grow much bigger. We provide generic framework with all default configurations for the cluster by design. If any of the component needs specific configuration to be operational, it shouldn't be added to the smoke tests and can go into the integ tests.

# SPDX-License-Identifier: Apache-2.0
#
# The OpenSearch Contributors require contributions made to
# this file be licensed under the Apache-2.0 license or a
# compatible open source license.

import logging
import os
import shutil
import time

import requests

from git.git_repository import GitRepository
from manifests.build_manifest import BuildManifest
from manifests.bundle_manifest import BundleManifest
from manifests.test_manifest import TestManifest
from system.process import Process
from test_workflow.integ_test.distributions import Distributions
from test_workflow.test_args import TestArgs
from test_workflow.test_recorder.test_recorder import TestRecorder


class SmokeTestClusterOpenSearch():
# dependency_installer: DependencyInstallerOpenSearch
repo: GitRepository

def __init__(
self,
args: TestArgs,
work_dir: str,
test_recorder: TestRecorder
) -> None:
self.args = args
self.work_dir = work_dir
self.test_recorder = test_recorder
self.process_handler = Process()
self.test_manifest = TestManifest.from_path(args.test_manifest_path)
self.product = self.test_manifest.name.lower().replace(" ", "-")
self.path = args.paths.get(self.product)
self.build_manifest = BuildManifest.from_urlpath(os.path.join(self.path, "builds", f"{self.product}", "manifest.yml"))
self.bundle_manifest = BundleManifest.from_urlpath(os.path.join(self.path, "dist", f"{self.product}", "manifest.yml"))
self.version = self.bundle_manifest.build.version
self.platform = self.bundle_manifest.build.platform
self.arch = self.bundle_manifest.build.architecture
self.dist = self.bundle_manifest.build.distribution
self.distribution = Distributions.get_distribution(self.product, self.dist, self.version, work_dir)

def download_or_copy_bundle(self, work_dir: str) -> str:
extension = "tar.gz" if self.dist == "tar" else self.dist
artifact_name = f"{self.product}-{self.version}-{self.platform}-{self.arch}.{extension}"
src_path = '/'.join([self.path.rstrip("/"), "dist", f"{self.product}", f"{artifact_name}"]) \
if self.path.startswith("https://") else os.path.join(self.path, "dist",
f"{self.product}", f"{artifact_name}")
dest_path = os.path.join(work_dir, artifact_name)

if src_path.startswith("https://"):
logging.info(f"Downloading artifacts to {dest_path}")
response = requests.get(src_path)
with open(dest_path, "wb") as file:
file.write(response.content)
else:
logging.info(f"Trying to copy {src_path} to {dest_path}")
# Only copy if it's a file
if os.path.isfile(src_path):
shutil.copy2(src_path, dest_path)
print(f"Copied {src_path} to {dest_path}")
return artifact_name

# Reason we don't re-use test-suite from integ-test is that it's too specific and not generic and lightweight.
def __installation__(self, work_dir: str) -> None:
self.distribution.install(self.download_or_copy_bundle(work_dir))
logging.info("Cluster is installed and ready to be start.")

# Start the cluster after installed and provide endpoint.
def __start_cluster__(self, work_dir: str) -> None:
self.__installation__(work_dir)
self.process_handler.start(self.distribution.start_cmd, self.distribution.install_dir, self.distribution.require_sudo)
logging.info(f"Started OpenSearch with parent PID {self.process_handler.pid}")
time.sleep(30)
logging.info("Cluster is started.")

# Check if the cluster is ready
def __check_cluster_ready__(self) -> bool:
url = "https://localhost:9200/"
logging.info(f"Pinging {url}")
try:
request = requests.get(url, verify=False, auth=("admin", "myStrongPassword123!"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it always be with-security?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would recommend to create an issue for without-security as security is an optional plugin and smoke tests just like integ-test need to run smoothly irrespective of security plugin

logging.info(f"Request is {request.text}")
return 200 <= request.status_code < 300
except requests.RequestException as e:
logging.info(f"Request is {request.text}")
print(f"Request failed: {e}")
return False

def __uninstall__(self) -> None:
self.process_handler.terminate()
logging.info("Cluster is terminated.")
83 changes: 83 additions & 0 deletions src/test_workflow/smoke_test/smoke_test_runner.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,83 @@
# Copyright OpenSearch Contributors
# SPDX-License-Identifier: Apache-2.0
#
# The OpenSearch Contributors require contributions made to
# this file be licensed under the Apache-2.0 license or a
# compatible open source license.

import abc
import json
import logging
import os
import sys
import time
from pathlib import Path
from typing import Any

import yaml

from manifests.component_manifest import Components
from manifests.test_manifest import TestManifest
from system.temporary_directory import TemporaryDirectory
from test_workflow.smoke_test.smoke_test_cluster_opensearch import SmokeTestClusterOpenSearch
from test_workflow.test_args import TestArgs
from test_workflow.test_recorder.test_recorder import TestRecorder


class SmokeTestRunner(abc.ABC):
args: TestArgs
test_manifest: TestManifest
tests_dir: str
test_recorder: TestRecorder
components: Components

def __init__(self, args: TestArgs, test_manifest: TestManifest) -> None:
self.args = args
self.test_manifest = test_manifest
self.tests_dir = os.path.join(os.getcwd(), "test-results")
os.makedirs(self.tests_dir, exist_ok=True)
self.test_recorder = TestRecorder(self.args.test_run_id, "smoke-test", self.tests_dir, args.base_path)
self.save_log = self.test_recorder.test_results_logs

def start_test(self, work_dir: Path) -> Any:
pass

def extract_paths_from_yaml(self, component: str) -> Any:
file_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "smoke_tests_spec", f"{component}.yml")
if os.path.exists(file_path):
logging.info(f"Component spec for {component} is found.")
with open(file_path, 'r') as file:
data = yaml.safe_load(file) # Load the YAML content
# Extract paths
paths = data.get('paths', {})
return paths
else:
logging.error("No spec found.")
sys.exit(1)

def convert_parameter_json(self, data: list) -> str:
return "\n".join(json.dumps(item) for item in data) + "\n" if data else ""

# Essential of initiate the testing phase. This function is called by the run_smoke_test.py
def run(self) -> Any:
with TemporaryDirectory(keep=self.args.keep, chdir=True) as work_dir:

logging.info("Initiating smoke tests.")
test_cluster = SmokeTestClusterOpenSearch(self.args, os.path.join(work_dir.path), self.test_recorder)
test_cluster.__start_cluster__(os.path.join(work_dir.path))
is_cluster_ready = False
for i in range(10):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it iterates 10 times to check if the cluster is up and running.
What happens when the cluster is not up after 10 retries?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would hard fail the workflow anyway as the cluster is not available.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why hard-fail when you can put a check and gracefully terminate the workflow?
Please add a graceful termination check.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added another check before running the checks so if the cluster is not ready after 10 attempts, the workflow will exit gracefully.

logging.info(f"Attempt {i} of 10 to check cluster.")
if test_cluster.__check_cluster_ready__():
is_cluster_ready = True
break
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cleaner approach would be add a method in this file called is_cluster_ready().
Move the test_cluster.__check_cluster_ready__(): logic to that method and return boolean.
Add elif after if test_cluster.__check_cluster_ready__(): to check if i==9 (10th attempt) and if yes, return False.

Catch that bool value and take action accordingly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the logic here to have a boolean variable so that we don't have to do another redundant cluster readiness check.

time.sleep(10)
try:
if is_cluster_ready:
results_data = self.start_test(work_dir.path)
else:
logging.info("Cluster is not ready after 10 attempts.")
finally:
logging.info("Terminating and uninstalling the cluster.")
test_cluster.__uninstall__()
return results_data
81 changes: 81 additions & 0 deletions src/test_workflow/smoke_test/smoke_test_runner_opensearch.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,81 @@
# Copyright OpenSearch Contributors
# SPDX-License-Identifier: Apache-2.0
#
# The OpenSearch Contributors require contributions made to
# this file be licensed under the Apache-2.0 license or a
# compatible open source license.

import logging
import os
from pathlib import Path
from typing import Any

import requests
from openapi_core import Spec, validate_request, validate_response
from openapi_core.contrib.requests import RequestsOpenAPIRequest, RequestsOpenAPIResponse

from manifests.test_manifest import TestManifest
from test_workflow.smoke_test.smoke_test_runner import SmokeTestRunner
from test_workflow.test_args import TestArgs
from test_workflow.test_result.test_component_results import TestComponentResults
from test_workflow.test_result.test_result import TestResult
from test_workflow.test_result.test_suite_results import TestSuiteResults


class SmokeTestRunnerOpenSearch(SmokeTestRunner):

def __init__(self, args: TestArgs, test_manifest: TestManifest) -> None:
super().__init__(args, test_manifest)
logging.info("Entering Smoke test for OpenSearch Bundle.")

# TODO: Download the spec from https://github.com/opensearch-project/opensearch-api-specification/releases/download/main-latest/opensearch-openapi.yaml
spec_file = os.path.join(os.path.dirname(os.path.abspath(__file__)), "smoke_tests_spec", "opensearch-openapi.yaml")
self.spec_ = Spec.from_file_path(spec_file)
self.mimetype = {
"Content-Type": "application/json"
}
# self.openapi = openapi_core.OpenAPI.from_file_path(spec_file)

def validate_request_swagger(self, request: Any) -> None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this being used?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not used yet as currently there are some issue validating requests. I'm thinking we may be able to use it eventually; I'm okay to remove it for now.

request = RequestsOpenAPIRequest(request)
validate_request(request=request, spec=self.spec_)
logging.info("Request is validated.")

def validate_response_swagger(self, response: Any) -> None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These can be moved into the base class since they will be same for each component, right?
You are passing the spec, request and response for each component spec that is running or is my understanding is wrong?
Give all the validation is being done by openapi, would this method signature change when it is running for OpenSerch and SQL?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class method is used for all OpenSearch components. I only verified it's working here but not sure if it still works if the scope grows so I keep it in this runner_opensearch class.

request = RequestsOpenAPIRequest(response.request)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not validate request as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There seem to be some issue on the tool about request validation as the string types are different.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add more details on it? Is there an issue on openapi repo, may be reference that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created an issue here opensearch-project/opensearch-api-specification#656
The implementation of the openapi tool that we using will assume the requestBodies type as JSON array however what we got from the response will return String.
The Array return type specified in the openapi will also be used to generate client code so it's complicated to modify.

Besides, the validation tool seems to use byte string while some of our response will return string, which cause error as well.

response = RequestsOpenAPIResponse(response)
validate_response(response=response, spec=self.spec_, request=request)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

try/catch here and return.

logging.info("Response is validated.")

def start_test(self, work_dir: Path) -> TestSuiteResults:
url = "https://localhost:9200"

all_results = TestSuiteResults()
for component in self.test_manifest.components.select(self.args.components):
if component.smoke_test:
logging.info(f"Running smoke test on {component.name} component.")
component_spec = self.extract_paths_from_yaml(component.name)
logging.info(f"component spec is {component_spec}")
test_results = TestComponentResults()
for api_requests, api_details in component_spec.items():
request_url = ''.join([url, api_requests])
logging.info(f"Validating api request {api_requests}")
logging.info(f"API request URL is {request_url}")
for method in api_details.keys(): # Iterates over each method, e.g., "GET", "POST"
requests_method = getattr(requests, method.lower())
parameters_data = self.convert_parameter_json(api_details.get(method).get("parameters"))
header = api_details.get(method).get("header", self.mimetype)
logging.info(f"Parameter is {parameters_data} and type is {type(parameters_data)}")
logging.info(f"header is {header}")
status = 0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of this, better have try/catch in the called method and return True/False and take appropriate action?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This status is integer type and used for TestResults class to classify PASS/FAIL.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with the recommendation to use boolean instead of integer.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The TestResults class will mark PASS only when status == 0. We may have to use integer here.

def __init__(self, component: str, config: dict, status: int) -> None:
self.component = component
self.config = config
self.status = status
@property
def __test_result(self) -> str:
return "PASS" if self.status == 0 else "FAIL"

try:
response = requests_method(request_url, verify=False, auth=("admin", "myStrongPassword123!"), headers=header, data=parameters_data)
logging.info(f"Response is {response.text}")
self.validate_response_swagger(response)
except:
status = 1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be log the exception for debugging purpose.

finally:
test_result = TestResult(component.name, ' '.join([api_requests, method]), status) # type: ignore
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What will happen here if the cluster failed to start and no tests were run?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added another check to avoid these from happening. Thanks.

test_results.append(test_result)
all_results.append(component.name, test_results)
return all_results
22 changes: 22 additions & 0 deletions src/test_workflow/smoke_test/smoke_test_runners.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
# Copyright OpenSearch Contributors
# SPDX-License-Identifier: Apache-2.0
#
# The OpenSearch Contributors require contributions made to
# this file be licensed under the Apache-2.0 license or a
# compatible open source license.


from manifests.test_manifest import TestManifest
from test_workflow.smoke_test.smoke_test_runner import SmokeTestRunner
from test_workflow.smoke_test.smoke_test_runner_opensearch import SmokeTestRunnerOpenSearch
from test_workflow.test_args import TestArgs


class SmokeTestRunners:
RUNNERS = {
"OpenSearch": SmokeTestRunnerOpenSearch
}

@classmethod
def from_test_manifest(cls, args: TestArgs, test_manifest: TestManifest) -> SmokeTestRunner:
return cls.RUNNERS[test_manifest.name](args, test_manifest)
Loading
Loading