Run integration tests against the Docker cluster
The Python script for integration tests was updated to run queries against the Docker cluster.
The required indices are created as part of the script. The queries used by the Python script were
likely out of date; these have been updated where the fix was obvious.

There are still 6 tests that fail.

Signed-off-by: Norman Jordan <norman.jordan@improving.com>
normanj-bitquill committed Dec 12, 2024
1 parent 418ee7e commit 1d1b807
Showing 20 changed files with 803 additions and 474 deletions.
2 changes: 1 addition & 1 deletion docker/integ-test/spark-defaults.conf
@@ -26,7 +26,7 @@
# spark.driver.memory 5g
# spark.executor.extraJavaOptions -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
spark.sql.extensions org.opensearch.flint.spark.FlintPPLSparkExtensions,org.opensearch.flint.spark.FlintSparkExtensions
spark.sql.catalog.myglue_test org.apache.spark.opensearch.catalog.OpenSearchCatalog
spark.sql.catalog.dev org.apache.spark.opensearch.catalog.OpenSearchCatalog
spark.datasource.flint.host opensearch
spark.datasource.flint.port 9200
spark.datasource.flint.scheme http
45 changes: 25 additions & 20 deletions integ-test/script/README.md
@@ -17,21 +17,31 @@ Apart from the basic features, it also has some advanced functionality, including:
### Usage
To use this script, you need to have Python **3.6** or higher installed. It also requires the following Python libraries:
```shell
pip install requests pandas openpyxl
pip install requests pandas openpyxl pyspark setuptools pyarrow grpcio grpcio-status protobuf
```
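Before moving on, it can be worth confirming that the libraries import cleanly. This is a minimal sketch (the file name and module list are illustrative, not part of the test suite):
```python
# check_env.py - illustrative helper, not part of the test suite.
# Verifies that the libraries required by SanityTest.py can be imported.
import importlib

for module in ("requests", "pandas", "openpyxl", "pyspark", "pyarrow", "grpc"):
    try:
        lib = importlib.import_module(module)
        print(f"{module}: {getattr(lib, '__version__', 'ok')}")
    except ImportError as err:
        print(f"{module}: MISSING ({err})")
```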

Next, start the Docker containers that will be used for the tests. From the directory `docker/integ-test`:
```shell
docker compose up -d
```
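The containers can take a while to become healthy, so it helps to wait for OpenSearch before starting the tests. A minimal sketch, assuming the compose file maps OpenSearch to `localhost:9200` over plain HTTP (adjust the URL and add credentials to match your setup):
```python
# wait_for_opensearch.py - illustrative readiness check, not part of the test suite.
import time

import requests

OPENSEARCH_URL = "http://localhost:9200"  # assumed port mapping; adjust as needed

for _ in range(30):
    try:
        # _cluster/health answers once the node is up; if the security plugin
        # is enabled, pass auth=(username, password) to the request as well.
        resp = requests.get(f"{OPENSEARCH_URL}/_cluster/health", timeout=5)
        if resp.ok:
            print("OpenSearch is up, cluster status:", resp.json()["status"])
            break
    except requests.exceptions.ConnectionError:
        pass
    time.sleep(5)
else:
    raise SystemExit("OpenSearch did not become ready in time")
```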

After the tests are finished, the Docker containers can be stopped from the directory `docker/integ-test` with:
```shell
docker compose down
```

After getting the requisite libraries, you can run the script with the following command line parameters in your shell:
```shell
python SanityTest.py --base-url ${URL_ADDRESS} --username *** --password *** --datasource ${DATASOURCE_NAME} --input-csv test_cases.csv --output-file test_report --max-workers 2 --check-interval 10 --timeout 600
python SanityTest.py --base-url ${URL_ADDRESS} --username *** --password *** --opensearch-url ${OPENSEARCH_URL} --input-csv test_cases.csv --output-file test_report
```
You need to replace the placeholders with your actual values of URL_ADDRESS, DATASOURCE_NAME and USERNAME, PASSWORD for authentication to your endpoint.
Replace the placeholders with your actual values for URL_ADDRESS and OPENSEARCH_URL, and with the USERNAME and PASSWORD used to authenticate to your endpoint.

For more details on the command-line parameters, see the help output:
```shell
python SanityTest.py --help

usage: SanityTest.py [-h] --base-url BASE_URL --username USERNAME --password PASSWORD --opensearch-url OPENSEARCH_URL --input-csv INPUT_CSV
--output-file OUTPUT_FILE [--max-workers MAX_WORKERS] [--check-interval CHECK_INTERVAL] [--timeout TIMEOUT]
--output-file OUTPUT_FILE [--max-workers MAX_WORKERS] [--check-interval CHECK_INTERVAL] [--timeout TIMEOUT]
[--start-row START_ROW] [--end-row END_ROW]

Run tests from a CSV file and generate a report.
@@ -41,17 +51,12 @@ options:
--base-url BASE_URL Base URL of the service
--username USERNAME Username for authentication
--password PASSWORD Password for authentication
--datasource DATASOURCE
Datasource name
--opensearch-url OPENSEARCH_URL
URL of the OpenSearch service
--input-csv INPUT_CSV
Path to the CSV file containing test queries
--output-file OUTPUT_FILE
Path to the output report file
--max-workers MAX_WORKERS
optional, Maximum number of worker threads (default: 2)
--check-interval CHECK_INTERVAL
optional, Check interval in seconds (default: 10)
--timeout TIMEOUT optional, Timeout in seconds (default: 600)
--start-row START_ROW
optional, The start row of the query to run, start from 1
--end-row END_ROW optional, The end row of the query to run, not included
@@ -78,12 +83,12 @@ It also provides the query_id, session_id and start/end time for each query, whi

An example of Excel report:

| query_name | query | expected_status | status | check_status | error | result | Duration (s) | query_id | session_id | Start Time | End Time |
|------------|------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------|---------|--------------|------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------|-------------------------------|------------------------------|----------------------|---------------------|
| 1 | describe myglue_test.default.http_logs | SUCCESS | SUCCESS | TRUE | | {'status': 'SUCCESS', 'schema': [{...}, ...], 'datarows': [[...], ...], 'total': 31, 'size': 31} | 37.51 | SHFEVWxDNnZjem15Z2x1ZV90ZXN0 | RkgzZm0xNlA5MG15Z2x1ZV90ZXN0 | 2024-11-07 13:34:10 | 2024-11-07 13:34:47 |
| 2 | source = myglue_test.default.http_logs \| dedup status CONSECUTIVE=true | SUCCESS | FAILED | FALSE | {"Message":"Fail to run query. Cause: Consecutive deduplication is not supported"} | | 39.53 | dVNlaVVxOFZrZW15Z2x1ZV90ZXN0 | ZGU2MllVYmI4dG15Z2x1ZV90ZXN0 | 2024-11-07 13:34:10 | 2024-11-07 13:34:49 |
| 3 | source = myglue_test.default.http_logs \| eval res = json_keys(json('{"account_number":1,"balance":39225,"age":32,"gender":"M"}')) \| head 1 \| fields res | SUCCESS | SUCCESS | TRUE | | {'status': 'SUCCESS', 'schema': [{'name': 'res', 'type': 'array'}], 'datarows': [[['account_number', 'balance', 'age', 'gender']]], 'total': 1, 'size': 1} | 12.77 | WHQxaXlVSGtGUm15Z2x1ZV90ZXN0 | RkgzZm0xNlA5MG15Z2x1ZV90ZXN0 | 2024-11-07 13:34:47 | 2024-11-07 13:38:45 |
| ... | ... | ... | ... | ... | | | ... | ... | ... | ... | ... |
| query_name | query | expected_status | status | check_status | error | result | duration (s) | Start Time | End Time |
|------------|------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------|---------|--------------|------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------|----------------------|---------------------|
| 1 | describe myglue_test.default.http_logs | SUCCESS | SUCCESS | TRUE | | {'status': 'SUCCESS', 'schema': [{...}, ...], 'datarows': [[...], ...], 'total': 31, 'size': 31} | 37.51 | 2024-11-07 13:34:10 | 2024-11-07 13:34:47 |
| 2 | source = myglue_test.default.http_logs \| dedup status CONSECUTIVE=true | SUCCESS | FAILED | FALSE | {"Message":"Fail to run query. Cause: Consecutive deduplication is not supported"} | | 39.53 | 2024-11-07 13:34:10 | 2024-11-07 13:34:49 |
| 3 | source = myglue_test.default.http_logs \| eval res = json_keys(json('{"account_number":1,"balance":39225,"age":32,"gender":"M"}')) \| head 1 \| fields res | SUCCESS | SUCCESS | TRUE | | {'status': 'SUCCESS', 'schema': [{'name': 'res', 'type': 'array'}], 'datarows': [[['account_number', 'balance', 'age', 'gender']]], 'total': 1, 'size': 1} | 12.77 | 2024-11-07 13:34:47 | 2024-11-07 13:38:45 |
| ... | ... | ... | ... | ... | | | ... | ... | ... |
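Both report formats are easy to post-process. For example, the failed queries can be pulled out of the Excel report with pandas; this is a sketch that assumes the script was run with `--output-file test_report` and wrote `test_report.xlsx`:
```python
# summarize_excel_report.py - illustrative post-processing, not part of the test suite.
import pandas as pd

report = pd.read_excel("test_report.xlsx")  # reading .xlsx uses openpyxl under the hood

# Keep only the queries that did not succeed
failed = report[report["status"] != "SUCCESS"]
print(f"{len(failed)} of {len(report)} queries need attention")
print(failed[["query_name", "query", "status", "error"]].to_string(index=False))
```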


#### JSON Report
@@ -103,7 +108,7 @@ An example of JSON report:
"detailed_results": [
{
"query_name": 1,
"query": "source = myglue_test.default.http_logs | stats avg(size)",
"query": "source = dev.default.http_logs | stats avg(size)",
"query_id": "eFZmTlpTa3EyTW15Z2x1ZV90ZXN0",
"session_id": "bFJDMWxzb2NVUm15Z2x1ZV90ZXN0",
"status": "SUCCESS",
@@ -130,7 +135,7 @@ An example of JSON report:
},
{
"query_name": 2,
"query": "source = myglue_test.default.http_logs | eval res = json_keys(json(\u2018{\"teacher\":\"Alice\",\"student\":[{\"name\":\"Bob\",\"rank\":1},{\"name\":\"Charlie\",\"rank\":2}]}')) | head 1 | fields res",
"query": "source = def.default.http_logs | eval res = json_keys(json(\u2018{\"teacher\":\"Alice\",\"student\":[{\"name\":\"Bob\",\"rank\":1},{\"name\":\"Charlie\",\"rank\":2}]}')) | head 1 | fields res",
"query_id": "bjF4Y1VnbXdFYm15Z2x1ZV90ZXN0",
"session_id": "c3pvU1V6OW8xM215Z2x1ZV90ZXN0",
"status": "FAILED",
@@ -142,7 +147,7 @@ An example of JSON report:
},
{
"query_name": 2,
"query": "source = myglue_test.default.http_logs | eval col1 = size, col2 = clientip | stats avg(col1) by col2",
"query": "source = dev.default.http_logs | eval col1 = size, col2 = clientip | stats avg(col1) by col2",
"query_id": "azVyMFFORnBFRW15Z2x1ZV90ZXN0",
"session_id": "VWF0SEtrNWM3bm15Z2x1ZV90ZXN0",
"status": "TIMEOUT",
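The JSON report lends itself to the same kind of post-processing. A sketch, assuming the report was written to `test_report.json` with the `detailed_results` structure shown above:
```python
# summarize_json_report.py - illustrative post-processing, not part of the test suite.
import json
from collections import Counter

with open("test_report.json") as f:
    report = json.load(f)

results = report["detailed_results"]

# Tally outcomes across all queries, e.g. {'SUCCESS': ..., 'FAILED': ..., 'TIMEOUT': ...}
print(dict(Counter(r["status"] for r in results)))

# List every query that did not succeed, with its error message if present
for r in results:
    if r["status"] != "SUCCESS":
        print(r["query_name"], r["status"], r.get("error", ""))
```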
