Run integration tests against the Docker cluster
The Python script for integration tests was updated to run queries against the Docker cluster.
The required indices are created as part of the script. The queries used by the Python script were
likely out of date; these have been updated where the fix was obvious.

There are still 6 tests that fail.

Signed-off-by: Norman Jordan <norman.jordan@improving.com>
normanj-bitquill committed Dec 12, 2024
1 parent 418ee7e commit 1d1b807
Showing 20 changed files with 803 additions and 474 deletions.
2 changes: 1 addition & 1 deletion docker/integ-test/spark-defaults.conf
@@ -26,7 +26,7 @@
# spark.driver.memory 5g
# spark.executor.extraJavaOptions -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
spark.sql.extensions org.opensearch.flint.spark.FlintPPLSparkExtensions,org.opensearch.flint.spark.FlintSparkExtensions
spark.sql.catalog.myglue_test org.apache.spark.opensearch.catalog.OpenSearchCatalog
spark.sql.catalog.dev org.apache.spark.opensearch.catalog.OpenSearchCatalog
spark.datasource.flint.host opensearch
spark.datasource.flint.port 9200
spark.datasource.flint.scheme http
45 changes: 25 additions & 20 deletions integ-test/script/README.md
@@ -17,21 +17,31 @@ Apart from the basic features, it also has some advanced functionality, including:
### Usage
To use this script, you need to have Python **3.6** or higher installed. It also requires the following Python libraries:
```shell
pip install requests pandas openpyxl
pip install requests pandas openpyxl pyspark setuptools pyarrow grpcio grpcio-status protobuf
```
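Before moving on, it can be worth confirming that the libraries import cleanly. This is a minimal sketch (the file name and module list are illustrative, not part of the test suite):
```python
# check_env.py - illustrative helper, not part of the test suite.
# Verifies that the libraries required by SanityTest.py can be imported.
import importlib

for module in ("requests", "pandas", "openpyxl", "pyspark", "pyarrow", "grpc"):
    try:
        lib = importlib.import_module(module)
        print(f"{module}: {getattr(lib, '__version__', 'ok')}")
    except ImportError as err:
        print(f"{module}: MISSING ({err})")
```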

Next, start the Docker containers that will be used for the tests. From the directory `docker/integ-test`:
```shell
docker compose up -d
```
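The containers can take a while to become healthy, so it helps to wait for OpenSearch before starting the tests. A minimal sketch, assuming the compose file maps OpenSearch to `localhost:9200` over plain HTTP (adjust the URL and add credentials to match your setup):
```python
# wait_for_opensearch.py - illustrative readiness check, not part of the test suite.
import time

import requests

OPENSEARCH_URL = "http://localhost:9200"  # assumed port mapping; adjust as needed

for _ in range(30):
    try:
        # _cluster/health answers once the node is up; if the security plugin
        # is enabled, pass auth=(username, password) to the request as well.
        resp = requests.get(f"{OPENSEARCH_URL}/_cluster/health", timeout=5)
        if resp.ok:
            print("OpenSearch is up, cluster status:", resp.json()["status"])
            break
    except requests.exceptions.ConnectionError:
        pass
    time.sleep(5)
else:
    raise SystemExit("OpenSearch did not become ready in time")
```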

After the tests are finished, the Docker containers can be stopped from the directory `docker/integ-test` with:
```shell
docker compose down
```

After getting the requisite libraries, you can run the script with the following command line parameters in your shell:
```shell
python SanityTest.py --base-url ${URL_ADDRESS} --username *** --password *** --datasource ${DATASOURCE_NAME} --input-csv test_cases.csv --output-file test_report --max-workers 2 --check-interval 10 --timeout 600
python SanityTest.py --base-url ${URL_ADDRESS} --username *** --password *** --opensearch-url ${OPENSEARCH_URL} --input-csv test_cases.csv --output-file test_report
```
You need to replace the placeholders with your actual values of URL_ADDRESS, DATASOURCE_NAME and USERNAME, PASSWORD for authentication to your endpoint.
Replace the placeholders with your actual values for URL_ADDRESS and OPENSEARCH_URL, and with the USERNAME and PASSWORD used to authenticate to your endpoint.

For more details on the command-line parameters, see the help output:
```shell
python SanityTest.py --help

usage: SanityTest.py [-h] --base-url BASE_URL --username USERNAME --password PASSWORD --opensearch-url OPENSEARCH_URL --input-csv INPUT_CSV
--output-file OUTPUT_FILE [--max-workers MAX_WORKERS] [--check-interval CHECK_INTERVAL] [--timeout TIMEOUT]
--output-file OUTPUT_FILE [--max-workers MAX_WORKERS] [--check-interval CHECK_INTERVAL] [--timeout TIMEOUT]
[--start-row START_ROW] [--end-row END_ROW]

Run tests from a CSV file and generate a report.
@@ -41,17 +51,12 @@ options:
--base-url BASE_URL Base URL of the service
--username USERNAME Username for authentication
--password PASSWORD Password for authentication
--datasource DATASOURCE
Datasource name
--opensearch-url OPENSEARCH_URL
URL of the OpenSearch service
--input-csv INPUT_CSV
Path to the CSV file containing test queries
--output-file OUTPUT_FILE
Path to the output report file
--max-workers MAX_WORKERS
optional, Maximum number of worker threads (default: 2)
--check-interval CHECK_INTERVAL
optional, Check interval in seconds (default: 10)
--timeout TIMEOUT optional, Timeout in seconds (default: 600)
--start-row START_ROW
optional, The start row of the query to run, start from 1
--end-row END_ROW optional, The end row of the query to run, not included
@@ -78,12 +83,12 @@ It also provides the query_id, session_id and start/end time for each query, whi

An example of Excel report:

| query_name | query | expected_status | status | check_status | error | result | Duration (s) | query_id | session_id | Start Time | End Time |
|------------|------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------|---------|--------------|------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------|-------------------------------|------------------------------|----------------------|---------------------|
| 1 | describe myglue_test.default.http_logs | SUCCESS | SUCCESS | TRUE | | {'status': 'SUCCESS', 'schema': [{...}, ...], 'datarows': [[...], ...], 'total': 31, 'size': 31} | 37.51 | SHFEVWxDNnZjem15Z2x1ZV90ZXN0 | RkgzZm0xNlA5MG15Z2x1ZV90ZXN0 | 2024-11-07 13:34:10 | 2024-11-07 13:34:47 |
| 2 | source = myglue_test.default.http_logs \| dedup status CONSECUTIVE=true | SUCCESS | FAILED | FALSE | {"Message":"Fail to run query. Cause: Consecutive deduplication is not supported"} | | 39.53 | dVNlaVVxOFZrZW15Z2x1ZV90ZXN0 | ZGU2MllVYmI4dG15Z2x1ZV90ZXN0 | 2024-11-07 13:34:10 | 2024-11-07 13:34:49 |
| 3 | source = myglue_test.default.http_logs \| eval res = json_keys(json('{"account_number":1,"balance":39225,"age":32,"gender":"M"}')) \| head 1 \| fields res | SUCCESS | SUCCESS | TRUE | | {'status': 'SUCCESS', 'schema': [{'name': 'res', 'type': 'array'}], 'datarows': [[['account_number', 'balance', 'age', 'gender']]], 'total': 1, 'size': 1} | 12.77 | WHQxaXlVSGtGUm15Z2x1ZV90ZXN0 | RkgzZm0xNlA5MG15Z2x1ZV90ZXN0 | 2024-11-07 13:34:47 | 2024-11-07 13:38:45 |
| ... | ... | ... | ... | ... | | | ... | ... | ... | ... | ... |
| query_name | query | expected_status | status | check_status | error | result | duration (s) | Start Time | End Time |
|------------|------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------|---------|--------------|------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------|----------------------|---------------------|
| 1 | describe myglue_test.default.http_logs | SUCCESS | SUCCESS | TRUE | | {'status': 'SUCCESS', 'schema': [{...}, ...], 'datarows': [[...], ...], 'total': 31, 'size': 31} | 37.51 | 2024-11-07 13:34:10 | 2024-11-07 13:34:47 |
| 2 | source = myglue_test.default.http_logs \| dedup status CONSECUTIVE=true | SUCCESS | FAILED | FALSE | {"Message":"Fail to run query. Cause: Consecutive deduplication is not supported"} | | 39.53 | 2024-11-07 13:34:10 | 2024-11-07 13:34:49 |
| 3 | source = myglue_test.default.http_logs \| eval res = json_keys(json('{"account_number":1,"balance":39225,"age":32,"gender":"M"}')) \| head 1 \| fields res | SUCCESS | SUCCESS | TRUE | | {'status': 'SUCCESS', 'schema': [{'name': 'res', 'type': 'array'}], 'datarows': [[['account_number', 'balance', 'age', 'gender']]], 'total': 1, 'size': 1} | 12.77 | 2024-11-07 13:34:47 | 2024-11-07 13:38:45 |
| ... | ... | ... | ... | ... | | | ... | ... | ... |
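Both report formats are easy to post-process. For example, the failed queries can be pulled out of the Excel report with pandas; this is a sketch that assumes the script was run with `--output-file test_report` and wrote `test_report.xlsx`:
```python
# summarize_excel_report.py - illustrative post-processing, not part of the test suite.
import pandas as pd

report = pd.read_excel("test_report.xlsx")  # reading .xlsx uses openpyxl under the hood

# Keep only the queries that did not succeed
failed = report[report["status"] != "SUCCESS"]
print(f"{len(failed)} of {len(report)} queries need attention")
print(failed[["query_name", "query", "status", "error"]].to_string(index=False))
```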


#### JSON Report
@@ -103,7 +108,7 @@ An example of JSON report:
"detailed_results": [
{
"query_name": 1,
"query": "source = myglue_test.default.http_logs | stats avg(size)",
"query": "source = dev.default.http_logs | stats avg(size)",
"query_id": "eFZmTlpTa3EyTW15Z2x1ZV90ZXN0",
"session_id": "bFJDMWxzb2NVUm15Z2x1ZV90ZXN0",
"status": "SUCCESS",
@@ -130,7 +135,7 @@ An example of JSON report:
},
{
"query_name": 2,
"query": "source = myglue_test.default.http_logs | eval res = json_keys(json(\u2018{\"teacher\":\"Alice\",\"student\":[{\"name\":\"Bob\",\"rank\":1},{\"name\":\"Charlie\",\"rank\":2}]}')) | head 1 | fields res",
"query": "source = def.default.http_logs | eval res = json_keys(json(\u2018{\"teacher\":\"Alice\",\"student\":[{\"name\":\"Bob\",\"rank\":1},{\"name\":\"Charlie\",\"rank\":2}]}')) | head 1 | fields res",
"query_id": "bjF4Y1VnbXdFYm15Z2x1ZV90ZXN0",
"session_id": "c3pvU1V6OW8xM215Z2x1ZV90ZXN0",
"status": "FAILED",
@@ -142,7 +147,7 @@ An example of JSON report:
},
{
"query_name": 2,
"query": "source = myglue_test.default.http_logs | eval col1 = size, col2 = clientip | stats avg(col1) by col2",
"query": "source = dev.default.http_logs | eval col1 = size, col2 = clientip | stats avg(col1) by col2",
"query_id": "azVyMFFORnBFRW15Z2x1ZV90ZXN0",
"session_id": "VWF0SEtrNWM3bm15Z2x1ZV90ZXN0",
"status": "TIMEOUT",
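The JSON report lends itself to the same kind of post-processing. A sketch, assuming the report was written to `test_report.json` with the `detailed_results` structure shown above:
```python
# summarize_json_report.py - illustrative post-processing, not part of the test suite.
import json
from collections import Counter

with open("test_report.json") as f:
    report = json.load(f)

results = report["detailed_results"]

# Tally outcomes across all queries, e.g. {'SUCCESS': ..., 'FAILED': ..., 'TIMEOUT': ...}
print(dict(Counter(r["status"] for r in results)))

# List every query that did not succeed, with its error message if present
for r in results:
    if r["status"] != "SUCCESS":
        print(r["query_name"], r["status"], r.get("error", ""))
```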
