Enhance Flint Spark API error reporting with centralized handler #348

Merged: 16 commits, Jun 27, 2024

@dai-chen dai-chen commented May 20, 2024

Description

PR #335 improved error messages by storing the root-cause message in the query result, which helps users understand what went wrong. However, the Flint Spark API layer still does not pass the underlying exception up to higher-level code.

This PR introduces a withTransaction method that centralizes logging and exception handling across index operations and passes the original exception back to the caller, so the error reported to users reflects the actual cause.
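
As a rough illustration of the idea (not the code in this PR), a centralized handler could wrap every index operation, log the start and any failure in one place, and rethrow the original exception untouched. The object name, the println-based logging helpers, and the simplified signature below are assumptions made for this sketch; the log wording mirrors the Spark log example further down.

```scala
import scala.util.control.NonFatal

// Minimal sketch of a centralized handler. Names and signature are hypothetical;
// the real withTransaction in FlintSpark may differ.
object WithTransactionSketch {

  // Stand-ins for a real logger (assumption: println keeps the sketch self-contained).
  private def logInfo(msg: String): Unit = println(s"INFO $msg")
  private def logError(msg: String, e: Throwable): Unit =
    println(s"ERROR $msg: ${e.getMessage}")

  /**
   * Runs an index operation, logging start and failure in one place.
   * On failure the original exception is rethrown unchanged, so the caller
   * sees the underlying cause instead of a generic wrapper message.
   */
  def withTransaction[T](opName: String)(operation: => T): T = {
    logInfo(s"Starting index operation [$opName]")
    try {
      val result = operation
      logInfo(s"Index operation [$opName] complete")
      result
    } catch {
      case NonFatal(e) =>
        logError(s"Failed to execute index operation [$opName]", e)
        throw e
    }
  }

  def main(args: Array[String]): Unit = {
    // Successful operation: the result is passed through.
    println(withTransaction("Create Flint index demo") { "created" })

    // Failing operation: the caller receives the original exception.
    try {
      withTransaction[Unit]("Vacuum Flint index demo") {
        throw new IllegalStateException("Index state [failed] doesn't satisfy precondition")
      }
    } catch {
      case e: IllegalStateException => println(s"Caller saw: ${e.getMessage}")
    }
  }
}
```

Keeping the try/catch in a single helper means each index operation only supplies its name and body, and no operation can replace the underlying error with a generic "Failed to ..." message by accident.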

Testing

Index State Doesn't Satisfy Precondition

Tested index state exception for #201:

# Before the changes
Fail to run query, cause: Failed to vacuum Flint index

# After the changes
"_source": {
  "jobRunId": "XXX",
  "applicationId": "XXX",
  "dataSourceName": "glue",
  "status": "FAILED",
  "error": """{"Message":"Fail to run query. Cause: Index state [failed] doesn't satisfy precondition"}""",
  "queryId": "",
  "queryText": "VACUUM INDEX test ON glue.default.parquet_mismatch_test",
  "sessionId": "",
  "jobType": "streaming",
  "updateTime": 1719433155989,
  "queryRunTime": 2989
}
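
The "doesn't satisfy precondition" message above suggests that an operation is only committed when the index is in an allowed state. A minimal sketch of such a check follows; the state names, the commitIfAllowed helper, and the assumption that vacuum is only permitted from a deleted state are illustrative and not taken from this PR.

```scala
// Hypothetical precondition check before running an index operation.
object PreconditionSketch {

  // Illustrative subset of index states (assumption, not the real state model).
  sealed trait IndexState { def name: String }
  case object Active extends IndexState { val name = "active" }
  case object Deleted extends IndexState { val name = "deleted" }
  case object Failed extends IndexState { val name = "failed" }

  /** Runs the operation only if the current state passes the precondition. */
  def commitIfAllowed[T](current: IndexState)(precondition: IndexState => Boolean)(operation: => T): T = {
    if (!precondition(current)) {
      // Same wording as the error shown in the query result above.
      throw new IllegalStateException(
        s"Index state [${current.name}] doesn't satisfy precondition")
    }
    operation
  }

  def main(args: Array[String]): Unit = {
    // Assumption for this sketch: vacuum is only allowed on a deleted index.
    println(commitIfAllowed(Deleted)(_ == Deleted) { "vacuumed" })
    try {
      commitIfAllowed(Failed)(_ == Deleted) { "vacuumed" }
    } catch {
      case e: IllegalStateException => println(e.getMessage)
    }
  }
}
```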

OpenSearch Exception

Tested OpenSearch exception for #252:

# Before the changes
Fail to run query, cause: Failed to create Flint index

# After the changes
"_source": {
  "jobRunId": "XXX",
  "applicationId": "XXX",
  "dataSourceName": "glue",
  "status": "FAILED",
  "error": """{"Message":"Fail to run query. Cause: OpenSearch exception [type=illegal_argument_exception,
    reason=unknown setting [index.number_shards] did you mean [index.number_of_shards]?]"}""",
  "queryId": "",
  "queryText": """CREATE INDEX test ON glue.default. parquet_mismatch_test ( protocol )
    WITH (auto_refresh = true, checkpoint_location = 's3://...', index_settings = '{"number_shards":1}')""",
  "sessionId": "",
  "jobType": "streaming",
  "updateTime": 1719434459275,
  "queryRunTime": 6569
}

Wrong MV Query

Tested MV with wrong query in definition for #140:

# Before the changes
Fail to run query, cause: Failed to refresh Flint index

# After the changes
"_source": {
  "jobRunId": "XXX",
  "applicationId": "XXX",
  "dataSourceName": "glue",
  "status": "FAILED",
  "error": """{"Message":"Fail to run query. Cause: A windowing function is required for 
incremental refresh with aggregation"}""",
  "queryId": "",
  "queryText": "CREATE MATERIALIZED VIEW glue.default.test_mv AS
SELECT clientip, COUNT(*) FROM glue.default.http_logs GROUP BY clientip
WITH (auto_refresh = true, checkpoint_location = 's3://checkpoint') ",
  "sessionId": "",
  "jobType": "streaming",
  "updateTime": 1719520498657,
  "queryRunTime": 9387
}

Spark log example

INFO FlintSpark: Starting index operation [Create Flint index flint_glue_default_parquet_mismatch_test_index] with forceInit=true
...
ERROR FlintSpark: Failed to execute index operation [Create Flint index flint_glue_default_parquet_mismatch_test_index]
java.lang.IllegalStateException: Failed to commit transaction operation
	...
Caused by: java.lang.IllegalStateException: Failed to create Flint index flint_glue_default_parquet_mismatch_test_index
	at org.opensearch.flint.core.storage.FlintOpenSearchClient.createIndex(FlintOpenSearchClient.java:88)
	at org.opensearch.flint.core.storage.FlintOpenSearchClient.createIndex(FlintOpenSearchClient.java:74)
	at org.opensearch.flint.spark.FlintSpark.$anonfun$createIndex$5(FlintSpark.scala:119)
	at org.opensearch.flint.spark.FlintSpark.$anonfun$createIndex$5$adapted(FlintSpark.scala:114)
	at org.opensearch.flint.core.metadata.log.DefaultOptimisticTransaction.commit(DefaultOptimisticTransaction.java:109)
	... 63 more
Caused by: org.opensearch.OpenSearchStatusException: OpenSearch exception [type=illegal_argument_exception,
reason=unknown setting [index.number_shards] did you mean [index.number_of_shards]?]
	at org.opensearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:207)
	at org.opensearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:2228)
	at org.opensearch.client.RestHighLevelClient.parseResponseException(RestHighLevelClient.java:2205)
	at org.opensearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1924)
	...
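
The log above shows a three-level cause chain, while the query results earlier report only the innermost message. One way to get from the former to the latter is to walk getCause to the end of the chain. The sketch below assumes that approach; the helper names are hypothetical, the messages are shortened from the log, and the actual logic from PR #335 may differ.

```scala
import scala.annotation.tailrec

// Hypothetical helper that reduces an exception chain to its root-cause message.
object RootCauseSketch {

  /** Follows getCause until the innermost exception is reached. */
  @tailrec
  def rootCause(e: Throwable): Throwable = {
    val cause = e.getCause
    // Stop at the end of the chain, or if an exception lists itself as its cause.
    if (cause == null || (cause eq e)) e else rootCause(cause)
  }

  /** Formats the message in the style of the "error" field shown earlier. */
  def errorMessage(e: Throwable): String =
    s"Fail to run query. Cause: ${rootCause(e).getMessage}"

  def main(args: Array[String]): Unit = {
    // Chain shaped like the log above (messages shortened for the sketch).
    val chain = new IllegalStateException(
      "Failed to commit transaction operation",
      new IllegalStateException(
        "Failed to create Flint index",
        new RuntimeException("unknown setting [index.number_shards]")))
    println(errorMessage(chain))
  }
}
```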

Issues Resolved

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Chen Dai <daichen@amazon.com>
@dai-chen dai-chen added the enhancement and 0.5 labels May 20, 2024
@dai-chen dai-chen self-assigned this May 20, 2024
dai-chen added 5 commits May 20, 2024 16:11
dai-chen added 7 commits June 3, 2024 13:15
@dai-chen dai-chen marked this pull request as ready for review June 26, 2024 16:52
@dai-chen dai-chen marked this pull request as draft June 26, 2024 17:35
@dai-chen dai-chen marked this pull request as ready for review June 26, 2024 22:20
@dai-chen dai-chen changed the title from "Enhance Flint error handling with detailed exception messages" to "Enhance Flint Spark API error reporting with centralized handler" Jun 26, 2024
@penghuo penghuo left a comment

Thx

@dai-chen dai-chen merged commit 0c1ec6b into opensearch-project:main Jun 27, 2024
4 checks passed
@dai-chen dai-chen deleted the improve-error-handling branch June 27, 2024 22:13
@opensearch-trigger-bot

The backport to 0.4 failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/opensearch-spark/backport-0.4 0.4
# Navigate to the new working tree
pushd ../.worktrees/opensearch-spark/backport-0.4
# Create a new branch
git switch --create backport/backport-348-to-0.4
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 0c1ec6bdf9258cf98595ade985c8d38b7b540114
# Push it to GitHub
git push --set-upstream origin backport/backport-348-to-0.4
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/opensearch-spark/backport-0.4

Then, create a pull request where the base branch is 0.4 and the compare/head branch is backport/backport-348-to-0.4.
