[SPARK-42565][SS] Error log improvement for the lock acquisition of RocksDB state store instance #40161

huanliwang-db · 2023-02-24T18:58:02Z

"23/02/23 23:57:44 INFO Executor: Running task 2.0 in stage 57.1 (TID 363)
"23/02/23 23:58:44 ERROR RocksDB StateStoreId(opId=0,partId=3,name=default): RocksDB instance 
could not be acquired by [ThreadId: Some(49), task: 3.0 in stage 57, TID 363] as it was not released by 
[ThreadId: Some(51), task: 3.1 in stage 57, TID 342] after 60002 ms.

We are seeing these error messages for a test query. Here taskId != partitionId, but the error log does not make this clear.

The logs are confusing: the second entry appears to refer to task 3.0, but that is actually partition 3 with retry attempt 0, and TID 363 is already occupied by task 2.0 in stage 57.1.

It is also unclear in which stage retry attempt the lock was acquired (or failed to be acquired).

What changes were proposed in this pull request?

Add "partition " after "task: " in the log message for clarification.
Add the stage attempt to distinguish different stage retries.
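To make the difference concrete, here is a minimal sketch of the task portion of the message before and after the proposed change. It is an illustration only, not the code in this PR: the `TaskIds` class, its field names, and both helper functions are hypothetical, and the exact wording of the merged message is not shown in this thread. The values come from the log excerpt above, with the stage attempt (1) taken from the Executor line "stage 57.1".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskIds:
    """Hypothetical stand-in for the identifiers a running task exposes."""
    partition_id: int    # partition the state store instance belongs to
    attempt_number: int  # retry attempt of this task
    stage_id: int
    stage_attempt: int   # retry attempt of the stage
    tid: int             # unique task attempt ID (the "TID" in the logs)


def old_task_str(t: TaskIds) -> str:
    # Before: the partition ID reads as if it were a task ID,
    # and the stage attempt is not printed at all.
    return f"task: {t.partition_id}.{t.attempt_number} in stage {t.stage_id}, TID {t.tid}"


def new_task_str(t: TaskIds) -> str:
    # After (assumed wording): label the partition explicitly
    # and append the stage attempt to the stage ID.
    return (
        f"task: partition {t.partition_id}.{t.attempt_number} "
        f"in stage {t.stage_id}.{t.stage_attempt}, TID {t.tid}"
    )


# The thread that failed to acquire the lock in the log excerpt.
acquirer = TaskIds(partition_id=3, attempt_number=0, stage_id=57, stage_attempt=1, tid=363)

print(old_task_str(acquirer))  # task: 3.0 in stage 57, TID 363
print(new_task_str(acquirer))  # task: partition 3.0 in stage 57.1, TID 363
```

With the second form, a reader can match the entry to the Executor line for TID 363 by stage attempt, without mistaking "3.0" for a task index.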

Why are the changes needed?

Improve the log message for better debuggability.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Only the log message changed.


HeartSaVioR

+1 Nice!

HeartSaVioR · 2023-02-24T22:54:55Z

Thanks! Merged to master.

Closes apache#40161 from huanliwang-db/rocksdb.

Authored-by: Huanli Wang <huanli.wang@databricks.com>
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>

[SPARK-42565][SS] Error log improvement for the lock acquisition of RocksDB state store instance

94f0546

github-actions bot added SQL STRUCTURED STREAMING labels Feb 24, 2023

HeartSaVioR approved these changes Feb 24, 2023


HeartSaVioR changed the title ~~[SPARK-42565][SS] Error log improve ment for the lock acquisition of RocksDB state store instance~~ [SPARK-42565][SS] Error log improvement for the lock acquisition of RocksDB state store instance Feb 24, 2023

HeartSaVioR closed this in ac30e93 Feb 24, 2023
