@liuzqt
Contributor

@liuzqt liuzqt commented Jun 13, 2024

What changes were proposed in this pull request?

refactor: In ExplainUtils.processPlan, use auxiliary idMap instead of OP_ID_TAG

Why are the changes needed?

#45282 introduced synchronization in ExplainUtils.processPlan to avoid a race condition when multiple queries refer to the same cached plan.

The granularity of that lock is too coarse. We can instead fix the root cause of the concurrency issue by refactoring the use of the mutable OP_ID_TAG, which is not good practice given the immutable nature of SparkPlan.

Instead, we can use an auxiliary id map keyed by object identity. OP_ID_TAG is used only within ExplainUtils.processPlan, so this is safe; a thread local makes the map available to the other classes involved.
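
As a rough illustration of the identity-keyed map idea (not the code in this PR), the sketch below assigns ids through a java.util.IdentityHashMap instead of writing a tag onto the node. `Node`, `IdMapSketch` and `assignIds` are hypothetical names invented for the example; `Node` only stands in for a plan operator that compares by value.

```scala
import java.util.IdentityHashMap

// Hypothetical stand-in for a plan node: a case class, so equals() compares by value.
final case class Node(name: String)

object IdMapSketch {
  // Assigns consecutive ids starting at 1, keyed by reference identity.
  // The nodes themselves are never mutated.
  def assignIds(nodes: Seq[Node]): IdentityHashMap[Node, Int] = {
    val ids = new IdentityHashMap[Node, Int]()
    var next = 0
    nodes.foreach { n =>
      // containsKey uses reference equality (eq), not equals().
      if (!ids.containsKey(n)) {
        next += 1
        ids.put(n, next)
      }
    }
    ids
  }
}
```

Two structurally equal nodes still receive separate ids here, which a map keyed by equals() would not guarantee, and a node shared between callers carries no per-call state.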

Does this PR introduce any user-facing change?

NO

How was this patch tested?

existing UTs.

Was this patch authored or co-authored using generative AI tooling?

NO

@github-actions github-actions bot added the SQL label Jun 13, 2024
@liuzqt
Contributor Author

liuzqt commented Jun 13, 2024

@cloud-fan

val OP_ID_TAG = TreeNodeTag[Int]("operatorId")
val CODEGEN_ID_TAG = new TreeNodeTag[Int]("wholeStageCodegenId")

val localIdMap: ThreadLocal[java.util.Map[QueryPlan[_], Int]] = ThreadLocal.withInitial(() =>
Contributor

Can we define the scope of this thread local, i.e. when it is set and when it is cleared?

Contributor Author

The scope is ExplainUtils.processPlan, but I defined it here because QueryPlan also needs it, and catalyst does not have access to the execution package. Added comments to clarify.
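
One way such a scope could be made explicit is to install the map at the start of a block and restore the previous value in a finally clause. This is only a sketch of that pattern, not the code merged in this PR; `ScopedIdMap`, `withFreshIdMap`, `put` and `lookup` are hypothetical names.

```scala
import java.util.IdentityHashMap

object ScopedIdMap {
  // Hypothetical holder: each thread sees its own map, so concurrent callers never share state.
  private val local = new ThreadLocal[IdentityHashMap[AnyRef, Int]] {
    override def initialValue(): IdentityHashMap[AnyRef, Int] =
      new IdentityHashMap[AnyRef, Int]()
  }

  // Runs body with a fresh map installed and restores the previous map afterwards,
  // even if body throws.
  def withFreshIdMap[T](body: => T): T = {
    val previous = local.get()
    local.set(new IdentityHashMap[AnyRef, Int]())
    try body finally local.set(previous)
  }

  // Records an id for a node in the current thread's map.
  def put(node: AnyRef, id: Int): Unit = {
    local.get().put(node, id)
    ()
  }

  // Looks up a node by reference identity in the current thread's map.
  def lookup(node: AnyRef): Option[Int] = {
    val m = local.get()
    if (m.containsKey(node)) Some(m.get(node)) else None
  }
}
```

With this shape, ids recorded inside the block are not visible once it exits, which answers "when is it set and when is it cleared" directly in the code.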

@liuzqt liuzqt requested a review from cloud-fan June 14, 2024 21:58
@cloud-fan
Contributor

thanks, merging to master/3.5!

@cloud-fan cloud-fan closed this in d3da240 Jun 17, 2024
cloud-fan pushed a commit that referenced this pull request Jun 17, 2024
### What changes were proposed in this pull request?

refactor: In `ExplainUtils.processPlan`, use auxiliary idMap instead of OP_ID_TAG

### Why are the changes needed?

#45282 introduced synchronization in `ExplainUtils.processPlan` to avoid a race condition when multiple queries refer to the same cached plan.

The granularity of that lock is too coarse. We can instead fix the root cause of the concurrency issue by refactoring the use of the mutable `OP_ID_TAG`, which is not good practice given the immutable nature of SparkPlan.

Instead, we can use an auxiliary id map keyed by object identity. `OP_ID_TAG` is used only within `ExplainUtils.processPlan`, so this is safe; a thread local makes the map available to the other classes involved.

### Does this PR introduce _any_ user-facing change?
  NO

### How was this patch tested?
existing UTs.

### Was this patch authored or co-authored using generative AI tooling?
NO

Closes #46965 from liuzqt/SPARK-48610.

Authored-by: Ziqi Liu <ziqi.liu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit d3da240)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
senthh pushed a commit to acceldata-io/spark3 that referenced this pull request Aug 23, 2025
…P_ID_TAG

senthh pushed a commit to acceldata-io/spark3 that referenced this pull request Oct 31, 2025
…P_ID_TAG

senthh pushed a commit to acceldata-io/spark3 that referenced this pull request Nov 3, 2025
…P_ID_TAG

turboFei pushed a commit to turboFei/spark that referenced this pull request Nov 6, 2025
…apache#626)

Authored-by: Ziqi Liu <ziqi.liu@databricks.com>

(cherry picked from commit d3da240)

Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Co-authored-by: Ziqi Liu <ziqi.liu@databricks.com>