[AUTHZ] count(*)/count(1) should check sub nodes' privileges #7204

LennonChin · 2025-09-12T08:28:43Z

Spark's ColumnPruning optimizer rule replaces the child of a count(*)/count(1) Aggregate plan with a Project node whose projection list is empty:

object ColumnPruning extends Rule[LogicalPlan] {
  def apply(plan: LogicalPlan): LogicalPlan = removeProjectBeforeFilter(
    plan.transformWithPruning(AlwaysProcess.fn, ruleId) {
      ...
      // Prunes the unused columns from child of Aggregate/Expand/Generate/ScriptTransformation
      case a @ Aggregate(_, _, child) if !child.outputSet.subsetOf(a.references) =>
        a.copy(child = prunedChild(child, a.references))
      ...
    })
  ...
}

However, the AuthZ plugin's PrivilegesBuilder.buildQuery method skips the child node when a plan's inputSet is empty. In this scenario the privileges of the Aggregate node's child plan are never collected, so count(*)/count(1) bypasses every privilege check that should apply.
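The skip described above can be reproduced in a small toy model. This is only a sketch: Node, Table, Proj, CountStar and tablesChecked are hypothetical stand-ins invented for illustration, not Spark or Kyuubi classes, and the traversal is a heavily simplified version of what buildQuery is described as doing.

```scala
object CountStarSkipSketch {
  // Hypothetical stand-ins for logical plan nodes; not Spark or Kyuubi classes.
  sealed trait Node {
    def children: Seq[Node]
    def output: Set[String]     // columns this node produces
    def references: Set[String] // input columns this node reads
    def inputSet: Set[String] = children.flatMap(_.output).toSet
  }

  final case class Table(name: String, output: Set[String]) extends Node {
    val children: Seq[Node] = Nil
    val references: Set[String] = Set.empty
  }

  final case class Proj(columns: Set[String], child: Node) extends Node {
    val children: Seq[Node] = Seq(child)
    val output: Set[String] = columns
    val references: Set[String] = columns
  }

  // Models Aggregate(count(*)): it reads no input column at all.
  final case class CountStar(child: Node) extends Node {
    val children: Seq[Node] = Seq(child)
    val output: Set[String] = Set("count")
    val references: Set[String] = Set.empty
  }

  // Simplified traversal: returns the tables whose privileges would be checked.
  def tablesChecked(node: Node): Set[String] = node match {
    case Table(name, _) => Set(name)
    case p =>
      val used = (p.references ++ p.output).intersect(p.inputSet)
      // A Proj that uses nothing from its input is treated as constant-only,
      // so its whole subtree is skipped.
      if (used.isEmpty && p.isInstanceOf[Proj]) Set.empty
      else p.children.flatMap(tablesChecked).toSet
  }

  val table: Table = Table("db.t", Set("a", "b"))
  val beforePruning: Node = CountStar(table)
  // Shape left behind by column pruning: an empty projection under the aggregate.
  val afterPruning: Node = CountStar(Proj(Set.empty, table))
}
```

In this model tablesChecked(beforePruning) reaches db.t, while tablesChecked(afterPruning) returns an empty set because the empty projection hides the table from the traversal.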

Why are the changes needed?

This patch adds a holder node, ChildOutputHolder, when an Aggregate node's references and its child node's outputSet have no intersection. The holder keeps the child node's outputSet so that it can be used to build privilege objects, and it is eliminated once the RuleAuthorization rule has finished.
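A minimal sketch of the mark-then-eliminate idea, under stated assumptions: Node, Table, Agg, Holder, mark and eliminate are hypothetical names for a toy plan tree and do not reproduce the actual RuleChildOutputMarker, ChildOutputHolder or RuleEliminateChildOutputHolder code.

```scala
object ChildOutputHolderSketch {
  // Hypothetical stand-ins for logical plan nodes; not Spark or Kyuubi classes.
  sealed trait Node {
    def children: Seq[Node]
    def output: Set[String]
    def references: Set[String]
  }

  final case class Table(name: String, output: Set[String]) extends Node {
    val children: Seq[Node] = Nil
    val references: Set[String] = Set.empty
  }

  // An aggregate that reads `references` from its child; count(*) reads nothing.
  final case class Agg(references: Set[String], child: Node) extends Node {
    val children: Seq[Node] = Seq(child)
    val output: Set[String] = Set("agg")
  }

  // Transparent wrapper that remembers the output of the node it wraps.
  final case class Holder(child: Node, childOutput: Set[String]) extends Node {
    val children: Seq[Node] = Seq(child)
    val output: Set[String] = childOutput
    val references: Set[String] = childOutput
  }

  // Wrap the aggregate's child only when the aggregate reads none of its columns.
  def mark(node: Node): Node = node match {
    case Agg(refs, child)
        if !child.isInstanceOf[Holder] && refs.intersect(child.output).isEmpty =>
      Agg(refs, Holder(mark(child), child.output))
    case Agg(refs, child) => Agg(refs, mark(child))
    case Holder(child, out) => Holder(mark(child), out)
    case t: Table => t
  }

  // Remove every holder once the privilege check is done.
  def eliminate(node: Node): Node = node match {
    case Holder(child, _) => eliminate(child)
    case Agg(refs, child) => Agg(refs, eliminate(child))
    case t: Table => t
  }
}
```

Because the holder is removed again afterwards, eliminate(mark(plan)) gives back the original plan in this model, and an aggregate that does read a child column is left unmarked.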

How was this patch tested?

Updated existing unit tests and added new ones.

Was this patch authored or co-authored using generative AI tooling?

No.

LennonChin · 2025-09-12T08:29:46Z

related issue: #7173, cc @bowenliang123

codecov-commenter · 2025-09-12T10:59:15Z

Codecov Report

❌ Patch coverage is 0% with 17 lines in your changes missing coverage. Please review.
✅ Project coverage is 0.00%. Comparing base (8b56295) to head (952548f).
⚠️ Report is 1 commits behind head on master.

Files with missing lines	Patch %	Lines
.../spark/authz/rule/plan/RuleChildOutputMarker.scala	0.00%	6 Missing ⚠️
...ugin/spark/authz/rule/plan/ChildOutputHolder.scala	0.00%	4 Missing ⚠️
...rk/authz/rule/RuleEliminateChildOutputHolder.scala	0.00%	3 Missing ⚠️
.../kyuubi/plugin/spark/authz/PrivilegesBuilder.scala	0.00%	2 Missing ⚠️
...ugin/spark/authz/ranger/RangerSparkExtension.scala	0.00%	2 Missing ⚠️

Additional details and impacted files

@@          Coverage Diff           @@
##           master   #7204   +/-   ##
======================================
  Coverage    0.00%   0.00%           
======================================
  Files         695     698    +3     
  Lines       43505   43495   -10     
  Branches     5888    5886    -2     
======================================
+ Misses      43505   43495   -10


pan3793 · 2025-09-12T15:16:34Z

...-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/rule/plan/ChildOutputHolder.scala

+
+import org.apache.kyuubi.plugin.spark.authz.util.WithInternalChild
+
+case class ChildOutputHolder(child: LogicalPlan, fixedOutput: Seq[Attribute])


Let's add comments to explain why we need this.

fixedOutput is not a good name: what does it fix? Maybe just call it childOutput.

pan3793 · 2025-09-12T15:18:55Z

...uubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/PrivilegesBuilder.scala

          if (columnPrune(p.references.toSeq ++ p.output, p.inputSet).isEmpty) {
            // If plan is project and output don't have relation to input, can ignore.
-            if (!p.isInstanceOf[Project]) {
+            // If plan tree exists ChildOutputHolder, we should build child logic plan.


The variable name existsChildOutputHolder already explains what it does; a good comment explains WHY, not WHAT.

pan3793 · 2025-09-12T15:22:21Z

this approach lgtm, I suggest adding some comments.

cc @wForget will this affect lineage?

also cc @zhouyifan279 since you have worked closely on this part

wForget · 2025-09-15T05:31:30Z

> cc @wForget will this affect lineage?

It seems not. ChildOutputHolder extends UnaryNode, so it passes the child's lineage through.

wForget · 2025-09-15T05:47:19Z

...uubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/PrivilegesBuilder.scala

+            //    some nodes in the tree is fixed by RuleChildOutputMarker in some special
+            //    scenarios, such as the Aggregate(count(*)) child node. To avoid missing child node
+            //    permissions, we need to continue checking down.
+            if (!p.isInstanceOf[Project] || existsChildOutputHolder) {


Does this affect children that are not ChildOutputHolder?

LennonChin · 2025-09-15

> Does this affect children that are not ChildOutputHolder?

This check lets the PrivilegesBuilder.buildQuery method keep drilling down to the child nodes of the ChildOutputHolder node. Without it, buildQuery would terminate before reaching ChildOutputHolder. Since the outputs held by ChildOutputHolder are all in use, and given how the columnPrune method works, I think other child nodes will not be affected.

wForget · 2025-09-15

This logic is inside for (child <- p.children) {, so it is applied to all children of p. Do we need it for children of p that are not ChildOutputHolder?

LennonChin · 2025-09-15

> This logic is in for (child <- p.children) {, which will be applied to all children of p. Do we need it for children of p that are not ChildOutputHolder?

Normally the deeper nodes must be checked recursively, but an Aggregate(count(*)) node is an exception because it blocks the recursion. In the current patch, ChildOutputHolder is only added when an Aggregate(count(*)) node is encountered. When a ChildOutputHolder exists in the plan tree, it indicates that the nodes below it hold useful output information, so we need to keep checking deeper nodes. Once the recursion reaches the nodes below ChildOutputHolder, it falls back to the normal logic. Therefore, I think the check here is reasonable.
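The argument in this thread can be illustrated with a toy traversal. This is a sketch under assumptions: all types and the containsHolder helper are hypothetical stand-ins, and how existsChildOutputHolder is actually computed in PrivilegesBuilder is not shown in this conversation, so the subtree check below is only one plausible reading.

```scala
object HolderAwareTraversalSketch {
  // Hypothetical stand-ins for logical plan nodes; not Spark or Kyuubi classes.
  sealed trait Node {
    def children: Seq[Node]
    def output: Set[String]
    def references: Set[String]
    def inputSet: Set[String] = children.flatMap(_.output).toSet
  }

  final case class Table(name: String, output: Set[String]) extends Node {
    val children: Seq[Node] = Nil
    val references: Set[String] = Set.empty
  }

  final case class Proj(columns: Set[String], child: Node) extends Node {
    val children: Seq[Node] = Seq(child)
    val output: Set[String] = columns
    val references: Set[String] = columns
  }

  final case class CountStar(child: Node) extends Node {
    val children: Seq[Node] = Seq(child)
    val output: Set[String] = Set("count")
    val references: Set[String] = Set.empty
  }

  final case class Holder(child: Node, childOutput: Set[String]) extends Node {
    val children: Seq[Node] = Seq(child)
    val output: Set[String] = childOutput
    val references: Set[String] = childOutput
  }

  // Assumption: the flag means "a holder exists somewhere in this subtree".
  def containsHolder(node: Node): Boolean =
    node.isInstanceOf[Holder] || node.children.exists(containsHolder)

  def tablesChecked(node: Node): Set[String] = node match {
    case Table(name, _) => Set(name)
    case p =>
      val used = (p.references ++ p.output).intersect(p.inputSet)
      // Same shape as `!p.isInstanceOf[Project] || existsChildOutputHolder`:
      // an empty projection is only skipped when no holder sits below it.
      val keepDescending = used.nonEmpty || !p.isInstanceOf[Proj] || containsHolder(p)
      if (keepDescending) p.children.flatMap(tablesChecked).toSet else Set.empty
  }
}
```

In this model an empty projection above a holder is still traversed, while a constant-only projection with no holder below it is skipped exactly as before, which matches the claim that other children are unaffected.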

[AUTHZ] select(*)/select(1) should check sub plans' privileges

b711454

github-actions bot added module:spark module:extensions module:authz labels Sep 12, 2025

style fixed

b4d9992

pan3793 reviewed Sep 12, 2025

pan3793 requested a review from wForget September 12, 2025 15:19

optimize

952548f

LennonChin requested a review from pan3793 September 15, 2025 01:37

wForget reviewed Sep 15, 2025

wForget reviewed Sep 15, 2025

LennonChin requested a review from wForget September 15, 2025 07:59

LennonChin changed the title ~~[AUTHZ] select(*)/select(1) should check sub nodes' privileges~~ [AUTHZ] count(*)/count(1) should check sub nodes' privileges Sep 17, 2025
