-
Notifications
You must be signed in to change notification settings - Fork 962
[AUTHZ] count(*)/count(1) should check sub nodes' privileges #7204
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
related issue: #7173, cc @bowenliang123 |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #7204 +/- ##
======================================
Coverage 0.00% 0.00%
======================================
Files 695 698 +3
Lines 43505 43495 -10
Branches 5888 5886 -2
======================================
+ Misses 43505 43495 -10 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
||
import org.apache.kyuubi.plugin.spark.authz.util.WithInternalChild | ||
|
||
case class ChildOutputHolder(child: LogicalPlan, fixedOutput: Seq[Attribute]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's add comments to explain why we need this.
fixedOutput
is not a good name, it fixed
what? maybe just call it childOutput
if (columnPrune(p.references.toSeq ++ p.output, p.inputSet).isEmpty) { | ||
// If plan is project and output don't have relation to input, can ignore. | ||
if (!p.isInstanceOf[Project]) { | ||
// If plan tree exists ChildOutputHolder, we should build child logic plan. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the variable name existsChildOutputHolder
itself explains what it does, a good comment explains WHY, not WHAT.
this approach lgtm, I suggest adding some comments. cc @wForget will this affect lineage? also cc @zhouyifan279 since you have worked closely on this part |
It seems not, |
...-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/rule/plan/ChildOutputHolder.scala
Show resolved
Hide resolved
// some nodes in the tree is fixed by RuleChildOutputMarker in some special | ||
// scenarios, such as the Aggregate(count(*)) child node. To avoid missing child node | ||
// permissions, we need to continue checking down. | ||
if (!p.isInstanceOf[Project] || existsChildOutputHolder) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this affect children that are not ChildOutputHolder
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this affect children that are not
ChildOutputHolder
?
This check is to allow PrivilegeBuilder.buildQuery
method continue drill down to the child nodes of the ChildOutputHolder
node. Without this check, the PrivilegeBuilder.buildQuery
method will terminate before hitting ChildOutputHolder
. Since the outputs held by ChildOutputHolder
are all useful, combined with columnPrune
method, I think other child nodes will not be affected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic is in for (child <- p.children) {
, which will be applied to all children of p
. Do we need it for children of p
that are not ChildOutputHolder
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic is in
for (child <- p.children) {
, which will be applied to all children ofp
. Do we need it for children ofp
that are notChildOutputHolder
?
Normally, recursive checking of deeper nodes is necessary, but the presence of an Aggregate(count(*))
node is an exception, as it blocks recursion. In current case, ChildOutputHolder
is only added when encountered Aggregate(count(*))
node. When a ChildOutputHolder
node exists in the plan tree, it indicates that nodes deeper than ChildOutputHolder
holds useful output information, we need to continue checking deeper nodes. At the same time, when recursing to the node deeper than ChildOutputHolder
, we regress to normal judgment logic. Therefore, I think the judgment here is reasonable.
Spark Optimizer's ColumnPruning will replace
count(*)/count(1)
Aggregate
plan's child to aProject
node with empty projection list:but AuthZ plugin's
PrivilegesBuilder.buildQuery
method will ignore to check child node when plan'sinputSet
is empty, in this scenario,Aggregate
node's child plan's privileges are ignored, which causecount(*)/ count(1)
will ignored all privileges that should be checked.Why are the changes needed?
this patch add a holder node
ChildOutputHolder
whenAggregate
node'sreferences
and it's child node'soutputSet
hava no intersection, it will hold the child node'soutputSet
used for build privilege objects, andChildOutputHolder
node will be eliminated afterRuleAuthorization
rule work completed.How was this patch tested?
updated old unit tests and added new unit tests.
Was this patch authored or co-authored using generative AI tooling?
No.