@viirya
Member

@viirya viirya commented Mar 28, 2016

What changes were proposed in this pull request?

JIRA: https://issues.apache.org/jira/browse/SPARK-14191

The Expand operator currently uses its child plan's constraints as its valid constraints (i.e., the base of its constraints). This is incorrect because Expand sets its group-by attributes to null in some of its projections, so these attributes must be treated as nullable.

E.g., for an Expand operator like:

val input = LocalRelation('a.int, 'b.int, 'c.int).where('c.attr > 10 && 'a.attr < 5 && 'b.attr > 2)
Expand(
  Seq(
    Seq('c, Literal.create(null, StringType), 1),
    Seq('c, 'a, 2)),
  Seq('c, 'a, 'gid.int),
  Project(Seq('a, 'c), input))

The Project operator has the constraints IsNotNull('a), IsNotNull('b) and IsNotNull('c). However, the Expand should not have IsNotNull('a) in its constraints, because its first projection replaces 'a with a null literal.
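
To make the problem concrete, here is a minimal sketch of the example above in a toy model. It assumes rows are plain maps and that `None` stands in for SQL NULL; `ExpandSketch`, `expand`, and `isNotNullHolds` are hypothetical names for illustration, not Spark APIs.

```scala
// Toy model only: rows are maps from column name to an optional Int.
// None stands in for SQL NULL. None of these names are Spark APIs.
object ExpandSketch {
  type Row = Map[String, Option[Int]]

  // Each projection builds one output row from an input row.
  // The first keeps 'c and nulls out 'a; the second keeps both.
  val projections: Seq[Row => Row] = Seq(
    row => Map("c" -> row("c"), "a" -> None, "gid" -> Some(1)),
    row => Map("c" -> row("c"), "a" -> row("a"), "gid" -> Some(2))
  )

  // Expand emits one output row per projection for every input row.
  def expand(input: Seq[Row]): Seq[Row] =
    input.flatMap(row => projections.map(p => p(row)))

  // True when the column is non-null in every row, i.e. IsNotNull holds.
  def isNotNullHolds(rows: Seq[Row], column: String): Boolean =
    rows.forall(_(column).isDefined)

  // One input row that satisfies c > 10 and a < 5 from the filter.
  val input: Seq[Row] = Seq(Map("a" -> Some(3), "c" -> Some(11)))
}
```

In this model, `isNotNullHolds(input, "a")` is true for the child's rows, but it is false for `expand(input)`, while the same check on `"c"` still passes. That is the sense in which inheriting the child's IsNotNull('a) is unsound.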

This PR is the first step toward fixing this issue: it removes the invalid constraints of the Expand operator.

How was this patch tested?

A test is added to ConstraintPropagationSuite.

@SparkQA

SparkQA commented Mar 28, 2016

Test build #54301 has finished for PR 11995 at commit 84821b4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

- child: LogicalPlan) extends UnaryNode {
+ child: LogicalPlan,
+ constraintsBase: Seq[Expression]) extends UnaryNode {
Contributor

This parameter is not documented.

Also, this operator is kind of violating the style of logical operators. The goal here is for query nodes to be correct by construction (i.e. figure out things like constraints by inspecting themselves and their children).

It would be great if we could move constraint pruning logic into Expand.

@SparkQA

SparkQA commented Mar 30, 2016

Test build #54495 has finished for PR 11995 at commit 23d6b37.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Member Author

viirya commented Mar 30, 2016

@marmbrus I've updated this and tests passed. Please take a look. Thanks!

- child: LogicalPlan) extends UnaryNode {
+ child: LogicalPlan,
+ groupByAttrs: Seq[Attribute]) extends UnaryNode {
Contributor

This still feels kind of hacky to me, and I think it comes down to the fact that this operator is poorly designed. Since we use this operator for more than just grouping, it seems kind of odd to add a new parameter called groupByAttrs to it. Also, we still have the property that it's not always going to be correct by construction. If a user of this class sets groupByAttrs incorrectly, everything will still work, but the operator will just lie about its constraints.

I think the root problem is that projections and output are defined separately in the constructor. Everywhere else in logical plans, you either have an AttributeReference or, if you are producing a new value, you have an Alias. When you follow this pattern, the constraint system just works.

However, in Expand we have logic that decides to replace a column with null (which should be a new AttributeReference), but instead we impersonate the original value.

Until we come up with a principled solution, maybe we should just set validConstraints to be empty?
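
The difference between producing a new attribute and impersonating the original one can be sketched in a toy model. This is only an illustration under simplifying assumptions: `Attr`, `IsNotNull`, `Ref`, `NullAlias`, and `propagate` are hypothetical stand-ins, not Catalyst classes, and constraints are matched purely by attribute id.

```scala
// Toy model only; these classes are hypothetical stand-ins, not Catalyst's.
object AliasSketch {
  // An output column identified by a unique id, like an expression id.
  final case class Attr(name: String, id: Int)

  // A constraint saying the attribute with this id is never null.
  final case class IsNotNull(attr: Attr)

  sealed trait Out { def attr: Attr }
  // Passes a child attribute through unchanged, keeping its id.
  final case class Ref(attr: Attr) extends Out
  // Produces a new value (here: always null) exposed as the given attribute.
  final case class NullAlias(attr: Attr) extends Out

  // Keep only the child constraints whose attribute id is still in the output.
  def propagate(child: Set[IsNotNull], output: Seq[Out]): Set[IsNotNull] = {
    val ids = output.map(_.attr.id).toSet
    child.filter(c => ids.contains(c.attr.id))
  }

  val a = Attr("a", 1)
  val c = Attr("c", 2)
  val childConstraints: Set[IsNotNull] = Set(IsNotNull(a), IsNotNull(c))

  // New id (3) for the nulled column: IsNotNull(a) drops out on its own.
  val withAlias = propagate(childConstraints, Seq(Ref(c), NullAlias(Attr("a", 3))))

  // Impersonation: the nulled column reuses id 1, so IsNotNull(a) wrongly survives.
  val impersonating = propagate(childConstraints, Seq(Ref(c), NullAlias(a)))
}
```

Here `withAlias` keeps only the constraint on `c`, whereas `impersonating` still claims that `a` is non-null even though the column is always null. Returning an empty constraint set avoids the false claim at the cost of also discarding the valid constraint on `c`.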

Member Author

Thanks for pointing out the problem. After rethinking it, I agree. Since defining projections and output separately causes the problem, how about we derive the output from the projections?

Since there is more than one projection, we can take the output from the first projection and verify that it is consistent with the other projections.
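
A rough sketch of that idea, under the assumption that each projection is a sequence of named expressions and that "consistent" means the same names and types in the same order. `NamedExpr` and `deriveOutput` are hypothetical names for illustration, not part of Spark.

```scala
// Toy model only; NamedExpr is a hypothetical stand-in for NamedExpression
// that carries just the schema-relevant parts (name and type).
object OutputSketch {
  final case class NamedExpr(name: String, dataType: String)

  // Take the output schema from the first projection, then check that every
  // other projection has the same names and types in the same order.
  def deriveOutput(projections: Seq[Seq[NamedExpr]]): Either[String, Seq[NamedExpr]] =
    projections.headOption match {
      case None => Left("Expand needs at least one projection")
      case Some(first) =>
        val mismatch = projections.indexWhere(_ != first)
        if (mismatch < 0) Right(first)
        else Left(s"projection $mismatch does not match the first projection")
    }
}
```

With this shape there is no separate output argument for a caller to get wrong. A real implementation would also need to decide how nullability of each output column is computed across projections, which this sketch leaves out.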

Contributor

Yeah, I do think it would be good if it just took a Seq[Seq[NamedExpression]] (or at least I can't come up with anything better). I'd still consider breaking this into two PRs: a simple fix for now that just removes the invalid constraints, and a refactoring that adds the valid ones back in.

Member Author

Ok. Let me remove the constraints first.

@viirya viirya changed the title from "[SPARK-14191][SQL] Fix Expand operator constraints" to "[SPARK-14191][SQL] Remove invalid Expand operator constraints" on Mar 31, 2016
@SparkQA

SparkQA commented Mar 31, 2016

Test build #54601 has finished for PR 11995 at commit ab89e62.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Member Author

viirya commented Apr 1, 2016

ping @marmbrus

child = a.copy(aggregateExpressions = a.aggregateExpressions.filter(p.references.contains)))
- case a @ Project(_, e @ Expand(_, _, grandChild)) if (e.outputSet -- a.references).nonEmpty =>
+ case a @ Project(_, e @ Expand(_, _, grandChild))
+ if (e.outputSet -- a.references).nonEmpty =>
Contributor

It is best to avoid spurious changes because they pollute git blame. I can revert this while merging this time.

Member Author

Ok. Got it.

@marmbrus
Contributor

marmbrus commented Apr 1, 2016

Thanks, merging to master.

Statistics(sizeInBytes = sizeInBytes)
}

override protected def validConstraints: Set[Expression] = Set.empty[Expression]
Contributor

I'm also going to add a comment here to explain why this is empty.

Member Author

Thanks.

@asfgit asfgit closed this in a884daa Apr 1, 2016
@viirya viirya deleted the fix-expand-constraints branch December 27, 2023 18:33