-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-13947][SQL] The error message from using an invalid column reference is not clear #17100
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ion in case of attribute name collision
|
Jenkins, ok to test. |
|
@rberenguel : how about adding the "[SQL]" tag to this, since while the feature request comes out of PySpark its changing the SQL code. |
|
Test build #73604 has finished for PR 17100 at commit
|
|
@holdenk added! And I suspect I have some issue locally with coursier when running the tests... They supposedly work (as in, no outputted errors/failures?) 1/3 times, the other 2 cause coursier cache misses. But I think the no errors one is actually "I give up running the test suite anymore". I'll give a look at this (since it's pretty bad I can't rely on the local test suite!) and fix the failing test, is pretty straightforward. |
…e. Added a mental note to disable coursier when running the full Spark test suite
|
Test build #73700 has finished for PR 17100 at commit
|
…nalysisErrorSuite test that was failing. Just in case, removed the hardcoding of the names and hashes from this test
|
Test build #73714 has finished for PR 17100 at commit
|
holdenk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this, maybe we can ask @marmbrus to take a look at this since its mostly a SQL change.
| failAnalysis( | ||
| s"resolved attribute(s) $missingAttributes missing from $input " + | ||
| s"in operator ${operator.simpleString}") | ||
| s"|Some resolved attribute(s) are not present among available attributes " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really long, would maybe """ + stripMargin be easier to write/read?
|
|
||
| assertAnalysisError(plan, "resolved attribute(s) a#1L missing from a#2L" :: Nil) | ||
| assertAnalysisError(plan, | ||
| "Some resolved attribute(s) are not present among available " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a weird mixing of strings formats.
# Conflicts: # sql/core/src/test/resources/sql-tests/results/subquery/negative-cases/invalid-correlation.sql.out
|
Test build #75013 has finished for PR 17100 at commit
|
|
Can you remove |
|
|
||
| val repeatedNameHint = if (commonNames.size > 0) { | ||
| s"\n|Observe that attribute(s) ${commonNames.mkString(",")} appear in your " + | ||
| "query with at least two different hashes, but same name." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Users do not understand what hashes mean.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point!
|
To detect duplicate names, you need to pass SQLConf |
|
@gatorsmile Thanks for the pointers, finally found some time to come back to this. I'm not sure if my approach to get the |
|
Test build #76402 has finished for PR 17100 at commit
|
|
Test build #76401 has finished for PR 17100 at commit
|
|
Test build #76434 has finished for PR 17100 at commit
|
|
Can you give a look to the changes @gatorsmile when you have a few spare moments? Also not sure how the conflict has appeared: it was merging cleanly when the CI test suite ran (will obviously fix, but first I want to confirm it's the right way of doing this now) |
|
Test build #77502 has finished for PR 17100 at commit
|
|
re-ping @gatorsmile? |
| import org.apache.spark.sql.catalyst.optimizer.BooleanSimplification | ||
| import org.apache.spark.sql.catalyst.plans._ | ||
| import org.apache.spark.sql.catalyst.plans.logical._ | ||
| import org.apache.spark.sql.internal.SQLConf |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unused import
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks
| case o if o.children.nonEmpty && o.missingInput.nonEmpty => | ||
| val resolver = plan.conf.resolver | ||
| val attrsWithSameName = o.missingInput.filter(x => | ||
| o.inputSet.exists(y => resolver(x.name, y.name))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
val attrsWithSameName = o.missingInput.filter { missing =>
o.inputSet.exists(input => resolver(missing.name, input.name))
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks
| val missingAttributes = o.missingInput.mkString(",") | ||
| val input = o.inputSet.mkString(",") | ||
| val availableAttributes = o.inputSet.mkString(",") | ||
| val repeatedNameHint = if (attrsWithSameName.size > 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
repeatedNameHint doesn't need missingAttributes or availableAttributes, could you move this definition above right after attrsWithSameName?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
attrsWithSameName.size > 0 -> attrsWithSameName.nonEmpty
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand where I should move missing or available, isn't it already just right after attrsWithSameName? But I moved it after repeatedNameHint, since it's needed only for the failAnalysis argument
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea that' what I meant :)
| val repeatedNameHint = if (attrsWithSameName.size > 0) { | ||
| val commonNames = attrsWithSameName.map(_.name).mkString(",") | ||
| s"""\n|Attribute(s) `$commonNames` seem to appear in two | ||
| |different datasets, with the same name.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The datasets concept is a little strange here, would inputs or input operators be better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
\n|Please check attribute(s) $commonNames, they seem to appear in two...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have changed it. Agree datasets may not be the best word to define it
|
Test build #82970 has finished for PR 17100 at commit
|
| |$missingAttributes is not in $availableAttributes. | ||
| |$repeatedNameHint | ||
| |The failed query was for operator | ||
| |${operator.simpleString}""".stripMargin) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I agree with @viirya, the original error message is OK for me. If you really want to improve it, I think just adding repeatedNameHint at the end would be good enough:
s"Resolved attribute(s) $missingAttributes missing from $input in operator ${operator.simpleString}. $repeatedNameHint"
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is good for me, having the repeatedNameHint is the whole point, at the end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the final message could be:
if (repeatedNameHint.nonEmpty) msg + “ ” + repeatedNameHint else msg
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm seems reasonable, indeed. and avoids that horrible trailing newline. Thanks a lot :)
| case o if o.children.nonEmpty && o.missingInput.nonEmpty => | ||
| val resolver = plan.conf.resolver | ||
| val attrsWithSameName = o.missingInput.filter(missing => | ||
| o.inputSet.exists(input => resolver(missing.name, input.name))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you also change the format?
o.missingInput.filter { missing =>
o.inputSet.exists(input => resolver(missing.name, input.name))
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Do you know if there is any style rule for when to use parentheses vs braces in these kind of lambda functions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Usually parentheses are used in one line, while curly braces are used in multi-line. For complete scala style guide, please check this: https://github.com/databricks/scala-style-guide
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @wzhfy
|
|
||
| val missingAttributes = o.missingInput.mkString(",") | ||
| val input = o.inputSet.mkString(",") | ||
| val availableAttributes = o.inputSet.mkString(",") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you revert this change? original input is ok for me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okies
| val repeatedNameHint = if (attrsWithSameName.nonEmpty) { | ||
| val commonNames = attrsWithSameName.map(_.name).mkString(",") | ||
| s"""|Please check attribute(s) `$commonNames`, they seem to appear in two | ||
| |different input operators, with the same name.""".stripMargin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The message two different could be wrong (such as the test case). How about changing the hint message:
Attribute(s) with the same name are found in the input: `$sameNames`. Please check if the right attribute(s) are used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have changed it, now looks better no matter how many appear
| o.inputSet.exists(input => resolver(missing.name, input.name)) | ||
| } | ||
| val repeatedNameHint = if (attrsWithSameName.nonEmpty) { | ||
| val commonNames = attrsWithSameName.map(_.name).mkString(",") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
commonNames -> sameNames?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| val attrA = AttributeReference("a", LongType)(exprId = ExprId(1)) | ||
| val otherA = AttributeReference("a", LongType)(exprId = ExprId(2)) | ||
| val bAlias = Alias(sum(attrA), "b")() :: Nil | ||
| val plan = Aggregate( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can make the test case more strong by adding another attribute c which is missing from input but doesn't have the same name in input.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added another attribute
| |Please check attribute(s) `a`, they seem to appear in two | ||
| |different input operators, with the same name.""".stripMargin | ||
|
|
||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
only one new line is enough here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
|
Test build #82981 has finished for PR 17100 at commit
|
|
Test build #82979 has finished for PR 17100 at commit
|
|
Test build #82980 has finished for PR 17100 at commit
|
|
Test build #82994 has finished for PR 17100 at commit
|
| val repeatedNameHint = if (attrsWithSameName.nonEmpty) { | ||
| val sameNames = attrsWithSameName.map(_.name).mkString(",") | ||
| s"""Attribute(s) with the same name appear in the operation: `$sameNames`. | ||
| |Please check if the right attribute(s) are used.""".stripMargin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally I prefer one line message for repeatedNameHint and msg, because they are parameters of AnalysisException, but that's not a strong point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a strong case for having two lines here, and I see the point of seeing it as one line. I guess the best way to have it as one line is splitting it into
val attributeRepetitionMsg = s"Attribute(s) with the same name appear in the operation: `$sameNames`"
val checkAttributesMsg = s"Please check if the right attribute(s) are used."to avoid hitting the < 100 chars linting rule?
|
LGTM except one minor comment. ping @gatorsmile |
| } | ||
| val repeatedNameHint = if (attrsWithSameName.nonEmpty) { | ||
| val sameNames = attrsWithSameName.map(_.name).mkString(",") | ||
| s"""Attribute(s) with the same name appear in the operation: `$sameNames`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Normally, we do not quote multiple column names in the same quote.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean wrapping each column name with the ticks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let us follow what we did for missingAttributes. No need to do it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok
| val msg = s"""Resolved attribute(s) $missingAttributes missing from $input | ||
| |in operator ${operator.simpleString}.""".stripMargin | ||
|
|
||
| failAnalysis(if (repeatedNameHint.nonEmpty) msg + "\n" + repeatedNameHint else msg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about
val missingAttributes = o.missingInput.mkString(",")
val input = o.inputSet.mkString(",")
val msgForMissingAttributes = s"Resolved attribute(s) $missingAttributes missing " +
s"from $input in operator ${operator.simpleString}."
val resolver = plan.conf.resolver
val attrsWithSameName = o.missingInput.filter { missing =>
o.inputSet.exists(input => resolver(missing.name, input.name))
}
val msg = if (attrsWithSameName.nonEmpty) {
val sameNames = attrsWithSameName.map(_.name).mkString(",")
s"$msgForMissingAttributes. Attribute(s) with the same name appear in the " +
s"operation: $sameNames. Please check if the right attribute(s) are used."
} else {
msgForMissingAttributes
}
failAnalysis(msg)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup thanks, is more orderly this way
|
Test build #83020 has finished for PR 17100 at commit
|
|
Thanks! Merged to master. |
|
Thanks @gatorsmile @holdenk @viirya @wzhfy for all the help |
|
Congratulations on your change @rberenguel :D :) |
|
@rberenguel Thank you for your contribution! |
What changes were proposed in this pull request?
Rewritten error message for clarity. Added extra information in case of attribute name collision, hinting the user to double-check referencing two different tables
How was this patch tested?
No functional changes, only final message has changed. It has been tested manually against the situation proposed in the JIRA ticket. Automated tests in repository pass.
This PR is original work from me and I license this work to the Spark project