Skip to content

Conversation

@rberenguel
Copy link
Contributor

What changes were proposed in this pull request?

Rewritten error message for clarity. Added extra information in case of attribute name collision, hinting the user to double-check referencing two different tables

How was this patch tested?

No functional changes, only final message has changed. It has been tested manually against the situation proposed in the JIRA ticket. Automated tests in repository pass.

This PR is original work from me and I license this work to the Spark project

@holdenk
Copy link
Contributor

holdenk commented Feb 28, 2017

Jenkins, ok to test.

@holdenk
Copy link
Contributor

holdenk commented Feb 28, 2017

@rberenguel : how about adding the "[SQL]" tag to this, since while the feature request comes out of PySpark its changing the SQL code.

@SparkQA
Copy link

SparkQA commented Feb 28, 2017

Test build #73604 has finished for PR 17100 at commit 981224b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rberenguel rberenguel changed the title [SPARK-13947][PYTHON] PySpark DataFrames: The error message from using an invalid table reference is not clear [SPARK-13947][PYTHON][SQL] PySpark DataFrames: The error message from using an invalid table reference is not clear Mar 1, 2017
@rberenguel
Copy link
Contributor Author

@holdenk added! And I suspect I have some issue locally with coursier when running the tests... They supposedly work (as in, no outputted errors/failures?) 1/3 times, the other 2 cause coursier cache misses. But I think the no errors one is actually "I give up running the test suite anymore". I'll give a look at this (since it's pretty bad I can't rely on the local test suite!) and fix the failing test, is pretty straightforward.

…e. Added a mental note to disable coursier when running the full Spark test suite
@SparkQA
Copy link

SparkQA commented Mar 1, 2017

Test build #73700 has finished for PR 17100 at commit ea07688.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

…nalysisErrorSuite test that was failing. Just in case, removed the hardcoding of the names and hashes from this test
@SparkQA
Copy link

SparkQA commented Mar 2, 2017

Test build #73714 has finished for PR 17100 at commit 65b9596.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

@holdenk holdenk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this, maybe we can ask @marmbrus to take a look at this since its mostly a SQL change.

failAnalysis(
s"resolved attribute(s) $missingAttributes missing from $input " +
s"in operator ${operator.simpleString}")
s"|Some resolved attribute(s) are not present among available attributes " +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really long, would maybe """ + stripMargin be easier to write/read?


assertAnalysisError(plan, "resolved attribute(s) a#1L missing from a#2L" :: Nil)
assertAnalysisError(plan,
"Some resolved attribute(s) are not present among available " +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a weird mixing of strings formats.

Ruben Berenguel Montoro added 2 commits March 22, 2017 00:19
# Conflicts:
#	sql/core/src/test/resources/sql-tests/results/subquery/negative-cases/invalid-correlation.sql.out
@SparkQA
Copy link

SparkQA commented Mar 22, 2017

Test build #75013 has finished for PR 17100 at commit 7bb6f35.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member

Can you remove [PYTHON] from the PR title? Thanks!


val repeatedNameHint = if (commonNames.size > 0) {
s"\n|Observe that attribute(s) ${commonNames.mkString(",")} appear in your " +
"query with at least two different hashes, but same name."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Users do not understand what hashes mean.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point!

@gatorsmile
Copy link
Member

To detect duplicate names, you need to pass SQLConf conf in checkAnalysis for supporting the case sensitivity in name resolution. For each missing attribute, you can try using conf.resolver to find whether there exists a column from o.inputSet whose name is the same. After detecting the duplicate names, you can output the message to say the missing attributes are from a Dataset different from the input set.

@rberenguel rberenguel changed the title [SPARK-13947][PYTHON][SQL] PySpark DataFrames: The error message from using an invalid table reference is not clear [SPARK-13947][SQL] PySpark DataFrames: The error message from using an invalid table reference is not clear Mar 22, 2017
@rberenguel
Copy link
Contributor Author

@gatorsmile Thanks for the pointers, finally found some time to come back to this. I'm not sure if my approach to get the SQLConf into checkAnalysis is the correct one in my current local changes (since it seems to change a possible API endpoint). I changed the current implementation in the trait to be named instead def checkAnalysisWithConf(plan: LogicalPlan, conf: SQLConf): Unit and added an abstract method def checkAnalysis(plan: LogicalPlan): Unit that is then implemented in Analyzer (where we have a conf we can pass around). I haven't fixed all the rest yet, was puzzled enough with the correctness of this for now ;) Thanks

@SparkQA
Copy link

SparkQA commented May 3, 2017

Test build #76402 has finished for PR 17100 at commit 766a033.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 3, 2017

Test build #76401 has finished for PR 17100 at commit 4ac8143.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 4, 2017

Test build #76434 has finished for PR 17100 at commit c2dfe11.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rberenguel
Copy link
Contributor Author

Can you give a look to the changes @gatorsmile when you have a few spare moments? Also not sure how the conflict has appeared: it was merging cleanly when the CI test suite ran (will obviously fix, but first I want to confirm it's the right way of doing this now)

@SparkQA
Copy link

SparkQA commented May 29, 2017

Test build #77502 has finished for PR 17100 at commit 0cb9825.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@holdenk
Copy link
Contributor

holdenk commented Oct 10, 2017

re-ping @gatorsmile?

import org.apache.spark.sql.catalyst.optimizer.BooleanSimplification
import org.apache.spark.sql.catalyst.plans._
import org.apache.spark.sql.catalyst.plans.logical._
import org.apache.spark.sql.internal.SQLConf
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unused import

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

case o if o.children.nonEmpty && o.missingInput.nonEmpty =>
val resolver = plan.conf.resolver
val attrsWithSameName = o.missingInput.filter(x =>
o.inputSet.exists(y => resolver(x.name, y.name)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

val attrsWithSameName = o.missingInput.filter { missing =>
  o.inputSet.exists(input => resolver(missing.name, input.name))
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

val missingAttributes = o.missingInput.mkString(",")
val input = o.inputSet.mkString(",")
val availableAttributes = o.inputSet.mkString(",")
val repeatedNameHint = if (attrsWithSameName.size > 0) {
Copy link
Contributor

@wzhfy wzhfy Oct 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

repeatedNameHint doesn't need missingAttributes or availableAttributes, could you move this definition above right after attrsWithSameName?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

attrsWithSameName.size > 0 -> attrsWithSameName.nonEmpty

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand where I should move missing or available, isn't it already just right after attrsWithSameName? But I moved it after repeatedNameHint, since it's needed only for the failAnalysis argument

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea that' what I meant :)

val repeatedNameHint = if (attrsWithSameName.size > 0) {
val commonNames = attrsWithSameName.map(_.name).mkString(",")
s"""\n|Attribute(s) `$commonNames` seem to appear in two
|different datasets, with the same name."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The datasets concept is a little strange here, would inputs or input operators be better?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

\n|Please check attribute(s) $commonNames, they seem to appear in two...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have changed it. Agree datasets may not be the best word to define it

@SparkQA
Copy link

SparkQA commented Oct 23, 2017

Test build #82970 has finished for PR 17100 at commit 6e8ab42.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

|$missingAttributes is not in $availableAttributes.
|$repeatedNameHint
|The failed query was for operator
|${operator.simpleString}""".stripMargin)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I agree with @viirya, the original error message is OK for me. If you really want to improve it, I think just adding repeatedNameHint at the end would be good enough:

s"Resolved attribute(s) $missingAttributes missing from $input in operator ${operator.simpleString}. $repeatedNameHint"

What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is good for me, having the repeatedNameHint is the whole point, at the end.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the final message could be:

if (repeatedNameHint.nonEmpty) msg + “ ” + repeatedNameHint else msg

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm seems reasonable, indeed. and avoids that horrible trailing newline. Thanks a lot :)

case o if o.children.nonEmpty && o.missingInput.nonEmpty =>
val resolver = plan.conf.resolver
val attrsWithSameName = o.missingInput.filter(missing =>
o.inputSet.exists(input => resolver(missing.name, input.name)))
Copy link
Contributor

@wzhfy wzhfy Oct 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you also change the format?

o.missingInput.filter { missing =>
  o.inputSet.exists(input => resolver(missing.name, input.name))
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Do you know if there is any style rule for when to use parentheses vs braces in these kind of lambda functions?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually parentheses are used in one line, while curly braces are used in multi-line. For complete scala style guide, please check this: https://github.com/databricks/scala-style-guide

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @wzhfy


val missingAttributes = o.missingInput.mkString(",")
val input = o.inputSet.mkString(",")
val availableAttributes = o.inputSet.mkString(",")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you revert this change? original input is ok for me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okies

val repeatedNameHint = if (attrsWithSameName.nonEmpty) {
val commonNames = attrsWithSameName.map(_.name).mkString(",")
s"""|Please check attribute(s) `$commonNames`, they seem to appear in two
|different input operators, with the same name.""".stripMargin
Copy link
Contributor

@wzhfy wzhfy Oct 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The message two different could be wrong (such as the test case). How about changing the hint message:

Attribute(s) with the same name are found in the input: `$sameNames`. Please check if the right attribute(s) are used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have changed it, now looks better no matter how many appear

o.inputSet.exists(input => resolver(missing.name, input.name))
}
val repeatedNameHint = if (attrsWithSameName.nonEmpty) {
val commonNames = attrsWithSameName.map(_.name).mkString(",")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

commonNames -> sameNames?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

val attrA = AttributeReference("a", LongType)(exprId = ExprId(1))
val otherA = AttributeReference("a", LongType)(exprId = ExprId(2))
val bAlias = Alias(sum(attrA), "b")() :: Nil
val plan = Aggregate(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can make the test case more strong by adding another attribute c which is missing from input but doesn't have the same name in input.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added another attribute

|Please check attribute(s) `a`, they seem to appear in two
|different input operators, with the same name.""".stripMargin


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only one new line is enough here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@SparkQA
Copy link

SparkQA commented Oct 23, 2017

Test build #82981 has finished for PR 17100 at commit e3cc455.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 23, 2017

Test build #82979 has finished for PR 17100 at commit 72b72cf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 23, 2017

Test build #82980 has finished for PR 17100 at commit 33a8a71.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 23, 2017

Test build #82994 has finished for PR 17100 at commit 34753b5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val repeatedNameHint = if (attrsWithSameName.nonEmpty) {
val sameNames = attrsWithSameName.map(_.name).mkString(",")
s"""Attribute(s) with the same name appear in the operation: `$sameNames`.
|Please check if the right attribute(s) are used.""".stripMargin
Copy link
Contributor

@wzhfy wzhfy Oct 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I prefer one line message for repeatedNameHint and msg, because they are parameters of AnalysisException, but that's not a strong point.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong case for having two lines here, and I see the point of seeing it as one line. I guess the best way to have it as one line is splitting it into

val attributeRepetitionMsg = s"Attribute(s) with the same name appear in the operation: `$sameNames`"
val checkAttributesMsg = s"Please check if the right attribute(s) are used."

to avoid hitting the < 100 chars linting rule?

@wzhfy
Copy link
Contributor

wzhfy commented Oct 24, 2017

LGTM except one minor comment. ping @gatorsmile

}
val repeatedNameHint = if (attrsWithSameName.nonEmpty) {
val sameNames = attrsWithSameName.map(_.name).mkString(",")
s"""Attribute(s) with the same name appear in the operation: `$sameNames`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Normally, we do not quote multiple column names in the same quote.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean wrapping each column name with the ticks?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let us follow what we did for missingAttributes. No need to do it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok

val msg = s"""Resolved attribute(s) $missingAttributes missing from $input
|in operator ${operator.simpleString}.""".stripMargin

failAnalysis(if (repeatedNameHint.nonEmpty) msg + "\n" + repeatedNameHint else msg)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about

            val missingAttributes = o.missingInput.mkString(",")
            val input = o.inputSet.mkString(",")
            val msgForMissingAttributes = s"Resolved attribute(s) $missingAttributes missing " +
              s"from $input in operator ${operator.simpleString}."

            val resolver = plan.conf.resolver
            val attrsWithSameName = o.missingInput.filter { missing =>
              o.inputSet.exists(input => resolver(missing.name, input.name))
            }

            val msg = if (attrsWithSameName.nonEmpty) {
              val sameNames = attrsWithSameName.map(_.name).mkString(",")
              s"$msgForMissingAttributes. Attribute(s) with the same name appear in the " +
                s"operation: $sameNames. Please check if the right attribute(s) are used."
            } else {
              msgForMissingAttributes
            }

            failAnalysis(msg)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup thanks, is more orderly this way

@SparkQA
Copy link

SparkQA commented Oct 24, 2017

Test build #83020 has finished for PR 17100 at commit b8fbdea.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member

Thanks! Merged to master.

@asfgit asfgit closed this in 427359f Oct 25, 2017
@rberenguel rberenguel deleted the SPARK-13947-error-message branch October 25, 2017 09:34
@rberenguel
Copy link
Contributor Author

Thanks @gatorsmile @holdenk @viirya @wzhfy for all the help

@holdenk
Copy link
Contributor

holdenk commented Oct 25, 2017

Congratulations on your change @rberenguel :D :)

@gatorsmile
Copy link
Member

@rberenguel Thank you for your contribution!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants