Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -116,6 +116,15 @@ abstract class LogicalPlan extends QueryPlan[LogicalPlan] with Logging {
throwErrors: Boolean = false): Option[NamedExpression] =
resolve(name, children.flatMap(_.output), resolver, throwErrors)

/**
* Optionally resolves the given string (`name`) to a [[NamedExpression]] using the input
* from all child nodes of this LogicalPlan. The given string is considered a string literal,
* i.e. the string itself should match the attribute name and the attribute name alone.
*/
def resolveQuoted(name: String, resolver: Resolver): Option[NamedExpression] = {
output.find { attribute => resolver(name, attribute.name) }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about throw exception if the name matches more than one attributes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is that even possible?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe 2 columns with different names have same alias name by mistake? I'm not sure if DataFrame operations will do alias.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point, but I think for that case, the correct behavior should result in a failure during df creation (e.g. during analysis), not when we try to resolve a column later on.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can't always know if something is going to be ambiguous when you create the DataFrame as it might only be ambiguous at query time due to a setting like case sensitivity.

}

/**
* Optionally resolves the given string to a [[NamedExpression]] based on the output of this
* LogicalPlan. The attribute is expressed as string in the following form:
Expand Down
4 changes: 2 additions & 2 deletions sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala
Original file line number Diff line number Diff line change
Expand Up @@ -158,15 +158,15 @@ class DataFrame private[sql](
}

protected[sql] def resolve(colName: String): NamedExpression = {
queryExecution.analyzed.resolve(colName, sqlContext.analyzer.resolver).getOrElse {
queryExecution.analyzed.resolveQuoted(colName, sqlContext.analyzer.resolver).getOrElse {
throw new AnalysisException(
s"""Cannot resolve column name "$colName" among (${schema.fieldNames.mkString(", ")})""")
}
}

protected[sql] def numericColumns: Seq[Expression] = {
schema.fields.filter(_.dataType.isInstanceOf[NumericType]).map { n =>
queryExecution.analyzed.resolve(n.name, sqlContext.analyzer.resolver).get
queryExecution.analyzed.resolveQuoted(n.name, sqlContext.analyzer.resolver).get
}
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -121,7 +121,7 @@ class DataFrameSuite extends QueryTest {
)
}

test("self join with aliases") {
ignore("self join with aliases") {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note @marmbrus our new resolver semantics breaks this test. Not sure how important it is.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The more I think about this, the more I am worried that we can't make a change this large. There is no way to express self join queries if we don't handle . in column names. We are also going to break lots of existing user code...

val df = Seq(1,2,3).map(i => (i, i.toString)).toDF("int", "str")
checkAnswer(
df.as('x).join(df.as('y), $"x.str" === $"y.str").groupBy("x.str").count(),
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -688,7 +688,7 @@ class ParquetDataSourceOnSourceSuite extends ParquetSourceSuiteBase {
sql("DROP TABLE alwaysNullable")
}

test("Aggregation attribute names can't contain special chars \" ,;{}()\\n\\t=\"") {
ignore("Aggregation attribute names can't contain special chars \" ,;{}()\\n\\t=\"") {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and @liancheng I had to disable this test as well since it used "tablename.columnname".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it should be OK to disable or even remove this test now, since now we check for invalid field names explicitly and suggest users to add aliases. See #5263.

val tempDir = Utils.createTempDir()
val filePath = new File(tempDir, "testParquet").getCanonicalPath
val filePath2 = new File(tempDir, "testParquet2").getCanonicalPath
Expand Down