[SPARK-6865][SQL] DataFrame column names should be treated as string literals #5505
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | `@@ -121,7 +121,7 @@ class DataFrameSuite extends QueryTest {` |
| | | `)` |
| | | `}` |
| | | |
| | | `- test("self join with aliases") {` |
| | | `+ ignore("self join with aliases") {` |
Contributor
Author
note @marmbrus our new resolver semantics breaks this test. Not sure how important it is.
Contributor
The more I think about this, the more I am worried that we can't make a change this large. There is no way to express self join queries if we don't handle
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | `val df = Seq(1,2,3).map(i => (i, i.toString)).toDF("int", "str")` |
| | | `checkAnswer(` |
| | | `df.as('x).join(df.as('y), $"x.str" === $"y.str").groupBy("x.str").count(),` |
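One way to read the breakage flagged on this test, sketched as a toy model in plain Scala rather than with Spark's analyzer: if a column name is matched as a literal string, `"x.str"` can only match a column whose name is exactly `x.str`, so the qualifier `x` introduced by `df.as('x)` is never consulted. `Attr`, `ToyResolver`, and `SelfJoinDemo` are hypothetical names invented for this illustration, and the real resolver is considerably more involved.

```scala
// Toy model only: these types are hypothetical and are NOT Spark's analyzer classes.
final case class Attr(qualifier: Option[String], name: String)

object ToyResolver {
  // Literal mode: the whole string must equal a column name; dots are not special.
  def resolveLiteral(attrs: Seq[Attr], col: String): Seq[Attr] =
    attrs.filter(_.name == col)

  // Qualified mode: "q.c" is split at the first dot into qualifier and column name.
  def resolveQualified(attrs: Seq[Attr], col: String): Seq[Attr] =
    col.split("\\.", 2) match {
      case Array(q, c) => attrs.filter(a => a.qualifier.contains(q) && a.name == c)
      case _           => attrs.filter(_.name == col)
    }
}

object SelfJoinDemo {
  // Stand-in for the output of the self join in the test: both sides expose "int" and "str".
  val joined: Seq[Attr] = Seq(
    Attr(Some("x"), "int"), Attr(Some("x"), "str"),
    Attr(Some("y"), "int"), Attr(Some("y"), "str"))

  def main(args: Array[String]): Unit = {
    // Literal lookup of "x.str" finds nothing; "str" alone matches both sides.
    println(ToyResolver.resolveLiteral(joined, "x.str"))
    println(ToyResolver.resolveLiteral(joined, "str"))
    // Only the qualified lookup can single out one side of the self join.
    println(ToyResolver.resolveQualified(joined, "x.str"))
  }
}
```

In this model the literal lookup either finds nothing (`"x.str"`) or finds both sides (`"str"`), which is why a self join needs some form of qualified reference.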
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | `@@ -688,7 +688,7 @@ class ParquetDataSourceOnSourceSuite extends ParquetSourceSuiteBase {` |
| | | `sql("DROP TABLE alwaysNullable")` |
| | | `}` |
| | | |
| | | `- test("Aggregation attribute names can't contain special chars \" ,;{}()\\n\\t=\"") {` |
| | | `+ ignore("Aggregation attribute names can't contain special chars \" ,;{}()\\n\\t=\"") {` |
Contributor
Author
and @liancheng I had to disable this test as well since it used "tablename.columnname".
Contributor
I guess it should be OK to disable or even remove this test now, since we now check for invalid field names explicitly and suggest that users add aliases. See #5263.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | `val tempDir = Utils.createTempDir()` |
| | | `val filePath = new File(tempDir, "testParquet").getCanonicalPath` |
| | | `val filePath2 = new File(tempDir, "testParquet2").getCanonicalPath` |
How about throwing an exception if `name` matches more than one attribute?
is that even possible?
Maybe two columns with different names are given the same alias by mistake? I'm not sure whether `DataFrame` operations introduce aliases.
That's a good point, but I think for that case, the correct behavior should result in a failure during df creation (e.g. during analysis), not when we try to resolve a column later on.
You can't always know if something is going to be ambiguous when you create the DataFrame as it might only be ambiguous at query time due to a setting like case sensitivity.
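The last point can be sketched in plain Scala, without Spark: whether a name is ambiguous depends on the equality used at lookup time, so the same column list can resolve cleanly under case-sensitive matching and ambiguously under case-insensitive matching. `AmbiguityDemo`, `NameEquality`, and `resolve` are hypothetical names for this illustration, not Spark APIs, and the sketch returns an `Either` where the thread discusses throwing an exception.

```scala
// Hypothetical sketch; this is NOT Spark's actual resolver.
object AmbiguityDemo {
  // How two column names are compared; chosen at query time, not at DataFrame creation.
  type NameEquality = (String, String) => Boolean
  val caseSensitive: NameEquality = _ == _
  val caseInsensitive: NameEquality = _.equalsIgnoreCase(_)

  // Right(column) on exactly one match, Left(message) on zero or several matches.
  def resolve(columns: Seq[String], name: String, same: NameEquality): Either[String, String] =
    columns.filter(same(_, name)) match {
      case Seq(only) => Right(only)
      case Seq()     => Left(s"cannot resolve '$name'")
      case many      => Left(s"'$name' is ambiguous, matches: ${many.mkString(", ")}")
    }

  def main(args: Array[String]): Unit = {
    val columns = Seq("id", "ID")
    // The same columns and the same name give different outcomes per equality.
    println(resolve(columns, "id", caseSensitive))
    println(resolve(columns, "id", caseInsensitive))
  }
}
```

Because the equality is an argument to `resolve`, the ambiguity check has to run at lookup time; checking only when the column list is built would miss the case-insensitive collision.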