[Spark-4041][SQL] Attribute names in table scan should be converted to lowercase when compared with relation attributes #2884

scwf · 2014-10-21T21:18:22Z

In MetastoreRelation, attribute names are lowercase because Hive stores field names in lowercase, so the attribute names in the table scan should also be converted to lowercase in indexWhere(_.name == a.name).
Otherwise neededColumnIDs may be incorrect.
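
A minimal, self-contained sketch of the mismatch described above. `Attr`, `neededColumnIds` and `neededColumnIdsLower` are hypothetical simplifications, not the actual HiveTableScan code; they only mirror an exact-name indexWhere lookup against the lowercase names a metastore relation exposes.

```scala
// Hypothetical stand-in for a Catalyst attribute: only the name matters here.
final case class Attr(name: String)

// Columns as the metastore reports them: Hive stores field names in lowercase.
val relationAttrs = Seq(Attr("key"), Attr("value"))

// Attributes requested by the scan keep the capitalization used in the query.
val requestedAttrs = Seq(Attr("KEY"), Attr("Value"))

// Exact-match lookup: indexWhere returns -1 whenever the case differs.
def neededColumnIds(requested: Seq[Attr], relation: Seq[Attr]): Seq[Int] =
  requested.map(a => relation.indexWhere(_.name == a.name))

// Same lookup, but the requested name is lowercased before comparing.
def neededColumnIdsLower(requested: Seq[Attr], relation: Seq[Attr]): Seq[Int] =
  requested.map(a => relation.indexWhere(_.name == a.name.toLowerCase))

println(neededColumnIds(requestedAttrs, relationAttrs))      // List(-1, -1)
println(neededColumnIdsLower(requestedAttrs, relationAttrs)) // List(0, 1)
```

The -1 entries are the "incorrect" column IDs: they do not point at any real column of the relation.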


AmplabJenkins · 2014-10-21T21:22:11Z

Can one of the admins verify this patch?

AmplabJenkins · 2014-10-21T23:13:15Z

Can one of the admins verify this patch?

yhuai · 2014-10-22T01:49:29Z

Can you add a unit test?

liancheng · 2014-10-22T07:20:39Z

sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTableScan.scala

Nit: it would be safer if you use _.name.toLowerCase == a.name.toLowerCase.
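
A small sketch of why lowercasing both sides is safer. The `Attr` type and helper names are hypothetical; the point is the case a one-sided comparison misses: a relation whose names were not normalized to lowercase. The sketch passes `Locale.ROOT` because the no-argument `toLowerCase()` uses the JVM default locale, which can behave unexpectedly (for example with the Turkish dotted/dotless i).

```scala
import java.util.Locale

// Hypothetical stand-in for a Catalyst attribute.
final case class Attr(name: String)

// One-sided: silently assumes the relation side is already lowercase.
def indexOneSided(relation: Seq[Attr], a: Attr): Int =
  relation.indexWhere(_.name == a.name.toLowerCase(Locale.ROOT))

// Two-sided, as suggested in the review: no assumption about either side.
def indexTwoSided(relation: Seq[Attr], a: Attr): Int =
  relation.indexWhere(_.name.toLowerCase(Locale.ROOT) == a.name.toLowerCase(Locale.ROOT))

// A relation whose names were NOT normalized (hypothetical input).
val mixedCaseRelation = Seq(Attr("Key"), Attr("value"))

println(indexOneSided(mixedCaseRelation, Attr("KEY"))) // -1
println(indexTwoSided(mixedCaseRelation, Attr("KEY"))) // 0
```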

scwf · 2014-10-22T08:16:42Z

@yhuai, it's hard to write a unit test for this because addColumnMetadataToConf and hiveExtraConf are both private and cannot be accessed from a test case. I actually found this issue while debugging the code: in my local test it seems to lead to an NPE with hive-0.13, while the master branch (hive-0.12) is fine.
@liancheng, updated

liancheng · 2014-10-22T08:36:51Z

I think this change is generally safe. LGTM, thanks.

SparkQA · 2014-10-23T06:40:24Z

QA tests have started for PR 2884 at commit 3ff3a80.

This patch merges cleanly.

SparkQA · 2014-10-23T06:43:52Z

QA tests have finished for PR 2884 at commit 3ff3a80.

This patch fails to build.
This patch merges cleanly.
This patch adds no public classes.

scwf · 2014-10-23T07:23:21Z

The test failed due to a streaming compile error; can you retest this?

SparkQA · 2014-10-23T08:46:21Z

QA tests have started for PR 2884 at commit 3ff3a80.

This patch merges cleanly.

SparkQA · 2014-10-23T08:49:48Z

QA tests have finished for PR 2884 at commit 3ff3a80.

This patch fails to build.
This patch merges cleanly.
This patch adds no public classes.

liancheng · 2014-10-23T09:49:53Z

Hm, the failure was caused by a known Jenkins configuration issue.

liancheng · 2014-10-23T23:23:30Z

retest this please

SparkQA · 2014-10-23T23:30:03Z

QA tests have started for PR 2884 at commit 3ff3a80.

This patch merges cleanly.

SparkQA · 2014-10-24T00:22:08Z

QA tests have finished for PR 2884 at commit 3ff3a80.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2014-10-24T00:22:10Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22096/

marmbrus · 2014-10-24T18:09:00Z

sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTableScan.scala

Wait, should this be done by name at all? Couldn't we be using an AttributeMap from Attribute->ordinal instead?
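
The AttributeMap idea can be pictured with a minimal sketch: a map keyed by expression ID rather than by name, so capitalization never enters the lookup. `ExprId`, `AttrRef`, `AttrMap` and `attrMap` here are hypothetical, simplified stand-ins written for illustration, not Catalyst's actual classes or signatures.

```scala
// Hypothetical, simplified stand-ins for Catalyst's ExprId and AttributeReference.
final case class ExprId(id: Long)
final case class AttrRef(name: String, exprId: ExprId)

// A map keyed by expression id: the attribute's name plays no part in lookups.
final class AttrMap[A](underlying: Map[ExprId, A]) {
  def get(k: AttrRef): Option[A] = underlying.get(k.exprId)
  def apply(k: AttrRef): A = underlying(k.exprId)
}

def attrMap[A](kvs: Seq[(AttrRef, A)]): AttrMap[A] =
  new AttrMap(kvs.map { case (k, v) => k.exprId -> v }.toMap)

val key = AttrRef("key", ExprId(1))
val value = AttrRef("value", ExprId(2))

// Attribute -> ordinal, as suggested in the review comment above.
val ordinals: AttrMap[Int] = attrMap(Seq(key, value).zipWithIndex)

// Same expression id, different spelling: still found.
println(ordinals.get(AttrRef("KEY", ExprId(1)))) // Some(0)
// Same spelling, different expression id: a different attribute, so not found.
println(ordinals.get(AttrRef("key", ExprId(3)))) // None
```

Keying by ID means two references to the same column agree even if one was written `KEY` in the query, while two unrelated columns that happen to share a name stay distinct.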

scwf · 2014-10-24

Yes, column names are case-insensitive in Hive, so we should use lowercase names throughout the hive module. Changing only this place is not enough; the names also need to be converted to lowercase there (https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala#L273).
I don't think an AttributeMap can fix this problem. How about adding a lowerName to Attribute and using it in the hive module instead?
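
A rough sketch of the proposed lowerName, under the assumption that it is a cached, lowercased copy of the attribute name. `NamedAttr`, `Attr`, `columnIndex` and `fieldIndex` are hypothetical names for illustration only; this is not Catalyst's Attribute class or the TableReader code.

```scala
import java.util.Locale

// Hypothetical simplified attribute; not Catalyst's Attribute class.
trait NamedAttr {
  def name: String
  // Computed once on first use and reused wherever a case-insensitive key is needed.
  lazy val lowerName: String = name.toLowerCase(Locale.ROOT)
}

final case class Attr(name: String) extends NamedAttr

val relation = Seq(Attr("key"), Attr("value"))

// Table-scan style lookup: compare lowerName on both sides.
def columnIndex(a: NamedAttr): Int = relation.indexWhere(_.lowerName == a.lowerName)

// Hypothetical TableReader-style lookup: field name -> position, keyed by lowerName.
val fieldIndex: Map[String, Int] = relation.map(_.lowerName).zipWithIndex.toMap

println(columnIndex(Attr("VALUE")))        // 1
println(fieldIndex(Attr("Key").lowerName)) // 0
```

The trade-off is that every call site in the hive module still has to remember to use lowerName instead of name.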

scwf · 2014-10-25T08:16:52Z

Added a test case for the lowercase issue; the test throws an NPE if the names are not converted to lowercase.

marmbrus · 2014-10-26T23:05:32Z

Great, thanks for finding this and adding a test. Regarding the implementation, I'd like to try to avoid doing too much string munging, as it's generally easy to forget to do (hence the issue). Also, in general we try to avoid looking at string names anywhere other than in analysis. This is the whole idea behind having expression ids in AttributeReferences (and the idea behind AttributeMaps).

Since we can't completely get away from string names when working with Hive, what do you think about this approach: https://github.com/marmbrus/spark/compare/hiveTableScanCase

I think this more cleanly isolates the need to reason about case sensitivity into the analysis phase.
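
A sketch of the general pattern of confining case-insensitive name matching to analysis, under stated assumptions: the linked hiveTableScanCase branch is not reproduced here, and `ExprId`, `AttrRef`, `resolve` and `neededColumnIds` are hypothetical simplifications. Names are compared once, case-insensitively, while resolving against the relation; the scan afterwards works only with expression IDs.

```scala
// Hypothetical, simplified stand-ins for Catalyst's ExprId and AttributeReference.
final case class ExprId(id: Long)
final case class AttrRef(name: String, exprId: ExprId)

// Relation output as the metastore reports it (lowercase names, fixed ids).
val relationOutput = Seq(AttrRef("key", ExprId(1)), AttrRef("value", ExprId(2)))

// "Analysis": the only place names are compared, case-insensitively.
// It returns the relation's own attribute, so name and id come from the relation.
def resolve(column: String): Option[AttrRef] =
  relationOutput.find(_.name.equalsIgnoreCase(column))

// "Table scan": column ids are looked up by expression id only, never by name.
val ordinalById: Map[ExprId, Int] = relationOutput.map(_.exprId).zipWithIndex.toMap

def neededColumnIds(requested: Seq[AttrRef]): Seq[Int] =
  requested.flatMap(a => ordinalById.get(a.exprId).toList)

val requested = Seq("VALUE", "Key").flatMap(c => resolve(c).toList)
println(requested.map(_.name))      // List(value, key)
println(neededColumnIds(requested)) // List(1, 0)
```

Because the scan never inspects names, forgetting a toLowerCase somewhere downstream can no longer produce a wrong column ID.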

scwf · 2014-10-27T02:43:36Z

Cool, I think this is better.

scwf · 2014-10-27T07:32:26Z

retest this please

SparkQA · 2014-10-27T08:34:13Z

Test build #473 has started for PR 2884 at commit 6174046.

This patch merges cleanly.

SparkQA · 2014-10-27T08:42:22Z

Test build #473 has finished for PR 2884 at commit 6174046.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

scwf · 2014-10-27T17:56:11Z

retest this again. It seems something went wrong on Jenkins and JoinSuite failed there; it passed locally.

SparkQA · 2014-10-27T17:56:26Z

Test build #475 has started for PR 2884 at commit 6174046.

This patch merges cleanly.

SparkQA · 2014-10-27T18:17:04Z

QA tests have started for PR 2884 at commit 6174046.

This patch does not merge cleanly.

scwf · 2014-10-27T18:59:55Z

CliSuite failed due to a futures timeout. @liancheng, any idea about this? It seems #2823 did not fix the problem.

SparkQA · 2014-10-27T19:30:08Z

QA tests have finished for PR 2884 at commit 6174046.

This patch fails Spark unit tests.
This patch does not merge cleanly.
This patch adds no public classes.

SparkQA · 2014-10-27T19:31:18Z

Test build #475 has finished for PR 2884 at commit 6174046.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

marmbrus · 2014-10-28T03:47:45Z

Minor comment: In the future please put SPARK-XXXX in all capitals in the title so that our merge scripts recognize it. Thanks!
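
Ironically, this is the same case-sensitivity problem in another place. A hypothetical illustration: the real merge script's matching rules are not shown in this thread, so the pattern and `jiraId` helper below are assumptions, showing only how a case-sensitive pattern misses a mixed-case prefix.

```scala
// Hypothetical pattern; not taken from the actual merge script.
val JiraRef = """\[SPARK-(\d+)\]""".r

// Extracts the issue number if the title carries an upper-case [SPARK-NNNN] tag.
def jiraId(title: String): Option[String] =
  JiraRef.findFirstMatchIn(title).map(_.group(1))

println(jiraId("[SPARK-4041][SQL] Fix column ids")) // Some(4041)
println(jiraId("[Spark-4041][SQL] Fix column ids")) // None
```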

Thanks for working on this! Merged to master.

attributes names in table scan should convert lowercase in neededColumnsIDs

294fcb7

liancheng reviewed Oct 22, 2014

more safer change

3ff3a80

marmbrus reviewed Oct 24, 2014

use lowerName and add a test case for this issue

dc74a24

scwf force-pushed the fixColumnIds branch from c8b04da to dc74a24 on October 27, 2014 02:38

use AttributeMap for this issue

6174046

asfgit closed this in 89af6df Oct 28, 2014

scwf deleted the fixColumnIds branch January 7, 2015 09:50


6 participants