Skip to content

Conversation

@scwf
Copy link
Contributor

@scwf scwf commented Oct 21, 2014

In MetastoreRelation the attributes name is lowercase because of hive using lowercase for fields name, so we should convert attributes name in table scan lowercase in indexWhere(_.name == a.name).
neededColumnIDs may be not correct if not convert to lowercase.

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

1 similar comment
@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@yhuai
Copy link
Contributor

yhuai commented Oct 22, 2014

Can you add a unit test?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: it would be safer if you use _.name.toLowerCase == a.name.toLowerCase.

@scwf
Copy link
Contributor Author

scwf commented Oct 22, 2014

@yhuai, it's hard to make a unit test for this since addColumnMetadataToConf and hiveExtraConf both can not accessed in test case, they are private. Actually i debug the code and get this issue, it seems this will leads to NPE in hive-0.13 on my local test but ok in master branch(hive-0.12)
@liancheng, updated

@liancheng
Copy link
Contributor

I think this change is generally safe. LGTM, thanks.

@SparkQA
Copy link

SparkQA commented Oct 23, 2014

QA tests have started for PR 2884 at commit 3ff3a80.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Oct 23, 2014

QA tests have finished for PR 2884 at commit 3ff3a80.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@scwf
Copy link
Contributor Author

scwf commented Oct 23, 2014

test failed due to streaming compile error, can you retest this?

@SparkQA
Copy link

SparkQA commented Oct 23, 2014

QA tests have started for PR 2884 at commit 3ff3a80.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Oct 23, 2014

QA tests have finished for PR 2884 at commit 3ff3a80.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@liancheng
Copy link
Contributor

Hm, the failure was caused by a known Jenkins configuration issue.

@liancheng
Copy link
Contributor

retest this please

@SparkQA
Copy link

SparkQA commented Oct 23, 2014

QA tests have started for PR 2884 at commit 3ff3a80.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Oct 24, 2014

QA tests have finished for PR 2884 at commit 3ff3a80.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22096/
Test PASSed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, should this be done by name at all? Couldn't we be using an AttributeMap from Attribute->ordinal instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, column names are case insensitive in hive, we should use lowercase for names in hive module(only change here is not enough, also need convert to lowercase there https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala#L273).
I think using an AttributeMap can not fix this problem, how about add a lowerName for Attribute and in hive we use this method instead?

@scwf
Copy link
Contributor Author

scwf commented Oct 25, 2014

Added a test case for lower case issue, the test will throw NPE if not converted to lowercase

@marmbrus
Copy link
Contributor

Great, thanks for finding this and adding a test. Regarding the implementation, I'd like to try to avoid doing too much string munging as its generally easy to forget to do (hence the issue). Also, in general we try to avoid looking at string names anywhere other than in analysis. This is the whole idea behind having expression ids in AttributeReferences (and the idea behind AttributeMaps).

Since we can't completely get away from string names when working with Hive, what do you think about this approach: https://github.com/marmbrus/spark/compare/hiveTableScanCase

I think this more cleanly isolates the need to reason about case sensitivity into the analysis phase.

@scwf
Copy link
Contributor Author

scwf commented Oct 27, 2014

Cool, i think this is better

@scwf
Copy link
Contributor Author

scwf commented Oct 27, 2014

retest this please

@SparkQA
Copy link

SparkQA commented Oct 27, 2014

Test build #473 has started for PR 2884 at commit 6174046.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Oct 27, 2014

Test build #473 has finished for PR 2884 at commit 6174046.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@scwf
Copy link
Contributor Author

scwf commented Oct 27, 2014

retest this again, seems Jenkins get something wrong and failed in JoinSuite. locally it passed.

@SparkQA
Copy link

SparkQA commented Oct 27, 2014

Test build #475 has started for PR 2884 at commit 6174046.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Oct 27, 2014

QA tests have started for PR 2884 at commit 6174046.

  • This patch does not merge cleanly.

@scwf
Copy link
Contributor Author

scwf commented Oct 27, 2014

CliSuite failed due to Futures timed out, @liancheng any idea about this? seems #2823 does not fix the problem.

@SparkQA
Copy link

SparkQA commented Oct 27, 2014

QA tests have finished for PR 2884 at commit 6174046.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 27, 2014

Test build #475 has finished for PR 2884 at commit 6174046.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@marmbrus
Copy link
Contributor

Minor comment: In the future please put SPARK-XXXX in all capitals in the title so that our merge scripts recognize it. Thanks!

Thanks for working on this! Merged to master.

@asfgit asfgit closed this in 89af6df Oct 28, 2014
@scwf scwf deleted the fixColumnIds branch January 7, 2015 09:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants