Bug fixes and auto testing of Redshift data source #36

kai-zeng · 2015-07-31T22:24:13Z

Major changes:
(1) Configuring AWS access key pairs: the previous implementation did not consistently use the same AWS access key pair when unloading and when reading data. This PR fixes that bug.
The AWS access key pair is now resolved in this priority order:
<1> If the 'tempdir' option encodes an AWS access key pair, that key pair is used.
<2> Otherwise, we use the AWS key pair specified in the Hadoop configuration of the Spark context.

(2) Automating the tests: we set up a Redshift server for this test and store its security information as encrypted environment variables in Travis.

(3) Automatically convert column names to lowercase, since Redshift COPY only supports lowercase column names.

(4) Other miscellaneous fixes.
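
As an illustration of item (3), here is a minimal sketch of lowercasing column names before a COPY. It works on plain strings rather than a Spark schema so that it stays self-contained; the object name `ColumnNames`, the function `toLowerCaseColumns`, and the collision check are assumptions made for this example and are not taken from the PR.

```scala
import java.util.Locale

// Hypothetical helper, not part of spark-redshift.
object ColumnNames {
  // Lowercase every column name. Returns Left with a message if two names
  // collide after lowercasing (an extra safety check assumed for this sketch).
  def toLowerCaseColumns(names: Seq[String]): Either[String, Seq[String]] = {
    val lowered = names.map(_.toLowerCase(Locale.ROOT))
    val duplicates = lowered
      .groupBy(identity)
      .collect { case (name, occurrences) if occurrences.size > 1 => name }
      .toSeq
      .sorted
    if (duplicates.isEmpty) Right(lowered)
    else Left(s"Column names collide after lowercasing: ${duplicates.mkString(", ")}")
  }
}
```

Using `Locale.ROOT` keeps the result independent of the JVM's default locale, which matters for names containing letters such as `I`.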

kai-zeng · 2015-07-31T22:26:06Z

cc @marmbrus @yhuai

JoshRosen · 2015-08-18T00:54:52Z

I'm going to be taking over this PR. As it stands now, I think that this PR is too big because it contains both bugfixes and general test coverage improvements. I'm going to see about splitting these into a series of smaller incremental PRs.

JoshRosen · 2015-08-18T01:00:13Z

src/test/scala/com/databricks/spark/redshift/RedshiftSuite.scala

If this is a new file with original code then it should have the DB copyright.

Actually, I see now that most of this suite is based on / copied from RedshiftSourceSuite. It would have been good to leave a comment explaining this (maybe instead of the uninformative comment that's currently at the top of this suite). It would have also been nice to factor out some of the code into common helper classes (e.g. not repeating expectedData in both suites).

JoshRosen · 2015-08-18T01:24:01Z

> (1) Configuring aws access key pairs: The previous implementation is not consistent on the aws access key pairs used in unloading and reading data. This PR fixes this bug.
> Now the aws access key pair is configured in this priority order:
> <1> If the 'tempdir' option encodes the aws access key pair. That key pair is used.
> <2> Otherwise, we use the aws key pair specified in the hadoop configuration of the spark context.

Would have been good to compare / contrast this with the fix proposed by @koeninger in #32.
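
For reference when comparing approaches, the priority order quoted above can be sketched roughly as follows. This is only a sketch under stated assumptions: the Hadoop configuration is modeled as a plain `Map[String, String]`, the names `AwsKeyPair` and `CredentialResolution` are hypothetical, and the key pair is assumed to be encoded as `ACCESS:SECRET@` in the user-info part of the 'tempdir' URI. The property names `fs.<scheme>.awsAccessKeyId` and `fs.<scheme>.awsSecretAccessKey` are the ones Hadoop's S3 filesystems read; how the PR itself looks them up is not shown here.

```scala
import java.net.URI

// Hypothetical holder type; not part of spark-redshift's API.
final case class AwsKeyPair(accessKeyId: String, secretAccessKey: String)

object CredentialResolution {
  // Step <1>: look for "ACCESS:SECRET" in the user-info part of the tempdir URI.
  def fromTempDir(tempDir: String): Option[AwsKeyPair] =
    Option(new URI(tempDir).getUserInfo).flatMap { userInfo =>
      userInfo.split(":", 2) match {
        case Array(id, secret) if id.nonEmpty && secret.nonEmpty =>
          Some(AwsKeyPair(id, secret))
        case _ => None
      }
    }

  // Step <2>: fall back to the Hadoop configuration (modeled as a Map here).
  def fromHadoopConf(conf: Map[String, String], scheme: String): Option[AwsKeyPair] =
    for {
      id     <- conf.get(s"fs.$scheme.awsAccessKeyId")
      secret <- conf.get(s"fs.$scheme.awsSecretAccessKey")
    } yield AwsKeyPair(id, secret)

  // Apply the two steps in priority order; None means no key pair was found.
  def resolve(tempDir: String, conf: Map[String, String]): Option[AwsKeyPair] = {
    val scheme = Option(new URI(tempDir).getScheme).getOrElse("s3n")
    fromTempDir(tempDir).orElse(fromHadoopConf(conf, scheme))
  }
}
```

Resolving the key pair once and reusing the result for both the unload and the read is what keeps the two code paths consistent.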

JoshRosen · 2015-08-18T01:26:03Z

src/main/scala/com/databricks/spark/redshift/RedshiftRelation.scala

If you're going to reformat this anyways then you might as well replace this with a .getOrElse.
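
The code under review is not reproduced in this thread, so the following is only a generic sketch of the suggested pattern. The object name, the function names, and the use of 5439 (Redshift's default port) as the fallback value are illustrative assumptions.

```scala
// Hypothetical example; these names do not come from RedshiftRelation.scala.
object GetOrElseExample {
  // Verbose form: pattern match just to supply a default value.
  def portVerbose(port: Option[Int]): Int = port match {
    case Some(p) => p
    case None    => 5439
  }

  // Equivalent form using Option.getOrElse.
  def portConcise(port: Option[Int]): Int = port.getOrElse(5439)
}
```

Both functions return the same result for every input; the second simply states the default in one place.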

JoshRosen · 2015-08-18T01:36:21Z

This PR is really difficult to review / understand in its current state, so I'm going to open a new WIP PR where I try to gradually revert some of the unnecessary changes in order to figure out which fixes were actually necessary.

JoshRosen · 2015-08-18T01:55:40Z

src/test/scala/com/databricks/spark/redshift/ParametersSuite.scala

Is there a more specific exception that we could expect?
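
To illustrate what expecting a more specific exception could look like, here is a small dependency-free sketch. The helper `expect` is a stand-in written for this example that plays the role ScalaTest's `intercept[T]` would play in the real suite; `requireTempDir` and its error message are hypothetical and are not the code under test.

```scala
import scala.reflect.ClassTag
import scala.util.{Failure, Success, Try}

object SpecificException {
  // Hypothetical validation: a missing 'tempdir' raises IllegalArgumentException.
  def requireTempDir(params: Map[String, String]): String =
    params.getOrElse("tempdir", throw new IllegalArgumentException("'tempdir' is required"))

  // Run `body` and return the thrown exception if it has type T;
  // otherwise fail with an AssertionError describing what happened instead.
  def expect[T <: Throwable](body: => Any)(implicit tag: ClassTag[T]): T =
    Try(body) match {
      case Failure(tag(expected)) => expected
      case Failure(other) =>
        throw new AssertionError(s"Expected ${tag.runtimeClass.getName} but got $other")
      case Success(value) =>
        throw new AssertionError(s"Expected ${tag.runtimeClass.getName} but got result $value")
    }
}
```

Asserting on `IllegalArgumentException` rather than a broad `Exception` means the test fails if the code starts throwing for an unrelated reason.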

JoshRosen · 2015-08-18T18:59:10Z

Closing this in favor of #41.

JoshRosen · 2015-08-18T19:21:59Z

src/test/scala/com/databricks/spark/redshift/RedshiftSourceSuite.scala

Nit: I don't like should matchers. Also, it would be nice to use something similar to Spark SQL's checkAnswer so that you get useful debugging output for failures.
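
A rough sketch of the kind of helper meant here, loosely modeled on the idea behind Spark SQL's checkAnswer but not using its API: rows are plain `Seq[Any]` values, and the name `AnswerCheck` and the message layout are assumptions made for this example.

```scala
object AnswerCheck {
  type Row = Seq[Any]

  // Compare rows while ignoring order. Returns None on a match, or
  // Some(message) listing the missing and unexpected rows otherwise.
  def checkAnswer(actual: Seq[Row], expected: Seq[Row]): Option[String] = {
    // Seq.diff is a multiset difference, so duplicate rows are counted correctly.
    val missing = expected.diff(actual)
    val unexpected = actual.diff(expected)
    if (missing.isEmpty && unexpected.isEmpty) None
    else Some(
      s"""Results do not match (ignoring row order).
         |Missing rows (${missing.size}):
         |${render(missing)}
         |Unexpected rows (${unexpected.size}):
         |${render(unexpected)}""".stripMargin)
  }

  private def render(rows: Seq[Row]): String =
    rows.map(_.mkString("  [", ", ", "]")).mkString("\n")
}
```

A test would then assert that the result is empty and print the message on failure, which shows exactly which rows differ instead of only reporting that two collections were unequal.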

kai added 4 commits July 30, 2015 20:25

45f9223 initial work on fixing s3 credentials and hadoop client version
a904e91 fix quote escaping and test cases
5c52a0d update travis encrypted environment variables
c959337 auto convert columns to lower case for Redshift COPY

kai added 3 commits July 31, 2015 15:34

03c5258 add spark snapshot repository
bd99bea fix spark snapshot repo
b4ae9a8 fix spark snapshot repo

JoshRosen closed this Aug 18, 2015

JoshRosen deleted the redshift-test branch August 20, 2015 21:28