Skip to content

Support reading unloaded decimals#62

Closed
JoshRosen wants to merge 5 commits intomasterfrom
read-decimal
Closed

Support reading unloaded decimals#62
JoshRosen wants to merge 5 commits intomasterfrom
read-decimal

Conversation

@JoshRosen
Copy link
Contributor

This patch adds support for reading unloaded decimal columns. Previously, we would fail to parse the string into a decimal, leading to ClassCastExceptions.

@JoshRosen JoshRosen added this to the 0.5 milestone Aug 29, 2015
@JoshRosen
Copy link
Contributor Author

Note that there are still problems when loading decimals into Redshift; they appear to be truncated to longs. This will be fixed separately (see #61), but may be blocked by adding proper decimal support to spark-avro.

@JoshRosen
Copy link
Contributor Author

Note that there are two more things that I need to do before this is merge-worthy:

  • Fix / extend support for decimals in the schema-parsing code for the redshiftFile API.
  • Add tests for reading the largest / smallest possible decimal values and tests for varying precision and scale. I may be able to adapt some test cases from Spark's own decimal tests.

@JoshRosen
Copy link
Contributor Author

Oh no! It looks like stopping and starting new HiveContexts is somehow leaking PermGen space or something. I'm going to try enabling forking in tests and will bump up the permgen limit to see if that quickly fixes the problem.

@codecov-io
Copy link

Current coverage is 88.50%

Merging #62 into master will decrease coverage by -0.24% as of dee5998

@@            master    #62   diff @@
=====================================
  Files           10     10       
  Stmts          391    400     +9
  Branches        93     95     +2
  Methods          0      0       
=====================================
+ Hit            347    354     +7
  Partial          0      0       
- Missed          44     46     +2

Review entire Coverage Diff as of dee5998

Powered by Codecov. Updated on successful CI builds.

@JoshRosen
Copy link
Contributor Author

Only remaining task here is adding a comment and adding tests + support in redshiftFile.

@JoshRosen
Copy link
Contributor Author

I think that we're going to deprecate redshiftFile in 0.5.0 (and remove it in the next release if nobody complains; see #65), so I'm going to skip fixing the decimal support there. Given this, I'm going to merge this now in order to minimize merge conflicts with a bunch of other in-flight patches. I might loop back later to add additional tests and solicit post-hoc review.

@JoshRosen JoshRosen closed this in 4b47628 Sep 1, 2015
@JoshRosen JoshRosen deleted the read-decimal branch September 1, 2015 18:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants