[SPARK-8398][CORE] Hadoop input/output format advanced control #6848
|
Jenkins, this is OK to test |
|
I think we'll also want to add these to |
|
ok i will look into JavaSparkContext and a few simple regression tests.
|
remove space around the {}
|
add to whitelist |
|
The changes here look fine. @JoshRosen do we have to worry about breaking binary compatibility in some ways here? Even though we provide a default value for the last parameter, we're technically adding a new parameter to several public APIs here. |
|
@andrewor14 Adding a new parameter with a default value will break binary compatibility from a Java point-of-view, as far as I know. |
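To illustrate the concern: a Scala method with a default argument still compiles to a single JVM method carrying the full parameter list, so callers compiled against the shorter signature no longer find it. The sketch below mimics this in plain Java using reflection. All class and method names (`ReaderV1`, `ReaderV2Breaking`, `ReaderV2Compatible`, `read`) are hypothetical stand-ins and not Spark APIs; keeping the old signature as an overload is shown as one possible way to stay compatible, not necessarily what this PR ends up doing.

```java
import java.lang.reflect.Method;
import java.util.Collections;
import java.util.Map;

public class BinaryCompatSketch {
    // Hypothetical "old" API: the signature that existing callers were compiled against.
    public static class ReaderV1 {
        public String read(String path) {
            return "v1:" + path;
        }
    }

    // Hypothetical "new" API where the existing method simply gained a parameter.
    // The one-argument method no longer exists in the bytecode.
    public static class ReaderV2Breaking {
        public String read(String path, Map<String, String> conf) {
            return "v2:" + path + ":" + conf.size();
        }
    }

    // Hypothetical "new" API that keeps the old signature as an explicit overload.
    public static class ReaderV2Compatible {
        public String read(String path) {
            return read(path, Collections.<String, String>emptyMap());
        }

        public String read(String path, Map<String, String> conf) {
            return "v2:" + path + ":" + conf.size();
        }
    }

    // Returns true if the class still exposes the public method read(String),
    // which is what an already-compiled caller would try to link against.
    public static boolean hasOldSignature(Class<?> api) {
        try {
            Method m = api.getMethod("read", String.class);
            return m != null;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("V1 has read(String): " + hasOldSignature(ReaderV1.class));
        System.out.println("V2Breaking has read(String): " + hasOldSignature(ReaderV2Breaking.class));
        System.out.println("V2Compatible has read(String): " + hasOldSignature(ReaderV2Compatible.class));
    }
}
```

A binary-compatibility checker such as MiMa flags the second case because the old method descriptor has disappeared, even though source code calling `read(path)` would still compile in Scala thanks to the default value.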
|
MiMa should tell us, though. |
|
Test build #35167 has finished for PR 6848 at commit
|
…p input/output formats in java api
|
Test build #35275 has finished for PR 6848 at commit
|
|
Test build #35293 has finished for PR 6848 at commit
|
|
Test build #35323 has finished for PR 6848 at commit
|
|
Test build #35330 has finished for PR 6848 at commit
|
|
i see MiMa failed. i will try to produce a version that is binary compatible. |
|
Test build #35369 has finished for PR 6848 at commit
|
|
retest this please. MiMa tests have been a little flaky recently. |
|
Test build #35471 has finished for PR 6848 at commit
|
|
Test build #43170 has finished for PR 6848 at commit
|
|
Test build #43898 has finished for PR 6848 at commit
|
|
Test build #45653 has finished for PR 6848 at commit
|
|
Would we want to maybe consider this for Spark 2.0? It seems like if we're going to be adding new default params to functions, this might be the time to do it (of course only if people have the bandwidth to update and review). It also seems like some unrelated R changes might have accidentally gotten mixed in during one of the merges; those should be reverted if we want to move forward with this. |
|
i am happy to update this, if there is any interest. or otherwise i will
|
|
IMO, this is useful in that the Hadoop configuration need not be global state. We can have a default configuration that is used everywhere, and every Hadoop-related method then gives the user a way to override that default. Binary compatibility will definitely be broken, but source compatibility might not be affected, i.e. one might only need to recompile the project against the newer Spark version. As already asked, should this be okay for 2.0? @andrewor14 ping! |
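The "shared defaults plus per-call override" idea from the comment above can be sketched without any Spark or Hadoop dependency. Here `java.util.Properties` is only a stand-in for a Hadoop `Configuration`/`JobConf`, and `effectiveConf` is a hypothetical helper, not part of this PR. The key used in `main` is a Hadoop split-size setting chosen purely for illustration.

```java
import java.util.Properties;

public class PerCallConfSketch {
    // Builds the configuration for a single call: start from a copy of the
    // shared defaults, then apply the caller's overrides on top.
    // The shared defaults object is never mutated.
    public static Properties effectiveConf(Properties defaults, Properties overrides) {
        Properties merged = new Properties();
        merged.putAll(defaults);
        merged.putAll(overrides);
        return merged;
    }

    public static void main(String[] args) {
        String key = "mapreduce.input.fileinputformat.split.maxsize";

        // Stand-in for the application-wide default configuration.
        Properties defaults = new Properties();
        defaults.setProperty(key, "134217728");

        // Stand-in for the configuration a user passes to one specific read.
        Properties overrides = new Properties();
        overrides.setProperty(key, "33554432");

        Properties perCall = effectiveConf(defaults, overrides);
        System.out.println("per-call value: " + perCall.getProperty(key));
        System.out.println("shared default: " + defaults.getProperty(key));
    }
}
```

Copying before overriding is the design point: a setting needed for one input does not leak into every later read that relies on the shared defaults.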
|
Test build #56367 has finished for PR 6848 at commit
|
|
Test build #56370 has finished for PR 6848 at commit
|
|
Jenkins, retest this please.
|
|
Test build #56392 has finished for PR 6848 at commit
|
|
ok i updated this for spark 2. the unit test failures seem unrelated |
|
Jenkins, retest this please. |
|
Test build #56527 has finished for PR 6848 at commit
|
|
Test build #56599 has finished for PR 6848 at commit
|
|
@koertkuipers nowadays we try to provide a description for our pull requests (sometimes it can be copied from the JIRA) for the eventual commit message. It might be good to add that? |
|
@holdenk ok i tried to make it look all up to latest standards for pullreqs |
|
Test build #58363 has finished for PR 6848 at commit
|
|
Test build #60306 has finished for PR 6848 at commit
|
|
I'm going to close this for now. |
Closes apache#14537. Closes apache#16181. Closes apache#8318. Closes apache#6848. Closes apache#7265. Closes apache#9543.
What changes were proposed in this pull request?
Consistently expose Configuration/JobConf for all methods that use Hadoop input/output formats, which facilitates re-use and discourages many additional parameters (that end up changing the Configuration/JobConf internally).
How was this patch tested?
New tests in SparkContextSuite that check that the resulting HadoopRDD/NewHadoopRDD indeed has the settings passed in using the Configuration/JobConf parameter.