Conversation

@koertkuipers (Contributor) commented Jun 16, 2015

What changes were proposed in this pull request?

Consistently expose the Hadoop Configuration/JobConf for all methods that use Hadoop input/output formats. This facilitates re-use and avoids adding many extra parameters that end up just modifying the Configuration/JobConf internally anyway.
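For context, a minimal sketch of the usage pattern this enables, runnable in spark-shell (the split-size value and path are illustrative only; `newAPIHadoopFile` already accepts a `Configuration`, and the PR applies the same pattern to the remaining Hadoop-format methods):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("example").setMaster("local[*]"))

// Build a per-call Configuration instead of mutating the shared
// sc.hadoopConfiguration, which is global to every job on the context.
val conf = new Configuration(sc.hadoopConfiguration)
conf.set("mapreduce.input.fileinputformat.split.maxsize", "134217728")

// The trailing Configuration parameter carries the per-call settings
// down into the resulting RDD.
val rdd = sc.newAPIHadoopFile(
  "hdfs://namenode/path/to/input",
  classOf[TextInputFormat],
  classOf[LongWritable],
  classOf[Text],
  conf)
```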

How was this patch tested?

New tests in SparkContextSuite that check that the resulting HadoopRDD/NewHadoopRDD indeed has the settings passed in using the Configuration/JobConf parameter.
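A sketch of what such a test can look like (the test name, custom key, and path are illustrative, not copied from the actual suite; assumes a ScalaTest suite with a running SparkContext `sc`):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.spark.rdd.NewHadoopRDD

test("Configuration passed to newAPIHadoopFile reaches the NewHadoopRDD") {
  val conf = new Configuration(sc.hadoopConfiguration)
  conf.set("test.custom.key", "custom-value")
  val rdd = sc.newAPIHadoopFile(
    "/tmp/some-input",
    classOf[TextInputFormat],
    classOf[LongWritable],
    classOf[Text],
    conf)
  // The per-call setting must be visible on the resulting RDD's conf,
  // not just on the context-wide default.
  assert(rdd.asInstanceOf[NewHadoopRDD[_, _]].getConf.get("test.custom.key") === "custom-value")
}
```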

@koertkuipers changed the title from "SPARK 8398 hadoop input/output format advanced control" to "SPARK-8398 hadoop input/output format advanced control" on Jun 16, 2015
@squito (Contributor) commented Jun 17, 2015

Jenkins, this is OK to test

@squito (Contributor) commented Jun 17, 2015

I think we'll also want to add these to JavaSparkContext and the python api (though I am not entirely certain how to do that off the top of my head).
It would also be nice to have some really simple regression tests that those confs actually get used.

@koertkuipers (Contributor, Author)

ok i will look into JavaSparkContext and a few simple regression tests.
will probably need some help with python.


Review comment on the diff (Contributor):

remove space around the {}

@andrewor14 (Contributor)

add to whitelist

@andrewor14 (Contributor)

The changes here look fine. @JoshRosen do we have to worry about breaking binary compatibility in some way here? Even though we provide a default value for the last parameter, we're technically adding a new parameter to several public APIs here.

@JoshRosen (Contributor)

@andrewor14 Adding a new parameter with a default value will break binary compatibility from a Java point-of-view, as far as I know.
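A toy illustration of why (simplified signatures, not Spark's actual ones): scalac compiles a default argument into the full-arity method plus a synthetic default accessor, and the old narrow descriptor disappears from the bytecode.

```scala
import org.apache.hadoop.conf.Configuration

object Before {
  // Compiles to exactly one JVM method: textFile(String): List[String]
  def textFile(path: String): List[String] = List(path)
}

object After {
  // Compiles to textFile(String, Configuration) plus a synthetic
  // textFile$default$2(): Configuration accessor. The one-argument
  // descriptor no longer exists, so code compiled against Before fails
  // with NoSuchMethodError at runtime, even though recompiling it from
  // source still succeeds.
  def textFile(path: String, conf: Configuration = new Configuration()): List[String] =
    List(path, conf.get("fs.defaultFS", "file:///"))
}
```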

@JoshRosen (Contributor)

MiMa should tell us, though.

@SparkQA commented Jun 18, 2015

Test build #35167 has finished for PR 6848 at commit 135b96e.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Jun 19, 2015

Test build #35275 has finished for PR 6848 at commit 425a578.

  • This patch fails Scala style tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA commented Jun 19, 2015

Test build #35293 has finished for PR 6848 at commit 2bfa320.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Jun 19, 2015

Test build #35323 has finished for PR 6848 at commit 2122160.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Jun 19, 2015

Test build #35330 has finished for PR 6848 at commit df2c2ae.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@koertkuipers (Contributor, Author)

i see MiMa failed. i will try to produce a version that is binary compatible.
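For reference, the usual binary-compatible alternative is an explicit overload that preserves the old descriptor, rather than a default argument (a sketch with a toy signature, not the actual change in this PR):

```scala
import org.apache.hadoop.conf.Configuration

object Overloads {
  // The original one-argument method keeps its JVM descriptor intact...
  def textFile(path: String): List[String] =
    textFile(path, new Configuration())

  // ...and the new functionality lives in an explicit two-argument overload.
  def textFile(path: String, conf: Configuration): List[String] =
    List(path, conf.get("fs.defaultFS", "file:///"))
}
```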

@SparkQA commented Jun 20, 2015

Test build #35369 has finished for PR 6848 at commit e2f7023.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • * converters, but then we couldn't have an object for every subclass of Writable (you can't

@andrewor14 (Contributor)

retest this please. MiMa tests have been a little flaky recently.

@SparkQA commented Jun 22, 2015

Test build #35471 has finished for PR 6848 at commit e2f7023.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • * converters, but then we couldn't have an object for every subclass of Writable (you can't

@SparkQA commented Oct 2, 2015

Test build #43170 has finished for PR 6848 at commit 7ca662c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • * converters, but then we couldn't have an object for every subclass of Writable (you can't

@SparkQA commented Oct 18, 2015

Test build #43898 has finished for PR 6848 at commit 470b3d9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • * converters, but then we couldn't have an object for every subclass of Writable (you can't

@SparkQA commented Nov 11, 2015

Test build #45653 has finished for PR 6848 at commit 208c019.

  • This patch fails RAT tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • * converters, but then we couldn't have an object for every subclass of Writable (you can't

@holdenk (Contributor) commented Apr 19, 2016

Would we want to maybe consider this for Spark 2.0? It seems like if we're going to be adding new default params to functions this might be the time to do it (of course only if people have the bandwidth to update & also review)?

It also seems like some unrelated R changes might have accidentally gotten mixed in during one of the merges that should be reverted if we want to move forward with this.

@koertkuipers (Contributor, Author)

i am happy to update this if there is any interest; otherwise i will close it.


@ScrapCodes (Member)

IMO, this is useful in that the hadoop configuration need not be global state. We can have a default set of configuration that is used everywhere by default, and then in every hadoop-related method the user has a way to override that default.

Binary compatibility will definitely be broken, but source compatibility should not be affected, i.e. one would just need to recompile the project against the newer spark version. As was asked already, this should be okay for 2.0?

@andrewor14 ping !
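A minimal sketch of that default-plus-override shape described above (hypothetical class and method, not Spark code):

```scala
import org.apache.hadoop.conf.Configuration

// One shared default Configuration, overridable per call.
class HadoopIO(defaultConf: Configuration) {
  def read(path: String, conf: Option[Configuration] = None): Seq[String] = {
    val effective = conf.getOrElse(defaultConf) // a per-call override wins
    Seq(path, effective.get("fs.defaultFS", "file:///"))
  }
}
```

Callers that pass nothing get the shared default; callers that need different Hadoop settings pass their own `Configuration` without touching any global state.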

@SparkQA commented Apr 20, 2016

Test build #56367 has finished for PR 6848 at commit 0ea4e5c.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Apr 20, 2016

Test build #56370 has finished for PR 6848 at commit 60c34e1.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@koertkuipers (Contributor, Author)

Jenkins, retest this please.


@SparkQA commented Apr 20, 2016

Test build #56392 has finished for PR 6848 at commit c06548d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class Timer(val iteration: Int)
    • case class Case(name: String, fn: Timer => Unit)
    • class ContinuousQuery(object):
    • class Trigger(object):
    • class ProcessingTime(Trigger):
    • trait ScalaReflection
    • case class FilePartition(index: Int, files: Seq[PartitionedFile]) extends RDDPartition
    • abstract class OutputWriterFactory extends Serializable
    • abstract class OutputWriter
    • case class HadoopFsRelation(
    • trait FileFormat
    • case class Partition(values: InternalRow, files: Seq[FileStatus])
    • trait FileCatalog
    • class HDFSFileCatalog(
    • case class FakeFileStatus(

@koertkuipers (Contributor, Author)

ok i updated this for spark 2. the unit test failures seem unrelated

@ScrapCodes (Member)

Jenkins, retest this please.

@SparkQA commented Apr 21, 2016

Test build #56527 has finished for PR 6848 at commit c06548d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class Timer(val iteration: Int)
    • case class Case(name: String, fn: Timer => Unit)
    • class ContinuousQuery(object):
    • class Trigger(object):
    • class ProcessingTime(Trigger):
    • trait ScalaReflection
    • case class FilePartition(index: Int, files: Seq[PartitionedFile]) extends RDDPartition
    • abstract class OutputWriterFactory extends Serializable
    • abstract class OutputWriter
    • case class HadoopFsRelation(
    • trait FileFormat
    • case class Partition(values: InternalRow, files: Seq[FileStatus])
    • trait FileCatalog
    • class HDFSFileCatalog(
    • case class FakeFileStatus(

@SparkQA commented Apr 22, 2016

Test build #56599 has finished for PR 6848 at commit 34f97d4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@holdenk (Contributor) commented Apr 22, 2016

@koertkuipers nowadays we try to provide a description for our pull requests (sometimes it can be copied from the JIRA) for the eventual commit message - it might be good to add that?
Also CC @JoshRosen & @squito to maybe take a look at the updated PR :)

@koertkuipers changed the title from "SPARK-8398 hadoop input/output format advanced control" to "[SPARK-8398][CORE] Hadoop input/output format advanced control" on Apr 22, 2016
@koertkuipers (Contributor, Author) commented Apr 23, 2016

@holdenk ok i tried to bring it up to the latest standards for pullreqs

@SparkQA commented May 11, 2016

Test build #58363 has finished for PR 6848 at commit 34f97d4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Jun 10, 2016

Test build #60306 has finished for PR 6848 at commit 34f97d4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin (Contributor) commented Dec 7, 2016

I'm going to close this for now.

@asfgit closed this in 08d6441 on Dec 7, 2016
zifeif2 pushed a commit to zifeif2/spark that referenced this pull request Nov 22, 2025