@cloud-fan
Contributor

What changes were proposed in this pull request?

We should have only one central place to try-catch the exceptions for corrupted files.

How was this patch tested?

Existing tests.

@cloud-fan
Contributor Author

cc @sameeragarwal @viirya

}
} catch {
// Throw FileNotFoundException even `ignoreCorruptFiles` is true
case e: java.io.FileNotFoundException => throw e
Member

nit: FileNotFoundException will be thrown anyway, do we need this case?

Contributor Author

FileNotFoundException extends IOException
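
To illustrate the point above: because FileNotFoundException is a subclass of IOException, a broader IOException case would swallow it unless the more specific case comes first. A minimal sketch, not the PR's actual code; the readOrSkip helper and its parameters are hypothetical names used only for illustration:

```scala
import java.io.{FileNotFoundException, IOException}

object CatchOrderSketch {
  // Hypothetical helper: skips "corrupt" input when asked to,
  // but never hides a missing file.
  def readOrSkip(read: () => String, ignoreCorruptFiles: Boolean): Option[String] =
    try {
      Some(read())
    } catch {
      // Must come before the IOException case, since
      // FileNotFoundException is itself an IOException.
      case e: FileNotFoundException => throw e
      // Only swallowed when the flag is set; otherwise the exception propagates.
      case _: RuntimeException | _: IOException if ignoreCorruptFiles => None
    }
}
```

If the first case were removed, a FileNotFoundException would match the IOException alternative and be silently skipped whenever the flag is on.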

@viirya
Member

viirya commented Mar 11, 2017

LGTM with minor comments.

start: Long,
length: Long,
- locations: Array[String] = Array.empty) {
+ @transient locations: Array[String] = Array.empty) {
Member

Do we need to mark it as transient? filePartitions: Seq[FilePartition] is already transient in FileScanRDD.

Contributor Author

This is not for FileScanRDD.filePartitions; this is for the FilePartitions that are sent by the scheduler. The locations are only useful during planning, so we should not send them to executors.
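
A small sketch of the effect being described: a constructor parameter marked @transient is skipped by Java serialization, so it comes back as null after a round trip. SplitSketch and roundTrip are hypothetical stand-ins, not Spark classes, and plain Java serialization is assumed here:

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-in for a file split; not Spark's actual class.
case class SplitSketch(
    path: String,
    start: Long,
    length: Long,
    @transient locations: Array[String] = Array.empty)

object TransientSketch {
  // Serialize to an in-memory buffer and read the value back,
  // roughly what happens when an object is shipped to another JVM.
  def roundTrip[T <: java.io.Serializable](value: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(value)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }
}
```

After TransientSketch.roundTrip(SplitSketch("/data/part-0", 0L, 128L, Array("host1"))), the path, start, and length survive, while locations is null because the field was never written.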

@SparkQA

SparkQA commented Mar 11, 2017

Test build #74367 has finished for PR 17253 at commit 05febbd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

filters: Seq[Filter],
options: Map[String, String],
hadoopConf: Configuration): PartitionedFile => Iterator[InternalRow] = {
// TODO: Remove this default implementation when the other formats have been ported
Member

No more TODO here?

Contributor Author

Actually, we don't need to implement this method in all subclasses. Some FileFormat implementations may override buildReaderWithPartitionValues directly (Parquet), and some may not be used in the read path at all (HiveFileFormat).
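
The design described above can be sketched roughly as follows. This uses simplified, hypothetical types (a file is a String, a row is a Seq[Any]) rather than Spark's real signatures, and the three example formats are invented for illustration:

```scala
// Simplified, hypothetical model; not Spark's real FileFormat API.
trait FormatSketch {
  // Default implementation: formats that never go through this
  // read path do not have to override it.
  protected def buildReader(): String => Iterator[Seq[Any]] =
    throw new UnsupportedOperationException(s"buildReader is not supported for $this")

  // Default wrapper: append the partition values to every row
  // produced by buildReader.
  def buildReaderWithPartitionValues(partitionValues: Seq[Any]): String => Iterator[Seq[Any]] = {
    val reader = buildReader()
    file => reader(file).map(_ ++ partitionValues)
  }
}

// Overrides only buildReader and inherits the wrapper.
object LineFormat extends FormatSketch {
  override protected def buildReader(): String => Iterator[Seq[Any]] =
    file => Iterator(Seq(file))
}

// Overrides the wrapper directly and never needs buildReader.
object DirectFormat extends FormatSketch {
  override def buildReaderWithPartitionValues(partitionValues: Seq[Any]): String => Iterator[Seq[Any]] =
    file => Iterator(Seq[Any](file, "direct") ++ partitionValues)
}

// Write-only format: overrides neither, so reading fails loudly.
object WriteOnlyFormat extends FormatSketch
```

In this sketch only buildReaderWithPartitionValues is called from outside, which is one possible reason for keeping buildReader protected; the source does not state the actual motivation.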

@cloud-fan I am wondering why the buildReader method is now marked as protected? Maybe you can comment here: https://issues.apache.org/jira/browse/SPARK-27751

@viirya
Member

viirya commented Mar 11, 2017

LGTM

Member

@gatorsmile gatorsmile left a comment

LGTM

// Throw FileNotFoundException even `ignoreCorruptFiles` is true
case e: java.io.FileNotFoundException => throw e
case e @ (_: RuntimeException | _: IOException) =>
logWarning(s"Skipped the rest content in the corrupted file: $currentFile", e)

Better English: "Skipped the rest of the content in the corrupted file:"

try {
readFunction(currentFile)
} catch {
case e: java.io.FileNotFoundException =>
Member

Why not import the class? Same below.

@SparkQA

SparkQA commented Mar 13, 2017

Test build #74420 has finished for PR 17253 at commit ad64848.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

Thanks! Merging to master.

@asfgit asfgit closed this in 05887fc Mar 13, 2017