Skip to content

Conversation

@liancheng
Copy link
Contributor

What changes were proposed in this pull request?

Various data sources share approximately the same code for partition value appending. This PR adds a utility method to eliminate this duplication.

How was this patch tested?

Existing tests should do the work.

@SparkQA
Copy link

SparkQA commented Mar 29, 2016

Test build #54431 has finished for PR 12033 at commit 01fbf17.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • // check the type in next() and we get a class cast exception. If we make that function

@SparkQA
Copy link

SparkQA commented Mar 29, 2016

Test build #54436 has finished for PR 12033 at commit 60a0ef0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

rows.map { row =>
appendPartitionColumns(joinedRow(row, file.partitionValues))
}
FileFormat.appendPartitionValues(rows, fullSchema, file.partitionValues)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will send the reader function to executors, and run it many times for all the files in that RDD partition. However, the projection we used to append partition values can be only initialized once.

Should we extend the reader function to make it have an initialize method? e.g. we can create a trait ReaderFunction and define the initialize and execute API.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. We may also add a appendPartitionValues method in buildReader, so that partition value appending can be abstracted away using a default implementation. Only special data sources like Parquet need to override and take care of it.

@yhuai
Copy link
Contributor

yhuai commented May 2, 2016

want to quickly update this pr?

@liancheng
Copy link
Contributor Author

Closing this one in favor of #12866. @cloud-fan Your comment is also addressed there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants