Add support for column projection to parquet sources #1056
protected def copyWithColumnGlobs(columnGlobs: Set[ColumnProjectionGlob]): This
}

case class ColumnProjectionGlob(glob: String) {
extends AnyVal should work here.
This seems fine to me, but I think we should include an example, even if only in the comments, of making a source where the user can pass a filter from inside the job (i.e., the companion object or constructor takes an optional filter argument).
The reason I made this a trait rather than a constructor arg was so that you can use this feature even if the Source you want to add a filter predicate to doesn't have a constructor arg for one. For example:
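A minimal sketch of the trait approach, assuming a toy in-memory source: `Record`, `HasFilterPredicate` and `InMemorySource` are hypothetical names, not scalding's API. Because the filter is an overridable member defaulting to `None`, a job can supply a predicate through an anonymous subclass even though the source's constructor has no parameter for it.

```scala
// Hypothetical stand-in for a record: column name -> value.
type Record = Map[String, Any]

// Hypothetical mixin: any source that extends it gets an optional filter,
// defaulting to None, without needing a constructor parameter for it.
trait HasFilterPredicate {
  def withFilter: Option[Record => Boolean] = None
}

// Hypothetical source whose constructor knows nothing about filters.
class InMemorySource(records: Seq[Record]) extends HasFilterPredicate {
  // Applies the filter, if one was supplied, when reading.
  def read: Seq[Record] = withFilter match {
    case Some(p) => records.filter(p)
    case None    => records
  }
}

val data: Seq[Record] = Seq(
  Map("name" -> "alice", "age" -> 31),
  Map("name" -> "bob", "age" -> 17)
)

// Inside a job, the user overrides the member in an anonymous subclass.
val adultsOnly = new InMemorySource(data) {
  override val withFilter: Option[Record => Boolean] =
    Some((r: Record) => r("age").asInstanceOf[Int] >= 18)
}
```

The cost of this flexibility is that the predicate is set by subclassing rather than by an ordinary argument, which is less discoverable for users.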
I can add an example of this. Do you think I should add constructor params as well? That can be done like this:
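For comparison, a sketch of the constructor-parameter approach, again with hypothetical names (`FilteredSource`, `Record`). Defaulting the parameter to `None` keeps existing call sites compiling, but it only helps for sources whose constructor was written with this argument.

```scala
// Hypothetical stand-in for a record: column name -> value.
type Record = Map[String, Any]

// Hypothetical source taking the filter as an optional constructor argument.
case class FilteredSource(records: Seq[Record],
                          filter: Option[Record => Boolean] = None) {
  // With no filter, `forall` on None is true, so every record is kept.
  def read: Seq[Record] = records.filter(r => filter.forall(p => p(r)))
}

val data: Seq[Record] = Seq(
  Map("name" -> "alice", "age" -> 31),
  Map("name" -> "bob", "age" -> 17)
)

val all    = FilteredSource(data)
val adults = FilteredSource(data, Some((r: Record) => r("age").asInstanceOf[Int] >= 18))
```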
* you intend to use can also make your job significantly more efficient (parquet column projection
* push-down will skip reading unused columns from disk).
* The columns are specified in the format described here:
* https://github.com/apache/incubator-parquet-mr/blob/master/parquet_cascading.md#21-projection-pushdown-with-thriftscrooge-records
This doc describes setting a key in the config. Is that how this works under the covers? What about multiple inputs (merges, joins)?
This is set up in sourceConfInit, which handles that case.
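To illustrate why per-source initialization copes with merges and joins, here is a simplified model; it is not Cascading's real `sourceConfInit` signature. The config is a plain `Map`, and each source writes its globs into its own copy, so two inputs of a join never overwrite each other's projection. The key name and the `;` separator are assumptions based on the parquet-mr document linked above; `ProjectedSource` is hypothetical.

```scala
// Simplified model of a job config (NOT Cascading's real API).
type Conf = Map[String, String]

// Assumed key name, based on the parquet-mr projection push-down doc.
val ProjectionKey = "parquet.thrift.column.filter"

final case class ColumnProjectionGlob(glob: String)

// Hypothetical source carrying its own projection globs.
final case class ProjectedSource(path: String, columnGlobs: Set[ColumnProjectionGlob]) {
  // Writes this source's globs (sorted, ';'-separated) into a copy of the
  // base conf; the base conf itself is never modified.
  def sourceConfInit(base: Conf): Conf =
    if (columnGlobs.isEmpty) base
    else base + (ProjectionKey -> columnGlobs.map(_.glob).toSeq.sorted.mkString(";"))
}

val jobConf: Conf = Map("some.job.setting" -> "x")

val users  = ProjectedSource("/data/users", Set(ColumnProjectionGlob("name"), ColumnProjectionGlob("age")))
val events = ProjectedSource("/data/events", Set(ColumnProjectionGlob("timestamp")))

// In a join, each input is initialized from the job conf independently.
val usersConf  = users.sourceConfInit(jobConf)
val eventsConf = events.sourceConfInit(jobConf)
```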
This is similar to #1050, but adds another method, .withColumns(...), to parquet sources for specifying projection push-down.