Better handling of the partitions #12

Open
JulienPeloton opened this issue Mar 13, 2018 · 0 comments

JulienPeloton commented Mar 13, 2018

This is related to issue #6, but somewhat different.
Currently, the partition size is roughly equal to the HDFS block size (~128 MB). Ideally, the partitioning should follow the resources of the cluster (typically 2-3x as many partitions as cores or executors in use).
Repartitioning at the very end would probably be very costly, though, so this needs to be investigated at a lower level.
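
To make the trade-off concrete, here is a minimal sketch of the arithmetic only. It does not touch Spark or this project's reader; the function names, the default factor of 3, and the example numbers (a 10 GiB file, 16 cores) are assumptions chosen for illustration.

```python
# Hypothetical helpers: compare block-sized partitioning with
# partitioning driven by the cluster resources.

HDFS_BLOCK_BYTES = 128 * 1024 * 1024  # ~128 MB block size quoted above


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding float rounding."""
    return -(-a // b)


def block_based_partition_count(total_bytes: int,
                                block_bytes: int = HDFS_BLOCK_BYTES) -> int:
    """Number of partitions when one partition maps to one block."""
    return max(1, ceil_div(total_bytes, block_bytes))


def target_partition_count(total_cores: int, factor: int = 3) -> int:
    """Number of partitions when following the cluster: factor x cores."""
    if total_cores < 1 or factor < 1:
        raise ValueError("total_cores and factor must be >= 1")
    return total_cores * factor


def target_split_bytes(total_bytes: int, total_cores: int,
                       factor: int = 3) -> int:
    """Split size (bytes) that would yield the target count at read time."""
    n = target_partition_count(total_cores, factor)
    return max(1, ceil_div(total_bytes, n))


if __name__ == "__main__":
    total_bytes = 10 * 1024**3  # assumed 10 GiB input
    cores = 16                  # assumed cores in use
    print(block_based_partition_count(total_bytes))
    print(target_partition_count(cores))
    print(target_split_bytes(total_bytes, cores))
```

If the split size could be set when the data is read, the target count would be reached without a shuffle, which is the motivation for looking at a lower level rather than repartitioning the result at the end.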

@JulienPeloton JulienPeloton added the enhancement (New feature or request) label Mar 13, 2018
@JulienPeloton JulienPeloton self-assigned this Mar 15, 2018