Better handling of the partitions #12

JulienPeloton · 2018-03-13T21:46:37Z

This is related to issue #6, but somewhat different.
Currently, the partition size is roughly equal to the HDFS block size (~128 MB). Ideally, the partitioning should follow the resources of the cluster instead (typically the number of partitions is 2-3x the number of cores or executors in use).
I guess repartitioning at the very end would be very costly, though... so this needs to be investigated at a lower level.
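
To make the mismatch concrete, here is a rough sketch of the arithmetic. It is plain Python with no Spark dependency. The function names, the 100 GB dataset, and the 32-core cluster are all hypothetical and are not taken from the project. It only compares the partition count obtained from one-partition-per-block with the count suggested by the 2-3x rule of thumb.

```python
import math

HDFS_BLOCK_SIZE_MB = 128  # approximate HDFS block size quoted above


def partitions_from_blocks(file_size_mb, block_size_mb=HDFS_BLOCK_SIZE_MB):
    """Partition count when each partition maps to roughly one HDFS block."""
    return max(1, math.ceil(file_size_mb / block_size_mb))


def partitions_from_cores(total_cores, factor=3):
    """Partition count when following cluster resources (factor x cores)."""
    return max(1, total_cores * factor)


# Hypothetical numbers: a 100 GB dataset read on a 32-core cluster.
file_size_mb = 100 * 1024
by_blocks = partitions_from_blocks(file_size_mb)   # 800 partitions of ~128 MB
by_cores = partitions_from_cores(32, factor=3)     # 96 partitions

print(f"block-based: {by_blocks} partitions")
print(f"core-based:  {by_cores} partitions")
```

With these assumed numbers the two strategies differ by almost an order of magnitude. Going from one layout to the other after loading (for example with Spark's `repartition`) implies a full shuffle of the data, which is the cost concern raised above.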

JulienPeloton added the enhancement label Mar 13, 2018

JulienPeloton self-assigned this Mar 15, 2018

JulienPeloton added the BlockOrSplit label Mar 16, 2018

JulienPeloton added the Scala label Apr 6, 2018
