[dask] Add a dummy sample to infer output shape. #6645

Merged (5 commits) on Jan 30, 2021

@trivialfis (Member) commented Jan 27, 2021

This is for inferring the output shape with direct prediction (without DaskDMatrix).
A few things require a known output shape before the actual prediction is
carried out, including the Dask metadata and the output dataframe columns.

  • Infer the output shape from a local prediction.
  • Remove the parameter setting in the predict function, as it's neither thread-safe
    nor necessary now that we let Dask decide the parallelism.
  • Simplify prediction on DaskDMatrix.
  • Remove unnecessary serialization.
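The idea in the first bullet can be sketched in plain Python: run the model on a single dummy row locally, look at the shape of that one output, and use it to decide the output layout (a 1-D result versus a 2-D result with known columns) before the distributed prediction runs. This is a minimal sketch under stated assumptions, not the xgboost implementation: `fake_predict`, `infer_output_columns`, and the all-zero dummy row are hypothetical stand-ins.

```python
from functools import partial
from typing import Callable, List, Optional, Sequence

Row = List[float]


def fake_predict(rows: Sequence[Row], n_classes: int) -> list:
    """Hypothetical model: one score per row for binary tasks,
    one probability vector per row (n_rows x n_classes) otherwise."""
    if n_classes <= 2:
        return [sum(r) for r in rows]
    return [[1.0 / n_classes] * n_classes for _ in rows]


def infer_output_columns(
    predict: Callable[[Sequence[Row]], list], n_features: int
) -> Optional[List[int]]:
    """Run predict on a single dummy row and derive the output layout.

    Returns None when each row maps to a scalar (1-D output, e.g. a Series),
    or column labels when each row maps to a vector (2-D output, e.g. a DataFrame).
    """
    dummy = [[0.0] * n_features]  # assumption: an all-zero row is a valid input
    first = predict(dummy)[0]
    if isinstance(first, list):
        return list(range(len(first)))
    return None


binary = partial(fake_predict, n_classes=2)
multiclass = partial(fake_predict, n_classes=3)
print(infer_output_columns(binary, n_features=4))      # None
print(infer_output_columns(multiclass, n_features=4))  # [0, 1, 2]
```

The cost of this approach is a single one-row prediction on the client, which is small next to a full pass over the data; the actual xgboost code may build the dummy sample and the Dask metadata differently.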

A small part extracted from #6638, with an added test and the redundant serialization removed.

@trivialfis (Member Author)

This PR also fixes a performance issue in the predict function, where the booster might get serialized multiple times in map_blocks/map_partitions.
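The serialization issue can be illustrated with a toy model that does not use Dask at all: a pickle round trip stands in for shipping the booster to a worker, and a hypothetical `CountingModel` records how many times it is pickled. This is a sketch of the general pattern (serialize once, reuse the bytes), not the actual change made in xgboost's dask module; the function names are illustrative.

```python
import pickle
from typing import List


class CountingModel:
    """Hypothetical stand-in for a booster that counts how often it is pickled."""

    pickles = 0  # class-level counter shared by all instances

    def __init__(self, weight: float) -> None:
        self.weight = weight

    def __getstate__(self) -> dict:
        CountingModel.pickles += 1  # called once per pickle.dumps
        return dict(self.__dict__)

    def __setstate__(self, state: dict) -> None:
        self.__dict__.update(state)

    def predict(self, xs: List[float]) -> List[float]:
        return [self.weight * x for x in xs]


def predict_per_partition(model: CountingModel, parts: List[List[float]]) -> List[List[float]]:
    """Naive: the live model travels with every task, so it is pickled once per partition."""
    out = []
    for part in parts:
        worker_copy = pickle.loads(pickle.dumps(model))  # simulate shipping to a worker
        out.append(worker_copy.predict(part))
    return out


def predict_serialize_once(model: CountingModel, parts: List[List[float]]) -> List[List[float]]:
    """Pickle the model a single time and hand the same bytes to every task."""
    payload = pickle.dumps(model)
    out = []
    for part in parts:
        worker_copy = pickle.loads(payload)  # each task only deserializes
        out.append(worker_copy.predict(part))
    return out


parts = [[1.0, 2.0], [3.0], [4.0, 5.0], [6.0]]
model = CountingModel(weight=0.5)

CountingModel.pickles = 0
naive = predict_per_partition(model, parts)
print(CountingModel.pickles)  # 4

CountingModel.pickles = 0
once = predict_serialize_once(model, parts)
print(CountingModel.pickles)  # 1
print(naive == once)          # True
```

With a large booster and many partitions, the naive variant repeats an expensive serialization for every block, which is the kind of overhead the comment above describes.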

@trivialfis (Member Author)

This PR should significantly improve prediction performance.

@pseudotensor

@trivialfis (Member Author)

#6648 needs to be merged first.

@codecov-io

codecov-io commented Jan 28, 2021

Codecov Report

Merging #6645 (80a8b7b) into master (c3c8e66) will increase coverage by 0.10%.
The diff coverage is 93.93%.


```diff
@@            Coverage Diff             @@
##           master    #6645      +/-   ##
==========================================
+ Coverage   81.01%   81.12%   +0.10%
==========================================
  Files          13       13
  Lines        3703     3703
==========================================
+ Hits         3000     3004       +4
+ Misses        703      699       -4
```

| Impacted Files | Coverage Δ | |
|---|---|---|
| python-package/xgboost/dask.py | 82.62% <93.93%> (+0.14%) | ⬆️ |
| python-package/xgboost/tracker.py | 95.11% <0.00%> (+1.12%) | ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update c3c8e66...80a8b7b.
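As a sanity check on the report, coverage is Hits / Lines. The short sketch below reproduces the headline figures, assuming Codecov truncates to two decimals (its `precision: 2`, `round: down` settings, which are believed to be the defaults) and computes the change from the unrounded values; with ordinary rounding the same data would read 81.02% and +0.11%. The helper names are illustrative.

```python
def floor_hundredths(hits: int, lines: int) -> int:
    """Coverage in hundredths of a percent, rounded down (integer math avoids float error)."""
    return (hits * 10000) // lines


def fmt(hundredths: int) -> str:
    """Format hundredths of a percent as e.g. '81.01%'."""
    return f"{hundredths // 100}.{hundredths % 100:02d}%"


lines = 3703
before, after = 3000, 3004
print(fmt(floor_hundredths(before, lines)))          # 81.01%
print(fmt(floor_hundredths(after, lines)))           # 81.12%
print(fmt(floor_hundredths(after - before, lines)))  # 0.10% (delta of unrounded values)
print(lines - before, lines - after)                 # 703 699 (misses)
```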

@trivialfis trivialfis merged commit d8ec7aa into dmlc:master Jan 30, 2021
@trivialfis trivialfis deleted the dask-shape branch January 30, 2021 10:55
@trivialfis trivialfis mentioned this pull request Feb 8, 2021
3 participants