Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use new predict function for R. #6819

Merged
merged 30 commits into from
Jun 11, 2021
Merged

Use new predict function for R. #6819

merged 30 commits into from
Jun 11, 2021

Conversation

trivialfis
Copy link
Member

@trivialfis trivialfis commented Mar 31, 2021

This is an early PR so I can get some suggestions from R experts. The primary change in R interface is the use of iterationrange and deprecation of ntreelimit.

TODOs:

  • Remove the use of ntreelimit in internal code base.
  • Document its deprecation.
  • Figure out how to utilize the shape returned by the new predict function.
  • Add new tests for both iterationrange and strict_shape.
  • Handle 1-based indexing for best_iteration.

@trivialfis trivialfis marked this pull request as draft March 31, 2021 20:56
@codecov-commenter
Copy link

codecov-commenter commented Jun 7, 2021

Codecov Report

Merging #6819 (48d5105) into master (7beb2f7) will decrease coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #6819      +/-   ##
==========================================
- Coverage   81.72%   81.71%   -0.01%     
==========================================
  Files          13       13              
  Lines        3917     3916       -1     
==========================================
- Hits         3201     3200       -1     
  Misses        716      716              
Impacted Files Coverage Δ
python-package/xgboost/core.py 82.83% <100.00%> (-0.02%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7beb2f7...48d5105. Read the comment docs.

@trivialfis trivialfis changed the title [WIP] Use new predict function for R. Use new predict function for R. Jun 7, 2021
@trivialfis trivialfis marked this pull request as ready for review June 7, 2021 16:32
Comment on lines 181 to 184
#' @param iterationrange Specifies which layer of trees are used in prediction. For example, if a
#' random forest is trained with 100 rounds. Specifying `iteration_range=(0,
#' 20)`, then only the forests built during [0, 20) (half open set) rounds are
#' used in this prediction. It's 0 based index (unlike R vector).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use 0-based index here?

@jameslamb What's the convention for R packages that wrap native code? Do users expect to use 1-based indexing?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, please add a note that iteration_range=(0,0) indicates the use of all trees.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, I'm in favor of parameters like this being 1-based and having the documentation clearly indicate that. I think that's more friendly for R users.

But, to be fair, in {lightgbm} we have not been very consistent about this. Keyword arguments in the package's interface expect 1-based values, but {lightgbm} won't look inside parameters passed through a list params and subtract 1 from any parameters that are indices.

So I think that there is not really a "right" answer to this, and that it's more important to:

  1. be consistent (as much as possible)
  2. over-communicate in the documentation (always say whether it is 1-based or 0-based)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will try to use 1 based index.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to use 1-based index. Thanks for the suggestions!

R-package/tests/testthat/test_basic.R Outdated Show resolved Hide resolved
@trivialfis trivialfis requested a review from hcho3 June 9, 2021 17:23
@trivialfis trivialfis merged commit b56614e into dmlc:master Jun 11, 2021
@trivialfis trivialfis deleted the R-predict branch June 11, 2021 05:03
@Kodiologist
Copy link
Contributor

I recently found a segfault in version 1.4.1.1 of the R package when providing ntreelimit greater than the number of trees in the model, but I'm guessing the bug no longer exists in XGBoost master, due to this PR.

@trivialfis
Copy link
Member Author

@Kodiologist Did you test it? I think I have error tests in Python with model slicing, but not 100 percent sure about R error handling.

@Kodiologist
Copy link
Contributor

No, not with anything newer than 1.4.1.1.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants