-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[R-package] deprecate the use of 'info' in Dataset #4573
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I'm qualified enough to provide a thoughtful review, but I have a few comments/questions below:
# Store as param | ||
params[[key]] <- additional_params[[key]] | ||
|
||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like additional params like min_data
are lost here before 4.0.0 release.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh! I don't think so, but it's not obvious at all what I did, so let me share some links.
First, Dataset
in the R package is not exported, so its constructor can be considered "private" in the package's API. That isn't true for any other public methods on this class, since they can be called by user code once a function like lgb.Dataset()
returns a Dataset
. So as long as all the exported methods / functions maintain backwards-compatibility, no user code will be broken.
Since this PR adds each of the elements of INFO_KEYS
as keyword arguments in Dataset$new()
, the only thing that could be left in ...
is information you want to include in params.
As of this PR, exported entrypoints that create a Dataset
will handle merging params
and ...
before calling Dataset$new()
.
Dataset$create_valid()
LightGBM/R-package/R/lgb.Dataset.R
Lines 136 to 156 in c8212d8
params <- modifyList(additional_params, params) | |
# the Dataset's existing parameters should be overwritten by any passed in to this call | |
params <- modifyList(private$params, params) | |
# Create new dataset | |
ret <- Dataset$new( | |
data = data | |
, params = params | |
, reference = self | |
, colnames = private$colnames | |
, categorical_feature = private$categorical_feature | |
, predictor = private$predictor | |
, free_raw_data = private$free_raw_data | |
, used_indices = NULL | |
, info = info | |
, label = label | |
, weight = weight | |
, group = group | |
, init_score = init_score | |
) |
lgb.Dataset()
LightGBM/R-package/R/lgb.Dataset.R
Lines 798 to 826 in c8212d8
additional_params <- list(...) | |
params <- modifyList(params, additional_params) | |
if (length(additional_params) > 0L) { | |
warning(paste0( | |
"lgb.Dataset: Found the following passed through '...': " | |
, paste(names(additional_params), collapse = ", ") | |
, ". These will be used, but in future releases of lightgbm, this warning will become an error. " | |
, "Add these to 'params' instead. See ?lgb.Dataset for documentation on how to call this function." | |
)) | |
} | |
# Create new dataset | |
return( | |
invisible(Dataset$new( | |
data = data | |
, params = params | |
, reference = reference | |
, colnames = colnames | |
, categorical_feature = categorical_feature | |
, predictor = NULL | |
, free_raw_data = free_raw_data | |
, used_indices = NULL | |
, info = info | |
, label = label | |
, weight = weight | |
, group = group | |
, init_score = init_score | |
)) |
=====
this made me realize that one place, Dataset$slice()
, is missing the "combine params before calling Dataset$new()
logic. I'll open a separate PR for that.
LightGBM/R-package/R/lgb.Dataset.R
Lines 501 to 527 in 32445ab
slice = function(idxset, ...) { | |
additional_params <- list(...) | |
if (length(additional_params) > 0L) { | |
warning(paste0( | |
"Dataset$slice(): Found the following passed through '...': " | |
, paste(names(additional_params), collapse = ", ") | |
, ". These are ignored and should be removed. " | |
, "In future releases of lightgbm, this warning will become an error." | |
)) | |
} | |
# Perform slicing | |
return( | |
Dataset$new( | |
data = NULL | |
, params = private$params | |
, reference = self | |
, colnames = private$colnames | |
, categorical_feature = private$categorical_feature | |
, predictor = private$predictor | |
, free_raw_data = private$free_raw_data | |
, used_indices = sort(idxset, decreasing = FALSE) | |
, info = NULL | |
, ... | |
) | |
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As of this PR, exported entrypoints that create a
Dataset
will handle mergingparams
and...
before callingDataset$new()
.
Ah, here is the key point! Thanks a lot for as always detailed explanation!
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have any other comments, thanks!
R-package/R/lgb.Dataset.R
Outdated
params <- modifyList(additional_params, params) | ||
|
||
# the Dataset's existing parameters should be overwritten by any passed in to this call | ||
params <- modifyList(private$get_params(), params) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my mistake, this should have been self$get_params()
, not private
. Will fix in a local commit because I also might as well merge in latest master
if this PR is going to get another CI run anyway.
Thanks very much as always for the reviews, @StrikerRUS ! I'm confident in merging this without another reviewer's approval since it's an important step towards the next release, and since:
|
This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this. |
Addresses the rest of #4543 (changes not covered by #4571), as part of #4310.
info
inlgb.Dataset()
group
,weight
,init_score
, andlabel
tolgb.Dataset()
Notes for Reviewers
While working through this, I realized that today the only way to pass new parameters through to
Dataset
when usinglgb.Dataset.create.valid()
is to pass them through...
. So this PR also proposes adding a new keywordparams
to that function (and the corresponding public methodDataset$create_valid()
), so that...
can be eliminated in release 4.0.0 while still allowing customization of parameters withinlgb.Dataset.create.valid()
calls.No existing unit tests need to be changed in this PR, since there are not any that initialize a list
info
and pass it in. All of the tests and demo examples in this project are already passingweight
,group
,label
, andinit_score
directly as keyword arguments.