[R-package] enable saving Booster with saveRDS() and loading it with readRDS() (fixes #4296) #4685
Merged
Commits
42 commits
4c177a5
idiomatic serialization
david-cortes ada260b
linter
david-cortes 0d551a0
linter, namespace
david-cortes 088eef3
comments, linter, fix failing test
david-cortes 5e2a922
standardize error messages for null handles
david-cortes 5c1d260
auto-restore handle in more functions
david-cortes 8ed14a6
linter
david-cortes 840de5e
missing declaration
david-cortes af16b2d
correct wrong signature
david-cortes 9d7e6f8
fix docs
david-cortes 4428a23
Update R-package/R/lgb.train.R
david-cortes 730f2e6
Update R-package/R/lgb.drop_serialized.R
david-cortes 719af93
Update R-package/R/lgb.restore_handle.R
david-cortes 41a75bd
Update R-package/R/lgb.restore_handle.R
david-cortes 9b5de4d
Update R-package/R/lgb.make_serializable.R
david-cortes 1f4aa91
move 'restore_handle' from feature importance to dump method
david-cortes 84af4e7
missing header
david-cortes 25557f7
move arguments order, update docs
david-cortes ff78dd2
linter
david-cortes 19f3c4a
avoid leaving files in working directory
david-cortes 2f3a334
add test for save_model=NULL
david-cortes 6e7b852
missing comma
david-cortes 617b226
Update R-package/R/lgb.restore_handle.R
david-cortes 8a078f4
Update R-package/src/lightgbm_R.cpp
david-cortes 8e194af
change name of error function
david-cortes d4c8ef1
update comment
david-cortes 44ca8db
restore old serialization functions but set as deprecated
david-cortes d6f4c74
Update R-package/R/readRDS.lgb.Booster.R
david-cortes 8d282e4
Update R-package/R/saveRDS.lgb.Booster.R
david-cortes 0817eb0
update docs
david-cortes f845554
Update R-package/R/readRDS.lgb.Booster.R
david-cortes 51fa088
Update R-package/R/saveRDS.lgb.Booster.R
david-cortes 8522ce7
Update R-package/tests/testthat/test_basic.R
david-cortes c116270
Update R-package/R/readRDS.lgb.Booster.R
david-cortes bee5bc1
comments
david-cortes b0f9f93
fix variable name
david-cortes 2d3a132
restore serialization test for linear models
david-cortes c534952
Update R-package/R/lightgbm.R
david-cortes 58fd21f
Merge branch 'master' into serial
david-cortes b1b4e2b
update docs
david-cortes eb7fd32
fix issues with null terminator
david-cortes 34707ae
solve conflicts
david-cortes
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
#' @name lgb.drop_serialized | ||
#' @title Drop serialized raw bytes in a LightGBM model object | ||
#' @description If a LightGBM model object was produced with argument `serializable=TRUE`, the R object will keep | ||
#' a copy of the underlying C++ object as raw bytes, which can be used to reconstruct such object after getting | ||
#' serialized and de-serialized, but at the cost of extra memory usage. If these raw bytes are not needed anymore, | ||
#' they can be dropped through this function in order to save memory. Note that the object will be modified in-place. | ||
#' @param model \code{lgb.Booster} object which was produced with `serializable=TRUE`. | ||
#' | ||
#' @return \code{lgb.Booster} (the same `model` object that was passed as input, as invisible). | ||
#' @seealso \link{lgb.restore_handle}, \link{lgb.make_serializable}. | ||
#' @export | ||
lgb.drop_serialized <- function(model) { | ||
stopifnot(lgb.is.Booster(model)) | ||
model$drop_raw() | ||
return(invisible(model)) | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
#' @name lgb.make_serializable | ||
#' @title Make a LightGBM object serializable by keeping raw bytes | ||
#' @description If a LightGBM model object was produced with argument `serializable=FALSE`, the R object will not | ||
#' be serializable (e.g. cannot save and load with \code{saveRDS} and \code{readRDS}) as it will lack the raw bytes | ||
#' needed to reconstruct its underlying C++ object. This function can be used to forcibly produce those serialized | ||
#' raw bytes and make the object serializable. Note that the object will be modified in-place. | ||
#' @param model \code{lgb.Booster} object which was produced with `serializable=FALSE`. | ||
#' | ||
#' @return \code{lgb.Booster} (the same `model` object that was passed as input, as invisible). | ||
#' @seealso \link{lgb.restore_handle}, \link{lgb.drop_serialized}. | ||
#' @export | ||
lgb.make_serializable <- function(model) { | ||
stopifnot(lgb.is.Booster(model)) | ||
model$save_raw() | ||
return(invisible(model)) | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,33 @@ | ||
#' @name lgb.restore_handle | ||
#' @title Restore the C++ component of a deserialized LGB model | ||
#' @description After a LightGBM model object is de-serialized through functions such as \code{load} or | ||
#' \code{readRDS}, its underlying C++ object will be blank and needs to be restored to be able to use it. Such | ||
#' object is restored automatically when calling functions such as \code{predict}, but this function can be | ||
#' used to forcibly restore it beforehand. Note that the object will be modified in-place. | ||
#' @param model \code{lgb.Booster} object which was de-serialized and whose underlying C++ object and R handle | ||
#' need to be restored. | ||
#' | ||
#' @return \code{lgb.Booster} (the same `model` object that was passed as input, as invisible). | ||
#' @seealso \link{lgb.make_serializable}, \link{lgb.drop_serialized}. | ||
#' @examples | ||
#' library(lightgbm) | ||
#' data("agaricus.train") | ||
#' model <- lightgbm( | ||
#' agaricus.train$data | ||
#' , agaricus.train$label | ||
#' , params = list(objective = "binary", nthreads = 1L) | ||
#' , nrounds = 5L | ||
#' , verbose = 0) | ||
#' fname <- tempfile(fileext = ".rds") | ||
#' saveRDS(model, fname) | ||
#' | ||
#' model_new <- readRDS(fname) | ||
#' model_new$check_null_handle() | ||
#' lgb.restore_handle(model_new) | ||
#' model_new$check_null_handle() | ||
#' @export | ||
lgb.restore_handle <- function(model) { | ||
stopifnot(lgb.is.Booster(model)) | ||
model$restore_handle() | ||
return(invisible(model)) | ||
} |
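The three helpers in this diff share one idea: the R object keeps a native handle plus, optionally, a raw-byte copy from which the handle can be rebuilt. A minimal base-R sketch of that idea follows. Everything in it is a hypothetical toy (`new_toy_booster`, `simulate_round_trip`, and the `toy_*` functions are not part of lightgbm), and the round trip is simulated by blanking the handle rather than by calling `saveRDS()`/`readRDS()`.

```r
# Toy stand-in for a Booster. `handle` mimics the C++ object and `raw` holds
# serialized bytes from which the handle can be rebuilt.
# All names are hypothetical illustrations, not lightgbm code.
new_toy_booster <- function(coefs, serializable = TRUE) {
  booster <- new.env()
  booster$handle <- list(coefs = coefs)
  booster$raw <- if (serializable) serialize(coefs, connection = NULL) else NULL
  booster
}

# Mimic a save/load round trip: the raw bytes survive, the handle does not.
simulate_round_trip <- function(booster) {
  restored <- new.env()
  restored$handle <- NULL
  restored$raw <- booster$raw
  restored
}

# Rebuild the handle from the raw bytes if it is blank (modifies in place).
toy_restore_handle <- function(booster) {
  if (is.null(booster$handle)) {
    if (is.null(booster$raw)) {
      stop("no raw bytes available to rebuild the handle")
    }
    booster$handle <- list(coefs = unserialize(booster$raw))
  }
  invisible(booster)
}

# Drop the raw bytes to save memory (modifies in place).
toy_drop_serialized <- function(booster) {
  booster$raw <- NULL
  invisible(booster)
}

# Produce the raw bytes from a live handle (modifies in place).
toy_make_serializable <- function(booster) {
  if (is.null(booster$raw)) {
    booster$raw <- serialize(booster$handle$coefs, connection = NULL)
  }
  invisible(booster)
}
```

In this toy, calling `toy_restore_handle()` on an object whose raw bytes were dropped before the round trip fails, which is the trade-off the `lgb.drop_serialized` documentation describes as saving memory at the cost of serializability.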
Could this be moved inside `Booster$predict()`? That way, it'll be guaranteed to run regardless of whether someone uses `predict(bst, data)` or `bst$predict()`.
If it's placed earlier it can throw an error before doing any other long operations with the data. I also assume that since the R6 methods are not documented, they are meant for internal usage, and a user trying to call them directly would likely need to examine the code in any case.
The only code between this call and `Booster$predict()` is conversion of `...` to a list and possibly raising a deprecation warning.

I'd prefer to concentrate these `$restore_handle()` calls in the `Booster` object as much as possible, to minimize how many places in the package's code need to know about managing the raw model object.

The fact that those methods are not documented is a gap that should be filled (not in this PR, please).

However, all of the `Booster`'s public methods except `initialize()` are treated as part of the public API of the R package. We treat them that way because other exported functions can return a `Booster` instance. For example, `lgb.train()` returns a `Booster` instance, and then user code can call any public methods on that instance without needing to use `:::` or reach into `$.__enclos_env__$private`.
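
To make the placement argument concrete, here is a small base-R sketch, assuming a toy environment-based class in place of the real R6 `Booster`. `new_toy_model` and `predict.toy_model` are hypothetical names used only for illustration. Because the restore call lives inside the object's own `predict` method, both `predict(m, x)` and `m$predict(x)` pass through it.

```r
# Hypothetical toy class; not lightgbm code. The handle starts out blank,
# as it would after de-serialization, and is rebuilt from `raw` on demand.
new_toy_model <- function(slope) {
  self <- new.env()
  self$handle <- NULL
  self$raw <- serialize(slope, connection = NULL)
  self$restore_handle <- function() {
    if (is.null(self$handle)) {
      self$handle <- unserialize(self$raw)
    }
    invisible(self)
  }
  # The restore call sits inside the method, so every entry point hits it.
  self$predict <- function(x) {
    self$restore_handle()
    self$handle * x
  }
  class(self) <- "toy_model"
  self
}

# The S3 wrapper only forwards; it knows nothing about handles or raw bytes.
predict.toy_model <- function(object, newdata, ...) {
  object$predict(newdata)
}
```

The alternative raised earlier in the thread, restoring in the S3 wrapper so that an error surfaces before any expensive work, would leave a direct `m$predict(x)` call without the restore step in this toy.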