-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[R-package] fix segfaults caused by missing Booster and Dataset handles (fixes #4208) #4586
Conversation
/gha run r-solaris Workflow Solaris CRAN check has been triggered! 🚀 solaris-x86-patched: https://builder.r-hub.io/status/lightgbm_3.2.1.99.tar.gz-1b63d7a7e8194345af9ede7ebb6b95ae |
/gha run r-valgrind Workflow R valgrind tests has been triggered! 🚀 Status: failure ❌. |
valgrind just timed out after 3 hours, trying again 😫 https://github.com/microsoft/LightGBM/runs/3500789328?check_suite_focus=true |
/gha run r-valgrind Workflow R valgrind tests has been triggered! 🚀 Status: success ✔️. |
I see this in one place in the test logs for the valgrind job: https://github.com/microsoft/LightGBM/runs/3501825714?check_suite_focus=true
Hard to tell exactly which test, but I see
Interestingly, I don't see that same error in the R 4.0 Windows MSVC job. https://github.com/microsoft/LightGBM/pull/4586/checks?check_run_id=3500712973 And I'm also curious why that is showing up in the logs but the Will have to investigate it tomorrow. |
Ah ok, figured it out! The error in the logs of #4586 (comment) is not an issue. It's coming from this new test added in this PR: test_that("Booster$new() using a Dataset with a null handle should raise an informative error and not segfault", {
data(agaricus.train, package = "lightgbm")
train <- agaricus.train
dtrain <- lgb.Dataset(train$data, label = train$label)
dtrain$construct()
tmp_file <- tempfile(fileext = ".bin")
saveRDS(dtrain, tmp_file)
rm(dtrain)
dtrain <- readRDS(tmp_file)
expect_error({
bst <- Booster$new(train_set = dtrain)
}, regexp = "lgb.Booster: cannot create Booster handle")
}) That "error" is intentionally being caught in this try-catch: LightGBM/R-package/R/lgb.Booster.R Line 33 in b7120d2
LightGBM/R-package/R/lgb.Booster.R Line 113 in b7120d2
So when users actually run code similar to that test, they'll get the error "cannot create Booster handle" but then also see the printed text about "Attempting to create a Dataset...".
This also is not a problem. It's probably because of how we have to redirect stderr in Windows CI jobs. LightGBM/.ci/test_r_package_windows.ps1 Line 23 in b7120d2
|
/gha run r-valgrind Workflow R valgrind tests has been triggered! 🚀 Status: failure ❌. |
/gha run r-solaris Workflow Solaris CRAN check has been triggered! 🚀 solaris-x86-patched: https://builder.r-hub.io/status/lightgbm_3.2.1.99.tar.gz-1883c3e26f194d248848dc03add599c4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR has a lot of potential merge conflicts with #4597 PR. Which one would you like to merge first?
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
I'll wait to update this until #4613 is also merged. |
/gha run r-valgrind Workflow R valgrind tests has been triggered! 🚀 Status: success ✔️. |
/gha run r-solaris Workflow Solaris CRAN check has been triggered! 🚀 solaris-x86-patched: https://builder.r-hub.io/status/lightgbm_3.2.1.99.tar.gz-10422aa86126471181da15ab410b3506 |
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
/gha run r-valgrind Workflow R valgrind tests has been triggered! 🚀 Status: success ✔️. |
/gha run r-solaris Workflow Solaris CRAN check has been triggered! 🚀 solaris-x86-patched: https://builder.r-hub.io/status/lightgbm_3.2.1.99.tar.gz-1aecf804abfa4417a2b4e10e1f7275f4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! 👍
This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this. |
Fixes #4208, as part of #4310.
Currently, the R package can segfault if Booster or Dataset methods are called on an object where the handle to the C++ side is null. This situation can happen if, for example, someone uses
saveRDS()
on one of these objects and then loads it withreadRDS()
. That is exactly what happens when restarting an R session, if you have R configured to save your workspace and reload it automatically.This PR proposes changes to ensure that an informative error is raised in such situations, instead of a segfault.
Notes for Reviewers
#4296 documents the desire for
{lightgbm}
to guarantee that a Booster / Dataset saved withsaveRDS()
is usable after loading it withreadRDS()
. That might be addressed in a future release, but it's out of scope for release 3.3.0 and this PR. At least as of this PR, users will experience informative errors instead of their R sessions crashing.I was not able to trigger the
_AssertDatasetHandleNotNull()
error message in tests, since all paths user-facing paths through theDataset
object either already have protective checks usinglgb.is.null.handle()
or raise other errors before even trying to invoke C++ code. But I still think those asserts are worth adding, to catch errors that might be introduced in future refactorings of the R package or in other code paths that are missed in tests.