LGBM_DatasetCreateFromCSC does not allow thread control #4598
Comments
Thanks very much for reporting this. Can you please provide specific links / quotes for what you're referring to when you say "the CRAN policies"?
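The CRAN Repository Policy says the following:
"If running a package uses multiple threads/cores it must never use more than two simultaneously: the check farm is a shared resource and will typically be running many checks simultaneously."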
Some of the configs they test on might even crash the R process if you attempt to do some multi-threaded computation with the maximum number of threads. By the way, the issue is not just about the R package, since this function is used by other interfaces too.
None of the Dataset creation functions in LightGBM's C API have a dedicated argument for the number of threads.
LGBM_DatasetCreate function signatures in C API:
LightGBM/include/LightGBM/c_api.h Lines 109 to 112 in b462d0a
LightGBM/include/LightGBM/c_api.h Lines 126 to 133 in b462d0a
LightGBM/include/LightGBM/c_api.h Lines 203 to 213 in b462d0a
LightGBM/include/LightGBM/c_api.h Lines 226 to 231 in b462d0a
LightGBM/include/LightGBM/c_api.h Lines 248 to 258 in b462d0a
LightGBM/include/LightGBM/c_api.h Lines 272 to 279 in b462d0a
LightGBM/include/LightGBM/c_api.h Lines 294 to 302 in b462d0a
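The number of threads used in Dataset construction is user-controllable, via the parameter num_threads. As you can see in the links above, all of those functions accept a parameters argument. In each of those methods, the parameters are parsed and, if num_threads is set to a positive value, it is used to set the number of OpenMP threads. For example, see Lines 1326 to 1329 in b462d0a.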
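Given all that, I don't think it is accurate that "LGBM_DatasetCreateFromCSC does not allow thread control". Passing num_threads through the parameters controls the number of threads used. The following example demonstrates construction of a LightGBM Dataset from a CSC matrix in R, with num_threads taken from an environment variable.
# test.R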
library(lightgbm)
library(Matrix)
X <- matrix(
data = rnorm(1e8)
, nrow = 1e5
)
Xcsc <- as(X, "CsparseMatrix")
num_threads <- as.integer(Sys.getenv("EXAMPLE_NTHREAD"))
start_time <- Sys.time()
dtrain <- lightgbm::lgb.Dataset(
data = Xcsc
, params = list(num_threads = num_threads)
)
dtrain$construct()
print(sprintf("--- construction (num_threads=%d) ---", num_threads))
print(Sys.time() - start_time)
On my Mac, I ran this script 3 times each with 1, 2, and 4 threads.
EXAMPLE_NTHREAD=1 Rscript --vanilla ./test.R
# [1] "--- construction (num_threads=1) ---"
# Time difference of 17.88016 secs
# Time difference of 17.72601 secs
# Time difference of 17.60532 secs
EXAMPLE_NTHREAD=2 Rscript --vanilla ./test.R
# [1] "--- construction (num_threads=2) ---"
# Time difference of 10.77466 secs
# Time difference of 10.12384 secs
# Time difference of 10.49387 secs
EXAMPLE_NTHREAD=4 Rscript --vanilla ./test.R
# [1] "--- construction (num_threads=4) ---"
# Time difference of 7.150842 secs
# Time difference of 7.223807 secs
# Time difference of 7.122708 secs
Therefore, I think that the proposal in this issue (add an nthreads argument to the function signature) is not necessary.
This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.
The C API has a function named LGBM_DatasetCreateFromCSC which runs a parallel OpenMP block with the default number of threads, which will typically be the maximum number of available threads.
I think it'd be better and more transparent if the number of threads were user-controllable, allowing a parameter int nthreads in the function signature instead, just like it does for other functions within the package.
This function is also problematic for the R package as it gets called by LGBM_DatasetCreateFromCSC_R, which is used in one of the R tests (currently skipped because it fails some checks). The CRAN policies state that examples and tests are not supposed to use more than 1 or 2 threads, and this thus prevents the R test from becoming enabled once the other issues it raises become solved.