[tests][R-package] run all tests using 2 threads (fixes #5102) #5367

jmoralez · 2022-07-11T23:51:16Z

Adds the option to set the environment variable LGBM_NUM_THREADS to override all other sources that set omp_num_threads. This allows to globally set the number of threads for all operations from LightGBM, which will be used to limit the number of threads to 2 for the R-package tests. This is meant only for internal use so it isn't documented but we could if someone thinks it would be useful.

Fixes #5102.

This reverts commit 20fd5e6.

jmoralez · 2022-07-19T23:38:40Z

@jameslamb trying the suggestion from #5102 (comment) I found that we needed to set the parameters in many more places, and for example predict doesn't take parameters so it involved more changes than expected. I took a different approach here but with the same idea, I've updated the description accordingly. Please let me know what you think.

jameslamb · 2022-07-20T03:21:21Z

@jmoralez ah ok, the point about predict() makes sense.

But to be honest, I don't understand how the new approach using Sys.setenv(LGBM_DEFAULT_NUM_THREADS = "2") at the top of testthat.R is working when Sys.setenv("OMP_NUM_THREADS" = 2) didn't (#5102 (comment)). I thought the reason Sys.setenv() didn't work for you before was because it didn't affect the environment for test cases, as a result of how {testthat} invokes your test code.

How exactly are you testing that this change restricts LightGBM to using 2 threads?

Seeing how much effort this is taking, and how many different places have to be modified, I think we might want to consider closing this and instead focusing on #4705, then using the solution to #4705 to satisfy #5102.

jmoralez · 2022-07-20T15:09:54Z

how the new approach using Sys.setenv(LGBM_DEFAULT_NUM_THREADS = "2") at the top of testthat.R is working when Sys.setenv("OMP_NUM_THREADS" = 2) didn't

I was relying on the OMP to check for the env variable, this time I'm manually reading it.

I thought the reason Sys.setenv() didn't work for you before was because it didn't affect the environment for test cases, as a result of how {testthat} invokes your test code.

It was due to how OMP works, it just reads the env variable at program startup, not at runtime.

How exactly are you testing that this change restricts LightGBM to using 2 threads?

Looking at the htop while running Rscript testthat.R and seeing that the CPU usage is below 200%. But that's just to make sure, these changes make it so that whenever we try to set the omp_num_threads we actually set the content of that env variable, so all other sources are ignored.

Seeing how much effort this is taking, and how many different places have to be modified, I think we might want to consider closing this and instead focusing on #4705, then using the solution to #4705 to satisfy #5102.

I think that #4705 has two possible solutions:

Do something like a context manager as suggested in Calling multithreaded functions sets global number of OMP threads #4705 (comment), where whenever we want to modify the number of threads we set the omp_num_threads to that number but afterwards we restore it to its original value.
Use a different variable to store the number of threads and pass it to the pragmas, that'd be something like what data.table does but it's hard because there are +200 parallel for.

In either case I think it wouldn't help with #5102 because for that we want to globally set the number of threads to a number or set an upper bound for them. And in each of them we could use this approach because if we do 1. we can leave the proposed code here as is and it would work, and in 2. we could modify this to set the default number of threads for lightgbm instead of OMP.

jameslamb · 2022-10-11T03:11:46Z

ok @jmoralez now that we got v3.3.3 out, I'm ready to come back and help move this forward.

I'm sorry, but I don't support expanding LightGBM's API this way, or adding another layer of complexity to how LightGBM chooses how many threads to use. We can say "this environment variable is internal only", but we can't actually enforce that, and this PR would add the first use of std::getenv() in the package which might slightly increase the risk of running into portability problems (e.g. that working differently on different operating systems, or in different locales).

I don't think that's worth it for the narrow purpose of "more closely follow CRAN's suggested guidelines, which they've never complained about before".

Given that CRAN has never rejected a submission for {lightgbm}'s use of more than 2 threads, I think it is ok to use an approach which possibly misses a few cases, in exchange for not adding a new entrypoint (e.g. the first LightGBM-specific environment variable) in the actual code in lib_lightgbm or the R package.

I recommend that we just do what #5102 initially proposed in its description, and follow the same approach that was used for verbosity in #4862.

Thank you again for investigating these other different options in this PR! But I think we should just move forward with something that is simple and known to work for this case, even though it feels inelegant and might miss a few cases.

jameslamb · 2022-10-12T02:22:42Z

@jmoralez I saw you closed this.

Are you planning to open another PR implementing the approach I suggested? If not, would you support one if I did?

Totally understand if you feel like you don't want to put any more effort into this particular issue.

jmoralez · 2022-10-12T14:27:39Z

I'd prefer to work on something else, I support it if you want to do it though. I'm not sure how CRAN measures the CPU usage of a package tests so as you say it probably doesn't have to be perfect. Because even if we set the threads for all trainings, every predict will use all of them and also some of the dataset constructions (#4598, #5153).

jameslamb · 2022-10-28T04:25:41Z

Ok thanks @jmoralez ! And great points about predict and Dataset construction.

I will pick this up at some point (in a new PR).

github-actions · 2023-08-19T03:04:01Z

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

jmoralez added 6 commits July 11, 2022 18:50

add num_threads to params

20fd5e6

Revert "add num_threads to params"

35e46b8

This reverts commit 20fd5e6.

retrieve env variable at runtime

460f5e7

use int32_t for strtol

a9df492

introduce default variable that overrides params

2cf8865

lint

efadad0

jmoralez changed the title ~~[tests][R-package] add num_threads to params~~ [tests][R-package] run all tests using 2 threads Jul 19, 2022

jmoralez changed the title ~~[tests][R-package] run all tests using 2 threads~~ [tests][R-package] run all tests using 2 threads (fixes #5102) Jul 19, 2022

jmoralez closed this Oct 11, 2022

jmoralez deleted the r-threads-basic branch October 11, 2022 04:17

jameslamb mentioned this pull request Oct 28, 2022

[R-package] tests and examples should not use more than 2 threads #5102

Closed

github-actions bot locked as resolved and limited conversation to collaborators Aug 19, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[tests][R-package] run all tests using 2 threads (fixes #5102) #5367

[tests][R-package] run all tests using 2 threads (fixes #5102) #5367

jmoralez commented Jul 11, 2022 •

edited

Loading

jmoralez commented Jul 19, 2022

jameslamb commented Jul 20, 2022

jmoralez commented Jul 20, 2022

jameslamb commented Oct 11, 2022

jameslamb commented Oct 12, 2022

jmoralez commented Oct 12, 2022

jameslamb commented Oct 28, 2022

github-actions bot commented Aug 19, 2023

[tests][R-package] run all tests using 2 threads (fixes #5102) #5367

[tests][R-package] run all tests using 2 threads (fixes #5102) #5367

Conversation

jmoralez commented Jul 11, 2022 • edited Loading

jmoralez commented Jul 19, 2022

jameslamb commented Jul 20, 2022

jmoralez commented Jul 20, 2022

jameslamb commented Oct 11, 2022

jameslamb commented Oct 12, 2022

jmoralez commented Oct 12, 2022

jameslamb commented Oct 28, 2022

github-actions bot commented Aug 19, 2023

jmoralez commented Jul 11, 2022 •

edited

Loading