Function reduceResultsBatchmark saves backend when store_backends=FALSE #18
Hey @MislavSag. The task is stored, but …
Hi, I don't use …
Do you install packages with the …
I don't use this option. If its default is TRUE, then I use it.
Thanks for the info, can you provide a reprex?
Or instead, can you please show me the output of …
Here is the output:
I am not sure how I should reproduce the problem. I can upload a sample structure (say 10 results) to the cloud so you can try to import them. But generally, it takes a lot of time to import the results.
Thanks, this is already helpful. Maybe you can run …
This is what …
We have repeatedly run into large RAM usage / slow reduceResultsBatchmark with mlr3batchmark, so this is not really due to your specific circumstances. We will try to look into this. The only unusual thing here is the large number of features. In your case, the task_prototype and data_prototype will definitely make the problem worse, because they are relatively large (reproducing such a data.table creates objects of size ~100 KB for me).
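A rough illustration of that point, with the 2,000-column width assumed only for the sake of the example (the thread does not state the actual feature count): even a zero-row prototype of such a wide data.table is already sizeable.

# Rough illustration only: the 2,000-column width is an assumption, not a figure
# from this thread. Even an empty (zero-row) prototype of a wide data.table
# carries noticeable overhead from its column vectors and column names alone.
library(data.table)
n_features = 2000
proto = as.data.table(setNames(replicate(n_features, numeric(0), simplify = FALSE),
                               sprintf("feature_%04d", seq_len(n_features))))
dim(proto)                                # 0 rows, 2000 columns
format(object.size(proto), units = "KB")  # typically a few hundred KB before any rows are stored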
Are you using a GraphLearner?
Yes, all my learners are graph learners.
Okay, this problem will hopefully be gone when the new version of paradox (which we use to represent the parameter sets) is done.
@sebffischer, could you recommend some workaround before the new paradox package comes out? I have 16,000 results now and it takes more than a day to import them, even in parallel.
This depends on what exactly you want to do with the results.
library(mlr3verse)
#> Loading required package: mlr3
library(batchtools)
library(mlr3batchmark)
library(mlr3misc)
#>
#> Attaching package: 'mlr3misc'
#> The following object is masked from 'package:batchtools':
#>
#> chunk
reg = makeExperimentRegistry(NA)
#> No readable configuration file found
#> Created registry in '/var/folders/ft/n79895td0xn0gpr6ny8jyh800000gn/T/Rtmpq5etTo/registry1b37547891f5' using cluster functions 'Interactive'
design = benchmark_grid(
tsks(c("iris", "sonar")),
lrns(c("classif.rpart", "classif.featureless")),
rsmp("cv")
)
batchmark(design)
#> Adding algorithm 'run_learner'
#> Adding problem '1c326920b82b400b'
#> Exporting new objects: '6b67bf63ecedae30' ...
#> Exporting new objects: '70dd22724e5c724d' ...
#> Exporting new objects: '7c35d835f3dfae37' ...
#> Adding 20 experiments ('1c326920b82b400b'[1] x 'run_learner'[2] x repls[10]) ...
#> Adding problem '7e770c7dda9c66ef'
#> Exporting new objects: 'c1fa2fa572e6d386' ...
#> Adding 20 experiments ('7e770c7dda9c66ef'[1] x 'run_learner'[2] x repls[10]) ...
submitJobs()
#> Submitting 40 jobs in 40 chunks using cluster functions 'Interactive' ...
job_table = getJobTable()
unique_jobs = unique(job_table$job.name)
measure = msr("classif.acc")
result = map_dtr(unique_jobs, function(job_name) {
ids = job_table[job_name, "job.id", on = "job.name"][[1]]
learner_info = job_table[job_name, "algo.pars", on = "job.name"]$algo.pars[[1]]
task_info = job_table[job_name, "prob.pars", on = "job.name"]$prob.pars[[1]]
task_id = task_info$task_id
task_hash = task_info$task_hash
learner_id = learner_info$learner_id
learner_hash = learner_info$learner_hash
scores = map_dbl(ids, function(id) {
result = loadResult(id)
test_prediction = as_prediction(result$prediction$test)
score = measure$score(test_prediction)
score
})
avg_score = mean(scores)
list(acc = avg_score, learner_id = learner_id, task_id = task_id, learner_hash = learner_hash, task_hash = task_hash)
})
result
#> acc learner_id task_id learner_hash task_hash
#> 1: 0.9333333 classif.rpart iris 70dd22724e5c724d 1c326920b82b400b
#> 2: 0.2333333 classif.featureless iris 7c35d835f3dfae37 1c326920b82b400b
#> 3: 0.7223810 classif.rpart sonar 70dd22724e5c724d 7e770c7dda9c66ef
#> 4: 0.5330952 classif.featureless sonar 7c35d835f3dfae37 7e770c7dda9c66ef
Created on 2023-12-18 with reprex v2.0.2
I have found a workaround that worked until today. Now, when I try to import tasks from …
I get an error …
I am not sure if this error is linked to …
You need to load mlr3 for that.
I have loaded …
I have used mlr 17.0 on the HPC, and then 17.1 locally. Can that be the source of the error?
Yes, this can be the case, as … Can you try using the same version and report whether it works? (It should.)
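A generic sanity check (not a command from this thread) is to print the loaded versions on both machines and compare:

# Run on both the HPC and the local machine; the versions should match.
packageVersion("mlr3")
packageVersion("mlr3batchmark")
packageVersion("batchtools")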
It seems that you have created the …
We have now addressed this with a warning message.
Hi, I have the same issue: reduceResultsBatchmark is taking up too much RAM on my cluster system, which is killing my job.
results = batchtools::reduceResultsList(tab$job.id, reg = reg)
I see that reduceResultsList has an argument …
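One possible workaround, sketched here under assumptions not stated in the thread (an existing registry reg with finished classification jobs, classif.acc as the measure, and a chunk size of 100): score the stored predictions chunk by chunk with plain batchtools calls, mirroring the reprex above, so that only one chunk of results is held in memory at a time.

# Hypothetical sketch, not an official workaround: `reg`, the measure and the
# chunk size are placeholders. Results are loaded and scored one chunk at a
# time, so memory usage stays bounded by the chunk size.
library(mlr3)
library(batchtools)
library(mlr3misc)

measure = msr("classif.acc")
ids = findDone(reg = reg)$job.id
chunks = split(ids, batchtools::chunk(ids, chunk.size = 100))

scores = map_dtr(chunks, function(chunk_ids) {
  results = reduceResultsList(chunk_ids, reg = reg)
  data.table::data.table(
    job.id = chunk_ids,
    acc = map_dbl(results, function(res) measure$score(as_prediction(res$prediction$test)))
  )
})
scores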
Hi again,
I am experimenting with the reduceResultsBatchmark function. The function consumes a lot of RAM on my local machine, even after setting store_models=FALSE and store_backends=FALSE. I have looked at the source code and it seems it stores the task in the result object no matter what store_backends is set to: it just sets the store_backends attribute to store_backends, but it saves the task inside the object. Is that intentional? I would expect the task not to save the backend if the store_backends argument is set to FALSE.
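A minimal way to check the reported behavior, assuming an existing registry reg with finished jobs (a placeholder, not taken from the issue): compare the serialized size of the reduced BenchmarkResult with and without backends.

# Hypothetical check: `reg` is assumed to be an existing experiment registry.
library(mlr3)
library(mlr3batchmark)

bmr_with = reduceResultsBatchmark(store_backends = TRUE, reg = reg)
bmr_without = reduceResultsBatchmark(store_backends = FALSE, reg = reg)

# serialize() follows the R6 environments inside the objects, so the length of
# the raw vector reflects the actual footprint in bytes.
length(serialize(bmr_with, NULL))
length(serialize(bmr_without, NULL))

If the two sizes are close, the tasks inside the result are still carrying their backends, which is exactly what this issue describes.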