Forcing DAG direction #1294
library(drake)
model_types <- c("model1", "model2")
plan <- drake_plan(
  life_counter_data = getLifeCounterData(
    environment = "PROD",
    key_directory = config_parameters$LOCAL_CONFIG$DirectoryKeyCloud_RStudio,
    max_forecasting_horizon = argument_parser$horizon
  ),
  unit_metadata = getMetadata(
    environment = "PROD",
    key_directory = config_parameters$LOCAL_CONFIG$DirectoryKeyCloud_RStudio,
    operex_schema = config_parameters$SF_CONFIG$schema_name,
    db_src = c(1, 2, 3)
  ),
  unit_with_recent_data = getLastData(life_counter_data),
  processed_data = featureEngineering(
    raw_data = life_counter_data,
    metadata = unit_metadata,
    recent_units = unit_with_recent_data,
    max_forecasting_horizon = argument_parser$horizon
  ),
  ts_models = target(
    trainModels(
      input_data = processed_data,
      max_forecast_horizon = argument_parser$horizon,
      max_multisession_cores = argument_parser$sessions,
      model_type = type
    ),
    transform = map(type = !!model_types)
  ),
  accuracy = target(
    accuracy_explorer(
      mode = "train",
      models = ts_models,
      max_forecast_horizon = argument_parser$horizon,
      directory_out = "/data1/"
    ),
    transform = map(ts_models, .id = type)
  ),
  saving = target(
    saveModels(
      models = ts_models,
      directory_out = "/data1/",
      max_forecasting_horizon = argument_parser$horizon,
      max_multisession_cores = argument_parser$sessions
    ),
    transform = map(ts_models, .id = type)
  ),
  aggregated_accuracy = target(
    # Could be dplyr::bind_rows(accuracy)
    # if the accuracy_* targets are data frames:
    list(accuracy),
    transform = combine(accuracy)
  ),
  final_accuracy = {
    # Mention the symbol "aggregated_accuracy" so final_accuracy runs last:
    aggregated_accuracy
    bestModel(
      models_metrics_uri = "/data1/",
      metric_selected = "MAPE",
      final_metrics_uri = "/data1/",
      metrics_store = "local",
      max_forecast_horizon = argument_parser$horizon
    )
  }
)
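A quick way to confirm that the forced ordering took effect is to inspect the dependencies drake detects. A minimal sketch, assuming a recent drake (>= 7.13) where these functions accept the plan directly:

library(drake)

# final_accuracy should now list aggregated_accuracy among its dependencies,
# because its command mentions that symbol:
deps_target(final_accuracy, plan)

# Visualize the graph to check that final_accuracy sits downstream of the
# accuracy_* targets and aggregated_accuracy:
vis_drake_graph(plan)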
Also, please have a look at https://books.ropensci.org/drake/plans.html#how-to-choose-good-targets. Targets are R objects that …
There's also …
Hi @wlandau, the reason I saved my models is that my workflow was crashing when I was storing the targets, but I do not know whether that is normal.
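If the crashes were memory-related, it may also be worth knowing that make() has arguments to limit how much stays loaded at once. A minimal sketch (not something suggested in this thread) using the real memory_strategy and garbage_collection arguments:

library(drake)

make(
  plan,
  memory_strategy = "autoclean",  # drop targets from memory once no longer needed
  garbage_collection = TRUE       # run gc() after dropping them to actually free RAM
)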
If you do decide to save models, I recommend … Do you need to store the entire model object? I am not familiar with … The Bayesian analysis example here and here shows how to deal with these problems: Markov chain Monte Carlo generates a large number of posterior samples, so it is infeasible to save every single fitted model. Instead, the custom functions in that workflow generate a data frame of posterior summaries rather than saving the entire model.
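Translated to this forecasting workflow, the same idea would be a function that fits a model and returns only a lightweight metrics data frame. A rough sketch; fitModel() and computeAccuracy() are hypothetical placeholders, not functions from this thread:

fitAndSummarize <- function(input_data, model_type, max_forecast_horizon) {
  fit <- fitModel(input_data, model_type)                # placeholder fitting step
  metrics <- computeAccuracy(fit, max_forecast_horizon)  # placeholder metrics step
  # Only this small data frame ends up in drake's cache;
  # the large fitted object is discarded when the function returns.
  data.frame(model_type = model_type, metrics)
}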
I am using qs for saving the binaries:

library(qs)  # for qsave()

saveModels <- function(models, directory_out, max_forecasting_horizon, max_multisession_cores) {
  print("Saving the all-mighty mable")
  qsave(
    x = models,
    file = paste0(directory_out, attributes(models)$model, "_horizon_", max_forecasting_horizon, ".qs"),
    preset = "custom",
    shuffle_control = 15,
    algorithm = "zstd",
    nthreads = max_multisession_cores
  )
  # saveRDS(object = models, file = paste0(directory_out, "ts_models_horizon_", max_forecasting_horizon, ".rds"))
  print(paste0("End workflow for ", attributes(models)$model, " models with maximum forecasting horizon ", max_forecasting_horizon))
}

The problem is that fable needs the binary containing the model to make the forecast. Should I use format = "qs" directly in the drake plan with file_out? BR
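For reference, drake also has a built-in qs storage format for targets (the format argument of target()), which is separate from writing .qs files yourself. A minimal sketch of that option, reusing names from the plan above:

plan <- drake_plan(
  # ... other targets as above ...
  ts_models = target(
    trainModels(
      input_data = processed_data,
      max_forecast_horizon = argument_parser$horizon,
      max_multisession_cores = argument_parser$sessions,
      model_type = type
    ),
    transform = map(type = !!model_types),
    format = "qs"  # drake serializes this target with qs in its own cache
  )
)

Note that file_out() only tracks literal file paths written in the command itself, so it does not combine easily with the dynamically constructed file names inside saveModels().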
So the physical model files need to be there? Nothing you can do about it? In that case, maybe combine the model-fitting step and the forecasting step into a single target. Data in the cache will be lighter that way. Merging two targets into one is sometimes a good strategy if you find yourself running too many targets or saving too much data. See https://books.ropensci.org/drake/plans.html#how-to-choose-good-targets for a discussion of the tradeoffs. The example at https://github.com/wlandau/drake-examples/blob/13e6edf9d6c4b60c0c57d0fc303cfba63702e9f2/stan is a similar situation. In Bayesian analysis, posterior samples eat up a lot of data, and we don't want to save everything for every single model. So we combine model fitting and summarization into a single step and return a one-line data frame for each model. See https://github.com/wlandau/drake-examples/blob/13e6edf9d6c4b60c0c57d0fc303cfba63702e9f2/stan/R/functions.R#L62-L85 and https://github.com/wlandau/drake-examples/blob/13e6edf9d6c4b60c0c57d0fc303cfba63702e9f2/stan/R/plan.R#L16-L20.
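A rough sketch of what that merge could look like in this plan, assuming a hypothetical fitAndForecast() wrapper (not a function from this thread) that trains one model type and immediately returns its forecasts or accuracy as a small data frame:

# Sketch only: fitAndForecast() fits one model type and computes its
# forecasts/accuracy in the same command, so the large fitted mable
# never needs to be cached or written to disk on its own.
plan_merged <- drake_plan(
  forecasts = target(
    fitAndForecast(
      input_data = processed_data,          # upstream target, defined as before
      model_type = type,
      max_forecast_horizon = argument_parser$horizon
    ),
    transform = map(type = !!model_types)
  )
)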
Prework
Dear community, thanks to Will I was able to complete my drake workflow by splitting the model fitting with the fable package in a way that let me decrease the memory consumption of my server from 220 GB to 70 GB (a pretty big success here), with the only limitation being a 50% increase in running time (from 60 minutes to 90).
Prework is available here: #1293
Description
Now I am trying to fetch all of the accuracy metrics of my models to pick the best one, but the problem is that this step is executed before my models run (maybe because the accuracy CSV files are already there?).
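To clarify why that ordering happens: drake orders targets only by the dependency graph it detects from the commands (target symbols, globals, and file_in()/file_out() literals), so a directory path passed as a plain string creates no edge between the writing and reading targets. A tiny illustration; writeMetrics() and readMetrics() are hypothetical placeholders:

plan <- drake_plan(
  write_step = writeMetrics("/data1/"),

  # No edge: "/data1/" is just a string, so drake may run this first.
  metrics_untracked = readMetrics("/data1/"),

  # Edge: mentioning the upstream target's symbol forces metrics_tracked
  # to wait until write_step has finished.
  metrics_tracked = {
    write_step
    readMetrics("/data1/")
  }
)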
Reproducible example
The plan is as follows:
My final accuracy function is as follows:
But my DAG looks as follows:
Desired result
I would like to load the accuracy metrics only after I have saved my models and computed the accuracy.
Session info