Change the default behavior of `run` to `run_only_missing` #55

Minyus · 2019-07-12T15:13:01Z

Description

It would be more intuitive for `run` to avoid re-computing nodes by default, with the current behavior kept available as a `force_rerun` option.

Context

A major motivation for saving intermediate files, even though doing so consumes disk space and adds computation time, is to avoid re-computing nodes. It is therefore more intuitive to use the saved files by default, as `run_only_missing` does in the current version of kedro.

This suggestion is related to #30 and #25.

Possible Implementation

Modify run


idanov · 2019-07-16T14:03:41Z

Hi @Minyus, thank you for opening an issue about this. As you pointed out, `run_only_missing` could be very useful during development. In production, however, the operating mode would be to recompute the data on every run, since the raw data is expected to change. Otherwise the pipeline will eventually end up doing nothing and the results will be stale.

When an MLOps or DevOps person runs the pipeline in production, they would expect it to run with the default options of a command like `kedro run`. It would be quite easy for them to miss the suggested `--force-rerun` flag, and they might end up deploying a pipeline that works only once and never again, since none of the data will be missing after the first run.

Therefore the decision to run only missing nodes is relevant only during development and can be made by the developer of the pipeline via an optional flag, since they are much more knowledgeable about kedro than an MLOps or DevOps person deploying the code in production.

Even if #30 is added as an option, we would very likely keep it as an optional flag rather than default behaviour. I will close the issue for now.
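The staleness concern above can be illustrated with a small sketch. This is a deliberately simplified toy model, not kedro's implementation: the `run` function, its `only_missing` parameter, the tuple-based nodes, and the dict standing in for a catalog are all hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy model: a "node" is (function, input name, output name),
# and the "catalog" is a plain dict standing in for persisted datasets.
Node = Tuple[Callable[[int], int], str, str]


def run(nodes: List[Node], catalog: Dict[str, int], only_missing: bool = False) -> List[str]:
    """Run nodes in order; return the names of the outputs that were computed."""
    computed = []
    for func, input_name, output_name in nodes:
        if only_missing and output_name in catalog:
            continue  # output already saved, so the node is skipped
        catalog[output_name] = func(catalog[input_name])
        computed.append(output_name)
    return computed


nodes = [
    (lambda x: x * 2, "raw", "intermediate"),
    (lambda x: x + 1, "intermediate", "result"),
]

catalog = {"raw": 10}
first = run(nodes, catalog, only_missing=True)   # nothing saved yet: both nodes run
catalog["raw"] = 100                              # new raw data arrives
second = run(nodes, catalog, only_missing=True)  # nothing is missing: no node runs
stale_result = catalog["result"]                  # still derived from the old raw data
third = run(nodes, catalog)                       # full rerun picks up the new raw data
fresh_result = catalog["result"]
```

In this toy model the second run computes nothing even though the raw input changed, which is the "works only once" failure mode; only the full rerun refreshes the result.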

Minyus added the "Issue: Feature Request" label Jul 12, 2019

idanov closed this as completed Jul 16, 2019

Minyus mentioned this issue Sep 19, 2020

Load data from intermediate after processing? #517

Closed
