Resistance run fails on Kive if the input files have already been purged #921

CBeelen · 2023-01-25T23:49:39Z

If either the main or the MIDI sample finishes quickly while the other takes a long time, the output files generated by the faster run may already have been purged by the time the slower run finishes. The resistance run only starts once both runs have finished, so it then fails to find the faster run's output files, which have already been cleaned up.

This happened for sample 90542B-RELOAD-HCV_S83: the de novo MIDI sample finished in a reasonable amount of time, but the main sample took about a month to assemble. When it was finally finished, the MIDI run's results had already been purged, and Kive failed to find its amino.csv, with this message:
ValueError: Dataset has no dataset_file or external_path.

When a subsequent run needs a previous run's results as inputs, we should check whether those results are still available. This applies to the resistance and proviral runs, which need input files from the main and de novo pipelines. If the results are gone, the easiest solution would be to simply restart the run whose results have been purged.

We should restart the purged run and re-check all samples in a retry loop with a sensible retry limit; otherwise we could get caught in an endless cycle of re-running the main and MIDI samples.
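A minimal sketch of such a bounded retry loop is shown below. It is only an illustration of the idea, not existing MiCall or Kive code: `ensure_inputs`, `outputs_available`, `rerun`, and `PurgedInputsError` are hypothetical names, and the sketch assumes that `rerun` blocks until the restarted run has finished.

```python
from typing import Callable, Dict, Iterable


class PurgedInputsError(RuntimeError):
    """Raised when upstream outputs are still missing after all retries."""


def ensure_inputs(
    upstream_runs: Iterable[str],
    outputs_available: Callable[[str], bool],
    rerun: Callable[[str], None],
    max_retries: int = 3,
) -> Dict[str, int]:
    """Restart purged upstream runs until all outputs exist, or give up.

    Both callables are hypothetical hooks: `outputs_available(name)` reports
    whether a run's output files can still be found, and `rerun(name)`
    restarts that run and waits for it to finish.

    Returns the number of restarts per upstream run.
    """
    restarts = {name: 0 for name in upstream_runs}
    for attempt in range(max_retries + 1):
        # Re-check every upstream run, because one run's outputs may be
        # purged while another run is being repeated.
        missing = [name for name in restarts if not outputs_available(name)]
        if not missing:
            return restarts
        if attempt == max_retries:
            break  # retry budget used up, stop instead of looping forever
        for name in missing:
            rerun(name)
            restarts[name] += 1
    raise PurgedInputsError(
        f"Outputs still missing after {max_retries} retries: {missing}"
    )
```

The downstream run (resistance or proviral) would only be launched once `ensure_inputs` returns. Re-checking all upstream runs on every pass, rather than only the one that was restarted, is what catches the case where the main run's results are purged while the MIDI run is being repeated.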

Other possible solutions could be to download the input files from raw_data and to check that their checksum is what we expect, or to mark the output files that are still required for subsequent runs with an expiry date or a keep-alive to prevent them from being cleaned up.
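For the checksum option, a sketch of the verification step could look like the following. The function names are hypothetical, and MD5 is only an assumed default; the algorithm would have to match whatever checksum is actually recorded for the dataset.

```python
import hashlib
from pathlib import Path
from typing import Union


def file_checksum(
    path: Union[str, Path],
    algorithm: str = "md5",  # assumption: use whichever algorithm was recorded
    chunk_size: int = 1 << 20,
) -> str:
    """Return the hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(
    path: Union[str, Path],
    expected_checksum: str,
    algorithm: str = "md5",
) -> bool:
    """Check a downloaded input file against the checksum we expect."""
    actual = file_checksum(path, algorithm)
    return actual == expected_checksum.strip().lower()
```

A file that fails this check should be treated the same way as a purged file, so that the downstream run is never started on inputs that differ from the original results.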

CBeelen mentioned this issue Jan 26, 2023

Expiry date / label to keep output files of certain jobs cfe-lab/Kive#1870

Open

Donaim mentioned this issue Aug 29, 2023

Rerun a sample if it gets purged #464

Open

2 tasks
