When either the main or the midi sample finishes quickly and the other takes a long time, the output files of the faster run may already have been purged by the time the slower run finishes. The resistance run is only started once both runs have finished, so it then fails to find the faster run's output files, which have already been cleaned up.
This happened for sample 90542B-RELOAD-HCV_S83: the de novo MIDI sample finished in a reasonable amount of time, but the main sample took about a month to assemble. When it was finally finished, the MIDI run's results had already been purged, and Kive failed to find its amino.csv, with this message: ValueError: Dataset has no dataset_file or external_path.
The resistance and proviral runs need input files from the main and de novo pipelines. Wherever a subsequent run needs a previous run's results as inputs like this, we should check whether those results are still around. If they are not, the easiest solution would be to just restart the run whose results have been purged.
We should repeat this check-and-restart step for all samples in a retry loop with a sensible limit on retries; otherwise we could get caught in an endless loop of re-running the main and midi samples.
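A minimal sketch of what such a bounded check-and-restart step could look like. Everything here is hypothetical and not taken from the existing code: the `UpstreamRun` class, the `ensure_inputs` function, the `outputs_available` and `restart` callbacks (which would wrap whatever Kive calls are actually used), and the limit of 3 restarts.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

MAX_RESTARTS = 3  # assumed limit; pick whatever is sensible in practice


@dataclass
class UpstreamRun:
    """Hypothetical record of a run whose outputs a later run depends on."""
    name: str
    restarts: int = 0


def ensure_inputs(
    runs: Iterable[UpstreamRun],
    outputs_available: Callable[[UpstreamRun], bool],
    restart: Callable[[UpstreamRun], None],
    max_restarts: int = MAX_RESTARTS,
) -> bool:
    """Check upstream outputs before launching a dependent run.

    Returns True when every upstream run still has its outputs, so the
    dependent run (resistance, proviral) can be launched. Restarts any run
    whose outputs were purged and returns False, so the caller checks again
    on its next polling pass. Raises RuntimeError once a run has already
    been restarted max_restarts times, which breaks the endless loop.
    """
    ready = True
    for run in runs:
        if outputs_available(run):
            continue
        if run.restarts >= max_restarts:
            raise RuntimeError(
                f"Outputs of {run.name} purged after {run.restarts} restarts.")
        run.restarts += 1
        restart(run)
        ready = False
    return ready
```

The caller would invoke `ensure_inputs` each time both upstream runs are reported as finished, and only launch the dependent run when it returns True.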
Other possible solutions could be to download the input files from raw_data and to check that their checksum is what we expect, or to mark the output files that are still required for subsequent runs with an expiry date or a keep-alive to prevent them from being cleaned up.
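For the checksum alternative, the verification step might look roughly like the sketch below. It assumes an MD5 checksum was recorded for each output when it was first produced; the hash algorithm, the function names, and where the expected value comes from are all assumptions for illustration.

```python
import hashlib
from pathlib import Path


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_expected_file(path, expected_md5):
    """True if the file exists and its MD5 matches the recorded checksum."""
    path = Path(path)
    return path.is_file() and file_md5(path) == expected_md5.lower()
```

A file that is missing or whose checksum differs would then be treated the same way as a purged output.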