This issue records Brian Couger's request that large data mode checkpoint during clustering, so that not too much time is wasted if the pipeline has to be resumed in an HPC environment. Below are the notes I took during the recent Autometa strategy meeting, outlining one potential way of implementing it.
1. We start with a set of datapoints that is to be clustered.
2. Cycle through EPS values.
3. Decide upon the “best” EPS value.
4. Take out “good” bins.
5. Go back to 1 (until no more “good” bins are yielded).
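As a rough illustration of this loop (cycle through EPS values, pick the best one, remove the good bins, repeat on what is left), here is a minimal Python sketch. It is not Autometa's implementation. It assumes a hypothetical toy clusterer, `toy_cluster`, that groups sorted 1-D points separated by gaps no larger than EPS (roughly DBSCAN with a minimum cluster size of one), a caller-supplied `is_good` predicate standing in for whatever bin-quality criteria the real pipeline uses, and that the “best” EPS value is simply the one yielding the most good bins.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Points = Dict[str, float]        # contig id -> 1-D coordinate (toy stand-in for real features)
Clusters = Dict[int, List[str]]  # cluster label -> member contig ids


def toy_cluster(points: Points, eps: float) -> Clusters:
    """Hypothetical clusterer: split the sorted points wherever the gap exceeds eps."""
    clusters: Clusters = {}
    label, prev = 0, None
    for contig, x in sorted(points.items(), key=lambda kv: kv[1]):
        if prev is not None and x - prev > eps:
            label += 1
        clusters.setdefault(label, []).append(contig)
        prev = x
    return clusters


def pick_best_eps(points: Points, eps_values: Sequence[float],
                  is_good: Callable[[List[str]], bool]) -> Clusters:
    """Cycle through EPS values and keep the grouping with the most good bins."""
    best: Clusters = {}
    best_score = -1
    for eps in eps_values:
        clusters = toy_cluster(points, eps)
        score = sum(1 for members in clusters.values() if is_good(members))
        if score > best_score:  # strict '>' keeps the first EPS value on ties
            best, best_score = clusters, score
    return best


def iterative_binning(points: Points, eps_values: Sequence[float],
                      is_good: Callable[[List[str]], bool]) -> Tuple[List[List[str]], List[str]]:
    """Repeat: choose the best EPS, remove its good bins, recluster what is left."""
    remaining = dict(points)
    bins: List[List[str]] = []
    while remaining:
        clusters = pick_best_eps(remaining, eps_values, is_good)
        good = [sorted(m) for m in clusters.values() if is_good(m)]
        if not good:  # stop once no more good bins are yielded
            break
        for members in good:
            bins.append(members)
            for contig in members:
                del remaining[contig]
    return bins, sorted(remaining)
```

Ties between EPS values go to the first value in `eps_values`; the real selection rule may well differ.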
Checkpointing:
- Every time step 4 finishes, record the good bins somewhere and note which contigs are left.
  - Also delete all EPS tables when step 4 finishes.
  - For resume: just start step 1 with the contigs that are left.
- Every time step 1 finishes, record a table of the groupings for the given EPS value, and remember which EPS value is “next”.
  - For resume: note that the tables already exist, read them into memory, and start the clustering algorithm at the next EPS value.
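The two checkpoint levels in these notes could look roughly like the following sketch. All names are hypothetical and JSON files are used only for simplicity: `state.json` holds the good bins and leftover contigs after each round of bin removal, and `eps_<i>.json` holds the grouping for the i-th EPS value. On resume, existing tables are read back and clustering continues with the first EPS value that has no table. A toy 1-D clusterer is included so the block runs on its own; `cluster_fn` is where a real clusterer would be plugged in.

```python
import json
import os
from typing import Callable, Dict, List, Sequence, Tuple

Points = Dict[str, float]
Clusters = Dict[str, List[str]]  # string labels so tables round-trip through JSON unchanged


def toy_cluster(points: Points, eps: float) -> Clusters:
    """Hypothetical clusterer: split the sorted 1-D points wherever the gap exceeds eps."""
    clusters: Clusters = {}
    label, prev = 0, None
    for contig, x in sorted(points.items(), key=lambda kv: kv[1]):
        if prev is not None and x - prev > eps:
            label += 1
        clusters.setdefault(str(label), []).append(contig)
        prev = x
    return clusters


def write_json(path: str, obj) -> None:
    tmp = path + ".tmp"
    with open(tmp, "w") as fh:
        json.dump(obj, fh)
    os.replace(tmp, path)  # atomic rename: a killed job never leaves a half-written file


def read_json(path: str):
    with open(path) as fh:
        return json.load(fh)


def checkpointed_binning(points: Points, eps_values: Sequence[float],
                         is_good: Callable[[List[str]], bool], ckpt_dir: str,
                         cluster_fn: Callable[[Points, float], Clusters] = toy_cluster
                         ) -> Tuple[List[List[str]], List[str]]:
    os.makedirs(ckpt_dir, exist_ok=True)
    state_path = os.path.join(ckpt_dir, "state.json")
    # Outer checkpoint: good bins found so far plus the contigs still to cluster.
    if os.path.exists(state_path):
        state = read_json(state_path)
    else:
        state = {"bins": [], "remaining": sorted(points)}
    table_paths = [os.path.join(ckpt_dir, f"eps_{i}.json") for i in range(len(eps_values))]

    while state["remaining"]:
        remaining = {c: points[c] for c in state["remaining"]}
        tables = []
        for eps, path in zip(eps_values, table_paths):
            if os.path.exists(path):
                clusters = read_json(path)  # resume: this EPS value was already done
            else:
                clusters = cluster_fn(remaining, eps)
                write_json(path, clusters)  # inner checkpoint after each EPS value
            tables.append(clusters)
        scores = [sum(1 for m in t.values() if is_good(m)) for t in tables]
        best = tables[scores.index(max(scores))] if tables else {}
        good = [sorted(m) for m in best.values() if is_good(m)]
        if not good:
            break
        taken = {c for m in good for c in m}
        # Delete the now-stale EPS tables before saving the new state (see note below).
        for path in table_paths:
            os.remove(path)
        state = {"bins": state["bins"] + good,
                 "remaining": [c for c in state["remaining"] if c not in taken]}
        write_json(state_path, state)
    return state["bins"], state["remaining"]
```

One design choice worth noting: the EPS tables are deleted before the new state is written. If the job dies between the two steps, the resumed run recomputes the tables from the previous state and reaches the same result, whereas the opposite order could reuse tables that still contain contigs that were already binned.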
The checkpointing behavior you outlined above is not quite the same as the behavior implemented in the linked PR, but the linked PR does allow resuming from where binning stopped (at the taxon-iteration level).
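For comparison, resuming at the taxon-iteration level can be sketched as an append-only log with one record per finished taxon, where taxa already in the log are skipped on restart. This is a hypothetical illustration, not the code from the linked PR: `bin_by_taxon`, the `bin_taxon` callback, and the JSON-lines checkpoint file are all assumptions.

```python
import json
import os
from typing import Callable, Dict, Iterable, List

Bins = List[List[str]]


def bin_by_taxon(taxa: Iterable[str], bin_taxon: Callable[[str], Bins],
                 checkpoint_path: str) -> Dict[str, Bins]:
    """Bin each taxon once, logging finished taxa so a restarted job can skip them."""
    done: Dict[str, Bins] = {}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as fh:
            for line in fh:
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    break  # truncated final record from a killed job: that taxon is redone
                done[record["taxon"]] = record["bins"]
        # Rewrite only the complete records so a truncated tail cannot corrupt later appends.
        tmp = checkpoint_path + ".tmp"
        with open(tmp, "w") as fh:
            for taxon, bins in done.items():
                fh.write(json.dumps({"taxon": taxon, "bins": bins}) + "\n")
        os.replace(tmp, checkpoint_path)

    with open(checkpoint_path, "a") as fh:
        for taxon in taxa:
            if taxon in done:
                continue  # already binned before the interruption
            bins = bin_taxon(taxon)
            fh.write(json.dumps({"taxon": taxon, "bins": bins}) + "\n")
            fh.flush()
            os.fsync(fh.fileno())  # make the record durable before moving on
            done[taxon] = bins
    return done
```

The granularity is coarser than the EPS-level scheme in the notes: a taxon interrupted mid-clustering is redone from scratch, but completed taxa are never recomputed.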