Update to busco 5.1.0 and enable automated lineage selection #179
…put files for Busco
…wnload by default)
This is an awesome upgrade! Error handling:
Sounds good
I guess you mean "bin" instead of "contig" here. Missing in the MultiQC report is fine. I absolutely agree that in
Not sure if this is needed since
I agree that this isn't really relevant for kept unbinned contigs. However, I'd hope that those still appear in the Open todos:
Yes that would be great. edit:
I think this would be great to check out.
lib/Completion.groovy
for (bin in busco_failed_bins) {
    failed_bins += " ${bin}\n"
}
log.info "-${colors.purple}[$workflow.manifest.name]${colors.red} For ${busco_failed_bins.size()} bin(s) the BUSCO analysis failed because no genes were found or placements failed:\n${failed_bins}See ${params.outdir}/GenomeBinning/QC/BUSCO/[bin]_busco.err for further information.${colors.reset}-"
Hm, so this is also when placements failed, is that the same problem? I am uncertain.
Yes, in both cases bins would be listed here.
The problem with the placement error is that it says:
ERROR: Placements failed. Try to rerun increasing the memory or select a lineage manually.
In my case it was definitely not a memory problem. However, I am afraid that in other cases this could be caused by memory issues, and then we would block retrying (such error messages do not really help...). That is why I pointed to the ${bin}_busco.err file, so the user can at least discover this. That is of course not nice, but I also don't know how else to handle it.
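To make the point about the two failure cases concrete, here is a minimal Python sketch that scans `[bin]_busco.err` files and labels each bin. It is only an illustration, not code from this PR: the function names and labels are hypothetical, and the only string taken from the thread is the "Placements failed" message quoted above. Note that the message alone cannot tell whether memory was the cause.

```python
from pathlib import Path
from typing import Dict

# Marker taken from the BUSCO error message quoted in this thread.
PLACEMENT_MARKER = "Placements failed"


def classify_busco_failure(err_text: str) -> str:
    """Return a coarse, hypothetical label for the contents of a BUSCO .err file."""
    if PLACEMENT_MARKER in err_text:
        # May or may not be memory related; the message does not say.
        return "placements_failed"
    if err_text.strip():
        return "other_error"
    return "no_error_output"


def failed_bin_report(err_dir: Path) -> Dict[str, str]:
    """Map each bin name to a label, based on the [bin]_busco.err files in err_dir."""
    suffix = "_busco.err"
    report = {}
    for err_file in sorted(err_dir.glob("*" + suffix)):
        bin_name = err_file.name[: -len(suffix)]
        report[bin_name] = classify_busco_failure(err_file.read_text())
    return report
```

A report like this could be printed next to the log message so that users see which bins hit the placement error without opening each file.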
if [ \${#summaries[@]} -ne 1 ]; then
    echo "ERROR: none or multiple 'BUSCO/short_summary.specific.*.BUSCO.txt' files found. Expected one."
    exit 1
fi
Curious: in what case can there be several specific summary files?
I don't know :D I think I just wanted to check if it is exactly 1
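For comparison, the same "exactly one summary file" guard can be sketched in Python. This is an illustration only: the function name is hypothetical, and the file pattern is the one from the error message in the shell snippet above. Unlike the shell version, it reports how many files were found, which distinguishes "none" from "multiple".

```python
from pathlib import Path


def find_specific_summary(busco_dir: Path) -> Path:
    """Return the single lineage-specific short summary found in busco_dir."""
    summaries = sorted(busco_dir.glob("short_summary.specific.*.BUSCO.txt"))
    if len(summaries) != 1:
        # Zero and several matches are both treated as errors.
        raise RuntimeError(
            "Expected exactly one 'short_summary.specific.*.BUSCO.txt' in "
            f"{busco_dir}, found {len(summaries)}."
        )
    return summaries[0]
```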
Currently the
I adjusted the summary.busco.py script for this.
sure :)
ok, finally another update :) roughly the following changes were done:
and the last time I forgot to mention:
Really helpful changes!
Thanks a lot @d4straub for all your feedback! Ready for re-review again
Correction:
Looks good to me!
Among other things, this is in preparation for adding GTDB-tk (#178).
This updates BUSCO from 4.1.4 to 5.1.0 and enables the use of automated lineage selection:
- --auto-lineage is used and the data is downloaded automatically (currently also for the tests)
- --busco_reference (uses BUSCO parameter --lineage_dataset). This still requires the download of a file_versions.tsv file, which is used to check whether the newest lineage dataset is used (BUSCO warns if not).
- --busco_download_path can be used (BUSCO: --download_path, currently only in combination with --auto-lineage or --auto-lineage-prok)
- --busco_auto_lineage_prok can be used to ignore eukaryotes (BUSCO: --auto-lineage-prok)
- --save_busco_reference is also used to save the lineage datasets downloaded by BUSCO. I added an extra process for this so that it is done only once and not for each BUSCO process.

Error handling:
- --busco_reference: the number of marker genes for the corresponding lineage is used and all output files just contain a "100% Missing" etc.
- --auto-lineage: the number of marker genes is unknown and no BUSCO results file is generated -> these contigs are missing in the MultiQC BUSCO report. In the final busco_summary.txt I put an NA since I still thought those contigs should be listed (could also be done differently)
- BUSCO_SUMMARY process: ${bin}_busco.failed_bins.txt containing just the bin name
- 100% missing with NA as total number in the busco_summary.txt. What do you think?

Open todos:
- --busco_reference: also add a warning if 100% are missing for a binned genome?

PR checklist
- … scrape_software_versions.py
- … (nf-core lint .).
- … (nextflow run . -profile test,docker).
- docs/usage.md is updated.
- docs/output.md is updated.
- CHANGELOG.md is updated.
- README.md is updated (including new tool citations and authors/contributors).
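
The error handling proposed in the description (bins whose BUSCO run failed are still listed in busco_summary.txt, with NA values) could look roughly like the sketch below. This is not the actual summary.busco.py: the column names and the function are hypothetical, and the real summary layout may differ.

```python
import csv
import io
from typing import Dict, Iterable, List

# Hypothetical column names; the real busco_summary.txt layout may differ.
COLUMNS = ["bin", "complete_pct", "missing_pct", "n_markers"]


def build_summary(results: Dict[str, Dict[str, str]], failed_bins: Iterable[str]) -> str:
    """Render a tab-separated summary, listing failed bins with NA values."""
    rows: List[Dict[str, str]] = []
    for bin_name, values in results.items():
        rows.append({"bin": bin_name, **values})
    for bin_name in failed_bins:
        # Failed bins have no BUSCO results file, so every metric is NA.
        row = {col: "NA" for col in COLUMNS}
        row["bin"] = bin_name
        rows.append(row)
    rows.sort(key=lambda r: r["bin"])

    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n", restval="NA"
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

Here `failed_bins` would be the bin names collected from the `${bin}_busco.failed_bins.txt` files. Keeping failed bins in the table with NA, rather than dropping them, matches the preference stated above that those bins should still be listed.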