Added fasta_gxf_busco_plot sub workflow #96

GallVp · 2024-11-27T04:43:04Z

Closes #77

Changes

Added fasta_gxf_busco_plot sub workflow
Routed the GFFREAD output from fasta_gxf_busco_plot to Orthofinder and other modules and, thus, removed the GFFREAD module from genome_and_annotation
Removed the BUSCO_BUSCO module from genome_and_annotation

Comments

The fasta_gxf_busco_plot sub workflow also runs BUSCO on the genome itself. This is additional work that was not being done before, and I am not sure whether it is needed. If it is useful, the outputs can be routed to MultiQC so that it shows BUSCO results for both the genome and its annotation. This is what I have done for the genepal pipeline. Please see the logic here: https://github.com/Plant-Food-Research-Open/genepal/blob/ee702d71a9ea4f422c92729533ff0564c231b5e8/workflows/genepal.nf#L239-L262

FernandoDuarteF · 2024-11-27T14:23:39Z

Thank you @GallVp.

> The fasta_gxf_busco_plot sub workflow also evaluates the BUSCO from the genome itself. This is additional work that is being done now. I am not sure if this is needed. If this is useful, then the outputs can be routed to MultiQC so that it shows BUSCO for both the genome and its annotation. This is what I have done for the genepal pipeline. Please see the logic here: https://github.com/Plant-Food-Research-Open/genepal/blob/ee702d71a9ea4f422c92729533ff0564c231b5e8/workflows/genepal.nf#L239-L262

Do you know if there is a significant difference in run time and performance between 'protein' and 'genome' modes, especially for large genomes?

Perhaps we can leave this to the user? For example, we can set it as a parameter with 'protein' mode as default. Then if they want to include genome evaluation they can change it to 'genome'.

GallVp · 2024-11-27T20:52:47Z

Thank you @FernandoDuarteF

If we allow mode selection and the user picks 'prot', we will need to run GFFREAD outside the sub workflow to feed OrthoFinder and this sub workflow.

Yes, genome mode can take a little longer. But users are generally interested in both genome and annotation completeness when an annotation is available. As soon as someone sees the annotation completeness, the natural next question is how it compares with the completeness of the genome itself. For example, if annotation completeness is 85%, which is not that great by today's standards, the user wants to know whether this is a shortcoming of the annotation or whether the genome itself is incomplete. That's why the following are the most common scenarios I have seen:

Genome only because annotation is missing.
Genome + annotation.
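
To make the comparison concrete, here is a minimal sketch of the reasoning in Python. The function name, the messages, and the 2-point tolerance are all hypothetical choices for illustration; nothing here is part of BUSCO or of the sub workflow.

```python
from typing import Optional


def diagnose_completeness(
    genome_pct: float,
    annotation_pct: Optional[float],
    tolerance: float = 2.0,  # assumed threshold, not a BUSCO convention
) -> str:
    """Compare BUSCO completeness (%) of a genome and its annotation."""
    if annotation_pct is None:
        # Scenario 1: genome only, because the annotation is missing.
        return "genome only: no annotation to compare"

    # Scenario 2: genome + annotation. A large gap points at the annotation;
    # a small gap suggests the annotation is limited by the genome itself.
    gap = genome_pct - annotation_pct
    if gap > tolerance:
        return "annotation lags the genome: likely an annotation shortfall"
    return "annotation tracks the genome: any shortfall likely reflects the assembly"


# Hypothetical values: 85% annotation completeness against two different genomes.
print(diagnose_completeness(97.0, 85.0))
print(diagnose_completeness(86.0, 85.0))
print(diagnose_completeness(97.0, None))
```

With only the annotation score, the first two cases are indistinguishable, which is why the genome run is useful alongside it.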

FernandoDuarteF · 2024-11-28T11:17:44Z

Hi @GallVp, I will go through this PR later, as there are some other small things we would like to sort out aiming for a first release.

We were also thinking about adding a BUSCO ideogram to your subworkflow (we have a local module for this purpose; see BUSCO protocols for an example of the expected output). The module requires a custom R script that we wrote. Is there a way to add nf-core modules based on custom scripts? Is there a feasible way to port our local module into an nf-core module?

Aded fasta_gxf_busco_plot

adf88f7

GallVp requested a review from FernandoDuarteF November 27, 2024 04:43
