Added fasta_gxf_busco_plot sub workflow #96

Open: wants to merge 1 commit into base: dev

GallVp (Member) commented Nov 27, 2024

Closes #77

Changes

  1. Added fasta_gxf_busco_plot sub workflow
  2. Routed the GFFREAD output from fasta_gxf_busco_plot to OrthoFinder and other modules, and thus removed the GFFREAD module from genome_and_annotation
  3. Removed the BUSCO_BUSCO module from genome_and_annotation

Comments

The fasta_gxf_busco_plot sub workflow also runs BUSCO on the genome itself. This is additional work that the pipeline now does, and I am not sure it is needed. If it is useful, the outputs can be routed to MultiQC so that the report shows BUSCO for both the genome and its annotation. This is what I have done for the genepal pipeline. Please see the logic here: https://github.com/Plant-Food-Research-Open/genepal/blob/ee702d71a9ea4f422c92729533ff0564c231b5e8/workflows/genepal.nf#L239-L262

FernandoDuarteF (Collaborator)

Thank you @GallVp.

Do you know if there is a significant difference in run time and performance between the 'protein' and 'genome' modes, especially for large genomes?

Perhaps we can leave this choice to the user? For example, we could expose it as a parameter with 'protein' mode as the default. Users who want to include genome evaluation could then switch it to 'genome'.

GallVp (Member, Author) commented Nov 27, 2024

Thank you @FernandoDuarteF

If we allow mode selection and the user picks 'prot', we will need to run GFFREAD outside the sub workflow so that its output can feed both OrthoFinder and this sub workflow.

Yes, the genome mode can take a little longer. But users are generally interested in both genome and annotation completeness when an annotation is available. As soon as a person sees the annotation completeness, their next natural question is how it compares to the completeness of the genome itself. For example, if annotation completeness is 85%, which is not great by today's standards, the user wants to know whether this is a lapse in the annotation or a consequence of the genome itself being incomplete. That's why the following scenarios are the most common ones I have seen:

  1. Genome only because annotation is missing.
  2. Genome + annotation.
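The two scenarios, and the genome-versus-annotation comparison behind them, can be sketched in Python. This is a hypothetical illustration, not the pipeline's Nextflow logic: `BuscoPlan`, `plan_busco`, and `completeness_gap` are made-up names, and the 98% genome figure in the usage lines is an example value, not a result.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BuscoPlan:
    """Hypothetical record of which BUSCO evaluations run for one assembly."""
    genome_mode: bool   # BUSCO on the genome FASTA
    protein_mode: bool  # BUSCO on proteins extracted from the annotation (e.g. by GFFREAD)


def plan_busco(fasta: str, gxf: Optional[str]) -> BuscoPlan:
    """Hypothetical helper mirroring the two common scenarios."""
    if not fasta:
        raise ValueError("a genome FASTA is required")
    # Scenario 1: genome only, because the annotation is missing.
    # Scenario 2: genome + annotation, so both evaluations run.
    return BuscoPlan(genome_mode=True, protein_mode=gxf is not None)


def completeness_gap(genome_pct: float, annotation_pct: float) -> float:
    """Hypothetical metric: a large positive gap points at the annotation;
    a small gap combined with low annotation completeness points at the genome."""
    return genome_pct - annotation_pct


if __name__ == "__main__":
    print(plan_busco("genome.fa", None))
    print(plan_busco("genome.fa", "annotation.gff3"))
    # Example values only: 85% annotation completeness vs. a hypothetical 98% genome.
    print(completeness_gap(98.0, 85.0))
```

Running the genome evaluation in both scenarios is what makes the comparison possible: without it, an 85% annotation score cannot be attributed to either the annotation or the assembly.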

FernandoDuarteF (Collaborator)

Hi @GallVp, I will go through this PR later, as there are some other small things we would like to sort out first, ahead of a first release.

We were also thinking about adding a BUSCO ideogram to your subworkflow (we have a local module for this purpose; see the BUSCO protocols for an example of the expected output). The module requires a custom Rscript that we wrote. Is there a way to add nf-core modules based on custom scripts? Is there a feasible way to port our local module into an nf-core module?

Successfully merging this pull request may close this issue: Using FASTA_GXF_BUSCO_PLOT sub workflow

2 participants