

issues with results post "--assign", single gene in multiple Orthogroups #941

Open
nhartwic opened this issue Nov 14, 2024 · 4 comments

Comments

@nhartwic

I know that "Orthogroups/" is considered deprecated in some sense, but it is still the best description of where the different OrthoFinder results (like individual gene trees or protein MSAs) actually end up, so it is the second place I visit after Species_Tree. I've noticed some problems with the results after using the "--assign" option, specifically some interaction that causes "Orthogroups/Orthogroups_UnassignedGenes.tsv" to be generated incorrectly. Consider the following two results, one from the initial OrthoFinder run and one from the second OrthoFinder run with --assign:

$ grep Osat.NB.HPI3.Chr9.g378860.t1 orthov3.core/Orthogroups/*tsv
orthov3/Orthogroups/Orthogroups_UnassignedGenes.tsv:OG0030720                   Osat.NB.HPI3.Chr9.g378860.t1

$ grep Osat.NB.HPI3.Chr9.g378860.t1 orthov3.full/Orthogroups/*tsv 
orthov3/Orthogroups/Orthogroups.tsv:OG0031414   AT4G00500.2.Araport11.447, AT4G16070.3.Araport11.447            Osat.NB.HPI3.Chr9.g378860.t1
orthov3/Orthogroups/Orthogroups_UnassignedGenes.tsv:OG0030720                   Osat.NB.HPI3.Chr9.g378860.t1

This is clearly a bug. I haven't dug through the source code to figure out exactly what is going wrong, but a gene can't be in both OG0030720 and OG0031414.
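Rather than grepping one gene ID at a time, every gene affected this way can be listed by loading both TSV files and reporting genes that land in more than one orthogroup. The Python sketch below assumes OrthoFinder's usual table layout (a header row of "Orthogroup" plus one column per species, with ", "-separated gene IDs in each cell); the script and function names are hypothetical.

```python
import csv
import sys
from collections import defaultdict


def parse_orthogroup_table(lines):
    """Yield (orthogroup, species, gene) for every gene in an OrthoFinder-style TSV.

    Assumes a header row "Orthogroup<TAB>species1<TAB>species2..." and
    cells holding ", "-separated gene IDs (empty if a species has none).
    """
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    species = next(reader)[1:]
    for row in reader:
        if not row:
            continue  # skip blank lines
        orthogroup = row[0]
        for sp, cell in zip(species, row[1:]):
            for gene in cell.split(","):
                gene = gene.strip()
                if gene:
                    yield orthogroup, sp, gene


def find_multi_assigned(tables):
    """Return {gene: sorted orthogroup IDs} for genes seen in more than one orthogroup."""
    gene_to_ogs = defaultdict(set)
    for lines in tables:
        for orthogroup, _species, gene in parse_orthogroup_table(lines):
            gene_to_ogs[gene].add(orthogroup)
    return {g: sorted(ogs) for g, ogs in gene_to_ogs.items() if len(ogs) > 1}


def open_tables(paths):
    """Lazily open each file; each handle stays open while it is being consumed."""
    for path in paths:
        with open(path, newline="") as fh:
            yield fh


if __name__ == "__main__":
    # Usage: python check_ogs.py Orthogroups.tsv Orthogroups_UnassignedGenes.tsv
    dupes = find_multi_assigned(open_tables(sys.argv[1:]))
    for gene, ogs in sorted(dupes.items()):
        print(gene, *ogs, sep="\t")
    print(f"{len(dupes)} genes found in more than one orthogroup", file=sys.stderr)
```

Passing both files at once matters: a gene that is correctly unique within each file can still be duplicated across them, which is exactly the case shown for Osat.NB.HPI3.Chr9.g378860.t1.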

@nhartwic changed the title on Nov 14, 2024 from 'issues with results post "--assign"' to 'issues with results post "--assign", single gene in multiple Orthogroups'
@lauriebelch

Would you be able to send the following files from the --assign run that gave the odd results:

  1. Orthogroups.tsv
  2. Orthogroups_UnassignedGenes.tsv
  3. Phylogenetic_Hierarchical_Orthogroups/N0.tsv

I'll see if I can work out what is happening.

@nhartwic
Author

Tar archives with the full output (links valid for 6 days):

Orthofinder initial run
Orthofinder secondary --assign run

@lauriebelch

Thanks. Taking gene Osat.NB.HPI3.Chr9.g378860.t1 as an example:

  - In the --core run it is not assigned to an orthogroup, so it ends up in the unassigned genes (OG0030720).
  - The orthogroup names in the unassigned genes start with OG0014328 (as there are 14327 orthogroups in this run).
  - In the --assign run it is assigned to an orthogroup (OG0031414), but it is still in the unassigned genes (OG0030720).
  - According to Statistics_Overall there should be 15214 unassigned genes, but we have 4641, and we don't have any unassigned genes from the species added in the --assign run.

(Sorry, I know this mostly repeats what you said; I'm just writing it out for myself here to help with troubleshooting!)

Thanks for pointing this out. I'll see if we can provide a fix for this.
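The two count-related symptoms above (too few unassigned genes compared with Statistics_Overall, and none from the species added with --assign) can be checked by counting the genes in each species column of Orthogroups_UnassignedGenes.tsv. A minimal Python sketch, assuming that file uses the same tab-separated layout as Orthogroups.tsv; the script and function names are made up for illustration:

```python
import csv
import sys
from collections import Counter


def count_genes_per_species(lines):
    """Count gene IDs per species column in an OrthoFinder-style orthogroup TSV.

    Assumes the header is "Orthogroup<TAB>species..." and each cell holds
    ", "-separated gene IDs. Species with no genes keep a count of 0.
    """
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    species = next(reader)[1:]
    counts = Counter({sp: 0 for sp in species})  # preserves header order
    for row in reader:
        for sp, cell in zip(species, row[1:]):
            counts[sp] += sum(1 for gene in cell.split(",") if gene.strip())
    return counts


if __name__ == "__main__":
    # Usage: python count_unassigned.py Orthogroups_UnassignedGenes.tsv
    for path in sys.argv[1:]:
        with open(path, newline="") as fh:
            counts = count_genes_per_species(fh)
        for sp, n in counts.items():
            flag = "  <-- no unassigned genes" if n == 0 else ""
            print(f"{sp}\t{n}{flag}")
        # Compare this total with the unassigned-gene count in Statistics_Overall.
        print(f"TOTAL\t{sum(counts.values())}")
```

Species columns flagged with zero are the ones to compare against the species added in the --assign run.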

@nhartwic
Author

I'm trying to get back into this. I need to do a large OrthoFinder run (>300 samples) and would prefer to use the "--assign" option for its superior performance.

If the only issue is an erroneous Orthogroups_UnassignedGenes.tsv, I'll probably use the current version and correct that file myself, since that isn't very difficult. However, I'm concerned that the issues run deeper. I plan to do some testing to figure out the extent of the problem, but if you have any insight here, I'd love your input as well.
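If Orthogroups.tsv itself can be trusted, one way to correct the unassigned-gene file is to recompute it: every gene in the input proteomes that appears in no orthogroup becomes its own single-gene orthogroup. The sketch below is a hypothetical workaround, not OrthoFinder's own logic. It assumes gene IDs are the first whitespace-delimited token of each FASTA header (if your headers carry descriptions, check how the IDs appear in Orthogroups.tsv and adjust fasta_ids), that species columns are named after the FASTA file names without their extension, and it numbers the new orthogroups starting from the row count of Orthogroups.tsv, which may not reproduce OrthoFinder's IDs.

```python
import csv
import os
import sys


def parse_orthogroups(lines):
    """Return (species columns, set of assigned gene IDs, number of orthogroups)."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    species = next(reader)[1:]
    assigned, n_orthogroups = set(), 0
    for row in reader:
        if not row:
            continue
        n_orthogroups += 1
        for cell in row[1:]:
            assigned.update(g.strip() for g in cell.split(",") if g.strip())
    return species, assigned, n_orthogroups


def fasta_ids(lines):
    """Yield the first whitespace-delimited token of each FASTA header (an assumption)."""
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                yield fields[0]


def find_unassigned(assigned, genes_by_species):
    """Keep, per species, the genes that appear in no orthogroup."""
    return {sp: [g for g in genes if g not in assigned]
            for sp, genes in genes_by_species.items()}


def unassigned_rows(species, unassigned, first_index):
    """Yield TSV rows: a header, then one single-gene orthogroup per unassigned gene.

    The OG numbering here is hypothetical and may not match OrthoFinder's own.
    """
    yield ["Orthogroup", *species]
    index = first_index
    for col, sp in enumerate(species):
        for gene in unassigned.get(sp, []):
            row = [""] * len(species)
            row[col] = gene
            yield [f"OG{index:07d}", *row]
            index += 1


def main(argv):
    if len(argv) < 2:
        print("usage: rebuild_unassigned.py Orthogroups.tsv proteome1.fa [...]", file=sys.stderr)
        return
    with open(argv[0], newline="") as fh:
        species, assigned, n_orthogroups = parse_orthogroups(fh)
    genes_by_species = {}
    for path in argv[1:]:
        # Assumption: the species column name is the FASTA file name minus its extension.
        name = os.path.splitext(os.path.basename(path))[0]
        if name not in species:
            print(f"warning: {name} is not a column in {argv[0]}", file=sys.stderr)
        with open(path) as fh:
            genes_by_species[name] = list(fasta_ids(fh))
    unassigned = find_unassigned(assigned, genes_by_species)
    writer = csv.writer(sys.stdout, delimiter="\t", lineterminator="\n")
    writer.writerows(unassigned_rows(species, unassigned, n_orthogroups))


if __name__ == "__main__":
    main(sys.argv[1:])
```

Writing the result to a new file (for example, redirecting stdout to Orthogroups_UnassignedGenes.fixed.tsv) rather than overwriting the original keeps OrthoFinder's output available for comparison. This only repairs the unassigned-gene table; it cannot detect problems elsewhere in the --assign results.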
