ValueError: The gene family has not been associated to a partition. #292

Open
ericolo opened this issue Oct 20, 2024 · 4 comments

Comments


ericolo commented Oct 20, 2024

Hi !

I sometimes get this error for species with few genomes. I know that the recommended minimum is 15 genomes, but I'm trying to get a distribution of the gamma tendency to decide at what point (number of genomes) I can trust the pangenome:

ValueError: The gene family has not been associated to a partition.

Does this happen precisely because there are too few genomes and the partitions cannot be built?

Also, which output file contains this gamma value? I haven't looked extensively, but I couldn't find it yet.

I'm using ppanggolin like this:
ppanggolin workflow --anno list_genomes.tsv -c 64 -o output --clusters clusters.tsv --infer_singletons --rarefaction
Thanks in advance !
Eric

Author

ericolo commented Oct 21, 2024

For more info, here's the log file from that run:
debug_log.txt

And here is the complete error message:
error.txt

Thanks !

Member

axbazin commented Oct 22, 2024

Hi,
Yes, this definitely happens because there are too few genomes. I mean, 2 is not a lot to apply a statistical model to. That said, the place where ppanggolin crashes is a bit unexpected to me: if it must crash for that reason, it should probably crash at the partitioning step rather than at the HDF5 writing step.

As for the gamma-tendency values, I believe they should be written to the "rarefaction_parameters.csv" file, which is produced when you call the workflow with --rarefaction.

It looks like that information is indeed missing from the documentation, we'll look to add that in for the next release.
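
Until the documentation covers it, here is a minimal sketch of how one might pull the gamma values out of "rarefaction_parameters.csv". It assumes the file is comma-separated with a header row and that the relevant column names contain the word "gamma". Neither assumption is confirmed in this thread, so check the actual header first. The function name `gamma_columns` is hypothetical.

```python
import csv


def gamma_columns(csv_path):
    """Return {column_name: [values, ...]} for every column whose header
    contains 'gamma' (case-insensitive).

    Assumption: the file is a comma-separated table with a header row.
    The real column names are not documented in this thread.
    """
    with open(csv_path, newline="") as handle:
        reader = csv.DictReader(handle)
        # Keep only the headers that mention gamma.
        names = [n for n in (reader.fieldnames or []) if "gamma" in n.lower()]
        columns = {n: [] for n in names}
        for row in reader:
            for n in names:
                columns[n].append(row[n])
    return columns
```

Values are returned as strings, exactly as they appear in the file. Convert them with `float()` once you have confirmed which columns are numeric.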

Author

ericolo commented Oct 23, 2024

Thanks, I think I'll just proceed with species containing 15 genomes or more.
Because it works with other species that also have only 2 genomes, I thought there was another problem causing this error.

Thanks a lot !
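
A minimal sketch of that filtering step, assuming the file passed to --anno (here list_genomes.tsv) lists one genome per non-empty line. The helper names and the handling of `#` comment lines are hypothetical conveniences, not part of ppanggolin.

```python
MIN_GENOMES = 15  # recommended minimum mentioned in this thread


def count_genomes(list_path):
    """Count genome entries in a list file.

    Assumption: one genome per non-empty line; lines starting with '#'
    are skipped as comments.
    """
    with open(list_path) as handle:
        return sum(
            1 for line in handle
            if line.strip() and not line.lstrip().startswith("#")
        )


def enough_genomes(list_path, minimum=MIN_GENOMES):
    """Return True when the species has at least `minimum` genomes."""
    return count_genomes(list_path) >= minimum
```

Running `enough_genomes("list_genomes.tsv")` for each species before launching the workflow would skip the species that are too small for the partitioning model.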

Member

axbazin commented Oct 24, 2024

Ah, I see. You may be right then; maybe there is something particular about those 2.

In any case, if you want to rely on the gamma-tendency for your analysis, I would definitely not go with species that have 2 genomes. I'm not sure it can be computed with just 2, but even if it can, I don't think it would be very reliable anyway.
