"empty" chromosomes should not be in tree sequence, at least sometimes? #205
Yeah, this definitely deserves to be an issue, but I'm not sure how to handle it. When simulating sex chromosomes, SLiM has a concept of "null chromosomes" that are, as you say, essentially non-chromosomes – placeholders. In other situations, such as this, SLiM has no way to know whether the user's intent is to have a chromosome that contains no mutations, or a null chromosome. There needs to be a way, particularly with addRecombinant(), to express these semantics in distinct ways. That would allow SLiM to do a variety of nice/smart things. I'm not sure what the right path forward is; there are backward compatibility issues to worry about too, although perhaps they are minor. |
OK, this bug has become high-priority because #208 reveals that not treating null genomes correctly actually leads to inconsistent internal state for certain models involving haploids and overlapping generations. Anybody who is interested can read over there for the gory details. Here, suffice to say that this needs to be fixed immediately because of that. @petrelharp and I have discussed this offline and arrived at a proposal, enumerated below.

SLiM already has a concept of "null genomes" – genome objects that are not just empty, but conceptually don't even exist. When simulating the X chromosome, SLiM uses null genomes as placeholders for the Y in males; when simulating the Y, it uses null genomes as placeholders for the X in both sexes. What we propose is to extend this existing concept to other cases where it applies – in particular, to the second genome of haploids, but maybe there are other cases where this will turn out to apply as well.

Right now, passing (NULL, NULL, NULL) to

If you actually want an empty but non-null genome, you will be allowed to pass (NULL, NULL, 0) to request that. Right now passing that combination of arguments to

This shift will break backward compatibility for some models that rely upon the existing (NULL, NULL, NULL) functionality. If the intent is to create an empty non-null genome, they should shift to using (NULL, NULL, 0) instead to do that. If the intent is to create a null genome, conceptually, then all is well – except that the model will now be under SLiM's existing restrictions that disallow any change/inspection/manipulation of null genomes, so model code that previously executed operations over all genomes will now need to be narrowed down to involve only the non-null genomes. This should result in a cleaner model design, where it is more clear which genomes are actually being inspected/modified in a meaningful way, so it should be an improvement, but it will break some existing code.
To minimize the impact of that, a new property will be added to

A benefit of these changes is that substitution of fixed mutations will now be automatic in virtually all models, whether haploid, haplodiploid, or anything else. The reason that, e.g., haploid models have needed to explicitly remove/substitute fixed mutations themselves is that SLiM didn't know whether empty genomes were simply empty (in which case a given mutation would not be fixed) or were null (in which case the genome would be irrelevant for evaluation of fixation). Now SLiM will know the difference between these two things, and so it will be able to evaluate fixation correctly for non-diploid models, allowing most uses of removeMutations() to be removed from existing models.

On the tree-seq side, we propose that null genomes should simply be omitted from the tree sequence tables entirely. They exist as placeholders on the SLiM side, for ease of bookkeeping; there is no need for them to exist on the tskit side. This will be a change of policy for sex-chromosome simulations that output tree sequences; and it will also change the tree sequence observed for haploid etc. models. As a consequence, some backward compatibility may be broken and some analysis code may have to change. The upside is that these confusing, cluttering placeholders will no longer be present in the tree sequence, which should make visualization, interpretation, and analysis of such tree sequences much easier. The existing policy is that

Haploid etc. models that are WF models will not change their mode of operation, because there will be no way to express to SLiM the semantic concept that a given genome should be null, rather than empty. Instead, such models will continue to remove mutations from second genomes using

A new method,

This bug has only survived this long because of multiple failures in the internal consistency-checking code employed by SLiM. Those failure points will be patched up as part of the fix of this bug.
Whoo, that was complicated. It would be great if interested parties would read the above and comment as needed, so that any issues can be anticipated prior to implementation. Thanks! @petrelharp @elissasoroj @jeanrjc @mam737 @dinhe878 @yannickwurm |
Question: will
|
Indeed, yes. I think the motivation for breaking backward compatibility with |
This does seem a less bad break than starting to add mutations to the empty genomes thus obtained. |
So, I'm partway through implementing this, and discovered an unexpected consequence. Recipe 16.13 (vanilla nonWF haploid model) no longer functions properly, because it removes mutations itself at a frequency of 0.5. Now that (NULL, NULL, NULL) makes null genomes instead of empty genomes, the frequency calculations in SLiM now automatically do the right thing – fixation in a haploid model now happens at a calculated frequency of 1.0, not 0.5. So the model's code is removing mutations that are only halfway to fixation. This is bad – the model runs, but produces unintended/incorrect results. I think the problem is that I relaxed the "null genome" checks inside removeMutations(), as described above, in an attempt to make it "just work" for models like this. I should leave those null genome checks in place, so that models like this will now error out when they try to remove mutations from null genomes. That will alert the user to the fact that something has changed that has broken backward compatibility, and hopefully they will figure out the right fix. |
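The 0.5 versus 1.0 discrepancy described above can be illustrated with a small sketch. This is not SLiM's internal code; it is a toy Python model in which a genome is a set of mutation IDs and `None` stands in for a null genome. The names `mutation_frequency` and `skip_null` are hypothetical and exist only for this illustration.

```python
from typing import FrozenSet, Optional, Sequence

# Toy model: a genome is a frozenset of mutation IDs; None represents a null genome.
Genome = Optional[FrozenSet[int]]


def mutation_frequency(genomes: Sequence[Genome], mutation_id: int,
                       skip_null: bool = True) -> float:
    """Return the fraction of counted genomes that carry mutation_id.

    skip_null=True  : null genomes are left out of the denominator.
    skip_null=False : placeholders are counted as empty genomes, which
                      mimics the situation before the change.
    """
    if skip_null:
        counted = [g for g in genomes if g is not None]
    else:
        counted = [g if g is not None else frozenset() for g in genomes]
    if not counted:
        return 0.0
    carriers = sum(1 for g in counted if mutation_id in g)
    return carriers / len(counted)


# Three haploid individuals: one real genome carrying mutation 7, one placeholder each.
haploid_population = [frozenset({7}), None] * 3

freq_with_null = mutation_frequency(haploid_population, 7, skip_null=True)
freq_with_empty = mutation_frequency(haploid_population, 7, skip_null=False)
print(freq_with_null, freq_with_empty)
```

Under these assumptions, a mutation carried by every haploid individual reaches frequency 1.0 when the placeholders are treated as null, but only 0.5 when they are counted as empty genomes, which is why a script that removes mutations at 0.5 now acts on mutations that are only halfway to fixation.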
Another update: I decided that (NULL, NULL, 0) to addRecombinant() to request a non-null empty genome is not needed, and will not be implemented. It's just an ugly hack in the API, using the breaks vector to pass in a flag to the method (eww), and it's really not needed. Virtually nobody is expected to even want to do this. If you want an offspring individual with both genomes empty, use addEmpty(). If you really want an offspring individual with one empty (but non-null) genome, make it inherit from an individual in a dummy subpop that you create to represent "wild-type ancestors" or whatever. I.e., make the inheritance pattern reflect biological reality. |
Other implementation decision: for right now, null genomes will continue to be recorded in the tree sequence. This bug will still be fixed, because empty genomes will be changed to null genomes, as they should be, and then you can filter them out with a simplification if you want to. This bug is really about empty non-null genomes hanging around in the tree sequence, primarily; they are harder to filter out. Null genomes are hard to keep out of the tree sequence because there's a fair bit of code in SLiM that assumes that there are exactly two nodes per individual, with specific IDs (* 2 and * 2 + 1). Once the work I'm doing is done, perhaps we can revisit the idea of keeping null genomes out of the tree sequence, too, but it's an orthogonal issue I think. |
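For readers writing analysis code, the two-nodes-per-individual layout mentioned above can be sketched in plain Python. This is only an illustration of the indexing convention (`* 2` and `* 2 + 1`); the helper names and the `null_flags` input are hypothetical, and how the null flags are obtained from a real tree sequence is left as an assumption.

```python
from typing import List, Sequence, Tuple


def node_ids(individual_index: int) -> Tuple[int, int]:
    """Node IDs of an individual's two genomes under the 2*i, 2*i + 1 convention."""
    return (2 * individual_index, 2 * individual_index + 1)


def individual_of(node_id: int) -> int:
    """Inverse mapping: which individual a node belongs to."""
    return node_id // 2


def non_null_nodes(null_flags: Sequence[Tuple[bool, bool]]) -> List[int]:
    """Collect node IDs whose genome is not null.

    null_flags[i] is (first_genome_is_null, second_genome_is_null) for
    individual i; this input format is an assumption made for the sketch.
    """
    keep = []
    for i, flags in enumerate(null_flags):
        for node, is_null in zip(node_ids(i), flags):
            if not is_null:
                keep.append(node)
    return keep


# Two haploids (second genome null) followed by one diploid.
flags = [(False, True), (False, True), (False, False)]
kept = non_null_nodes(flags)
print(kept)
```

A list like `kept` is the kind of sample list one could hand to a simplification step to drop the null genomes afterwards. The sketch also shows why omitting null genomes at recording time is awkward: once nodes are skipped, `individual_of` no longer holds for the remaining IDs.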
Thanks @yannickwurm; I would've tagged him but I couldn't find his username on GitHub. |
Another implementation decision: I won't be adding an On the other hand, it looks like I will be adding arguments to both |
I have realized that these changes open up the topic of dominance coefficients. Normally, mutations in SLiM use a dominance coefficient kept by the MutationType when they are heterozygous. Right now, the only use of "null genomes" is in sex-chromosome simulations, where individuals that are XX can be either homozygous or heterozygous for an X-linked mutation they possess, while individuals that are XY can only possess one copy and the Y chromosome is a null genome. This special case, of the XY fitness, is presently controlled with a special X-dominance coefficient kept by SLiMSim, which is the same for all mutation types (representing the degree of dosage compensation exhibited by the organism, I guess). This strategy was already pretty clunky, and now that null genomes will be present in other cases (e.g., haploids), there is a need for a more general mechanism.

In haplodiploid models, or models of alternation of generations, there will similarly be both haploids and diploids present, so a different dominance coefficient is needed to govern fitness in the haploid case (including the XY case, conceptually) than in the diploid case (including the XX case). So I propose to remove the current X-dominance coefficient completely, deliberately breaking compatibility for those who were using it (to push them to the new scheme), and replacing it with a new property on MutationType, rather than SLiMSim, called

Right now I think haplodiploid models and alternation of generations models often control these dominance effects with

If anybody tagged on this has any thoughts regarding this proposal, now is the time to comment. :-> It's amazing how you tug on one string and it turns out to be connected to all the other strings, isn't it? |
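As a rough illustration of the proposal above, the sketch below computes the fitness effect of a single mutation with separate dominance coefficients for the diploid-heterozygous case and for the case where the one copy is paired with a null genome. It assumes the usual convention of an effect of 1 + s for homozygotes and 1 + h·s otherwise. The function and parameter names (`fitness_effect`, `h_diploid`, `h_haploid`) are hypothetical and are not the names of any SLiM property.

```python
def fitness_effect(s: float, copies: int, paired_with_null: bool,
                   h_diploid: float, h_haploid: float) -> float:
    """Multiplicative fitness effect of one mutation in one individual.

    s                : selection coefficient
    copies           : number of copies carried (0, 1, or 2)
    paired_with_null : True if the individual's other genome is null
                       (haploid, or the XY case conceptually)
    h_diploid        : dominance coefficient for diploid heterozygotes
    h_haploid        : separate coefficient for a single copy paired
                       with a null genome (hypothetical name)
    """
    if copies == 0:
        return 1.0
    if paired_with_null:
        # Only one real genome exists, so at most one copy is possible.
        return 1.0 + h_haploid * s
    if copies == 2:
        return 1.0 + s
    return 1.0 + h_diploid * s


s = 0.1
het = fitness_effect(s, copies=1, paired_with_null=False, h_diploid=0.5, h_haploid=1.0)
hom = fitness_effect(s, copies=2, paired_with_null=False, h_diploid=0.5, h_haploid=1.0)
hap = fitness_effect(s, copies=1, paired_with_null=True, h_diploid=0.5, h_haploid=1.0)
print(het, hom, hap)
```

With `h_haploid` set to 1.0, a haploid carrier gets the same effect as a diploid homozygote; other values would express partial expression of a single copy, similar in spirit to the dosage-compensation role the X-dominance coefficient played.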
lol, yes. I just hope they don't get all tangled up. |
OK, I've just committed a fix for this. The diffs are quite complex and touch on a bunch of different areas, but I think I got it right, and it passes all tests on my end. :-> @petrelharp, please run your tests against this new version to confirm that all is well. As mentioned in the discussion above, the fixes for this issue involved some intentional breaks with backward compatibility, and models involving haploidy will likely need to be revised. @petrelharp @elissasoroj @jeanrjc @mam737 @dinhe878 @yannickwurm please be aware of this. I'm happy to help with script revisions for anybody who is unsure how to proceed; the changes are straightforward but may be non-obvious. :-> |
Also @roddypr |
All the pyslim and |
A summary of what changed from a user's perspective, for those wanting to adjust their models (sorry for all the verbiage, this was quite a complex change!):
Hopefully that clarifies the issues. I can send a draft of the SLiM 3.7 manual to anyone who wants it; all of these changes are now documented (you can also see the new doc in a current build of SLiMgui, in the help panel). Again, I'm happy to assist in modifications to SLiM scripts as needed; just send me a model script that runs under SLiMgui and I'll tweak it for you. (I can't help with changes to Python analysis code, though, since I'm not fluent in Python.) Sorry for all this; I strive to maintain backward compatibility, but this is a case where breaking models was really unavoidable, and the new design is markedly better. It's faster, too – nonWF models that use |
Ah, one important addendum to the above:
|
Wow, this was pretty big! It all sounds better, though! Note that once #218 goes in the null chromosomes won't be in the tree sequence, though. |
Yes; it's not clear to me whether #218 will go in any time soon, though. As you and I realized, there are substantial obstacles there, and the benefit is not that large since if one doesn't want the null genomes in the tree one can just simplify them all away anyway. So, we'll see. |
Over in tskit-dev/pyslim#192 (reply in thread), @elissasoroj and I got tripped up by 'empty' chromosomes being included in the tree sequence. If they are really supposed to be "empty chromosomes", then they should be included, as they are, but @elissasoroj was using them in the script essentially as non-chromosomes. Perhaps there's a less confusing way to arrange all this?