Replies: 8 comments 19 replies
-
This seems surprising, and may be a bug in SLiM's code. Your parameters – recombination rate 7.34e-6, nonCrossoverFraction 0.8515, and meanLength 518 – seem like they would be unlikely to cause a collision in the generated gene conversion tracts very often at all – maybe sometimes it would have to try 2 or even 3 times, but it says it is trying 100 times and failing. That doesn't make sense to me. What chromosome length are you modeling? Can you post your full model code here, ideally with all extraneous detail pruned away so that it is more or less as simple as possible while still causing this error? Also, what version of SLiM are you using? |
Beta Was this translation helpful? Give feedback.
-
Equally surprising is the fact that the problem (largely) goes away with a larger window length, i.e. if I use initializeGenomicElement(g1, 0, 9999) rather than initializeGenomicElement(g1,0,4999) as I have below. Window size shouldn't matter very much in terms of the probability of overlapping conversion tracts, because hile increasing it does lead to conversion events being spaced farther apart on average, it also introduces more of them. In any case, to answer your questions, I'm running slim version 3.5, and here is an example of the code that throws the overlapping conversion tract errors for me:
|
Beta Was this translation helpful? Give feedback.
-
OK, here's what's going on. It's not common, but sometimes you get two gene conversion tracts even though the recombination rate is 7.34e-6 and the genome length is 5000 base positions. When you do, SLiM draws lengths for the left and right extent of each of the gene conversion tracts from a geometric distribution with mean 518 (see section 1.5.6 if this is less than completely clear to you). And sometimes those draws just happen to be unusually long. The expected length of each tract is 518x2 = 1036 base positions, so the expected length of two tracts together is 1036x2 = 2072, but for the example I'm looking at in the debugger right now, SLiM drew extents of (1848, 1631) for the first tract and (1412, 97) for the second tract, for a total length of 3479 for the first tract and 1509 for the second tract. This totals to 4988, so it could fit both tracts into the genome, but it would have to draw just the right positions for both; in 100 tries, it fails to do so. In any case, with even slightly worse luck we'd draw lengths that total to more than 5000, and then we have no chance whatsoever of satisfying the constraints. In fact, with sufficiently bad luck we could draw even a single GC tract that is longer than 5000 base positions, so really we don't even need to get two GC tracts for this to happen, it could happen with one. So. SLiM draws a position for each GC tract, checks whether they are fully inside the genome and whether they overlap, and if there is a problem, it loops back to try again, and if it fails 100 times in a row, it throws the error you're seeing. But here's the key fact: when it loops back, it draws new candidate positions for the two tracts, but it does not redraw their lengths. This is necessary, in order to satisfy the constraints requested by the user. If we draw new lengths whenever we fail to come up with good positions, that will bias the length distribution towards shorter lengths (because longer lengths will tend to have positioning problems more often); so then we will no longer supply GC tracts with a mean length of 518 (per side) or 1036 (total), as requested. So it's not a bug; it's functioning as designed. In a sense, the problem is simply that you have requested a set of constraints that have a pretty high probability, over the course of a run of this model, of not being satisfied. This is, I think, due to your rescaling. Basically, with your rescaling factor of 100 you're compacting 100 generations worth of gene conversion into a single generation, and have thereby made GC tract collisions much, much more likely to occur. The locus you're simulating (5000 bases) is also pretty short compared to your expected tract length (1036 bases), and so when you get more than one tract in a given gamete, and get unusually high length draws for those tracts, boom. I can imagine that that answer is less than satisfying to you. :-> I'm not sure how I might change the behavior to fix the problem, though. I'm open to suggestions. I could imagine things like:
I think the ideal solution is simply for you to choose constraints on the model that are less likely to lead to a violation and an error – probably rescaling less. But I'm open to other ideas. Tagging @philippmesser and @petrelharp in case they have any input on this. |
Beta Was this translation helpful? Give feedback.
-
I'm thinking that this would be a useful feature if introduced as an
optional input argument, so that users can decide whether they want to use
it or to just keep sampling until they find conversion tract boundaries
that work (as would be the case in most runs).
Thank you, Max
…On Tue, May 9, 2023 at 1:48 PM Ben Haller ***@***.***> wrote:
Ah ha, good sleuthing. Let's see:
This is necessary, in order to satisfy the constraints requested by the
user. If we draw new lengths whenever we fail to come up with good
positions, that will bias the length distribution towards shorter lengths
(because longer lengths will tend to have positioning problems more often);
so then we will no longer supply GC tracts with a mean length of 518 (per
side) or 1036 (total), as requested.
I agree with this in principle, but I actually think it would be Just Fine
to always redraw lengths (and preferable to adding another confusing
argument). This is because (a) in many realistic cases (no rescaling, long
genome) it should not have any measurable effect; (b) we have such a very
vague idea of typical gene conversion lengths anyhow (I know from trying to
track down estimates for stdpopsim); (c) it might arguably be more
realistic, if maybe GC events that go off the end of a chromosome result in
inviable gametes.
Hmm. OK, I'm open to this, I think? @mshpak76
<https://github.com/mshpak76> what do you think? I think @petrelharp
<https://github.com/petrelharp> is right, of course, that for no
rescaling and a long genome it would have no measurable effect; for *your*
model, I don't know how large the introduced bias might be. It would be a
change in behavior, but I think it's justified to prevent models from
erroring. (Right now, with a freakishly large tract length draw, *any* GC
model could error, in principle, although the probability is extremely
small.)
Another option might be if the GC tract goes off the end of the
chromosome, just truncate it at the end?
I don't think I like that so much; I think that might indeed cause
noticeable bias, even for typical models. And it's not clear how you would
solve the problem that's happening here, of being unable to draw two tracts
that are non-overlapping; you have to make some arbitrary decision
(truncate both tracts at the mean of their overlap range, or something).
Better to just re-draw.
@mshpak76 <https://github.com/mshpak76> if you want me to switch the
implementation to redrawing the lengths on failure to lay out, I'm OK with
that; Peter has convinced me it's a largely harmless change that would
resolve the issue.
—
Reply to this email directly, view it on GitHub
<#373 (reply in thread)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AUZMZCNLT2VCQ6PLVY7OJH3XFKGQFANCNFSM6AAAAAAX2DX35Y>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
--
=======================
Max Shpak, Ph.D.
Department of Genetics
University of Wisconsin
Madison, WI 53706
|
Beta Was this translation helpful? Give feedback.
-
OK, the GitHub head now has a new optional parameter to So, closing this discussion now. Thanks for the report, @mshpak76; I think this is a useful, if rather "edge case", addition. Please let me know if you encounter any further difficulties. |
Beta Was this translation helpful? Give feedback.
-
Thank you for adding this option, I will test the code with the
redrawLengthsOnFailure later today or tomorrow .
…On Wed, May 10, 2023 at 1:00 PM Ben Haller ***@***.***> wrote:
OK, the GitHub head now has a new optional parameter to
initializeGeneConversion(), [l$ redrawLengthsOnFailure=F]. If you set
that to T, it redraws lengths as well as positions on a GC tract layout
failure. Your example model above now runs to completion without errors, at
least the couple of times I tried it. Of course it is still possible to
come up with a model with such stringent parameters for GC that tract
layout still fails; at that point, different parameters would need to be
chosen. :->
So, closing this discussion now. Thanks for the report, @mshpak76
<https://github.com/mshpak76>; I think this is a useful, if rather "edge
case", addition. Please let me know if you encounter any further
difficulties.
—
Reply to this email directly, view it on GitHub
<#373 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AUZMZCP2LGVFIVFRPCVUNFLXFPJSXANCNFSM6AAAAAAX2DX35Y>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
--
=======================
Max Shpak, Ph.D.
Department of Genetics
University of Wisconsin
Madison, WI 53706
|
Beta Was this translation helpful? Give feedback.
-
Do I need to reinstall slim on all servers and workstations, or can the new
version of the initializeGeneConversion() function be imported?
…On Wed, May 10, 2023 at 3:02 PM Ben Haller ***@***.***> wrote:
My vote is actually to *not* complicate things, so have no argument and
just change the behavior.
And, I agree that this would be a fine quick fix.
The ship has sailed, the argument was added. It has a default of F,
providing the old/conservative behavior, so that avoids breaking backward
reproducibility for people's existing models, which is nice. It does
complicate things slightly, but nobody will notice it unless they're
actually reading the doc for initializeGeneConversion(), so I think it's
ok. :->
—
Reply to this email directly, view it on GitHub
<#373 (reply in thread)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AUZMZCNK74SZPV34ETR4423XFPX53ANCNFSM6AAAAAAX2DX35Y>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
--
=======================
Max Shpak, Ph.D.
Department of Genetics
University of Wisconsin
Madison, WI 53706
|
Beta Was this translation helpful? Give feedback.
-
Just an additional observation: I'm not sure if I understand the reasoning
behind not allowing a conversion tract to span regions outside of the
simulated genomic window. In practice, if we partition a chromosome into
e.g. 5 kb windows, there will often be conversion tracts at edges spanning
more than a single window. Conversely, the requirement that conversion
tracts are contained in a window creates an artifact of increasing the
density of conversion tracts in the center of a window and decreasing them
at the periphery, especially for short windows.
What problems would relaxing this "edge effect" requirement cause?
Thanks, Max
…On Wed, May 10, 2023 at 5:45 PM Ben Haller ***@***.***> wrote:
Do I need to reinstall slim on all servers and workstations, or can the
new version of the initializeGeneConversion() function be imported?
I'm not sure what you mean by "imported". You will want to build SLiM from
the current GitHub head, on any machine where you want this new
functionality.
—
Reply to this email directly, view it on GitHub
<#373 (reply in thread)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AUZMZCKV6S2XDZZQ5VHRBBDXFQLB5ANCNFSM6AAAAAAX2DX35Y>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
--
=======================
Max Shpak, Ph.D.
Department of Genetics
University of Wisconsin
Madison, WI 53706
|
Beta Was this translation helpful? Give feedback.
-
I am attempting to simulate selection and migration in large populations (Ne ~ 1e+6) with mutation, recombination/conversion parameters based on estimates for Drosophila. Because of the long run time, I'm using an approximation where I simulate a smaller population size (Ne ~ 1e+4) while multiplying mutation rates, recombination rates, and selection coefficients by 100 to keep the same effective Ns, Nr, Nmu etc.
One problem that I experienced with this is that with a rescaled recombination rate and conversion tract length of:
I invariably get the following SLiM error:
I am not sure what I can do about this. If I reduce the tract size to 100 (e.g. similar to mammalian), I don't get this error, but I would like to simulate a Drosophila-like population as closely as possible. I also would rather not use very large window sizes in order to keep the simulations consistent with some empirical results based on ~ 5kb window size.
What are my options - is there a way to force SLiM to run with overlapping recombination tracts?
Beta Was this translation helpful? Give feedback.
All reactions