-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
simultaneous pulse migrations with sum of proportions > 1 (again!) #100
Comments
Good catch. You're right that we don't want checks that the sum of pulses are greater than 1, just that each pulse proportion is 0 <= f <= 1. Do we have checks on the continuous migration rates, as discussed before? The sum of incoming continuous migration rates should not exceed one still. |
Yes, there are checks that the sum of incoming continuous migrations don't exceed one. |
Just had a conversation with @sgravel about tracts and demes and model specification. An idea came up, and curious to hear thoughts. To provide a way around this issue of what do we do when there are pulses at the same time into the same population, we could modify the spec to allow a list of sources and corresponding list of proportions in the same pulse event. Then if you want 1/4 from both A and B into C at the same time, we don't have to do the calculation of A->C with proportion 1/3 (so that 1/3 x 3/4 = 1/4) and then B->C with proportion 1/4, we can specify it as pulses:
- source: [A, B]
dest: C
proportion: [1/4, 1/4]
time: 10 as an option to avoid the ambiguity and non-clarity of pulses:
- source: A
dest: C
proportion: 1/3
time: 10
- source: B
dest: C
proportion: 1/4
time: 10 Both would still be valid, and we wouldn't change the current approach of sequential application of separately defined pulses. And proportions would still only need to be <= 1 if a single source, or sum(proportion) <= 1 if multiple sources. |
I like this. I'll admit that I had to read @grahamgower's example a few times to arrive at his statement of the expected outcome, so there is indeed an ambiguity there (to me, at least). |
I think the current sequential approach is not ambiguous as far as implementation goes, as long as sorts are done properly. But I definitely agree it is quite confusing, and I think this proposal would go a long way to relieving confusing about pulses into the same population at the same time. |
Yes, the implementation is not ambiguous. But the plus of the new proposal is that a user new to all of this can just write down their desired ancestry proportions w/o have to back-calculate anything. IMO, that's a big win. |
I'm not sold on this. Why can't a user just write this down using the existing list of ancestors and proportions? Or even writing continuous migrations with a time span of 1 generation? Pulse migrations are really something that ought to be avoidable most of the time. We need clear semantics for when folks do use pulses, and when pulses coincide, but these are not expected to be nominal use cases. For this reason, we throw up a warning when pulses do coincide! |
This requires defining an entirely new deme, which might not be preferable in a given scenario, as there may be conceptual continuity of a deme that would be have to be split in two with this suggestions.
This approach should largely be avoided in specifying demographic models, I'd say. In some discrete time forward simulators, the translation to the simulator may in fact result in the same migration matrices and events, but that won't be the case for many (most?) downstream software applications. For some, like dadi and moments, it would result in excessively large scaled mutation rates and break the numerics, when instantaneous pulse events are handled without issue. I think it's also a more unfriendly way of writing down the model for the user. The point I'm going for here is to reduce confusion and ambiguity for users, and this suggested approach would end up being only more confusing.
I don't agree with that, really. Instantaneous migration events, while biological simplifications, are extremely common in the literature and widely understood and implemented by simulation software and users. (There are even simulation/inference software out there that don't handle continuous migration but instead exclusively add pulse migration edges to handle gene flow.) Pulse migrations and instantaneous admixture events are a mainstay of pop gen methods, and I don't see a reason to say they should be avoided, or a reason to make cases that users will inevitably run into more confusing than they need to be. The reason we throw up the warning is because there is the danger of confusion and ambiguity. We should be looking for a way to remove that ambiguity and make the implementation of cases that users will want to model less confusing. |
Having multiple pulses at the same time should be a rare use case. Yes, I can image reasonable uses, but are they sufficiently common that it warrants such a change? Consider that this would be a backwards-incompatible change. It would need changes to the parsers (ref implementation, demes-python, demes-c), and then each of moments, fwdpy11, msprime, demesdraw, demes-slim... That's quite a lot of work. |
I'm okay with this. We should not (IMO) be considering the spec a 1.0 at this point. It has only been looked at by a few people with fairly narrow perspectives (e.g., the implications for only a few tools have been thought through). Once more downstream tool builders see the spec, and get their feedback in, I'd be more comfortable with being hesitant about backwards compatibility. |
I also think that as more people look at demes and consider using it in their applications, we'll find these issues that we want to iron out (we've been inviting other software developers to give it a try for exactly this reason!). So this is what happened here - tracts implementation cares about this case, and in my own experience the scenario has come up when using tracts quite a bit. Additionally, if we as the developers of demes still get confused by the pulses-at-same-time scenario (this is far from the first time it's come up), imagine how users down the road will find it. What I'm suggesting is fairly minimal. Currently, pulses take a deme name as the source and a number as the proportion. I'd suggest we allow the source to be a deme name or a list of deme names, and the proportion to be a number or a list of numbers. The same checks apply (deme(s) with that/those name(s) in the Graph, proportion or sum of proportion is within [0, 1]). The removal of ambiguity and confusion to me is worth the time making changes to the 0.x implementations. |
I agree we should make any breaking changes now - this is why we're getting feedback from people. |
Maybe the pulse source/proportion should be changed to |
Strong +1 on changing to lists rather than having mixed types - it adds a lot of complexity having to support lists or not-lists. |
Sounds good to me! Lists instead of values makes is also consistent with the merge/admix format. I think this will be a very helpful change. I can open a PR tackling the demes-python change, if we want an idea of how it could look before diving in elsewhere. |
Sounds good. There's also hundreds of yaml files in the spec repository that will need updating. Some semi-automated search/replace might do the trick. |
The original issue raised here is still unresolved. The following file should be accepted (see original post), but is rejected. time_units: generations
demes:
- name: A
epochs:
- start_size: 100
- name: B
epochs:
- start_size: 100
- name: C
epochs:
- start_size: 100
pulses:
- {sources: [A], dest: B, time: 10, proportions: [0.6]}
- {sources: [C], dest: B, time: 10, proportions: [0.6]} Error from reference implementation:
|
Proportions for a given pulse must have 0 <= sum(proportions) <= 1, but additional pulses defined at the same time can cause the sum of pulse migrations to exceed 1, in which case the pulses are applied sequentially. Closes popsim-consortium#100.
The following should be valid (although not recommended), with 0.6 of B being replaced by A, and then immediately afterwards 0.6 of B is replaced by C.
The realised ancestry proportions after these pulses (forwards in time) should be that B is composed of 0.6 ancestry from C and 0.6*(1 - 0.6) = 0.24 ancestry from A.
We currently have checks (in the reference implementation, demes-python, and demes-c) that the sum of proportions entering a deme at any given time don't exceed 1. I'm not sure why I thought we wanted those checks. Sorry!
The text was updated successfully, but these errors were encountered: