Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

simultaneous pulse migrations with sum of proportions > 1 (again!) #100

Closed
grahamgower opened this issue Jul 6, 2021 · 17 comments · Fixed by #115
Closed

simultaneous pulse migrations with sum of proportions > 1 (again!) #100

grahamgower opened this issue Jul 6, 2021 · 17 comments · Fixed by #115

Comments

@grahamgower
Copy link
Member

The following should be valid (although not recommended), with 0.6 of B being replaced by A, and then immediately afterwards 0.6 of B is replaced by C.

pulses:
- {source: A, dest: B, time: 10, proportion: 0.6}
- {source: C, dest: B, time: 10, proportion: 0.6}

The realised ancestry proportions after these pulses (forwards in time) should be that B is composed of 0.6 ancestry from C and 0.6*(1 - 0.6) = 0.24 ancestry from A.

We currently have checks (in the reference implementation, demes-python, and demes-c) that the sum of proportions entering a deme at any given time don't exceed 1. I'm not sure why I thought we wanted those checks. Sorry!

@apragsdale
Copy link
Member

Good catch. You're right that we don't want checks that the sum of pulses are greater than 1, just that each pulse proportion is 0 <= f <= 1. Do we have checks on the continuous migration rates, as discussed before? The sum of incoming continuous migration rates should not exceed one still.

@grahamgower
Copy link
Member Author

Yes, there are checks that the sum of incoming continuous migrations don't exceed one.

@apragsdale
Copy link
Member

Just had a conversation with @sgravel about tracts and demes and model specification. An idea came up, and curious to hear thoughts. To provide a way around this issue of what do we do when there are pulses at the same time into the same population, we could modify the spec to allow a list of sources and corresponding list of proportions in the same pulse event. Then if you want 1/4 from both A and B into C at the same time, we don't have to do the calculation of A->C with proportion 1/3 (so that 1/3 x 3/4 = 1/4) and then B->C with proportion 1/4, we can specify it as

pulses:
- source: [A, B]
  dest: C
  proportion: [1/4, 1/4]
  time: 10

as an option to avoid the ambiguity and non-clarity of

pulses:
- source: A
  dest: C
  proportion: 1/3
  time: 10
- source: B
  dest: C
  proportion: 1/4
  time: 10

Both would still be valid, and we wouldn't change the current approach of sequential application of separately defined pulses. And proportions would still only need to be <= 1 if a single source, or sum(proportion) <= 1 if multiple sources.

@molpopgen
Copy link
Contributor

I like this. I'll admit that I had to read @grahamgower's example a few times to arrive at his statement of the expected outcome, so there is indeed an ambiguity there (to me, at least).

@apragsdale
Copy link
Member

I think the current sequential approach is not ambiguous as far as implementation goes, as long as sorts are done properly. But I definitely agree it is quite confusing, and I think this proposal would go a long way to relieving confusing about pulses into the same population at the same time.

@molpopgen
Copy link
Contributor

Yes, the implementation is not ambiguous. But the plus of the new proposal is that a user new to all of this can just write down their desired ancestry proportions w/o have to back-calculate anything. IMO, that's a big win.

@grahamgower
Copy link
Member Author

I'm not sold on this. Why can't a user just write this down using the existing list of ancestors and proportions? Or even writing continuous migrations with a time span of 1 generation? Pulse migrations are really something that ought to be avoidable most of the time. We need clear semantics for when folks do use pulses, and when pulses coincide, but these are not expected to be nominal use cases. For this reason, we throw up a warning when pulses do coincide!

@apragsdale
Copy link
Member

Why can't a user just write this down using the existing list of ancestors and proportions?

This requires defining an entirely new deme, which might not be preferable in a given scenario, as there may be conceptual continuity of a deme that would be have to be split in two with this suggestions.

Or even writing continuous migrations with a time span of 1 generation?

This approach should largely be avoided in specifying demographic models, I'd say. In some discrete time forward simulators, the translation to the simulator may in fact result in the same migration matrices and events, but that won't be the case for many (most?) downstream software applications. For some, like dadi and moments, it would result in excessively large scaled mutation rates and break the numerics, when instantaneous pulse events are handled without issue.

I think it's also a more unfriendly way of writing down the model for the user. The point I'm going for here is to reduce confusion and ambiguity for users, and this suggested approach would end up being only more confusing.

Pulse migrations are really something that ought to be avoidable most of the time. We need clear semantics for when folks do use pulses, and when pulses coincide, but these are not expected to be nominal use cases. For this reason, we throw up a warning when pulses do coincide!

I don't agree with that, really. Instantaneous migration events, while biological simplifications, are extremely common in the literature and widely understood and implemented by simulation software and users. (There are even simulation/inference software out there that don't handle continuous migration but instead exclusively add pulse migration edges to handle gene flow.) Pulse migrations and instantaneous admixture events are a mainstay of pop gen methods, and I don't see a reason to say they should be avoided, or a reason to make cases that users will inevitably run into more confusing than they need to be.

The reason we throw up the warning is because there is the danger of confusion and ambiguity. We should be looking for a way to remove that ambiguity and make the implementation of cases that users will want to model less confusing.

@grahamgower
Copy link
Member Author

Having multiple pulses at the same time should be a rare use case. Yes, I can image reasonable uses, but are they sufficiently common that it warrants such a change?

Consider that this would be a backwards-incompatible change. It would need changes to the parsers (ref implementation, demes-python, demes-c), and then each of moments, fwdpy11, msprime, demesdraw, demes-slim... That's quite a lot of work.

@molpopgen
Copy link
Contributor

Consider that this would be a backwards-incompatible change. It would need changes to the parsers (ref implementation, demes-python, demes-c), and then each of moments, fwdpy11, msprime, demesdraw, demes-slim... That's quite a lot of work.

I'm okay with this. We should not (IMO) be considering the spec a 1.0 at this point. It has only been looked at by a few people with fairly narrow perspectives (e.g., the implications for only a few tools have been thought through). Once more downstream tool builders see the spec, and get their feedback in, I'd be more comfortable with being hesitant about backwards compatibility.

@apragsdale
Copy link
Member

apragsdale commented Jul 7, 2021

I also think that as more people look at demes and consider using it in their applications, we'll find these issues that we want to iron out (we've been inviting other software developers to give it a try for exactly this reason!). So this is what happened here - tracts implementation cares about this case, and in my own experience the scenario has come up when using tracts quite a bit. Additionally, if we as the developers of demes still get confused by the pulses-at-same-time scenario (this is far from the first time it's come up), imagine how users down the road will find it.

What I'm suggesting is fairly minimal. Currently, pulses take a deme name as the source and a number as the proportion. I'd suggest we allow the source to be a deme name or a list of deme names, and the proportion to be a number or a list of numbers. The same checks apply (deme(s) with that/those name(s) in the Graph, proportion or sum of proportion is within [0, 1]). The removal of ambiguity and confusion to me is worth the time making changes to the 0.x implementations.

@jeromekelleher
Copy link
Member

I agree we should make any breaking changes now - this is why we're getting feedback from people.

@grahamgower
Copy link
Member Author

grahamgower commented Jul 7, 2021

What I'm suggesting is fairly minimal. Currently, pulses take a deme name as the source and a number as the proportion. I'd suggest we allow the source to be a deme name or a list of deme names, and the proportion to be a number or a list of numbers.

Maybe the pulse source/proportion should be changed to sources/proportions instead? This would be consistent with deme's ancestors/proportions fields that are spelled as plurals. It's a breaking change anyway, and forcing lists is much simpler than allowing both lists and non-lists.

@jeromekelleher
Copy link
Member

Strong +1 on changing to lists rather than having mixed types - it adds a lot of complexity having to support lists or not-lists.

@apragsdale
Copy link
Member

Sounds good to me! Lists instead of values makes is also consistent with the merge/admix format. I think this will be a very helpful change.

I can open a PR tackling the demes-python change, if we want an idea of how it could look before diving in elsewhere.

@grahamgower
Copy link
Member Author

Sounds good to me! Lists instead of values makes is also consistent with the merge/admix format. I think this will be a very helpful change.

I can open a PR tackling the demes-python change, if we want an idea of how it could look before diving in elsewhere.

Sounds good. There's also hundreds of yaml files in the spec repository that will need updating. Some semi-automated search/replace might do the trick.

@grahamgower
Copy link
Member Author

The original issue raised here is still unresolved. The following file should be accepted (see original post), but is rejected.

time_units: generations
demes:
- name: A
  epochs:
    - start_size: 100
- name: B
  epochs:
    - start_size: 100
- name: C
  epochs:
    - start_size: 100
pulses:
- {sources: [A], dest: B, time: 10, proportions: [0.6]}
- {sources: [C], dest: B, time: 10, proportions: [0.6]}

Error from reference implementation:

Traceback (most recent call last):
  File "/home/grg/src/demes/demes-spec/reference_implementation/resolve_yaml.py", line 15, in <module>
    graph = parser.parse(data)
  File "/home/grg/src/demes/demes-spec/reference_implementation/parser.py", line 805, in parse
    graph.validate()
  File "/home/grg/src/demes/demes-spec/reference_implementation/parser.py", line 570, in validate
    raise ValueError(
ValueError: Pulse proportions into B at time 10 sum to more than 1

grahamgower added a commit to grahamgower/demes-spec that referenced this issue Feb 10, 2022
Proportions for a given pulse must have 0 <= sum(proportions) <= 1,
but additional pulses defined at the same time can cause the sum
of pulse migrations to exceed 1, in which case the pulses are applied
sequentially.

Closes popsim-consortium#100.
@grahamgower grahamgower added this to the 1.0 milestone Feb 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants