-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Inherited attributes #23
Comments
Nice idea! It does seem like this would clean a few things up. |
I think containing those lower-level attributes within You say that when we go up the tree looking at defaults and don't find any, it would error out - wouldn't it rather just set the attribute to |
We already have, e.g. selfing and cloning rates are |
Exactly yeah; I was more thinking about stuff that we need values for, like |
The main question is, how does this interact with the logic for deriving epoch start/end times? Is that something we should specify explicitly as well, or is it too complicated to expect it to be implemented identically more than once? |
That all makes sense about the defaults of cloning and selfing rates set to zero already. Agree we probably want to only set it in one place.
I'm thinking that we probably don't want to have default start and end times for epochs specified at levels higher than the deme. I can only imagine corner cases where it's useful, and most cases where it would just make specifying a valid demography more difficult. |
Those "default" fields in the schema can easily be removed/changed. They don't actually do anything anyway, as far as I can tell. Some schema validators set those fields during validation. Validation of input models via the schema is probably unhelpful anyway, because it's nowhere near strict enough for our purposes (we can't specify inter-epoch constraints), and the error messages would be horrible for users.
Yeah, the defaults mechanism described in this issue don't make sense for start/end times of epochs at all. So we'd need to have a way of "opting out" from the defaults propagation. |
Agreed, start/end times are different and shouldn't have defaults. We can easily specify this in the schema, so that we only accept a fixed set of keys in the |
Thinking about this more, I'm not so sure it makes much headway into unifying a spec for simplified and fully-qualified graphs. The allowable fields in either spec are not so challening to define in a unified way (defaults or no defaults, the simplified spec would be a superset of the fully-qualified spec), the problem is one of clearly describing the semantics for omitted details in simplified graphs (and a json schema cannot describe object relations).
As far as I can tell, the only sensible things to include in |
Good points @grahamgower. Maybe it would be productive to have 1/2 hour meeting on this next week, and save a bit of back and forth here? |
Sure, happy to have a chat to work through the details. Next week is wide open for me. |
We didn't really resolve whether to include a |
I think it would still be clearer if we did have a |
I guess we should move this to the spec repo? |
I wonder if the defaults section proposed here doesn't fit with cloning/selfing rates either. @apragsdale mentioned phylogenetic inheritance of these properties in the last call, which would make more sense than some kind of graph-wide or even deme-wide default value. Consider other things people might want to put in a |
Definitely open to suggestions here @grahamgower - my main point is that we shouldn't define default values directly as attributes of the graph when they only apply to a subset of the entities defined in the Graph. |
Sure, I certainly agree with that. But I wonder whethere we ever want defaults like this, or if all defaults we can think of are actually heritable, or otherwise need some very custom semantics. So either we come up with some general way of specifying heritable traits, or we say that we're not doing that at all (at least not now). Maybe @apragsdale has some insight here about what he wanted to do with the selfing and cloning rates? |
Agreed - leaning towards the latter! |
Also agree. I brought that up that idea of traits being inherited from ancestors, but I don't really like it. I think it makes more sense to have defaults specified in some I don't know if the defaults should be |
What about #25 though? |
Even if we enforce specifying epochs for each deme, I think we still want to allow specifying inherited properties at the deme level. You could want to write a deme with a bunch of epochs with size changes, but the selfing rate is constant across all of them. You'd have to repeat the I think the hierarchy of inheritance would look like:
and that's not too overly complicated? |
What about Demes etc having a defaults section where you can specify these things? It would be really nice if the properties we define on objects were actual attributes of the thing in question (other than the defaults section). |
Sorry, this is what you were suggesting! No, looking up the document tree for values is totally fine imo |
I see - yes that makes sense. So something like this? defaults:
- selfing_rate: 0.1
demes:
- id: deme1
epochs:
- ...
defaults:
- selfing_rate: 0.2
- id: deme2
epochs:
- ... where the |
@apragsdale, what exactly did you have in mind with the selfing and cloning rates where you don't want trait heritablility? Just so that we don't rule this out completely, I was suggesting that we say "these traits are inherited", then we don't do anything at all in the spec or the python implementation, other than saying that. It's up to simulators to apply a trait value when it's specified (presumably at the roots), provide a sensible default if it's not specified, and propagate it through to descendants via splits, branches, mergers, admixtures, pulses and migrations. One can change the trait value at some internal part of the graph by setting a value in the relevant deme in the relevant epoch. The rule(s) for how any particular trait is inherited and propagated is up to the simulator. |
I think there are too many corner cases and ambiguities with passing a selfing rate from an ancestor deme to descendants (and what about two ancestors that have different selfing rates? which takes precedent?). I think it's cleaner to just have
I think this is a bit dangerous, because it means that two simulators look at the same YAML specification but then simulate under different rules. I think we want to determine the rules of how properties are inherited within |
Those are very good points, and I agree that having defaults are easier to define clear behaviour for, and there's negligible burden for us to implement these defaults. But if |
Should we add an example trait that is clearly heritable then @grahamgower, just to make the framework for future extensions clear? |
That's an interesting idea. The choice of example trait would want to be 100% heritable, and clearly quantitative, or clearly Mendelian. But now that you've got me thinking about this, I admit that supporting heritable traits is probably a huge can or worms. Probably it would be best for the user to specify Mendelian or quantitative. And a trait could instead require very specialised semantics (e.g. ploidy in plants); it could be partially heritable; it could have an interaction with a per-deme environmental attribute. Users will want multiple "traits" to emulate SNVs that are (partially) linked, and then we start talking contigs/chromosomes... So, sorry for derailing the discussion. Lets do the |
Lets do exactly this, and without saying |
OK, sounds good to me. What's the dislike for the fully-qualified object references in the defaults section though, out of interest? I thought it would be a good idea to future-proof things a bit in case we wanted to do something like adding default migration rates (where the name of the property is just |
The full path to selfing_rate is |
Well I was thinking of using the "class" reference here to avoid that - |
But that has two problems. First, when defaults are set at the toplevel, this actually sets the deme default, not the epoch default. Second, the class name is an implmentation detail of the current parser, which is in no way necessary. E.g. an implemntation could just keep everything as the nested dicts/lists object like that obtained from |
I thought we were going to make I think it's entirely natural for us to say within the spec that the JSON objects correspond to different classes - it would be quite awkward and difficult not to, I would imagine. |
Yep, that's definitely what we have now, and I don't see that changing. So naming selfing_rate as epochs.selfing_rate would be fine then. Or should it be demes.epochs.selfing_rate at the top level?
It's just a data model, why does there need to be any mention of classes? That sounds like an OO language implementation detail. I find the two different capitalisations within the yaml file is confusing. |
Sure, it's just a data model, and OO is just an abstraction that's sometimes useful in modelling things. It's a fairly well known abstraction with cross language conventions, so I thought this would be a good way to disambiguate as well as making the specification more readable/concrete. Let's see what @apragsdale and @molpopgen think here. |
I've read through this thread twice and I'm having trouble coming up with a "short version" of the question at hand? |
The question is, which of the following versions so you think is better: A: defaults:
- selfing_rate: 0.1
demes:
- id: deme1
epochs:
- ...
defaults:
- selfing_rate: 0.2
- id: deme2
epochs:
- ... B: defaults:
- demes.epochs.selfing_rate: 0.1
demes:
- id: deme1
epochs:
- ...
defaults:
- epochs.selfing_rate: 0.2
- id: deme2
epochs:
- ... C: defaults:
- Epoch.selfing_rate: 0.1
demes:
- id: deme1
epochs:
- ...
defaults:
- Epoch.selfing_rate: 0.2
- id: deme2
epochs:
- ... I'm not so keen on A because we can imagine having a parameter named X sometime in the future that is in both (say) SymmetricMigration and in an Epoch. B and C are suggestions to disambiguate this. C is my suggestion where we disambiguate a field by prefixing it with its "class" name, with the understanding that we use a simple language independent object model as our framework for describing the entities in the model and their relationships. Graham isn't so keen on this, as he (reasonably) views the object oriented stuff as unnecessary. |
I agree here that we want to avoid A, for future-proofing like you all have pointed out. I'm a bit torn between B and C here, but I think C is a bit cleaner. There isn't really any ambiguity about where epochs are found (only within demes), so I don't know if we need the I do agree with Graham that the mixing of upper and lower case is a bit jarring perhaps. I don't know if we want to be flexible and allow either I'm going to wait until we iron out all the details here before finalizing the PR over in popsim-consortium/demes-python#172 |
Speaking of default migration rates, I could imagine something like a classical n-demes-in-a-line type model: ...
defaults:
- migrations.rate: 0.01
demes:
- id: deme_1
epochs:
- initial_size: 1000
...
- id: deme_n
epochs:
- initial_size: 1000
migrations:
- symmetric:
- demes: [deme_1, deme_2]
- ...
- demes: [deme_{n-1}, deme_n] or that migrations:
- defaults:
- rate: 0.01
- symmetric:
- demes: [deme_1, deme_2]
- ...
- demes: [deme_{n-1}, deme_n] |
As a example, we already have
That would work. It's effectively the same as saying
I'd certainly encourage having the rate defined closer to where it applies, as that's clearer to me. But it probably doesn't matter much if both things are supported. |
Great - being able to set default migration rates was just the sort of thing I was thinking about. So, the remaining question is: ...
defaults:
- migrations.rate: 0.01
- epochs.initial_size: 1000
demes:
- id: deme_1
...
- id: deme_n
migrations:
- demes: [deme_1, deme_2]
- ...
- demes: [deme_{n-1}, deme_n] or ...
defaults:
- Migration.rate: 0.01
- Epoch.initial_size: 1000
demes:
- id: deme_1
...
- id: deme_n
migrations:
- demes: [deme_1, deme_2]
- ...
- demes: [deme_{n-1}, deme_n] (I've simplified down the yaml here assuming we do #29; also, seems OK to implicitly define a single Epoch when using the default initial_size like this?) There's not much in it. The only difference I can see is that we have two things called |
I agree that B or C is the way to go. I'm even okay with C, using epoch instead of Epoch? At the level of the YAML, I'm not sure what the E gives us? |
Sure, I just thought it was most natural to think of Epoch and Migration etc as classes so we'd adopt the standard naming convention around that for clarity. So, I think of it as I don't feel strongly about it, happy to go with the consensus here. |
@jeromekelleher, are you suggesting we should allow defaults for any property/attribute? I'm fairly sure we don't want to allow defaults for, say, |
I think we need to have an explicit list of things that are allowed. I've been working on a clean and "obviously right" way to do the default value propagation in my #26 - it's surprisingly tricky. |
While writing the code for #26 I figured out a reasonably simple way to propagate defaults, which currently will work for any property. Parsing out the prefix from e.g. defaults:
migration:
rate: 0.01
epoch:
initial_size: 1000
demes:
- id: deme_1
...
- id: deme_n
migrations:
- demes: [deme_1, deme_2]
- ...
- demes: [deme_{n-1}, deme_n] See what I mean? |
Perfect! |
Yeah, that's quite nice. |
We've talked about how to deal with inherited attributes in a few places, so I thought I'd start an issue here to bring it together.
There's a tension between our goals of (a) providing a specification for humans to read that is non-repetitive; and (b) providing a specification for interchange that's as simple to implement as possible. For (a) we would definitely like to have inherited attributes, because we don't want to write down the same parameters over and again. For (b) we would kinda prefer not to have inherited attributes, because it does make the specification a bit more complex, and we would have to be quite precise about what demes parsers are expected to do.
So, here's a proposal. Our object tree currently looks like this:
I don't like the idea of having things like
cloning_rate
defined as root attributes of the graph, because this implies that everything inherits this as a default (even pulses and migrations, say).What if we added an explicit
defaults
section to the various levels of the tree, and the defaults are then given as fully qualified references? So, something likeSo, the semantics can be defined precisely (if a value isn't given, go up the tree looking at the defaults, and if you don't find any, error out), and we're not cluttering up the namespace with things that aren't actually related to the object in question. The problem of outputting a compact/simplified (I'm on the fence about what the right terminology is here) is one of figuring out what can be put into the defaults to minimise repetition. This is a parsimony problem, which I'm sure we can solve.
The downside is that parsers have to be a bit more complicated, but I think this is something we can live with. So, to be clear, we get rid of the distinction between the low-level fully qualified graph that's used for interchange and settle on this single form which requires that implementations understand how default values work.
What do we think? I know I'm vacillating on what should and shouldn't go in the JSON schema definition (apologies!), but I feel like we're very close to something we can define precisely and start building on!
The text was updated successfully, but these errors were encountered: