Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expand the MultiPop functionality #120

Open
gregorgorjanc opened this issue Apr 1, 2023 · 7 comments
Open

Expand the MultiPop functionality #120

gregorgorjanc opened this issue Apr 1, 2023 · 7 comments
Labels
enhancement New feature or request

Comments

@gregorgorjanc
Copy link
Contributor

gregorgorjanc commented Apr 1, 2023

Work with Philip on

  • renaming MegaPop to MultiPop
  • looking into S4 method dispatch
  • expanding other/some AlphaSimR functions so they all work with MultiPop (what are the design principles we will follow?)
  • crossing - how do we handle that???
  • how should we handle information related to MultiPop entries (beyond just the Information on individuals within Pop of a MultiPop)
@philipbg
Copy link
Contributor

Some functions that could be useful in the GenoForage project and beyond:

makeMultipop (parents): this function will take a population of parents and cross them to make families, one family for each subpopulation, and compile these subpopulations/families into a multipop. Eg. F1<-makeMultipop(Parents)

crossWithin(multipop): this function will go through all the subpopulations of a multipop and do crossing within each of them e.g., F2 <- crossWithin(F1)

subset(multipop): this function will take subpopulations from the multipop and sort them into new subpopulations of a new multipop. I would use this for making synthetics from F2s. E.g., Syn1 <- subset(F2 after selection)

popSelection(multipop): this function will conduct selection among subpopulations on the basis of each subpopulation’s trait, e.g., mean phenotype. It is what we have been calling sward selection in the GenoForage project. E.g., F2 after selection <- popSelection(F2) [partially covered by selectFam() which takes a multipop argument, but is for full sib families]

Storage of information:

Along with the subpopulations I would need to be able to store various pieces of information about each subpopulation, such as their population-level phenotype. These would need to be updated during different function calls. For example, when I do a crossWithin() I want to update the subpopulation phenotypes.

@gaynorr gaynorr added this to the Release 1.5 milestone Jun 26, 2023
@gaynorr gaynorr added the enhancement New feature or request label Jun 26, 2023
@gaynorr
Copy link
Owner

gaynorr commented Jun 26, 2023

I'll tentatively put an initial version of this functionality on the roadmap for 1.5, because I haven't started on the 1.5 release yet and this will keep these ideas from being forgotten by me.

@gregorgorjanc
Copy link
Contributor Author

Met with @philipbg today. He will provide PR for "renaming MegaPop to MultiPop". This is just cosmetics, but it helps with understanding that this new class / object will hold multiple populations.

In the meeting, we identified a couple of interesting points.

  • the current pop@misc slot is a list that is meant to be of length nInd(pop) hence holding values for individuals
  • there might be cases where we would like to save a value that pertains to the whole population, and not any particular individual - some people in the lab use pop@misc slot for this, but it seems we really should have pop@popMisc
  • potentially call these slots pop@indMisc and pop@popMics to make it clear what they hold

@philipbg philipbg mentioned this issue Jul 10, 2023
@gaynorr
Copy link
Owner

gaynorr commented Jul 12, 2023

@gregorgorjanc The main reservation I have about population level slots is dealing with concatenation c() and subsetting [] of populations. When you change the population, you have to have rules in place for how to handle these slots and I don't currently have a good idea for how to best handle this.

@gregorgorjanc
Copy link
Contributor Author

@gaynorr I know what you mean!

Correct, one would have to recalculate pop@popMisc value if we do c(pop, pop2) or pop[subset]! In SIMplyBee we decided to always request calculation of colony values from individual values for the same reason - colonies are changing all the time! Based on that, we named functions getX() if they are extracting/getting existing values and calcX() if they are calculating values based on existing values. Somewhat like you do with gv() (extract/get genetic value) and bv() (calculate breeding value).

Maybe this then means that pop@popMisc is not a good idea at all? However, in some breeding populations having such a container is useful. If such a container is not available, then we have to do stuff like list(pop, popMisc) and the scripting becomes somewhat more tedious :(

Any further thoughts?

@gaynorr
Copy link
Owner

gaynorr commented Jul 12, 2023

@gregorgorjanc, I have been considering implementing a phenotype recording system in SimParam. The idea would be for it to serve as a database for all simulated phenotypes. Perhaps this sort of data would make sense living somewhere like this.

I can already track all genotypes created (in most cases) by using the recombination tracking option in SimParam. SimParam also stores the founder population. As long as there is only one founder population, every individual every simulated could have its genotype reconstructed with these two pieces of information. The idea here would be to use the phenotype records and genotype records as an alternative method for training GS models or calculating BLUPs across multiple stages of selection.

@gregorgorjanc
Copy link
Contributor Author

@gaynorr yeah, we definitely need some sort of a database system, particularly once we get into repeated phenotypes! For example, in dairy cattle simulations we did something like this

…
database = recordData(database, cowsLact2, year)
cowsLact2 = setPhenoCow(cowsLact2, meanLact2, year, …)
cowsLact2 = selectInd(cowsLact1, …)

database = recordData(database, cowsLact1, year)
cowsLact1 = setPhenoCow(cowsLact1, meanLact1, …)
cowsLact1 = selectInd(heifers, …)

database = recordData(database, heifers, year)
heifers = selectInd(femaleCalves, …)

database = recordData(database, femaleCows, year)
femaleCalves = randCross(…)
...

and database is just a list of data.frames - it actually works quite well, but one could and likely should think about a proper SQL database too! The challenge with pushing everything into SimParam could be that you might not think of some detail that a user comes up, for example, we added lactation mean, herd effect, herd-year effect, permanent environment effects etc. into setPhenoCow() and recordData() grabs that info (most of it is in the misc slot!). It's hard to envision the requirements for storing such meta-data in the SimParam!? Perhaps function hooks could do all that!?

But we are digressing;) How do we make work with a collection of populations (in MultiPop) as easy as we can now work with a collection of individuals (in Pop)? That's what started this thread;)

We have a valid concern that sub-setting (pop[subset]) or combining (c(pop1, pop2)) challenge the use of population-level information pop@popMisc - because these two actions would make this population-level information inaccurate/outdated. We could alternatively have calcX(pop) functions, but then I still have to store that output somewhere and if I am doing selections between populations, say multiPop[subset] or selectPop(multiPop, ...), then it feels natural to have that output/information attached to the populations. Maybe functions like selectPop(multiPop, ...) should all have a FUN argument where a user can pass stuff like calcX()? A rather messy way too look at this is to recognise that users can put anything into pop@popMisc so it's their responsibility to manage it!? Not ideal.

Maybe the following analogy could help us with the thinking. In plant breathing we often work with plots full of plants. At the moment we often represent plot with a single genotype in AlphaSimR and a population would represent many such plots, which is an approximation, unless the genotype is indeed fully inbred. What if we would actually like to generate individuals within a plot and have plots represented with populations? How would we then be doing the scripting? For example, say we want to select between plots using plot yields, which would be sum(pop@pheno).

gaynorr added a commit that referenced this issue Oct 30, 2023
-added `altAddTraitAD` #140

-add miscPop slot to pop #120

-expanded functionality of candidates in selection #138
@gaynorr gaynorr removed this from the Release 1.5 milestone Oct 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants