Uniform sampling #31

Merged · 3 commits merged into main on Oct 19, 2022
Conversation

@ognian- (Contributor) commented on Oct 18, 2022

Sampling trees uniformly

ognian- self-assigned this on Oct 18, 2022
@willdumm (Contributor) commented on Oct 18, 2022

This looks very nice, but in order to avoid revisiting the sampling code soon (when completing the last step of #29), we think we should try to make sampling a bit more general. This will be very similar to what you have already done here, but with a more general interface.

We have two goals here, with a third distant goal which may influence the design:

  • To sample correctly from the probability distribution on trees induced by any set of (downward conditional) edge probabilities. A tree sampling method could take a reference to a vector of such edge probabilities and choose edges below a clade according to these probabilities (see the sketch after this list).
  • To sample an edge beneath a clade according to the subtree weights beneath the child edges of that clade, cached by SubtreeWeight<WeightOps>.ComputeWeightBelow for arbitrary WeightOps. What you have implemented is an example of this, and we will use something very similar, replacing TreeCount with something else, as part of Sampling from the DAG's topological fringe #29. For example, a tree sampling method could take a reference to a SubtreeWeight instance and a lambda that describes how to convert the subtree weights of edges beneath a clade into a probability distribution on those edges. In the current implementation, the implicit function simply normalizes the subtree weights corresponding to edges to derive a probability distribution. We might want to do something more complicated, for example if subtree weights can have different signs.
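
A minimal sketch of the first goal (the function name and argument layout are mine, not from this PR): drawing one edge below a clade according to a vector of downward conditional edge probabilities, using the standard library's discrete distribution.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper for illustration only: edge_probabilities holds one
// probability per edge below a single clade; the return value is the index
// of the chosen edge within that clade.
std::size_t SampleEdgeBelowClade(const std::vector<double>& edge_probabilities,
                                 std::mt19937& generator) {
  std::discrete_distribution<std::size_t> distribution(
      edge_probabilities.begin(), edge_probabilities.end());
  return distribution(generator);
}

// Example: choose among three child edges with probabilities 0.5, 0.3, 0.2.
// std::mt19937 gen{std::random_device{}()};
// std::size_t chosen = SampleEdgeBelowClade({0.5, 0.3, 0.2}, gen);
```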

Mary and I can think of essentially two ways to achieve these goals. The first is to overload the sampling method so that it takes either a reference to a vector of edge probabilities, indexed by edgeIds, or a SubtreeWeight instance and a lambda function.

The second option would be to implement only the first version of the sampling method, and define some other object which produces (something acting like) a vector of edge probabilities, given a SubtreeWeight instance and a lambda function. This option is slightly favored, given that another distant goal will be to compute node supports using (downward conditional) edge probabilities. It would be convenient to use the same code to compute edge probabilities for sampling as for the node support calculation.
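
To make the second option concrete, here is a compilable toy version with invented names; the lookup through a real SubtreeWeight instance is elided and the per-edge weights are passed in directly, so this sketches only the shape of the interface.

```cpp
#include <functional>
#include <vector>

// Toy stand-in for the "object which produces (something acting like) a
// vector of edge probabilities": it is built from per-edge weights plus a
// caller-supplied weight-to-probability function, and afterwards simply
// exposes the resulting probabilities, so the same object could feed either
// the sampler or a node support calculation.
template <typename Weight>
class EdgeProbabilityVector {
 public:
  EdgeProbabilityVector(
      const std::vector<Weight>& weights_below_edges,
      const std::function<std::vector<double>(const std::vector<Weight>&)>&
          weights_to_probabilities)
      : probabilities_(weights_to_probabilities(weights_below_edges)) {}

  const std::vector<double>& Probabilities() const { return probabilities_; }

 private:
  std::vector<double> probabilities_;  // one entry per edge below a clade
};
```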

The function that takes edge weights and outputs a probability distribution on edges can take a vector of WeightOps::Weights and output a vector of floats (or some other discrete distribution type). Given a SubtreeWeight<WeightOps> instance subtreeweight, the input weight for each edge should be something like WeightOps::AboveNode(subtreeweight.ComputeWeightBelow(edge.childnode()), WeightOps::ComputeEdge(edge)) (with some changes needed, of course). For uniform sampling, this function would just normalize the weights, which would be boost cpp_ints.
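
For the uniform case specifically, the weight-to-probability function could be as simple as the following sketch (again, the name is made up); it assumes the weights are boost cpp_int tree counts and performs the division exactly before a single conversion to double, so very large counts do not overflow an intermediate float.

```cpp
#include <boost/multiprecision/cpp_int.hpp>
#include <vector>

using boost::multiprecision::cpp_int;
using boost::multiprecision::cpp_rational;

// Hypothetical normalization step for uniform sampling: turn the integer
// subtree weights (tree counts) of the edges below a clade into a probability
// distribution over those edges.
std::vector<double> NormalizeTreeCounts(const std::vector<cpp_int>& counts) {
  cpp_int total = 0;
  for (const cpp_int& count : counts) {
    total += count;
  }
  std::vector<double> probabilities;
  probabilities.reserve(counts.size());
  for (const cpp_int& count : counts) {
    // Exact rational division, then one conversion to double per edge.
    probabilities.push_back(cpp_rational(count, total).convert_to<double>());
  }
  return probabilities;
}
```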

@willdumm (Contributor) commented
Ognian has satisfied these concerns. We can use the SampleSelector passed to ExtractTree to modify the behavior of the sampler, including sampling according to a vector of downward conditional edge probabilities. We'll eventually need some convenience methods for building such edge probabilities, and for sampling from them, so that we can use them for e.g. node support calculations.

@marybarker (Contributor) left a comment

Looks good!

ognian- merged commit 9cfefad into main on Oct 19, 2022
ognian- deleted the uniform-sampling branch on October 19, 2022 at 18:33