Sampling uniformly from the DAG:
We would like to sample trees from the DAG in a way that takes into account the distribution of trees in the DAG itself.
In order to do this efficiently, we would like to attach probability distributions to edges.
If the function $count(e)$ yields the number of subtrees possible below edge $e$, then we assign to every edge $e$ in a clade $c$ a probability equal to its subtree count divided by the total subtree count over all the edges in the clade:
$$P(e) = \frac{count(e)}{\sum_{e_i \in c} count(e_i)}$$
Then trees can be sampled uniformly by following an iterative process beginning at the UA node:
for each clade below the current node, choose an edge (and hence a child node) from the set of edges descending from the clade, with probability given by the weights assigned to those edges, then repeat at each chosen child node.
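The counting and sampling steps above can be sketched on a toy example. This is only an illustration under stated assumptions: the dictionary `DAG`, its node names, and the helpers `count_below` and `sample_tree` are hypothetical and are not the historydag API. Here `count(e)` is taken to be the number of subtrees rooted at the child node of `e`.

```python
import random
from collections import Counter
from functools import lru_cache

# Hypothetical toy DAG on leaves A, B, C. Each node maps to its clades;
# each clade lists the candidate child nodes (one edge per candidate).
DAG = {
    "UA": [["r1", "r2"]],
    "r1": [["ab1", "ab2"], ["C"]],
    "r2": [["A"], ["bc"]],
    "ab1": [["A"], ["B"]],
    "ab2": [["A"], ["B"]],
    "bc": [["B"], ["C"]],
    "A": [], "B": [], "C": [],
}


@lru_cache(maxsize=None)
def count_below(node):
    """Number of distinct subtrees rooted at `node` (1 for a leaf)."""
    total = 1
    for clade in DAG[node]:
        # Choices in different clades are independent, so counts multiply.
        total *= sum(count_below(child) for child in clade)
    return total


def sample_tree(node="UA", rng=random):
    """Sample a subtree below `node`, returned as a sorted tuple of edges."""
    edges = []
    for clade in DAG[node]:
        # Weight each edge in the clade by the subtree count of its child.
        weights = [count_below(child) for child in clade]
        child = rng.choices(clade, weights=weights)[0]
        edges.append((node, child))
        edges.extend(sample_tree(child, rng))
    return tuple(sorted(edges))


if __name__ == "__main__":
    rng = random.Random(0)
    freq = Counter(sample_tree("UA", rng) for _ in range(3000))
    for tree, n in freq.items():
        print(n, tree)
```

In this toy DAG the edge from the UA node to `r1` gets weight 2 and the edge to `r2` gets weight 1, so each complete tree should be drawn with roughly equal frequency.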
Sampling uniformly from a constrained DAG:
We would also like to sample uniformly from the set of trees in the DAG that contain a given (fixed) node.
There are 2 components to this problem:
1. Getting the set of trees that contain a fixed node.
2. Sampling uniformly from that set.
It turns out that this has a fairly straightforward answer, if we have already implemented a uniform sampling method.
There is a way to efficiently count the number of trees that a node shows up in (see the method `count_nodes` in the historydag repo).
Then we can assign probabilities to each node: $P(n) = \frac{|\text{trees containing }n|}{|\text{trees in the DAG}|}$.
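One way to obtain these node probabilities can be sketched as follows. This is not the `count_nodes` implementation from historydag; it is a minimal illustration that assumes the number of trees containing a node equals the number of subtrees below it times the number of ways to complete the tree around it. The toy `DAG` and the helpers `count_below`, `count_above`, and `node_probability` are hypothetical names.

```python
from fractions import Fraction
from functools import lru_cache

# Hypothetical toy DAG: node -> list of clades, clade -> candidate children.
DAG = {
    "UA": [["r1", "r2"]],
    "r1": [["ab1", "ab2"], ["C"]],
    "r2": [["A"], ["bc"]],
    "ab1": [["A"], ["B"]],
    "ab2": [["A"], ["B"]],
    "bc": [["B"], ["C"]],
    "A": [], "B": [], "C": [],
}

# For each node, record every (parent, clade index) it descends from.
PARENTS = {node: [] for node in DAG}
for parent, clades in DAG.items():
    for i, clade in enumerate(clades):
        for child in clade:
            PARENTS[child].append((parent, i))


@lru_cache(maxsize=None)
def count_below(node):
    """Number of distinct subtrees rooted at `node` (1 for a leaf)."""
    total = 1
    for clade in DAG[node]:
        total *= sum(count_below(child) for child in clade)
    return total


@lru_cache(maxsize=None)
def count_above(node):
    """Number of distinct ways to complete a tree around `node`."""
    if node == "UA":
        return 1
    total = 0
    for parent, i in PARENTS[node]:
        ways = count_above(parent)
        # The parent's other clades can be resolved freely.
        for j, clade in enumerate(DAG[parent]):
            if j != i:
                ways *= sum(count_below(child) for child in clade)
        total += ways
    return total


def node_probability(node):
    """Fraction of trees in the DAG that contain `node`."""
    return Fraction(count_above(node) * count_below(node), count_below("UA"))


if __name__ == "__main__":
    for name in DAG:
        print(name, node_probability(name))
```

Exact fractions are used so that the probabilities can be compared without rounding error.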
Here is a bit of useful notation:
For a node n in a DAG, the leafset of n is the set of leaf nodes reachable below n.
If we have a tree T and a node n in T, then the subtree complement of n is what remains of T after pruning the entire subtree below n.
In a DAG, if we fix a given node, then any tree in the DAG that contains the node can be partitioned into the subtree below n and a subtree complement, with leafset that is disjoint from the leafset of n.
Note that, for a fixed node n, sampling a subtree below n and sampling a subtree-complement from the DAG are independent.
Therefore to sample uniformly, given the constrained DAG, we can use a uniform sample below n and a uniform sample taken from the subtree complement separately.
The difficulty lies in how to choose a path 'upward' from n to the UA such that the resulting path is also taken from a uniform distribution on such paths.
This is analogous to the 'downward' sampling problem, with the following 2 small modifications:
1. Only a single parent node is chosen at each step, whereas in the downward problem we have a single edge chosen from each clade, and a node may have multiple clades.
2. The probabilities assigned to edges in the upward direction must be calculated based on the choice of the node at each step.
At a given node $n$ with $k$ potential parents $p_1, \ldots, p_k$, we define the probability to assign in the upward direction to the edge $e_i$ connecting $n$ to $p_i$ using Bayes' rule as:
$$P(p_i \mid n) = \frac{P(e_i)\, P(p_i)}{P(n)}$$
Here $P(n)$ and $P(p_i)$ are probabilities assigned to nodes, which we can compute.
Also $P(e_i)$ is the probability assigned to edge $e_i$ based on its membership in a clade.
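The upward step can be sketched on a toy example as below. Everything here is an assumption made for illustration: the toy `DAG`, the helper names, and the choice to compute node probabilities by summing, over parents, the parent's probability times the downward edge probability. It also assumes a node appears in at most one clade of any given parent. It is not the historydag implementation.

```python
import random
from fractions import Fraction
from functools import lru_cache

# Hypothetical toy DAG: node -> list of clades, clade -> candidate children.
DAG = {
    "UA": [["r1", "r2"]],
    "r1": [["ab1", "ab2"], ["C"]],
    "r2": [["A"], ["bc"]],
    "ab1": [["A"], ["B"]],
    "ab2": [["A"], ["B"]],
    "bc": [["B"], ["C"]],
    "A": [], "B": [], "C": [],
}

PARENTS = {node: [] for node in DAG}
for parent, clades in DAG.items():
    for i, clade in enumerate(clades):
        for child in clade:
            PARENTS[child].append((parent, i))


@lru_cache(maxsize=None)
def count_below(node):
    """Number of distinct subtrees rooted at `node` (1 for a leaf)."""
    total = 1
    for clade in DAG[node]:
        total *= sum(count_below(child) for child in clade)
    return total


def edge_prob(parent, i, child):
    """Downward probability of the edge parent -> child within clade i."""
    clade = DAG[parent][i]
    return Fraction(count_below(child), sum(count_below(c) for c in clade))


@lru_cache(maxsize=None)
def node_prob(node):
    """Probability that `node` appears in a uniformly sampled tree."""
    if node == "UA":
        return Fraction(1)
    return sum(edge_prob(p, i, node) * node_prob(p) for p, i in PARENTS[node])


def upward_probs(node):
    """Map each parent p of `node` to P(e) * P(p) / P(node) (Bayes' rule)."""
    return {
        p: edge_prob(p, i, node) * node_prob(p) / node_prob(node)
        for p, i in PARENTS[node]
    }


def sample_path_up(node, rng=random):
    """Sample a path from `node` up to the UA node, one parent per step."""
    path = [node]
    while node != "UA":
        probs = upward_probs(node)
        parents = list(probs)
        weights = [float(probs[p]) for p in parents]
        node = rng.choices(parents, weights=weights)[0]
        path.append(node)
    return path


if __name__ == "__main__":
    print(upward_probs("C"))
    print(sample_path_up("C", random.Random(1)))
```

In this toy DAG two of the three trees reach leaf `C` through `r1` and one reaches it through `bc`, and the upward probabilities at `C` reflect that ratio.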
In general, given downward conditional edge probabilities $P(n_c | n_p)$ for each directed edge $(n_p, n_c)$, we can compute the probability of a node $n$ with the following recursion, where $S(n)$ is the set of parent nodes of $n$ in the hDAG and the UA node $\rho$ has probability $P(\rho) = 1$:
$$P(n) = \sum_{p \in S(n)} P(n \mid p)\, P(p)$$
Then the probability of an edge is the probability of its parent node, times its own downward conditional probability.
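As a consistency check, and assuming the upward probability of choosing parent $p$ from node $n$ is defined by Bayes' rule as $P(p \mid n) = P(n \mid p)\, P(p) / P(n)$ (the notation $P(p \mid n)$ is ours), the upward probabilities over the parents of $n$ sum to one. The middle step uses the recursion for $P(n)$ over the parent set $S(n)$, that is, $P(n) = \sum_{p \in S(n)} P(n \mid p)\, P(p)$:

$$\sum_{p \in S(n)} P(p \mid n) = \frac{1}{P(n)} \sum_{p \in S(n)} P(n \mid p)\, P(p) = \frac{P(n)}{P(n)} = 1$$

So each upward step is a valid probability distribution over parents, provided $P(n) > 0$.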
willdumm changed the title from "Sampling from the DAG with probabilities (and constriants)" to "Sampling from the DAG with probabilities (and constraints)" on Nov 16, 2022
We can discuss, but this doesn't appear to be a direction we are heading because we aren't thinking that the hDAG itself will "cover" enough of the probabilistically-interesting space.