Sampling from the DAG with probabilities (and constraints) #28

Open
marybarker opened this issue Oct 12, 2022 · 2 comments
marybarker commented Oct 12, 2022

Sampling uniformly from the DAG:

We would like to sample trees from the DAG in a way that takes into account the distribution of trees in the DAG itself.

In order to do this efficiently, we would like to attach probability distributions to edges.
If the function $\mathrm{count}(e)$ yields the number of possible subtrees below edge $e$, then we assign a probability to every edge $e$ in a clade $c$ as its subtree count divided by the total subtree count over all edges in the clade:
$$P(e) = \frac{\mathrm{count}(e)}{\sum_{e_i \in c} \mathrm{count}(e_i)}$$
Then trees can be sampled uniformly by following an iterative process beginning at the UA node:
for each clade below the current node, choose an edge (and hence a child node) from the set of edges descending from that clade according to the weights assigned to those edges.
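
To make the procedure concrete, here is a minimal sketch of that downward pass, assuming each node exposes a `clades` mapping from each child clade to `(child_node, subtree_count)` pairs; these names are illustrative placeholders (not the historydag API), and `subtree_count` plays the role of count(e).

```python
import random

# Hedged sketch of uniform downward sampling. Assumes each node has a
# `clades` attribute mapping each child clade to a list of
# (child_node, subtree_count) pairs; these are placeholder names,
# not the historydag API.
def sample_tree_uniformly(ua_node):
    tree_edges = []
    stack = [ua_node]
    while stack:
        node = stack.pop()
        for clade, edges in node.clades.items():
            children, counts = zip(*edges)
            total = sum(counts)
            # P(e) = count(e) / sum of count(e_i) over the clade's edges
            weights = [c / total for c in counts]
            child = random.choices(children, weights=weights, k=1)[0]
            tree_edges.append((node, child))
            stack.append(child)
    return tree_edges
```

Because the edge weights within each clade are proportional to the subtree counts below them, the per-tree probabilities telescope and every tree in the DAG is returned with equal probability.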

Sampling uniformly from a constrained DAG:

We would also like to sample uniformly from the set of trees in the DAG that contain a given (fixed) node.
There are two components to this problem:

  1. Getting the set of trees that contain a fixed node.
  2. Sampling uniformly from that set.

It turns out that this has a fairly straightforward answer if we have already implemented a uniform sampling method.
There is a way to efficiently count the number of trees that a node shows up in (see the method count_nodes in the historydag repo).
Then we can assign a probability to each node: $P(n) = \frac{|\text{trees containing }n|}{|\text{trees in the DAG}|}$.
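
A tiny sketch of that assignment, assuming we already have per-node containment counts (e.g. from something like the count_nodes method mentioned above) and the total number of trees in the DAG; the argument names are placeholders.

```python
# Hedged sketch: P(n) = |trees containing n| / |trees in the DAG|.
# `node_tree_counts` is an assumed dict from node to the number of trees
# containing it, and `num_trees` is the total number of trees in the DAG.
def node_containment_probabilities(node_tree_counts, num_trees):
    return {node: count / num_trees for node, count in node_tree_counts.items()}
```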

Here is a bit of useful notation:

  • For a node $n$ in a DAG, the leafset of $n$ is the set of leaf nodes reachable below $n$.
  • If we have a tree $T$ containing a node $n$ (and hence a subtree below $n$), then the subtree complement of $n$ is the tree obtained by pruning the entire subtree below $n$ from $T$.

In a DAG, if we fix a given node $n$, then any tree in the DAG that contains $n$ can be partitioned into the subtree below $n$ and a subtree complement, whose leafset is disjoint from the leafset of $n$.

Note that, for a fixed node $n$, sampling a subtree below $n$ and sampling a subtree complement from the DAG are independent.
Therefore, to sample uniformly from the constrained DAG, we can take a uniform sample below $n$ and a uniform sample of the subtree complement separately.
The difficulty lies in how to choose a path 'upward' from $n$ to the UA node such that the resulting path is also drawn from a uniform distribution on such paths.

This is analogous to the 'downward' sampling problem, with the following two small modifications:

  1. Only a single parent node is chosen at each step, whereas in the downward problem we have a single edge chosen from each clade, and a node may have multiple clades.
  2. The probabilities assigned to edges in the upward direction must be calculated based on the choice of the node at each step.

At a given node $n$ with $k$ potential parents $p_1, \ldots, p_k$, we define the probability assigned in the upward direction to the edge $e_i$ connecting $n$ to $p_i$ using Bayes' rule:

$$P^{\text{upward}}(e_i) := P(p_i \mid n) = \frac{P(n \mid p_i)\,P(p_i)}{P(n)} = \frac{P(e_i)\,P(p_i)}{P(n)}$$

Here $P(n)$ and $P(p_i)$ are probabilities assigned to nodes, which we can compute.
Also $P(e_i)$ is the probability assigned to edge $e_i$ based on its membership in a clade.
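
A minimal sketch of the upward walk this defines, assuming precomputed node probabilities and downward conditional edge probabilities; all of the data structures below are assumptions for illustration, not the historydag API.

```python
import random

# Hedged sketch of sampling a path upward from a fixed node n to the UA node,
# using P_upward(e_i) = P(e_i) * P(p_i) / P(n) as above. Assumed placeholders:
#   node_prob[v]      -- P(v), a precomputed node probability
#   edge_prob[(p, v)] -- P(v | p), the downward conditional edge probability
#   parents(v)        -- returns the parent nodes of v in the DAG
def sample_upward_path(n, ua_node, node_prob, edge_prob, parents):
    path = [n]
    current = n
    while current is not ua_node:
        candidate_parents = list(parents(current))
        weights = [
            edge_prob[(p, current)] * node_prob[p] / node_prob[current]
            for p in candidate_parents
        ]
        current = random.choices(candidate_parents, weights=weights, k=1)[0]
        path.append(current)
    return path  # nodes from n up to the UA node
```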

willdumm commented Nov 1, 2022

In general, given downward conditional edge probabilities $P(n_c \mid n_p)$ for each directed edge $(n_p, n_c)$, we can compute the probability of a node $n$ with the following recursion, where $S(n)$ is the set of parent nodes of $n$ in the hDAG and the probability of the UA node $\rho$ is $P(\rho) = 1$:

$$ P(n)=\sum\limits_{n_{p} \in S(n)} P(n_p) P(n | n_p) $$

(We think)

Then the probability of an edge is the probability of its parent node, times its own downward conditional probability.
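
A sketch of that recursion, plus the edge probability just described, computed by visiting nodes in topological order from the UA node; the inputs are assumed placeholders rather than historydag API.

```python
# Hedged sketch of P(n) = sum over parents n_p in S(n) of P(n_p) * P(n | n_p),
# seeded with P(rho) = 1 for the UA node rho. Assumed placeholders:
#   topological_order -- nodes ordered so every parent precedes its children
#   edge_prob[(p, c)] -- the downward conditional probability P(c | p)
#   children(p)       -- returns the child nodes of p in the DAG
def node_probabilities_from_edges(ua_node, topological_order, edge_prob, children):
    prob = {ua_node: 1.0}
    for parent in topological_order:
        for child in children(parent):
            prob[child] = prob.get(child, 0.0) + prob[parent] * edge_prob[(parent, child)]
    return prob

def edge_probabilities(prob, edge_prob):
    # An edge (p, c) gets probability P(p) times its downward conditional P(c | p).
    return {(p, c): prob[p] * p_cond for (p, c), p_cond in edge_prob.items()}
```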

matsen commented Jun 20, 2023

We can discuss, but this doesn't appear to be a direction we are heading because we aren't thinking that the hDAG itself will "cover" enough of the probabilistically-interesting space.
