DRAGON computes correlations but not p-values #298

marouenbg · 2023-02-10T13:20:02Z

I am posting here for tracking purposes. In DRAGON, p-value computation sometimes fails when the data are large, so I can't use p-values to prune large-scale networks.

The function fails in
https://github.com/netZoo/netZooPy/blob/master/netZooPy/dragon/dragon.py#L242

Here are the parameters of the failure:
Dlogli11 = lambda x: (1./4*p1*(p1-1)
                      *(sc.digamma(x/2)-sc.digamma((x-1)/2))
                      +term_Dlogli11)

term_Dlogli11=-77.92618920329457
p1=21337
n=832

Dlogli11(1.001)=227465526608.52557
Dlogli11(1000*n)=58.86679539046301

Here is the error:

Traceback (most recent call last):
  File "", line 2, in
  File "/home/ubuntu/netZooPy/netZooPy/dragon/dragon.py", line 311, in estimate_p_values_dragon
    simultaneous=simultaneous) for seedi in range(10)]
  File "/home/ubuntu/netZooPy/netZooPy/dragon/dragon.py", line 311, in
    simultaneous=simultaneous) for seedi in range(10)]
  File "/home/ubuntu/netZooPy/netZooPy/dragon/dragon.py", line 250, in estimate_kappa_dragon
    kappa11 = optimize.bisect(Dlogli11, 1.001, 1000*n)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/scipy/optimize/zeros.py", line 549, in bisect
    r = _zeros._bisect(f, a, b, xtol, rtol, maxiter, args, full_output, disp)
ValueError: f(a) and f(b) must have different signs
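
The endpoint values reported above can be re-derived without running DRAGON. The following is a minimal sketch, not the netZooPy code: it hard-codes the parameters from this report and, purely as an assumption so that it runs without SciPy, swaps sc.digamma for a small pure-Python approximation (the helper name digamma_approx is hypothetical).

```python
import math


def digamma_approx(x):
    """Hypothetical stand-in for scipy.special.digamma, valid for x > 0."""
    result = 0.0
    # Shift x upward with psi(x) = psi(x + 1) - 1/x until the
    # asymptotic series below is accurate.
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # Asymptotic series: ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    result += (math.log(x) - 0.5 / x
               - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252)))
    return result


# Parameters copied from the failing run reported in this issue.
term_Dlogli11 = -77.92618920329457
p1 = 21337
n = 832


def Dlogli11(x):
    # Same expression as in the report, with digamma_approx in place of sc.digamma.
    return (1. / 4 * p1 * (p1 - 1)
            * (digamma_approx(x / 2) - digamma_approx((x - 1) / 2))
            + term_Dlogli11)


f_lo = Dlogli11(1.001)      # lower bracket passed to optimize.bisect
f_hi = Dlogli11(1000 * n)   # upper bracket passed to optimize.bisect
same_sign = (f_lo > 0) == (f_hi > 0)
print(f_lo, f_hi, same_sign)
```

Both endpoint values are positive, which is the condition optimize.bisect rejects. As x grows, the digamma difference shrinks toward zero and the function approaches term_Dlogli11, which is negative here, so a sign change may exist beyond 1000*n; whether a kappa that large is meaningful is a separate question.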

marouenbg · 2023-04-21T13:58:52Z

@katehoffshutta should we close this?

katehoffshutta · 2023-04-21T17:28:11Z

@marouenbg thanks for following up!

I would like to implement some control flow to help the user diagnose this problem before we close this issue.

However, as an update, the code for the solution has been implemented in PR #301.

The source of the issue was fitting the parameter kappa of the distribution described in Eq. (10) of the DRAGON paper (https://doi.org/10.1093/nar/gkac1157). When the ratio of predictor count P to sample size N is large, the model is not able to fit kappa, so the null density in Eq. (10) cannot be used.

As an alternative, we have implemented a Monte Carlo-based p-value calculation. We simulate data under the null hypothesis of zero partial correlation between any of the variables in the model. The edge weights of the DRAGON-estimated GGM are IID under this assumption and serve as an empirical null distribution. For P predictors, the number of edges is P(P-1)/2, so the number of null draws, and with it the precision of the resulting empirical p-values, grows quadratically with P. I expect this precision to be sufficient for most practical applications (keeping in mind that the utility of this method is in the high-P, low-N situation).
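
To make the idea concrete, here is a minimal sketch of turning null edge weights into empirical p-values. It is not the code from PR #301: the function names are hypothetical, the two-sided comparison on absolute edge weights and the "+1" correction are assumptions on my part, and the null weights are simply passed in rather than simulated with DRAGON.

```python
from bisect import bisect_left


def n_edges(p):
    """Number of distinct edges among p variables: p(p-1)/2."""
    return p * (p - 1) // 2


def empirical_p_values(observed, null_weights):
    """Two-sided empirical p-values of observed edge weights.

    Assumption: an edge is "at least as extreme" as a null draw when its
    absolute weight is greater than or equal to the null absolute weight.
    The +1 in numerator and denominator keeps every p-value above zero.
    """
    null_abs = sorted(abs(w) for w in null_weights)
    m = len(null_abs)
    p_values = []
    for w in observed:
        # Count null draws with |null| >= |w| using a binary search.
        n_extreme = m - bisect_left(null_abs, abs(w))
        p_values.append((n_extreme + 1) / (m + 1))
    return p_values


# With P = 1300 predictors the null distribution has this many edge weights,
# so the smallest attainable p-value under this scheme is 1 / (n_null + 1).
n_null = n_edges(1300)
smallest_p = 1 / (n_null + 1)
```

Under these assumptions the p-value resolution is set by the number of null edge weights, which is why it improves quadratically as P grows.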

Note that the Monte Carlo calculation can be somewhat slow. For example, with P=1300 and N=10000, the parametric distribution of equation (10) can be fit in 26 seconds while the Monte Carlo distribution takes 4 minutes 17 seconds. This could be further optimized with parallelization if we want to explore that.

I have a Jupyter notebook demonstrating the comparison of these two methods - should we host that somewhere? Perhaps netbooks?

marouenbg · 2023-04-22T22:47:12Z

Nice work, Kate. Whenever you push your final updates, feel free to close this issue. Yes, a new notebook, or an extension of the existing DRAGON vignette, would be helpful.

katehoffshutta · 2023-05-26T18:04:56Z

@marouenbg @violafanfani We can close this issue; the code updates for the Monte Carlo p-values are active in the franklin release (0.9.15): https://github.com/netZoo/netZooPy/releases/tag/0.9.15

The control flow question remains open, but we are going to bundle this in with some other tasks to improve the DRAGON user workflow so I will make this a separate issue.
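
One possible shape for that control flow, as a sketch only: check the bracket before root finding and, if there is no sign change, warn and fall back to the Monte Carlo route. The function name, the returned labels, and the warning text are all hypothetical and do not correspond to anything in netZooPy.

```python
import warnings


def choose_pvalue_method(score, lo, hi):
    """Pick a p-value route based on whether [lo, hi] brackets a root of score.

    score is the function that would be handed to a bisection root finder
    (Dlogli11 in this issue); lo and hi are the bracket endpoints.
    """
    f_lo, f_hi = score(lo), score(hi)
    # A root at an endpoint, or opposite signs at the endpoints, means
    # bisection can proceed.
    bracketed = f_lo == 0 or f_hi == 0 or (f_lo > 0) != (f_hi > 0)
    if bracketed:
        return "parametric"
    warnings.warn(
        "kappa could not be bracketed: score({}) = {:.6g} and score({}) = {:.6g} "
        "have the same sign; falling back to Monte Carlo p-values.".format(
            lo, f_lo, hi, f_hi))
    return "monte_carlo"
```

A caller could branch on the returned label, so the user sees a message about kappa instead of the bare "f(a) and f(b) must have different signs" error.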

marouenbg mentioned this issue Apr 19, 2023

Add mc p-val function for DRAGON #301

Merged

violafanfani closed this as completed May 26, 2023

marouenbg mentioned this issue Dec 18, 2023

DRAGON- Ambiguous exception thrown during calculation of p-values #334

Open
