Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DRAGON computes correlations but not p-values #298

Closed
marouenbg opened this issue Feb 10, 2023 · 4 comments
Closed

DRAGON computes correlations but not p-values #298

marouenbg opened this issue Feb 10, 2023 · 4 comments

Comments

@marouenbg
Copy link
Collaborator

Hi @katehoffshutta,

I am posting here for tracking purposes. So In DRAGON, sometimes when the data is large, I can't compute p-values to reduce large-scale networks.

The function fails in
https://github.com/netZoo/netZooPy/blob/master/netZooPy/dragon/dragon.py#L242

Here are the parameters of the failure:
Dlogli11 = lambda x: (1./4p1(p1-1)
*(sc.digamma(x/2)-sc.digamma((x-1)/2))
+term_Dlogli11)

term_Dlogli11=-77.92618920329457
p1=21337
n=832

Dlogli11(1.001)=227465526608.52557
Dlogli11(1000*n)=58.86679539046301

Here is the error:

Traceback (most recent call last):
File "", line 2, in
File "/home/ubuntu/netZooPy/netZooPy/dragon/dragon.py", line 311, in estimate_p_values_dragon
simultaneous=simultaneous) for seedi in range(10)]
File "/home/ubuntu/netZooPy/netZooPy/dragon/dragon.py", line 311, in
simultaneous=simultaneous) for seedi in range(10)]
File "/home/ubuntu/netZooPy/netZooPy/dragon/dragon.py", line 250, in estimate_kappa_dragon
kappa11 = optimize.bisect(Dlogli11, 1.001, 1000*n)
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/scipy/optimize/zeros.py", line 549, in bisect
r = _zeros._bisect(f, a, b, xtol, rtol, maxiter, args, full_output, disp)
ValueError: f(a) and f(b) must have different signs

@marouenbg
Copy link
Collaborator Author

@katehoffshutta should we close this?

@katehoffshutta
Copy link
Contributor

@marouenbg thanks for following up!

I would like to implement some control flow to help the user diagnose this problem before we close this issue.

However, as an update, the code for the solution has been implemented in PR #301.

The source of the issue was fitting the parameter kappa of the distribution described in Eq. (10) of the DRAGON paper (https://doi.org/10.1093/nar/gkac1157). In cases of a large predictor count P: sample size N ratio, the model is not able to fit kappa and so the null density in Eq. (10) cannot be used.

As an alternative, we have implemented a Monte Carlo-based p-value calculation. We simulate data under the null hypothesis of zero partial correlation between any of the variables in the mode. The edge weights of the DRAGON-estimated GGM are IID under this assumption and serve as an empirical null distribution. For P predictors, the number of edges is P(P-1)/2 and so the precision of the resulting empirical p-values is a quadratic function of the number of predictors P. I expect this precision to be sufficient for most practical applications (keeping in mind that the utility of this method is in the high-P, low-N situation).

Note that the Monte Carlo calculation can be somewhat slow. For example, with P=1300 and N=10000, the parametric distribution of equation (10) can be fit in 26 seconds while the Monte Carlo distribution takes 4 minutes 17 seconds. This could be further optimized with parallelization if we want to explore that.

I have a Jupyter notebook demonstrating the comparison of these two methods - should we host that somewhere? Perhaps netbooks?

@marouenbg
Copy link
Collaborator Author

Nice work Kate, then whenever you push your final updates, feel free to close this issue. Yes, a new notebook-or extending the existing Dragon vignette will be helpful.

@katehoffshutta
Copy link
Contributor

@marouenbg @violafanfani We can close this issue; the code updates for the Monte Carlo p-values are active in the franklin release (0.9.15): https://github.com/netZoo/netZooPy/releases/tag/0.9.15

The control flow question remains open, but we are going to bundle this in with some other tasks to improve the DRAGON user workflow so I will make this a separate issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants