Running example notebook is taking a long time #11

Closed
liboxun opened this issue May 8, 2020 · 5 comments

liboxun commented May 8, 2020

I'm trying to run the PBMC tutorial Jupyter notebook (PBMC10k_SCENIC-protocol-CLI.ipynb).

The pyscenic ctx step is taking a long time: it has been running for two days on an on-campus HPC service and still has not finished. I'm starting to think I may have overlooked something.

How long should it typically take to run pyscenic ctx for the PBMC example?

Thanks in advance!

Boxun

liboxun (Author) commented May 8, 2020

Here's the output I got so far:

2020-05-06 10:42:51,713 - pyscenic.cli.pyscenic - INFO - Creating modules.

2020-05-06 10:42:54,498 - pyscenic.cli.pyscenic - INFO - Loading expression matrix.

2020-05-06 10:43:00,178 - pyscenic.utils - INFO - Calculating Pearson correlations.

2020-05-06 10:43:00,178 - pyscenic.utils - WARNING - Note on correlation calculation: the default behaviour for calculating the correlations has changed after pySCENIC verion 0.9.16. Previously, the default was to calculate the correlation between a TF and target gene using only cells with non-zero expression values (mask_dropouts=True). The current default is now to use all cells to match the behavior of the R verision of SCENIC. The original settings can be retained by setting 'rho_mask_dropouts=True' in the modules_from_adjacencies function, or '--mask_dropouts' from the CLI.
Dropout masking is currently set to [True].
/home2/s418610/.conda/envs/py37_res_GRN/lib/python3.7/site-packages/pyscenic/utils.py:138: RuntimeWarning: invalid value encountered in greater
regulations = (rhos > rho_threshold).astype(int) - (rhos < -rho_threshold).astype(int)
/home2/s418610/.conda/envs/py37_res_GRN/lib/python3.7/site-packages/pyscenic/utils.py:138: RuntimeWarning: invalid value encountered in less
regulations = (rhos > rho_threshold).astype(int) - (rhos < -rho_threshold).astype(int)

2020-05-06 10:43:29,853 - pyscenic.utils - INFO - Creating modules.

2020-05-06 10:45:26,430 - pyscenic.cli.pyscenic - INFO - Loading databases.

2020-05-06 10:45:26,434 - pyscenic.cli.pyscenic - INFO - Calculating regulons.

And it's been running 'Calculating regulons' since then.
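To see how quickly the earlier steps finished, the timestamps in the log above can be subtracted from one another. The following is a minimal sketch, assuming the log lines keep the `timestamp - logger - level - message` layout shown above; the helper names `parse_steps` and `step_durations` are made up for illustration and are not part of pySCENIC. The WARNING line is left out for brevity.

```python
from datetime import datetime

# INFO lines copied from the log output above.
LOG = """\
2020-05-06 10:42:51,713 - pyscenic.cli.pyscenic - INFO - Creating modules.
2020-05-06 10:42:54,498 - pyscenic.cli.pyscenic - INFO - Loading expression matrix.
2020-05-06 10:43:00,178 - pyscenic.utils - INFO - Calculating Pearson correlations.
2020-05-06 10:43:29,853 - pyscenic.utils - INFO - Creating modules.
2020-05-06 10:45:26,430 - pyscenic.cli.pyscenic - INFO - Loading databases.
2020-05-06 10:45:26,434 - pyscenic.cli.pyscenic - INFO - Calculating regulons.
"""


def parse_steps(log_text):
    """Return (timestamp, message) pairs for lines that start with a timestamp."""
    steps = []
    for line in log_text.splitlines():
        parts = line.split(" - ", 3)  # timestamp, logger, level, message
        if len(parts) != 4:
            continue
        try:
            ts = datetime.strptime(parts[0], "%Y-%m-%d %H:%M:%S,%f")
        except ValueError:
            continue  # not a log line (e.g. a Python warning)
        steps.append((ts, parts[3]))
    return steps


def step_durations(steps):
    """Seconds between each logged step and the next one."""
    return [
        (msg, (nxt - ts).total_seconds())
        for (ts, msg), (nxt, _) in zip(steps, steps[1:])
    ]


if __name__ == "__main__":
    for msg, seconds in step_durations(parse_steps(LOG)):
        print(f"{seconds:9.3f} s  {msg}")
```

By this reading, everything up to the start of 'Calculating regulons' took under three minutes, so the time is being spent in that last step alone.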

cflerin (Contributor) commented May 8, 2020

Hi @liboxun ,

It should not take 2+ days to run this step. Depending on the number of processes used, I'd expect it to complete in under an hour at worst. I would suggest maybe stopping the process, and re-starting it. Also, are you using the same database files as in the tutorial?

liboxun (Author) commented May 8, 2020

Hi @cflerin ,

Thanks for the quick reply! Good to know.

I've submitted multiple jobs (with the same script), and none of them finished within a day. I use 32 processes, matching the number of cores on the HPC node I use, so restarting does not seem to solve the problem.
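On a shared HPC node, the number of cores a job may actually use can be lower than the number of cores the machine has. Whether that plays any role in this issue is not known; the following is only a sketch of how the two numbers could be compared from Python. The helper names `usable_cpu_count` and `safe_worker_count` are hypothetical, and `os.sched_getaffinity` is available on Linux only.

```python
import os


def usable_cpu_count():
    """CPUs this process is allowed to run on (falls back to os.cpu_count())."""
    if hasattr(os, "sched_getaffinity"):  # Linux only
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


def safe_worker_count(requested):
    """Cap a requested worker count at the usable CPU count, with a floor of 1."""
    return max(1, min(requested, usable_cpu_count()))


if __name__ == "__main__":
    print("cores on the node:       ", os.cpu_count())
    print("cores usable by this job:", usable_cpu_count())
    print("workers to request:      ", safe_worker_count(32))
```

If the second number is smaller than 32, requesting 32 processes would oversubscribe the cores the scheduler has granted.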

I believe I'm using the same databases as in the tutorial. Quoting the PBMC10k_SCENIC-protocol-CLI.ipynb:

# ranking databases
f_db_glob = "/ddn1/vol1/staging/leuven/res_00001/databases/cistarget/databases/homo_sapiens/hg38/refseq_r80/mc9nr/gene_based/*feather"
f_db_names = ' '.join( glob.glob(f_db_glob) )

# motif databases
f_motif_path = "/ddn1/vol1/staging/leuven/res_00001/databases/cistarget/motif2tf/motifs-v9-nr.hgnc-m0.001-o0.0.tbl"

In comparison, the databases I'm using are downloaded from:

https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/refseq_r80/mc9nr/gene_based/hg38__refseq-r80__10kb_up_and_down_tss.mc9nr.feather

https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/refseq_r80/mc9nr/gene_based/hg38__refseq-r80__500bp_up_and_100bp_down_tss.mc9nr.feather

https://resources.aertslab.org/cistarget/motif2tf/motifs-v9-nr.hgnc-m0.001-o0.0.tbl

To me it seems they match up.
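Beyond matching the names, it may be worth confirming that each downloaded file is actually present and not empty, since an interrupted download would keep the right name. A minimal sketch, assuming the files sit in a single local directory; the directory name `databases` and the helper names are hypothetical, and the URLs are the three listed above.

```python
from pathlib import Path
from urllib.parse import urlparse

# URLs listed above.
DB_URLS = [
    "https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/refseq_r80/mc9nr/gene_based/hg38__refseq-r80__10kb_up_and_down_tss.mc9nr.feather",
    "https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/refseq_r80/mc9nr/gene_based/hg38__refseq-r80__500bp_up_and_100bp_down_tss.mc9nr.feather",
    "https://resources.aertslab.org/cistarget/motif2tf/motifs-v9-nr.hgnc-m0.001-o0.0.tbl",
]


def expected_filenames(urls):
    """Return the file name at the end of each URL path."""
    return [Path(urlparse(u).path).name for u in urls]


def missing_or_empty(local_dir, urls):
    """List expected files that are absent or zero bytes in local_dir."""
    local_dir = Path(local_dir)
    problems = []
    for name in expected_filenames(urls):
        candidate = local_dir / name
        if not candidate.is_file() or candidate.stat().st_size == 0:
            problems.append(name)
    return problems


if __name__ == "__main__":
    # "databases" is a hypothetical local download directory.
    print(missing_or_empty("databases", DB_URLS))
```

An empty list only shows that the files exist and have some content; it does not verify that they are complete.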

cflerin (Contributor) commented May 18, 2020

Hi @liboxun , I think we solved your issue in the pySCENIC issue tracker, but for anyone else having the same issue, I'll leave this link to a list of recommendations that could potentially solve this:
aertslab/pySCENIC#142 (comment)

cflerin closed this as completed May 18, 2020

liboxun (Author) commented May 20, 2020

Hi @cflerin ,

Yes, and thank you!

For anybody else who runs into the same issue: in my case, running a Singularity image of pySCENIC instead of the CLI solved the problem.
