memory error on SLURM servers #158
Comments
I have not used pySCENIC on SLURM, but I would guess this is something specific to your HPC. It's clearly exceeding some memory limit (a per-process limit, perhaps?). For this dataset, I'd expect maybe 2 or 3 GB used per process, which isn't much. Have you tried lowering the number of processes? You could also try the command line version instead of running it interactively (from a shell, check out If you just want to look at the TF-gene modules from the GRN without running the pruning step, you should be able to get it from the previous command:
but this won't provide any of the benefit of module pruning to generate regulons, of course. |
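One way to test the guess about a per-process limit is to print the resource limits that the job actually inherits. The sketch below uses only Python's standard `resource` module (Unix only). Which limit, if any, the cluster enforces is an assumption here, and limits applied by SLURM through cgroups will not show up in this output. The helper names `report_limits` and `format_limit` are made up for illustration.

```python
import resource

# Limits that can make allocations fail with ENOMEM (Errno 12) when exceeded.
# Values are reported in bytes.
LIMITS = {
    "address space (RLIMIT_AS)": resource.RLIMIT_AS,
    "data segment (RLIMIT_DATA)": resource.RLIMIT_DATA,
}


def format_limit(value):
    """Render a limit value, mapping RLIM_INFINITY to 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def report_limits():
    """Return {name: (soft, hard)} for the current process."""
    return {name: resource.getrlimit(res) for name, res in LIMITS.items()}


if __name__ == "__main__":
    for name, (soft, hard) in report_limits().items():
        print(f"{name}: soft={format_limit(soft)}, hard={format_limit(hard)}")
```

Running this inside the batch job (rather than on the login node) matters, since the two environments may have different limits.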
Thanks for the swift reply. I deliberately did not use any parallelization here in an attempt to try and find out what was going wrong, and since the memory tracking is showing me that memory consumption is actually rather low, I was thinking the error may originate from something else, rather than a true memory issue. I will look into the command line version. |
I am trying the CLI right now, where it starts running, but I seem to be getting an error related to auc. Any ideas?
|
It looks like your gene matrix contains human (HGNC gene symbols) gene names instead of mouse (MGI gene symbols), so in that case use the human databases. Hence the |
Thanks, working with the command line interface instead worked for me in avoiding the errors. You were right that I had selected the mouse database while I should have selected the human one. However, a lot of these |
Gene names for Human should be in HGNC gene symbols (uppercase): |
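A quick sanity check for this kind of species mismatch is to look at the casing of the gene symbols before choosing a database. The following is a rough heuristic sketch, not part of pySCENIC: it assumes human symbols are mostly all-uppercase and mouse symbols mostly capitalized with lowercase letters after the first, and the 0.8 threshold is an arbitrary assumption.

```python
def guess_symbol_convention(gene_names, threshold=0.8):
    """Guess 'human-like', 'mouse-like', 'mixed' or 'unknown' from casing.

    Heuristic only: HGNC symbols are mostly all-uppercase (e.g. GAPDH),
    MGI symbols mostly look like Gapdh. Exceptions exist in both species.
    """
    n = len(gene_names)
    if n == 0:
        return "unknown"
    # Count symbols whose cased characters are all uppercase.
    upper = sum(1 for g in gene_names if g.isupper())
    # Count symbols with an uppercase first letter and a lowercase remainder.
    title = sum(1 for g in gene_names if g[:1].isupper() and g[1:].islower())
    if upper / n > threshold:
        return "human-like"
    if title / n > threshold:
        return "mouse-like"
    return "mixed"


print(guess_symbol_convention(["GAPDH", "ACTB", "SOX2", "TP53", "MYC"]))
print(guess_symbol_convention(["Gapdh", "Actb", "Sox2", "Trp53", "Myc"]))
```

If the result disagrees with the database you selected, that is a hint to double-check the species before running the pruning step.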
Thanks, that's indeed what I am using, but I am still getting abundant warnings. Is that expected? Here is my code
|
Could you post a section of your gene names? |
Sure, here's some gene names:
and TF names:
|
I think the issue stems from the fact that you have long non-coding RNAs in your expression matrix. Our databases only contain normal genes (mostly protein coding, there might be some pseudogenes in there too), but not lincRNAs, microRNAs, ...RNA. |
Thanks, so I suspect it would be better to remove these prior to running |
Yes, remove them before running the first step. |
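The filtering suggested here can be sketched as a plain set intersection between the gene names in the expression matrix and the gene names in the ranking database. This is a minimal illustration in pure Python; the function name `split_by_database` and both example lists are placeholders, not the real database contents.

```python
def split_by_database(gene_names, database_genes):
    """Split gene names into those present in the database and those absent.

    Order of the input gene names is preserved in both output lists.
    """
    db = set(database_genes)
    kept = [g for g in gene_names if g in db]
    dropped = [g for g in gene_names if g not in db]
    return kept, dropped


# Placeholder inputs for illustration only.
matrix_genes = ["GAPDH", "LINC00115", "SOX2", "MIR21"]
database_genes = ["GAPDH", "SOX2", "TP53"]

kept, dropped = split_by_database(matrix_genes, database_genes)
print(f"kept {len(kept)} genes, dropped {len(dropped)}: {dropped}")
```

Assuming the expression matrix is a pandas DataFrame with genes as columns, something like `ex_matrix[kept]` would then give the subset to pass to the first step. Inspecting `dropped` first helps confirm that only non-coding entries are being removed.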
Hi, thanks for all the help so far. Here's the command I've used:
And the error:
|
Hi all, Thank you for the updates. |
Hi @koenvandenberge , The dev branch should now have a fix to skip modules with no genes overlapping the ranking database (this also would be solved when you removed non protein coding genes). But, sorry for not following up on your last issue (with
|
I have been using pySCENIC on a few datasets locally but have been looking into larger datasets that required me to move to an HPC infrastructure.
My dataset is not huge, around 10k cells and 17k genes. I have been able to successfully run the grnboost2 step, and have saved the output for that. However, I keep running into issues when I try pruning the modules. I have an OSError: [Errno 12] Cannot allocate memory error, as I show below, even though the job does not exceed 8% of the available memory usage.