memory error on SLURM servers #158

Closed
koenvandenberge opened this issue Apr 20, 2020 · 15 comments

@koenvandenberge

I have been using pySCENIC on a few datasets locally but have been looking into larger datasets that required me to move to an HPC infrastructure.
My dataset is not huge, around 10k cells and 17k genes. I have been able to successfully run the grnboost2 step, and have saved the output for that.
However, I keep running into issues when I try pruning the modules. I get an OSError: [Errno 12] Cannot allocate memory error, shown below, even though the job never exceeds 8% of the available memory.

  1. Have you experienced this before, and how should I deal with this?
  2. How can I derive regulons without pruning?
>>> df = prune2df(dbs, modules, MOTIF_ANNOTATIONS_FNAME)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/accounts/campus/koenvdberge/.conda/envs/scenic2/lib/python3.7/site-packages/pyscenic/prune.py", line 351, in prune2df
    num_workers, module_chunksize)
  File "/accounts/campus/koenvdberge/.conda/envs/scenic2/lib/python3.7/site-packages/pyscenic/prune.py", line 300, in _distributed_calc
    return create_graph().compute(scheduler='processes', num_workers=num_workers if num_workers else cpu_count())
  File "/accounts/campus/koenvdberge/.conda/envs/scenic2/lib/python3.7/site-packages/dask/base.py", line 156, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "/accounts/campus/koenvdberge/.conda/envs/scenic2/lib/python3.7/site-packages/dask/base.py", line 397, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/accounts/campus/koenvdberge/.conda/envs/scenic2/lib/python3.7/site-packages/dask/multiprocessing.py", line 167, in get
    initializer=initialize_worker_process)
  File "/accounts/campus/koenvdberge/.conda/envs/scenic2/lib/python3.7/multiprocessing/context.py", line 119, in Pool
    context=self.get_context())
  File "/accounts/campus/koenvdberge/.conda/envs/scenic2/lib/python3.7/multiprocessing/pool.py", line 176, in __init__
    self._repopulate_pool()
  File "/accounts/campus/koenvdberge/.conda/envs/scenic2/lib/python3.7/multiprocessing/pool.py", line 241, in _repopulate_pool
    w.start()
  File "/accounts/campus/koenvdberge/.conda/envs/scenic2/lib/python3.7/multiprocessing/process.py", line 112, in start
    self._popen = self._Popen(self)
  File "/accounts/campus/koenvdberge/.conda/envs/scenic2/lib/python3.7/multiprocessing/context.py", line 277, in _Popen
    return Popen(process_obj)
  File "/accounts/campus/koenvdberge/.conda/envs/scenic2/lib/python3.7/multiprocessing/popen_fork.py", line 20, in __init__
    self._launch(process_obj)
  File "/accounts/campus/koenvdberge/.conda/envs/scenic2/lib/python3.7/multiprocessing/popen_fork.py", line 70, in _launch
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory
>>> dbs
[FeatherRankingDatabase(name="mm9-tss-centered-10kb-7species.mc9nr"), FeatherRankingDatabase(name="mm9-500bp-upstream-7species.mc9nr")]
>>> len(modules)
6801
>>> MOTIF_ANNOTATIONS_FNAME
'/accounts/campus/koenvdberge/Documents/SCENIC/motif2tf/mouse/motifs-v9-nr.mgi-m0.001-o0.0.tbl'
@cflerin
Contributor

cflerin commented Apr 20, 2020

I have not used pySCENIC on slurm, but this must be something specific to your HPC, I would guess. It's clearly exceeding some memory limitation (a per process limit perhaps?). For this dataset, I'd expect maybe 2 or 3GB used per process, which isn't much. Have you tried lowering the number of processes?
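As a rough way to check for such a limit from inside the job, the sketch below prints a few settings that can plausibly make os.fork() fail with ENOMEM while measured usage is still low. It uses only the standard library. Which of these (if any) applies on a given cluster is an assumption to verify with the HPC admins; the helper name describe_limits is hypothetical.

```python
import resource


def _fmt(value):
    # RLIM_INFINITY means no limit is set for this resource.
    return "unlimited" if value == resource.RLIM_INFINITY else value


def describe_limits():
    """Collect per-process limits and the kernel overcommit policy (Linux)."""
    limits = {
        "address_space_bytes": resource.RLIMIT_AS,
        "data_segment_bytes": resource.RLIMIT_DATA,
    }
    report = {}
    for label, which in limits.items():
        soft, hard = resource.getrlimit(which)
        report[label] = {"soft": _fmt(soft), "hard": _fmt(hard)}
    # The overcommit policy file only exists on Linux; report None elsewhere.
    try:
        with open("/proc/sys/vm/overcommit_memory") as fh:
            report["overcommit_memory"] = fh.read().strip()
    except OSError:
        report["overcommit_memory"] = None
    return report


if __name__ == "__main__":
    for name, value in describe_limits().items():
        print(name, value)
```

Running this inside the same SLURM allocation as the failing call, and once on a machine where pruning works, gives two reports that can be compared directly.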

You could also try the command line version instead of running it interactively (from a shell, check out pyscenic ctx -h).

If you just want to look at the TF-gene modules from the GRN without running the pruning step, you should be able to get it from the previous command:

from pyscenic.utils import modules_from_adjacencies
modules = list(modules_from_adjacencies(adjacencies, exprMat))

but this won't provide any of the benefit of module pruning to generate regulons, of course.
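To illustrate what an unpruned TF-gene module is, here is a minimal sketch that groups a GRN adjacency list by TF and keeps the highest-importance targets. This is not pySCENIC's implementation (modules_from_adjacencies is more involved); the function name top_targets_per_tf and the cutoff n are assumptions made for illustration, and the input is assumed to be (TF, target, importance) triples as in the grnboost2 output.

```python
from collections import defaultdict


def top_targets_per_tf(adjacencies, n=50):
    """Group (tf, target, importance) triples by TF and keep the n strongest targets."""
    by_tf = defaultdict(list)
    for tf, target, importance in adjacencies:
        by_tf[tf].append((importance, target))
    return {
        tf: [target for _, target in sorted(pairs, reverse=True)[:n]]
        for tf, pairs in by_tf.items()
    }


# Toy adjacency list, made up for the example.
example = [
    ("EGR1", "FOS", 3.0),
    ("EGR1", "JUN", 5.0),
    ("EGR1", "ACTB", 0.1),
    ("HOXA9", "MEIS1", 2.0),
]
print(top_targets_per_tf(example, n=2))
```

Modules built this way rest on co-expression only, so no motif evidence supports any of the TF-target links.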

@koenvandenberge
Author

Thanks for the swift reply. I deliberately did not use any parallelization here in an attempt to try and find out what was going wrong, and since the memory tracking is showing me that memory consumption is actually rather low, I was thinking the error may originate from something else, rather than a true memory issue. I will look into the command line version.

@koenvandenberge
Author

I am trying the CLI right now, where it starts running, but I seem to be getting an error related to auc. Any ideas?

(scenic2) gandalf.koenvdberge$ pyscenic ctx -o "/accounts/campus/koenvdberge/Documents/CRISPRi/pyScenic_cli/output2_prune" --annotations_fname "/accounts/campus/koenvdberge/Documents/SCENIC/motif2tf/mouse/motifs-v9-nr.mgi-m0.001-o0.0.tbl" "/accounts/campus/koenvdberge/Documents/CRISPRi/pyScenicResults/modules.dat" "/accounts/campus/koenvdberge/Documents/SCENIC/cisTarget_databases/mm9-tss-centered-10kb-7species.mc9nr.feather" "/accounts/campus/koenvdberge/Documents/SCENIC/cisTarget_databases/mm9-500bp-upstream-7species.mc9nr.feather"  

/accounts/campus/koenvdberge/.conda/envs/scenic2/lib/python3.7/site-packages/dask/config.py:161: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  data = yaml.load(f.read()) or {}

2020-04-20 20:48:59,451 - pyscenic.cli.pyscenic - INFO - Loading modules.

2020-04-20 20:49:01,867 - pyscenic.cli.pyscenic - INFO - Loading databases.

2020-04-20 20:49:01,867 - pyscenic.cli.pyscenic - INFO - Calculating regulons.
[                                        ] | 0% Completed | 25.1s
2020-04-20 20:49:29,803 - pyscenic.transform - WARNING - Less than 80% of the genes in Regulon for EEF1D could be mapped to mm9-tss-centered-10kb-7species.mc9nr. Skipping this module.
[                                        ] | 0% Completed | 27.3s
2020-04-20 20:49:31,903 - pyscenic.transform - WARNING - Less than 80% of the genes in Regulon for EGR1 could be mapped to mm9-tss-centered-10kb-7species.mc9nr. Skipping this module.
[                                        ] | 0% Completed | 39.8s
2020-04-20 20:49:44,471 - pyscenic.transform - WARNING - Less than 80% of the genes in Regulon for HNRNPC could be mapped to mm9-tss-centered-10kb-7species.mc9nr. Skipping this module.
[                                        ] | 0% Completed | 42.5s
2020-04-20 20:49:47,100 - pyscenic.transform - WARNING - Less than 80% of the genes in Regulon for HNRNPH3 could be mapped to mm9-tss-centered-10kb-7species.mc9nr. Skipping this module.
[                                        ] | 0% Completed | 47.4s
2020-04-20 20:49:52,063 - pyscenic.transform - WARNING - Less than 80% of the genes in Regulon for LARP1 could be mapped to mm9-tss-centered-10kb-7species.mc9nr. Skipping this module.
[                                        ] | 0% Completed | 51.8s
Traceback (most recent call last):
  File "/accounts/campus/koenvdberge/.conda/envs/scenic2/bin/pyscenic", line 8, in <module>
    sys.exit(main())
  File "/accounts/campus/koenvdberge/.conda/envs/scenic2/lib/python3.7/site-packages/pyscenic/cli/pyscenic.py", line 408, in main
    args.func(args)
  File "/accounts/campus/koenvdberge/.conda/envs/scenic2/lib/python3.7/site-packages/pyscenic/cli/pyscenic.py", line 159, in prune_targets_command
    num_workers=args.num_workers)
  File "/accounts/campus/koenvdberge/.conda/envs/scenic2/lib/python3.7/site-packages/pyscenic/prune.py", line 351, in prune2df
    num_workers, module_chunksize)
  File "/accounts/campus/koenvdberge/.conda/envs/scenic2/lib/python3.7/site-packages/pyscenic/prune.py", line 300, in _distributed_calc
    return create_graph().compute(scheduler='processes', num_workers=num_workers if num_workers else cpu_count())
  File "/accounts/campus/koenvdberge/.conda/envs/scenic2/lib/python3.7/site-packages/dask/base.py", line 156, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "/accounts/campus/koenvdberge/.conda/envs/scenic2/lib/python3.7/site-packages/dask/base.py", line 397, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/accounts/campus/koenvdberge/.conda/envs/scenic2/lib/python3.7/site-packages/dask/multiprocessing.py", line 192, in get
    raise_exception=reraise, **kwargs)
  File "/accounts/campus/koenvdberge/.conda/envs/scenic2/lib/python3.7/site-packages/dask/local.py", line 501, in get_async
    raise_exception(exc, tb)
  File "/accounts/campus/koenvdberge/.conda/envs/scenic2/lib/python3.7/site-packages/dask/compatibility.py", line 112, in reraise
    raise exc
  File "/accounts/campus/koenvdberge/.conda/envs/scenic2/lib/python3.7/site-packages/dask/local.py", line 272, in execute_task
    result = _execute_task(task, data)
  File "/accounts/campus/koenvdberge/.conda/envs/scenic2/lib/python3.7/site-packages/dask/local.py", line 252, in _execute_task
    args2 = [_execute_task(a, cache) for a in args]
  File "/accounts/campus/koenvdberge/.conda/envs/scenic2/lib/python3.7/site-packages/dask/local.py", line 252, in <listcomp>
    args2 = [_execute_task(a, cache) for a in args]
  File "/accounts/campus/koenvdberge/.conda/envs/scenic2/lib/python3.7/site-packages/dask/local.py", line 253, in _execute_task
    return func(*args2)
  File "/accounts/campus/koenvdberge/.conda/envs/scenic2/lib/python3.7/site-packages/pyscenic/transform.py", line 231, in modules2df
    for module in modules])
  File "/accounts/campus/koenvdberge/.conda/envs/scenic2/lib/python3.7/site-packages/pyscenic/transform.py", line 231, in <listcomp>
    for module in modules])
  File "/accounts/campus/koenvdberge/.conda/envs/scenic2/lib/python3.7/site-packages/pyscenic/transform.py", line 185, in module2df
    weighted_recovery=weighted_recovery)
  File "/accounts/campus/koenvdberge/.conda/envs/scenic2/lib/python3.7/site-packages/pyscenic/transform.py", line 129, in module2features_auc1st_impl
    aucs = calc_aucs(df, db.total_genes, weights, auc_threshold)
  File "/accounts/campus/koenvdberge/.conda/envs/scenic2/lib/python3.7/site-packages/pyscenic/recovery.py", line 284, in aucs
    assert maxauc > 0
AssertionError

@ghuls
Member

ghuls commented Apr 21, 2020

It looks like your gene matrix contains human (HGNC gene symbols) gene names instead of mouse (MGI gene symbols), so in that case use the human databases.

Hence the Less than 80% of the genes in Regulon for EEF1D could be mapped to mm9-tss-centered-10kb-7species.mc9nr warnings.
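A quick way to spot this kind of mismatch before a long run is to compute what fraction of a module's genes appear in the database's gene list. The sketch below is a simplified stand-in, not the check pySCENIC performs internally; the 0.8 cutoff is taken from the warning text, and mapped_fraction is a hypothetical helper name.

```python
def mapped_fraction(module_genes, database_genes):
    """Fraction of module genes found in the database (exact, case-sensitive match)."""
    module_genes = set(module_genes)
    if not module_genes:
        return 0.0
    return len(module_genes & set(database_genes)) / len(module_genes)


# Human-style symbols against mouse-style symbols: nothing matches,
# because the comparison is case-sensitive.
human_module = ["EGR1", "EEF1D", "LARP1"]
mouse_db_genes = ["Egr1", "Eef1d", "Larp1"]
fraction = mapped_fraction(human_module, mouse_db_genes)
if fraction < 0.8:
    print(f"Only {fraction:.0%} of genes mapped; check species and symbol type.")
```

A fraction near zero for every module points to a species or naming mismatch, while a fraction just under the cutoff is more likely caused by a handful of genes absent from the database.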

@koenvandenberge
Author

Thanks, working with the command line interface instead worked for me in avoiding the errors.

You were right that I had selected the mouse database while I should have selected the human one. However, a lot of these Less than 80% of the genes messages keep popping up, which results in a low number of TFs after pruning. Is there a specific way TFs must be named, e.g. uppercase vs. lowercase, or something else I could check?

@ghuls
Member

ghuls commented May 2, 2020

Gene names for Human should be in HGNC gene symbols (uppercase):

@koenvandenberge
Author

Thanks, that's indeed what I am using, but I am still getting abundant warnings. Is that expected?

Here is my code

pyscenic grn -t -m grnboost2 -o "/accounts/campus/koenvdberge/Documents/CRISPRi/pyScenic_cli/output1_grn" "/accounts/campus/koenvdberge/Documents/CRISPRi/data/counts_nogRNA.csv" "/accounts/campus/koenvdberge/Documents/CRISPRi/tf_names.csv"

pyscenic ctx -o "/accounts/campus/koenvdberge/Documents/CRISPRi/pyScenic_cli/output2_prune.dat" --annotations_fname "/accounts/campus/koenvdberge/Documents/SCENIC/motif2tf/human/motifs-v9-nr.hgnc-m0.001-o0.0.tbl" "/accounts/campus/koenvdberge/Documents/CRISPRi/pyScenic_cli/output1_grn.csv" "/accounts/campus/koenvdberge/Documents/SCENIC/cisTarget_databases/human/hg19-500bp-upstream-7species.mc9nr.feather" "/accounts/campus/koenvdberge/Documents/SCENIC/cisTarget_databases/human/hg19-tss-centered-10kb-7species.mc9nr.feather"

pyscenic aucell -t -o "/accounts/campus/koenvdberge/Documents/CRISPRi/pyScenic_cli/output3_aucell" "/accounts/campus/koenvdberge/Documents/CRISPRi/data/counts_nogRNA.csv" "/accounts/campus/koenvdberge/Documents/CRISPRi/pyScenic_cli/output2_prune.dat"

@ghuls
Member

ghuls commented May 4, 2020

Could you post a section of your gene names?

@koenvandenberge
Author

Sure, here's some gene names:

 "TTTY17B"   "TTTY17C"   "TTTY17A"   "TTTY4C"    "TTTY4B"    "TTTY4"     "BPY2C"     "BPY2B"     "BPY2" "DAZ4"      "DAZ1"      "DAZ3"      "DAZ2"      "TTTY3B"    "TTTY3"     "CDY1B"     "CDY1"      "CSPG4P1Y" "GOLGA2P3Y" "GOLGA2P2Y"

and TF names:

HOXA9
ZFP128
ZFP853
NR1H2
NR1H3
NR1H4
NR1H5
NR1I2

@ghuls
Member

ghuls commented May 4, 2020

I think the issue stems from the fact that you have long non-coding RNAs in your expression matrix. Our databases only contain normal genes (mostly protein coding, there might be some pseudogenes in there too), but not lincRNAs, microRNAs, ...RNA.
https://www.genenames.org/tools/search/#!/all?query=TTTY4C

@koenvandenberge
Author

Thanks, so I suspect it would be better to remove these prior to running pySCENIC such that the regulons aren't 'contaminated' with non-protein coding genes.

@ghuls
Member

ghuls commented May 4, 2020

Yes remove them before running the first step.
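One possible way to do that filtering is sketched below with the standard csv module. It assumes the expression matrix is a CSV with one gene per row and the gene symbol in the first column, and that a set of protein-coding symbols is already available from an annotation source; filter_genes and the toy data are hypothetical.

```python
import csv
import io


def filter_genes(src, dst, keep, gene_column=0):
    """Copy the header and only those rows whose gene symbol is in `keep`."""
    reader = csv.reader(src)
    writer = csv.writer(dst)
    writer.writerow(next(reader))  # header row with cell identifiers
    kept = 0
    for row in reader:
        if row and row[gene_column] in keep:
            writer.writerow(row)
            kept += 1
    return kept


# Toy matrix (genes x cells) and a made-up protein-coding list.
source = io.StringIO("gene,cell1,cell2\nEGR1,1,0\nTTTY4C,0,2\nHOXA9,3,1\n")
target = io.StringIO()
n_kept = filter_genes(source, target, keep={"EGR1", "HOXA9"})
print(n_kept, "genes kept")
print(target.getvalue())
```

With real files, the two StringIO objects would be replaced by open(..., newline="") handles for the input and output CSV.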

@koenvandenberge
Author

Hi, thanks for all the help so far.
Restricting the analysis to protein-coding genes only indeed reduces the Less than 80% of the genes messages. I am interested in an analysis without pruning, which seems to run successfully up to a point, until I get the following error whose origin I couldn't diagnose.

Here's the command I've used:

pyscenic ctx -t -n -o "/accounts/campus/koenvdberge/Documents/CRISPRi/pyScenic_cli/output2_noPrune.csv" --annotations_fname "/accounts/campus/koenvdberge/Documents/SCENIC/motif2tf/human/motifs-v9-nr.hgnc-m0.001-o0.0.tbl" --num_workers 8 --expression_mtx_fname "/accounts/campus/koenvdberge/Documents/CRISPRi/data/counts_codingGenes_noQuotes.csv" "/accounts/campus/koenvdberge/Documents/CRISPRi/pyScenic_cli/output1_grn.csv" "/accounts/campus/koenvdberge/Documents/SCENIC/cisTarget_databases/human/hg19-500bp-upstream-7species.mc9nr.feather" "/accounts/campus/koenvdberge/Documents/SCENIC/cisTarget_databases/human/hg19-tss-centered-10kb-7species.mc9nr.feather"

And the error:

[####################################### ] | 99% Completed | 20hr 14min  8.3s
[########################################] | 100% Completed | 20hr 14min  8.4s
Traceback (most recent call last):
  File "/accounts/campus/koenvdberge/.conda/envs/scenic2/bin/pyscenic", line 8, in <module>
    sys.exit(main())
  File "/accounts/campus/koenvdberge/.conda/envs/scenic2/lib/python3.7/site-packages/pyscenic/cli/pyscenic.py", line 408, in main
    args.func(args)
  File "/accounts/campus/koenvdberge/.conda/envs/scenic2/lib/python3.7/site-packages/pyscenic/cli/pyscenic.py", line 159, in prune_targets_command
    num_workers=args.num_workers)
  File "/accounts/campus/koenvdberge/.conda/envs/scenic2/lib/python3.7/site-packages/pyscenic/prune.py", line 380, in find_features
    filter_for_annotation=False, **kwargs), base_url=motif_base_url)
  File "/accounts/campus/koenvdberge/.conda/envs/scenic2/lib/python3.7/site-packages/pyscenic/utils.py", line 316, in add_motif_url
    df[("Enrichment", COLUMN_NAME_MOTIF_URL)] = list(map(partial(urljoin, base=base_url), df.index.get_level_values(COLUMN_NAME_MOTIF_ID)))
TypeError: urljoin() got multiple values for argument 'base'
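The error message can be reproduced outside pySCENIC with the standard library alone. In urllib.parse.urljoin(base, url), base is the first positional parameter, so binding it by keyword with partial and then passing the motif ID positionally (as map does in the traceback line) supplies base twice. The sketch below shows that, plus one way to avoid it by binding base positionally; whether this matches the actual fix in pySCENIC is not confirmed here, and the base URL and motif IDs are made up.

```python
from functools import partial
from urllib.parse import urljoin

BASE_URL = "http://example.org/motifs/"  # hypothetical base URL
MOTIF_IDS = ["m1.png", "m2.png"]         # hypothetical motif IDs


def join_with_keyword(ids):
    # map() passes each id positionally, so it lands in `base` as well.
    return list(map(partial(urljoin, base=BASE_URL), ids))


def join_positionally(ids):
    # Binding base positionally leaves the second slot (`url`) for each id.
    return list(map(partial(urljoin, BASE_URL), ids))


try:
    join_with_keyword(MOTIF_IDS)
except TypeError as err:
    print("keyword binding fails:", err)

print(join_positionally(MOTIF_IDS))
```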

cflerin added a commit that referenced this issue Jun 3, 2020
- Previously such modules would cause an error, now these modules are
skipped.
- Related to #158, #177, #132, #85
@koenvandenberge
Author

Hi all,
Thanks for looking into this; seeing the recent commits mentioning this issue, would you consider it safe to proceed or is work on a fix still ongoing?

Thank you for the updates.

cflerin added a commit that referenced this issue Jun 16, 2020
@cflerin
Contributor

cflerin commented Jun 16, 2020

Hi @koenvandenberge ,

The dev branch should now have a fix to skip modules with no genes overlapping the ranking database (this would also have been resolved by removing the non-protein-coding genes, as you did). Sorry for not following up on your last issue (with urljoin). It seems it was a bug in the calling function for the no-pruning method (which we very rarely use here). I've pushed another fix to the dev branch, which you can install via:

pip install git+https://github.com/aertslab/pySCENIC.git@dev

cflerin added a commit that referenced this issue Jul 17, 2020
- Previously such modules would cause an error, now these modules are
skipped.
- Related to #158, #177, #132, #85
@cflerin cflerin mentioned this issue Jul 17, 2020
Merged
cflerin added a commit that referenced this issue Mar 3, 2021
- Fixes #275
- Previously addressed in #158