memory error on SLURM servers #158

koenvandenberge · 2020-04-20T17:49:10Z

I have been using pySCENIC on a few datasets locally but have been looking into larger datasets that required me to move to an HPC infrastructure.
My dataset is not huge, around 10k cells and 17k genes. I have been able to successfully run the grnboost2 step, and have saved the output for that.
However, I keep running into issues when I try pruning the modules. prune2df fails with OSError: [Errno 12] Cannot allocate memory, as shown below, even though the job never uses more than 8% of the available memory.

Have you experienced this before, and how should I deal with this?
How can I derive regulons without pruning?

>>> df = prune2df(dbs, modules, MOTIF_ANNOTATIONS_FNAME)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/accounts/campus/koenvdberge/.conda/envs/scenic2/lib/python3.7/site-packages/pyscenic/prune.py", line 351, in prune2df
    num_workers, module_chunksize)
  File "/accounts/campus/koenvdberge/.conda/envs/scenic2/lib/python3.7/site-packages/pyscenic/prune.py", line 300, in _distributed_calc
    return create_graph().compute(scheduler='processes', num_workers=num_workers if num_workers else cpu_count())
  File "/accounts/campus/koenvdberge/.conda/envs/scenic2/lib/python3.7/site-packages/dask/base.py", line 156, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "/accounts/campus/koenvdberge/.conda/envs/scenic2/lib/python3.7/site-packages/dask/base.py", line 397, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/accounts/campus/koenvdberge/.conda/envs/scenic2/lib/python3.7/site-packages/dask/multiprocessing.py", line 167, in get
    initializer=initialize_worker_process)
  File "/accounts/campus/koenvdberge/.conda/envs/scenic2/lib/python3.7/multiprocessing/context.py", line 119, in Pool
    context=self.get_context())
  File "/accounts/campus/koenvdberge/.conda/envs/scenic2/lib/python3.7/multiprocessing/pool.py", line 176, in __init__
    self._repopulate_pool()
  File "/accounts/campus/koenvdberge/.conda/envs/scenic2/lib/python3.7/multiprocessing/pool.py", line 241, in _repopulate_pool
    w.start()
  File "/accounts/campus/koenvdberge/.conda/envs/scenic2/lib/python3.7/multiprocessing/process.py", line 112, in start
    self._popen = self._Popen(self)
  File "/accounts/campus/koenvdberge/.conda/envs/scenic2/lib/python3.7/multiprocessing/context.py", line 277, in _Popen
    return Popen(process_obj)
  File "/accounts/campus/koenvdberge/.conda/envs/scenic2/lib/python3.7/multiprocessing/popen_fork.py", line 20, in __init__
    self._launch(process_obj)
  File "/accounts/campus/koenvdberge/.conda/envs/scenic2/lib/python3.7/multiprocessing/popen_fork.py", line 70, in _launch
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory
>>> dbs
[FeatherRankingDatabase(name="mm9-tss-centered-10kb-7species.mc9nr"), FeatherRankingDatabase(name="mm9-500bp-upstream-7species.mc9nr")]
>>> len(modules)
6801
>>> MOTIF_ANNOTATIONS_FNAME
'/accounts/campus/koenvdberge/Documents/SCENIC/motif2tf/mouse/motifs-v9-nr.mgi-m0.001-o0.0.tbl'


cflerin · 2020-04-20T19:11:34Z

I have not used pySCENIC on slurm, but this must be something specific to your HPC, I would guess. It's clearly exceeding some memory limitation (a per process limit perhaps?). For this dataset, I'd expect maybe 2 or 3GB used per process, which isn't much. Have you tried lowering the number of processes?
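
One way to test the guess about a per-process limit is to print the resource limits from inside the SLURM job itself. The following is a minimal sketch using only Python's standard `resource` module. Whether any of these limits is the actual cause on this cluster is an assumption, and limits enforced through cgroups may not be visible this way.

```python
import resource

# Limits that can make os.fork() fail even when measured memory usage is low.
LIMITS = {
    "address space (RLIMIT_AS)": resource.RLIMIT_AS,
    "data segment (RLIMIT_DATA)": resource.RLIMIT_DATA,
    "processes (RLIMIT_NPROC)": resource.RLIMIT_NPROC,
}


def fmt(value, is_bytes):
    """Render a limit value as 'unlimited', a size in GiB, or a plain count."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{value / 2**30:.2f} GiB" if is_bytes else str(value)


def report_limits():
    """Return one descriptive line per limit, with soft and hard values."""
    lines = []
    for name, limit_id in LIMITS.items():
        soft, hard = resource.getrlimit(limit_id)
        is_bytes = limit_id != resource.RLIMIT_NPROC  # NPROC is a count
        lines.append(
            f"{name}: soft={fmt(soft, is_bytes)}, hard={fmt(hard, is_bytes)}"
        )
    return lines


if __name__ == "__main__":
    print("\n".join(report_limits()))
```

If the address-space limit turns out to be small compared with the size of the parent Python process, a fork can be refused with errno 12 regardless of how much physical memory is actually in use.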

You could also try the command line version instead of running it interactively (from a shell, check out pyscenic ctx -h).

If you just want to look at the TF-gene modules from the GRN without running the pruning step, you should be able to get it from the previous command:

modules = list(modules_from_adjacencies(adjacencies, exprMat))

but this won't provide any of the benefit of module pruning to generate regulons, of course.
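
To illustrate what an unpruned TF-gene module is, here is a simplified sketch in plain Python. It assumes the adjacencies are (TF, target, importance) records and keeps only the top-N targets per TF. This is not pySCENIC's `modules_from_adjacencies`, which also takes the expression matrix and is not reproduced here. The function name `top_targets_per_tf` and the toy values are made up for illustration.

```python
from collections import defaultdict


def top_targets_per_tf(adjacencies, n=50):
    """Group (tf, target, importance) records by TF and keep the n
    targets with the highest importance for each TF."""
    by_tf = defaultdict(list)
    for tf, target, importance in adjacencies:
        by_tf[tf].append((importance, target))

    modules = {}
    for tf, scored in by_tf.items():
        scored.sort(reverse=True)  # highest importance first
        modules[tf] = [target for _, target in scored[:n]]
    return modules


# Toy adjacencies (illustrative values only).
adjacencies = [
    ("EGR1", "FOS", 12.5),
    ("EGR1", "JUN", 8.1),
    ("EGR1", "ACTB", 0.3),
    ("HOXA9", "MEIS1", 9.7),
]
modules = top_targets_per_tf(adjacencies, n=2)
print(modules)
```

Modules built this way rest on co-expression alone, which is why the pruning step adds value: it keeps only targets supported by motif enrichment.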

koenvandenberge · 2020-04-20T19:17:28Z

Thanks for the swift reply. I deliberately did not use any parallelization here in an attempt to try and find out what was going wrong, and since the memory tracking is showing me that memory consumption is actually rather low, I was thinking the error may originate from something else, rather than a true memory issue. I will look into the command line version.

koenvandenberge · 2020-04-21T03:51:32Z

I am trying the CLI right now, where it starts running, but I seem to be getting an error related to auc. Any ideas?

(scenic2) gandalf.koenvdberge$ pyscenic ctx -o "/accounts/campus/koenvdberge/Documents/CRISPRi/pyScenic_cli/output2_prune" --annotations_fname "/accounts/campus/koenvdberge/Documents/SCENIC/motif2tf/mouse/motifs-v9-nr.mgi-m0.001-o0.0.tbl" "/accounts/campus/koenvdberge/Documents/CRISPRi/pyScenicResults/modules.dat" "/accounts/campus/koenvdberge/Documents/SCENIC/cisTarget_databases/mm9-tss-centered-10kb-7species.mc9nr.feather" "/accounts/campus/koenvdberge/Documents/SCENIC/cisTarget_databases/mm9-500bp-upstream-7species.mc9nr.feather"  

/accounts/campus/koenvdberge/.conda/envs/scenic2/lib/python3.7/site-packages/dask/config.py:161: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  data = yaml.load(f.read()) or {}

2020-04-20 20:48:59,451 - pyscenic.cli.pyscenic - INFO - Loading modules.

2020-04-20 20:49:01,867 - pyscenic.cli.pyscenic - INFO - Loading databases.

2020-04-20 20:49:01,867 - pyscenic.cli.pyscenic - INFO - Calculating regulons.
[                                        ] | 0% Completed | 25.1s
2020-04-20 20:49:29,803 - pyscenic.transform - WARNING - Less than 80% of the genes in Regulon for EEF1D could be mapped to mm9-tss-centered-10kb-7species.mc9nr. Skipping this module.
[                                        ] | 0% Completed | 27.3s
2020-04-20 20:49:31,903 - pyscenic.transform - WARNING - Less than 80% of the genes in Regulon for EGR1 could be mapped to mm9-tss-centered-10kb-7species.mc9nr. Skipping this module.
[                                        ] | 0% Completed | 39.8s
2020-04-20 20:49:44,471 - pyscenic.transform - WARNING - Less than 80% of the genes in Regulon for HNRNPC could be mapped to mm9-tss-centered-10kb-7species.mc9nr. Skipping this module.
[                                        ] | 0% Completed | 42.5s
2020-04-20 20:49:47,100 - pyscenic.transform - WARNING - Less than 80% of the genes in Regulon for HNRNPH3 could be mapped to mm9-tss-centered-10kb-7species.mc9nr. Skipping this module.
[                                        ] | 0% Completed | 47.4s
2020-04-20 20:49:52,063 - pyscenic.transform - WARNING - Less than 80% of the genes in Regulon for LARP1 could be mapped to mm9-tss-centered-10kb-7species.mc9nr. Skipping this module.
[                                        ] | 0% Completed | 51.8s
Traceback (most recent call last):
  File "/accounts/campus/koenvdberge/.conda/envs/scenic2/bin/pyscenic", line 8, in <module>
    sys.exit(main())
  File "/accounts/campus/koenvdberge/.conda/envs/scenic2/lib/python3.7/site-packages/pyscenic/cli/pyscenic.py", line 408, in main
    args.func(args)
  File "/accounts/campus/koenvdberge/.conda/envs/scenic2/lib/python3.7/site-packages/pyscenic/cli/pyscenic.py", line 159, in prune_targets_command
    num_workers=args.num_workers)
  File "/accounts/campus/koenvdberge/.conda/envs/scenic2/lib/python3.7/site-packages/pyscenic/prune.py", line 351, in prune2df
    num_workers, module_chunksize)
  File "/accounts/campus/koenvdberge/.conda/envs/scenic2/lib/python3.7/site-packages/pyscenic/prune.py", line 300, in _distributed_calc
    return create_graph().compute(scheduler='processes', num_workers=num_workers if num_workers else cpu_count())
  File "/accounts/campus/koenvdberge/.conda/envs/scenic2/lib/python3.7/site-packages/dask/base.py", line 156, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "/accounts/campus/koenvdberge/.conda/envs/scenic2/lib/python3.7/site-packages/dask/base.py", line 397, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/accounts/campus/koenvdberge/.conda/envs/scenic2/lib/python3.7/site-packages/dask/multiprocessing.py", line 192, in get
    raise_exception=reraise, **kwargs)
  File "/accounts/campus/koenvdberge/.conda/envs/scenic2/lib/python3.7/site-packages/dask/local.py", line 501, in get_async
    raise_exception(exc, tb)
  File "/accounts/campus/koenvdberge/.conda/envs/scenic2/lib/python3.7/site-packages/dask/compatibility.py", line 112, in reraise
    raise exc
  File "/accounts/campus/koenvdberge/.conda/envs/scenic2/lib/python3.7/site-packages/dask/local.py", line 272, in execute_task
    result = _execute_task(task, data)
  File "/accounts/campus/koenvdberge/.conda/envs/scenic2/lib/python3.7/site-packages/dask/local.py", line 252, in _execute_task
    args2 = [_execute_task(a, cache) for a in args]
  File "/accounts/campus/koenvdberge/.conda/envs/scenic2/lib/python3.7/site-packages/dask/local.py", line 252, in <listcomp>
    args2 = [_execute_task(a, cache) for a in args]
  File "/accounts/campus/koenvdberge/.conda/envs/scenic2/lib/python3.7/site-packages/dask/local.py", line 253, in _execute_task
    return func(*args2)
  File "/accounts/campus/koenvdberge/.conda/envs/scenic2/lib/python3.7/site-packages/pyscenic/transform.py", line 231, in modules2df
    for module in modules])
  File "/accounts/campus/koenvdberge/.conda/envs/scenic2/lib/python3.7/site-packages/pyscenic/transform.py", line 231, in <listcomp>
    for module in modules])
  File "/accounts/campus/koenvdberge/.conda/envs/scenic2/lib/python3.7/site-packages/pyscenic/transform.py", line 185, in module2df
    weighted_recovery=weighted_recovery)
  File "/accounts/campus/koenvdberge/.conda/envs/scenic2/lib/python3.7/site-packages/pyscenic/transform.py", line 129, in module2features_auc1st_impl
    aucs = calc_aucs(df, db.total_genes, weights, auc_threshold)
  File "/accounts/campus/koenvdberge/.conda/envs/scenic2/lib/python3.7/site-packages/pyscenic/recovery.py", line 284, in aucs
    assert maxauc > 0
AssertionError

ghuls · 2020-04-21T09:32:40Z

It looks like your gene matrix contains human gene names (HGNC symbols) instead of mouse ones (MGI symbols), so in that case you should use the human databases.

Hence the "Less than 80% of the genes in Regulon for EEF1D could be mapped to mm9-tss-centered-10kb-7species.mc9nr" warnings.
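
The 80% figure in the warning suggests a quick sanity check before running the ctx step: compare the module's gene symbols with the gene names in the ranking database. Below is a sketch of that kind of overlap check, not pySCENIC's own code. The function `mapped_fraction` and the small gene sets are hypothetical stand-ins for a real module and real database gene lists.

```python
def mapped_fraction(module_genes, db_genes):
    """Fraction of the module's genes that occur in the database gene set."""
    module_genes = set(module_genes)
    if not module_genes:
        return 0.0
    return len(module_genes & set(db_genes)) / len(module_genes)


THRESHOLD = 0.8  # taken from the "Less than 80%" warning text

# Hypothetical stand-ins: mouse symbols are capitalised, human ones uppercase.
mouse_db_genes = {"Egr1", "Eef1d", "Hnrnpc", "Larp1"}
human_db_genes = {"EGR1", "EEF1D", "HNRNPC", "LARP1"}

module = ["EGR1", "EEF1D", "HNRNPC", "LARP1"]

frac_mouse = mapped_fraction(module, mouse_db_genes)
frac_human = mapped_fraction(module, human_db_genes)
print(f"mouse db: {frac_mouse:.0%}, human db: {frac_human:.0%}")
```

Because the comparison is case-sensitive, uppercase human symbols match nothing in a database keyed by mouse symbols, so every module falls below the threshold.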

koenvandenberge · 2020-05-02T20:31:57Z

Thanks, working with the command line interface instead worked for me in avoiding the errors.

You were right that I had selected the mouse database while I should have selected the human one. However, a lot of these "Less than 80% of the genes" messages keep popping up, which results in a low number of TFs after pruning. Is there a particular way TFs must be named, e.g. uppercase vs lowercase, or something else I could check?

ghuls · 2020-05-02T20:59:04Z

Gene names for Human should be in HGNC gene symbols (uppercase):

koenvandenberge · 2020-05-03T04:06:48Z

Thanks, that's indeed what I am using, but I am still getting abundant warnings. Is that expected?

Here is my code

pyscenic grn -t -m grnboost2 -o "/accounts/campus/koenvdberge/Documents/CRISPRi/pyScenic_cli/output1_grn" "/accounts/campus/koenvdberge/Documents/CRISPRi/data/counts_nogRNA.csv" "/accounts/campus/koenvdberge/Documents/CRISPRi/tf_names.csv"

pyscenic ctx -o "/accounts/campus/koenvdberge/Documents/CRISPRi/pyScenic_cli/output2_prune.dat" --annotations_fname "/accounts/campus/koenvdberge/Documents/SCENIC/motif2tf/human/motifs-v9-nr.hgnc-m0.001-o0.0.tbl" "/accounts/campus/koenvdberge/Documents/CRISPRi/pyScenic_cli/output1_grn.csv" "/accounts/campus/koenvdberge/Documents/SCENIC/cisTarget_databases/human/hg19-500bp-upstream-7species.mc9nr.feather" "/accounts/campus/koenvdberge/Documents/SCENIC/cisTarget_databases/human/hg19-tss-centered-10kb-7species.mc9nr.feather"

pyscenic aucell -t -o "/accounts/campus/koenvdberge/Documents/CRISPRi/pyScenic_cli/output3_aucell" "/accounts/campus/koenvdberge/Documents/CRISPRi/data/counts_nogRNA.csv" "/accounts/campus/koenvdberge/Documents/CRISPRi/pyScenic_cli/output2_prune.dat"

ghuls · 2020-05-04T13:55:56Z

Could you post a section of your gene names?

koenvandenberge · 2020-05-04T15:09:57Z

Sure, here's some gene names:

 "TTTY17B"   "TTTY17C"   "TTTY17A"   "TTTY4C"    "TTTY4B"    "TTTY4"     "BPY2C"     "BPY2B"     "BPY2" "DAZ4"      "DAZ1"      "DAZ3"      "DAZ2"      "TTTY3B"    "TTTY3"     "CDY1B"     "CDY1"      "CSPG4P1Y" "GOLGA2P3Y" "GOLGA2P2Y"

and TF names:

HOXA9
ZFP128
ZFP853
NR1H2
NR1H3
NR1H4
NR1H5
NR1I2

ghuls · 2020-05-04T15:50:18Z

I think the issue stems from the fact that you have long non-coding RNAs in your expression matrix. Our databases only contain normal genes (mostly protein coding, there might be some pseudogenes in there too), but not lincRNAs, microRNAs, ...RNA.
https://www.genenames.org/tools/search/#!/all?query=TTTY4C

koenvandenberge · 2020-05-04T15:57:40Z

Thanks, so I suspect it would be better to remove these prior to running pySCENIC such that the regulons aren't 'contaminated' with non-protein coding genes.

ghuls · 2020-05-04T15:59:32Z

Yes, remove them before running the first step.
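
A minimal sketch of such a filtering step, using only the standard library, could look like the following. It assumes a CSV with one gene per row and the gene symbol in the first column. `db_genes` is a hypothetical stand-in for the gene names found in the ranking databases, which would have to be read from the database files in practice.

```python
import csv
import io


def filter_genes(rows, db_genes):
    """Split rows into (kept, dropped_gene_names).

    The first row is treated as the header and always kept; every other
    row is kept only if its first field is a gene in db_genes.
    """
    rows = iter(rows)
    kept = [next(rows)]  # header row
    dropped = []
    for row in rows:
        if row[0] in db_genes:
            kept.append(row)
        else:
            dropped.append(row[0])
    return kept, dropped


# Toy expression matrix (illustrative counts only).
raw = "gene,cell1,cell2\nEGR1,3,0\nTTTY4C,1,0\nHOXA9,5,1\nTTTY17A,0,2\n"
db_genes = {"EGR1", "HOXA9", "NR1H2"}  # hypothetical database gene set

kept, dropped = filter_genes(csv.reader(io.StringIO(raw)), db_genes)

out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(kept)
print(out.getvalue())
print("dropped:", dropped)
```

Filtering before the GRN inference step, rather than only before pruning, keeps genes that the databases cannot score out of the co-expression modules in the first place.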

koenvandenberge · 2020-05-26T18:28:41Z

Hi, thanks for all the help so far.
Restricting the analysis to protein-coding genes only indeed reduces the "Less than 80% of the genes" messages. I am interested in an analysis without pruning, which seems to run successfully up to a point, until I hit the following error, whose origin I couldn't diagnose.

Here's the command I've used:

pyscenic ctx -t -n -o "/accounts/campus/koenvdberge/Documents/CRISPRi/pyScenic_cli/output2_noPrune.csv" --annotations_fname "/accounts/campus/koenvdberge/Documents/SCENIC/motif2tf/human/motifs-v9-nr.hgnc-m0.001-o0.0.tbl" --num_workers 8 --expression_mtx_fname "/accounts/campus/koenvdberge/Documents/CRISPRi/data/counts_codingGenes_noQuotes.csv" "/accounts/campus/koenvdberge/Documents/CRISPRi/pyScenic_cli/output1_grn.csv" "/accounts/campus/koenvdberge/Documents/SCENIC/cisTarget_databases/human/hg19-500bp-upstream-7species.mc9nr.feather" "/accounts/campus/koenvdberge/Documents/SCENIC/cisTarget_databases/human/hg19-tss-centered-10kb-7species.mc9nr.feather"

And the error:

[####################################### ] | 99% Completed | 20hr 14min  8.3s
[########################################] | 100% Completed | 20hr 14min  8.4s
Traceback (most recent call last):
  File "/accounts/campus/koenvdberge/.conda/envs/scenic2/bin/pyscenic", line 8, in <module>
    sys.exit(main())
  File "/accounts/campus/koenvdberge/.conda/envs/scenic2/lib/python3.7/site-packages/pyscenic/cli/pyscenic.py", line 408, in main
    args.func(args)
  File "/accounts/campus/koenvdberge/.conda/envs/scenic2/lib/python3.7/site-packages/pyscenic/cli/pyscenic.py", line 159, in prune_targets_command
    num_workers=args.num_workers)
  File "/accounts/campus/koenvdberge/.conda/envs/scenic2/lib/python3.7/site-packages/pyscenic/prune.py", line 380, in find_features
    filter_for_annotation=False, **kwargs), base_url=motif_base_url)
  File "/accounts/campus/koenvdberge/.conda/envs/scenic2/lib/python3.7/site-packages/pyscenic/utils.py", line 316, in add_motif_url
    df[("Enrichment", COLUMN_NAME_MOTIF_URL)] = list(map(partial(urljoin, base=base_url), df.index.get_level_values(COLUMN_NAME_MOTIF_ID)))
TypeError: urljoin() got multiple values for argument 'base'
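
The mechanism behind this TypeError can be reproduced with the standard library alone. `urljoin(base, url)` takes `base` as its first positional parameter, so binding `base` by keyword with `partial` and then passing the motif ID positionally supplies `base` twice. The sketch below shows the failure and one possible way to avoid it. The base URL and motif IDs are placeholders, and the fix actually committed to pySCENIC may differ.

```python
from functools import partial
from urllib.parse import urljoin

BASE_URL = "http://example.org/logos/"  # placeholder, not the real motif URL
motif_ids = ["motif1.png", "motif2.png"]  # placeholder motif IDs


def broken(ids, base_url):
    # Each id is passed positionally and lands on `base`, which the
    # partial has already bound by keyword -> TypeError.
    return list(map(partial(urljoin, base=base_url), ids))


def fixed(ids, base_url):
    # Bind base positionally so each id fills the `url` parameter.
    return list(map(partial(urljoin, base_url), ids))


error = None
try:
    broken(motif_ids, BASE_URL)
except TypeError as exc:
    error = str(exc)

urls = fixed(motif_ids, BASE_URL)
print(error)
print(urls)
```

Since the URL column is only added once all modules have been processed, the error surfaces after the progress bar reaches 100%, which matches the 20-hour run above.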


koenvandenberge · 2020-06-15T20:58:57Z

Hi all,
Thanks for looking into this; seeing the recent commits mentioning this issue, would you consider it safe to proceed or is work on a fix still ongoing?

Thank you for the updates.


cflerin · 2020-06-16T13:03:20Z

Hi @koenvandenberge ,

The dev branch should now have a fix to skip modules with no genes overlapping the ranking database (this would also have been resolved when you removed the non-protein-coding genes). But, sorry for not following up on your last issue (with urljoin). It seems like it was a bug in the calling function for the no-pruning method (which we very rarely use here). I've pushed another fix to the dev branch, which you could install via:

pip install git+https://github.com/aertslab/pySCENIC.git@dev


cflerin mentioned this issue Jun 3, 2020

Empty regulons after cisTarget pruning step[results] #177

Closed

cflerin added a commit that referenced this issue Jun 3, 2020

cisTarget step: Check for modules with zero db overlap.

6c7f460

- Previously such modules would cause an error, now these modules are skipped. - Related to #158, #177, #132, #85

cflerin added a commit that referenced this issue Jun 16, 2020

Fix bug in motif url contruction

4ae96c7

- Fixes #158

cflerin added a commit that referenced this issue Jul 17, 2020

cisTarget step: Check for modules with zero db overlap.

18f5f4e

- Previously such modules would cause an error, now these modules are skipped. - Related to #158, #177, #132, #85

cflerin mentioned this issue Jul 17, 2020

Dev #186

Merged

cflerin closed this as completed in 107fa42 Jul 17, 2020

cflerin added a commit that referenced this issue Mar 3, 2021

Fix bug in motif url construction (pt 2)

fd2c951

- Fixes #275 - Previously addressed in #158
