AssertionError in prune2df #132
Hi @Matthias3033, can you list the databases you are using here? From the error, it sounds like no genes were found in the database that overlap with your data.
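This overlap can be checked directly before running prune2df. A minimal sketch of the idea, where the two gene sets are dummy stand-ins for your expression-matrix columns and the ranking database's gene names (a classic cause of zero overlap is a species/case mismatch in gene symbols):

```python
# Dummy stand-ins: matrix_genes ~ your expression-matrix columns,
# db_genes ~ the gene names stored in the ranking database.
matrix_genes = {"TP53", "MYC", "GAPDH", "ACTB"}   # human-style upper case
db_genes = {"Tp53", "Myc", "Gapdh"}               # mouse-style casing
overlap = matrix_genes & db_genes

# With a species/case mismatch the intersection is empty, and cisTarget
# has nothing to score, which can surface as the AssertionError below.
print(len(overlap))  # 0
```

With real data, compare your matrix's gene names against the database's gene list the same way; a very small intersection points at the wrong database or annotation file.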
Hi @cflerin, these are the databases that I use:
The databases look fine (although there's no need to use the 7-species databases when you are also using the 10-species ones, it won't cause issues). Are you also using the correct motif annotations file (for human)? How many genes are in your expression matrix? And how many modules do you have?
I am using the correct motif file. The number of genes is 17098. How do I get the number of modules? (With len(modules) I get 4996.)
Just noticed:
which seems self-explanatory. You could try taking out the three 7-species databases and see if it works with the remaining databases.
Same error. I've also tried it with only one 7-species database, but still get the same error.
How much memory do you have available on your machine? You could try reducing the number of processes that pySCENIC is using...
How can I reduce the number of processes?
Via the CLI you have a parameter for this. For the prune2df function (cisTarget step) the parameter name is num_workers. For GRNBoost, I kindly refer you to the arboreto package documentation: https://github.com/tmoerman/arboreto . Briefly, you need to use a construct like this:
I have 120 GB of RAM available, so memory should normally not be a problem.
Hi,
I get the following error message when I use the function prune2df:
AssertionError Traceback (most recent call last)
in
3 # Calculate a list of enriched motifs and the corresponding target genes for all modules.
4 with ProgressBar():
----> 5 df = prune2df(dbs, modules, MOTIF_ANNOTATIONS_FNAME_HS)
6
7 # Create regulons from this table of enriched motifs.
~/miniconda3/lib/python3.7/site-packages/pyscenic/prune.py in prune2df(rnkdbs, modules, motif_annotations_fname, rank_threshold, auc_threshold, nes_threshold, motif_similarity_fdr, orthologuous_identity_threshold, weighted_recovery, client_or_address, num_workers, module_chunksize, filter_for_annotation)
349 return _distributed_calc(rnkdbs, modules, motif_annotations_fname, transformation_func, aggregation_func,
350 motif_similarity_fdr, orthologuous_identity_threshold, client_or_address,
--> 351 num_workers, module_chunksize)
352
353
~/miniconda3/lib/python3.7/site-packages/pyscenic/prune.py in _distributed_calc(rnkdbs, modules, motif_annotations_fname, transform_func, aggregate_func, motif_similarity_fdr, orthologuous_identity_threshold, client_or_address, num_workers, module_chunksize)
298 if client_or_address == "dask_multiprocessing":
299 # ... via multiprocessing.
--> 300 return create_graph().compute(scheduler='processes', num_workers=num_workers if num_workers else cpu_count())
301 else:
302 # ... via dask.distributed framework.
~/miniconda3/lib/python3.7/site-packages/dask/base.py in compute(self, **kwargs)
154 dask.base.compute
155 """
--> 156 (result,) = compute(self, traverse=False, **kwargs)
157 return result
158
~/miniconda3/lib/python3.7/site-packages/dask/base.py in compute(*args, **kwargs)
395 keys = [x.dask_keys() for x in collections]
396 postcomputes = [x.dask_postcompute() for x in collections]
--> 397 results = schedule(dsk, keys, **kwargs)
398 return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])
399
~/miniconda3/lib/python3.7/site-packages/dask/multiprocessing.py in get(dsk, keys, num_workers, func_loads, func_dumps, optimize_graph, **kwargs)
190 get_id=_process_get_id, dumps=dumps, loads=loads,
191 pack_exception=pack_exception,
--> 192 raise_exception=reraise, **kwargs)
193 finally:
194 if cleanup:
~/miniconda3/lib/python3.7/site-packages/dask/local.py in get_async(apply_async, num_workers, dsk, result, cache, get_id, rerun_exceptions_locally, pack_exception, raise_exception, callbacks, dumps, loads, **kwargs)
499 _execute_task(task, data) # Re-execute locally
500 else:
--> 501 raise_exception(exc, tb)
502 res, worker_id = loads(res_info)
503 state['cache'][key] = res
~/miniconda3/lib/python3.7/site-packages/dask/compatibility.py in reraise(exc, tb)
110 if exc.traceback is not tb:
111 raise exc.with_traceback(tb)
--> 112 raise exc
113
114 else:
~/miniconda3/lib/python3.7/site-packages/dask/local.py in execute_task()
270 try:
271 task, data = loads(task_info)
--> 272 result = _execute_task(task, data)
273 id = get_id()
274 result = dumps((result, id))
~/miniconda3/lib/python3.7/site-packages/dask/local.py in _execute_task()
250 elif istask(arg):
251 func, args = arg[0], arg[1:]
--> 252 args2 = [_execute_task(a, cache) for a in args]
253 return func(*args2)
254 elif not ishashable(arg):
~/miniconda3/lib/python3.7/site-packages/dask/local.py in ()
250 elif istask(arg):
251 func, args = arg[0], arg[1:]
--> 252 args2 = [_execute_task(a, cache) for a in args]
253 return func(*args2)
254 elif not ishashable(arg):
~/miniconda3/lib/python3.7/site-packages/dask/local.py in _execute_task()
251 func, args = arg[0], arg[1:]
252 args2 = [_execute_task(a, cache) for a in args]
--> 253 return func(*args2)
254 elif not ishashable(arg):
255 return arg
~/miniconda3/lib/python3.7/site-packages/pyscenic/transform.py in modules2df()
229 #TODO: Remove this restriction.
230 return pd.concat([module2df(db, module, motif_annotations, weighted_recovery, False, module2features_func)
--> 231 for module in modules])
232
233
~/miniconda3/lib/python3.7/site-packages/pyscenic/transform.py in ()
229 #TODO: Remove this restriction.
230 return pd.concat([module2df(db, module, motif_annotations, weighted_recovery, False, module2features_func)
--> 231 for module in modules])
232
233
~/miniconda3/lib/python3.7/site-packages/pyscenic/transform.py in module2df()
183 try:
184 df_annotated_features, rccs, rankings, genes, avg2stdrcc = module2features_func(db, module, motif_annotations,
--> 185 weighted_recovery=weighted_recovery)
186 except MemoryError:
187 LOGGER.error("Unable to process \"{}\" on database \"{}\" because ran out of memory. Stacktrace:".format(module.name, db.name))
~/miniconda3/lib/python3.7/site-packages/pyscenic/transform.py in module2features_auc1st_impl()
127 # Calculate recovery curves, AUC and NES values.
128 # For fast unweighted implementation so weights to None.
--> 129 aucs = calc_aucs(df, db.total_genes, weights, auc_threshold)
130 ness = (aucs - aucs.mean()) / aucs.std()
131
~/miniconda3/lib/python3.7/site-packages/pyscenic/recovery.py in aucs()
282 # for calculationg the maximum AUC.
283 maxauc = float((rank_cutoff+1) * y_max)
--> 284 assert maxauc > 0
285 return auc2d(rankings, weights, rank_cutoff, maxauc)
AssertionError:
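The failing check at the bottom of the trace is assert maxauc > 0, where maxauc = (rank_cutoff + 1) * y_max and y_max reflects how many module genes (or how much summed gene weight) were actually recovered from the ranking database. With zero overlap between module genes and the database, y_max is 0 and the assertion trips. A sketch with illustrative numbers:

```python
# Illustrative values only: in pyscenic, rank_cutoff is derived from
# auc_threshold and the database size, and y_max from the module genes
# found in the ranking database.
rank_cutoff = 1500
y_max = 0  # no module genes were found in the ranking database

# maxauc is the area of the rectangle bounding the recovery curve.
maxauc = float((rank_cutoff + 1) * y_max)
print(maxauc)      # 0.0
print(maxauc > 0)  # False -> `assert maxauc > 0` raises AssertionError
```

So the AssertionError is a symptom, not the root cause: the real problem is that the modules and the ranking database share no genes.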
As the ranking database I use Homo sapiens. I do not get this error message when using Mus musculus with another data set. The error mentioned in issue #85 is not present here. Does anyone have an idea how to fix this error?