KeyError in pyscenic ctx (CLI) #103
Dear Julien, Could you check that the gene symbols in your expression matrix are unique? I just want to make sure that this is not causing your issue. Thanks for your help. Kindest regards,
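A quick way to run this check in Python is sketched below. It assumes the expression matrix has been loaded into a pandas DataFrame with cells as rows and gene symbols as columns; the helper name `duplicated_genes` and the toy data are hypothetical.

```python
import pandas as pd


def duplicated_genes(ex_matrix: pd.DataFrame) -> list:
    """Return the gene symbols that occur more than once as column labels.

    Assumes a cells x genes layout (one column per gene).
    """
    # keep=False marks every occurrence of a repeated label, not just the later ones.
    dup_mask = ex_matrix.columns.duplicated(keep=False)
    return sorted(set(ex_matrix.columns[dup_mask]))


# Toy matrix: 3 cells x 4 genes, with "JUN" appearing twice.
ex = pd.DataFrame(
    [[1, 0, 2, 3], [0, 1, 0, 2], [4, 0, 1, 0]],
    index=["cell1", "cell2", "cell3"],
    columns=["JUN", "FOS", "JUN", "MYC"],
)
print(duplicated_genes(ex))  # ['JUN']
```

If duplicates turn up, one option is to keep only the first occurrence with `ex.loc[:, ~ex.columns.duplicated()]`; whether that, summing, or renaming is more appropriate depends on how the duplicates arose.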
Dear @bramvds, Thanks for your reply. Best,
Hi Julien, Is there something special with JUN expression across samples/cells? It is discovered by GENIE3/GRNBoost2 as a TF and/or target gene but does not appear in the gene-gene correlation matrix derived from the single-cell expression matrix. This is the cause of the error you get. Kindest regards,
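To make the mechanism concrete, here is a small sketch that mimics the lookup shown in the traceback (`corr_mtx[s2][s1]` over `zip(adjacencies.TF, adjacencies.target)`). The toy matrix, the gene names and the `lookup_rhos` helper are hypothetical, and the correlation matrix is built with pandas `DataFrame.corr` rather than pySCENIC's own code. The point it illustrates: a TF or target absent from the expression matrix has no row or column in the correlation matrix, so the lookup raises `KeyError`.

```python
import numpy as np
import pandas as pd

# Toy expression matrix (cells x genes); note that JUN is not one of the columns.
ex = pd.DataFrame(
    [[1.0, 2.0, 0.0], [2.0, 1.0, 3.0], [0.0, 4.0, 1.0], [3.0, 0.0, 2.0]],
    columns=["FOS", "MYC", "ATF3"],
)
# Gene x gene Pearson correlation matrix, labelled by gene symbol on both axes.
corr_mtx = ex.corr(method="pearson")

# Toy adjacencies: the second row names a TF that the expression matrix lacks.
adjacencies = pd.DataFrame(
    {"TF": ["FOS", "JUN"], "target": ["MYC", "ATF3"], "importance": [1.5, 0.8]}
)


def lookup_rhos(adjacencies: pd.DataFrame, corr_mtx: pd.DataFrame) -> np.ndarray:
    # Same indexing pattern as the list comprehension in the traceback.
    return np.array(
        [corr_mtx[s2][s1] for s1, s2 in zip(adjacencies.TF, adjacencies.target)]
    )


try:
    lookup_rhos(adjacencies, corr_mtx)
except KeyError as err:
    print(f"KeyError: {err}")  # KeyError: 'JUN'
```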
Hi Bram, Thanks for your reply. Best,
Hi Bram, I have tried the same commands with another dataset to verify that the error wasn't due to this specific matrix. In fact, I get exactly the same error, except that this time the problematic TF is not "JUN" but "SELENOP". Thanks for your help, Julien
Hi Julien, Just to be sure, could you check your adjacencies file (i.e. the output from the GRN step)? The extension of the file needs to match its format: if fields are separated by commas it needs to be 'csv', and if the separator is a tab it should be 'tsv'. Moreover, the file should contain a header as its first line:
Kindest regards,
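The check described above (extension matching the separator, plus a header line) could be scripted roughly as follows. This is a sketch, not part of pySCENIC: `check_adjacencies_file` is a hypothetical helper, and the required column names `TF` and `target` are inferred from the `adjacencies.TF` / `adjacencies.target` accesses in the traceback of the original report, so the real header may contain further columns.

```python
import tempfile
from pathlib import Path

# Separator implied by each extension, as described in the comment above.
SEP_FOR_EXT = {".csv": ",", ".tsv": "\t"}
# Columns the ctx step reads (inferred from the traceback); the real header may have more.
REQUIRED_COLUMNS = {"TF", "target"}


def check_adjacencies_file(path) -> list:
    """Return a list of problems found in an adjacencies file (empty if none)."""
    path = Path(path)
    sep = SEP_FOR_EXT.get(path.suffix.lower())
    if sep is None:
        return [f"unexpected extension {path.suffix!r}: use .csv or .tsv"]
    with path.open() as fh:
        # Split the first line with the separator implied by the extension.
        header = fh.readline().rstrip("\r\n").split(sep)
    problems = []
    if len(header) < 2:
        problems.append(
            f"first line is not split by {sep!r}: extension and separator may not match"
        )
    missing = REQUIRED_COLUMNS - set(header)
    if missing:
        problems.append(f"header lacks column(s) {sorted(missing)}")
    return problems


with tempfile.TemporaryDirectory() as tmp:
    # Toy tab-separated content saved under both extensions.
    content = "TF\ttarget\timportance\nJUN\tFOS\t1.5\n"
    bad = Path(tmp) / "adjacencies.csv"  # tab-separated but named .csv
    bad.write_text(content)
    good = Path(tmp) / "adjacencies.tsv"
    good.write_text(content)
    bad_report = check_adjacencies_file(bad)
    good_report = check_adjacencies_file(good)

print(bad_report)
print(good_report)  # []
```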
Hi Bram, Thanks for your reply. Anyway, I just managed to run the whole pipeline using the Python tool in Jupyter, so I guess we can leave this problem unsolved, especially if I'm the only person to have encountered it. Thank you again for all your time! Best, Julien
I think I've figured out what happened here after running into a similar issue recently. If there are genes present in the network output (adjacencies) that are missing from the gene expression matrix, then this
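One possible workaround, sketched under the assumption that the expression matrix is a cells x genes pandas DataFrame and the adjacencies have `TF` and `target` columns, is to drop adjacency rows that mention genes missing from the matrix before running the ctx step. `filter_adjacencies` and the toy data are hypothetical.

```python
import pandas as pd


def filter_adjacencies(adjacencies: pd.DataFrame, ex_matrix: pd.DataFrame):
    """Drop adjacency rows whose TF or target is not a column of ex_matrix.

    Returns the filtered adjacencies and the sorted list of missing genes.
    Assumes a cells x genes expression matrix (gene symbols as column labels).
    """
    genes = set(ex_matrix.columns)
    keep = adjacencies["TF"].isin(genes) & adjacencies["target"].isin(genes)
    mentioned = set(adjacencies["TF"]) | set(adjacencies["target"])
    missing = sorted(mentioned - genes)
    return adjacencies.loc[keep].reset_index(drop=True), missing


# Toy data: JUN and SELENOP appear in the adjacencies but not in the matrix.
ex = pd.DataFrame([[1.0, 2.0, 0.0], [2.0, 1.0, 3.0]], columns=["FOS", "MYC", "ATF3"])
adjacencies = pd.DataFrame(
    {
        "TF": ["FOS", "JUN", "FOS"],
        "target": ["MYC", "ATF3", "SELENOP"],
        "importance": [1.5, 0.8, 0.3],
    }
)
filtered, missing = filter_adjacencies(adjacencies, ex)
print(missing)  # ['JUN', 'SELENOP']
print(filtered["TF"].tolist())  # ['FOS']
```

Dropping rows hides the mismatch rather than explaining it; if the grn and ctx steps were run on different expression matrices, re-running grn on the matrix passed to ctx is probably the cleaner fix.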
Hi, I am trying to run pyscenic ctx from the output of My commands are as follows:
I am using the same input expression matrix for both commands. The error I get is as follows:
Do you have any suggestions as to how to fix this? Best, Conda environment:
@lc822, what is the header of your
Yes it looks like that.
It appears to be tab-delimited.
Hi @cflerin, just an observation, without knowing anything about the code implementation (I'm part of a team working alongside @lc822): could the error be related to the header being used as a key in the pandas hash table lookup?
The reason I ask is that the error is a KeyError for "TF", but in the tab-delimited file above, TF seems to be part of the header and not a gene.
Indeed, it seems like pandas is looking for a gene named "TF", which should be part of the header. Could you try renaming the file to end with This seems to be a bug in the arboreto script actually, which always uses tab as a separator, while you've requested the output to be comma-separated, and the |
Hi,
I'm trying to run the pyscenic CLI. I have already managed to run the grn step and got an output "adjacencies.tsv", but when I proceed to the ctx step I get a KeyError.
My command is the following:
pyscenic ctx --mode dask_multiprocessing --annotations_fname $RESOURCES_FOLDER"motifs-v9-nr.hgnc-m0.001-o0.0.tbl" --num_workers 8 --output $DATA_FOLDER"regulons.csv" --expression_mtx_fname $DATA_FOLDER"exp.csv" $DATA_FOLDER"adjacencies.tsv" $DATABASE_FOLDER"hg38__refseq-r80__10kb_up_and_down_tss.mc9nr.feather" $DATABASE_FOLDER"hg38__refseq-r80__500bp_up_and_100bp_down_tss.mc9nr.feather"
And the error message I get is this one:
2019-10-25 11:44:51,593 - pyscenic.cli.pyscenic - INFO - Creating modules.
2019-10-25 11:44:53,560 - pyscenic.cli.pyscenic - INFO - Loading expression matrix.
2019-10-25 11:45:02,490 - pyscenic.utils - INFO - Calculating Pearson correlations.
2019-10-25 11:45:02,490 - pyscenic.utils - WARNING - Note on correlation calculation: the default behaviour for calculating the correlations has changed after pySCENIC verion 0.9.16. Previously, the default was to calculate the correlation between a TF and target gene using only cells with non-zero expression values (mask_dropouts=True). The current default is now to use all cells to match the behavior of the R verision of SCENIC. The original settings can be retained by setting 'rho_mask_dropouts=True' in the modules_from_adjacencies function, or '--mask_dropouts' from the CLI.
Dropout masking is currently set to [False].
Traceback (most recent call last):
File "/home/julien/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2897, in get_loc
return self._engine.get_loc(key)
File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'JUN'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/julien/anaconda3/bin/pyscenic", line 10, in <module>
sys.exit(main())
File "/home/julien/anaconda3/lib/python3.7/site-packages/pyscenic/cli/pyscenic.py", line 408, in main
args.func(args)
File "/home/julien/anaconda3/lib/python3.7/site-packages/pyscenic/cli/pyscenic.py", line 133, in prune_targets_command
modules = adjacencies2modules(args)
File "/home/julien/anaconda3/lib/python3.7/site-packages/pyscenic/cli/pyscenic.py", line 102, in adjacencies2modules
keep_only_activating=(args.all_modules != "yes"))
File "/home/julien/anaconda3/lib/python3.7/site-packages/pyscenic/utils.py", line 265, in modules_from_adjacencies
rho_threshold=rho_threshold, mask_dropouts=rho_mask_dropouts)
File "/home/julien/anaconda3/lib/python3.7/site-packages/pyscenic/utils.py", line 136, in add_correlation
rhos = np.array([corr_mtx[s2][s1] for s1, s2 in zip(adjacencies.TF, adjacencies.target)])
File "/home/julien/anaconda3/lib/python3.7/site-packages/pyscenic/utils.py", line 136, in <listcomp>
rhos = np.array([corr_mtx[s2][s1] for s1, s2 in zip(adjacencies.TF, adjacencies.target)])
File "/home/julien/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py", line 2995, in __getitem__
indexer = self.columns.get_loc(key)
File "/home/julien/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2899, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'JUN'
I wondered if it was an issue with the pandas version (I had 0.23.4), so I upgraded to the latest one (0.25.2), but I still get the same error.
Thank you for your help!
Best,
Julien