KeyError in scanpy_cello(); most specific cell type not created #13

Open
phjanssen opened this issue Dec 1, 2021 · 2 comments

@phjanssen

Hi,
Thank you for developing this very useful tool!
We encountered an error while trying to classify our cells with a pre-trained model. The prediction itself seems to work: with term_ids=True, the binary and probability outputs for the ontology terms are added to the adata object. However, selecting the 'most specific cell type' fails with a KeyError, and the conversion to readable terms does not work either (no output at all with term_ids=False).
Any ideas why this could happen and how to fix it?
Thanks in advance,
Laura and Philipp

The command:

cello.scanpy_cello(
    adata, 
    'clusters',
    cello_resource_loc, 
    model_file=f'{model_prefix}.model.dill',
    term_ids=True
)

Output:

Found CellO resources at '/data/home/EBgrant/scRNA_run1/analysis/Laura/CellO/resources'.

Variable names are not unique. To make them unique, call `.var_names_make_unique`.

Transforming with PCA...
done.
Making predictions for each classifier...
Running solver on item 1/19...
Running solver on item 2/19...
Running solver on item 3/19...
Running solver on item 4/19...
Running solver on item 5/19...
Running solver on item 6/19...
Running solver on item 7/19...
Running solver on item 8/19...
Running solver on item 9/19...
Running solver on item 10/19...
Running solver on item 11/19...
Running solver on item 12/19...
Running solver on item 13/19...
Running solver on item 14/19...
Running solver on item 15/19...
Running solver on item 16/19...
Running solver on item 17/19...
Running solver on item 18/19...
Running solver on item 19/19...
Checking if any pre-trained model is compatible with this input dataset...

/opt/anaconda3/envs/scRNAseq/lib/python3.6/site-packages/sklearn/base.py:315: UserWarning: Trying to unpickle estimator PCA from version 0.22.2.post1 when using version 0.24.2. This might lead to breaking code or invalid results. Use at your own risk.
  UserWarning)
/opt/anaconda3/envs/scRNAseq/lib/python3.6/site-packages/sklearn/base.py:315: UserWarning: Trying to unpickle estimator LogisticRegression from version 0.22.2.post1 when using version 0.24.2. This might lead to breaking code or invalid results. Use at your own risk.
  UserWarning)

Of 24458 genes in the input file, 19107 were found in the training set of 58243 genes.
Of 24458 genes in the input file, 18496 were found in the training set of 31283 genes.
Using thresholds stored in /data/home/EBgrant/scRNA_run1/analysis/Laura/CellO/resources/trained_models/ir.10x_genes_thresholds.tsv
Binarizing classifications...
Mapping each sample to its predicted labels...
Computing the most-specific predicted labels...
Item 1 predicted to be "somatic cell (CL:0002371)"
Item 2 predicted to be "somatic cell (CL:0002371)"
Item 3 predicted to be "somatic cell (CL:0002371)"
Item 4 predicted to be "neuron associated cell (CL:0000095)"
Item 5 predicted to be "astrocyte (CL:0000127)"
Item 6 predicted to be "astrocyte (CL:0000127)"
Item 7 predicted to be "somatic cell (CL:0002371)"
Item 8 predicted to be "CNS neuron (sensu Vertebrata) (CL:0000117)"
Item 9 predicted to be "astrocyte (CL:0000127)"
Item 10 predicted to be "hepatocyte (CL:0000182)"
Item 11 predicted to be "neurecto-epithelial cell (CL:0000710)"
Item 12 predicted to be "neural cell (CL:0002319)"
Item 13 predicted to be "astrocyte (CL:0000127)"
Item 14 predicted to be "astrocyte (CL:0000127)"
Item 15 predicted to be "squamous epithelial cell (CL:0000076)"
Item 16 predicted to be "astrocyte (CL:0000127)"
Item 18 predicted to be "neuron associated cell (CL:0000095)"
Item 19 predicted to be "endothelial cell of umbilical vein (CL:0002618)"

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-28-1600827178e8> in <module>
      7     cello_resource_loc,
      8     model_file=f'{model_prefix}.model.dill',
----> 9     term_ids=True
     10 )

/opt/anaconda3/envs/scRNAseq/lib/python3.6/site-packages/cello/scanpy_cello.py in cello(adata, clust_key, rsrc_loc, algo, out_prefix, model_file, log_dir, term_ids, remove_anatomical_subterms)
    206         adata.obs['Most specific cell type'] = [
    207             ou.cell_ontology().id_to_term[c].name
--> 208             for c in ms_results_df['most_specific_cell_type']
    209         ]
    210     else:

/opt/anaconda3/envs/scRNAseq/lib/python3.6/site-packages/cello/scanpy_cello.py in <listcomp>(.0)
    206         adata.obs['Most specific cell type'] = [
    207             ou.cell_ontology().id_to_term[c].name
--> 208             for c in ms_results_df['most_specific_cell_type']
    209         ]
    210     else:

KeyError: ''
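
The key in the traceback is an empty string, and the log above has no "Item 17" line, so one plausible reading is that at least one cluster ended up without a most-specific label. A minimal sketch for checking which clusters carry an empty label before the lookup runs; the dictionary below is a hypothetical stand-in for the `most_specific_cell_type` column of `ms_results_df`, and the empty entry for cluster 17 is an assumption inferred from the log, not something confirmed here:

```python
# Hypothetical stand-in for ms_results_df['most_specific_cell_type'],
# keyed by cluster. The IDs for 16 and 18 are taken from the log above;
# the empty value for 17 is an assumption.
most_specific = {
    "16": "CL:0000127",
    "17": "",
    "18": "CL:0000095",
}

# Clusters whose label is empty are the ones an ID-to-term lookup
# would reject with KeyError: ''.
empty_clusters = [cluster for cluster, term_id in most_specific.items() if not term_id]
print(empty_clusters)
```

If the list is non-empty on the real data, those clusters are the ones to inspect.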
@Jon-bioinfo

I also ran into this error. In my case it was solved by #18: some of my cells/clusters were left unannotated, so the lookup failed for them.
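
For anyone who cannot pull that change yet, here is a rough sketch of a guarded ID-to-name conversion that tolerates unannotated entries. It is not necessarily what #18 does; `id_to_name`, `to_readable`, and the `"unannotated"` fallback are hypothetical names used only for illustration, with a plain dict standing in for CellO's ontology lookup:

```python
# Hypothetical stand-in for the ontology's ID -> readable name mapping.
id_to_name = {
    "CL:0000127": "astrocyte",
    "CL:0000095": "neuron associated cell",
}


def to_readable(term_ids, id_to_name, fallback="unannotated"):
    """Map ontology IDs to names; empty or unknown IDs get `fallback`."""
    return [id_to_name.get(t, fallback) if t else fallback for t in term_ids]


# An empty string marks a cell/cluster that received no prediction.
labels = to_readable(["CL:0000127", "", "CL:0000095"], id_to_name)
print(labels)
```

Using a fallback string rather than dropping the entry keeps the result the same length as the input, which matters when it is assigned back as a per-cell column.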

@mbernste
Member

Hi, I apologize for the delay in fixing this issue. Would it be possible to send me the pre-trained model file and an expression matrix on which this fails?
