ValueError: Unable to determine gene collection #25
Comments
I ran my code and got this error:

KeyError: 'Entrez_gene_ID'

During handling of the above exception, another exception occurred:

BiomartException Traceback (most recent call last)
File ~/ENTER/lib/python3.9/site-packages/scanpy/queries/_queries.py:108, in biomart_annotations(org, attrs, host, use_cache)
File ~/ENTER/lib/python3.9/site-packages/scanpy/queries/_queries.py:70, in simple_query(org, attrs, filters, host, use_cache)
File ~/ENTER/lib/python3.9/site-packages/pybiomart/dataset.py:246, in Dataset.query(self, attributes, filters, only_unique, use_attr_names)
BiomartException: Unknown attribute Entrez_gene_ID, check dataset attributes for a list of valid attributes.

I don't know how to check the mmusculus dataset attributes for the Entrez gene IDs. This is for CellO cell type annotation, which requires Entrez gene IDs or HUGO gene symbols. Could anyone help? Thanks
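One way to look up the valid attribute name from Python, as a minimal sketch: pybiomart (the library in the traceback above) exposes a dataset's attributes, and they can be filtered by keyword. The helper names `matching_attributes` and `list_mouse_attributes` are made up for illustration, and the host URL and the dataset name `mmusculus_gene_ensembl` are assumptions about the setup. The BioMart lookup itself needs network access.

```python
def matching_attributes(attribute_names, keyword):
    """Return the attribute names that contain `keyword` (case-insensitive)."""
    keyword = keyword.lower()
    return sorted(name for name in attribute_names if keyword in name.lower())


def list_mouse_attributes(keyword="entrez", host="http://www.ensembl.org"):
    """Query BioMart for mouse dataset attributes matching `keyword`.

    Hypothetical helper: needs network access and pybiomart installed.
    """
    from pybiomart import Dataset  # imported lazily so the filter works offline

    dataset = Dataset(name="mmusculus_gene_ensembl", host=host)
    # Dataset.attributes is a dict keyed by attribute name.
    return matching_attributes(dataset.attributes.keys(), keyword)


# Offline illustration with a hand-written list of attribute names.
example_names = ["ensembl_gene_id", "external_gene_name", "entrezgene_id"]
print(matching_attributes(example_names, "entrez"))
```

Calling `list_mouse_attributes()` should then return the names that can be passed as `attrs` to the scanpy query.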
I used R biomaRt to find the mmusculus dataset attributes. The attribute is "entrezgene_id", so I now have an annot dataframe that contains Entrez gene IDs. But I hit another issue when I used:

cello_data.var[annot.columns] = annot

to map the gene IDs of my AnnData (cello_data2) to Entrez gene IDs and add them. I got the error:

ValueError Traceback (most recent call last)
File ~/ENTER/lib/python3.9/site-packages/pandas/core/frame.py:3643, in DataFrame.__setitem__(self, key, value)
File ~/ENTER/lib/python3.9/site-packages/pandas/core/frame.py:3687, in DataFrame._setitem_array(self, key, value)
File ~/ENTER/lib/python3.9/site-packages/pandas/core/frame.py:3655, in DataFrame.__setitem__(self, key, value)
File ~/ENTER/lib/python3.9/site-packages/pandas/core/frame.py:3832, in DataFrame._set_item(self, key, value)
File ~/ENTER/lib/python3.9/site-packages/pandas/core/frame.py:4532, in DataFrame._sanitize_column(self, value)
File ~/ENTER/lib/python3.9/site-packages/pandas/core/frame.py:10999, in _reindex_for_setitem(value, index)
File ~/ENTER/lib/python3.9/site-packages/pandas/core/frame.py:10994, in _reindex_for_setitem(value, index)
File ~/ENTER/lib/python3.9/site-packages/pandas/core/series.py:4672, in Series.reindex(self, *args, **kwargs)
File ~/ENTER/lib/python3.9/site-packages/pandas/core/generic.py:4966, in NDFrame.reindex(self, *args, **kwargs)
File ~/ENTER/lib/python3.9/site-packages/pandas/core/generic.py:4986, in NDFrame._reindex_axes(self, axes, level, limit, tolerance, method, fill_value, copy)
File ~/ENTER/lib/python3.9/site-packages/pandas/core/generic.py:5032, in NDFrame._reindex_with_indexers(self, reindexers, fill_value, copy, allow_dups)
File ~/ENTER/lib/python3.9/site-packages/pandas/core/internals/managers.py:679, in BaseBlockManager.reindex_indexer(self, new_axis, indexer, axis, fill_value, allow_dups, copy, consolidate, only_slice, use_na_proxy)
File ~/ENTER/lib/python3.9/site-packages/pandas/core/indexes/base.py:4107, in Index._validate_can_reindex(self, indexer)
ValueError: cannot reindex on an axis with duplicate labels

So, how do I solve this?
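The error indicates that the index of `annot` contains the same label more than once, so pandas cannot align it with `cello_data.var`. A minimal sketch of one way around it, assuming `annot` has the columns `ensembl_gene_id` and `entrezgene_id` and that `var` keeps Ensembl IDs in a `gene_ids` column. The toy data and the choice to keep the first Entrez ID per gene are assumptions, not part of the original code:

```python
import pandas as pd

# Toy stand-in for cello_data.var: gene symbols as index, Ensembl IDs in a column.
var = pd.DataFrame(
    {"gene_ids": ["ENSMUSG_A", "ENSMUSG_B", "ENSMUSG_C"]},
    index=["GeneA", "GeneB", "GeneC"],
)

# Toy stand-in for the BioMart result: one Ensembl ID maps to two Entrez IDs.
annot = pd.DataFrame(
    {
        "ensembl_gene_id": ["ENSMUSG_A", "ENSMUSG_A", "ENSMUSG_B"],
        "entrezgene_id": [101.0, 102.0, 201.0],
    }
).set_index("ensembl_gene_id")

# Keep only the first row per Ensembl ID so the index is unique.
annot_unique = annot[~annot.index.duplicated(keep="first")]

# Look up each gene's Ensembl ID; genes without a match become NaN.
var["entrezgene_id"] = var["gene_ids"].map(annot_unique["entrezgene_id"])
print(var)
```

Mapping through the `gene_ids` column avoids relying on the two tables sharing the same index, which is what the direct assignment requires.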
When I used the code: it extracted the hgnc_symbol and entrezgene_id. Then I set the index: and this time the code ran successfully, without the duplicate index error. But when I run the CellO code: cello.scanpy_cello( I don't know what to do now. Any suggestions?
By the way, right now the cello_data3 head looks like:

Xkr4 4 False 0.000200 -0.470874 -1.075329 0 False NaN NaN

Somehow some genes don't have entrezgene IDs.
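NaN here usually means that no Entrez ID was returned for that gene. A small sketch for counting such genes and keeping only the annotated ones, using a toy table in place of `cello_data3.var`. The column name `entrezgene_id` follows the earlier comment and the values are invented:

```python
import pandas as pd

# Toy stand-in for cello_data3.var; the IDs are invented.
var = pd.DataFrame(
    {"entrezgene_id": [11.0, None, 33.0]},
    index=["GeneA", "GeneB", "GeneC"],
)

# Boolean mask: True where an Entrez ID is present.
has_id = var["entrezgene_id"].notna()
n_missing = int((~has_id).sum())
print(f"{n_missing} of {len(var)} genes have no Entrez ID")

# Keep annotated genes and store the IDs as strings such as "11" instead of 11.0.
var_annotated = var[has_id].copy()
var_annotated["entrezgene_id"] = var_annotated["entrezgene_id"].astype(int).astype(str)
print(var_annotated)
```

The same mask could be used to subset the AnnData object itself, for example `adata[:, has_id.values]`, if dropping unannotated genes is acceptable for the analysis.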
I receive the error message,
ValueError: Unable to determine gene collection. Please make sure the input dataset specifies either HUGO gene symbols or Entrez gene ID's.
W1_1.var.head()
| | gene_ids | feature_types | highly_variable | means | dispersions | dispersions_norm | n_cells | mt | rb | n_cells_by_counts | mean_counts | pct_dropout_by_counts | total_counts |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Mrpl15 | ENSMUSG00000033845 | Gene Expression | False | 0.523846 | 1.410226 | -0.317858 | 364 | False | False | 364 | 0.125276 | 76.815287 | 196.682587 |
| Lypla1 | ENSMUSG00000025903 | Gene Expression | False | 0.496954 | 1.360324 | -0.604688 | 356 | False | False | 356 | 0.117622 | 77.324841 | 184.666626 |
| Tcea1 | ENSMUSG00000033813 | Gene Expression | False | 1.178549 | 1.267944 | -0.668772 | 814 | False | False | 814 | 0.410102 | 48.152866 | 643.860168 |
| Atp6v1h | ENSMUSG00000033793 | Gene Expression | False | 0.555859 | 1.411481 | -0.310645 | 389 | False | False | 389 | 0.138000 | 75.222930 | 216.659286 |
| Rb1cc1 | ENSMUSG00000025907 | Gene Expression | False | 1.298682 | 1.432109 | 0.011839 | 838 | False | False | 838 | 0.449731 | 46.624204 | 706.077637 |
Should gene_ids contain Entrez gene IDs? I changed the column name gene_ids to Entrez gene IDs, Gene stable ID, or Entrez gene ids, but that didn't work.
Or what code should I use to map the gene IDs using Ensembl BioMart (http://useast.ensembl.org/biomart)?
Thanks
Ting
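Renaming the column does not change its contents: the values in `gene_ids` above are still Ensembl IDs (`ENSMUSG...`). A quick, rough check of what kind of identifiers a list holds, as a sketch. `guess_id_type` and its patterns are illustrative assumptions and are not how CellO decides:

```python
import re

# Ensembl gene IDs look like ENS + optional species code + G + 11 digits.
ENSEMBL_GENE = re.compile(r"^ENS[A-Z]*G\d{11}$")
# Entrez gene IDs are plain integers.
ENTREZ_GENE = re.compile(r"^\d+$")


def guess_id_type(ids):
    """Return a rough label for the kind of identifiers in `ids`."""
    ids = [str(i) for i in ids]
    if ids and all(ENSEMBL_GENE.match(i) for i in ids):
        return "ensembl"
    if ids and all(ENTREZ_GENE.match(i) for i in ids):
        return "entrez"
    return "other (possibly gene symbols)"


print(guess_id_type(["ENSMUSG00000033845", "ENSMUSG00000025903"]))
print(guess_id_type(["Mrpl15", "Lypla1"]))
```

Running this on `W1_1.var["gene_ids"]` and on `W1_1.var_names` would show which identifiers each one actually carries before passing the data on.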