DoesNotExistError: 's3://cellxgene-census-public-us-west-2/cell-census/2023-12-15/soma/' does not exist #1313

Open
yabeykoon opened this issue Nov 13, 2024 · 4 comments
Labels
bug Something isn't working

yabeykoon commented Nov 13, 2024

I'm a student at UCSF trying to run the SCVI reference mapping pipeline using reference datasets in the CELLxGENE Census. I'm running everything in an interactive Jupyter notebook on the Wynton computing cluster at UCSF. I was able to install the cellxgene_census package into my Python virtual environment on Wynton, but when I run the following code:

`import warnings

import cellxgene_census
import scanpy

warnings.filterwarnings("ignore")

census_version = "2023-12-15"
census = cellxgene_census.open_soma(census_version=census_version)

emb_names = ["scvi", "geneformer"]

adata = cellxgene_census.get_anndata(
    census,
    organism="homo_sapiens",
    measurement_name="RNA",
    obs_value_filter="tissue_general == 'central nervous system'",
    obs_column_names=["cell_type"],
    obs_embeddings=emb_names,
)

census.close()`

I get the following error:

`---------------------------------------------------------------------------
DoesNotExistError                         Traceback (most recent call last)
Cell In[6], line 9
      6 warnings.filterwarnings("ignore")
      8 census_version = "2023-12-15"
----> 9 census = cellxgene_census.open_soma(census_version=census_version)
     11 emb_names = ["scvi", "geneformer"]
     13 adata = cellxgene_census.get_anndata(
     14     census,
     15     organism="homo_sapiens",
   (...)
     19     obs_embeddings=emb_names,
     20 )

File ~/miniconda3/envs/scanpy/lib/python3.12/site-packages/cellxgene_census/_open.py:260, in open_soma(census_version, mirror, uri, tiledb_config, context)
    252     api_logger.info(
    253         f"The \"{census_version}\" release is currently {description['release_build']}. Specify "
    254         f"'census_version=\"{description['release_build']}\"' in future calls to open_soma() to ensure data "
    255         "consistency."
    256     )
    258 locator = _resolve_census_locator(description["soma"], selected_mirror)
--> 260 return _open_soma(locator, context)

File ~/miniconda3/envs/scanpy/lib/python3.12/site-packages/cellxgene_census/_open.py:85, in _open_soma(locator, context)
     82 if locator["provider"] == "S3":
     83     context = context.replace(tiledb_config={"vfs.s3.region": locator.get("region")})
---> 85 return soma.open(locator["uri"], mode="r", soma_type=soma.Collection, context=context)

File ~/miniconda3/envs/scanpy/lib/python3.12/site-packages/tiledbsoma/_factory.py:123, in open(uri, mode, soma_type, context, tiledb_timestamp)
     82 """Opens a TileDB SOMA object.
     83 
     84 Args:
   (...)
    120     Maturing.
    121 """
    122 context = _validate_soma_tiledb_context(context)
--> 123 obj: SOMAObject[_Wrapper] = _open_internal(  # type: ignore[valid-type]
    124     _tdb_handles.open, uri, mode, context, tiledb_timestamp
    125 )
    126 try:
    127     if soma_type:

File ~/miniconda3/envs/scanpy/lib/python3.12/site-packages/tiledbsoma/_factory.py:154, in _open_internal(opener, uri, mode, context, timestamp)
    144 def _open_internal(
    145     opener: Callable[
    146         [str, options.OpenMode, SOMATileDBContext, Optional[OpenTimestamp]], _Wrapper
   (...)
    151     timestamp: Optional[OpenTimestamp],
    152 ) -> SOMAObject[_Wrapper]:
    153     """Lower-level open function for internal use only."""
--> 154     handle = opener(uri, mode, context, timestamp)
    155     try:
    156         return reify_handle(handle)

File ~/miniconda3/envs/scanpy/lib/python3.12/site-packages/tiledbsoma/_tdb_handles.py:77, in open(uri, mode, context, timestamp, clib_type)
     68 soma_object = clib.SOMAObject.open(
     69     uri=uri,
     70     mode=open_mode,
   (...)
     73     clib_type=clib_type,
     74 )
     76 if not soma_object:
---> 77     raise DoesNotExistError(f"{uri!r} does not exist")
     79 _type_to_class = {
     80     "somadataframe": DataFrameWrapper,
     81     "somadensendarray": DenseNDArrayWrapper,
   (...)
     85     "somameasurement": MeasurementWrapper,
     86 }
     88 try:

DoesNotExistError: 's3://cellxgene-census-public-us-west-2/cell-census/2023-12-15/soma/' does not exist
`

I expected the data to load as described in the tutorial: https://chanzuckerberg.github.io/cellxgene-census/notebooks/api_demo/census_access_maintained_embeddings.html#Storage-format

Environment

I'm running everything on Wynton HPC at UCSF.

`(base) [yabeykoon@dev1 ~]$ pip list
Package                   Version
------------------------- ----------------
aiobotocore               2.5.4
aiohappyeyeballs          2.4.0
aiohttp                   3.10.5
aioitertools              0.11.0
aiosignal                 1.3.1
anndata                   0.10.8
anyio                     4.3.0
archspec                  0.2.3
argon2-cffi               23.1.0
argon2-cffi-bindings      21.2.0
array_api_compat          1.8
arrow                     1.3.0
asciitree                 0.3.3
asttokens                 2.4.1
async-lru                 2.0.4
async-timeout             4.0.3
attrs                     23.2.0
Babel                     2.14.0
beautifulsoup4            4.12.3
bleach                    6.1.0
boltons                   23.0.0
botocore                  1.31.17
brotlipy                  0.7.0
captum                    0.7.0
certifi                   2024.6.2
cffi                      1.15.1
charset-normalizer        2.0.4
click                     8.1.7
cloudpickle               3.0.0
colorcet                  3.1.0
comm                      0.2.2
conda                     24.5.0
conda-libmamba-solver     24.1.0
conda-package-handling    2.3.0
conda_package_streaming   0.10.0
contourpy                 1.2.0
cryptography              42.0.5
cycler                    0.12.1
dask                      2024.8.0
dask-expr                 1.1.10
dask-image                2024.5.3
datashader                0.16.3
debugpy                   1.8.1
decorator                 5.1.1
defusedxml                0.7.1
dill                      0.3.8
distlib                   0.3.8
distributed               2024.8.0
distro                    1.9.0
docrep                    0.3.2
exceptiongroup            1.2.0
executing                 2.0.1
fasteners                 0.19
fastjsonschema            2.19.1
filelock                  3.13.1
fonttools                 4.47.0
fqdn                      1.5.1
frozendict                2.4.2
frozenlist                1.4.1
fsspec                    2023.6.0
geopandas                 1.0.1
get-annotations           0.1.2
h11                       0.14.0
h5py                      3.11.0
httpcore                  1.0.5
httpx                     0.27.0
idna                      3.4
igraph                    0.11.6
imageio                   2.35.1
importlib_metadata        7.1.0
importlib-resources       6.1.1
inflect                   7.3.1
intervene                 0.6.5
ipykernel                 6.29.4
ipython                   8.18.1
isoduration               20.11.0
jedi                      0.19.1
Jinja2                    3.1.3
jmespath                  1.0.1
joblib                    1.4.2
json5                     0.9.24
jsonpatch                 1.33
jsonpointer               2.4
jsonschema                4.21.1
jsonschema-specifications 2023.12.1
jupyter_client            8.6.1
jupyter_core              5.7.2
jupyter-events            0.10.0
jupyter-lsp               2.2.4
jupyter_server            2.13.0
jupyter_server_terminals  0.5.3
jupyterlab                4.1.5
jupyterlab_pygments       0.3.0
jupyterlab_server         2.25.4
kiwisolver                1.4.5
lazy_loader               0.4
legacy-api-wrap           1.4
leidenalg                 0.10.2
libmambapy                1.5.8
llvmlite                  0.43.0
locket                    1.0.0
mamba                     1.5.8
markdown-it-py            3.0.0
MarkupSafe                2.1.5
matplotlib                3.8.2
matplotlib-inline         0.1.6
matplotlib-scalebar       0.8.1
mdurl                     0.1.2
menuinst                  2.1.1
mistune                   3.0.2
mizani                    0.9.3
more-itertools            10.4.0
mpmath                    1.3.0
msgpack                   1.0.8
multidict                 6.0.5
multipledispatch          1.0.0
multiscale_spatial_image  1.0.1
natsort                   8.4.0
nbclient                  0.10.0
nbconvert                 7.16.3
nbformat                  5.10.3
nest-asyncio              1.6.0
networkx                  3.2.1
notebook                  7.1.2
notebook_shim             0.2.4
numba                     0.60.0
numcodecs                 0.12.1
numpy                     2.0.2
nvidia-cublas-cu12        12.1.3.1
nvidia-cuda-cupti-cu12    12.1.105
nvidia-cuda-nvrtc-cu12    12.1.105
nvidia-cuda-runtime-cu12  12.1.105
nvidia-cudnn-cu12         9.1.0.70
nvidia-cufft-cu12         11.0.2.54
nvidia-curand-cu12        10.3.2.106
nvidia-cusolver-cu12      11.4.5.107
nvidia-cusparse-cu12      12.1.0.106
nvidia-nccl-cu12          2.20.5
nvidia-nvjitlink-cu12     12.6.20
nvidia-nvtx-cu12          12.1.105
ome-zarr                  0.9.0
omnipath                  1.0.8
overrides                 7.7.0
packaging                 23.2
pandas                    2.2.2
pandocfilters             1.5.1
param                     2.1.1
parso                     0.8.3
partd                     1.4.2
patchworklib              0.6.4
patsy                     0.5.6
pexpect                   4.9.0
pillow                    10.2.0
PIMS                      0.7
pip                       24.0
platformdirs              4.1.0
plotnine                  0.12.4
pluggy                    1.0.0
pooch                     1.8.2
prometheus_client         0.20.0
prompt-toolkit            3.0.43
psutil                    5.9.8
ptyprocess                0.7.0
pure-eval                 0.2.2
pyarrow                   17.0.0
pybedtools                0.9.1
pycosat                   0.6.4
pycparser                 2.21
pyct                      0.5.0
Pygments                  2.17.2
pynndescent               0.5.13
pyogrio                   0.9.0
pyOpenSSL                 24.0.0
pyparsing                 3.1.1
pyproj                    3.6.1
pysam                     0.22.0
PySocks                   1.7.1
python-dateutil           2.8.2
python-json-logger        2.0.7
pytz                      2023.3.post1
PyYAML                    6.0.1
pyzmq                     25.1.2
referencing               0.34.0
requests                  2.31.0
rfc3339-validator         0.1.4
rfc3986-validator         0.1.1
rich                      13.7.1
rpds-py                   0.18.0
ruamel.yaml               0.17.21
ruamel.yaml.clib          0.2.8
ruamel-yaml-conda         0.15.100
s3fs                      2023.6.0
scanpy                    1.10.2
scikit-image              0.24.0
scikit-learn              1.5.1
scipy                     1.13.1
seaborn                   0.13.1
Send2Trash                1.8.2
session-info              1.0.0
setuptools                67.8.0
shapely                   2.0.6
six                       1.16.0
slicerator                1.1.0
sniffio                   1.3.1
sortedcontainers          2.4.0
soupsieve                 2.5
spatial_image             1.1.0
spatialdata               0.2.2
squidpy                   1.6.0
stack-data                0.6.3
statsmodels               0.14.2
stdlib-list               0.10.0
sympy                     1.13.2
tblib                     3.0.0
terminado                 0.18.1
texttable                 1.7.0
threadpoolctl             3.5.0
tifffile                  2024.8.10
tinycss2                  1.2.1
tomli                     2.0.1
toolz                     0.12.1
torch                     2.4.0
torch-geometric           2.6.0
torch_scatter             2.1.2+pt23cu121
torch_sparse              0.6.18+pt23cu121
tornado                   6.4
tqdm                      4.66.4
traitlets                 5.14.2
triton                    3.0.0
typeguard                 4.3.0
types-python-dateutil     2.9.0.20240316
typing_extensions         4.10.0
tzdata                    2023.4
umap-learn                0.5.6
uri-template              1.3.0
urllib3                   1.26.16
validators                0.33.0
virtualenv                20.25.0
wcwidth                   0.2.13
webcolors                 1.13
webencodings              0.5.1
websocket-client          1.7.0
wheel                     0.43.0
wrapt                     1.16.0
xarray                    2024.7.0
xarray-dataclasses        1.8.0
xarray-datatree           0.0.14
xarray-schema             0.0.3
xarray-spatial            0.4.0
yarl                      1.9.4
zarr                      2.18.2
zict                      3.0.0
zipp                      3.17.0
zstandard                 0.19.0`
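
The traceback paths point at a conda environment named `scanpy`, while the listing above was taken at a `(base)` prompt, so it may not reflect what the notebook kernel actually imports. A minimal sketch for reporting versions from inside the kernel itself, using only the standard library; the helper name `installed_versions` is hypothetical, and the package names are the PyPI distribution names as I understand them:

```python
import sys
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Not installed in the environment this interpreter belongs to.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # The interpreter path shows which environment the kernel is using.
    print("python:", sys.executable)
    for name, version in installed_versions(
        ["cellxgene-census", "tiledbsoma", "anndata"]
    ).items():
        print(f"{name}: {version or 'not installed'}")
```

Running this in a notebook cell rather than a login shell avoids mixing up environments when the versions are pasted into a report.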

yabeykoon added the bug label Nov 13, 2024
@johnkerl

If I'm not mistaken, this is a duplicate of #1261 (comment).

A new 1.15.0 version of tiledbsoma (which cellxgene_census uses) is expected to be released in the next couple of weeks, after which the workaround from #1261 should no longer be necessary.

@ivirshup
Collaborator

@johnkerl, I'm not so sure if this is a duplicate since I believe that issue is specific to the R implementation.

@yabeykoon, would you mind running something like:

`cellxgene_census.download_source_h5ad("8e47ed12-c658-4252-b126-381df8d52a3d", to_path="/tmp/data.h5ad")`

I'm wondering if there is a firewall on your side. This command will try and download some data from the bucket using a different set of libraries, so whether it runs will help us narrow down what's going on.
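
A complementary check that needs no extra packages is to probe the bucket over plain HTTPS. The sketch below is only an illustration under stated assumptions: the helper names `s3_uri_to_https` and `probe` are hypothetical, the region `us-west-2` is inferred from the bucket name in the error message, and the URL follows the standard S3 virtual-hosted-style form. Any HTTP response, even an error status, suggests the network path is open; a connection failure or timeout points toward a firewall or proxy. It says nothing about whether TileDB itself can read the data.

```python
import urllib.error
import urllib.request
from urllib.parse import urlparse


def s3_uri_to_https(uri: str, region: str) -> str:
    """Convert s3://bucket/key into a virtual-hosted-style HTTPS URL."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return f"https://{parsed.netloc}.s3.{region}.amazonaws.com{parsed.path}"


def probe(url: str, timeout: float = 5.0) -> str:
    """Report whether the host answered at all, regardless of HTTP status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return f"reachable (HTTP {resp.status})"
    except urllib.error.HTTPError as exc:
        # The server answered with an error status, so the network path works.
        return f"reachable (HTTP {exc.code})"
    except OSError as exc:
        # Covers URLError, DNS failures, timeouts, and TLS errors.
        return f"unreachable ({exc})"


if __name__ == "__main__":
    # URI copied from the error message; region is an assumption.
    uri = "s3://cellxgene-census-public-us-west-2/cell-census/2023-12-15/soma/"
    url = s3_uri_to_https(uri, region="us-west-2")
    print(url)
    print(probe(url))
```

Running this on both a login node and a compute node may show whether outbound access differs between them.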

@johnkerl

> @johnkerl, I'm not so sure if this is a duplicate since I believe that issue is specific to the R implementation.

Indeed, my apologies to have introduced any confusion!

@ivirshup
Collaborator

ivirshup commented Jan 7, 2025

@yabeykoon, have you had a chance to take another look here?
