[python] TileDB does not allow "" as the sole string-enum value #2859

Open
johnkerl opened this issue Aug 8, 2024 · 1 comment

johnkerl commented Aug 8, 2024

Split out from #2858, after initial customer report at #2822.

Here is a repro script:
https://gist.github.com/johnkerl/d45f022d710842d36d1b9f29303ce466

Output:

----------------------------------------------------------------
ADATA OBJECT:

AnnData object with n_obs × n_vars = 16 × 4
    obs: 'cell_type'
    var: 'means'

----------------------------------------------------------------
INGESTING TO tiledbsoma-io-empty-string-enum:
Traceback (most recent call last):
  File "/Users/johnkerl/git/TileDB-Inc/cloud-dev-temp/debug/tiledbsoma-nullables/./tiledbsoma-io-write-empty-string-enum.py", line 122, in <module>
    tiledbsoma.io.from_anndata(suri, adata, measurement_name="RNA")
  File "/Users/johnkerl/git/single-cell-data/TileDB-SOMA/apis/python/src/tiledbsoma/io/ingest.py", line 511, in from_anndata
    with _write_dataframe(
  File "/Users/johnkerl/git/single-cell-data/TileDB-SOMA/apis/python/src/tiledbsoma/io/ingest.py", line 1162, in _write_dataframe
    return _write_dataframe_impl(
  File "/Users/johnkerl/git/single-cell-data/TileDB-SOMA/apis/python/src/tiledbsoma/io/ingest.py", line 1235, in _write_dataframe_impl
    _write_arrow_table(
  File "/Users/johnkerl/git/single-cell-data/TileDB-SOMA/apis/python/src/tiledbsoma/io/ingest.py", line 1134, in _write_arrow_table
    handle.write(arrow_table, platform_config=tiledb_write_options)
  File "/Users/johnkerl/git/single-cell-data/TileDB-SOMA/apis/python/src/tiledbsoma/_dataframe.py", line 466, in write
    clib_dataframe.write(batch, sort_coords or False)
RuntimeError: Enumeration: Unable to extend an enumeration without a data buffer.

Notes:

If the input-data column has been created this way then all is well:

"cell_type": pd.Categorical(np.array([""], dtype=str), categories=[""]),
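
For reference, a minimal sketch of what the working form above produces, using only pandas and NumPy. The variable name `col` is illustrative and not taken from the repro script. It shows that passing `categories=[""]` explicitly yields a categorical whose single category is the empty string, with every value encoded as code 0.

```python
import numpy as np
import pandas as pd

# Same construction as the working form in the note above:
# the empty string is passed explicitly as the only category.
col = pd.Categorical(np.array([""], dtype=str), categories=[""])

# Inspect the dictionary (categories) and the integer codes that
# would be handed to the writer as a dictionary-encoded column.
print(list(col.categories))
print(col.codes.tolist())
```

Printing the categories and codes of a column before ingest is a cheap way to check which of the two forms a given input actually has.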

If the input-data column has been created this way then we get the crash:

"cell_type": pd.Categorical(np.array([""], dtype=str),
@johnkerl johnkerl changed the title [python] TileDB-SOMA-Py does not allow only "" as string-enum value in some cases [python] TileDB-SOMA does not allow only "" as sole string-enum value Aug 16, 2024

johnkerl commented Aug 16, 2024

This is a core defect. I've filed [sc-53027] with the TileDB Core team.

@johnkerl johnkerl changed the title [python] TileDB-SOMA does not allow only "" as sole string-enum value [python] TileDB does not allow only "" as sole string-enum value Aug 16, 2024
@johnkerl johnkerl changed the title [python] TileDB does not allow only "" as sole string-enum value [python] TileDB does not allow "" as the sole string-enum value Aug 16, 2024