[Bug] SparseNDArray incorrect shape #1327

Closed
ivirshup opened this issue May 3, 2023 · 4 comments
Labels: enhancement (New feature or request), needs-discussion


@ivirshup
Collaborator

ivirshup commented May 3, 2023

Describe the bug

SparseNDArrays seem to report their shape incorrectly (at least relative to the expected value).

To Reproduce

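```python
import anndata as ad
from scipy import sparse

from tiledbsoma.io import from_anndata
import tiledbsoma as soma

# Write a random 100 x 100 CSR matrix (10% density) as a SOMA experiment
# at URI "test-shape", with measurement name "test-shape".
adata = ad.AnnData(sparse.random(100, 100, format="csr", density=0.1))
from_anndata("test-shape", adata, "test-shape")
# 'test-shape'

# Reopen the experiment and inspect the shape of the "data" X layer.
exp = soma.Experiment.open("test-shape")
exp.ms["test-shape"].X["data"].shape
# (9223372036854773760, 9223372036854773760)
```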

I would have expected the array to be 100 x 100.

Versions:
- `tiledbsoma.__version__`: 1.2.1
- TileDB-Py `tiledb.version()`: (0, 21, 2)
- TileDB core version: 2.15.1
- libtiledbsoma `version()`: libtiledbsoma=;libtiledb=2.15.0
- Python version: 3.10.10.final.0
- OS version: Darwin 20.6.0

@johnkerl
Member

johnkerl commented May 3, 2023

@ivirshup thanks!

There's been quite a bit of discussion about this, and it is by (current) design: `.shape` is the capacity, while the dimensions of the data actually present are given by `obs.count` and `var.count`.
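As a quick check on the number itself: the default capacity printed in the report is exactly 2**63 - 2048, i.e. 2047 below the largest signed 64-bit integer. The thread does not say why that particular margin is used, so treat this snippet only as arithmetic on the reported value, not as an explanation of how tiledbsoma picks its default.

```python
# Relate the default capacity reported in the issue to the int64 range.
REPORTED = 9223372036854773760  # value printed by `.shape` in the report above
INT64_MAX = 2**63 - 1           # largest signed 64-bit integer

print(REPORTED == 2**63 - 2048)  # True
print(INT64_MAX - REPORTED)      # 2047
print(REPORTED % 2048)           # 0: the capacity is a multiple of 2048
```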

That said, I know this is counter-intuitive, since we are using `.shape` in a non-standard way here.

At the very least, there's a documentation opportunity here.

Beyond that, changing the semantics of the `.shape` accessor will take a multi-party discussion.

johnkerl added the enhancement and needs-discussion labels on May 3, 2023
@ivirshup
Collaborator Author

ivirshup commented May 3, 2023

Thanks for the quick response! Out of curiosity, what's the definition of "capacity" here?

Maybe .maxshape (like h5py) would be appropriate here?

I would note that this also carries through to the in-memory representations:

```python
exp.ms["test-shape"].X["data"].read().coos().concat().to_scipy()
# <9223372036854773760x9223372036854773760 sparse matrix of type '<class 'numpy.float64'>'
#     with 1000 stored elements in COOrdinate format>
```
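Until the semantics change, one workaround for the in-memory case is to rebuild the scipy matrix from its COO triplets with an explicit shape. This is a minimal sketch, assuming the true dimensions are the obs and var counts mentioned earlier in the thread (passed in here as plain integers) and that the input is a `scipy.sparse.coo_matrix` like the one `to_scipy()` printed above. `resize_to_counts` is a hypothetical helper, not part of tiledbsoma.

```python
import numpy as np
from scipy import sparse


def resize_to_counts(coo, n_obs, n_var):
    """Rebuild `coo` with shape (n_obs, n_var) instead of the array's capacity.

    Hypothetical helper: n_obs / n_var are assumed to be the obs and var counts.
    """
    # Refuse to silently drop data if any stored coordinate is out of range.
    if coo.nnz and (coo.row.max() >= n_obs or coo.col.max() >= n_var):
        raise ValueError("stored coordinates fall outside (n_obs, n_var)")
    return sparse.coo_matrix((coo.data, (coo.row, coo.col)), shape=(n_obs, n_var))


# Stand-in for the capacity-shaped matrix in the report (random coordinates, not real data).
capacity = 2**63 - 2048
rng = np.random.default_rng(0)
rows = rng.integers(0, 100, size=50)
cols = rng.integers(0, 100, size=50)
big = sparse.coo_matrix((np.ones(50), (rows, cols)), shape=(capacity, capacity))

small = resize_to_counts(big, 100, 100)
print(small.shape)  # (100, 100)
```

Only the shape metadata changes; the stored entries are passed through unchanged, so no dense allocation happens at either size.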

@johnkerl
Member

johnkerl commented May 3, 2023

@ivirshup yes, by "capacity" and "max shape" I think we mean the same thing.

If you create a SparseNDArray with `shape=(100, 100)`, then every coordinate you write to it must lie in the range 0..99 (inclusive) on both dimensions. This is the use case for "immutable" snapshot / cell-census / cell-atlas behavior.

If you create a SparseNDArray with the default shape, then you can write coordinates to it later. This is the use case for "mutable" behavior, where, for example, you write some data to the array today and more tomorrow.
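To make the two modes concrete, here is a toy model in plain Python. It is a hypothetical sketch, not tiledbsoma's implementation: `ToySparseArray`, its `write` method, and deriving `shape` as the bounding box of written coordinates are all assumptions made for illustration. It loosely follows the h5py-style `shape`/`maxshape` split suggested earlier in the thread, though h5py itself tracks the current shape explicitly (via `resize`) rather than deriving it from the data.

```python
from dataclasses import dataclass, field


@dataclass
class ToySparseArray:
    """Hypothetical model: `maxshape` is the capacity fixed at creation;
    `shape` is derived from the coordinates actually written."""

    maxshape: tuple
    cells: dict = field(default_factory=dict)

    def write(self, i, j, value):
        # Reject any coordinate outside the capacity, as described above.
        if not (0 <= i < self.maxshape[0] and 0 <= j < self.maxshape[1]):
            raise IndexError(f"({i}, {j}) is outside capacity {self.maxshape}")
        self.cells[(i, j)] = value

    @property
    def shape(self):
        # Bounding box of written coordinates; (0, 0) if nothing is written yet.
        if not self.cells:
            return (0, 0)
        return (max(i for i, _ in self.cells) + 1, max(j for _, j in self.cells) + 1)


DEFAULT_CAPACITY = 2**63 - 2048  # default capacity observed in the report

# "Immutable" snapshot: fixed 100 x 100 capacity, so writes must stay in 0..99.
snapshot = ToySparseArray(maxshape=(100, 100))
snapshot.write(99, 99, 1.0)
try:
    snapshot.write(100, 0, 1.0)
except IndexError as err:
    print(err)

# "Mutable": default capacity; write some data today and more tomorrow.
growing = ToySparseArray(maxshape=(DEFAULT_CAPACITY, DEFAULT_CAPACITY))
growing.write(9, 4, 1.0)
growing.write(149, 0, 2.0)
print(growing.shape)  # (150, 5)
```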

@johnkerl
Member

johnkerl commented Jun 5, 2023

Discussion has moved to the tracking issue #1445.

johnkerl closed this as completed on Jun 5, 2023