[python/r] Nullability tracking (parent) #2858
Comments
One question related to the case where the Arrow schema is provided. How would one specify It doesn't seem like the Python API implementation for Repro script:
prints
Hi @mdylan2! Great question! At the moment I'm typing up issues for the Python
A bit more info @mdylan2 : re https://gist.github.com/johnkerl/3a7473dc24974bcc47f7b8257a19bbdb
So this is a bug of ours -- I'll isolate it -- thanks again for the repro script!
@mdylan2 found it -- this will be a quick fix -- more tomorrow!
@mdylan2 the issue is #2869 with PR #2868. This fix will go out with TileDB-SOMA 1.13.1 (if we do one) or else 1.14.0. I've now established that the workaround for now is to set metadata like
Please let me know if this resolves everything for you in your
That worked, thank you @johnkerl!
@mdylan2 -- update at #2857 (comment) -- regarding how to set up nullable booleans (same goes for ints/floats too I believe, & will test explicitly) -- at the point in time when you set up your For the path using Also, there's more work to do here -- which I'll track on this current issue -- even for things which are not bugs, involving:
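For readers following the nullable-boolean discussion above, here is a minimal sketch of what "nullable" columns look like on the Pandas side. It assumes only pandas; the table and column names (`obs`, `is_primary`, `n_genes`, `score`) are made up for illustration and are not taken from the thread or from any TileDB-SOMA API.

```python
import pandas as pd

# Hypothetical obs-like table; the column names are illustrative only.
obs = pd.DataFrame(
    {
        # Pandas extension dtypes that can hold a missing value (pd.NA)
        "is_primary": pd.array([True, None, False], dtype="boolean"),
        "n_genes": pd.array([10, None, 30], dtype="Int64"),
        "score": pd.array([0.5, None, 1.5], dtype="Float64"),
    }
)

# Each column keeps its logical type and records the missing entry as pd.NA
print(obs.dtypes)
print(obs.isna().sum())

# For contrast: without an explicit dtype, True/None/False becomes a plain
# object column, which no longer says "boolean" to downstream code.
as_object = pd.Series([True, None, False])
print(as_object.dtype)
```

The point of the contrast at the end is that the dtype, not just the presence of `None`, is what tells downstream code that a column is a boolean with missing values.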
Context
This is split out from #2822. #2822 had a couple questions: one was answered there conclusively, and the other turns out to be multi-faceted. This issue tracks the second.
Also note nullability for all attribute/column types is well-supported in TileDB Core; bugs here are strictly at the TileDB-SOMA level.
Purpose
Characterize and isolate nullability-related issues within TileDB-SOMA.
Individual issues will be split out, prioritized, assigned, and scheduled.
Coverage matrix
What does "null" mean in source data:
None, pd.NA, math.nan
NA
"" -- this is not "null" in any sense, but I'll track it here: [python] TileDB does not allow "" as the sole string-enum value #2859. That's labeled a Python PR but the issue may express itself at the R API as well; this needs to be validated.
Surfaces to check:
nullable=True in all cases where we should
Who writes, and from what source formats:
tiledbsoma.Experiment.add_new_dataframe
pa.field nullabilities in DataFrame.create #2869
pa.field nullabilities in DataFrame.create #2868
tiledbsoma.io.from_anndata / from_h5ad
SOMACollection$add_new_dataframe
NA to 0 on writes -- my hunch is this should throw on the write but we can discuss this -- in particular, in discussion with R users to find what the cultural expectation is in the R community.
from_seurat
Column types:
tiledbsoma.io
[python] TileDB-SOMA-Py should map None, pd.NA, and math.nan in string columns to null values #2861
tiledbsoma.Experiment.add_new_dataframe) -- needs a separate issue
"" case
tiledbsoma.io.from_anndata/from_h5ad: [python] TileDB-SOMA-Py incorrectly writes nullable-boolean columns? [not a bug] #2857 -- this is handled correctly as described in [python/r] Nullability tracking (parent) #2858
In the tiledbsoma.io.from_anndata case it's crucial that the user's AnnData object has nullable booleans expressed in the right way for Pandas before they hand it to us
math.nan should probably stay math.nan (it is a floating-point value) -- although I believe TileDB Core uses math.nan for null-fill (I need to check) so this would be a moot point
NA should probably map to TileDB Core floating-point null
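To make the distinctions in the matrix above concrete, here is a small standard-library sketch that separates the cases being tracked: a true null (None), a floating-point NaN (a value, not a null), and the empty string (not null in any sense). The helper name `classify_missing` is hypothetical and is not part of TileDB-SOMA; pd.NA is left out only because it would require pandas.

```python
import math


def classify_missing(value):
    """Label a scalar as 'null', 'nan', 'empty-string', or 'value'.

    Hypothetical helper for illustration; not a TileDB-SOMA function.
    """
    if value is None:
        return "null"
    # math.nan is an ordinary float, so it needs its own check
    if isinstance(value, float) and math.isnan(value):
        return "nan"
    # The empty string is a real string value, tracked separately
    if value == "":
        return "empty-string"
    return "value"


samples = [None, math.nan, "", "T cell", 0, 0.0]
for sample in samples:
    print(repr(sample), "->", classify_missing(sample))
```

Keeping "nan" and "empty-string" as separate labels mirrors the matrix: whether either one should be written as a null is a policy question for each column type, not something the value itself decides.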