[H5WasmProvider + MatrixVis] Support (u)int64 data inside compound datasets #1503
axelboc commented on Oct 5, 2023
- Fixes vscode-h5web#15: Extension does not know how to serialize a BigInt
- Fixes vscode-h5web#30: Support viewing h5 files that are written with Pandas
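The first linked issue's title matches the `TypeError` that V8 throws when `JSON.stringify` encounters a `bigint` ("Do not know how to serialize a BigInt"). As a minimal sketch of that failure mode, assuming the values end up in a JSON payload (for instance, a message passed to a webview), the snippet below reproduces the error and shows one possible workaround: a hypothetical `bigintReplacer` that converts `bigint` values to numbers. This is only an illustration; the PR itself avoids `bigint`s at the provider level instead.

```typescript
// Hypothetical replacer: JSON has no bigint type, so convert bigints to numbers
// (exact only for magnitudes up to Number.MAX_SAFE_INTEGER).
function bigintReplacer(_key: string, value: unknown): unknown {
  return typeof value === "bigint" ? Number(value) : value;
}

// Stand-in payload containing bigint values, like those returned for (u)int64 dtypes
const payload = { shape: [2], value: [BigInt(1), BigInt(2)] };

// Without a replacer, JSON.stringify throws a TypeError on bigint values.
let threw = false;
try {
  JSON.stringify(payload);
} catch (error) {
  threw = error instanceof TypeError;
}

const json = JSON.stringify(payload, bigintReplacer);
console.log(threw, json); // true {"shape":[2],"value":[1,2]}
```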
@@ -67,10 +66,13 @@ export class H5WasmApi extends DataProviderApi {
throw new Error('Compression filter not supported');
}

// h5wasm returns integers for bool and BigInt for (u)int64
// So we use to_array instead to have bool and numbers resp.
if (hasBoolType(dataset) || hasInt64Type(dataset)) {
I've removed the check for `BooleanType` because, according to my tests, h5wasm does return actual booleans (not 0/1). This check was introduced in #1179.
I think what happened is that when h5wasm@0.4.3 started converting 0/1 to actual booleans, we thought it was doing so only via `h5wDataset.to_array`, when in fact it also does it with `h5wDataset.value`.
You can run this branch and open this file in the h5wasm demo (bool.h5.tar.gz) to check that `MatrixVis` renders as expected with boolean scalar, 1D and 2D datasets.
/* h5wasm returns bigints for (u)int64 dtypes, so we use `to_array` to get numbers instead.
 * We do this only for datasets that are supported by at least one visualization (other than `RawVis`),
 * so for (u)int64 scalars/arrays, and for compound datasets with at least one (u)int64 field (`MatrixVis`). */
if (hasInt64Type(dataset)) {
Obviously this solution is less than ideal overall, since the provider has to know which dtypes may cause issues in the visualizations. Indeed, there are other cases where a dataset may contain int64 values (e.g. `ArrayType`, meaning the dtype rather than the shape, and of course nested `ArrayType` and `CompoundType`), but we only really care about the ones that are supported by our visualizations.
Long term, supporting `bigint` in the visualizations would be ideal.
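The selection criterion described above can be sketched as follows. This is a minimal illustration built on a simplified, hypothetical dtype model (these are not H5Web's actual type definitions, and `size` is assumed to be in bits): the check matches (u)int64 dtypes and compound dtypes with at least one top-level (u)int64 field, and deliberately ignores `ArrayType` and nested compounds.

```typescript
// Hypothetical, simplified dtype model (not H5Web's actual definitions).
// `size` is assumed to be in bits.
type DType =
  | { class: "Integer"; size: number; signed: boolean }
  | { class: "Float"; size: number }
  | { class: "Boolean" }
  | { class: "Compound"; fields: Record<string, DType> }
  | { class: "Array"; base: DType; dims: number[] };

function isInt64(dtype: DType): boolean {
  return dtype.class === "Integer" && dtype.size === 64;
}

// Matches (u)int64 dtypes and compound dtypes with at least one top-level
// (u)int64 field. Array dtypes and nested compounds are deliberately ignored,
// mirroring the choice to only handle cases that some visualization supports.
function hasInt64Type(dtype: DType): boolean {
  if (isInt64(dtype)) {
    return true;
  }
  if (dtype.class === "Compound") {
    return Object.values(dtype.fields).some((field) => isInt64(field));
  }
  return false;
}

const int64Field: DType = { class: "Integer", size: 64, signed: false };
console.log(hasInt64Type({ class: "Compound", fields: { id: int64Field } })); // true
console.log(hasInt64Type({ class: "Array", base: int64Field, dims: [3] })); // false
```

Keeping the check shallow is what makes the provider's knowledge of the visualizations explicit, which is the trade-off the comment above points out.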
> Long term, supporting bigint in the visualizations would be ideal.

Yeah, that would be less bound to a specific provider as well.
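One concrete limitation of converting to numbers, and a likely reason to prefer native `bigint` support, is that a JavaScript `number` is only guaranteed to represent integers exactly up to `Number.MAX_SAFE_INTEGER` (2^53 - 1), whereas int64 values go up to about 9.2e18 and uint64 values up to about 1.8e19. The hypothetical helper `bigintToNumber` below is a sketch that converts a `bigint` and reports whether the result is exact; it assumes inputs stay within the (u)int64 range.

```typescript
// Hypothetical helper: convert a bigint (e.g. a (u)int64 value) to a number
// and report whether the conversion was lossless. Every integer with magnitude
// up to Number.MAX_SAFE_INTEGER converts exactly; larger ones may be rounded.
function bigintToNumber(value: bigint): { value: number; exact: boolean } {
  const converted = Number(value);
  // Converting back is safe for finite results and exposes any rounding.
  const exact = Number.isFinite(converted) && BigInt(converted) === value;
  return { value: converted, exact };
}

console.log(bigintToNumber(BigInt(42))); // { value: 42, exact: true }
console.log(bigintToNumber(BigInt("9007199254740993"))); // { value: 9007199254740992, exact: false }
```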
const rawValue = h5wDataset.to_array();

// `to_array` returns nested JS arrays for nD datasets, so we need to re-flatten them
Big downside of going through `to_array`. It would be better if this method had a `flatten` parameter, but I'd still prefer supporting `bigint` in the long run.
Thanks!
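Re-flattening can be done with `Array.prototype.flat`, as long as the depth is limited to the nesting introduced by the dimensions. The sketch below assumes (not verified against h5wasm) that `to_array` nests one JS array per dimension and that each compound element is itself an array of field values; flattening `rank - 1` levels then merges the dimensions without breaking up compound elements. `reflatten` is a hypothetical name.

```typescript
// Hypothetical helper: undo the per-dimension nesting returned for an nD dataset.
// `rank` is the number of dimensions of the dataset (0 for scalars).
function reflatten(value: unknown, rank: number): unknown {
  if (rank <= 1 || !Array.isArray(value)) {
    return value; // scalars and 1D arrays are already flat
  }
  // Flatten exactly `rank - 1` levels; deeper arrays (e.g. compound elements) are kept.
  return (value as unknown[]).flat(rank - 1);
}

console.log(reflatten([[1, 2], [3, 4]], 2)); // [ 1, 2, 3, 4 ]
console.log(reflatten([[[1, "a"]], [[2, "b"]]], 2)); // [ [ 1, 'a' ], [ 2, 'b' ] ]
```

Passing an explicit depth rather than `Infinity` is the key design choice here: a full flatten would also split compound elements into their individual field values.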