Adding additional objects to Data Viewer; e.g. xarray arrays? #5590

Closed
max-sixty opened this issue Apr 21, 2021 · 2 comments
Labels
good first issue Good for newcomers

Comments

@max-sixty

Currently Data Viewer works with a few objects: pandas DataFrames, numpy arrays, and TF & PyTorch tensors. It's very impressive! Would it be possible to add more types of objects?

It would be great to get xarray in there too (disclaimer: I'm a core dev). A couple of approaches:

  1. Look for objects with __array__ methods and call that to return a numpy array. This extension doesn't need to know anything about the array for this to work, but we only get numpy functionality.
  2. Add actual xarray support, including the ability to select labels, as well as indices

Here's an example of np.asarray & __array__ working:

In [6]: import xarray as xr

In [11]: da = xr.DataArray(np.random.rand(2,3,4), dims=list('abc'), coords=dict(a=['x','y'], b=['m','n','o']))

In [12]: da
Out[12]:
<xarray.DataArray (a: 2, b: 3, c: 4)>
array([[[0.91948492, 0.20846841, 0.6842219 , 0.16919421],
        [0.34729456, 0.59948279, 0.63013757, 0.82908032],
        [0.18657731, 0.24992941, 0.87905069, 0.93960824]],

       [[0.12899889, 0.30758554, 0.7182392 , 0.76364721],
        [0.68879206, 0.81600394, 0.10102972, 0.13856853],
        [0.43866809, 0.60140471, 0.47634698, 0.37118161]]])
Coordinates:
  * a        (a) <U1 'x' 'y'
  * b        (b) <U1 'm' 'n' 'o'
Dimensions without coordinates: c

In [13]: np.asarray(da)
Out[13]:
array([[[0.91948492, 0.20846841, 0.6842219 , 0.16919421],
        [0.34729456, 0.59948279, 0.63013757, 0.82908032],
        [0.18657731, 0.24992941, 0.87905069, 0.93960824]],

       [[0.12899889, 0.30758554, 0.7182392 , 0.76364721],
        [0.68879206, 0.81600394, 0.10102972, 0.13856853],
        [0.43866809, 0.60140471, 0.47634698, 0.37118161]]])

In [14]: da.__array__()   # same as above
Out[14]:
array([[[0.91948492, 0.20846841, 0.6842219 , 0.16919421],
        [0.34729456, 0.59948279, 0.63013757, 0.82908032],
        [0.18657731, 0.24992941, 0.87905069, 0.93960824]],

       [[0.12899889, 0.30758554, 0.7182392 , 0.76364721],
        [0.68879206, 0.81600394, 0.10102972, 0.13856853],
        [0.43866809, 0.60140471, 0.47634698, 0.37118161]]])
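
Approach 1 can be sketched without any xarray-specific code. The snippet below is only an illustration under stated assumptions: `LabeledArray` and `to_numpy_if_possible` are hypothetical names, not part of xarray or of this extension, and the class merely stands in for any object that exposes `__array__`.

```python
import numpy as np


class LabeledArray:
    """Hypothetical stand-in for an array-like such as xarray.DataArray."""

    def __init__(self, values, dims):
        self._values = np.array(values)
        self.dims = dims  # labels live here and are lost on conversion

    def __array__(self, dtype=None, copy=None):
        # NumPy calls this hook from np.asarray(); only the raw values survive.
        return np.asarray(self._values, dtype=dtype)


def to_numpy_if_possible(obj):
    """Return a numpy array for any object exposing __array__, else None."""
    if hasattr(obj, "__array__"):
        return np.asarray(obj)
    return None


arr = to_numpy_if_possible(LabeledArray([[1, 2], [3, 4]], dims=("a", "b")))
print(type(arr).__name__, arr.shape)
```

The check knows nothing about the object beyond the `__array__` hook, which is why it generalises to other libraries but also why the dimension names and coordinates are dropped.
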
@joyceerhl
Contributor

joyceerhl commented Apr 22, 2021

Would it be possible to add more types of objects?

Yes, and we welcome PRs 😊

Add actual xarray support, including the ability to select labels, as well as indices

I might be misreading this: is the proposal to add a custom UI for manipulating xarray objects?

Look for objects with __array__ methods and call that to return a numpy array.

FWIW, if you wanted to pursue this approach, here are the changes to make:

In this function:

def _VSCODE_convertToDataFrame(df, start=None, end=None):
    vartype = type(df)
    if isinstance(df, list):
        df = _VSCODE_pd.DataFrame(df).iloc[start:end]
    elif isinstance(df, _VSCODE_pd.Series):
        df = _VSCODE_pd.Series.to_frame(df).iloc[start:end]
    elif isinstance(df, dict):
        df = _VSCODE_pd.Series(df)
        df = _VSCODE_pd.Series.to_frame(df).iloc[start:end]
    elif hasattr(df, "toPandas"):
        df = df.toPandas().iloc[start:end]
    elif (
        hasattr(vartype, "__name__") and vartype.__name__ in _VSCODE_allowedTensorTypes
    ):
        df = _VSCODE_convertTensorToDataFrame(df, start, end)
    elif hasattr(vartype, "__name__") and vartype.__name__ == "ndarray":
        df = _VSCODE_convertNumpyArrayToDataFrame(df, start, end)

add a case for xarray DataArrays:

    elif hasattr(df, "__array__") and hasattr(vartype, "__name__") and vartype.__name__ == "DataArray":
        df = _VSCODE_convertNumpyArrayToDataFrame(df.__array__(), start, end)

(Depending on how __array__ is implemented, it may be better to evaluate df[start:end].__array__() instead, so that only the requested slice is converted.)
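
To illustrate the slice-first variant, here is a rough, self-contained sketch. Everything in it is an assumption made for demonstration: the `DataArray` class is a hypothetical stand-in that shares only its name with xarray's class, `rows_for_viewer` is not a function in the extension, and the `converted_rows` counter exists only to show how many rows each conversion touched.

```python
import numpy as np


class DataArray:
    """Hypothetical stand-in; only the class name matches xarray's."""

    converted_rows = []  # records how many rows each __array__ call converted

    def __init__(self, values):
        self._values = np.asarray(values)

    def __getitem__(self, key):
        # Slicing returns another DataArray, so no conversion happens yet.
        return DataArray(self._values[key])

    def __array__(self, dtype=None, copy=None):
        DataArray.converted_rows.append(self._values.shape[0])
        return np.asarray(self._values, dtype=dtype)


def rows_for_viewer(obj, start=None, end=None):
    """Dispatch on the type name, then slice before converting to numpy."""
    vartype = type(obj)
    if hasattr(obj, "__array__") and getattr(vartype, "__name__", "") == "DataArray":
        return obj[start:end].__array__()
    raise TypeError("unsupported type: " + getattr(vartype, "__name__", "?"))


da = DataArray(np.arange(40).reshape(10, 4))
page = rows_for_viewer(da, 2, 5)
print(page.shape, DataArray.converted_rows)
```

Matching on the type name rather than importing xarray mirrors the existing `ndarray` branch, so the helper script would not need xarray installed unless the user's session already has it.
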

And add 'DataArray' to the following three lists of supported types:

  1. const DataViewableTypes: Set<string> = new Set<string>([
    'DataFrame',
    'list',
    'dict',
    'ndarray',
    'Series',
    'Tensor',
    'EagerTensor'
    ]);
  2. const DataViewableTypes: Set<string> = new Set<string>([
    'DataFrame',
    'list',
    'dict',
    'ndarray',
    'Series',
    'Tensor',
    'EagerTensor'
    ]);
  3. const SliceableTypes: Set<string> = new Set<string>(['ndarray', 'Tensor', 'EagerTensor']);

This should suffice to get support for viewing xarrays in the data viewer, including slicing xarrays.
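
The effect of the three list changes can be mirrored in a few lines of Python. This is a sketch only: the real lists are the TypeScript `Set<string>` constants quoted above, and `viewer_capabilities` is a hypothetical helper used here to show how a type name would be gated once 'DataArray' is added.

```python
# Python mirrors of the TypeScript sets above, with 'DataArray' added.
DATA_VIEWABLE_TYPES = {
    "DataFrame", "list", "dict", "ndarray", "Series",
    "Tensor", "EagerTensor", "DataArray",
}
SLICEABLE_TYPES = {"ndarray", "Tensor", "EagerTensor", "DataArray"}


def viewer_capabilities(type_name):
    """Report whether a variable of this type name can be viewed and sliced."""
    return {
        "viewable": type_name in DATA_VIEWABLE_TYPES,
        "sliceable": type_name in SLICEABLE_TYPES,
    }


for name in ("DataArray", "Series", "Dataset"):
    print(name, viewer_capabilities(name))
```

Because the gate is a plain name lookup, other xarray types such as `Dataset` would stay unsupported until they are handled and listed separately.
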

@vandyliu
Contributor

vandyliu commented Jun 2, 2021

Solved by #6027

@vandyliu vandyliu closed this as completed Jun 2, 2021
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jun 10, 2021