Adding additional objects to Data Viewer; e.g. xarray arrays? #5590

max-sixty · 2021-04-21T21:58:06Z

Currently Data Viewer works with a few objects: pandas DataFrames, numpy arrays, and TF & PyTorch tensors. It's very impressive! Would it be possible to add more types of objects?

It would be great to get xarray in there too (disclaimer: I'm a core dev). A couple of approaches:

1. Look for objects with __array__ methods and call that to return a numpy array. This extension doesn't need to know anything about the array for this to work, but we only get numpy functionality.
2. Add actual xarray support, including the ability to select labels, as well as indices.

Here's an example of np.asarray & __array__ working:

In [5]: import numpy as np

In [6]: import xarray as xr

In [11]: da = xr.DataArray(np.random.rand(2,3,4), dims=list('abc'), coords=dict(a=['x','y'], b=['m','n','o']))

In [12]: da
Out[12]:
<xarray.DataArray (a: 2, b: 3, c: 4)>
array([[[0.91948492, 0.20846841, 0.6842219 , 0.16919421],
        [0.34729456, 0.59948279, 0.63013757, 0.82908032],
        [0.18657731, 0.24992941, 0.87905069, 0.93960824]],

       [[0.12899889, 0.30758554, 0.7182392 , 0.76364721],
        [0.68879206, 0.81600394, 0.10102972, 0.13856853],
        [0.43866809, 0.60140471, 0.47634698, 0.37118161]]])
Coordinates:
  * a        (a) <U1 'x' 'y'
  * b        (b) <U1 'm' 'n' 'o'
Dimensions without coordinates: c

In [13]: np.asarray(da)
Out[13]:
array([[[0.91948492, 0.20846841, 0.6842219 , 0.16919421],
        [0.34729456, 0.59948279, 0.63013757, 0.82908032],
        [0.18657731, 0.24992941, 0.87905069, 0.93960824]],

       [[0.12899889, 0.30758554, 0.7182392 , 0.76364721],
        [0.68879206, 0.81600394, 0.10102972, 0.13856853],
        [0.43866809, 0.60140471, 0.47634698, 0.37118161]]])

In [14]: da.__array__()   # same as above
Out[14]:
array([[[0.91948492, 0.20846841, 0.6842219 , 0.16919421],
        [0.34729456, 0.59948279, 0.63013757, 0.82908032],
        [0.18657731, 0.24992941, 0.87905069, 0.93960824]],

       [[0.12899889, 0.30758554, 0.7182392 , 0.76364721],
        [0.68879206, 0.81600394, 0.10102972, 0.13856853],
        [0.43866809, 0.60140471, 0.47634698, 0.37118161]]])
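The first approach could be sketched roughly as follows. This is only an illustration, not the extension's code: `as_numpy_if_possible` and `FakeLabelledArray` are hypothetical names, and the stand-in class merely mimics an object that exposes `__array__` the way a DataArray does, so the sketch runs without xarray installed.

```python
import numpy as np


class FakeLabelledArray:
    """Hypothetical stand-in for an array-like object such as xarray.DataArray."""

    def __init__(self, values, dims):
        self._values = np.asarray(values)
        self.dims = dims  # labels are carried along but lost on conversion

    def __array__(self, dtype=None, copy=None):
        # NumPy calls this hook from np.asarray / np.array.
        if dtype is None:
            return self._values
        return self._values.astype(dtype)


def as_numpy_if_possible(obj):
    """Return a numpy array for any object exposing __array__, else None."""
    if hasattr(obj, "__array__"):
        return np.asarray(obj)
    return None


labelled = FakeLabelledArray([[1.0, 2.0], [3.0, 4.0]], dims=("a", "b"))
converted = as_numpy_if_possible(labelled)
not_convertible = as_numpy_if_possible(object())
```

The caller never needs to import or know about the object's library, which is the appeal of this approach; the cost is that dimension names and coordinates are dropped.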

joyceerhl · 2021-04-22T01:41:57Z

> Would it be possible to add more types of objects?

Yes, and we welcome PRs 😊

> Add actual xarray support, including the ability to select labels, as well as indices

I might be misreading this: is the proposal to add custom UI for manipulating xarray objects?

> Look for objects with __array__ methods and call that to return a numpy array.

FWIW, if you wanted to pursue this approach, here are the changes to make:

In this function:

vscode-jupyter/pythonFiles/vscode_datascience_helpers/dataframes/vscodeDataFrame.py

Lines 91 to 107 in 70a6e7b

def _VSCODE_convertToDataFrame(df, start=None, end=None):
    vartype = type(df)
    if isinstance(df, list):
        df = _VSCODE_pd.DataFrame(df).iloc[start:end]
    elif isinstance(df, _VSCODE_pd.Series):
        df = _VSCODE_pd.Series.to_frame(df).iloc[start:end]
    elif isinstance(df, dict):
        df = _VSCODE_pd.Series(df)
        df = _VSCODE_pd.Series.to_frame(df).iloc[start:end]
    elif hasattr(df, "toPandas"):
        df = df.toPandas().iloc[start:end]
    elif (
        hasattr(vartype, "__name__") and vartype.__name__ in _VSCODE_allowedTensorTypes
    ):
        df = _VSCODE_convertTensorToDataFrame(df, start, end)
    elif hasattr(vartype, "__name__") and vartype.__name__ == "ndarray":
        df = _VSCODE_convertNumpyArrayToDataFrame(df, start, end)

add a case for xarray DataArrays:

    elif hasattr(df, "__array__") and hasattr(vartype, "__name__") and vartype.__name__ == "DataArray":
        df = _VSCODE_convertNumpyArrayToDataFrame(df.__array__(), start, end)

(Depending on how __array__ is implemented it may be better to evaluate df[start:end].__array__() instead)
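To illustrate why the order of slicing and conversion can matter, here is a small sketch. `LazyRows` is a hypothetical container invented for this example (it is not xarray and not part of the extension); it assumes an object whose data is only computed when `__array__` is called. Whether a real DataArray behaves this way depends on how its data is backed.

```python
import numpy as np


class LazyRows:
    """Hypothetical lazy container: rows are only computed when converted."""

    def __init__(self, n_rows, n_cols, row_range=None):
        self.n_cols = n_cols
        self.row_range = row_range if row_range is not None else range(n_rows)
        self.rows_materialized = 0  # counts rows built by __array__

    def __getitem__(self, key):
        # Slicing stays lazy: it only narrows the range of rows.
        if not isinstance(key, slice):
            raise TypeError("this sketch only supports slices")
        return LazyRows(0, self.n_cols, self.row_range[key])

    def __array__(self, dtype=None, copy=None):
        self.rows_materialized += len(self.row_range)
        data = [
            [r * self.n_cols + c for c in range(self.n_cols)]
            for r in self.row_range
        ]
        return np.array(data, dtype=dtype)


start, end = 2, 5

# Convert everything, then slice: all 1000 rows are built.
eager = LazyRows(1000, 3)
convert_then_slice = eager.__array__()[start:end]

# Slice first, then convert: only the requested rows are built.
lazy = LazyRows(1000, 3)
window = lazy[start:end]
slice_then_convert = window.__array__()
```

Both orders yield the same values, but in this sketch the second one only materializes the three requested rows. For data that is already fully in memory the difference is likely negligible.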

And add 'DataArray' to the following three lists of supported types:

vscode-jupyter/src/client/datascience/jupyter/debuggerVariables.ts

Lines 24 to 32 in 70a6e7b

const DataViewableTypes: Set<string> = new Set<string>([
    'DataFrame',
    'list',
    'dict',
    'ndarray',
    'Series',
    'Tensor',
    'EagerTensor'
]);

vscode-jupyter/src/client/datascience/jupyter/kernelVariables.ts

Lines 39 to 47 in 70a6e7b

const DataViewableTypes: Set<string> = new Set<string>([
    'DataFrame',
    'list',
    'dict',
    'ndarray',
    'Series',
    'Tensor',
    'EagerTensor'
]);

vscode-jupyter/src/datascience-ui/data-explorer/mainPanel.tsx

Line 40 in 70a6e7b

const SliceableTypes: Set<string> = new Set<string>(['ndarray', 'Tensor', 'EagerTensor']);

This should suffice to get support for viewing xarrays in the data viewer, including slicing xarrays.

vandyliu · 2021-06-02T21:57:35Z

Solved by #6027

max-sixty added the enhancement label Apr 21, 2021

greazer added the good first issue label Apr 22, 2021

joyceerhl mentioned this issue Apr 23, 2021

Data Viewer option does not exist microsoft/vscode-python#15527

Closed

vandyliu mentioned this issue May 28, 2021

Adding xarray DataArrays to Data Viewer #6027

Merged


vandyliu closed this as completed Jun 2, 2021

github-actions bot locked as resolved and limited conversation to collaborators Jun 10, 2021
