Implement .transform method to apply dim transforms across containers #3932

poplarShift · 2019-08-27T21:05:08Z

I think it would be well worth implementing data aggregation with dim transforms from the ground up (i.e. in each interface). That would give us any sort of multidimensional transform on HexTiles, such as the ones proposed in #3636, for free.

The API would not be changed, but the following would become possible:

from holoviews import dim, opts
import numpy as np
import holoviews as hv
hv.extension('bokeh')

xx = np.arange(20).reshape(5, 4)
xx2 = np.c_[xx[:, :2], np.arange(20, 50, 3).reshape(5, 2)]
x = np.r_[xx, xx2]

kdims = ['x', 'y']
vdims = ['z', 'a']
e = hv.Points(np.r_[x, x], kdims, vdims).opts(color='z', colorbar=True)
ds = hv.Dataset(e)

def myfunc(z, a):
    return np.sum(z)+np.sum(a)

dim_transform = dim('z', myfunc, dim('a'))
print(e)
print(e.aggregate(function=dim_transform).data)

yielding the following (note how the vdims z and a are condensed into a single vdim z by the summation):

:Points   [x,y]   (z)
[[  0   1  96]
 [  4   5 136]
 [  8   9 176]
 [ 12  13 216]
 [ 16  17 256]]
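The numbers in that output can be cross-checked without HoloViews. The following is only a plain-Python sketch of what the aggregation is expected to compute, not how any interface would implement it: it rebuilds the same 20 rows, groups them by (x, y), and adds up z and a per group, which is what np.sum(z)+np.sum(a) amounts to.

```python
from collections import defaultdict

# Rebuild the example rows without NumPy; the columns are x, y, z, a.
xx = [[4 * i + j for j in range(4)] for i in range(5)]        # np.arange(20).reshape(5, 4)
extra = list(range(20, 50, 3))                                 # np.arange(20, 50, 3)
xx2 = [xx[i][:2] + extra[2 * i:2 * i + 2] for i in range(5)]   # np.c_[xx[:, :2], ...]
rows = (xx + xx2) * 2                                          # np.r_[x, x] stacked twice

# Group by the key dimensions and reduce both value dimensions to one number.
totals = defaultdict(int)
for x, y, z, a in rows:
    totals[(x, y)] += z + a

result = [[x, y, total] for (x, y), total in sorted(totals.items())]
print(result)
```

Each (x, y) key occurs four times in the input, so every output row is the sum over four z values and four a values.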

I'm happy to work on this but wanted to hear what your thoughts are first.


philippjfr · 2019-09-13T09:30:39Z

This is an interesting proposal. My main concern is that aggregate generally retains all vdims, and it's not clear how it would know to drop 'a' but keep 'z' in your example. Originally I had envisioned adding a separate method for this and was hoping to call it apply, but that name is now reserved for something else.

philippjfr · 2019-09-13T09:31:50Z

The spelling for apply would have been:

points.apply(z=dim_transform)

i.e. you'd tell it to create a new dimension or overwrite an existing one using the keyword argument.
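To make the keyword semantics concrete, here is a minimal plain-Python sketch. The helper apply_transforms and the dict-of-lists table are hypothetical stand-ins chosen for illustration, not HoloViews API: a keyword that matches an existing column overwrites it, and any other keyword adds a new column.

```python
def apply_transforms(columns, **transforms):
    """Return a copy of `columns` with one column set per keyword argument."""
    result = dict(columns)  # shallow copy, so the input table is left untouched
    for name, func in transforms.items():
        result[name] = func(columns)
    return result


def z_plus_a(columns):
    # Elementwise sum of the two value columns.
    return [z + a for z, a in zip(columns["z"], columns["a"])]


table = {"z": [1, 2, 3], "a": [10, 20, 30]}
overwritten = apply_transforms(table, z=z_plus_a)  # replaces the existing 'z'
added = apply_transforms(table, v=z_plus_a)        # creates a new column 'v'
print(overwritten)
print(added)
```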

poplarShift · 2019-10-02T20:38:17Z

I like that API and I think we could build on that.

How about we call it .transform? Then .aggregate with a dim transform would be roughly equivalent to .groupby(kdims).transform(...).

Proposal

API

.transform(v=dim_transform) just returns the applied dim transform.
.groupby(...).transform(v=dim_transform) would return the dim transform applied to each part of the resulting container.

Further issues

We can only handle dim_transforms with scalar output in this way, unless we want to specify an argument that takes the function signature. The current dim syntax does not seem to allow specifying an hv.Dimension name for the output of the operation. If we implemented some way of making that known to the dim transform, we could even go .groupby(...).transform(dim_transform) and holoviews would automatically know which dimensions to insert!
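As a rough illustration of that last idea, the sketch below lets a transform carry its own output name so that it can be passed positionally. NamedTransform and this transform function are hypothetical names invented for the sketch; nothing like them exists in the current dim syntax.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class NamedTransform:
    """Hypothetical wrapper: a scalar reduction plus the name of its output dimension."""
    output: str
    inputs: Sequence[str]
    func: Callable[..., float]

    def apply(self, columns):
        # Look up the input columns by name and reduce them to one scalar.
        return self.func(*(columns[name] for name in self.inputs))


def transform(columns, *named):
    """Positional form: every transform supplies its own output name."""
    return {t.output: t.apply(columns) for t in named}


total = NamedTransform("v", ("z", "u"), lambda z, u: sum(z) + sum(u))
result = transform({"z": [0, 2, 0, 2, 0], "u": [0, 2, 4, 1, 3]}, total)
print(result)
```

With the output name attached to the transform, the keyword in .transform(v=...) becomes optional rather than required.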

Example

Let me outline with the following example:

import pandas as pd
n = np.arange(10)
a = pd.DataFrame(dict(x=n % 2, y=n % 3, z=n % 4, u=n % 5))
a = hv.Dataset(a, ['x', 'y'], ['z', 'u'])

Now, a.groupby(['x']) returns

:HoloMap   [x]
   :Dataset   [y]   (z,u)

and a.aggregate('x', function=np.sum).data is

   x  z   u
0  0  4  10
1  1  9  10

So far, so good. Now define a dim transform that sums both value dimensions:

def myfunc(z, u):
    return np.sum(z) + np.sum(u)

dim_transform = dim('z', myfunc, dim('u'))

With that,

a.groupby(['x']).traverse(lambda e: dim_transform.apply(e), hv.Dataset)

returns [14, 19], i.e. the per-group sums of the z and u columns from the aggregate above (4+10 and 9+10).

Then a.transform(v=dim_transform) would just return the applied dim transform:
hv.Dataset([33])

a.groupby('x').transform(v=dim_transform) would return the dim transform applied to each part of the container.

   x   v
0  0  14
1  1  19
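The expected values for both calls can be reproduced in plain Python. This is a sketch of the intended semantics only, assuming rows held as dicts; the helpers transform and grouped_transform are illustrative names and do not touch HoloViews.

```python
from collections import defaultdict

# Same columns as the example Dataset: kdims x, y and vdims z, u.
rows = [{"x": i % 2, "y": i % 3, "z": i % 4, "u": i % 5} for i in range(10)]


def myfunc(z, u):
    return sum(z) + sum(u)


def transform(rows):
    # Ungrouped case: reduce the whole table to a single value.
    return myfunc([r["z"] for r in rows], [r["u"] for r in rows])


def grouped_transform(rows, key):
    # Grouped case: split on `key`, then apply the same reduction to each part.
    groups = defaultdict(list)
    for r in rows:
        groups[r[key]].append(r)
    return {k: transform(g) for k, g in sorted(groups.items())}


total = transform(rows)
per_group = grouped_transform(rows, "x")
print(total, per_group)
```

The ungrouped result is the sum of the per-group results, which is a quick sanity check for any implementation of the grouped form.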

My proposal can almost be implemented with some simple monkey patching already (modulo the insertion into the existing hv.Dataset):

def transform_ds(ds, dim_transform):
    return dim_transform.apply(ds)

hv.Dataset.transform = transform_ds

def transform_nd(ds, dim_transform):
    return ds.traverse(lambda e: dim_transform.apply(e), lambda e: isinstance(e, hv.Dataset))

hv.core.NdMapping.transform = transform_nd

github-actions · 2024-10-24T01:48:40Z

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

poplarShift changed the title from "Enable .aggregate on datasets using dim transforms" to "Implement .transform method to apply dim transforms across containers" on Oct 2, 2019

poplarShift mentioned this issue Oct 31, 2019

Multi-dimensional dim transforms on data sets #4080

Merged

4 tasks

philippjfr added this to the v1.13.0 milestone Mar 9, 2020

philippjfr closed this as completed Mar 9, 2020

github-actions bot locked as resolved and limited conversation to collaborators Oct 24, 2024
