
Implement .transform method to apply dim transforms across containers #3932

Closed
poplarShift opened this issue Aug 27, 2019 · 4 comments

@poplarShift
Collaborator

poplarShift commented Aug 27, 2019

I think it could be very worthwhile to implement data aggregation using dim transforms from the ground up (i.e. in each interface). This would give us arbitrary multidimensional transforms on HexTiles, such as those proposed in #3636, for free.

The API would not be changed, but the following would become possible:

from holoviews import dim, opts
import numpy as np
import holoviews as hv
hv.extension('bokeh')

xx = np.arange(20).reshape(5, 4)
xx2 = np.c_[xx[:, :2], np.arange(20, 50, 3).reshape(5, 2)]
x = np.r_[xx, xx2]

kdims = ['x', 'y']
vdims = ['z', 'a']
e = hv.Points(np.r_[x, x], kdims, vdims).opts(color='z', colorbar=True)
ds = hv.Dataset(e)

def myfunc(z, a):
    return np.sum(z)+np.sum(a)

dim_transform = dim('z', myfunc, dim('a'))
print(e)
print(e.aggregate(function=dim_transform).data)  # proposed: aggregate does not accept a dim transform yet

yielding the following (note how the vdims z and a are condensed into a single vdim z by the summation):

:Points   [x,y]   (z)
[[  0   1  96]
 [  4   5 136]
 [  8   9 176]
 [ 12  13 216]
 [ 16  17 256]]
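For reference, the numbers above can be cross-checked without the proposed API. The sketch below uses a plain pandas groupby over the key dimensions as a stand-in for what the proposed aggregate would compute; it is not HoloViews code, it simply applies myfunc to each group's z and a columns and stores the result under z.

```python
import numpy as np
import pandas as pd

# Rebuild the same 20 x 4 array as in the example above
xx = np.arange(20).reshape(5, 4)
xx2 = np.c_[xx[:, :2], np.arange(20, 50, 3).reshape(5, 2)]
x = np.r_[xx, xx2]
df = pd.DataFrame(np.r_[x, x], columns=['x', 'y', 'z', 'a'])

def myfunc(z, a):
    return np.sum(z) + np.sum(a)

# Group on the key dimensions and collapse both value columns into one column 'z'
result = (
    df.groupby(['x', 'y'])[['z', 'a']]
      .apply(lambda g: myfunc(g['z'], g['a']))
      .rename('z')
      .reset_index()
)
print(result)
```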

I'm happy to work on this but wanted to hear what your thoughts are first.

@philippjfr
Member

This is an interesting proposal. My main concern is that aggregate generally retains all vdims, and it's not clear how it would know to drop 'a' but keep 'z' in your example. Originally I had envisioned adding another method for this and was hoping to call it apply, but that name is now reserved for something else.

@philippjfr
Member

The spelling for apply would have been:

points.apply(z=dim_transform)

i.e. you'd tell it to create a new dimension or overwrite an existing one using the keyword argument.
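The keyword convention described here (the argument name picks the output dimension, creating it or overwriting an existing one) mirrors pandas' DataFrame.assign. The sketch below shows only that pandas behaviour as an analogy; it assumes nothing about how HoloViews would implement the method.

```python
import pandas as pd

df = pd.DataFrame({'z': [1, 2, 3], 'a': [10, 20, 30]})

# Keyword name that already exists: column 'z' is overwritten
overwritten = df.assign(z=lambda d: d['z'] + d['a'])

# Keyword name that does not exist yet: a new column 'v' is created
extended = df.assign(v=lambda d: d['z'] * 2)

# assign returns new frames; the original df is left unchanged
print(overwritten)
print(extended)
```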

@poplarShift
Collaborator Author

I like that API and I think we could build on that.

How about we call it .transform? Then .aggregate with a dim transform would be roughly equivalent to .groupby(kdims).transform(...).

Proposal

API

.transform(v=dim_transform) just returns the applied dim transform.
.groupby(...).transform(v=dim_transform) would return the dim transform applied to each part of the resulting container.

Further issues

We can only handle dim transforms with scalar output this way, unless we add an argument that specifies the function signature. The current dim syntax does not seem to allow specifying an hv.Dimension name for the output of the operation. If we implemented some way of making that known to the dim transform, we could even write .groupby(...).transform(dim_transform), and HoloViews would automatically know which dimensions to insert!
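One way to make the output name known, sketched here purely as an assumption: bundle the function with the name of the dimension it produces, so a grouped transform can label its result column without a keyword. NamedTransform and grouped_transform are hypothetical helpers written against pandas, not HoloViews API.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

import pandas as pd


@dataclass
class NamedTransform:
    """Hypothetical: a reduction plus the name of the dimension it produces."""
    name: str
    columns: Sequence[str]
    fn: Callable[..., float]

    def apply(self, df: pd.DataFrame) -> float:
        # Pass each requested column to fn as a positional argument
        return self.fn(*(df[c] for c in self.columns))


def grouped_transform(df: pd.DataFrame, by: str, transform: NamedTransform) -> pd.DataFrame:
    # The output column is named by the transform itself, so no keyword is needed
    out = df.groupby(by)[list(transform.columns)].apply(transform.apply)
    return out.rename(transform.name).reset_index()


df = pd.DataFrame({'x': [0, 1, 0, 1], 'z': [1, 2, 3, 4], 'u': [5, 6, 7, 8]})
total = NamedTransform('v', ['z', 'u'], lambda z, u: z.sum() + u.sum())
print(grouped_transform(df, 'x', total))
```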

Example

Let me outline with the following example:

import pandas as pd  # np and hv are imported as in the first example
n = np.arange(10)
a = pd.DataFrame(dict(x=n % 2, y=n % 3, z=n % 4, u=n % 5))
a = hv.Dataset(a, ['x', 'y'], ['z', 'u'])

Now, a.groupby(['x']) returns

:HoloMap   [x]
   :Dataset   [y]   (z,u)

and a.aggregate('x', function=np.sum).data is

   x  z   u
0  0  4  10
1  1  9  10
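As a cross-check, the same per-group sums can be reproduced with a plain pandas groupby on the underlying DataFrame. This does not use HoloViews; it only confirms the expected values (the y key dimension is dropped because only x is kept).

```python
import numpy as np
import pandas as pd

n = np.arange(10)
df = pd.DataFrame(dict(x=n % 2, y=n % 3, z=n % 4, u=n % 5))

# Roughly what aggregate('x', function=np.sum) computes: sum every value column per x
summed = df.groupby('x')[['z', 'u']].sum().reset_index()
print(summed)
```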

So far, so good.
Now,

def myfunc(z, u):
    return np.sum(z) + np.sum(u)

dim_transform = dim('z', myfunc, dim('u'))

Hence,

a.groupby(['x']).traverse(lambda e: dim_transform.apply(e), hv.Dataset)

returns [14, 19], i.e. for each group the sum of the z column plus the sum of the u column from the aggregate above (4+10 and 9+10).
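The traverse call applies the transform to each group separately. An equivalent pandas sketch over the same data makes the per-group arithmetic explicit; it is a cross-check only, not the HoloViews code path.

```python
import numpy as np
import pandas as pd

n = np.arange(10)
df = pd.DataFrame(dict(x=n % 2, y=n % 3, z=n % 4, u=n % 5))

def myfunc(z, u):
    return np.sum(z) + np.sum(u)

# One scalar per group of x, in group order: x=0 first, then x=1
per_group = [int(myfunc(g['z'], g['u'])) for _, g in df.groupby('x')]
print(per_group)
```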

Then a.transform(v=dim_transform) would just return the dim transform applied to the whole dataset (the z column sums to 13 and the u column to 20):
hv.Dataset([33])

a.groupby('x').transform(v=dim_transform) would return the dim transform applied to each part of the container.

    x  v
0  0  14
1  1  19
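A rough pandas sketch of the two proposed behaviours, to pin down the expected outputs. transform_all and transform_grouped are hypothetical names, not HoloViews API; as in the proposal, the keyword argument supplies the output dimension name (v here).

```python
import numpy as np
import pandas as pd

n = np.arange(10)
df = pd.DataFrame(dict(x=n % 2, y=n % 3, z=n % 4, u=n % 5))

def myfunc(z, u):
    return np.sum(z) + np.sum(u)

def v_transform(d):
    return myfunc(d['z'], d['u'])

def transform_all(df, **transforms):
    # Hypothetical .transform: one row holding each transform applied to the whole table
    return pd.DataFrame({name: [fn(df)] for name, fn in transforms.items()})

def transform_grouped(df, by, **transforms):
    # Hypothetical .groupby(by).transform: one row per group, keyed by the group value
    rows = []
    for key, group in df.groupby(by):
        row = {by: key}
        row.update({name: fn(group) for name, fn in transforms.items()})
        rows.append(row)
    return pd.DataFrame(rows)

print(transform_all(df, v=v_transform))
print(transform_grouped(df, 'x', v=v_transform))
```

In this sketch the grouped variant is just the whole-table variant run once per group, which is the sense in which .aggregate with a dim transform would be roughly .groupby(kdims).transform(...).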

My proposal can almost be implemented with some simple monkey patching already (modulo the insertion into the existing hv.Dataset):

# Dataset.transform: apply the dim transform to the whole Dataset
def transform_ds(ds, dim_transform):
    return dim_transform.apply(ds)

hv.Dataset.transform = transform_ds

# NdMapping.transform: apply the dim transform to every Dataset inside the container
def transform_nd(container, dim_transform):
    return container.traverse(lambda e: dim_transform.apply(e), [hv.Dataset])

hv.core.NdMapping.transform = transform_nd

@poplarShift poplarShift changed the title Enable .aggregate on datasets using dim transforms Implement .transform method to apply dim transforms across containers Oct 2, 2019
@philippjfr philippjfr added this to the v1.13.0 milestone Mar 9, 2020

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 24, 2024