[FEA]: Support UDF expression for referencing self in assign #12936

Closed
mroeschke opened this issue Mar 13, 2023 · 1 comment · Fixed by #14142
Labels
0 - Waiting on Author Waiting for author to respond to review feature request New feature or request Python Affects Python cuDF API.

Comments

@mroeschke
Contributor

Is your feature request related to a problem? Please describe.
While working on a pandas-to-cuDF workflow comparison, I noticed that cuDF does not support calling assign with a lambda that references the current DataFrame.

Describe the solution you'd like

In [34]: df = pd.DataFrame({"a": [1]})

In [35]: df.assign(b=lambda x: x["a"] + 1)
Out[35]:
   a  b
0  1  2

In [36]: cu_df = cudf.DataFrame({"a": [1]})

In [37]: cu_df.assign(b=lambda x: x["a"] + 1)
TypeError: 'function' object is not iterable

During handling of the above exception, another exception occurred:

ValueError: Unsupported dtype object

During handling of the above exception, another exception occurred:

TypeError: 'function' object is not iterable

Describe alternatives you've considered
Assigning the column in a separate statement instead of passing a lambda. Unfortunately, that is not conducive to method-chaining use cases.

Additional context
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.assign.html?highlight=assign#pandas.DataFrame.assign

@mroeschke mroeschke added feature request New feature or request Needs Triage Need team to review and classify labels Mar 13, 2023
@wence-
Contributor

wence- commented Mar 15, 2023

Probably just

diff --git a/python/cudf/cudf/core/dataframe.py b/python/cudf/cudf/core/dataframe.py
index e50c324a8f..b17ba765f7 100644
--- a/python/cudf/cudf/core/dataframe.py
+++ b/python/cudf/cudf/core/dataframe.py
@@ -1470,14 +1470,21 @@ class DataFrame(IndexedFrame, Serializable, GetAttrGetItemMixin):
         2  2  5
         """
         new_df = cudf.DataFrame(index=self.index.copy())
+
+        def make_col(col):
+            if callable(col):
+                return col(self)
+            else:
+                return col
+
         for name, col in self._data.items():
             if name in kwargs:
-                new_df[name] = kwargs.pop(name)
+                new_df[name] = make_col(kwargs.pop(name))
             else:
                 new_df._data[name] = col.copy()
 
         for k, v in kwargs.items():
-            new_df[k] = v
+            new_df[k] = make_col(v)
         return new_df
 
     @classmethod

Can you give that a try?

@GregoryKimball GregoryKimball added 0 - Waiting on Author Waiting for author to respond to review Python Affects Python cuDF API. and removed Needs Triage Need team to review and classify labels Jun 7, 2023
wence- added a commit to wence-/cudf that referenced this issue Sep 21, 2023
While here, change the way the initial copied frame is constructed:
callables are allowed to refer to columns already in the dataframe,
even if they overwrite them.

- Closes rapidsai#12936
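
The overwrite case mentioned in this commit message can be illustrated with a small sketch on plain dicts of lists. assign_sketch is a hypothetical helper written for this illustration and is an assumption about the intended behavior, not the code merged in the pull request.

```python
def assign_sketch(frame, **kwargs):
    """Hypothetical helper on plain dicts of lists; not cuDF's implementation."""
    # Start from a copy of every column so the input frame is left untouched.
    new_frame = {name: list(col) for name, col in frame.items()}
    for name, value in kwargs.items():
        # The callable is evaluated before the column is replaced, so it can
        # still read the existing values of the column it overwrites.
        new_frame[name] = value(new_frame) if callable(value) else value
    return new_frame


original = {"a": [1, 2], "b": [10, 20]}
updated = assign_sketch(original, a=lambda f: [v * 2 for v in f["a"]])
print(updated)   # {'a': [2, 4], 'b': [10, 20]}
print(original)  # {'a': [1, 2], 'b': [10, 20]}
```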
rapids-bot bot pushed a commit that referenced this issue Sep 22, 2023
While here, change the way the initial copied frame is constructed:
callables are allowed to refer to columns already in the dataframe,
even if they overwrite them.

- Closes #12936

Authors:
  - Lawrence Mitchell (https://github.com/wence-)

Approvers:
  - Matthew Roeschke (https://github.com/mroeschke)

URL: #14142