BUG: Allow using numpy in DataFrame.eval and DataFrame.query via @-notation #58057
Add an additional check to make sure that the ndim of the provided variable is an int. This ensures that the ndim guard doesn't trigger when ndim is something else (e.g., a function, as in the case of numpy, where np.ndim is a function). This allows using @np in the df.eval and df.query methods.
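A minimal sketch of why the extra int check matters, assuming the pre-fix guard had the shape "has an ndim attribute and ndim > 2". The helpers old_guard and new_guard and their error messages are hypothetical illustrations, not the pandas source. The point is that the numpy module exposes np.ndim as a function, so comparing it with 2 raises TypeError instead of being skipped.

```python
import numpy as np

# Hypothetical helpers for illustration only; not the actual pandas code.


def old_guard(value):
    # Assumed pre-fix shape: any object with an `ndim` attribute is compared to 2.
    if hasattr(value, "ndim") and value.ndim > 2:
        raise NotImplementedError("more than 2 dimensions not supported (illustrative)")
    return value


def new_guard(value):
    # Post-fix idea: `ndim` only counts as a dimension count if it is an int.
    # For the numpy module, `np.ndim` is a function, so the check is skipped.
    ndim = getattr(value, "ndim", None)
    if isinstance(ndim, int) and ndim > 2:
        raise NotImplementedError("more than 2 dimensions not supported (illustrative)")
    return value


# The numpy module now passes through unchanged...
print(new_guard(np) is np)

# ...while a genuinely 3-D array is still rejected.
try:
    new_guard(np.zeros((2, 2, 2)))
except NotImplementedError as exc:
    print("rejected:", exc)

# The old-style check trips over the module: `np.ndim > 2` raises TypeError.
try:
    old_guard(np)
except TypeError as exc:
    print("old guard failed:", type(exc).__name__)
```

Using `isinstance(ndim, int)` rather than catching TypeError keeps the guard explicit: only objects that really report an integer dimension count are subject to the limit.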
Can you have a look at the CI errors, please? Your test might be failing.
Thanks for the review @Aloqeely. I've managed to make all the CI checks green. Unfortunately, I encountered a different bug while doing so, filed as #58069, which is why I had to skip checking the Series name in the tests. Strictly speaking, this PR and #58069 are unrelated: one can easily encounter the bug in current
Thanks for the PR! A few minor requests.
expected = np.floor(df["a"])
tm.assert_series_equal(expected, res, check_names=False)
Can you set the proper name on `expected` and then remove `check_names=False`?
Unfortunately, I can't. The issue is that `df.eval("@np.floor(a)")` returns a Series with a different name in the Linux unit tests than in the Windows/macOS unit tests. That means if I just remove `check_names` and:
- leave it as-is, the Linux unit tests will fail;
- change the name of `expected` to `"a"`, the Windows and macOS unit tests will fail.
Please see my previous comment and/or #58069 for a bit more detail.
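To make the name-checking trade-off concrete, here is a small sketch using the public `pandas.testing.assert_series_equal`. The `result` Series is a hypothetical stand-in whose name differs from `expected`; it is not what `df.eval` returns on any particular platform. It shows that a NumPy ufunc applied to a Series keeps the name `"a"`, and that `check_names=False` is what lets a name mismatch pass.

```python
import numpy as np
import pandas as pd
from pandas.testing import assert_series_equal

df = pd.DataFrame({"a": [1.5, -0.5, 2.0]})

# A ufunc applied to a Series keeps the Series name, so `expected` is named "a".
expected = np.floor(df["a"])

# Hypothetical stand-in for a result with a different name: rebuilt from a
# plain ndarray, so its name is None (values, dtype and index still match).
result = pd.Series(np.floor(df["a"].to_numpy()))

# Passes: check_names=False skips comparing the `.name` attributes.
assert_series_equal(expected, result, check_names=False)

# Fails: the default check_names=True also compares the names ("a" vs None).
try:
    assert_series_equal(expected, result)
    names_checked = False
except AssertionError:
    names_checked = True

print(expected.name, result.name, names_checked)
```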
On further thought, we can work around the issue by explicitly specifying the `engine`, which is arguably something we should do anyway. I've updated the PR accordingly and also moved the test into a more fitting test file. Please take a look and let me know what you think.
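A minimal sketch of pinning the engine, which is a keyword argument that `DataFrame.eval` and `DataFrame.query` forward to `pandas.eval`. When `engine` is not given, pandas uses numexpr if it is installed and falls back to the python engine otherwise, so pinning it removes that environment dependency. The expressions here are deliberately simple; evaluating `"@np.floor(a)"` itself needs a pandas version that includes this fix (the whatsnew entry targets v3.0.0).

```python
import pandas as pd

df = pd.DataFrame({"a": [1.5, -0.5, 2.0]})

# Pin engine="python" so the evaluation path does not depend on whether
# the optional numexpr package happens to be installed.
evaluated = df.eval("a * 2", engine="python")
filtered = df.query("a > 0", engine="python")

print(evaluated.tolist())
print(filtered["a"].tolist())
```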
doc/source/whatsnew/v3.0.0.rst (outdated)
@@ -325,6 +325,7 @@ Bug fixes
- Fixed bug in :class:`SparseDtype` for equal comparison with na fill value. (:issue:`54770`)
- Fixed bug in :meth:`.DataFrameGroupBy.median` where nat values gave an incorrect result. (:issue:`57926`)
- Fixed bug in :meth:`DataFrame.cumsum` which was raising ``IndexError`` if dtype is ``timedelta64[ns]`` (:issue:`57956`)
- Fixed bug in :meth:`DataFrame.eval` and :meth:`DataFrame.query` which caused an exception when using numpy via ``@`` notation. (:issue:`58041`)
I'm not sure "when using numpy" is clear. What do you think of something like "when using NumPy attributes via ``@`` notation, e.g. ``df.eval("@np.floor(a)")``"?
Updated as per suggestion, thanks
Also move the test to a more appropriate file.
lgtm
Thanks @domsmrz
…@-notation (pandas-dev#58057)
* Allow @np to be used within df.eval and df.query
  Add an additional check to make sure that the ndim of the provided variable is an int. This ensures that the ndim guard doesn't trigger when it is something else (e.g., a function, as in the case of numpy). This allows using @np in the df.eval and df.query methods.
* Add whatsnew
* Fix typo
  Co-authored-by: Abdulaziz Aloqeely <52792999+Aloqeely@users.noreply.github.com>
* Test: skip checking names due to inconsistencies between OSes
* Elaborate further on whatsnew message
* Fix the test by explicitly specifying engine. Also move the test to a more appropriate file.
---------
Co-authored-by: Abdulaziz Aloqeely <52792999+Aloqeely@users.noreply.github.com>
Added type annotations to new arguments/methods/functions. (not applicable)
doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.