Skip to content

BUG: Allow using numpy in DataFrame.eval and DataFrame.query via @-notation #58057

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Apr 8, 2024

Conversation

domsmrz
Copy link
Contributor

@domsmrz domsmrz commented Mar 28, 2024

domsmrz and others added 3 commits March 28, 2024 23:45
Add additional check to make sure that ndim of the provided
variable is an int. This makes sure that the ndim guard doesn't
trigger if it is something else (e.g., a function in case of a
numpy). This allow for using @np in df.eval and df.query methods.
@domsmrz domsmrz changed the title Np eval fix BUG: Allow using numpy in DataFrame.eval and DataFrame.query via @-notation Mar 28, 2024
@Aloqeely
Copy link
Member

Can you have a look at the CI errors please, your test might be failing.

Co-authored-by: Abdulaziz Aloqeely <52792999+Aloqeely@users.noreply.github.com>
@domsmrz
Copy link
Contributor Author

domsmrz commented Mar 29, 2024

Thanks for the review @Aloqeely . I've managed to make all the CIs green. Unfortunately, I've encountered different bug when doing so, filed as #58069 . Which is the reason I had to skip checking for series name in the tests. Strictly speaking this PR and #58069 are unrelated -- one can easily encounter the bug in current main even without accepting the PR, as explained in the bug description. That being said I can see how accepting this PR may nudge users to actually encounter the bug a bit more often. Let me know what would you like to do with this PR.

@mroeschke mroeschke requested a review from rhshadrach April 1, 2024 17:55
@mroeschke mroeschke added the expressions pd.eval, query label Apr 1, 2024
Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! A few minor requests.

Comment on lines 744 to 745
expected = np.floor(df["a"])
tm.assert_series_equal(expected, res, check_names=False)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you set the proper name on expected and then remove check_names=False.

Copy link
Contributor Author

@domsmrz domsmrz Apr 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, I can't. The issue is that df.eval(@np.floor(a)) will have different names in Linux unittests compared to Win/MacOS unittest. Meaning if I just remove check_names and:

  • leave as-is, the Linux unittests will fail
  • change the name of expected to "a", the Win & MacOS unittests will fail

Please see my previous comment and/or #58069 for a little bit more details.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On a further thought, we can sort of work around the issue by explicitly stating the engine, which is arguably something we should do anyway. I've updated the PR accordingly -- and also moved the test into more fitting test file. Please take a look and let me know what you think.

@@ -325,6 +325,7 @@ Bug fixes
- Fixed bug in :class:`SparseDtype` for equal comparison with na fill value. (:issue:`54770`)
- Fixed bug in :meth:`.DataFrameGroupBy.median` where nat values gave an incorrect result. (:issue:`57926`)
- Fixed bug in :meth:`DataFrame.cumsum` which was raising ``IndexError`` if dtype is ``timedelta64[ns]`` (:issue:`57956`)
- Fixed bug in :meth:`DataFrame.eval` and :meth:`DataFrame.query` which caused an exception when using numpy via ``@`` notation. (:issue:`58041`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if "when using numpy" is clear - what do you think of doing something like

when using NumPy attributes via @ notation, e.g. df.eval("@np.floor(a)")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated as per suggestion, thanks

Also move the test to more appropriate file.
Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@mroeschke mroeschke added this to the 3.0 milestone Apr 8, 2024
@mroeschke mroeschke merged commit f3ddd2b into pandas-dev:main Apr 8, 2024
46 checks passed
@mroeschke
Copy link
Member

Thanks @domsmrz

@domsmrz domsmrz deleted the np-eval-fix branch April 9, 2024 08:35
pmhatre1 pushed a commit to pmhatre1/pandas-pmhatre1 that referenced this pull request May 7, 2024
…@-notation (pandas-dev#58057)

* Allow @np to be used within df.eval and df.query

Add additional check to make sure that ndim of the provided
variable is an int. This makes sure that the ndim guard doesn't
trigger if it is something else (e.g., a function in case of a
numpy). This allow for using @np in df.eval and df.query methods.

* Add whatsnew

* Fix typo

Co-authored-by: Abdulaziz Aloqeely <52792999+Aloqeely@users.noreply.github.com>

* Test: skip checking names due to inconsistencies between OSes

* Elaborate futher on whatsnew message

* Fix the test by explicitly specifing engine.

Also move the test to more appropriate file.

---------

Co-authored-by: Abdulaziz Aloqeely <52792999+Aloqeely@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
expressions pd.eval, query
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: Numpy not directly usable in DataFrame.eval and DataFrame.query
4 participants