Skip to content

Conversation

jorisvandenbossche
Copy link
Member

This PR tries to gather some checks about whether numexpr can be used or not which can be done upfront on the DataFrame level, before dispatching to the array_ops.
For example, for a scalar right operand, we can check it only once if it's compatible with numexpr usage. Or for ArrayManager, we can check the size constraint once upfront (since each array has the same size). For those cases, this avoids overhead in expressions.evaluate.

To this end, I created a new can_use_numexpr that starts to gather some checks (there might be more that can be added), call this up front in DataFrame._dispatch_frame_op (this could IMO be bit cleaner with #39772, but the if/else checks inside _dispatch_frame_op work as well). And then the result of can_use_numexpr is passed through to the array_ops.

@jorisvandenbossche jorisvandenbossche added Refactor Internal refactoring of code Numeric Operations Arithmetic, Comparison, and Logical operations labels Apr 23, 2021
@jorisvandenbossche
Copy link
Member Author

There failing tests related to warning/error not being raised for boolean arithmetic ops, but (more easily) fixing those tests depends on #41161.
And I would also first like to see #40463 merged for better test coverage.

----------
op : operator
size : int
dtypes : list
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

list[DtypeObj]?

else:
use_numexpr = expressions.can_use_numexpr(func, None, None, right)

array_op = ops.get_array_op(func, use_numexpr=use_numexpr)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the get_array_op on L6832 still needed?


if isinstance(self._mgr, ArrayManager):
use_numexpr = expressions.can_use_numexpr(
func, self.shape[1], (right.dtype,), None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does self.shape[1] correspond to the size of some array?

else:
use_numexpr = expressions.can_use_numexpr(
func, None, (right.dtype,), None
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these ~8 lines (x4) seem like a reasonable cope for an AM/BM method



def _na_arithmetic_op(left, right, op, is_cmp: bool = False):
def _na_arithmetic_op(left, right, op, is_cmp: bool = False, use_numexpr=True):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

annotate, docstring

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can use_numexpr be keyword-only

@jbrockmendel
Copy link
Member

any estimates for perf impact?

@simonjayhawkins simonjayhawkins added the Performance Memory or execution speed performance label May 25, 2021
@github-actions
Copy link
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Jun 25, 2021
@simonjayhawkins
Copy link
Member

There failing tests related to warning/error not being raised for boolean arithmetic ops, but (more easily) fixing those tests depends on #41161.
And I would also first like to see #40463 merged for better test coverage.

#41161 and #40463 have been merged. can you address @jbrockmendel comments.

Parameters
----------
op : operator
size : int
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

int or None?


# required min elements (otherwise we are adding overhead)
if a.size > _MIN_ELEMENTS:
if isinstance(b, str):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the place where this is moved from has a comment # can never use numexpr. not a huge loss since the comment isn't that informative, but at some point we should get a helpful comment about why it cant be used

@mroeschke
Copy link
Member

Thanks for the PR but appears to have gone stale. Closing to clear the queue, but feel free to reopen if you find the time to revisit this PR and the comments.

@mroeschke mroeschke closed this Nov 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Numeric Operations Arithmetic, Comparison, and Logical operations Performance Memory or execution speed performance Refactor Internal refactoring of code Stale

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants