-
-
Notifications
You must be signed in to change notification settings - Fork 18.7k
REF: collect ops dispatch functions in one place, try to de-duplicate SparseDataFrame methods #23060
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hello @jbrockmendel! Thanks for submitting the PR.
|
Codecov Report
@@ Coverage Diff @@
## master #23060 +/- ##
=======================================
Coverage 92.16% 92.16%
=======================================
Files 166 166
Lines 51224 51224
=======================================
Hits 47212 47212
Misses 4012 4012
Continue to review full report at Codecov.
|
Woops, accidentally pushed some unrelated commits collecting arithmetic tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good. just a couple of questions / comments.
""" | ||
# Note: we use iloc to access columns for compat with cases | ||
# with non-unique columns. | ||
import pandas.core.computation.expressions as expressions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this be imported at the top?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not 100% sure, but I think this is a run-time import to make import pandas as pd
faster
if own_default == other_default: | ||
# TOOD: won't this evaluate as False if both are np.nan? | ||
fill_value = own_default | ||
elif np.isnan(own_default) and not np.isnan(other_default): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should these be isna checks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At first I thought so, but the module-level docstring says only float64 is supported, so I kept the behavior as-is. I think the overall takeaway is that this isn't especially well-maintained, and we should all look forward to Sparse EA.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this be amendednow that that Sparse EA is here? (followup ok too)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would recommend holding off on changing it.
- SparseDataFrame may be going away, so why bother.
- We may have to change the
default_fill_value
if we want its type to match that ofsp_values
(Require the dtype of SparseArray.fill_value and sp_values.dtype to match #23124 (comment))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, thanks.
'floatcol': np.random.randn(10), | ||
'stringcol': list(tm.rands(10))}) | ||
df.loc[np.random.rand(len(df)) > 0.5, 'dates2'] = pd.NaT | ||
ops = {'gt': 'lt', 'lt': 'gt', 'ge': 'le', 'le': 'ge', 'eq': 'eq', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should parameterize if you can
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yah, the point of collecting these arithmetic tests is to parametrize/fixturize and especially de-duplicate them in an upcoming pass.
# DataFrame | ||
assert df.eq(df).values.all() | ||
assert not df.ne(df).values.any() | ||
for op in ['eq', 'ne', 'gt', 'lt', 'ge', 'le']: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
needs paramaterization!
with tm.assert_raises_regex(ValueError, msg): | ||
f(ndim_5) | ||
|
||
# Series |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pull lthis out to a separatate, parameterized test (future PR is ok for these, though since you are moving around, maybe better here)
lambda x: tm.makeFloatSeries(), | ||
True) | ||
]) | ||
@pytest.mark.parametrize('opname', ['add', 'sub', 'mul', 'floordiv', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ideally switch to our operators fixture
# == and !=, inequalities should raise | ||
result = x == y | ||
expected = pd.DataFrame({col: x[col] == y[col] | ||
for col in x.columns}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you parameterize this (next pass ok)
If it will help, I can separate out the unrelated test parts of this. There is a bunch of test cleanup to do, and already a healthy number of test-touching PRs in play. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment, but can be a followup, ping on green.
if own_default == other_default: | ||
# TOOD: won't this evaluate as False if both are np.nan? | ||
fill_value = own_default | ||
elif np.isnan(own_default) and not np.isnan(other_default): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this be amendednow that that Sparse EA is here? (followup ok too)
@TomAugspurger any idea about this? I have no clue |
also happy to merge this and followup later on sparse refactorings (prob better). |
Ping |
can you rebase. the isort is playing havoc :> |
thanks @jbrockmendel nice as always! |
… SparseDataFrame methods (pandas-dev#23060)
… SparseDataFrame methods (pandas-dev#23060)
… SparseDataFrame methods (pandas-dev#23060)
No description provided.