[DataFrame] Implemented nunique, skew #1995
Test PASSed.
Looks really good! Just a few minor comments.
python/ray/dataframe/dataframe.py
Outdated
@@ -604,7 +604,8 @@ def groupby(self, by=None, axis=0, level=None, as_index=True, sort=True,
         return DataFrameGroupBy(self, by, axis, level, as_index, sort,
                                 group_keys, squeeze, **kwargs)

-    def sum(self, axis=None, skipna=True, level=None, numeric_only=None):
+    def sum(self, axis=None, skipna=True, level=None, numeric_only=None,
Add **kwargs to the header too.
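A minimal sketch of what this asks for, assuming the goal is for the method to accept extra keyword arguments and pass them through unchanged. `FakePartition`, `Wrapper`, and `some_option` are hypothetical names used only for illustration; they are not part of this PR.

```python
class FakePartition:
    """Hypothetical stand-in for the object that does the real work."""

    def __init__(self, values):
        self.values = list(values)
        self.seen_kwargs = None

    def sum(self, skipna=True, min_count=0, **kwargs):
        # Record the extra keyword arguments so the example can show them.
        self.seen_kwargs = kwargs
        return sum(self.values)


class Wrapper:
    """Hypothetical wrapper whose signature mirrors the one under review."""

    def __init__(self, partition):
        self.partition = partition

    def sum(self, axis=None, skipna=True, level=None, numeric_only=None,
            min_count=0, **kwargs):
        # Forward extra keyword arguments instead of rejecting them.
        return self.partition.sum(skipna=skipna, min_count=min_count,
                                  **kwargs)


wrapper = Wrapper(FakePartition([1, 2, 3]))
total = wrapper.sum(min_count=1, some_option=True)
```

Without `**kwargs` in the signature, the call with `some_option=True` would raise a `TypeError` before anything is forwarded.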
python/ray/dataframe/dataframe.py
Outdated
@@ -604,7 +604,8 @@ def groupby(self, by=None, axis=0, level=None, as_index=True, sort=True,
         return DataFrameGroupBy(self, by, axis, level, as_index, sort,
                                 group_keys, squeeze, **kwargs)

-    def sum(self, axis=None, skipna=True, level=None, numeric_only=None):
+    def sum(self, axis=None, skipna=True, level=None, numeric_only=None,
+            min_count=0):
From Pandas docs: "New in version 0.22.0: Added with the default being 1. This means the sum or product of an all-NA or empty series is NaN."
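The diff uses min_count=0 while the quoted sentence describes a default of 1, so it may help to see how the two settings differ. The sketch below is plain Python and only illustrates the "minimum number of valid values" idea; `sum_with_min_count` is a hypothetical helper, not the pandas or Ray implementation.

```python
import math


def sum_with_min_count(values, min_count=0):
    """Sum the non-NaN values, or return NaN if too few of them exist."""
    valid = [v for v in values
             if not (isinstance(v, float) and math.isnan(v))]
    if len(valid) < min_count:
        # Fewer valid values than required: the result is NaN.
        return math.nan
    return sum(valid)


all_na = [math.nan, math.nan]
lenient = sum_with_min_count(all_na, min_count=0)  # no valid values needed
strict = sum_with_min_count(all_na, min_count=1)   # at least one needed
```

With min_count=0 an all-NA or empty input sums to 0; with min_count=1 the same input gives NaN, which is the behaviour the quoted sentence describes.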
python/ray/dataframe/dataframe.py
Outdated
            return df.prod(axis=axis, skipna=skipna, level=level,
                           numeric_only=numeric_only, min_count=min_count)

        return self._arithmetic_helper(remote_func, axis, level)

    def product(self, axis=None, skipna=None, level=None, numeric_only=None,
Better to call prod above instead of duplicating code.
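A minimal sketch of the delegation being suggested, assuming `product` is meant to be a pure alias of `prod`. `MiniFrame` is a hypothetical stand-in that maps column names to lists of numbers; it is not the class in this PR, and it needs Python 3.8+ for `math.prod`.

```python
import math


class MiniFrame:
    """Hypothetical stand-in: column name -> list of numbers."""

    def __init__(self, columns):
        self.columns = columns

    def prod(self, min_count=0):
        # Multiply each column; NaN if it has fewer than min_count values.
        return {
            name: math.prod(values) if len(values) >= min_count else math.nan
            for name, values in self.columns.items()
        }

    def product(self, min_count=0):
        # Alias: delegate to prod rather than repeating its body.
        return self.prod(min_count=min_count)


frame = MiniFrame({"a": [1, 2, 3], "b": [4, 5]})
via_alias = frame.product()
```

Keeping the logic in one method means a later fix to `prod` automatically applies to `product` as well.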
python/ray/dataframe/dataframe.py
Outdated
        """Perform a product across the DataFrame.

        Args:
            axis (int): The axis to product on.
Needs complete method docs. Seen in a few more functions in this PR.
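One possible shape for a complete docstring, following the Args style already used in the diff. The parameter descriptions are my own wording and should be checked against the pandas documentation; `undocumented_params` is a hypothetical helper that flags parameters a docstring forgets to mention.

```python
import inspect


def prod(self, axis=None, skipna=None, level=None, numeric_only=None,
         min_count=0):
    """Return the product of the values along the requested axis.

    Args:
        axis (int): The axis to take the product over.
        skipna (bool): Whether to skip NA values.
        level (int or str): Level of a MultiIndex to group by, if any.
        numeric_only (bool): Whether to use only numeric columns.
        min_count (int): Minimum number of valid values needed for a result.

    Returns:
        The product along the requested axis.
    """
    raise NotImplementedError("docstring illustration only")


def undocumented_params(func):
    """List parameters (other than self) missing from the docstring."""
    doc = func.__doc__ or ""
    names = [n for n in inspect.signature(func).parameters if n != "self"]
    # A parameter counts as documented if "name (" appears in the docstring.
    return [n for n in names if f"{n} (" not in doc]


missing = undocumented_params(prod)
```

A check like this could run over every public method touched by the PR to catch the other incomplete docstrings.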
@pytest.fixture
def test_prod(ray_df, pandas_df):
    assert(ray_df.prod().sort_index().equals(pandas_df.prod().sort_index()))
Use the ray_df_equals_pandas helper.
You may want to separate changes out from those of #1994, so when it comes time we can merge both PRs at the same time.
I took out the changes from #1994 as well.
Test PASSed.
Few minor testing comments; generally, looks really good! Many of the early comments were addressed in #1994 after code was moved over.
@@ -2205,14 +2203,14 @@ def test_pow():
     test_inter_df_math("pow", simple=False)


-def test_prod():
+def test_prod(ray_df, pandas_df):
These changes should be moved to #1994
ray_df.nunique()
@pytest.fixture
def test_nunique(ray_df, pandas_df):
    assert(ray_df_equals_pandas(ray_df.nunique(), pandas_df.nunique()))
Test both axes here.
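A rough sketch of what covering both axes could look like, assuming nunique takes an `axis` argument where 0 counts distinct values per column and 1 counts them per row. `StubFrame` and `stub_equals` are hypothetical stand-ins for the real frames and for the ray_df_equals_pandas helper, so the example runs without Ray or pandas.

```python
class StubFrame:
    """Hypothetical stand-in for ray_df and pandas_df: a list of rows."""

    def __init__(self, rows):
        self.rows = rows

    def nunique(self, axis=0):
        # axis=0: distinct values per column; axis=1: distinct values per row.
        groups = zip(*self.rows) if axis == 0 else self.rows
        return [len(set(group)) for group in groups]


def stub_equals(left, right):
    """Hypothetical stand-in for the ray_df_equals_pandas helper."""
    return left == right


def check_nunique(ray_df, pandas_df, equals=stub_equals):
    # Exercise both the column-wise and the row-wise code path.
    for axis in (0, 1):
        assert equals(ray_df.nunique(axis=axis),
                      pandas_df.nunique(axis=axis))
    return True


rows = [[1, 3], [1, 3], [2, 3]]
passed = check_nunique(StubFrame(rows), StubFrame(rows))
```

In the real test the loop body would compare `ray_df.nunique(axis=axis)` against `pandas_df.nunique(axis=axis)` through the existing helper.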
Looks great, thanks!
Looks great! Thanks @hsubbaraj!
Test FAILed.
Test PASSed.
Test PASSed.
Test PASSed.
Passes private-travis. Merged, thanks @hsubbaraj!