Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DataFrame] Implemented nunique, skew #1995

Merged
merged 10 commits into from
May 4, 2018

Conversation

hsubbaraj
Copy link
Contributor

Implemented nunique, skew

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5165/
Test PASSed.

@devin-petersohn devin-petersohn changed the title Implemented nunique, skew [DataFrame] Implemented nunique, skew May 3, 2018
Copy link
Contributor

@kunalgosar kunalgosar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks really good! Just a few minor comments.

@@ -604,7 +604,8 @@ def groupby(self, by=None, axis=0, level=None, as_index=True, sort=True,
return DataFrameGroupBy(self, by, axis, level, as_index, sort,
group_keys, squeeze, **kwargs)

def sum(self, axis=None, skipna=True, level=None, numeric_only=None):
def sum(self, axis=None, skipna=True, level=None, numeric_only=None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add **kwargs to header too.

@@ -604,7 +604,8 @@ def groupby(self, by=None, axis=0, level=None, as_index=True, sort=True,
return DataFrameGroupBy(self, by, axis, level, as_index, sort,
group_keys, squeeze, **kwargs)

def sum(self, axis=None, skipna=True, level=None, numeric_only=None):
def sum(self, axis=None, skipna=True, level=None, numeric_only=None,
min_count=0):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From Pandas docs: "New in version 0.22.0: Added with the default being 1. This means the sum or product of an all-NA or empty series is NaN."

return df.prod(axis=axis, skipna=skipna, level=level,
numeric_only=numeric_only, min_count=min_count)

return self._arithmetic_helper(remote_func, axis, level)

def product(self, axis=None, skipna=None, level=None, numeric_only=None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to call prod above instead of duplicating code.

"""Perform a product across the DataFrame.

Args:
axis (int): The axis to product on.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs complete method docs. Seen in a few more functions in this PR.


@pytest.fixture
def test_prod(ray_df, pandas_df):
assert(ray_df.prod().sort_index().equals(pandas_df.prod().sort_index()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use the ray_df_equals_pandas helper.

@p-yang
Copy link
Contributor

p-yang commented May 4, 2018

You may want to separate changes out from those of #1994, so when it comes time we can merge both PRs at the same time.

@hsubbaraj
Copy link
Contributor Author

I took out the changes from #1994 as well

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5192/
Test PASSed.

Copy link
Contributor

@kunalgosar kunalgosar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few minor testing comments; generally, looks really good! Many of the early comments were addressed in #1994 after code was moved over.

@@ -2205,14 +2203,14 @@ def test_pow():
test_inter_df_math("pow", simple=False)


def test_prod():
def test_prod(ray_df, pandas_df):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes should be moved to #1994

ray_df.nunique()
@pytest.fixture
def test_nunique(ray_df, pandas_df):
assert(ray_df_equals_pandas(ray_df.nunique(), pandas_df.nunique()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test both axis here.

Copy link
Contributor

@kunalgosar kunalgosar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thanks!

Copy link
Member

@devin-petersohn devin-petersohn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Thanks @hsubbaraj!

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5202/
Test FAILed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5203/
Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5207/
Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5219/
Test PASSed.

@devin-petersohn devin-petersohn merged commit a58629f into ray-project:master May 4, 2018
@devin-petersohn
Copy link
Member

Passes private-travis. Merged, thanks @hsubbaraj!

alok added a commit to alok/ray that referenced this pull request May 8, 2018
* master: (21 commits)
  Expand local_dir in Trial init (ray-project#2013)
  Fixing ascii error for Python2 (ray-project#2009)
  [DataFrame] Implements df.update (ray-project#1997)
  [DataFrame] Implements df.as_matrix (ray-project#2001)
  [DataFrame] Implement quantile (ray-project#1992)
  [DataFrame] Impement sort_values and sort_index (ray-project#1977)
  [DataFrame] Implement rank (ray-project#1991)
  [DataFrame] Implemented prod, product, added test suite (ray-project#1994)
  [DataFrame] Implemented __setitem__, select_dtypes, and astype (ray-project#1941)
  [DataFrame] Implement diff (ray-project#1996)
  [DataFrame] Implemented nunique, skew (ray-project#1995)
  [DataFrame] Implements filter and dropna (ray-project#1959)
  [DataFrame] Implements df.pipe (ray-project#1999)
  [DataFrame] Apply() for Lists and Dicts (ray-project#1973)
  Clean up syntax for supported Python versions. (ray-project#1963)
  [DataFrame] Implements mode, to_datetime, and get_dummies (ray-project#1956)
  [DataFrame] Fix dtypes (ray-project#1930)
  keep_dims -> keepdims (ray-project#1980)
  add pthread linking (ray-project#1986)
  [DataFrame] Add layer of abstraction to allow OID instantiation (ray-project#1984)
  ...
alok added a commit to alok/ray that referenced this pull request May 9, 2018
* master: (25 commits)
  [DataFrame] Add direct pandas imports for MVP (ray-project#1960)
  Make ActorHandles pickleable, also make proper ActorHandle and ActorC… (ray-project#2007)
  Expand local_dir in Trial init (ray-project#2013)
  Fixing ascii error for Python2 (ray-project#2009)
  [DataFrame] Implements df.update (ray-project#1997)
  [DataFrame] Implements df.as_matrix (ray-project#2001)
  [DataFrame] Implement quantile (ray-project#1992)
  [DataFrame] Impement sort_values and sort_index (ray-project#1977)
  [DataFrame] Implement rank (ray-project#1991)
  [DataFrame] Implemented prod, product, added test suite (ray-project#1994)
  [DataFrame] Implemented __setitem__, select_dtypes, and astype (ray-project#1941)
  [DataFrame] Implement diff (ray-project#1996)
  [DataFrame] Implemented nunique, skew (ray-project#1995)
  [DataFrame] Implements filter and dropna (ray-project#1959)
  [DataFrame] Implements df.pipe (ray-project#1999)
  [DataFrame] Apply() for Lists and Dicts (ray-project#1973)
  Clean up syntax for supported Python versions. (ray-project#1963)
  [DataFrame] Implements mode, to_datetime, and get_dummies (ray-project#1956)
  [DataFrame] Fix dtypes (ray-project#1930)
  keep_dims -> keepdims (ray-project#1980)
  ...
alok added a commit to alok/ray that referenced this pull request May 9, 2018
* master:
  [DataFrame] Add direct pandas imports for MVP (ray-project#1960)
  Make ActorHandles pickleable, also make proper ActorHandle and ActorC… (ray-project#2007)
  Expand local_dir in Trial init (ray-project#2013)
  Fixing ascii error for Python2 (ray-project#2009)
  [DataFrame] Implements df.update (ray-project#1997)
  [DataFrame] Implements df.as_matrix (ray-project#2001)
  [DataFrame] Implement quantile (ray-project#1992)
  [DataFrame] Impement sort_values and sort_index (ray-project#1977)
  [DataFrame] Implement rank (ray-project#1991)
  [DataFrame] Implemented prod, product, added test suite (ray-project#1994)
  [DataFrame] Implemented __setitem__, select_dtypes, and astype (ray-project#1941)
  [DataFrame] Implement diff (ray-project#1996)
  [DataFrame] Implemented nunique, skew (ray-project#1995)
  [DataFrame] Implements filter and dropna (ray-project#1959)
  [DataFrame] Implements df.pipe (ray-project#1999)
  [DataFrame] Apply() for Lists and Dicts (ray-project#1973)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants