Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DataFrame] Implements df.update #1997

Merged
merged 4 commits into from
May 7, 2018

Conversation

SaladRaider
Copy link
Contributor

What do these changes do?

Implements df.update

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5167/
Test PASSed.

Copy link
Contributor

@kunalgosar kunalgosar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function requires more nuanced handling. This will require logic similar to the join function implemented by @devin-petersohn.

"To contribute to Pandas on Ray, please visit "
"github.com/ray-project/ray.")

if join != 'left': # pragma: no cover
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This # pragma: no cover is an artifact from pandas integration with code coverage tools. Unnecessary for our case.


def _update_part(x, y, axis, level):
new_df = x.copy()
new_df.update(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inner partition index does not match outer index in our architecture. This will not work here, you will need to handle the difference in the outer and inner index when calling the underlying pandas update function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. I noticed in the pandas source, that they align the indexes up before performing the update. Do you think it would be best for me to wait unitl reindex/reindex_like get's merged in?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, i'll look at devin's join to see if I can do it without using reindex

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a _reindex_helper in utils. It is only used for join currently.

return new_df

result = self._operator_helper(
_update_part, other, 0, None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

other here can potentially be an entire DataFrame which is potentially a huge object, this should not be passed to each remote call.

Copy link
Contributor

@kunalgosar kunalgosar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

return x

self._inter_df_op_helper(
lambda x, y: update_helper(x, y),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can pass update_helper directly here; does not need a lambda.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5245/
Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5249/
Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5253/
Test PASSed.

@devin-petersohn devin-petersohn merged commit 1f82a46 into ray-project:master May 7, 2018
@devin-petersohn
Copy link
Member

Passes on private-travis. Merged, thanks @SaladRaider!

alok added a commit to alok/ray that referenced this pull request May 8, 2018
* master: (21 commits)
  Expand local_dir in Trial init (ray-project#2013)
  Fixing ascii error for Python2 (ray-project#2009)
  [DataFrame] Implements df.update (ray-project#1997)
  [DataFrame] Implements df.as_matrix (ray-project#2001)
  [DataFrame] Implement quantile (ray-project#1992)
  [DataFrame] Impement sort_values and sort_index (ray-project#1977)
  [DataFrame] Implement rank (ray-project#1991)
  [DataFrame] Implemented prod, product, added test suite (ray-project#1994)
  [DataFrame] Implemented __setitem__, select_dtypes, and astype (ray-project#1941)
  [DataFrame] Implement diff (ray-project#1996)
  [DataFrame] Implemented nunique, skew (ray-project#1995)
  [DataFrame] Implements filter and dropna (ray-project#1959)
  [DataFrame] Implements df.pipe (ray-project#1999)
  [DataFrame] Apply() for Lists and Dicts (ray-project#1973)
  Clean up syntax for supported Python versions. (ray-project#1963)
  [DataFrame] Implements mode, to_datetime, and get_dummies (ray-project#1956)
  [DataFrame] Fix dtypes (ray-project#1930)
  keep_dims -> keepdims (ray-project#1980)
  add pthread linking (ray-project#1986)
  [DataFrame] Add layer of abstraction to allow OID instantiation (ray-project#1984)
  ...
alok added a commit to alok/ray that referenced this pull request May 9, 2018
* master: (25 commits)
  [DataFrame] Add direct pandas imports for MVP (ray-project#1960)
  Make ActorHandles pickleable, also make proper ActorHandle and ActorC… (ray-project#2007)
  Expand local_dir in Trial init (ray-project#2013)
  Fixing ascii error for Python2 (ray-project#2009)
  [DataFrame] Implements df.update (ray-project#1997)
  [DataFrame] Implements df.as_matrix (ray-project#2001)
  [DataFrame] Implement quantile (ray-project#1992)
  [DataFrame] Impement sort_values and sort_index (ray-project#1977)
  [DataFrame] Implement rank (ray-project#1991)
  [DataFrame] Implemented prod, product, added test suite (ray-project#1994)
  [DataFrame] Implemented __setitem__, select_dtypes, and astype (ray-project#1941)
  [DataFrame] Implement diff (ray-project#1996)
  [DataFrame] Implemented nunique, skew (ray-project#1995)
  [DataFrame] Implements filter and dropna (ray-project#1959)
  [DataFrame] Implements df.pipe (ray-project#1999)
  [DataFrame] Apply() for Lists and Dicts (ray-project#1973)
  Clean up syntax for supported Python versions. (ray-project#1963)
  [DataFrame] Implements mode, to_datetime, and get_dummies (ray-project#1956)
  [DataFrame] Fix dtypes (ray-project#1930)
  keep_dims -> keepdims (ray-project#1980)
  ...
alok added a commit to alok/ray that referenced this pull request May 9, 2018
* master:
  [DataFrame] Add direct pandas imports for MVP (ray-project#1960)
  Make ActorHandles pickleable, also make proper ActorHandle and ActorC… (ray-project#2007)
  Expand local_dir in Trial init (ray-project#2013)
  Fixing ascii error for Python2 (ray-project#2009)
  [DataFrame] Implements df.update (ray-project#1997)
  [DataFrame] Implements df.as_matrix (ray-project#2001)
  [DataFrame] Implement quantile (ray-project#1992)
  [DataFrame] Impement sort_values and sort_index (ray-project#1977)
  [DataFrame] Implement rank (ray-project#1991)
  [DataFrame] Implemented prod, product, added test suite (ray-project#1994)
  [DataFrame] Implemented __setitem__, select_dtypes, and astype (ray-project#1941)
  [DataFrame] Implement diff (ray-project#1996)
  [DataFrame] Implemented nunique, skew (ray-project#1995)
  [DataFrame] Implements filter and dropna (ray-project#1959)
  [DataFrame] Implements df.pipe (ray-project#1999)
  [DataFrame] Apply() for Lists and Dicts (ray-project#1973)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants