[SPARK-49948][PS][CONNECT] Add parameter "precision" to pandas on Spark box plot #48445

Closed
wants to merge 2 commits into from

xinrong-meng
Member

What changes were proposed in this pull request?

Add parameter "precision" to pandas on Spark box plot.

Why are the changes needed?

Previously, the box method accepted only **kwds, so precision could be passed only implicitly. Adding precision directly to the signature makes the parameter visible and explicitly controllable.
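
As a rough illustration of the difference (a standalone sketch, not the actual pandas-on-Spark code; `box_implicit` and `box_explicit` are hypothetical names), both shapes accept the same call, but only the explicit one exposes `precision` to introspection:

```python
import inspect

DEFAULT_PRECISION = 0.01  # default quoted in the docstring of this PR


def box_implicit(**kwds):
    """Hypothetical 'before' shape: precision travels inside **kwds."""
    precision = kwds.pop("precision", DEFAULT_PRECISION)
    return precision, kwds


def box_explicit(precision=DEFAULT_PRECISION, **kwds):
    """Hypothetical 'after' shape: precision is a named parameter."""
    return precision, kwds


# Callers get the same result from either shape.
same_result = (
    box_implicit(precision=0.05, color="red")
    == box_explicit(precision=0.05, color="red")
)

# Only the explicit form lists the parameter in its signature.
implicit_params = list(inspect.signature(box_implicit).parameters)
explicit_params = list(inspect.signature(box_explicit).parameters)
```

This is consistent with the "no user-facing change" answer below: existing calls keep working, and the change only affects what tooling and readers can see in the signature.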

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing tests.

Was this patch authored or co-authored using generative AI tooling?

No.

precision: scalar, default = 0.01
This argument is used by pandas-on-Spark to compute approximate statistics
for building a boxplot. Use *smaller* values to get more precise
- statistics (matplotlib-only).
+ statistics (matplotlib-only).cccccbdvtdlhreffhieutnkglfeibhferhfctieuiiln
Contributor

what is this?

Contributor

@zhengruifeng Oct 14, 2024

matplotlib-only

After this change, will it also take effect in Plotly?

Member Author

I wasn’t sure why we added “matplotlib-only” in the first place, because it works with Plotly as shown here.
Removed.
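
For context on what "smaller values" buy, here is a minimal sketch. It assumes `precision` behaves like the relative rank error documented for Spark's `DataFrame.approxQuantile`; the box plot internals may differ. `rank_bounds` is a hypothetical helper, not a Spark API.

```python
import math


def rank_bounds(q, n, precision=0.01):
    """Range of exact ranks an approximate q-quantile may fall in.

    Assumption: precision is a relative rank error, so the returned value x
    satisfies floor((q - err) * n) <= rank(x) <= ceil((q + err) * n).
    """
    low = max(0, math.floor((q - precision) * n))
    high = min(n, math.ceil((q + precision) * n))
    return low, high


n_rows = 1_000_000

# Median with the default precision of 0.01.
default_window = rank_bounds(0.5, n_rows)

# A smaller precision narrows the window of acceptable ranks.
tighter_window = rank_bounds(0.5, n_rows, precision=0.001)

default_width = default_window[1] - default_window[0]
tighter_width = tighter_window[1] - tighter_window[0]
```

Under that assumption, the window around the true median is roughly `2 * precision * n_rows` ranks wide, so lowering `precision` by a factor of ten narrows it by about the same factor, independent of the plotting backend.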

@xinrong-meng
Member Author

Merged to master, thank you!

@@ -841,7 +841,7 @@ def barh(self, x=None, y=None, **kwargs):
elif isinstance(self.data, DataFrame):
return self(kind="barh", x=x, y=y, **kwargs)

- def box(self, **kwds):
+ def box(self, precision=0.01, **kwds):
Contributor

On second thought, I think we'd better still hide precision in **kwds, because it is a Spark-specific implementation detail.

We can document it under the **kwds docstring.

Member Author

That makes sense! Created https://issues.apache.org/jira/browse/SPARK-50001 for that.
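
A minimal sketch of what that suggestion could look like, assuming the option is read out of `**kwds` and described in the `**kwds` docstring entry. `box` here is a hypothetical stand-in, not the actual pandas-on-Spark method, and the returned dict only makes the flow visible.

```python
def box(**kwds):
    """Draw a box plot (hypothetical stand-in for illustration).

    Parameters
    ----------
    **kwds : dict
        Extra keyword arguments. ``precision`` (scalar, default 0.01) is read
        here to tune the approximate statistics; everything else is forwarded
        to the plotting backend.
    """
    # Pop the Spark-specific option so the backend never receives it.
    precision = kwds.pop("precision", 0.01)
    return {"precision": precision, "backend_kwargs": kwds}


call = box(precision=0.05, vert=False)
default_call = box()
```

Popping the key before forwarding keeps the public signature aligned with pandas while still letting users tune the approximation.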

himadripal pushed a commit to himadripal/spark that referenced this pull request Oct 19, 2024
…rk box plot

Closes apache#48445 from xinrong-meng/ps_box.

Authored-by: Xinrong Meng <xinrong@apache.org>
Signed-off-by: Xinrong Meng <xinrong@apache.org>
xinrong-meng added a commit that referenced this pull request Oct 21, 2024
…wargs for box plots

### What changes were proposed in this pull request?
Adjust "precision" to be kwargs for box plots in both Pandas on Spark and PySpark.

### Why are the changes needed?
Per the discussion here (#48445 (comment)), precision is a Spark-specific implementation detail, so we wanted to keep “precision” as part of kwargs for box plots.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing tests.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #48513 from xinrong-meng/precision.

Authored-by: Xinrong Meng <xinrong@apache.org>
Signed-off-by: Xinrong Meng <xinrong@apache.org>
ericm-db pushed a commit to ericm-db/spark that referenced this pull request Oct 22, 2024
…wargs for box plots

Closes apache#48513 from xinrong-meng/precision.

Authored-by: Xinrong Meng <xinrong@apache.org>
Signed-off-by: Xinrong Meng <xinrong@apache.org>