[SPARK-50001][PYTHON][PS][CONNECT] Adjust "precision" to be part of kwargs for box plots

### What changes were proposed in this pull request?
Move "precision" from an explicit parameter into kwargs for box plots in both pandas-on-Spark and PySpark.

### Why are the changes needed?
Per the discussion here (apache#48445 (comment)), "precision" is a Spark-specific implementation detail, so we want to keep it as part of kwargs for box plots.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing tests.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#48513 from xinrong-meng/precision.

Authored-by: Xinrong Meng <xinrong@apache.org>
Signed-off-by: Xinrong Meng <xinrong@apache.org>
xinrong-meng authored and ericm-db committed Oct 22, 2024
1 parent 2db37dc commit 58c902a
Showing 2 changed files with 12 additions and 16 deletions.
15 changes: 7 additions & 8 deletions python/pyspark/pandas/plot/core.py
@@ -841,7 +841,7 @@ def barh(self, x=None, y=None, **kwargs):
         elif isinstance(self.data, DataFrame):
             return self(kind="barh", x=x, y=y, **kwargs)

-    def box(self, precision=0.01, **kwds):
+    def box(self, **kwds):
         """
         Make a box plot of the DataFrame columns.
@@ -857,12 +857,11 @@ def box(self, precision=0.01, **kwds):
         Parameters
         ----------
-        precision: scalar, default = 0.01
-            This argument is used by pandas-on-Spark to compute approximate statistics
-            for building a boxplot. Use *smaller* values to get more precise
-            statistics.
-        **kwds : optional
-            Additional keyword arguments are documented in
+        **kwds : dict, optional
+            Extra arguments to `precision`: refer to a float that is used by
+            pandas-on-Spark to compute approximate statistics for building a
+            boxplot. The default value is 0.01. Use smaller values to get more
+            precise statistics. Additional keyword arguments are documented in
             :meth:`pyspark.pandas.Series.plot`.
         Returns
@@ -901,7 +900,7 @@ def box(self, precision=0.01, **kwds):
         from pyspark.pandas import DataFrame, Series

         if isinstance(self.data, (Series, DataFrame)):
-            return self(kind="box", precision=precision, **kwds)
+            return self(kind="box", **kwds)

     def hist(self, bins=10, **kwds):
         """
13 changes: 5 additions & 8 deletions python/pyspark/sql/plot/core.py
@@ -359,9 +359,7 @@ def pie(self, x: str, y: str, **kwargs: Any) -> "Figure":
             )
         return self(kind="pie", x=x, y=y, **kwargs)

-    def box(
-        self, column: Union[str, List[str]], precision: float = 0.01, **kwargs: Any
-    ) -> "Figure":
+    def box(self, column: Union[str, List[str]], **kwargs: Any) -> "Figure":
         """
         Make a box plot of the DataFrame columns.
@@ -377,11 +375,10 @@ def box(
         ----------
         column: str or list of str
             Column name or list of names to be used for creating the boxplot.
-        precision: float, default = 0.01
-            This argument is used by pyspark to compute approximate statistics
-            for building a boxplot.
         **kwargs
-            Additional keyword arguments.
+            Extra arguments to `precision`: refer to a float that is used by
+            pyspark to compute approximate statistics for building a boxplot.
+            The default value is 0.01. Use smaller values to get more precise statistics.
         Returns
         -------
@@ -404,7 +401,7 @@ def box(
         >>> df.plot.box(column="math_score")  # doctest: +SKIP
         >>> df.plot.box(column=["math_score", "english_score"])  # doctest: +SKIP
         """
-        return self(kind="box", column=column, precision=precision, **kwargs)
+        return self(kind="box", column=column, **kwargs)

     def kde(
         self,
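
The signature change in this commit can be illustrated with a small standalone sketch. `PlotAccessorSketch` is a hypothetical stand-in, not the PySpark class, and the place where the 0.01 default is applied is an assumption: the diff only shows that `precision` is no longer a named parameter of `box`, not where the backend reads it.

```python
from typing import Any, Dict, List, Union


class PlotAccessorSketch:
    """Hypothetical stand-in for a plot accessor; not the PySpark class."""

    def __call__(self, kind: str, **kwargs: Any) -> Dict[str, Any]:
        # Assumption: the backend reads `precision` from kwargs and falls
        # back to 0.01, the default stated in the updated docstrings.
        if kind == "box":
            kwargs.setdefault("precision", 0.01)
        return {"kind": kind, **kwargs}

    def box(self, column: Union[str, List[str]], **kwargs: Any) -> Dict[str, Any]:
        # `precision` is no longer a named parameter; it travels in kwargs.
        return self(kind="box", column=column, **kwargs)


plot = PlotAccessorSketch()

# Keyword usage is unchanged from the caller's point of view.
default_call = plot.box(column="math_score")
explicit_call = plot.box(column="math_score", precision=0.001)

# With this signature, a positional `precision` no longer binds to anything.
try:
    plot.box("math_score", 0.001)
    positional_ok = True
except TypeError:
    positional_ok = False
```

Under these assumptions, keyword callers see the same behavior before and after the change. Judging by the signatures alone, a caller who passed `precision` positionally would need to switch to the keyword form.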
