[SPARK-37465][PYTHON] Bump minimum pandas version to 1.0.5 #34717
Conversation
Just to start the discussion: using the query below, following [1], we can get the download stats for pandas over the last 3 months.

```sql
SELECT
  file.version AS file_version,
  COUNT(*) AS num_downloads
FROM `the-psf.pypi.file_downloads`
WHERE file.project = 'pandas'
  -- Only query the last 3 months of history
  AND DATE(timestamp)
    BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 3 MONTH)
    AND CURRENT_DATE()
GROUP BY `file_version`
ORDER BY `num_downloads` DESC
```

Here is the top-20 data, about 77% of the overall downloads; the complete result can be found here:

[1] https://packaging.python.org/guides/analyzing-pypi-package-downloads/
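The "top 20 versions cover about 77% of downloads" figure can be reproduced from the query result with a small helper. This is an illustrative sketch only; the version labels and counts below are made up, not the actual query output:

```python
def top_n_share(downloads, n=20):
    """Return the fraction of total downloads covered by the n
    most-downloaded versions. `downloads` maps version -> count."""
    counts = sorted(downloads.values(), reverse=True)
    total = sum(counts)
    return sum(counts[:n]) / total if total else 0.0

# Hypothetical counts, for illustration only.
stats = {"1.3.4": 500, "1.1.5": 300, "0.25.3": 150, "0.23.2": 50}
share = top_n_share(stats, n=2)  # fraction covered by the top 2 versions
```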
Test build #145645 has finished for PR 34717 at commit
Kubernetes integration test starting
Kubernetes integration test status failure
7986d55 → e521b76 (force-push)
Test build #145671 has finished for PR 34717 at commit
Seems OK to go ahead and require the stable 1.x release
+1. cc @ueshin @BryanCutler @viirya @xinrong-databricks @itholic FYI
I noticed that Test failure
Yeah, it seems to be a bug in pandas 1.0.0.

```python
>>> pser = pd.Series([1, 2, 3, None], dtype="Int8")
>>> pser
0       1
1       2
2       3
3    <NA>
dtype: Int8
>>> ~pser
0      -2
1      -3
2      -4
3    <NA>
dtype: object  # this should've been `Int8`
```

Resolved in pandas 1.0.1.
```python
>>> pser = pd.Series([1, 2, 3, None], dtype="Int8")
>>> pser
0       1
1       2
2       3
3    <NA>
dtype: Int8
>>> ~pser
0      -2
1      -3
2      -4
3    <NA>
dtype: Int8
```

For addressing this, I'm not sure which way is better, but I think we can just go with
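Given that the bug above only affects 1.0.0, one way to enforce a floor is an import-time minimum-version check, the same pattern PySpark uses for its pandas requirement. A minimal standalone sketch of that pattern (the helper names, the message, and the simplistic `X.Y.Z` parsing are illustrative, not PySpark's actual code):

```python
MINIMUM_PANDAS_VERSION = "1.0.5"  # illustrative floor, matching this PR

def parse_version(v):
    """Parse a plain 'X.Y.Z' version string into a comparable tuple of ints.
    (Real version strings can carry rc/dev suffixes; this sketch ignores them.)"""
    return tuple(int(part) for part in v.split("."))

def require_minimum_pandas(installed, minimum=MINIMUM_PANDAS_VERSION):
    """Raise ImportError if the installed pandas version is below the floor."""
    if parse_version(installed) < parse_version(minimum):
        raise ImportError(
            f"Pandas >= {minimum} must be installed; found {installed}."
        )

require_minimum_pandas("1.3.4")  # new enough: no exception raised
```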
Yeah just require 1.0.1 for this reason
Thanks @Yikun, what do you think about bumping to
Sure, thanks for your suggestion, I'd like to update. I added a simple test to install pandas v1.0.1. (Update: pandas only publishes an Ubuntu wheel after v1.2, so we have to install many deps, otherwise it fails when using.) And it looks like some test cases failed, like: Test failure
Several test cases failed (4 cases failed due to the same issue) in 1.0.1, due to: Test failure details
At this time, I prefer to update to 1.0.5; I'm going to run
There is only a precision error of Test failure details
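Floating-point precision differences like the one mentioned above are typically handled in tests with a tolerance comparison rather than exact equality. A minimal sketch of that pattern (the numbers are illustrative, not the actual failing values):

```python
import math

expected = 0.6666666666666666
actual = 0.6666666666666667  # differs only in the last few bits

# Exact equality fails on precision noise; math.isclose with a
# relative tolerance treats the two values as equal.
exact_match = expected == actual
close_match = math.isclose(expected, actual, rel_tol=1e-9)
```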
e521b76 → 7b1de6d (force-push)
As a conclusion here:
So, I bumped the minimum pandas version to v1.0.5; v1.0.5 is also the latest release of the pandas 1.0 series. Ready for review. :)
Test build #145720 has finished for PR 34717 at commit
Seems okay. One comment about the doc.
```diff
@@ -387,7 +387,7 @@ working with timestamps in ``pandas_udf``\s to get the best performance, see

 Recommended Pandas and PyArrow Versions
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-For usage with pyspark.sql, the minimum supported versions of Pandas is 0.23.2 and PyArrow is 1.0.0.
+For usage with pyspark.sql, the minimum supported versions of Pandas is 1.0.5 and PyArrow is 1.0.0.
```
Should we mention there are some issues with versions like 1.0.0, 1.0.1?
How about:
"For usage with pyspark.sql, the minimum supported versions of Pandas is 1.0.5 and PyArrow is 1.0.0. Lower versions (there are known issues with v1.0.0 and v1.0.1; see more in the link) or higher versions may be used; however, compatibility and data correctness cannot be guaranteed and should be verified by the user."
Maybe we need more suggestions from a native speaker. T_T And if necessary we could do it in the next commits of this PR or a follow-up.
LGTM, I think v1.0.5 is a reasonable minimum
LGTM if remaining comments are resolved.
7b1de6d → 054905f (force-push)
Test build #145744 has finished for PR 34717 at commit
Merged to master.
What changes were proposed in this pull request?
Bump minimum pandas version to 1.0.5 (or a better version)
Why are the changes needed?
Initial discussion from SPARK-37465 and #34314 (comment) .
Does this PR introduce any user-facing change?
Yes, this PR bumps the minimum pandas version.
How was this patch tested?
PySpark test passed with pandas v1.0.5.