Testing resample with a different timezone #5

Open · wants to merge 1 commit into base: master

Conversation

attilapiros (Owner)

TEST
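For context, a pandas-only sketch of the kind of operation the failing test exercises: resampling a timezone-aware index with `closed="left"` and a `mean` aggregation, which the test then compares against the pandas-on-Spark result. The dates, timezone, and data sizes here are hypothetical; the `"11H"` frequency, `closed="left"`, and `mean` come from the traceback below.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 100 hourly points on a timezone-aware index.
idx = pd.date_range("2022-05-01", periods=100, freq="H", tz="America/New_York")
pdf = pd.DataFrame({"A": np.arange(100.0), "B": np.arange(100.0) * 2.0}, index=idx)

# Resample into 11-hour bins, left-closed, and take the mean per bin.
resampled = pdf.resample("11H", closed="left").mean()
print(resampled.dtypes)  # both columns remain float64
```

The test asserts that this pandas result and the pandas-on-Spark result are almost equal; the failure below suggests they diverge once the index carries a timezone.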

@attilapiros force-pushed the test-resample-with-tz branch 2 times, most recently from 8b231e7 to 630bbce on August 8, 2023 00:05
attilapiros (Owner, Author) commented Aug 8, 2023

failed with:

Running tests...
----------------------------------------------------------------------
timezone: UTC
  test_dataframe_resample (pyspark.pandas.tests.test_resample.ResampleTests) ... Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).

/__w/spark/spark/python/pyspark/pandas/utils.py:1021: PandasAPIOnSparkAdviceWarning: `to_pandas` loads all data into the driver's memory. It should only be used if the resulting pandas DataFrame is expected to be small.
  warnings.warn(message, PandasAPIOnSparkAdviceWarning)
/__w/spark/spark/python/pyspark/pandas/utils.py:1021: PandasAPIOnSparkAdviceWarning: `to_pandas` loads all data into the driver's memory. It should only be used if the resulting pandas DataFrame is expected to be small.
  warnings.warn(message, PandasAPIOnSparkAdviceWarning)
/__w/spark/spark/python/pyspark/pandas/utils.py:1021: PandasAPIOnSparkAdviceWarning: `to_pandas` loads all data into the driver's memory. It should only be used if the resulting pandas DataFrame is expected to be small.
  warnings.warn(message, PandasAPIOnSparkAdviceWarning)
/__w/spark/spark/python/pyspark/pandas/utils.py:1021: PandasAPIOnSparkAdviceWarning: `to_pandas` loads all data into the driver's memory. It should only be used if the resulting pandas DataFrame is expected to be small.
  warnings.warn(message, PandasAPIOnSparkAdviceWarning)
/__w/spark/spark/python/pyspark/pandas/utils.py:1021: PandasAPIOnSparkAdviceWarning: `to_pandas` loads all data into the driver's memory. It should only be used if the resulting pandas DataFrame is expected to be small.
  warnings.warn(message, PandasAPIOnSparkAdviceWarning)
/__w/spark/spark/python/pyspark/pandas/utils.py:1021: PandasAPIOnSparkAdviceWarning: `to_pandas` loads all data into the driver's memory. It should only be used if the resulting pandas DataFrame is expected to be small.
  warnings.warn(message, PandasAPIOnSparkAdviceWarning)
/__w/spark/spark/python/pyspark/pandas/groupby.py:893: FutureWarning: Default value of `numeric_only` will be changed to `False` instead of `True` in 4.0.0.
  warnings.warn(
/__w/spark/spark/python/pyspark/pandas/utils.py:1021: PandasAPIOnSparkAdviceWarning: `to_pandas` loads all data into the driver's memory. It should only be used if the resulting pandas DataFrame is expected to be small.
  warnings.warn(message, PandasAPIOnSparkAdviceWarning)
/__w/spark/spark/python/pyspark/pandas/groupby.py:893: FutureWarning: Default value of `numeric_only` will be changed to `False` instead of `True` in 4.0.0.
  warnings.warn(
/__w/spark/spark/python/pyspark/pandas/utils.py:1021: PandasAPIOnSparkAdviceWarning: `to_pandas` loads all data into the driver's memory. It should only be used if the resulting pandas DataFrame is expected to be small.
  warnings.warn(message, PandasAPIOnSparkAdviceWarning)
/__w/spark/spark/python/pyspark/pandas/groupby.py:649: FutureWarning: Default value of `numeric_only` will be changed to `False` instead of `True` in 4.0.0.
  warnings.warn(
/__w/spark/spark/python/pyspark/pandas/utils.py:1021: PandasAPIOnSparkAdviceWarning: `to_pandas` loads all data into the driver's memory. It should only be used if the resulting pandas DataFrame is expected to be small.
  warnings.warn(message, PandasAPIOnSparkAdviceWarning)
FAIL (26.468s)
  test_missing (pyspark.pandas.tests.test_resample.ResampleTests) ... ok (0.133s)
  test_resample_error (pyspark.pandas.tests.test_resample.ResampleTests) ... ok (2.493s)
  test_resample_on (pyspark.pandas.tests.test_resample.ResampleTests) ... /__w/spark/spark/python/pyspark/pandas/groupby.py:893: FutureWarning: Default value of `numeric_only` will be changed to `False` instead of `True` in 4.0.0.
  warnings.warn(
/__w/spark/spark/python/pyspark/pandas/utils.py:1021: PandasAPIOnSparkAdviceWarning: `to_pandas` loads all data into the driver's memory. It should only be used if the resulting pandas DataFrame is expected to be small.
  warnings.warn(message, PandasAPIOnSparkAdviceWarning)
ok (1.922s)
  test_series_resample (pyspark.pandas.tests.test_resample.ResampleTests) ... /__w/spark/spark/python/pyspark/pandas/utils.py:1021: PandasAPIOnSparkAdviceWarning: `to_pandas` loads all data into the driver's memory. It should only be used if the resulting pandas Series is expected to be small.
  warnings.warn(message, PandasAPIOnSparkAdviceWarning)
/__w/spark/spark/python/pyspark/pandas/utils.py:1021: PandasAPIOnSparkAdviceWarning: `to_pandas` loads all data into the driver's memory. It should only be used if the resulting pandas Series is expected to be small.
  warnings.warn(message, PandasAPIOnSparkAdviceWarning)
/__w/spark/spark/python/pyspark/pandas/groupby.py:893: FutureWarning: Default value of `numeric_only` will be changed to `False` instead of `True` in 4.0.0.
  warnings.warn(
/__w/spark/spark/python/pyspark/pandas/utils.py:1021: PandasAPIOnSparkAdviceWarning: `to_pandas` loads all data into the driver's memory. It should only be used if the resulting pandas Series is expected to be small.
  warnings.warn(message, PandasAPIOnSparkAdviceWarning)
FAIL (4.219s)

======================================================================
FAIL [26.468s]: test_dataframe_resample (pyspark.pandas.tests.test_resample.ResampleTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/__w/spark/spark/python/pyspark/pandas/tests/test_resample.py", line 269, in test_dataframe_resample
    self._test_resample(self.pdf4, self.psdf4, ["11H", "21D"], "left", None, "mean")
  File "/__w/spark/spark/python/pyspark/pandas/tests/test_resample.py", line 259, in _test_resample
    self.assert_eq(
  File "/__w/spark/spark/python/pyspark/testing/pandasutils.py", line 457, in assert_eq
    _assert_pandas_almost_equal(lobj, robj)
  File "/__w/spark/spark/python/pyspark/testing/pandasutils.py", line 171, in _assert_pandas_almost_equal
    raise PySparkAssertionError(
pyspark.errors.exceptions.base.PySparkAssertionError: [DIFFERENT_PANDAS_DATAFRAME] DataFrames are not almost equal:
Left:
                            A         B
A    float64
B    float64
dtype: object
Right:
                            A         B
A    float64
B    float64
dtype: object

======================================================================
FAIL [4.219s]: test_series_resample (pyspark.pandas.tests.test_resample.ResampleTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/__w/spark/spark/python/pyspark/pandas/tests/test_resample.py", line 276, in test_series_resample
    self._test_resample(self.pdf3.A, self.psdf3.A, ["1001H"], "right", "right", "sum")
  File "/__w/spark/spark/python/pyspark/pandas/tests/test_resample.py", line 259, in _test_resample
    self.assert_eq(
  File "/__w/spark/spark/python/pyspark/testing/pandasutils.py", line 457, in assert_eq
    _assert_pandas_almost_equal(lobj, robj)
  File "/__w/spark/spark/python/pyspark/testing/pandasutils.py", line 228, in _assert_pandas_almost_equal
    raise PySparkAssertionError(
pyspark.errors.exceptions.base.PySparkAssertionError: [DIFFERENT_PANDAS_SERIES] Series are not almost equal:
Left:
Freq: 1001H
float64
Right:
float64

----------------------------------------------------------------------
Ran 5 tests in 35.235s

FAILED (failures=2)

Generating XML reports...
Generated XML report: target/test-reports/TEST-pyspark.pandas.tests.test_resample.ResampleTests-20230808005957.xml
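The series variant fails the same way. A pandas-only sketch of that case, again with hypothetical data but with the `"1001H"` frequency, `closed="right"`, `label="right"`, and `sum` taken from the traceback above:

```python
import numpy as np
import pandas as pd

# Hypothetical data: 100 hourly points on a timezone-aware index.
idx = pd.date_range("2022-05-01", periods=100, freq="H", tz="America/New_York")
ser = pd.Series(np.arange(100.0), index=idx, name="A")

# One very large right-closed, right-labeled bin; sum per bin.
resampled = ser.resample("1001H", closed="right", label="right").sum()
print(resampled)
```

Whatever the binning, the sum is preserved (0 + 1 + … + 99 = 4950), so a mismatch against pandas-on-Spark here would point at the bin edges or labels, not the aggregation itself.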

attilapiros (Owner, Author)

After setting the conf "spark.sql.timestampType" to "TIMESTAMP_NTZ":

======================================================================
FAIL [31.935s]: test_dataframe_resample (pyspark.pandas.tests.test_resample.ResampleTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/__w/spark/spark/python/pyspark/pandas/tests/test_resample.py", line 269, in test_dataframe_resample
    self._test_resample(self.pdf4, self.psdf4, ["11H", "21D"], "left", None, "mean")
  File "/__w/spark/spark/python/pyspark/pandas/tests/test_resample.py", line 259, in _test_resample
    self.assert_eq(
  File "/__w/spark/spark/python/pyspark/testing/pandasutils.py", line 457, in assert_eq
    _assert_pandas_almost_equal(lobj, robj)
  File "/__w/spark/spark/python/pyspark/testing/pandasutils.py", line 171, in _assert_pandas_almost_equal
    raise PySparkAssertionError(
pyspark.errors.exceptions.base.PySparkAssertionError: [DIFFERENT_PANDAS_DATAFRAME] DataFrames are not almost equal:
Left:
                            A         B
A    float64
B    float64
dtype: object
Right:
                            A         B
A    float64
B    float64
dtype: object

======================================================================
FAIL [4.803s]: test_series_resample (pyspark.pandas.tests.test_resample.ResampleTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/__w/spark/spark/python/pyspark/pandas/tests/test_resample.py", line 276, in test_series_resample
    self._test_resample(self.pdf3.A, self.psdf3.A, ["1001H"], "right", "right", "sum")
  File "/__w/spark/spark/python/pyspark/pandas/tests/test_resample.py", line 259, in _test_resample
    self.assert_eq(
  File "/__w/spark/spark/python/pyspark/testing/pandasutils.py", line 457, in assert_eq
    _assert_pandas_almost_equal(lobj, robj)
  File "/__w/spark/spark/python/pyspark/testing/pandasutils.py", line 228, in _assert_pandas_almost_equal
    raise PySparkAssertionError(
pyspark.errors.exceptions.base.PySparkAssertionError: [DIFFERENT_PANDAS_SERIES] Series are not almost equal:
Left:
Freq: 1001H
float64
Right:
float64

----------------------------------------------------------------------
Ran 5 tests in 42.031s

FAILED (failures=2)
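For reference, the "spark.sql.timestampType" conf mentioned above has to be in place when the session is created; a sketch, assuming the test session picks up spark-defaults.conf (or the equivalent --conf flag on spark-submit):

```
# spark-defaults.conf fragment (or: spark-submit --conf spark.sql.timestampType=TIMESTAMP_NTZ)
spark.sql.timestampType  TIMESTAMP_NTZ
```

As the identical failures show, switching the session timestamp type to NTZ did not resolve the mismatch.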
