GH-33321: [Python] Support converting to non-nano datetime64 for pandas >= 2.0 #35656
Conversation
I'm looking for early feedback to see if this is the right approach. There are many test cases that will need updating, but I didn't want to tackle them yet in case we take a different approach.
Yes, that approach looks good, and is actually simpler than I thought it would be, since we already control this with the single option switch (for the code; the tests will indeed get a bit messier).
I think one question is whether we want to make that option public through the to_pandas API, so people could still override it to get nanoseconds if they want (to get back the pre-pandas-2.0 behaviour).
I'll expose this! I agree it's best to allow continued use of the legacy behavior for a while.
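To sketch what exposing it could look like from the user side (an illustration on my part; the keyword name coerce_temporal_nanoseconds is the one discussed later in this thread):

import pyarrow as pa

table = pa.table({"ts": pa.array([1, 2, 3], pa.timestamp("us"))})

# With pandas >= 2.0 the default conversion keeps the microsecond unit.
df = table.to_pandas()
print(df["ts"].dtype)  # datetime64[us]

# Opting back into the pre-pandas-2.0 behaviour coerces to nanoseconds.
df_legacy = table.to_pandas(coerce_temporal_nanoseconds=True)
print(df_legacy["ts"].dtype)  # datetime64[ns]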
python/pyarrow/types.pxi
Outdated
_Type_DATE64: np.dtype('datetime64[ns]'),
_Type_TIMESTAMP: np.dtype('datetime64[ns]'),
_Type_DURATION: np.dtype('timedelta64[ns]'),
_Type_DATE32: np.dtype('datetime64[D]'),
pandas only supports the range of seconds to nanoseconds, so for dates we should maybe still default to datetime64[s]? (otherwise I assume this conversion would happen anyway on the pandas side)
Thank you! NumPy supports [D]ay, but pandas does not.
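A quick illustration of that unit-support gap (my own example; pandas 2.x coerces unsupported NumPy units to the closest supported one, as also noted later in this thread):

import numpy as np
import pandas as pd

days = np.arange("2017-01-01", "2017-01-04", dtype="datetime64[D]")
print(days.dtype)  # datetime64[D] -- a valid NumPy unit

s = pd.Series(days)
print(s.dtype)     # datetime64[s] with pandas >= 2.0: "D" is coerced
                   # to the closest supported resolution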
One thing I found is that Parquet only supports [ms], [us], and [ns]. So now several pyarrow dataset tests are failing because datasets with [D]ay units are being converted to [ms] units. I'm somewhat inclined to convert date32 to [ms] by default so we don't have to add a conversion from [ms] -> [s] when doing a Parquet roundtrip. Or we just let it happen and modify the tests.
This wasn't a problem before when everything was coerced to [ns], which parquet supports.
I'm somewhat inclined to convert date32 to [ms] by default so we don't have to add a conversion from [ms] -> [s] when doing a parquet roundtrip
Yes, that sounds like a good idea (then it also gives the same result for date32 vs date64)
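For illustration, assuming the [ms] default lands as discussed here, both date types would then come back with the same dtype (a sketch, not a guaranteed final behaviour):

import pyarrow as pa

table = pa.table({"d32": pa.array([0, 1], pa.date32()),
                  "d64": pa.array([0, 86400000], pa.date64())})

df = table.to_pandas(date_as_object=False)
print(df["d32"].dtype, df["d64"].dtype)  # datetime64[ms] datetime64[ms]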
several pyarrow dataset tests are failing because datasets with [D]ay units are being converted to [ms] units
Can you point to which test is failing? Because this is about conversion from pyarrow to pandas, right? (not arrow<->parquet roundtrip, which should be able to preserve our date32 type because we store the arrow schema)
The current tests manipulate the types so the test cases pass, but these were the tests that originally were failing (the tests in their current state are a bit of a mess right now; I need to go back and clean them up once the implementation actually appears to work properly):
FAILED pyarrow/tests/parquet/test_dataset.py::test_write_to_dataset_with_partitions[True] - AssertionError: Attributes of DataFrame.iloc[:, 4] (column name="date") are different
FAILED pyarrow/tests/parquet/test_dataset.py::test_write_to_dataset_with_partitions[False] - AssertionError: Attributes of DataFrame.iloc[:, 4] (column name="date") are different
FAILED pyarrow/tests/parquet/test_dataset.py::test_write_to_dataset_with_partitions_and_schema[True] - AssertionError: Attributes of DataFrame.iloc[:, 4] (column name="date") are different
FAILED pyarrow/tests/parquet/test_dataset.py::test_write_to_dataset_with_partitions_and_schema[False] - AssertionError: Attributes of DataFrame.iloc[:, 4] (column name="date") are different
FAILED pyarrow/tests/parquet/test_dataset.py::test_write_to_dataset_with_partitions_and_index_name[True] - AssertionError: Attributes of DataFrame.iloc[:, 4] (column name="date") are different
FAILED pyarrow/tests/parquet/test_dataset.py::test_write_to_dataset_with_partitions_and_index_name[False] - AssertionError: Attributes of DataFrame.iloc[:, 4] (column name="date") are different
FAILED pyarrow/tests/parquet/test_dataset.py::test_write_to_dataset_no_partitions[True] - AssertionError: Attributes of DataFrame.iloc[:, 3] (column name="date") are different
FAILED pyarrow/tests/parquet/test_dataset.py::test_write_to_dataset_no_partitions[False] - AssertionError: Attributes of DataFrame.iloc[:, 3] (column name="date") are different
@pytest.mark.filterwarnings("ignore:'ParquetDataset.schema:FutureWarning")
def _test_write_to_dataset_with_partitions(base_path,
                                           use_legacy_dataset=True,
                                           filesystem=None,
                                           schema=None,
                                           index_name=None):
    import pandas as pd
    import pandas.testing as tm

    import pyarrow.parquet as pq

    # ARROW-1400
    output_df = pd.DataFrame({'group1': list('aaabbbbccc'),
                              'group2': list('eefeffgeee'),
                              'num': list(range(10)),
                              'nan': [np.nan] * 10,
                              'date': np.arange('2017-01-01', '2017-01-11',
                                                dtype='datetime64[D]')})
    output_df["date"] = output_df["date"]
    cols = output_df.columns.tolist()
    partition_by = ['group1', 'group2']
    output_table = pa.Table.from_pandas(output_df, schema=schema, safe=False,
                                        preserve_index=False)
    pq.write_to_dataset(output_table, base_path, partition_by,
                        filesystem=filesystem,
                        use_legacy_dataset=use_legacy_dataset)
    metadata_path = os.path.join(str(base_path), '_common_metadata')

    if filesystem is not None:
        with filesystem.open(metadata_path, 'wb') as f:
            pq.write_metadata(output_table.schema, f)
    else:
        pq.write_metadata(output_table.schema, metadata_path)

    # ARROW-2891: Ensure the output_schema is preserved when writing a
    # partitioned dataset
    dataset = pq.ParquetDataset(base_path,
                                filesystem=filesystem,
                                validate_schema=True,
                                use_legacy_dataset=use_legacy_dataset)
    # ARROW-2209: Ensure the dataset schema also includes the partition columns
    if use_legacy_dataset:
        with pytest.warns(FutureWarning, match="'ParquetDataset.schema'"):
            dataset_cols = set(dataset.schema.to_arrow_schema().names)
    else:
        # NB schema property is an arrow and not parquet schema
        dataset_cols = set(dataset.schema.names)

    assert dataset_cols == set(output_table.schema.names)

    input_table = dataset.read(use_pandas_metadata=True)
    input_df = input_table.to_pandas()

    # Read data back in and compare with original DataFrame
    # Partitioned columns added to the end of the DataFrame when read
    input_df_cols = input_df.columns.tolist()
    assert partition_by == input_df_cols[-1 * len(partition_by):]

    input_df = input_df[cols]
    # Partitioned columns become 'categorical' dtypes
    for col in partition_by:
        output_df[col] = output_df[col].astype('category')

    # if schema is None and Version(pd.__version__) >= Version("2.0.0"):
    #     output_df['date'] = output_df['date'].astype('datetime64[ms]')
>   tm.assert_frame_equal(output_df, input_df)
E   AssertionError: Attributes of DataFrame.iloc[:, 4] (column name="date") are different
E
E   Attribute "dtype" are different
E   [left]:  datetime64[s]
E   [right]: datetime64[ms]
I think those test failures are related to the fact that, with our defaults, parquet doesn't support nanoseconds, and we actually don't try to preserve the unit when roundtripping from arrow<->parquet:
In [1]: table = pa.table({"col": pa.array([1, 2, 3], pa.timestamp("s")).cast(pa.timestamp("ns"))})
In [2]: import pyarrow.parquet as pq
In [3]: pq.write_table(table, "test_nanoseconds.parquet")
In [4]: pq.read_table("test_nanoseconds.parquet")
Out[4]:
pyarrow.Table
col: timestamp[us]
----
col: [[1970-01-01 00:00:01.000000,1970-01-01 00:00:02.000000,1970-01-01 00:00:03.000000]]
So starting with an arrow table with nanoseconds, the result has microseconds (even though we actually could preserve the original unit, because we store the original arrow schema in the parquet metadata; although that would not be a zero-copy restoration, in contrast to, for example, restoring the timezone, or restoring duration from int64, which is done in ApplyOriginalStorageMetadata).
So this means that whenever we start with nanoseconds, we get back microseconds after a roundtrip to parquet. And if the roundtrip actually started from pandas using nanoseconds, we now also get microseconds in the pandas result (while before we still got nanoseconds, since we forced using that in the arrow->pandas conversion step).
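As an aside (my own illustration, not part of this PR): the unit can survive the roundtrip if the file is written with a newer Parquet format version, since Parquet 2.6 added a nanosecond timestamp type:

import pyarrow as pa
import pyarrow.parquet as pq

table = pa.table({"col": pa.array([1, 2, 3], pa.timestamp("ns"))})

# Parquet format >= 2.6 has a nanosecond timestamp logical type, so the
# unit is preserved instead of being coerced to microseconds on write.
pq.write_table(table, "test_nanoseconds.parquet", version="2.6")
print(pq.read_table("test_nanoseconds.parquet").schema)  # col: timestamp[ns]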
Current update: the tests failing locally for me are 1) parquet dataset roundtrips where date32 days are converted to milliseconds instead of seconds, because seconds are not supported in parquet, and 2) all TZ-aware timestamps are defaulted to nanoseconds (i.e. I need to add support for other time units in C++). For (1), I mentioned in another comment that we can convert date32 to millisecond instead of second. For (2), I just need to add support, but it's going to grow this PR even larger, unfortunately. Edit: For (1), I think it's actually fine to keep date32 as [s]econd. It's a known limitation that parquet does not support this unit type.
If PR size is a concern, this is also something that could be done as a precursor. It's actually already an issue that shows up in conversion to numpy as well.
While this could also be perfectly zero-copy to microseconds in the case with a timezone (we just return the underlying UTC values anyway).
For the tz-aware update, that also influences the (currently untested) to_numpy behaviour, which you can test with the following change:

--- a/python/pyarrow/tests/test_array.py
+++ b/python/pyarrow/tests/test_array.py
@@ -211,9 +211,10 @@ def test_to_numpy_writable():
         arr.to_numpy(zero_copy_only=True, writable=True)


+@pytest.mark.parametrize('tz', [None, "UTC"])
 @pytest.mark.parametrize('unit', ['s', 'ms', 'us', 'ns'])
-def test_to_numpy_datetime64(unit):
-    arr = pa.array([1, 2, 3], pa.timestamp(unit))
+def test_to_numpy_datetime64(unit, tz):
+    arr = pa.array([1, 2, 3], pa.timestamp(unit, tz=tz))
     expected = np.array([1, 2, 3], dtype="datetime64[{}]".format(unit))
     np_arr = arr.to_numpy()
     np.testing.assert_array_equal(np_arr, expected)
Thank you! Updated and it passes out of the gate 🎉
int64_t* out_values = this->GetBlockColumnStart(rel_placement);
if (type == Type::DATE32) {
  // Convert from days since epoch to datetime64[ms]
  ConvertDatetimeLikeNanos<int32_t, 86400000L>(*data, out_values);
I am wondering whether, if we do such a naive multiplication here, we can get overflow errors for out-of-bounds timestamps (but this is already the case for the current code converting to nanoseconds as well, to be clear).
Technically, I think this specific multiplication is fine. INT32_MAX * 86400000 = 1.8554259e+17, while INT64_MAX = 9.223372e+18.
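A standalone sanity check of that arithmetic (my own snippet, not part of the PR):

# Worst case: the largest date32 value (days since epoch) scaled to ms.
worst_case = (2**31 - 1) * 86_400_000  # INT32_MAX days in milliseconds
int64_max = 2**63 - 1
assert worst_case < int64_max  # ~1.86e17 < ~9.22e18, so no int64 overflow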
Ah, yes, when milliseconds is the target this is probably fine. For the nanoseconds case, though, this gives wrong results. Opened #36084 about that.
@@ -176,8 +176,8 @@ def alltypes_sample(size=10000, seed=0, categorical=False):
     # TODO(wesm): Test other timestamp resolutions now that arrow supports
     # them
     'datetime': np.arange("2016-01-01T00:00:00.001", size,
-                          dtype='datetime64[ms]').astype('datetime64[ns]'),
+                          dtype='datetime64[ms]'),
     'timedelta': np.arange(0, size, dtype="timedelta64[ns]"),
We can maybe keep both the original ns and the new ms resolution? (to test both)
Quickly enabling it does open up a small can of worms:
FAILED pyarrow/tests/parquet/test_data_types.py::test_parquet_2_0_roundtrip[None-True] - pyarrow.lib.ArrowInvalid: Casting from timestamp[ns] to timestamp[ms] would lose data: 1451606400001000001
FAILED pyarrow/tests/parquet/test_data_types.py::test_parquet_2_0_roundtrip[None-False] - pyarrow.lib.ArrowInvalid: Casting from timestamp[ns] to timestamp[ms] would lose data: 1451606400001000001
FAILED pyarrow/tests/parquet/test_data_types.py::test_parquet_2_0_roundtrip[1000-True] - pyarrow.lib.ArrowInvalid: Casting from timestamp[ns] to timestamp[ms] would lose data: 1451606400001000001
FAILED pyarrow/tests/parquet/test_data_types.py::test_parquet_2_0_roundtrip[1000-False] - pyarrow.lib.ArrowInvalid: Casting from timestamp[ns] to timestamp[ms] would lose data: 1451606400001000001
FAILED pyarrow/tests/parquet/test_dataset.py::test_write_to_dataset_with_partitions_and_schema[True] - AssertionError: Attributes of DataFrame.iloc[:, 4] (column name="date") are different
FAILED pyarrow/tests/parquet/test_dataset.py::test_write_to_dataset_with_partitions_and_schema[False] - AssertionError: Attributes of DataFrame.iloc[:, 4] (column name="date") are different
FAILED pyarrow/tests/parquet/test_metadata.py::test_parquet_metadata_api - pyarrow.lib.ArrowInvalid: Casting from timestamp[ns] to timestamp[ms] would lose data: 1451606400001000001
FAILED pyarrow/tests/parquet/test_metadata.py::test_compare_schemas - pyarrow.lib.ArrowInvalid: Casting from timestamp[ns] to timestamp[ms] would lose data: 1451606400001000001
FAILED pyarrow/tests/parquet/test_pandas.py::test_pandas_parquet_custom_metadata - pyarrow.lib.ArrowInvalid: Casting from timestamp[ns] to timestamp[ms] would lose data: 1451606400001000001
FAILED pyarrow/tests/parquet/test_pandas.py::test_pandas_parquet_column_multiindex[True] - pyarrow.lib.ArrowInvalid: Casting from timestamp[ns] to timestamp[ms] would lose data: 1451606400001000001
FAILED pyarrow/tests/parquet/test_pandas.py::test_pandas_parquet_column_multiindex[False] - pyarrow.lib.ArrowInvalid: Casting from timestamp[ns] to timestamp[ms] would lose data: 1451606400001000001
FAILED pyarrow/tests/parquet/test_pandas.py::test_pandas_parquet_2_0_roundtrip_read_pandas_no_index_written[True] - pyarrow.lib.ArrowInvalid: Casting from timestamp[ns] to timestamp[ms] would lose data: 1451606400001000001
FAILED pyarrow/tests/parquet/test_pandas.py::test_pandas_parquet_2_0_roundtrip_read_pandas_no_index_written[False] - pyarrow.lib.ArrowInvalid: Casting from timestamp[ns] to timestamp[ms] would lose data: 1451606400001000001
FAILED pyarrow/tests/parquet/test_parquet_file.py::test_iter_batches_columns_reader[300] - pyarrow.lib.ArrowInvalid: Casting from timestamp[ns] to timestamp[ms] would lose data: 1451606400001000001
FAILED pyarrow/tests/parquet/test_parquet_file.py::test_iter_batches_columns_reader[1000] - pyarrow.lib.ArrowInvalid: Casting from timestamp[ns] to timestamp[ms] would lose data: 1451606400001000001
FAILED pyarrow/tests/parquet/test_parquet_file.py::test_iter_batches_columns_reader[1300] - pyarrow.lib.ArrowInvalid: Casting from timestamp[ns] to timestamp[ms] would lose data: 1451606400001000001
FAILED pyarrow/tests/parquet/test_parquet_file.py::test_iter_batches_reader[1000] - pyarrow.lib.ArrowInvalid: Casting from timestamp[ns] to timestamp[ms] would lose data: 1451606400001000001
Maybe better for a follow-up PR?
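For context, a minimal reproduction of the lossy-cast error those tests hit (my own sketch):

import pyarrow as pa

# A nanosecond timestamp with sub-millisecond precision, like the failing value.
arr = pa.array([1451606400001000001], pa.timestamp("ns"))

try:
    arr.cast(pa.timestamp("ms"))  # the default safe cast refuses to drop precision
except pa.ArrowInvalid as exc:
    print(exc)  # Casting from timestamp[ns] to timestamp[ms] would lose data: ...

truncated = arr.cast(pa.timestamp("ms"), safe=False)  # explicit opt-in truncates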
Ok, I am adding and fixing the tests. I just needed to remove the coercion to ms in the test cases.
Good suggestion!
python/pyarrow/tests/test_pandas.py
Outdated
@@ -1202,7 +1211,7 @@ def test_table_convert_date_as_object(self):
         df_datetime = table.to_pandas(date_as_object=False)
         df_object = table.to_pandas()

-        tm.assert_frame_equal(df.astype('datetime64[ns]'), df_datetime,
+        tm.assert_frame_equal(df.astype('datetime64[ms]'), df_datetime,
Do we have coverage for testing that it stays nanoseconds if you specify coerce_temporal_nanoseconds=True?
Not yet, will add!
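A sketch of what such a test could look like (hypothetical; simplified relative to whatever was eventually added):

import numpy as np
import pyarrow as pa

def test_coerce_temporal_nanoseconds():
    table = pa.table({"ts": pa.array([1, 2, 3], pa.timestamp("ms"))})
    df = table.to_pandas(coerce_temporal_nanoseconds=True)
    assert df["ts"].dtype == np.dtype("datetime64[ns]")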
@github-actions crossbow submit -g integration
Revision: 6ffb5e5. Submitted crossbow builds: ursacomputing/crossbow @ actions-40508e3899
# Arrow to Pandas v2 will convert date32 to [ms]. Pandas v1 will always
# silently coerce to [ns] due to non-[ns] support.
expected_date_type = 'datetime64[ms]'
This comment is not fully correct, I think (when converting the pandas dataframe to pyarrow, we actually don't have date32, but timestamp type). But then I also don't understand how this test is passing...
So what actually happens with pandas 2.x: when we create a DataFrame with datetime64[D], that gets converted to datetime64[s] (the closest supported resolution to "D"). Then roundtripping to parquet turns that into "ms" (because "s" is not supported by Parquet).
With older pandas this gets converted to datetime64[ns], comes back from Parquet as "us", and is converted back to "ns" when converting to pandas. But this astype("datetime64[ms]") essentially doesn't do anything, i.e. pandas preserves the "ns" because it doesn't support "ms", and hence the test also passes for older pandas.
Maybe it's simpler to just test with a DataFrame of nanoseconds, which now works the same with old and new pandas, and then we don't have to add any comment or astype.
Maybe it's simpler to just test with a DataFrame of nanoseconds, which now works the same with old and new pandas, and then we don't have to add any comment or astype.
Hmm, trying that out locally fails (but only with the non-legacy code path), and digging in, it seems that we are still writing Parquet v1 files with the dataset API...
Will open a separate issue and PR to quickly fix that separately.
-> #36538
I pushed a small clean-up on top of my PR to fix the Parquet version for the dataset writer. Further checked all changes to the parquet tests as well, and all looks good!
@github-actions crossbow submit -g integration
And with this PR and the Parquet v2.6 update combined, the failures in the dask builds are now much smaller (just one failure that was testing that a timestamp would overflow by being cast to nanoseconds).
Revision: a6487c2. Submitted crossbow builds: ursacomputing/crossbow @ actions-5c8f3f6cad
Thanks @danepitkin!
Thanks @jorisvandenbossche for the collaboration and support!
[pd.period_range("2012-01-01", periods=3, freq="D").array,
 pd.interval_range(1, 4).array])
@jorisvandenbossche @danepitkin We can't use pd here because pandas may not be available.
This causes an error in a "no pandas" environment: https://github.com/apache/arrow/actions/runs/5496447565/jobs/10016477233
This PR's CI succeeded because our "Without Pandas" job installed pandas implicitly. It has been fixed by #36542.
Could you open an issue for this and fix it?
Fixed -> #36586
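For reference, one common way to avoid module-level pd usage in parametrization (a sketch; not necessarily the approach taken in #36586):

import pytest

def test_to_pandas_extension_dtypes():
    # Import pandas lazily so test collection works when it is absent.
    pd = pytest.importorskip("pandas")
    arrays = [pd.period_range("2012-01-01", periods=3, freq="D").array,
              pd.interval_range(1, 4).array]
    for arr in arrays:
        ...  # exercise the conversion under test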
Conbench analyzed the 6 benchmark runs on this commit. There were 7 benchmark results indicating a performance regression.
The full Conbench report has more details.
Do not coerce temporal types to nanosecond when pandas >= 2.0 is imported, since pandas now supports s/ms/us time units.
This PR adds support for the following Arrow -> Pandas conversions, which previously all defaulted to datetime64[ns] or datetime64[ns, <TZ>]:
Rationale for this change
Pandas 2.0 introduces proper support for temporal types.
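For example, with pandas >= 2.0 a datetime Series can hold non-nanosecond resolutions (a minimal illustration):

import pandas as pd

# pandas >= 2.0 supports second/milli/microsecond resolutions natively.
s = pd.Series(pd.to_datetime(["2017-01-01"])).dt.as_unit("s")
print(s.dtype)  # datetime64[s]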
Are these changes tested?
Yes. Pytests added and updated.
Are there any user-facing changes?
Yes, arrow-to-pandas default conversion behavior will change when users have pandas >= 2.0, but a legacy option is exposed to provide backwards compatibility.
This PR includes breaking changes to public APIs.