Commit be71a4d

Merge branch 'main' into fix/type_coercion_for_unobserved_categories

2 parents b3520d1 + 888b6bc

222 files changed: +3117 -2376 lines changed

.circleci/config.yml (-4)

@@ -72,10 +72,6 @@ jobs:
           no_output_timeout: 30m # Sometimes the tests won't generate any output, make sure the job doesn't get killed by that
           command: |
             pip3 install cibuildwheel==2.15.0
-            # When this is a nightly wheel build, allow picking up NumPy 2.0 dev wheels:
-            if [[ "$IS_SCHEDULE_DISPATCH" == "true" || "$IS_PUSH" != 'true' ]]; then
-              export CIBW_ENVIRONMENT="PIP_EXTRA_INDEX_URL=https://pypi.anaconda.org/scientific-python-nightly-wheels/simple"
-            fi
             cibuildwheel --prerelease-pythons --output-dir wheelhouse
 
       environment:

.github/workflows/code-checks.yml (+1, -1)

@@ -85,7 +85,7 @@ jobs:
           echo "PYTHONPATH=$PYTHONPATH" >> $GITHUB_ENV
         if: ${{ steps.build.outcome == 'success' && always() }}
 
-      - name: Typing + pylint
+      - name: Typing
         uses: pre-commit/action@v3.0.1
         with:
           extra_args: --verbose --hook-stage manual --all-files

.github/workflows/wheels.yml (+1, -14)

@@ -139,27 +139,14 @@ jobs:
         shell: bash -el {0}
         run: echo "sdist_name=$(cd ./dist && ls -d */)" >> "$GITHUB_ENV"
 
-      - name: Build normal wheels
-        if: ${{ (env.IS_SCHEDULE_DISPATCH != 'true' || env.IS_PUSH == 'true') }}
+      - name: Build wheels
         uses: pypa/cibuildwheel@v2.17.0
         with:
          package-dir: ./dist/${{ startsWith(matrix.buildplat[1], 'macosx') && env.sdist_name || needs.build_sdist.outputs.sdist_file }}
         env:
          CIBW_PRERELEASE_PYTHONS: True
          CIBW_BUILD: ${{ matrix.python[0] }}-${{ matrix.buildplat[1] }}
 
-      - name: Build nightly wheels (with NumPy pre-release)
-        if: ${{ (env.IS_SCHEDULE_DISPATCH == 'true' && env.IS_PUSH != 'true') }}
-        uses: pypa/cibuildwheel@v2.17.0
-        with:
-          package-dir: ./dist/${{ startsWith(matrix.buildplat[1], 'macosx') && env.sdist_name || needs.build_sdist.outputs.sdist_file }}
-        env:
-          # The nightly wheels should be build witht he NumPy 2.0 pre-releases
-          # which requires the additional URL.
-          CIBW_ENVIRONMENT: PIP_EXTRA_INDEX_URL=https://pypi.anaconda.org/scientific-python-nightly-wheels/simple
-          CIBW_PRERELEASE_PYTHONS: True
-          CIBW_BUILD: ${{ matrix.python[0] }}-${{ matrix.buildplat[1] }}
-
       - name: Set up Python
         uses: mamba-org/setup-micromamba@v1
         with:

.pre-commit-config.yaml (+6, -31)

@@ -16,10 +16,10 @@ ci:
     autofix_prs: false
     autoupdate_schedule: monthly
     # manual stage hooks
-    skip: [pylint, pyright, mypy]
+    skip: [pyright, mypy]
 repos:
 -   repo: https://github.com/astral-sh/ruff-pre-commit
-    rev: v0.3.1
+    rev: v0.3.4
     hooks:
     -   id: ruff
         args: [--exit-non-zero-on-fix]
@@ -30,16 +30,10 @@ repos:
         files: ^pandas
         exclude: ^pandas/tests
         args: [--select, "ANN001,ANN2", --fix-only, --exit-non-zero-on-fix]
-    -   id: ruff
-        name: ruff-use-pd_array-in-core
-        alias: ruff-use-pd_array-in-core
-        files: ^pandas/core/
-        exclude: ^pandas/core/api\.py$
-        args: [--select, "ICN001", --exit-non-zero-on-fix]
     -   id: ruff-format
         exclude: ^scripts
 -   repo: https://github.com/jendrikseipp/vulture
-    rev: 'v2.10'
+    rev: 'v2.11'
     hooks:
     -   id: vulture
         entry: python scripts/run_vulture.py
@@ -73,31 +67,12 @@ repos:
     -   id: fix-encoding-pragma
         args: [--remove]
     -   id: trailing-whitespace
--   repo: https://github.com/pylint-dev/pylint
-    rev: v3.0.1
-    hooks:
-    -   id: pylint
-        stages: [manual]
-        args: [--load-plugins=pylint.extensions.redefined_loop_name, --fail-on=I0021]
-    -   id: pylint
-        alias: redefined-outer-name
-        name: Redefining name from outer scope
-        files: ^pandas/
-        exclude: |
-            (?x)
-            ^pandas/tests  # keep excluded
-            |/_testing/  # keep excluded
-            |^pandas/util/_test_decorators\.py  # keep excluded
-            |^pandas/_version\.py  # keep excluded
-            |^pandas/conftest\.py  # keep excluded
-        args: [--disable=all, --enable=redefined-outer-name]
-        stages: [manual]
 -   repo: https://github.com/PyCQA/isort
-    rev: 5.12.0
+    rev: 5.13.2
     hooks:
     -   id: isort
 -   repo: https://github.com/asottile/pyupgrade
-    rev: v3.15.0
+    rev: v3.15.2
     hooks:
     -   id: pyupgrade
         args: [--py39-plus]
@@ -116,7 +91,7 @@ repos:
     hooks:
     -   id: sphinx-lint
 -   repo: https://github.com/pre-commit/mirrors-clang-format
-    rev: v17.0.6
+    rev: v18.1.2
     hooks:
     -   id: clang-format
         files: ^pandas/_libs/src|^pandas/_libs/include
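Both the `skip:` list and the removed pylint entries hinge on pre-commit's "manual" stage: hooks marked `stages: [manual]` do not run on an ordinary `pre-commit run` and must be requested explicitly (the code-checks workflow above does so via `--hook-stage manual`). A minimal hypothetical hook definition illustrating the pattern (the repo and rev here are examples, not taken from this commit):

```yaml
repos:
-   repo: https://github.com/pre-commit/mirrors-mypy   # hypothetical example repo
    rev: v1.9.0                                        # hypothetical pin
    hooks:
    -   id: mypy
        stages: [manual]   # skipped on plain `pre-commit run`;
                           # runs only with `pre-commit run --hook-stage manual`
```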

asv_bench/asv.conf.json (+1)

@@ -41,6 +41,7 @@
     // pip (with all the conda available packages installed first,
     // followed by the pip installed packages).
     "matrix": {
+        "pip+build": [],
         "Cython": ["3.0"],
         "matplotlib": [],
         "sqlalchemy": [],

asv_bench/benchmarks/categoricals.py (+1, -1)

@@ -24,7 +24,7 @@ def setup(self):
         self.codes = np.tile(range(len(self.categories)), N)
 
         self.datetimes = pd.Series(
-            pd.date_range("1995-01-01 00:00:00", periods=N / 10, freq="s")
+            pd.date_range("1995-01-01 00:00:00", periods=N // 10, freq="s")
         )
         self.datetimes_with_nat = self.datetimes.copy()
         self.datetimes_with_nat.iloc[-1] = pd.NaT
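The `N / 10` to `N // 10` change is more than cosmetic: in Python 3, true division always yields a `float`, while `date_range`'s `periods` parameter expects an integer, so the benchmarks switch to floor division. A stdlib-only sketch of the distinction:

```python
N = 1000

# True division always produces a float in Python 3,
# even when the result is numerically whole.
quotient = N / 10
assert isinstance(quotient, float)

# Floor division keeps the result an int, which is what
# integer-only parameters such as `periods` expect.
count = N // 10
assert isinstance(count, int)
assert count == 100
```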

asv_bench/benchmarks/frame_methods.py (+24)

@@ -862,4 +862,28 @@ def time_last_valid_index(self, dtype):
         self.df.last_valid_index()
 
 
+class Update:
+    def setup(self):
+        rng = np.random.default_rng()
+        self.df = DataFrame(rng.uniform(size=(1_000_000, 10)))
+
+        idx = rng.choice(range(1_000_000), size=1_000_000, replace=False)
+        self.df_random = DataFrame(self.df, index=idx)
+
+        idx = rng.choice(range(1_000_000), size=100_000, replace=False)
+        cols = rng.choice(range(10), size=2, replace=False)
+        self.df_sample = DataFrame(
+            rng.uniform(size=(100_000, 2)), index=idx, columns=cols
+        )
+
+    def time_to_update_big_frame_small_arg(self):
+        self.df.update(self.df_sample)
+
+    def time_to_update_random_indices(self):
+        self.df_random.update(self.df_sample)
+
+    def time_to_update_small_frame_big_arg(self):
+        self.df_sample.update(self.df)
+
+
 from .pandas_vb_common import setup  # noqa: F401 isort:skip
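The new benchmarks time `DataFrame.update`, which overwrites aligned cells in the caller with non-missing values from the argument and ignores labels the caller does not have. A deliberately simplified, stdlib-only model of that alignment rule on plain dicts (the real method aligns on both index and columns and treats NaN as missing):

```python
def update(original, other, missing=None):
    """Sketch of DataFrame.update semantics on flat dicts:
    non-missing values in `other` overwrite matching keys in
    `original`; keys absent from `original` are ignored."""
    for key, value in other.items():
        if key in original and value is not missing:
            original[key] = value

row = {"a": 1, "b": 2, "c": 3}
update(row, {"b": 20, "c": None, "d": 40})
print(row)  # {'a': 1, 'b': 20, 'c': 3}
```

Note that `"c"` keeps its old value because the incoming value is missing, and `"d"` is dropped because the caller has no such key, mirroring how `update` never adds new rows or columns.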

asv_bench/benchmarks/timeseries.py (+1, -1)

@@ -29,7 +29,7 @@ def setup(self, index_type):
         "dst": date_range(
             start="10/29/2000 1:00:00", end="10/29/2000 1:59:59", freq="s"
         ),
-        "repeated": date_range(start="2000", periods=N / 10, freq="s").repeat(10),
+        "repeated": date_range(start="2000", periods=N // 10, freq="s").repeat(10),
         "tz_aware": date_range(start="2000", periods=N, freq="s", tz="US/Eastern"),
         "tz_local": date_range(
             start="2000", periods=N, freq="s", tz=dateutil.tz.tzlocal()

ci/code_checks.sh (+3, -49)

@@ -83,8 +83,6 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
        -i "pandas.DataFrame.__iter__ SA01" \
        -i "pandas.DataFrame.assign SA01" \
        -i "pandas.DataFrame.at_time PR01" \
-       -i "pandas.DataFrame.axes SA01" \
-       -i "pandas.DataFrame.backfill PR01,SA01" \
        -i "pandas.DataFrame.bfill SA01" \
        -i "pandas.DataFrame.columns SA01" \
        -i "pandas.DataFrame.copy SA01" \
@@ -99,12 +97,10 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
        -i "pandas.DataFrame.kurt RT03,SA01" \
        -i "pandas.DataFrame.kurtosis RT03,SA01" \
        -i "pandas.DataFrame.last_valid_index SA01" \
-       -i "pandas.DataFrame.mask RT03" \
        -i "pandas.DataFrame.max RT03" \
        -i "pandas.DataFrame.mean RT03,SA01" \
        -i "pandas.DataFrame.median RT03,SA01" \
        -i "pandas.DataFrame.min RT03" \
-       -i "pandas.DataFrame.pad PR01,SA01" \
        -i "pandas.DataFrame.plot PR02,SA01" \
        -i "pandas.DataFrame.pop SA01" \
        -i "pandas.DataFrame.prod RT03" \
@@ -119,19 +115,11 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
        -i "pandas.DataFrame.sparse.to_dense SA01" \
        -i "pandas.DataFrame.std PR01,RT03,SA01" \
        -i "pandas.DataFrame.sum RT03" \
-       -i "pandas.DataFrame.swapaxes PR01,SA01" \
        -i "pandas.DataFrame.swaplevel SA01" \
        -i "pandas.DataFrame.to_feather SA01" \
        -i "pandas.DataFrame.to_markdown SA01" \
        -i "pandas.DataFrame.to_parquet RT03" \
-       -i "pandas.DataFrame.to_period SA01" \
-       -i "pandas.DataFrame.to_timestamp SA01" \
-       -i "pandas.DataFrame.tz_convert SA01" \
-       -i "pandas.DataFrame.tz_localize SA01" \
-       -i "pandas.DataFrame.unstack RT03" \
-       -i "pandas.DataFrame.value_counts RT03" \
        -i "pandas.DataFrame.var PR01,RT03,SA01" \
-       -i "pandas.DataFrame.where RT03" \
        -i "pandas.DatetimeIndex.ceil SA01" \
        -i "pandas.DatetimeIndex.date SA01" \
        -i "pandas.DatetimeIndex.day SA01" \
@@ -165,11 +153,6 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
        -i "pandas.DatetimeTZDtype SA01" \
        -i "pandas.DatetimeTZDtype.tz SA01" \
        -i "pandas.DatetimeTZDtype.unit SA01" \
-       -i "pandas.ExcelFile PR01,SA01" \
-       -i "pandas.ExcelFile.parse PR01,SA01" \
-       -i "pandas.ExcelWriter SA01" \
-       -i "pandas.Float32Dtype SA01" \
-       -i "pandas.Float64Dtype SA01" \
        -i "pandas.Grouper PR02,SA01" \
        -i "pandas.HDFStore.append PR01,SA01" \
        -i "pandas.HDFStore.get SA01" \
@@ -226,7 +209,6 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
        -i "pandas.Index.to_list RT03" \
        -i "pandas.Index.union PR07,RT03,SA01" \
        -i "pandas.Index.unique RT03" \
-       -i "pandas.Index.value_counts RT03" \
        -i "pandas.Index.view GL08" \
        -i "pandas.Int16Dtype SA01" \
        -i "pandas.Int32Dtype SA01" \
@@ -400,7 +382,6 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
        -i "pandas.Series.list.flatten SA01" \
        -i "pandas.Series.list.len SA01" \
        -i "pandas.Series.lt PR07,SA01" \
-       -i "pandas.Series.mask RT03" \
        -i "pandas.Series.max RT03" \
        -i "pandas.Series.mean RT03,SA01" \
        -i "pandas.Series.median RT03,SA01" \
@@ -477,17 +458,10 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
        -i "pandas.Series.to_frame SA01" \
        -i "pandas.Series.to_list RT03" \
        -i "pandas.Series.to_markdown SA01" \
-       -i "pandas.Series.to_period SA01" \
        -i "pandas.Series.to_string SA01" \
-       -i "pandas.Series.to_timestamp RT03,SA01" \
        -i "pandas.Series.truediv PR07" \
-       -i "pandas.Series.tz_convert SA01" \
-       -i "pandas.Series.tz_localize SA01" \
-       -i "pandas.Series.unstack SA01" \
        -i "pandas.Series.update PR07,SA01" \
-       -i "pandas.Series.value_counts RT03" \
        -i "pandas.Series.var PR01,RT03,SA01" \
-       -i "pandas.Series.where RT03" \
        -i "pandas.SparseDtype SA01" \
        -i "pandas.Timedelta PR07,SA01" \
        -i "pandas.Timedelta.as_unit SA01" \
@@ -681,60 +655,40 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
        -i "pandas.core.groupby.DataFrameGroupBy.__iter__ RT03,SA01" \
        -i "pandas.core.groupby.DataFrameGroupBy.agg RT03" \
        -i "pandas.core.groupby.DataFrameGroupBy.aggregate RT03" \
-       -i "pandas.core.groupby.DataFrameGroupBy.apply RT03" \
        -i "pandas.core.groupby.DataFrameGroupBy.boxplot PR07,RT03,SA01" \
-       -i "pandas.core.groupby.DataFrameGroupBy.cummax RT03" \
-       -i "pandas.core.groupby.DataFrameGroupBy.cummin RT03" \
-       -i "pandas.core.groupby.DataFrameGroupBy.cumprod RT03" \
-       -i "pandas.core.groupby.DataFrameGroupBy.cumsum RT03" \
-       -i "pandas.core.groupby.DataFrameGroupBy.filter RT03,SA01" \
+       -i "pandas.core.groupby.DataFrameGroupBy.filter SA01" \
        -i "pandas.core.groupby.DataFrameGroupBy.get_group RT03,SA01" \
        -i "pandas.core.groupby.DataFrameGroupBy.groups SA01" \
        -i "pandas.core.groupby.DataFrameGroupBy.hist RT03" \
        -i "pandas.core.groupby.DataFrameGroupBy.indices SA01" \
        -i "pandas.core.groupby.DataFrameGroupBy.max SA01" \
-       -i "pandas.core.groupby.DataFrameGroupBy.mean RT03" \
        -i "pandas.core.groupby.DataFrameGroupBy.median SA01" \
        -i "pandas.core.groupby.DataFrameGroupBy.min SA01" \
        -i "pandas.core.groupby.DataFrameGroupBy.nth PR02" \
-       -i "pandas.core.groupby.DataFrameGroupBy.nunique RT03,SA01" \
+       -i "pandas.core.groupby.DataFrameGroupBy.nunique SA01" \
        -i "pandas.core.groupby.DataFrameGroupBy.ohlc SA01" \
        -i "pandas.core.groupby.DataFrameGroupBy.plot PR02,SA01" \
        -i "pandas.core.groupby.DataFrameGroupBy.prod SA01" \
-       -i "pandas.core.groupby.DataFrameGroupBy.rank RT03" \
-       -i "pandas.core.groupby.DataFrameGroupBy.resample RT03" \
        -i "pandas.core.groupby.DataFrameGroupBy.sem SA01" \
-       -i "pandas.core.groupby.DataFrameGroupBy.skew RT03" \
        -i "pandas.core.groupby.DataFrameGroupBy.sum SA01" \
-       -i "pandas.core.groupby.DataFrameGroupBy.transform RT03" \
        -i "pandas.core.groupby.SeriesGroupBy.__iter__ RT03,SA01" \
        -i "pandas.core.groupby.SeriesGroupBy.agg RT03" \
        -i "pandas.core.groupby.SeriesGroupBy.aggregate RT03" \
-       -i "pandas.core.groupby.SeriesGroupBy.apply RT03" \
-       -i "pandas.core.groupby.SeriesGroupBy.cummax RT03" \
-       -i "pandas.core.groupby.SeriesGroupBy.cummin RT03" \
-       -i "pandas.core.groupby.SeriesGroupBy.cumprod RT03" \
-       -i "pandas.core.groupby.SeriesGroupBy.cumsum RT03" \
-       -i "pandas.core.groupby.SeriesGroupBy.filter PR01,RT03,SA01" \
+       -i "pandas.core.groupby.SeriesGroupBy.filter PR01,SA01" \
        -i "pandas.core.groupby.SeriesGroupBy.get_group RT03,SA01" \
        -i "pandas.core.groupby.SeriesGroupBy.groups SA01" \
        -i "pandas.core.groupby.SeriesGroupBy.indices SA01" \
        -i "pandas.core.groupby.SeriesGroupBy.is_monotonic_decreasing SA01" \
        -i "pandas.core.groupby.SeriesGroupBy.is_monotonic_increasing SA01" \
        -i "pandas.core.groupby.SeriesGroupBy.max SA01" \
-       -i "pandas.core.groupby.SeriesGroupBy.mean RT03" \
        -i "pandas.core.groupby.SeriesGroupBy.median SA01" \
        -i "pandas.core.groupby.SeriesGroupBy.min SA01" \
        -i "pandas.core.groupby.SeriesGroupBy.nth PR02" \
        -i "pandas.core.groupby.SeriesGroupBy.ohlc SA01" \
        -i "pandas.core.groupby.SeriesGroupBy.plot PR02,SA01" \
        -i "pandas.core.groupby.SeriesGroupBy.prod SA01" \
-       -i "pandas.core.groupby.SeriesGroupBy.rank RT03" \
-       -i "pandas.core.groupby.SeriesGroupBy.resample RT03" \
        -i "pandas.core.groupby.SeriesGroupBy.sem SA01" \
-       -i "pandas.core.groupby.SeriesGroupBy.skew RT03" \
        -i "pandas.core.groupby.SeriesGroupBy.sum SA01" \
-       -i "pandas.core.groupby.SeriesGroupBy.transform RT03" \
        -i "pandas.core.resample.Resampler.__iter__ RT03,SA01" \
        -i "pandas.core.resample.Resampler.ffill RT03" \
        -i "pandas.core.resample.Resampler.get_group RT03,SA01" \

doc/redirects.csv (-1)

@@ -1422,7 +1422,6 @@ reference/api/pandas.Series.transpose,pandas.Series.T
 reference/api/pandas.Index.transpose,pandas.Index.T
 reference/api/pandas.Index.notnull,pandas.Index.notna
 reference/api/pandas.Index.tolist,pandas.Index.to_list
-reference/api/pandas.arrays.PandasArray,pandas.arrays.NumpyExtensionArray
 reference/api/pandas.core.groupby.DataFrameGroupBy.backfill,pandas.core.groupby.DataFrameGroupBy.bfill
 reference/api/pandas.core.groupby.GroupBy.backfill,pandas.core.groupby.DataFrameGroupBy.bfill
 reference/api/pandas.core.resample.Resampler.backfill,pandas.core.resample.Resampler.bfill

doc/source/getting_started/install.rst (+2)

@@ -269,6 +269,8 @@ SciPy 1.10.0 computation Miscellaneous stati
 xarray 2022.12.0 computation pandas-like API for N-dimensional data
 ========================= ================== =============== =============================================================
 
+.. _install.excel_dependencies:
+
 Excel files
 ^^^^^^^^^^^
 
doc/source/getting_started/intro_tutorials/02_read_write.rst (+6)

@@ -111,6 +111,12 @@ strings (``object``).
 
 My colleague requested the Titanic data as a spreadsheet.
 
+.. note::
+    If you want to use :func:`~pandas.to_excel` and :func:`~pandas.read_excel`,
+    you need to install an Excel reader as outlined in the
+    :ref:`Excel files <install.excel_dependencies>` section of the
+    installation documentation.
+
 .. ipython:: python
 
     titanic.to_excel("titanic.xlsx", sheet_name="passengers", index=False)

doc/source/user_guide/basics.rst (+5, -5)

@@ -476,15 +476,15 @@ For example:
 .. ipython:: python
 
     df
-    df.mean(0)
-    df.mean(1)
+    df.mean(axis=0)
+    df.mean(axis=1)
 
 All such methods have a ``skipna`` option signaling whether to exclude missing
 data (``True`` by default):
 
 .. ipython:: python
 
-    df.sum(0, skipna=False)
+    df.sum(axis=0, skipna=False)
     df.sum(axis=1, skipna=True)
 
 Combined with the broadcasting / arithmetic behavior, one can describe various
@@ -495,8 +495,8 @@ standard deviation of 1), very concisely:
 
     ts_stand = (df - df.mean()) / df.std()
     ts_stand.std()
-    xs_stand = df.sub(df.mean(1), axis=0).div(df.std(1), axis=0)
-    xs_stand.std(1)
+    xs_stand = df.sub(df.mean(axis=1), axis=0).div(df.std(axis=1), axis=0)
+    xs_stand.std(axis=1)
 
 Note that methods like :meth:`~DataFrame.cumsum` and :meth:`~DataFrame.cumprod`
 preserve the location of ``NaN`` values. This is somewhat different from
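The docs change only spells out the `axis` keyword, but the semantics it names are worth restating: `axis=0` collapses the row axis (one result per column) while `axis=1` collapses the column axis (one result per row). A stdlib-only sketch of the same convention on a nested list:

```python
data = [
    [1.0, 2.0],
    [3.0, 4.0],
]

# axis=0: aggregate down the rows -> one mean per column
col_means = [sum(col) / len(col) for col in zip(*data)]

# axis=1: aggregate across the columns -> one mean per row
row_means = [sum(row) / len(row) for row in data]

print(col_means)  # [2.0, 3.0]
print(row_means)  # [1.5, 3.5]
```

Spelling out `axis=` makes the intent explicit, which is exactly why the user guide examples were updated from the bare positional form.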
