@@ -9,6 +9,89 @@ including other versions of pandas.
99{{ header }}
1010
1111.. ---------------------------------------------------------------------------
12+
13+ .. _whatsnew_220.upcoming_changes:
14+
15+ Upcoming changes in pandas 3.0
16+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
17+
18+ pandas 3.0 will bring two major changes to the default behavior of pandas.
19+
20+ Copy-on-Write
21+ ^^^^^^^^^^^^^
22+
23+ The currently optional Copy-on-Write mode will be enabled by default in pandas 3.0. There
24+ won't be an option to keep the current behavior enabled. The new behavioral semantics are
25+ explained in the :ref:`user guide about Copy-on-Write <copy_on_write>`.
26+
27+ The new behavior has been available since pandas 2.0 and can be enabled with the following option:
28+
29+ .. code-block:: ipython
30+
31+ pd.options.mode.copy_on_write = True
32+
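As a rough illustration of what this option changes, the following sketch modifies a
Series derived from a DataFrame and checks that the parent is left untouched. The names
``df`` and ``subset`` are hypothetical, and ``pd.option_context`` is used here only to
keep the setting scoped to the example; it assumes a pandas version in which the
``mode.copy_on_write`` option is available.

.. code-block:: python

    import pandas as pd

    # Enable Copy-on-Write only inside this block (assumption: the option exists
    # in the installed pandas version).
    with pd.option_context("mode.copy_on_write", True):
        df = pd.DataFrame({"a": [1, 2, 3]})
        subset = df["a"]        # derived object that initially shares data with df
        subset.iloc[0] = 100    # the write triggers a copy; df is not modified
        parent_values = df["a"].tolist()
        subset_values = subset.tolist()

    print(parent_values)
    print(subset_values)

With Copy-on-Write enabled, only ``subset`` reflects the assignment; code that relied on
such a write propagating back to ``df`` needs to assign to ``df`` directly.
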
33+ This change affects how pandas operates with respect to copies and views in several
34+ ways. Some of these changes allow a clear deprecation, like the changes in
35+ chained assignment. Other changes are more subtle, so their warnings are hidden behind
36+ an option that can be enabled in pandas 2.2.
37+
38+ .. code-block:: ipython
39+
40+ pd.options.mode.copy_on_write = "warn"
41+
42+ This mode will warn in many different scenarios that aren't actually relevant to
43+ most queries. We recommend exploring this mode, but it is not necessary to get rid
44+ of all of these warnings. The :ref:`migration guide <copy_on_write.migration_guide>`
45+ explains the upgrade process in more detail.
46+
47+ Dedicated string data type (backed by Arrow) by default
48+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
49+
50+ Historically, pandas represented string columns with the NumPy object data type. This
51+ representation has numerous problems, including slow performance and a large memory
52+ footprint. This will change in pandas 3.0. pandas will start inferring string columns
53+ as a new ``string`` data type, backed by Arrow, which stores strings contiguously in memory. This brings
54+ a substantial improvement in performance and memory usage.
55+
56+ Old behavior:
57+
58+ .. code-block:: ipython
59+
60+ In [1]: pd.Series(["a", "b"])
61+ Out[1]:
62+ 0 a
63+ 1 b
64+ dtype: object
65+
66+ New behavior:
67+
68+
69+ .. code-block:: ipython
70+
71+ In [1]: pd.Series(["a", "b"])
72+ Out[1]:
73+ 0 a
74+ 1 b
75+ dtype: string
76+
77+ The string data type that is used in these scenarios will mostly behave as the NumPy
78+ object data type does, including missing value semantics and general operations on these
79+ columns.
80+
81+ This change is accompanied by a few additional changes across the API:
82+
83+ - Currently, specifying ``dtype="string"`` creates a dtype that is backed by Python strings
84+ which are stored in a NumPy array. In pandas 3.0, this dtype
85+ will create an Arrow-backed string column.
86+ - The column names and the Index will also be backed by Arrow strings.
87+ - PyArrow will become a required dependency with pandas 3.0 to accommodate this change.
88+
89+ This future dtype inference logic can be enabled with:
90+
91+ .. code-block:: ipython
92+
93+ pd.options.future.infer_string = True
94+
1295 .. _whatsnew_220.enhancements:
1396
1497Enhancements