jorisvandenbossche
Member

This changes the Series and DataFrame constructors to return a proper shallow copy (i.e. sharing only the data, not attributes like the index) instead of a new Series/DataFrame object backed by the same manager. This applies to the case of Series(s) or DataFrame(df), so no reindexing or casting, and with the default of copy=False.

See #49523 for an example.

Strictly speaking this is a breaking change (or notable bug fix, not sure if this was done intentionally in the past).
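
For illustration, a minimal sketch of the semantics described above. It uses `copy(deep=False)` as the reference for "shallow copy", since that behaves the same across versions; the outcome of the constructor path depends on whether the installed pandas includes this change, so it is only printed, not assumed.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3])

# Reference semantics: an explicit shallow copy shares the data
# but gets its own index attribute.
shallow = s.copy(deep=False)
shares_data = np.shares_memory(s.to_numpy(), shallow.to_numpy())
shallow.index = ["a", "b", "c"]
parent_index_after_copy = list(s.index)

# The constructor path this PR is about. Whether the parent's index is
# left alone here depends on the pandas version in use.
s2 = pd.Series(s)
s2.index = ["x", "y", "z"]
parent_index_after_ctor = list(s.index)

print("shares data:", shares_data)
print("parent index after copy(deep=False):", parent_index_after_copy)
print("parent index after Series(s):", parent_index_after_ctor)
```

With this change, the last line should match the `copy(deep=False)` case, i.e. the parent keeps its original index.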

methods to get a full slice (for example ``df.loc[:]`` or ``df[:]``) (:issue:`49469`)
- The :class:`Series` and :class:`DataFrame` constructors will now return a shallow copy
(i.e. share data, but not attributes) when passed a Series and DataFrame, respectively,
and with the default of ``copy=False`` (and if no other triggers a copy). Previously,
Member

"and if no other triggers a copy" is there a word missing here?

Member Author

Yes, I suppose I meant "no other keyword triggers a copy", e.g. by passing a dtype, or by passing index/columns that cause a reindex.
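
A small sketch of the two triggers mentioned here (a dtype cast and a reindex via `index`), checking with `np.shares_memory` that the result no longer shares data with the parent. The variable names are just for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3])

# Passing a dtype that differs from the parent's forces a cast,
# which produces new data.
cast = pd.Series(s, dtype="float64")

# Passing an index that differs from the parent's forces a reindex,
# which also produces new data.
reindexed = pd.Series(s, index=[2, 1, 0])

cast_shares = np.shares_memory(s.to_numpy(), cast.to_numpy())
reindexed_shares = np.shares_memory(s.to_numpy(), reindexed.to_numpy())

print("cast shares data:", cast_shares)
print("reindexed shares data:", reindexed_shares)
print("reindexed values:", reindexed.tolist())
```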

(i.e. share data, but not attributes) when passed a Series and DataFrame, respectively,
and with the default of ``copy=False`` (and if no other triggers a copy). Previously,
the new Series or DataFrame would share the index attribute (e.g. ``df.index = ...``
would also update the index of the parent or child) (:issue:`49523`)
Member

IIUC it is setting df.index.name = ..., not setting df.index itself?

Member Author

No, actually the index itself, see the code snippet in the issue: #49523

This is because right now the returned Series is actually sharing the same manager, and so setting df.index = .. updates the index attribute of the manager in-place, and since they share the same manager, also the other Series/DataFrame's index gets updated.

I suppose that even after this change, with a proper shallow copy, they still share the actual Index object, so mutating an attribute of the index (e.g. df.index.name = ..) will still propagate.
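
To make the mechanism concrete, here is a toy model in plain Python. `ToyManager`, `ToySeries`, `same_manager` and `shallow_copy` are hypothetical stand-ins invented for this sketch, not pandas internals; they only mimic "two objects wrapping one manager" versus "two managers pointing at the same data".

```python
from dataclasses import dataclass


@dataclass
class ToyManager:
    """Hypothetical stand-in for a manager: holds the index and the data."""
    index: list
    data: list


class ToySeries:
    """Hypothetical stand-in for a Series that delegates to its manager."""

    def __init__(self, mgr):
        self._mgr = mgr

    @property
    def index(self):
        return self._mgr.index

    @index.setter
    def index(self, new_index):
        # The index attribute is set on the manager in place.
        self._mgr.index = list(new_index)


def same_manager(s):
    # Old behaviour in this model: a new object wrapping the same manager.
    return ToySeries(s._mgr)


def shallow_copy(s):
    # New behaviour in this model: a new manager that reuses the same
    # data and index objects.
    return ToySeries(ToyManager(index=s._mgr.index, data=s._mgr.data))


old_parent = ToySeries(ToyManager(index=[0, 1, 2], data=[1, 2, 3]))
old_child = same_manager(old_parent)
old_child.index = ["a", "b", "c"]  # also changes old_parent.index

new_parent = ToySeries(ToyManager(index=[0, 1, 2], data=[1, 2, 3]))
new_child = shallow_copy(new_parent)
new_child.index = ["a", "b", "c"]  # new_parent.index is untouched

print(old_parent.index)
print(new_parent.index)
print(new_child._mgr.data is new_parent._mgr.data)
```

In the shared-manager case the assignment lands on the one manager both objects wrap; in the shallow-copy case each object has its own manager, so only the data stays shared.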

Member

makes sense, thanks

Member Author

Just to confirm: as expected, mutating the index's attributes still propagates (the Index object itself is shared, because its data is immutable):

In [1]: s = pd.Series([1, 2, 3])
   ...: s2 = pd.Series(s)

In [2]: s2.index.name = "test!"

In [3]: s
Out[3]: 
test!
0    1
1    2
2    3
dtype: int64

We should maybe consider also "shallow copying" the index when creating a shallow copy of a Series/DataFrame, i.e. create a new Index but viewing the same data, to avoid the above. But that's a bigger change / for a separate issue.
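
As a rough sketch of what "shallow copying" the index could look like at the Index level, assuming `Index.copy()` with its default `deep=False` is an acceptable stand-in for "a new Index viewing the same data" (this PR does not do this; it is only an illustration of the idea):

```python
import numpy as np
import pandas as pd

idx = pd.Index([10, 20, 30])

# A new Index object that reuses the same underlying values.
idx_shallow = idx.copy()  # deep=False is the default

# Mutating an attribute on the new object leaves the original alone.
idx_shallow.name = "test!"

shares_values = np.shares_memory(idx.to_numpy(), idx_shallow.to_numpy())

print("original name:", idx.name)
print("shallow copy name:", idx_shallow.name)
print("shares values:", shares_values)
```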

Member

> We should maybe consider also "shallow copying" the index when creating a shallow copy of a Series/DataFrame, i.e. create a new Index but viewing the same data, to avoid the above. But that's a bigger change / for a separate issue.

+1

@jorisvandenbossche
Member Author

@jbrockmendel good to go?

@jbrockmendel
Member

LGTM merge on green

jorisvandenbossche merged commit 180d81f into pandas-dev:main Feb 10, 2023
jorisvandenbossche deleted the series-dataframe-constructor-shallow-copy branch February 10, 2023 19:02
Labels

API Design Constructors Series/DataFrame/Index/pd.array Constructors

Development

Successfully merging this pull request may close these issues.

API: should creating a Series from a Series return a shallow copy? (share data, not attributes)

2 participants