
BUG: concat losing columns dtypes for join=outer #47586

Merged: 3 commits, Jul 3, 2022

@phofl phofl (Member) commented Jul 2, 2022

@simonjayhawkins This also happens for regular dtypes, so would not backport

@phofl phofl added the Bug, Reshaping, and Index labels Jul 2, 2022
@jreback jreback added this to the 1.4.4 milestone Jul 3, 2022
@@ -995,6 +995,7 @@ Reshaping
- Bug in :func:`get_dummies` that selected object and categorical dtypes but not string (:issue:`44965`)
- Bug in :meth:`DataFrame.align` when aligning a :class:`MultiIndex` to a :class:`Series` with another :class:`MultiIndex` (:issue:`46001`)
- Bug in concatenation with ``IntegerDtype``, or ``FloatingDtype`` arrays where the resulting dtype did not mirror the behavior of the non-nullable dtypes (:issue:`46379`)
- Bug in :func:`concat` losing dtype of columns when ``join="outer"`` and ``sort=True`` (:issue:`47329`)
Contributor

issue is marked as 1.4.4. ok to leave in 1.5 though (if so just change the issue and ignore this comment)

Member Author

Changed the milestones.

Initially we thought this happens only for EA dtypes, but this was wrong. It also occurs for numpy dtypes.
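For readers who want to see the scenario from the whatsnew entry, here is a minimal sketch that concatenates two frames with `join="outer"` and `sort=True` and prints the dtype of the resulting columns. The helper name `concat_outer_sorted` and the sample data are made up for illustration. What gets printed depends on the installed pandas version, so no expected output is shown.

```python
import pandas as pd


def concat_outer_sorted(dtype):
    """Concat two frames whose column Index objects share `dtype` (hypothetical helper)."""
    df1 = pd.DataFrame([[0, 1, 1]], columns=pd.Index([1, 2, 3], dtype=dtype))
    df2 = pd.DataFrame([[0, 1]], columns=pd.Index([1, 2], dtype=dtype))
    # join="outer" takes the union of the column labels; sort=True sorts that union.
    return pd.concat([df1, df2], ignore_index=True, join="outer", sort=True)


# One extension (EA) dtype and one numpy dtype, since the comment above says both are affected.
for dtype in ("Int32", "int64"):
    result = concat_outer_sorted(dtype)
    print(dtype, "->", result.columns.dtype)
```

Comparing the printed column dtype with the input dtype is a quick way to check whether a given pandas build still loses it.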

@phofl phofl modified the milestones: 1.4.4, 1.5 Jul 3, 2022
@jreback jreback merged commit 1ac1391 into pandas-dev:main Jul 3, 2022
Contributor

jreback commented Jul 3, 2022

thanks @phofl

Member

@simonjayhawkins simonjayhawkins left a comment

Thanks @phofl

    """
    dtypes = [idx.dtype for idx in indexes if isinstance(idx, Index)]
    if dtypes:
        dtype = find_common_type(dtypes)
Member

I assume we never pass a mixed list of Indexes and lists? could add type annotations here for clarity.

Member Author

@phofl phofl Jul 6, 2022

A parent function uses list[list[Hashable] | Index], so not sure

Member

I assumed that if we could pass a mixed list, then the logic added here would not account for the types in a list and only use the Indexes to find the common dtype. We could therefore expect this to raise in those cases?
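The concern about mixed inputs can be illustrated with a small sketch. It mirrors the quoted list comprehension, but it is not the pandas implementation: `common_index_dtype` is a hypothetical helper, and `np.result_type` is used as a stand-in for the internal `find_common_type`, so it only handles numpy dtypes.

```python
from __future__ import annotations

import numpy as np
import pandas as pd


def common_index_dtype(indexes: list) -> np.dtype | None:
    """Hypothetical helper: common dtype of the Index objects in `indexes`."""
    # As in the quoted lines, only Index objects contribute a dtype;
    # plain lists are skipped entirely.
    dtypes = [idx.dtype for idx in indexes if isinstance(idx, pd.Index)]
    if not dtypes:
        return None
    # Stand-in for pandas' internal find_common_type (numpy dtypes only).
    return np.result_type(*dtypes)


# Two Index objects: both dtypes take part in the promotion.
print(common_index_dtype([pd.Index([1, 2]), pd.Index([1.5, 2.5])]))

# Mixed input: the list of strings is ignored, so only the Index dtype is seen.
print(common_index_dtype([pd.Index([1, 2]), ["a", "b"]]))
```

In the mixed case the computed dtype reflects only the `Index`, which is why applying it to labels that came from the plain list could fail.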

Member

@simonjayhawkins simonjayhawkins Jul 8, 2022

> A parent function uses list[list[Hashable] | Index], so not sure

we may need to change to list[list[Hashable]] | list[Index]

@@ -223,7 +224,7 @@ def union_indexes(indexes, sort: bool | None = True) -> Index:

indexes, kind = _sanitize_and_check(indexes)

-    def _unique_indices(inds) -> Index:
+    def _unique_indices(inds, dtype) -> Index:
Member

the docstring also needs updating at some point.

yehoshuadimarsky pushed a commit to yehoshuadimarsky/pandas that referenced this pull request Jul 13, 2022
Successfully merging this pull request may close these issues.

BUG: concat losing column dtype for extension arrays and object dtype