Skip to content

BUG: Ensure df.itertuples() uses plain tuples correctly #30600

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jan 2, 2020

Conversation

simongibbons
Copy link
Contributor

@simongibbons simongibbons commented Jan 1, 2020

Currently DataFrame.itertuples() has an off by one error
when it inspects whether or not it should return namedtuples
or plain tuples in it's response.

This PR addresses that bug by correcting the condition
that is used when making the check.

Closes: #28282

  • closes #xxxx
  • tests added / passed
  • passes black pandas
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry


# Dataframes with >=255 columns will fallback to regular tuples
with pytest.raises(AttributeError):
result_255_columns.foo_1
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because namedtuple generates a new class when it is used, the only way of determining if something is a namedtuple or a plain one is to try and use it as such.

Hopefully you think this is sufficiently clear with the comment.

@@ -1018,8 +1018,8 @@ def itertuples(self, index=True, name="Pandas"):
# use integer indexing because of possible duplicate column names
arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))

# Python 3 supports at most 255 arguments to constructor
if name is not None and len(self.columns) + index < 256:
# Python versions before 3.7 support at most 255 arguments to constructor
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it possible to skip the arg len check for >= 3.7? does it work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, this was simple to do.

@@ -288,6 +288,22 @@ def test_sequence_like_with_categorical(self):
for c, col in df.items():
str(s)

def test_itertuples_fallback_to_regular_tuples(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you move this next to the other itertuples tests

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@jreback jreback added the Reshaping Concat, Merge/Join, Stack/Unstack, Explode label Jan 1, 2020
Currently DataFrame.itertuples() has an off by one error
when it inspects whether or not it should return namedtuples
or plain tuples in it's response.

This PR addresses that bug by correcting the condition
that is used when making the check.

Closes: pandas-dev#28282
1. Ensure we return named tuples in more cases (when using python >=
   3.7)
2. Move test around to be with the itertuples test
3. Update docstring with the new behaviour.
assert isinstance(tup3, tuple)
if PY37:
assert hasattr(tup3, "_fields")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test has changed as we will return named tuples always for python >= 3.7 now.

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@WillAyd WillAyd added this to the 1.0 milestone Jan 2, 2020
@jreback jreback merged commit 13c9601 into pandas-dev:master Jan 2, 2020
@jreback
Copy link
Contributor

jreback commented Jan 2, 2020

thanks @simongibbons very nice!

@simongibbons simongibbons deleted the fix-28282 branch January 6, 2020 09:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DataFrame.itertuples() incorrectly determines when plain tuples should be used
3 participants