Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allow using Iterable in Series and DataFrame constructor #21987

Merged
merged 3 commits into from
Jul 26, 2018

Conversation

holymonson
Copy link
Contributor

@holymonson holymonson commented Jul 20, 2018


@TomAugspurger Hope you can review this.

BTW, perhpas you may want to change Iterable in is_list_like() to Sequence.

gen = (i for i in range(10))
pandas.core.dtypes.inference.is_list_like(gen)
# True
len(gen)
# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
# TypeError: object of type 'generator' has no len()
gen[0]
# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
# TypeError: 'generator' object is not subscriptable

@holymonson
Copy link
Contributor Author

This test failed. But a single-column DataFrame seems reasonable to me.

# corner, silly
# TODO: Fix this Exception to be better...
with tm.assert_raises_regex(ValueError, 'constructor not '
'properly called'):
DataFrame((1, 2, 3))

pd.DataFrame((1, 2, 3))
#   0
#0  1
#1  2
#2  3

@holymonson holymonson force-pushed the iterable_in_constructor branch from b498a5c to d1c1bc2 Compare July 20, 2018 10:48

expected = Series(list(range(10)), dtype='int64')
result = Series(range(10), dtype='int64')
result = Series(Iter(), dtype='int64')
Copy link
Member

@mroeschke mroeschke Jul 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cant this simply be iter(range(10))?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

range is list-like, subclass of Sequence, which doesn't need consuming by data = list(data). I'd like to add back range(10) test and keep both.

@holymonson holymonson force-pushed the iterable_in_constructor branch from d1c1bc2 to 4fb3b38 Compare July 21, 2018 02:52
@holymonson
Copy link
Contributor Author

Allow pd.DataFrame((1, 2, 3)) because pd.DataFrame([1, 2, 3]) is valid. They should be the same.

@codecov
Copy link

codecov bot commented Jul 21, 2018

Codecov Report

Merging #21987 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #21987      +/-   ##
==========================================
- Coverage   92.02%   92.02%   -0.01%     
==========================================
  Files         170      170              
  Lines       50707    50704       -3     
==========================================
- Hits        46661    46658       -3     
  Misses       4046     4046
Flag Coverage Δ
#multiple 90.42% <100%> (-0.01%) ⬇️
#single 42.31% <80%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/series.py 94.11% <100%> (ø) ⬆️
pandas/core/frame.py 97.19% <100%> (-0.01%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 693e235...a2ace52. Read the comment docs.

@gfyoung gfyoung added Enhancement Internals Related to non-user accessible pandas implementation Low-Memory labels Jul 22, 2018
@gfyoung gfyoung requested a review from TomAugspurger July 22, 2018 22:45
expected = DataFrame([[1, 2, 3]] * 10)
result = DataFrame(Iter())
tm.assert_frame_equal(result, expected)

Copy link
Member

@gfyoung gfyoung Jul 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reference the issue number in a comment below the functional definition.

Applies to all newly-added tests.

@holymonson holymonson force-pushed the iterable_in_constructor branch from 4fb3b38 to 166f4f6 Compare July 23, 2018 03:10
@TomAugspurger
Copy link
Contributor

Allow pd.DataFrame((1, 2, 3)) because pd.DataFrame([1, 2, 3]) is valid. They should be the same.

Context: That goes back a long way to DataMatrix: c34ac74#diff-225c6e44f997b9b698ab52a48b223018R39

+1 for allowing it.

Copy link
Contributor

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merging later today, unless someone else beats me to it.

@jreback
Copy link
Contributor

jreback commented Jul 24, 2018

will have a look

don’t merge

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls add a whatsnew note, other enhancements in 0.24

@@ -390,8 +390,9 @@ def __init__(self, data=None, index=None, columns=None, dtype=None,
else:
mgr = self._init_ndarray(data, index, columns, dtype=dtype,
copy=copy)
elif isinstance(data, (list, types.GeneratorType)):
if isinstance(data, types.GeneratorType):
elif (isinstance(data, collections.Iterable)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a comment here (1 line above) which defines what this clause is checking

elif isinstance(data, (set, frozenset)):
raise TypeError("{0!r} type is unordered"
"".format(data.__class__.__name__))
elif (isinstance(data, collections.Iterable)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

@jreback jreback added this to the 0.24.0 milestone Jul 25, 2018
@holymonson holymonson force-pushed the iterable_in_constructor branch from 166f4f6 to a2ace52 Compare July 25, 2018 11:33
@jreback jreback merged commit be6ad72 into pandas-dev:master Jul 26, 2018
@jreback
Copy link
Contributor

jreback commented Jul 26, 2018

thanks @holymonson !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement Internals Related to non-user accessible pandas implementation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cannot produce DataFrame from dict-yielding iterator.
5 participants