ENH: add expand kw to str.get_dummies #10103
pandas/core/strings.py
I think it'd be good to change this example to showcase expand=False when it actually has multiple groups, i.e.,
>>> Series(['a1', 'b2', 'c3']).str.extract('([ab])(\d)', expand=False)
0 [a, 1]
1 [b, 2]
2 [nan, nan]
Name: [0, 1], dtype: object
I'd also move this one example down, so we'd have:
"A pattern with more than one group will return a DataFrame."
"But you can specify expand=False to return Series."
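A minimal sketch of the return shapes discussed in this comment, assuming a pandas release in which `Series.str.extract` accepts `expand`. It only inspects the result types for a single group and the shape for two groups; the variable names are illustrative.

```python
import pandas as pd

s = pd.Series(["a1", "b2", "c3"])

# One capture group: expand=False keeps the result one-dimensional,
# expand=True wraps it in a one-column DataFrame.
one_flat = s.str.extract(r"([ab])", expand=False)
one_wide = s.str.extract(r"([ab])", expand=True)

# Two capture groups with expand=True: one column per group.
two_wide = s.str.extract(r"([ab])(\d)", expand=True)

print(type(one_flat).__name__, type(one_wide).__name__, two_wide.shape)
```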
Question: is there actually a need to have the option of If we add them, my opinion about the discussion points:
@jorisvandenbossche Correct. There is an option to make
Result will be a
Not that important for this discussion, but
Well, I would like that very much too, but the default values of the keyword could not be unified in any case. Since it is not really possible to unify it that way, I was considering the option of not adding the keyword at all (which would not give unified behaviour either).
status?
I hope to work on this, but it requires
@sinhrks what are we doing with this one?
There are 2 points, and I think 1st point (add
@sinhrks I suppose you could add
Ref: #10008.
Though this is still in progress (it needs #10089 to simplify the `get_dummies` flow), I would like to discuss the following.
#### `.str.extract`
note: overlaps with #11386
Currently it returns `Series` for a single group and `DataFrame` for multiple groups. To support the `expand` kw, we have to choose one of:
1. Add an `expand` option keeping the existing behavior, with a warning about a future change to `expand=True` (current impl).
2. Add an `expand` option keeping the existing behavior. Standardize on `expand=None` (or another option) to select the returned dimensionality automatically.
3. Add an `expand` option with default `True` (or `False`). This breaks the API.
4. Make `Index.str.extract` return `MultiIndex` in the multiple-group case without adding an `expand` option.
#### `.str.get_dummies`
1. An `expand` kw with default `True`. Currently this always returns `DataFrame` (and raises `TypeError` on `Index`). This doesn't break the API (current impl).
2. `Index.str.get_dummies` returns `MultiIndex` without adding an `expand` option.

CC @mortada
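The `.str.extract` options above differ mainly in how the returned dimensionality is chosen. A rough plain-Python sketch of that dispatch, assuming option 2's `expand=None` means "decide from the number of groups": the `extract` function and the `"series"`/`"frame"` labels are hypothetical stand-ins, not pandas code.

```python
import re


def extract(values, pattern, expand=None):
    """Hypothetical model of the expand dispatch; not the pandas implementation."""
    regex = re.compile(pattern)
    if regex.groups == 0:
        raise ValueError("pattern contains no capture groups")
    if expand is None:
        # Option 2: pick the dimensionality from the number of groups.
        expand = regex.groups > 1

    rows = []
    for value in values:
        match = regex.search(value)
        # Unmatched inputs get one missing value per group.
        rows.append(match.groups() if match else (None,) * regex.groups)

    if expand:
        # Two-dimensional result: one row per input, one column per group.
        return ("frame", rows)
    if regex.groups == 1:
        # One-dimensional result holding the single group's value.
        return ("series", [row[0] for row in rows])
    # One-dimensional result whose elements are lists of group values.
    return ("series", [list(row) for row in rows])


print(extract(["a1", "b2", "c3"], r"([ab])"))
print(extract(["a1", "b2", "c3"], r"([ab])(\d)"))
```

With an explicit `expand=True` or `expand=False` the caller overrides the automatic choice, which is the behavior options 1 and 3 would fix as the default.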