ENH: add expand kw to str.get_dummies #10103
1 2
2 NaN
dtype: object
I think it'd be good to change this example to showcase `expand=False`
when there actually are multiple groups, e.g.:
>>> Series(['a1', 'b2', 'c3']).str.extract('([ab])(\d)', expand=False)
0 [a, 1]
1 [b, 2]
2 [nan, nan]
Name: [0, 1], dtype: object
I'd also move this example down, so we'd have:
"A pattern with more than one group will return a DataFrame."
"But you can specify `expand=False` to return a Series."
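To make the suggested shapes concrete, here is a small stdlib-only model. It is not pandas code: `extract_model` is a hypothetical helper, a dict of columns stands in for a DataFrame, a plain list stands in for a Series, and `None` stands in for NaN. It reproduces the reviewer's example, where `expand=False` with two groups yields one list of group values per element.

```python
import re
from typing import Dict, List, Optional, Sequence, Union

Cell = Optional[str]  # None stands in for NaN in this model


def extract_model(
    values: Sequence[str], pattern: str, expand: bool
) -> Union[Dict[int, List[Cell]], List[Cell], List[List[Cell]]]:
    """Hypothetical model of the output shapes discussed for str.extract.

    Not the pandas implementation; it only mimics the shapes in the review:
      expand=True  -> columns keyed by group position, like a DataFrame
      expand=False -> one scalar per element for a single group, or one
                      list of group values per element for several groups
                      (the proposed "Series of lists")
    """
    regex = re.compile(pattern)
    if regex.groups == 0:
        raise ValueError("pattern contains no capture groups")

    rows: List[List[Cell]] = []
    for value in values:
        match = regex.search(value)
        # An element that does not match gets one missing value per group.
        rows.append(list(match.groups()) if match else [None] * regex.groups)

    if expand:
        # Transpose the per-element rows into columns 0, 1, ...
        return {g: [row[g] for row in rows] for g in range(regex.groups)}
    if regex.groups == 1:
        return [row[0] for row in rows]
    return rows


# The reviewer's example: two groups, expand=False.
print(extract_model(["a1", "b2", "c3"], r"([ab])(\d)", expand=False))
# [['a', '1'], ['b', '2'], [None, None]]
```

With `expand=True` the same call returns column-oriented data (`{0: ['a', 'b', None], 1: ['1', '2', None]}`), which is the DataFrame-like shape the docstring would describe first.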
Question: is there actually a need to have the option of … If we add them, here is my opinion on the discussion points:
@jorisvandenbossche Correct. There is an option to make …
Result will be a …
Not that important for this discussion, but …
Well, I would also like that very much, but the default values of the keyword would not be unified in any case. So, since it is not really possible to unify the behaviour that way, I was considering not adding the keyword at all (which is not unified behaviour either).
Status?
I hope to work on this, but it requires …
@sinhrks what are we doing with this one?
There are two points, and I think the first point (add …
@sinhrks I suppose you could add …
Ref: #10008.

Though this is still a work in progress (it needs #10089 to simplify the `get_dummies` flow), I would like to discuss the following.

#### `.str.extract`

Note: overlaps with #11386.

Currently it returns a `Series` for a single group and a `DataFrame` for multiple groups. To support the `expand` kw, we have to choose one of:

1. Add the `expand` option, keeping the existing behavior with a warning about the future change to `expand=True` (current impl).
2. Add the `expand` option, keeping the existing behavior. Standardize `expand=None` (or another option) to select the returned dimensionality automatically.
3. Add the `expand` option with default `True` (or `False`). This breaks the API.
4. Make `Index.str.extract` return a `MultiIndex` in the multiple-group case, without adding the `expand` option.
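Options 1 and 2 differ only in how the default is handled, so a small decision function may help compare them. This is a hypothetical sketch, not pandas code: `resolve_extract_shape` and the `_NOT_GIVEN` sentinel are illustrative names, and the warning text is made up. `expand=None` keeps today's automatic choice (option 2); omitting `expand` keeps the same choice but emits a `FutureWarning` (option 1).

```python
import warnings

# Hypothetical sentinel meaning "the caller did not pass expand at all".
_NOT_GIVEN = object()


def resolve_extract_shape(n_groups: int, expand: object = _NOT_GIVEN) -> str:
    """Return "series" or "frame" for a hypothetical extract() (options 1 and 2).

    - expand=None: option 2, keep today's automatic choice
      (Series for one group, DataFrame for several groups).
    - expand omitted: option 1, same automatic choice, but warn that the
      default is going to become expand=True.
    - expand=True / expand=False: always "frame" / always "series".
    """
    if n_groups < 1:
        raise ValueError("pattern contains no capture groups")
    if expand is _NOT_GIVEN:
        warnings.warn(
            "the default of expand will change to True in a future version; "
            "pass expand explicitly to silence this warning",
            FutureWarning,
            stacklevel=2,
        )
        expand = None
    if expand is None:
        return "series" if n_groups == 1 else "frame"
    return "frame" if expand else "series"


print(resolve_extract_shape(1, expand=None))  # series
print(resolve_extract_shape(1, expand=True))  # frame
```

Option 3 amounts to making `expand=True` the default outright: a single-group call that returns a Series today would silently start returning a DataFrame, which is the API break the description mentions.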
#### `.str.get_dummies`

1. Add the `expand` kw with default `True`. Currently this always returns a `DataFrame` (and raises `TypeError` for `Index`). This doesn't break the API (current impl).
2. Make `Index.str.get_dummies` return a `MultiIndex` without adding the `expand` option.

CC @mortada
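For `get_dummies`, a stdlib-only model shows what the proposed keyword would switch between. This is a hypothetical sketch, not pandas code: `get_dummies_model` is an illustrative name, a dict of columns stands in for the `DataFrame` returned today, and the `expand=False` shape (one tuple of 0/1 flags per element, the kind of tuples a `MultiIndex` could be built from) is an assumption. The sorted labels and the all-zero row for a missing value follow the `Series.str.get_dummies` docstring example as we understand it.

```python
from typing import Dict, List, Optional, Sequence, Tuple, Union


def get_dummies_model(
    values: Sequence[Optional[str]], sep: str = "|", expand: bool = True
) -> Union[Dict[str, List[int]], List[Tuple[int, ...]]]:
    """Hypothetical model of the proposed expand keyword for str.get_dummies.

    expand=True  -> columns keyed by tag, like the DataFrame returned today
    expand=False -> one tuple of 0/1 flags per element (assumed shape, e.g.
                    the tuples a MultiIndex could be built from)
    None stands in for NaN and yields an all-zero row.
    """
    tag_sets = [set(v.split(sep)) if v is not None else set() for v in values]
    # Column labels: every tag seen, sorted, ignoring empty strings.
    labels = sorted(set().union(*tag_sets) - {""})
    rows = [tuple(int(label in tags) for label in labels) for tags in tag_sets]
    if expand:
        return {label: [row[i] for row in rows] for i, label in enumerate(labels)}
    return rows


print(get_dummies_model(["a|b", None, "a|c"]))
# {'a': [1, 0, 1], 'b': [1, 0, 0], 'c': [0, 0, 1]}
```

Because `expand=True` reproduces today's result, adding the keyword with that default leaves existing `Series` callers unaffected, which is why option 1 is described as non-breaking.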