deprecate map on GroupedDataFrame #2662
test/data.jl
Outdated
[:x1] => x -> x > 0, ["x1"] => x -> x > 0,
r"1" => x -> x > 0, AsTable(:) => x -> x.x1 > 0)
@test filter(fun, df) isa DataFrame
@inferred filter(fun, df)
Strangely, removing methods for `filter` but not touching this specific method made `@inferred` fail. CC @timholy, as maybe you can explain this?
test/iteration.jl
Outdated
gdfv = groupby(dfv, :a)

for x in (df, dfv)
    @test collect(x) == map(identity, x) == [v for v in x] == [x[i, :] for i in 1:3]
For now I did not implement custom `map` and `collect`, and thus the identity above currently holds. Do we really think it is a problem? Note that e.g. `map` can take several iterables, like `map(fun, df1, df2)`, and the question is whether we would want to intercept this too.
I'd take a defensive approach and make everything for which we don't have a clear use case an error for now. People may expect `collect` to materialize a data frame in some way, like in Query.jl or dplyr, and be confused if they get a weird vector of rows. Likewise, we could imagine doing something useful in the future with a multi-argument `map`.
Regarding `map(identity, df)`, as I said at #2254, I think it should return a `DataFrame`, just like all cases where the function returns a row-like object.
test/iteration.jl
Outdated
end

for x in (gdf, gdfv)
    @test collect(x) == map(identity, x) == [v for v in x] == [x[i] for i in 1:3]
This is how things worked (and still work) for `GroupedDataFrame`.
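To make the iteration behaviour mentioned above concrete, here is a minimal sketch. The data frame, its column names, and the `sizes` variable are made up for illustration and do not come from the PR's test suite. It relies only on `groupby`, `nrow`, iteration, and integer indexing of a `GroupedDataFrame`.

```julia
using DataFrames

# Hypothetical example data, not taken from test/iteration.jl.
df = DataFrame(a = [1, 1, 2, 2, 3], b = 1:5)
gdf = groupby(df, :a)

# Iterating a GroupedDataFrame yields one SubDataFrame per group,
# so a comprehension can stand in for a per-group map.
sizes = [nrow(sdf) for sdf in gdf]

# Indexing by group position gives the same groups as iteration.
@assert all(gdf[i] == sdf for (i, sdf) in enumerate(gdf))
```

A comprehension like this works through plain iteration and does not call `map` on the `GroupedDataFrame`.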

# Iteration protocol

function Base.iterate(df::AbstractDataFrame, i=1)
Suggested change:
- function Base.iterate(df::AbstractDataFrame, i=1)
+ function Base.iterate(df::AbstractDataFrame, i::Int=1)
Actually, given the comment by @pdeffebach in #2654 (comment) (which I read as "the comment of experienced user, but not hard-core data scientist"), I am hesitant to implement this PR at all. Also not having The only thing that for sure needs fixing is
I also prefer what @pdeffebach suggested as we start to lose the grip on "non-breaking" promise (I know we do not want to break anything in this PR but it is almost breaking for a very common use case 😄).
I think it's OK if we don't implement Regarding
@pdeffebach - can you please add a "voice of the user" and "downstream package maintainer" regarding Given the discussion I would not implement
I am fine to deprecate I prefer consistency with It's worth noting that I don't use it in DataFramesMeta, either, and don't have plans for a Regarding I'm not sure about iterate, but we could make it easier to expose To reply from another thread here
I don't think this is a good idea. Since I don't think we have seen any examples yet in the wild of someone doing this kind of inspection and having things be flexible and not have too many edge cases.
So - the decision would be to deprecate Then again we go back to the decision what to do with
I was most concerned about So the conclusion is:
then the only question is what we do with
I think this is fine. I wouldn't mind bringing back w.r.t
The issue is that
Ah - you are right it is documented to produce an
The next difficulty is that we agreed that
What do you think?
I don't think
I agree this is a hard decision. Let me summarize the
To add on to this, we don't have broadcasted
So - in conclusion, @nalimilan + @pdeffebach - are we OK to keep
If we don't deprecate
I would be fine with this also. As commented above after thinking about this I feel that it is not that hard to learn that We just had a similar discussion in #2665 that
OK, let's keep
c95d6d5 to e1e8e9e
OK - so in conclusion we only deprecate
Only coverage fails here.
NEWS.md
Outdated
* applying `map` to `GroupedDataFrame` is now deprecated. The return value
  of this function might change in 2.0 release
  ([#2662](https://github.com/JuliaData/DataFrames.jl/pull/2662))
Here again, it would be annoying to have to wait for 2.0 if we want to change this. How about backporting it to 0.22?
OK. I will change this PR so that it can be included in the 0.22.6 release.
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
Thank you!
Sorry to bump an old thread, but I have a bunch of code from March that is now broken because map no longer works with grouped dataframes. For example,
The new version is quite arcane (combine? x1?) compared to vanilla
Actually the replacement (as suggested by the deprecation warning) is much simpler:
Agreed. We removed old
which will be more efficient than both, your
Fixes #2254
I will comment on the design decisions in the code.