DEPR: flags #52153
Can you first create an issue to discuss this specific feature? You mentioned the idea of deprecating this in #51280, but the actual discussion there was mostly about `__finalize__` and `_metadata` and how it relates to subclasses (i.e. the underlying mechanics of how it works), and not about the actual user-facing feature that `flags`/`set_flags` provide (do we have an idea if people are using this? do we think it is no longer useful? what are the alternatives?)
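
For readers unfamiliar with the user-facing feature in question, here is a minimal sketch of what `set_flags` and `flags.allows_duplicate_labels` do. The frames and variable names are illustrative only and are not taken from this PR.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# set_flags returns a new object; the original keeps the default flag value.
strict = df.set_flags(allows_duplicate_labels=False)
flag_value = strict.flags.allows_duplicate_labels

# Disallowing duplicates on an object that already has them raises immediately.
dupes = pd.DataFrame([[1, 2]], columns=["a", "a"])
try:
    dupes.set_flags(allows_duplicate_labels=False)
    raised = False
except pd.errors.DuplicateLabelError:
    raised = True
```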
Right now probably -0 on deprecating. While I don't see issues with this feature, it seems like a way to force a no-duplicate-labels mode.
Actually, I discovered that pandera has a general dataframe schema validation framework that covers the duplicate labels case, so I suppose I would be +0 on deprecating.
@TomAugspurger any objection here? IIRC this was your thing.
In #51280, it seems like there's not a ton of evidence that this is causing issues in real use cases. The one linked issue turned out to be unrelated, right (but that's about How do you reconcile
with deprecating |
Just that you championed implementing it and might have an opinion about keeping it.
I imagine the 3rd party libraries wouldn't necessarily be implementing subclasses of pandas classes, so wouldn't use the hook at all.
Yeah, I think we should keep it.
In #28394 where this was implemented, @TomAugspurger wrote:
I do think the feature of The proposal would be to move |
I think we'll need to sort out #52166 and anything else riding on |
Tom is right that performance is mostly a red herring here. I think of it as "directionally" behaving like a performance penalty, but with an \epsilon magnitude: it is tiny but hits frequently. The reason to deprecate is that this is basically-never-used* API surface (which is a constantly-complained-about pain point) with a use case that has viable alternatives**. Also, propagation is half-baked with no prospect of getting to fully-baked.
* GitHub search for
** My preferred alternative would be a 3rd party UniqueColumnsDataFrame that would have a check at
I also believe this functionality of "validation" is best served by a 3rd party library like pandera https://pandera.readthedocs.io/en/stable/dataframe_schemas.html?highlight=duplicate#column-validation
A challenge with doing this in a 3rd party library is around propagating
these attributes of a table. By placing this on the DataFrame and
propagating it through operations, you can have an entire block of method
calls for which this property is enforced. A 3rd-party library isn't going
to be able to do that; users would need to add method calls at each place
they want to validate the property.
Do people have thoughts on the merit of that? To me, that seems worth the
effort.
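
The contrast described above can be sketched roughly as follows. `require_unique_labels` is a hypothetical helper standing in for a third-party validator; it is not pandera's API or anything proposed in this thread.

```python
import pandas as pd


def require_unique_labels(obj):
    """Hypothetical stand-in for an external validation call."""
    for axis in obj.axes:
        if not axis.is_unique:
            raise ValueError("duplicate labels found")
    return obj


df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Flag-based: set once, then enforced by operations that propagate the flag.
flagged = df.set_flags(allows_duplicate_labels=False)
try:
    flagged.rename(columns={"b": "a"})
    flag_caught = False
except pd.errors.DuplicateLabelError:
    flag_caught = True

# Explicit: a check must be added after each step that could add duplicates.
try:
    df.rename(columns={"b": "a"}).pipe(require_unique_labels)
    explicit_caught = False
except ValueError:
    explicit_caught = True

# Without the explicit call, the duplicate column goes unnoticed.
unchecked = df.rename(columns={"b": "a"})
```

Note that the flag only helps for operations that actually propagate it, which is the "half-baked propagation" concern raised earlier in the thread.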
I agree. For me, I would find value in this particular flag I view this as being similar to treating warnings as errors. We had a client that ran our code by adding IMHO, providing options to users that assist with debugging is something we should do as much as we can. It's one reason I got behind |
xref #51280