add an option to intersect arguments passed to Cols #3224
I'm iffy on this. It's not obvious from reading code (aside from documentation) that multiple arguments should mean AND rather than OR. But it is awkward that I think OP's complaint is not that big a deal, tbh. using |
Also you can use |
I'm happy with leaving the whole set operation vs. boolean algebra issue as a philosophical difference, where you just pick one and stick with it! In R, data.table sticks to the former while dplyr/tidyselect sticks to the latter - I accept their respective choices and people don't complain within their chosen bubbles.

Anyways if/when DF.jl is seriously considering supporting boolean algebra for column selection, I think it might be worth getting input from existing users to see their preferences/difficulties - maybe sets are more intuitive for Julia's more math-y users! But really thank you for giving my ramble a serious thought :)

Edit: I forgot to give an actual thought about the proposal but just in case - I like this extension of
Let us wait for @nalimilan to comment (as usual 😄). Plus maybe let me add the three options to vote:
(I vote for :three but I add all options below to make easy voting) |
Given a discussion on Slack the following thing could be added alternatively:
For context: I like the correspondence between One question. Why does |
It currently is required. We could change it (this requires a careful consideration, but should be doable). So the current list of things to do is:
I'm not a fan of passing multiple arguments to Note that dplyr uses Regarding |
Yes - I think the concept of passing multiple arguments to
Let us wait for others to comment. There is no rush with making a decision here. Thank you! |
Currently |
but the problem is that |
Given the comments the to-do list would be (the idea is not to add any new exported names):
Sounds good. I'm just not sure the last point (deprecation) is worth it. We allow |
@nalimilan - Deprecation is indeed optional. However, can you clarify your last point: why do you not think it is a good idea? My reasoning is:
with the recent ability to middle slurp would it be considered to allow |
We need to keep Julia 1.6 compatibility. |
OK - now I remember the problem with See #3034 @krynju - if we added this could DTables.jl efficiently support this? (I fear that not - and we have to stick with name-only selectors and Even if this is not doable we can add |
Under the hood I take col names and col types from

On type based selectors: I guess it may be useful in some specific cases? Personally I'd rather stick to name based just to keep my confidence high and not depend on the input type, which may be suddenly parsed differently from version to version (the String to InlineStrings transition that happened at some point).

On multiple column selectors: Confusing - at first look I thought that would return a union in the example from the OP.
@krynju - just to clarify. It is not only types, but also column values, so what currently is done like this:
would get some special syntax, e.g. (but in general I understand that you agree that just sticking to column names based selectors is safer for now - right?). |
Alright, I get it. Types are still ok.

For values: I technically could support this by interpreting the input adequately, but I see little value in spending time on this, as this seems like a niche use case, and I'd rather have the user write the DTables code to figure this out and make it as simple as possible. For DTables running a full column check against all columns is just wasteful.
Yes, names are always there and they're reliable. |
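
To make the cost difference discussed above concrete, here is a minimal sketch in plain DataFrames.jl (this is not DTables.jl code, and the data frame and its column names are made up for illustration). The name-based query is answered from the column names alone, while the value-based query has to look at the contents of every column.

```julia
using DataFrames

# Hypothetical example data, not taken from the thread.
df = DataFrame(a = [1, 2], a2 = [3, missing], b = ["x", "y"])

# Name-based selection: resolved from column names only.
by_name = names(df, r"^a")

# Value-based selection: each column's contents must be scanned.
by_value = [n for n in names(df) if any(ismissing, df[!, n])]
```

For an in-memory data frame both are cheap, but for a distributed table the second form implies a pass over all the data, which is the concern raised above.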
I also have just realized that we allow for |
After the discussions in this PR I am going to limit myself to adding
First JuliaData/DataAPI.jl#58 needs to be decided and released.
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
Thank you! |
Cols selector provides an easy way to take a union of column names.

I think it is natural to add names(df, cols1, cols2) that returns an intersection of column names selected by cols1 and cols2.

Why is it natural? names(df) currently returns all columns. names(df, cols) adds one condition on all columns. So it is natural to extend it to more conditions.

In this way we will have an easy way to do both column union and intersection.

Why no special object for intersection:
names as they are context sensitive

Example:
@yjunechoe - what is your opinion on this proposal?
CC @pdeffebach
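
For readers following along, the union/intersection distinction can be sketched with functions that already exist in DataFrames.jl. This is an illustration only, not the example from the issue: the data frame is made up, and the proposed multi-argument names(df, cols1, cols2) is not called here because it is only a proposal. The intersection is spelled out with Base's intersect instead.

```julia
using DataFrames

# Hypothetical example data, not taken from the issue.
df = DataFrame(a1 = 1:2, a2 = ["x", "y"], b1 = [1.0, 2.0])

# Union: Cols already combines its arguments with OR semantics.
union_names = names(df, Cols(r"^a", :b1))

# Intersection today: resolve each selector separately, then intersect.
# Here: columns whose name starts with "a" AND whose element type is Real.
both = intersect(names(df, r"^a"), names(df, Real))
```

Under the proposal, the last line is what a call with two selectors would be expected to return, assuming the AND semantics described in the issue.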