Inconsistency for sanity, being data.frame like for easy transition #1188
Comments
+1MM |
I like this idea too. From the earlier responses, it seems the biggest drawback could be introducing inconsistency, as a new user would expect the two approaches below to return the same result.
Is there a way they both could return the same result? |
@jrowen thanks. Yes they both should return the same result (as data.frame would).
x <- "z"
DT[, x]
It'd be ambiguous in this case, isn't it? One way off the top of my head is for the enhanced-ness to kick in Hm, now I'm thinking if this'd only create more problems instead :-( |
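To make the ambiguity concrete, here is a minimal sketch. It assumes a recent data.table release, and the table and its column names are made up for illustration; it is not code from the thread.

```r
library(data.table)

# Hypothetical table that has a column literally named "x"
DT <- data.table(x = 1:3, z = c("a", "b", "c"))

# A variable in the calling scope that shares its name with that column
x <- "z"

# j is evaluated within the table, so the column x wins over the variable x
by_symbol <- DT[, x]

# with = FALSE evaluates j in the calling scope instead,
# so the column named by the variable's value ("z") is selected
by_value <- DT[, x, with = FALSE]
```

The same expression therefore means two different things depending on whether the name happens to be a column, which is the case any automatic guess would have to resolve.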
Perhaps you could make the error message more friendly and help the user. Or even find the cases and add "with = FALSE" and advise the user that the change was made (like with setting column names the "old" way). I have been using data.table for a year and a half, and I periodically want to use column numbers for some quick interactive work and get an error. Not a big deal to type with = FALSE, but a nice reminder would be welcome. This would serve to teach new users as well. |
I don't know. It might just make it harder for people to learn. I agree with Mark that adding a discouraging warning would help with that. If you allow too much prominence to this way of accessing columns, it may prove something of a slippery slope. Can you really do this without also doing these?
Aside: If you do this, perhaps you could add some faster accessor for EDIT: I'm trying to explain what I mean over here: http://chat.stackoverflow.com/transcript/message/24012297#24012297 |
I quite like the automatic |
I agree with eduard, drop=TRUE is one of the worst parts of data.frame. I think it makes sense to implement with=FALSE, as this improves consistency and doesn't materially degrade the quality of data.table, but drop=TRUE would just be implementing a bad idea for the sake of consistency.
|
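For readers unfamiliar with the drop behaviour being debated, a small sketch of the difference follows. The data is invented, and the data.table part assumes a recent release in which a numeric j selects columns.

```r
library(data.table)

DF <- data.frame(x = 1:3, y = c("a", "b", "c"))
DT <- as.data.table(DF)

# data.frame: a single selected column is dropped to a plain vector by default
df_default <- DF[, 1]

# ...unless drop = FALSE is given, which keeps the data.frame shape
df_kept <- DF[, 1, drop = FALSE]

# data.table (recent versions, as an assumption): the result stays a
# data.table even when only one column is selected
dt_default <- DT[, 1]
```

The return type of the data.frame call depends on how many columns are selected, which is the sometimes-this-sometimes-that behaviour criticised above.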
I think it defeats the purpose of the change if you don't use |
@franknarf1 I disagree. |
@eantonya Yeah, I guess we do disagree; sorry if I'm repeating myself, but I'll try to clarify. I'm not crazy about the sometimes-this-sometimes-that behavior of data.frame either, but the premise of this proposed enhancement is that data.frame syntax should be supported to some limited extent. Within that limited scope (when |
@franknarf1 perhaps I should clarify. Ideally, what I'd like is for data.frame syntax in
DT[, 1:2]
DT[, c("x", "y")]
cols = c("x", "y")
DT[, cols]
all of these should return two column data.table. However, as @jrowen pointed out from the old post, the last case is tricky (for cases like the one I've shown in the previous post). Unless this case can be taken care of quite nicely, I personally don't see a huge advantage of implementing this feature. I can imagine myself explaining the behaviour to beginners (or in a talk) with too many ifs-and-buts.. and that's not helping. So, what would be great is to figure out whether there's a way around the last scenario without breaking too many things. And whether it's worth it. I don't feel strongly about I'm also fully aware of the case |
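As a rough sketch of how these cases look in practice, assuming a recent data.table release (the example table is hypothetical): the first two forms select columns directly, while the variable case needs either with = FALSE or the .. prefix, because a bare symbol in j is looked up as a column name.

```r
library(data.table)

DT <- data.table(x = 1:3, y = 4:6, z = 7:9)
cols <- c("x", "y")

# Literal positions and literal names in j select columns
by_position <- DT[, 1:2]
by_name <- DT[, c("x", "y")]

# A variable holding the names has to be flagged as coming from
# the calling scope, in one of these two ways
by_with <- DT[, cols, with = FALSE]
by_dots <- DT[, ..cols]
```

All four results are two-column data.tables holding x and y.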
@arunsrinivasan Yeah, I also don't see a benefit from the feature change. As you say, it seems like it would make explaining the syntax harder and lead to messier code everywhere (as people start using data.frame syntax as a crutch). Back to my aside (mentioned in your last sentence). Yeah, I've never seen anyone else complain about |
@arunsrinivasan I actually don't see a big problem with some cases not working. I see this as guessing @franknarf1 I'm not sure what you mean - of course I'd use this feature myself - I use The framework from which I see this change is that of enhancing |
@eantonya My mistake. I'd find the use of the feature in my code very hard to parse (by eye). As far as the enhancement goes (excluding the mimicry), doesn't Richardo's |
I don't have anything against that option (and I think that should work regardless of this one going in), but would prefer typing As far as how to guess - I would propose the following - if any of the names in I think this takes care of the cases above and a few more I can think of right now. Thinking some more - evaluating smth twice is fairly dangerous, so perhaps it's ok to live with the evaluation result no matter what it is (so return columns for character/int/numeric and actual result otherwise). |
@eantonya I'm not really familiar with parsing R calls, but it sounds like cases like this:
would no longer work, since If some guesswork way were implemented, maybe it could be made into an option, |
Ok, let's add |
Okay, I'll see if I think of or come across any others. Nothing comes to mind beyond
|
Great comments above. In an attempt to draw it all together, I'm thinking we should make the following changes. If I've read correctly, I think (hope!) this will please everyone and displease nobody.
We can always go further later depending on how it goes. We'll wait for everyone who's commented so far to confirm before going ahead (and only then after 1.9.6 is (finally) on CRAN!) |
Seems good to me. As always, thanks to you and Arun for doing the hard work. |
Sounds great to me. |
To reduce inconsistency it can be good to remove default value for |
@jangorecki Yes nice idea - agree. |
I too am in favor of the revised proposal. |
Here's a slightly different viewpoint. There are places where I'd dearly love to dispatch a DT into some old code expecting a DF and automagically pick up a big improvement in merge() speed (and conceivably for other operations involving grouping). Unfortunately with the non-DF behavior of [ that often won't work, and sometimes I end up basically as.x'ing back and forth between DT and DF in order to keep old code happy. One clean solution to this is setcompatibility(c("on","off","?")). off provides "native DT" behavior for those who want to fully exploit DT capabilities. on provides "native DF" behavior unless the operation clearly doesn't make sense for DF. E.g. I don't think you can have DF[DF,] so something like this would clearly be invoking a DT-style join. Conceivably there could be other compatibility levels, e.g. almost-DF without drop. Since this would be a setxxx by reference it also hopefully has very little performance cost. |
@ronhylton your comment has a much wider scope than the topic discussed here. It might be better to isolate it as a new FR. Discussed detection of |
@ronhylton Agree with Jan - best raise a new issue. One option is to place your old code in a package, then it would automatically divert to base syntax when passed a data.table. |
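Until something along those lines exists, the back-and-forth conversion mentioned above can be sketched as follows. The legacy helper is hypothetical; as.data.frame(), setDF() and setDT() are existing functions, and the by-reference pair avoids copying the data.

```r
library(data.table)

# Hypothetical legacy helper written against data.frame semantics:
# it relies on df[, 1] returning a plain vector
legacy_first_column_mean <- function(df) mean(df[, 1])

DT <- data.table(value = c(2, 4, 6), id = c("a", "b", "c"))

# Option 1: hand the legacy code a data.frame copy
res_copy <- legacy_first_column_mean(as.data.frame(DT))

# Option 2: switch the class by reference around the call, avoiding a copy
setDF(DT)
res_ref <- legacy_first_column_mean(DT)
setDT(DT)
```

Option 2 changes DT itself for the duration of the call, so it only suits code where nothing else expects a data.table in between.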
I have reached this thread 3 times over the last year or so, which is when I started using
One of the top search hits for "r data.table subset by column and row" is this page: http://personal.colby.edu/personal/m/mgimond/RIntro/04_Manipulating_data_tables.html, which states "For example, to access one |
This is such a nice fix to what has been a real stumbling block for data.table users. Superb! Leaving a note here as I referenced this in comments following a related SO answer that should itself eventually be edited to reflect the change. |
Error in so annoying error, not letting me knit the file. I'm very poor at coding, and this is daunting me even more. |
Hello. I've noticed that some of my old code doesn't work anymore because of this change. For example this code was scaling the values contained in the columns defined by mycols.
But now it doesn't work. I need to do something like this: Is there a better way? |
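The poster's code did not survive in this copy of the thread, so the following is only a sketch of one common idiom for transforming the columns named in a character vector, not a reconstruction of that code. The table and the standardise() helper are assumptions made for illustration.

```r
library(data.table)

DT <- data.table(a = c(1, 2, 3), b = c(10, 20, 30), g = c("u", "v", "w"))
mycols <- c("a", "b")

# Hypothetical scaling function: centre, then divide by the standard deviation
standardise <- function(v) (v - mean(v)) / sd(v)

# (mycols) on the left of := means "the columns named in mycols";
# .SDcols restricts .SD to those same columns
DT[, (mycols) := lapply(.SD, standardise), .SDcols = mycols]
```

Because the column names are passed through .SDcols and the parenthesised left-hand side, nothing here depends on how a bare variable in j is interpreted.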
Hi @skanskan, |
(After a brief discussion with Matt)
The behaviour with=FALSE:
In talking to colleagues, and at meetings or over emails, it seems that restoring the data.frame behaviour only for those cases where j is integer/character vector can only bring more sanity (trading inconsistency).
The issue is that data.table usage revolves around [ a lot, and therefore users are confronted with having to learn this difference quite early, and having to learn new syntax for a known basic operation doesn't sit well. It also doesn't seem to help in explaining how a data.table is a data.frame with this basic operation.
AFAICT, there's no real usage to having just character/integer vectors in j. Therefore, it'd be great to have with=FALSE being unnecessary and be able to subset columns the data.frame way:
The default return of vector in case of only one column and use of drop=FALSE should also be restored. This'll help get over the basic data.frame like usage very quickly without having to wonder "why", and start learning the actual essential enhanced-ness data.table provides.
It'd be great to hear thoughts from other users as well.
This has come up before (raised by Matt) : http://r.789695.n4.nabble.com/with-FALSE-td4589266.html but 'leave it as it is' was the response more or less.