Columns appearing in the function in by= disappear in j
#1427
Comments
Yeah, I agree it's quite inconvenient and was thinking about a similar FR myself. So +1 for this. Though I wonder what the majority of users prefer. It might need another argument so that both behaviors are available. |
I've long been annoyed by this and had assumed it was filed as a bug already. Even if you try to be clever and get around it with |
Cannot that be tricked with |
@jangorecki Maybe. Also Anyway, I'm not sure that we need to change the default, but my workaround feels awfully hacky. |
Just wrote an SO answer where this would have made for a shiny clean solution. Would also be nice (though sub-optimal, IMO) to be able to add
|
Just ran into the same issue, this was unexpected. My use was to spit out one CSV for each year, while retaining original date for comparison. My expectation was that For now, I'll just generate the column beforehand. I do see that the documentation for .SD does say "excluding any columns used in by", I guess I had never noticed that bit before. |
The I would say it's redundant to store the column data both in the file name and in the file, so using I do understand the convenience factor of the redundant column, and I think Two more notes:
|
I think we had a misunderstanding. I am using I want to write out one file for each year's worth of data. But, each row in the data should maintain the actual column. The example below should make this clearer than my ramblings did.
Created on 2020-06-22 by the reprex package (v0.3.0) My thought process was the by columns are the columns generated which encode the actual grouping. In the second example, d does not encode the actual grouping, so it doesn't count as one of the by columns. To be clear, not saying the current behavior is wrong, or bad, just that it was unexpected in my mental model. As I said, it's not much to ask to create the grouping column explicitly before writing, so I have done that. |
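The approach described above (create the grouping column explicitly, then write one file per group) can be sketched as follows. This is not the commenter's original reprex; the sample data, the yr column name, the data_ file prefix, and the use of tempdir() are all assumptions made for illustration.

```r
library(data.table)

# Hypothetical data, loosely modelled on the example in this thread
dt <- data.table(
  a = 1:3,
  d = as.Date(c("2020-01-02", "2020-06-30", "2018-01-02"))
)

# Create the grouping column explicitly, so that d itself is not used in by=
dt[, yr := year(d)]

# Assumption: write to a temporary directory
out_dir <- tempdir()

# One CSV per year. yr is the only by column, so .SD still holds a and d;
# the year itself goes into the file name via .BY
dt[, fwrite(.SD, file.path(out_dir, paste0("data_", .BY$yr, ".csv"))), by = yr]
```

Each file then keeps the original d column, while the grouping value appears only in the file name.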
Yes, recognize you're talking about Your use case is slightly different in that the next input step is Excel, not R.
|
Another workaround is:
library(data.table)
dt <- data.table(a = 1, d = as.Date(c('2020-01-02', '2018-01-02')))
dt[,print(cbind(d, .SD)), by = list(year = year(d))]
#> d a
#> <Date> <num>
#> 1: 2020-01-02 1
#> d a
#> <Date> <num>
#> 1: 2018-01-02 1
#> Empty data.table (0 rows and 1 cols): year
These are the two lines that cause the Line 752 in ad7b67c
Line 920 in ad7b67c
While I would expect a different output, @Henrik-P pointed this help text out in #3262
Here, column If we wanted to address this, I would start with refactoring how the |
Thank you for the alternative workaround, always good to learn something new. I didn't know that What confused me is I considered the columns used in |
related: #4079 |
I find it confusing and counterintuitive that a computed expression in I came across this issue while answering this question on SO which requires creating a grouping variable on the fly. My expectation was that the code below should work
but gave the error message
because column I had to resort to the workaround proposed by ColeMiller1:
I found three other workarounds
and
and
which look even more convoluted. On the other hand, I understand that it might be annoying as well if grouping variables appear twice in the result: First as grouping var in front and then a second time as part of Would it be possible to distinguish in the
Examples:
The default names The |
It seems that all columns appearing in the function that produces the by column will not be included in j nor in .SD.
For example, I have a data table with a long list of yyyyMMdd dates from 20150101 to 20151001, and I use by = substr(date, 1, 6) to group the data into year-months. But in each group, accessed either from .SD or in the scope of j, the date column disappears, so I cannot get the original date this way. I am not sure whether previous versions had this problem (I remember the behavior being different, but I may be wrong).
To work around this, I have to create the new column year_month first and then use by = year_month.
I'm using the latest version of data.table (v1.9.6) on CRAN.
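A minimal sketch of the situation and of the workaround described in this report. The sample data and the value column are hypothetical; the comment on the first call restates what this thread reports rather than a guaranteed result for every data.table version.

```r
library(data.table)

# Hypothetical sample data: yyyyMMdd dates stored as character
dt <- data.table(
  date  = c("20150101", "20150115", "20150201", "20150310"),
  value = c(10, 20, 30, 40)
)

# Grouping on an expression of date: per this report, date is then
# missing from .SD inside each group
dt[, toString(names(.SD)), by = .(year_month = substr(date, 1, 6))]

# Workaround from the report: materialise the grouping column first
dt[, year_month := substr(date, 1, 6)]

# Only year_month is a by column now, so date stays available in j and in .SD
res <- dt[, .(first_date = min(date),
              last_date  = max(date),
              sd_cols    = toString(names(.SD))),
          by = year_month]
res
```

The cost of the workaround is an extra column in dt; it can be dropped afterwards with dt[, year_month := NULL] if it is not needed.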