first on empty DT should return empty DT #3858

jangorecki · 2019-09-11T17:09:59Z

dt = data.table(a=1,b=2)[0,]
first(dt)
#       a     b
#1:    NA    NA
head(dt, 1)
#Empty data.table (0 rows and 2 cols): a,b
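
The same contrast can be sketched in base R with a plain data.frame. This is only an analogy, not data.table's own code path: head() keeps an empty input empty, while indexing row 1 pads the result with NA.

```r
# Base R analogy (data.frame, not data.table): zero-row input
df <- data.frame(a = 1, b = 2)[0, ]

h <- head(df, 1)  # stays empty, columns a and b are preserved
i <- df[1, ]      # out-of-range row index yields one row of NA

nrow(h)
nrow(i)
```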

st-pasha · 2019-09-11T18:38:06Z

I was recently struggling with a similar question in datatable, regarding applying reduce operators to a 0-row Frame. Conceptually there could be 2 approaches for grouping such a frame:

it creates 1 group of 0 rows;
it creates 0 groups of any rows.

Curious to hear your reasoning as to which of them is better.

jangorecki · 2019-09-11T19:09:59Z

Definitely 0 groups of any rows. 1 group of 0 rows makes sense for a grand total summary, where we apply a reduce function without any actual grouping. A related subject is grouping sets: rollup, cube.

d = data.table(grp=character(), val=numeric())
groupingsets(d, by="grp", sets=list(character()), j=.(sum=sum(val), mean=mean(val), len=length(val)))
#      grp   sum  mean   len
#   <char> <num> <num> <int>
#1:   <NA>     0   NaN     0
d = data.table(grp="a", val=1)
groupingsets(d, by="grp", sets=list(character()), j=.(sum=sum(val), mean=mean(val), len=length(val)))
#      grp   sum  mean   len
#   <char> <num> <num> <int>
#1:   <NA>     1     1     1

sets=list(character()) denotes grand total aggregation only
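
The two readings of a 0-row input can also be sketched in base R, assuming split() as a stand-in for grouping (an illustration only, not how data.table groups internally): grouping by an empty key yields no groups at all, whereas a grand total applies the reducer once to the empty vector.

```r
val <- numeric()
grp <- character()

# "0 groups of any rows": splitting by an empty key gives an empty list,
# so the per-group reduction has nothing to iterate over
per_group <- vapply(split(val, grp), sum, numeric(1))

# "1 group of 0 rows": a grand total reduces the empty vector once
grand_total <- sum(val)

length(per_group)
length(grand_total)
```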

st-pasha · 2019-09-11T21:28:20Z

So, is there a difference between grouping by an empty vector (such as in your example with grouping sets), and having no by= clause at all in DT[i,j]? For example, if DT=data.table(A=numeric()), then what should be returned from DT[, .(first(A), sum(A), min(A))]?

jangorecki · 2019-09-12T07:21:14Z

@st-pasha generally the same as if we ran it outside of data.table

> A=numeric()
> list(head(A,1L), sum(A), min(A))
[[1]]
numeric(0)

[[2]]
[1] 0

[[3]]
[1] Inf

Warning message:
In min(A) : no non-missing arguments to min; returning Inf

An extra warning occurs inside data.table because the results have different lengths.

st-pasha · 2019-09-12T18:10:50Z

@jangorecki Then why was DT[,first(A)] mapped to head(A,1), and not to first(A) (as seems most straightforward), or A[1] (as alluded to in the documentation of first)?

> A = numeric()
> head(A,1L)
numeric(0)
> first(A)
[1] NA
> A[1]
[1] NA

Or is it because first() is not really considered a reduce operation? Because in python datatable we classify first as a reducer, and this is where the discrepancy may be coming from.

jangorecki · 2019-09-13T11:08:10Z

There is no first/last in base R, so first/last need to wrap either

head(x, n=1) and tail(x, n=1), or
x[1] and x[max(length(x), 1)] (yes, extra complexity is needed here).

The latter will always expand to a 1-element vector. We decided to wrap head/tail.
xts, which implemented first/last a long time ago, seems to be affected by the same inconsistency: joshuaulrich/xts#309
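
A minimal base R sketch of the two wrapping options follows. The helper names first_head, last_tail, first_index and last_index are hypothetical, introduced only for illustration. It also shows why the index variant needs max(length(x), 1): on an empty vector x[length(x)] is x[0], which is empty, while x[1] is NA.

```r
# Option 1: wrap head/tail (an empty input stays empty)
first_head <- function(x) head(x, n = 1L)
last_tail  <- function(x) tail(x, n = 1L)

# Option 2: wrap indexing (always returns exactly one element)
first_index <- function(x) x[1L]
last_index  <- function(x) x[max(length(x), 1L)]

empty <- numeric()
x <- c(10, 20, 30)

length(first_head(empty))   # empty result
first_index(empty)          # NA
last_index(empty)           # NA, thanks to max(length(x), 1L)
length(empty[length(empty)])  # plain x[length(x)] would give an empty result

first_head(x); last_tail(x)
first_index(x); last_index(x)
```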

jangorecki self-assigned this Sep 11, 2019

jangorecki added a commit that referenced this issue Sep 11, 2019

first and last not load xts namespace when not needed, first on empty dt returns empty dt, closes #3857, #3858

537c81f

jangorecki mentioned this issue Sep 11, 2019

first and last not load xts namespace when not needed #3859

Merged

jangorecki added this to the 1.12.4 milestone Sep 11, 2019

mattdowle closed this as completed in #3859 Sep 12, 2019

mattdowle pushed a commit that referenced this issue Sep 12, 2019

first and last not load xts namespace when not needed, first on empty dt returns empty dt, closes #3857, #3858 (#3859)

09aaac4

jangorecki added a commit that referenced this issue Sep 13, 2019

first-last examples improve, #3858

ac14ad6

jangorecki mentioned this issue Sep 13, 2019

first-last examples improve, #3858 #3870

Merged

mattdowle pushed a commit that referenced this issue Sep 13, 2019

first-last examples improve, #3858 (#3870)

98cfa12
