local and global versions of `.I`, `.N` #1206

MichaelChirico · 2015-07-03T17:06:47Z

It's always been a bit confusing to me that `.I` is "global", in the sense that its row numbers refer to the whole table and don't restart within each `by` group, while `.N` is "local", in the sense that it counts only the rows of the current group.
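A minimal sketch of the distinction, assuming a toy table `DT` with a grouping column `g` (both hypothetical). Within `by = g`, `.I` gives row numbers in the whole of `DT`, `.N` gives the size of the current group, `seq_len(.N)` is the within-group counter that this thread calls `.i`, and `nrow(DT)` is the table-wide count that a "global" `.N` would return:

```r
library(data.table)

# Toy table: group "a" has 2 rows, group "b" has 3 rows
DT <- data.table(g = c("a", "a", "b", "b", "b"), v = 1:5)

res <- DT[, .(
  I_global = .I,          # row numbers in DT: do not restart per group
  N_local  = .N,          # rows in the current group (recycled to group length)
  i_local  = seq_len(.N), # within-group row counter (the proposed .i)
  N_global = nrow(DT)     # rows in the whole table (a "global" .N)
), by = g]
print(res)
```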

I understand (some of) the advantages of this arrangement, but I think there are ample situations for using a local .I (see, e.g. 1, 2, 3, etc.) or a global .N (e.g., 1).

I'm not sure how easy this is to build into the source code, but having .i and .n be "local" while .I and .N are "global" seems like an intuitive alternative. On the other hand, it could be painful to switch the behavior of .N given that it's so ubiquitous in data.table code.

Throwing a hat in the ring for .SD and .sd as well, since I've been tempted a few times to try .SD with the intention of getting the full table within by, specifically here.


franknarf1 · 2015-07-03T17:08:24Z

I agree. It'd be quite a break from backward compatibility, but that notation would be useful and a lot more intuitive.

jangorecki · 2015-07-03T19:59:35Z

Quite a big change. Not sure if the performance gains are big enough to use `.i` instead of the current `1:.N`; has anybody measured it? The 2.0.0 release is going to have some breaking changes, so this could be the place to release such a change.

arunsrinivasan · 2015-07-07T10:59:08Z

Like the idea very much, but not sure if it's possible at this stage, as it'd break a lot of code.. Where were you when this was implemented first :P?

Marked as FR for now.

MichaelChirico · 2015-07-07T19:27:37Z

In my R swaddling blankets, I suppose ;)

I understand .N->.n is a big push, but I rarely need that.

.i, however, shouldn't break any code and I would use it all the time!

arunsrinivasan · 2015-07-07T20:46:48Z

Right. But `seq_len(.N)` is the `.i` you're looking for. Is that not okay? I ask because I find the intent quite clear and understandable, whereas `.i` and `.I` can get quite confusing quickly. If it's really necessary, then maybe `.seqN`? Not sure. I'm always on alert when we have to add more symbols :-).

MichaelChirico · 2015-07-07T23:23:14Z

It feels pretty natural to me, and writing, e.g., `var[.i<5]` is certainly more intuitive (and cleaner!) than `var[seq_len(.N)<5]` (or even `var[.seqN<5]`), but maybe that's just me. FWIW, I have more of a math than a programming background, which may be why it's easy for me to compartmentalize capital vs. lowercase symbols.
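For comparison, here is how the `var[.i<5]` idea can be spelled today, as a sketch on a hypothetical table `DT`: either `seq_len(.N)` inside a grouped `j`, or `rowid()` (the helper requested in #1353, available in data.table 1.9.8 and later) used in `i`, which avoids grouping in `j` altogether:

```r
library(data.table)

# Hypothetical data: group "a" has 6 rows, group "b" has 3 rows
DT <- data.table(g = rep(c("a", "b"), times = c(6, 3)), v = 1:9)

# Keep the first 4 values of v in each group, using seq_len(.N) as a local index
via_seq <- DT[, .(v = v[seq_len(.N) < 5]), by = g]

# Same filter, expressed with rowid() in i instead of a grouped j
via_rowid <- DT[rowid(g) < 5]
```

Both return rows 1 to 4 of group `a` and all three rows of group `b`; the `rowid()` form reads closest to the proposed `.i`.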

I understand (and appreciate!) the aversion to overloading data.table with arcane symbols, but <opinion>I think that anyone who can handle `.I` and `.N` can conquer `.i` quickly, given its tight relationship to `.I`.</opinion>

Just one more parallel to draw: `var[.N]` is redundant with `var[length(var)]`, but `.len` was eschewed for the (I think) clearly superior `.N`; `.end` would also have worked but would seem more obtuse in other contexts (e.g. `var:=.end`).
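A quick illustration of that parallel, on a hypothetical table `DT`: inside a grouped `j`, `v[.N]` and `v[length(v)]` both pick the last value of each group.

```r
library(data.table)

DT <- data.table(g = c("a", "a", "b", "b", "b"), v = c(10, 20, 30, 40, 50))

# Last value per group, written with .N and with length()
res <- DT[, .(last_N = v[.N], last_len = v[length(v)]), by = g]
print(res)
```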

Food for thought! Thanks for the consideration.

franknarf1 · 2015-07-07T23:47:47Z

Or... .GRPI?

Personally, I think it would be easier for new users if the shortcuts were revamped not only to include this extra one but also to be consistent in some sense, like:

- capital/lowercase (which I also prefer, being a mathy type), or
- `.I` & `.N` / `.GRPI` & `.GRPN`, or
- `.I` & `.N` / `.GI` & `.GN` (also making `.G` an alias or replacement for `.GRP`), or
- `.DTI` & `.DTN` / `.I` & `.N`

Breaking compatibility like this isn't so great, and I'd settle for .i or .GRPI or .GI or even .seqN (though it strikes me as too R-ish) alone. I'd use that shortcut all the time.

MichaelChirico · 2015-07-10T21:10:11Z

I've added an example to the main post of a case where my instinct was to use `.n` and `.N` but I needed to use `nrow(dt)` instead.

franknarf1 · 2017-04-25T17:27:49Z

Maybe related: it might be nice to have .NGRP for the number of groups. E.g., here it could use the condition if (.GRP != .NGRP) instead of if(.I[.N] < nrow(DT)). http://stackoverflow.com/a/43615843/

This would also be nice for easily tracking progress by throwing a print(.GRP/.NGRP) into j.
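`.NGRP` was later added (#4215, merged). A sketch assuming data.table 1.13.0 or later and a hypothetical table whose groups occupy contiguous blocks of rows: in that case the two conditions from the comment agree, although with interleaved groups `.I[.N] < nrow(DT)` can differ from `.GRP != .NGRP`. The same pair gives the progress fraction mentioned above (here stored as a column rather than printed):

```r
library(data.table)  # assumes data.table >= 1.13.0, where .NGRP is available

# Hypothetical data with contiguous groups a, b, c
DT <- data.table(g = c("a", "a", "b", "b", "b", "c"), v = 1:6)

res <- DT[, .(
  not_last_via_I   = .I[.N] < nrow(DT),  # workaround: last row index vs. table size
  not_last_via_grp = .GRP != .NGRP,      # direct: group counter vs. number of groups
  progress         = .GRP / .NGRP        # fraction of groups processed so far
), by = g]
print(res)
```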

st-pasha · 2018-01-31T00:41:03Z

How about the following scheme:

| Current | New symbol | Meaning |
|---|---|---|
| `.I` (if no groups) | `.I` | row number in the resulting data.table |
| `.N` (if no groups) | `.N` | number of rows in the resulting data.table (may not always be computable) |
| (none) | `x.I` | row number in `DT` |
| (none) | `x.N` | number of rows in `DT` |
| (none) | `i.I` | row number in the `i` data.table when joining |
| (none) | `i.N` | number of rows in the `i` data.table when joining |
| `.SD` | `.SD` | data.table with the subset of data within the current group |
| `.I` | `.SD.I` | row number within the current group |
| `.N` | `.SD.N` | number of rows within the current group |
| `.BY` | `.BY` | data.table with all group-by keys, OR the current key within the current group |
| `.GRP` | `.BY.I` | group counter |
| (none) | `.BY.N` | number of groups |

MichaelChirico · 2018-01-31T02:16:22Z

Not sure when symbol overload kicks in... certainly most seem intuitive (though I admit I don't immediately get `.BY.I`/`.BY.N`). Why not `.GRP.I` and `.GRP.N`?

And why wouldn't .N always be computable? Unless there's some plan for distributing data.table?

The primary concern remains the introduction of code-breaking behavior.

st-pasha · 2018-01-31T02:36:08Z

The idea is that `?.I` is always an index within some data.table, where `?` explicitly states which one (an empty `?` means the data.table being constructed, which has no name yet). Similarly, `?.N` always denotes the number of rows in that data.table.

The symbols `.BY.I` and `.BY.N` refer to the data.table `.BY`, which is a currently existing symbol and denotes the data.table of all unique group-by keys. By contrast, `.GRP` currently means the "group counter", so `.GRP.I`/`.GRP.N` would require a change in the meaning of `.GRP`.

I was trying to make a suggestion that is least breaking and most logically consistent. As it stands, it only changes the meaning of .I and .N, and only within the group-by context.

`.N` may not be computable if the `j` expression returns a data.table with an unpredictable number of rows. So if you have an expression like `DT[, {if(.N>5) .SD else data.table()}]`, then it is impossible to know how many rows the resulting data.table will have (which is the new meaning of `.N`) until you actually construct it.
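To make the point concrete, a sketch on a hypothetical `DT` using a grouped variant of that expression: when `j` returns `.SD` only for groups above a size threshold (and `NULL` otherwise, which drops the group), the number of output rows depends on the data and is only known once every group has been evaluated.

```r
library(data.table)

DT <- data.table(g = c("a", "a", "b", "b", "b", "c"), v = 1:6)

# j returns the group's rows only if the group has more than 2 rows, else NULL
res <- DT[, if (.N > 2) .SD, by = g]
print(res)
```

Here only group `b` survives, so the result has 3 rows, a size that cannot be read off `DT` before `j` runs.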

MichaelChirico · 2018-03-13T07:06:34Z

Adding this potentially confusing syntax to this issue (not sure if worth fixing):

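```r
library(data.table)

testDT = data.table(full1 = LETTERS, full2 = letters)
# In i, .N is nrow(testDT) (26), so i selects rows 1, 3, ..., 25
# In j, .N is the number of rows selected by i (13, i.e. .5 * nrow(testDT))
testDT[seq(1, .N, by = 2L), stagger1 := rnorm(.N)]
```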

daynefiler · 2023-11-10T18:25:01Z

Just wanted to ping this as a great feature request -- I would very regularly use a version of .I and second the proposal for .GRPI.

arunsrinivasan added the feature request label Jul 7, 2015

MichaelChirico mentioned this issue Sep 24, 2015: need 'rowid()' like 'rleid()' #1353 (Closed)

franknarf1 mentioned this issue Jan 30, 2018: Symbol .I consistency when not grouping #2598 (Open)

MichaelChirico mentioned this issue Jul 2, 2019: How do .I and .SD work? #3668 (Closed)

MichaelChirico mentioned this issue Jan 30, 2020: Add .NGRP; the number of groups #4215 (Merged)

jangorecki changed the title ~~Feature Request: local and global versions of .I, .N~~ local and global versions of .I, .N Apr 6, 2020

ColeMiller1 mentioned this issue Sep 1, 2020: .NROW as a way to access the nrow after subsetting in i but before entering a subgroup #4688 (Closed)

MichaelChirico added the breaking-change label (issues whose solution would require breaking existing behavior) Nov 10, 2023

MichaelChirico mentioned this issue Nov 15, 2023: seq_linter() should recommend .I for 1:.N r-lib/lintr#2101 (Closed)

jangorecki added this to the 2.0.0 milestone Jan 7, 2024
