a better way to set the key in complicated one-liners #794

Closed
eantonya opened this issue Sep 2, 2014 · 4 comments

Comments

@eantonya
Contributor

eantonya commented Sep 2, 2014

If one tries to set the key on an intermediate expression before continuing, the flow breaks down: the setkey call has to wrap the expression from the front, and the whole thing becomes a mess. Here's a random example (that doesn't do anything, and can be done differently, but imagine more complex stuff instead, e.g. http://stackoverflow.com/a/25628423/817778) of what I have in mind:

dt = data.table(a = 1:6, b = 1:2)

setkey(dt[, a[1], by = b], V1)[, c := 2*V1]

For a short period of time, as someone pointed out (I forget who, sorry), it could be done like so:

dt[, a[1], by = b][, setkey(.SD, V1)][, c := 2*V1]

which I actually thought was fairly neat, but then this stopped being an option after .SD bindings got locked down.

I think we need to either allow the above .SD expression or have another alternative that can be written/read in a similar manner, from left to right, without having to skip back to the beginning of the sentence.

@arunsrinivasan
Member

.SD is allocated just once with the length of the maximum group, to avoid creating it again and again for each group. So I wouldn't opt for unlocking the binding, as it could lead to pretty nasty-to-track bugs.

@jangorecki
Member

You should still be able to use setattr(setorderv(.SD,cols),"sorted",cols), but I'm not sure how safe it is.

@MichaelChirico
Member

It's not clear to me from the example why we need to set the key here.

I found myself doing the messy mid-line setkey much more before the on argument was an option; is this still a necessary feature? I'm struggling to think of an example where it's necessary, maybe something like:

setkey(dt[ , a[1], by = b], V1)[.(V1[1]), sum(a)]

Can't say I've ever needed something like that.

Also just recalled @jangorecki's advice on #590:

If you use chaining you can also use fread(...)[, .SD,, "key"] which makes it a start point of chain for operations on sorted data.
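Applied to the example from the top of this issue, that keyby idea might look like the sketch below. It assumes the intermediate result should be keyed on V1, and it writes b out with rep() rather than relying on recycling. Note that keyby also sorts the rows by V1, which is what setkey would have done anyway.

```r
library(data.table)

# Same toy data as the issue; b is spelled out instead of being recycled.
dt = data.table(a = 1:6, b = rep(1:2, 3))

# keyby = V1 groups by V1 and keys the result on it,
# so the whole chain reads left to right with no wrapping setkey().
res = dt[, a[1], by = b][, .SD, keyby = V1][, c := 2 * V1]

key(res)
```

The result keeps its key after the := step, since c is not part of the key.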

@eantonya
Contributor Author

Agreed, the on argument solves the issue neatly. Closing.
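For anyone landing here later, a minimal sketch of the difference, using the toy data from this issue and an arbitrary lookup value of 2L (chosen only for illustration):

```r
library(data.table)

dt = data.table(a = 1:6, b = rep(1:2, 3))

# Old style: key the intermediate result first, then join on that key.
keyed = setkey(dt[, a[1], by = b], V1)[.(2L)]

# With on: name the join column at the point of use; no key is needed,
# and the chain reads left to right.
adhoc = dt[, a[1], by = b][.(2L), on = "V1"]
```

Both forms select the row of the intermediate result where V1 equals 2.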
