filter_on() doesn't allow logical operator other than = #6

Closed
leungi opened this issue Jun 23, 2019 · 3 comments

leungi commented Jun 23, 2019

Reprex below.

Intent: get last row by group

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(table.express)
#> 
#> Attaching package: 'table.express'
#> The following object is masked from 'package:dplyr':
#> 
#>     order_by
#> The following object is masked from 'package:stats':
#> 
#>     filter
library(data.table)
#> Warning: package 'data.table' was built under R version 3.5.3
#> 
#> Attaching package: 'data.table'
#> The following objects are masked from 'package:dplyr':
#> 
#>     between, first, last

carsDT = data.table(mtcars, keep.rownames = TRUE)

#### dplyr
mtcars %>% 
  group_by(am, vs) %>% 
  filter(row_number() == n())
#> # A tibble: 4 x 11
#> # Groups:   am, vs [4]
#>     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1  21.5     4  120.    97  3.7   2.46  20.0     1     0     3     1
#> 2  19.2     8  400    175  3.08  3.84  17.0     0     0     3     2
#> 3  15       8  301    335  3.54  3.57  14.6     0     1     5     8
#> 4  21.4     4  121    109  4.11  2.78  18.6     1     1     4     2

#### data.table
carsDT[, .SD[.N], by=.(am, vs)]
#>    am vs               rn  mpg cyl  disp  hp drat    wt  qsec gear carb
#> 1:  1  0    Maserati Bora 15.0   8 301.0 335 3.54 3.570 14.60    5    8
#> 2:  1  1       Volvo 142E 21.4   4 121.0 109 4.11 2.780 18.60    4    2
#> 3:  0  1    Toyota Corona 21.5   4 120.1  97 3.70 2.465 20.01    3    1
#> 4:  0  0 Pontiac Firebird 19.2   8 400.0 175 3.08 3.845 17.05    3    2

#### table.express
# attempt to get last row by group, leveraging mult argument
carsDT %>% 
  start_expr %>% 
  group_by(am, vs) %>%
  table.express::order_by(am, vs) %>% 
  table.express::filter_on(!is.na(am), mult = 'last') %>%
  end_expr
#> All arguments in '...' must be named.

Created on 2019-06-23 by the reprex package (v0.2.1)

asardaes (Owner) commented

For this kind of filtering, data.table only supports specifying values, not expressions. However, it does support something like carsDT[!list(NA), on = "am"], so I could add a .negate parameter to filter_on that would add that initial !.
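As a minimal sketch of the value-based filtering described above, the plain data.table calls below use only `on`, `mult`, and the leading `!` (a not-join). The variable names `manual`, `not_zero`, and `last_manual` are made up for illustration, and the values 0 and 1 stand in for the `NA` used in the comment so that the result is non-trivial.

```r
library(data.table)

carsDT <- data.table(mtcars, keep.rownames = TRUE)

# Value-based filter: rows whose `am` equals 1 (a join on the supplied value).
manual <- carsDT[list(1), on = "am"]

# Not-join: the leading `!` keeps the rows that do NOT match the value.
not_zero <- carsDT[!list(0), on = "am"]

# `mult` picks one match per lookup value across the whole table,
# not one match per `by` group.
last_manual <- carsDT[list(1), on = "am", mult = "last"]
```

Because `mult = "last"` works per lookup value, it returns a single row here, which is why it cannot stand in for a last-row-per-group operation.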

Nevertheless, that still won't work for your use case because data.table ignores by when there's no j. To select the last row by group, you could do the following:

carsDT %>% 
  start_expr %>% 
  arrange(am, vs) %>%
  group_by(am, vs) %>%
  select(.SD[.N]) %>%
  end_expr

Note that table.express exports the generics from dplyr, so you would only need the namespace for order_by, or you can simply use arrange if you also want to have dplyr loaded.
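For comparison, here is a sketch of what the pipeline above presumably corresponds to in plain data.table. The exact expression that table.express builds is an assumption, not something confirmed in this thread; the point is only that `.SD[.N]` sits in `j`, so `by` is honored.

```r
library(data.table)

carsDT <- data.table(mtcars, keep.rownames = TRUE)

# Order rows in `i`, then take the last row of each (am, vs) group in `j`.
# `.N` is the number of rows in the current group, so `.SD[.N]` is its last row.
last_rows <- carsDT[order(am, vs), .SD[.N], by = .(am, vs)]
```

Since the ordering is stable within each group, this selects the same four cars as `carsDT[, .SD[.N], by = .(am, vs)]` from the original report, with the groups sorted by `am` and `vs`.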

asardaes added a commit that referenced this issue Jun 23, 2019

leungi commented Jun 23, 2019

Thanks for the prompt response @asardaes.

I came very close earlier by doing:

select(carsDT[.N])

I see now that you use select() as a pseudo-summarise() from data.table's point of view, which is a brilliant touch! I'll have to adapt to this framework; I appreciate your patience.

I'm aware that data.table ignores by when j is empty, which is why I was trying workarounds such as mutate_sd() with .SDcols set to .N, but to no avail.

Point taken on order_by(); I did check for conflicts with conflicted::conflict_scout() beforehand, but I was playing it safe.

leungi commented Jun 24, 2019

@asardaes: I'll close this, since the negation is not needed with what you already have.

A nice touch would be to give filter_on() the same .collapse argument as filter_sd(), for consistency. 😄

carsDT[!list(0), on = "am"]

carsDT %>% 
  start_expr %>% 
  filter_sd(.COL != 0, .SDcols = "am") %>% 
  end_expr
