Feature request: ability to keep/discard list elements by name #817

Closed
jnolis opened this issue Jan 23, 2021 · 14 comments · Fixed by #927
Labels
feature a feature request or enhancement

Comments

@jnolis

jnolis commented Jan 23, 2021

From the discussion in this Twitter thread, it seems there is a need to remove elements from lists by name. The current "best" solution is to assign NULL to an element using base R commands, which has no elegant, pipe-friendly tidy equivalent. Since this is a fairly common task, it would be helpful to have a purrr function that can easily be put within a sequence of piped purrr calls:

image
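
For context, the base R idiom in question can be sketched as follows (the list `x` is a made-up example, not taken from the thread). The assignment form is a statement that modifies a variable, so it has to interrupt a pipeline, while the subsetting form is an ordinary expression:

```r
# Hypothetical example list
x <- list(a = 1, b = 2, c = 3)

# Assignment form: drops "b", but needs an intermediate variable
y <- x
y[["b"]] <- NULL

# Subsetting form: an expression, so it can sit inside a pipeline
z <- x[setdiff(names(x), "b")]

# Both approaches leave the same elements behind
identical(y, z)
```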

One approach I have been thinking of since writing that last tweet would be to create purrr::keep_names() and purrr::discard_names(). These functions would closely mimic the existing purrr::keep() and purrr::discard(), but would be applied to the names of the list rather than the values. They could also accept a vector of names instead of a function, for the common case where you just want to keep/remove specific elements. So something like this:

library(purrr)

example <- as.list(1:4)
names(example) <- c("a", "b", "rstudioconf_2022", "cat")

> keep_names(example, c("a","b"))
$a
[1] 1

$b
[1] 2

> discard_names(example, ~ .x %in% letters)
$rstudioconf_2022
[1] 3

$cat
[1] 4

And then in the case of the Twitter thread, Elaine could have just added discard_names("b") into her code.


Things I like about adding these functions:

  1. They solve a problem I personally have as well.
  2. They seem to open the door a little bit to more functions that work on names that could be useful in purrr. For example, I also do x <- setNames(x, x) a lot at the start of my purrr piping sequences, and that could have a convenience function.

Things I dislike about adding these functions:

  1. They seem like they could be inefficient if the predicate is applied to each element independently. In the discard_names example above, the anonymous function would be called once per element, but since %in% is vectorized, a single call across the whole vector of names would have been enough to get the boolean results. On a list thousands of elements long this could be a problem.
  2. The set of functions you could theoretically open to having some sort of "name" equivalent (like map_names()) is so big I do fall into some "slippery slope" fears of this going too far.
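
A small base R sketch of the efficiency concern in point 1, assuming a hypothetical list `big` with 5,000 named elements. Calling the predicate once per name and calling it once on the whole vector of names give the same result, but the second makes a single call:

```r
# Hypothetical data: 5,000 elements whose names cycle through five values
big <- as.list(seq_len(5000))
names(big) <- rep(c("a", "b", "rstudioconf_2022", "cat", "z"), length.out = 5000)

# Predicate that is already vectorised over its input
is_letter <- function(nm) nm %in% letters

# Per-element application: the predicate runs once for every name
calls <- 0
per_element <- vapply(
  names(big),
  function(nm) {
    calls <<- calls + 1  # count how often the predicate is invoked
    is_letter(nm)
  },
  logical(1),
  USE.NAMES = FALSE
)

# Vectorised application: a single call over all names
vectorised <- is_letter(names(big))

identical(per_element, vectorised)
calls
```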

These functions seem simple enough that I think I could make a PR myself to add them. I would love some feedback on whether other people would want them included or whether they should be changed somehow. Thank you!!

@ijlyttle
Contributor

stray observation: given the correspondence between lists and data-frames, could tidyselect be useful here?

@lionel-
Member

lionel- commented Jan 23, 2021

@ijlyttle You can experiment with this unexported function which implements tidyselect over all vector inputs:

list(a = 1, b = 2, aa = 3) %>%
  tidyselect:::select(starts_with("a"))
#> $a
#> [1] 1
#>
#> $aa
#> [1] 3

c(a = 1, b = 2, aa = 3) %>%
  tidyselect:::select(starts_with("a"))
#>  a aa
#>  1  3

@lionel-
Member

lionel- commented Jan 23, 2021

This is probably more a funs:: function than a purrr:: one though.

@jnolis
Author

jnolis commented Jan 23, 2021

Honestly if tidyselect:::select could become an exported function (and perhaps renamed to avoid confusion with dplyr::select) I think that would do exactly what I was looking for, right?

@lionel-
Member

lionel- commented Jan 23, 2021

Right. This is a big design decision though.

In the meantime you can add it to your set of helper functions if you'd like to use it right away:

vec_select <- function(.x, ..., .strict = TRUE) {
  pos <- tidyselect::eval_select(quote(c(...)), .x, strict = .strict)
  rlang::set_names(.x[pos], names(pos))
}

It might be slow with long vectors. Feel free to post any feedback in an issue on the tidyselect repo.

@deeenes

deeenes commented Apr 23, 2021

> This is probably more a funs:: function than a purrr:: one though.

And in more complex cases, when the predicate function needs to operate both on the name and the value at the same time?

@DanChaltiel

DanChaltiel commented Oct 4, 2021

> This is probably more a funs:: function than a purrr:: one though.

I strongly agree with deeenes here.

IMHO, one could expect a rather homogeneous design throughout the tidyverse, and dplyr::select() has accustomed us to more complex cases such as:

purrr::keep(example, c("a", starts_with("c"), where(~str_detect(.x, "\\d+"))))

It would be pretty awesome if purrr::keep() could behave exactly like dplyr::select() and could use both names and predicates (and even tidyselect helpers if possible).

@zsigmas

zsigmas commented Dec 3, 2021

For those looking for a simple and pipeable solution: the functions below only cover simple cases, but they can be modified to cover more general ones.

The easiest solution here is to use indexing rather than assigning NULL to the entry, unless for some reason the latter is a must.

The function naming can be improved (not my strongest point :) ), but I think the general gist of how to create these functions is there.

#' @param l a named list
#' @param kn a vector containing the names to keep

keep_names <- function(l, kn) {
  l[names(l) %in% kn]
}

x <- list(a = 1, b = 2, c = 3)
keep_names(x, "a")
# $a
# [1] 1

keep_names(x, c("a", "b"))
# $a
# [1] 1
# 
# $b
# [1] 2

#' @param l a named list
#' @param fn a function that will receive the vector of names. Must be vectorized and return a logical vector (TRUE/FALSE for each name).

keep_names_func <- function(l, fn) {
  l[fn(names(l))]
}

x <- list(ka = 1, kb = 2, c = 3)
keep_names_func(x, function(n){startsWith(n, "k")})
# $ka
# [1] 1
# 
# $kb
# [1] 2

# Or even one with both names and value

#' @param l a named list
#' @param fn a function that will receive the vector of names and the list of values. Must be vectorized and return a logical vector (TRUE/FALSE for each element).

keep_names_func_both <- function(l, fn) {
  l[fn(names(l), l)]
}

x <- list(ka = 1, kb = 2, c = 3)
keep_names_func_both(x, function(n, v){startsWith(n, "k") & v>=2})
# $kb
# [1] 2

@ijlyttle
Contributor

ijlyttle commented Jan 5, 2022

I came across purrr's modify_at() function, which seems to have everything that's needed, including tidyselect.

Two problems, though:

  1. vars(), which I understand is rlang::quos(), is not available in purrr.
  2. It doesn't seem to work.

I figured I'd put it in front of the group, using @jnolis' example:

library("purrr")

example <- as.list(1:5)
names(example) <- list("a", "b", "c", "rstudioconf_2022", "cat")

# this seems like it should work, but it doesn't
modify_at(example, rlang::quos(any_of(letters)), ~NULL)
#> $b
#> [1] 2
#> 
#> $rstudioconf_2022
#> [1] 4

# same thing - the sets should be complementary, but they aren't
modify_at(example,  rlang::quos(!any_of(letters)), ~NULL)
#> $a
#> [1] 1
#> 
#> $b
#> [1] 2
#> 
#> $c
#> [1] 3
#> 
#> $cat
#> [1] 5

Created on 2022-01-05 by the reprex package (v2.0.1)

Of course, I could be doing something wrong™️.

@hadley hadley added the feature a feature request or enhancement label Aug 24, 2022
@hadley
Member

hadley commented Aug 27, 2022

I think these are interesting ideas but I don't quite see how they fit into purrr. A straightforward implementation of keep_names() and discard_names() feels a bit too simple for purrr:

discard_names <- function(.x, .p, ...) {
  # .p is a vectorised predicate applied to all names at once
  sel <- .p(names(.x))
  .x[!is.na(sel) & !sel]
}

keep_names <- function(.x, .p, ...) {
  sel <- .p(names(.x))
  .x[!is.na(sel) & sel]
}

And we're currently moving away from tidyselect usage in purrr, because NSE just doesn't feel very "purrr-like".

But maybe we could make something a bit more flexible?

keep_names <- function(.x, .names, ...) {
  if (is.character(.names)) {
    # Character vector: keep the elements whose names match
    idx <- intersect(names(.x), .names)
  } else if (is.function(.names) || rlang::is_formula(.names)) {
    # Function or formula: call it once on the whole vector of names
    .names <- rlang::as_function(.names)
    idx <- .names(names(.x))

    if (is.logical(idx)) {
      idx[is.na(idx)] <- FALSE
    } else if (is.character(idx)) {
      idx <- intersect(names(.x), idx)
    } else if (!is.integer(idx)) {
      rlang::abort("If `.names` is a function, it must return a logical, integer, or character vector")
    }
  }
  .x[idx]
}

Then you could write x |> keep_names("foo") or x |> keep_names(~ .x %in% LETTERS) etc.

@jnolis
Author

jnolis commented Aug 28, 2022

That seems like a reasonable compromise to me! I'd also have the negation for discard_names, but that simple implementation covers the cases I was thinking of when I wrote this.
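
A minimal sketch of what that negation could look like, assuming the same character-or-function interface as the proposal above. `discard_names_sketch` is a hypothetical name rather than purrr's API, and formula support is left out so that the example needs only base R:

```r
# Hypothetical helper: drop elements whose names are selected by `.names`
discard_names_sketch <- function(.x, .names) {
  nms <- names(.x)
  if (is.null(nms)) {
    stop("`.x` must be named")
  }

  # A function is called once on the whole vector of names
  if (is.function(.names)) {
    .names <- .names(nms)
  }

  if (is.logical(.names)) {
    # Treat NA as "not selected", so those elements are kept
    drop <- !is.na(.names) & .names
  } else if (is.character(.names)) {
    drop <- nms %in% .names
  } else {
    stop("`.names` must be a character vector, or a function returning a logical or character vector")
  }

  .x[!drop]
}

x <- list(a = 1, b = 2, cat = 3)
discard_names_sketch(x, "b")
discard_names_sketch(x, function(nm) nm %in% letters)
```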

@hadley
Member

hadley commented Sep 8, 2022

Just realised that these should probably be keep_at() and discard_at(), and we should extend the same handling of names and integers to map_at(), modify_at(), etc.

@jnolis
Author

jnolis commented Sep 9, 2022

That makes sense to me! A map_at() that lets you programmatically rename a list seems especially convenient.

@hadley
Member

hadley commented Sep 9, 2022

Some progress:

keep_at <- function(.x, .names, ...) {
  if (!is_named(.x)) {
    cli::cli_abort("{.arg .x} must be named")
  }
  x_names <- names(.x)

  if (is.character(.names)) {
    idx <- intersect(.names, x_names)
  } else if (is.function(.names) || is_formula(.names)) {
    .names <- rlang::as_function(.names)
    idx <- .names(x_names, ...)

    if (is.logical(idx)) {
      if (length(idx) != length(x_names)) {
        cli::cli_abort("Result of {.fn .names} must be length {length(x_names)}, not {length(idx)}.")
      }
      idx[is.na(idx)] <- FALSE
    } else if (is.character(idx)) {
      idx <- intersect(names(.x), idx)
    } else {
      cli::cli_abort("If {.arg .names} is a function, it must return a logical or character vector, not {.obj_type_friendly {idx}}.")
    }
  } else {
    names <- .names
    cli::cli_abort("{.arg .names} must be a function or a character vector, not {.obj_type_friendly {names}}.")
  }
  .x[idx]
}

@jnolis to be clear, map_at() and friends already exist and only apply the transformation to the named elements.

7 participants