Feature request: A function to check if a set of variables form a unique ID in a dataframe #7098

Closed
bholtemeyer opened this issue Oct 30, 2024 · 2 comments

Comments

@bholtemeyer

I'd like to have a function to check whether a set of variables forms a unique ID in a data frame, like this: https://search.r-project.org/CRAN/refmans/eeptools/html/isid.html

I think this would make code more readable, since the check would not need to be written as a pipeline.

The function would return TRUE or FALSE: TRUE indicates that the variables uniquely identify the rows, FALSE that they do not.

@ggrothendieck

ggrothendieck commented Nov 18, 2024

If the reason for wanting this is to check, before using mutate(..., .by = ...) to get the effect of rowwise, that the .by variables identify each row, then perhaps it would be better to support something like .by = .ROWID.
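A minimal sketch of getting that rowwise effect with what dplyr already offers, assuming dplyr >= 1.1.0 (the release that added .by). Since .ROWID is only a proposal, this adds an explicit row-number column instead; the column name .row_id and the max_x computation are hypothetical, chosen for illustration. Each .by group then holds exactly one row, so max() is evaluated per row.

```r
library(dplyr)  # .by requires dplyr >= 1.1.0

# Add an explicit row identifier (hypothetical name .row_id) so that every
# .by group contains exactly one row, mimicking rowwise().
res <- anscombe |>
  mutate(.row_id = row_number()) |>
  mutate(max_x = max(x1, x2, x3, x4), .by = .row_id) |>
  select(-.row_id)  # drop the helper column again
```

For a plain per-row maximum, pmax(x1, x2, x3, x4) would avoid grouping entirely; the row-id approach matters for functions that are not vectorised across columns.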

A one-liner that calculates isid would be:

isid <- function(data, ...) ! anyDuplicated(data[c(...)])

isid(anscombe) # TRUE
isid(anscombe, "x1", "x2") # TRUE
isid(anscombe, c("x1", "x2")) # TRUE
isid(anscombe, "x4") # FALSE

anscombe
##    x1 x2 x3 x4    y1   y2    y3    y4
## 1  10 10 10  8  8.04 9.14  7.46  6.58
## 2   8  8  8  8  6.95 8.14  6.77  5.76
## 3  13 13 13  8  7.58 8.74 12.74  7.71
## 4   9  9  9  8  8.81 8.77  7.11  8.84
## 5  11 11 11  8  8.33 9.26  7.81  8.47
## 6  14 14 14  8  9.96 8.10  8.84  7.04
## 7   6  6  6  8  7.24 6.13  6.08  5.25
## 8   4  4  4 19  4.26 3.10  5.39 12.50
## 9  12 12 12  8 10.84 9.13  8.15  5.56
## 10  7  7  7  8  4.82 7.26  6.42  7.91
## 11  5  5  5  8  5.68 4.74  5.73  6.89
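One subtlety of the one-liner, as far as I can tell from base R's data frame methods: when no column names are passed, data[c(...)] selects zero columns, anyDuplicated() reports no duplicates for that, and the result is TRUE even if whole rows are repeated (for anscombe it happens to be correct anyway). A sketch of a variant that falls back to all columns; the name isid_all and the toy data frame dup are hypothetical:

```r
# Variant of the isid() one-liner: with no column names given,
# check whole rows instead of a zero-column selection.
isid_all <- function(data, ...) {
  vars <- c(...)
  if (is.null(vars)) vars <- names(data)  # no columns given: use all of them
  !anyDuplicated(data[vars])
}

dup <- data.frame(a = c(1, 1), b = c(2, 2))
isid_all(dup)                   # FALSE: the two rows are identical
isid_all(dup, "a")              # FALSE
isid_all(anscombe, "x1", "x2")  # TRUE
```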

@DavisVaughan
Member

There are many ways to do this with existing tools, so I think it is a little too niche for a dedicated helper in dplyr:

library(dplyr)
library(vctrs)

uniquely <- function(...) {
  args <- rlang::list2(...)
  names(args) <- paste0("..", seq_along(args))
  args <- vctrs::new_data_frame(args)
  !vctrs::vec_duplicate_any(args)
}

anscombe |>
  summarise(
    res = !vec_duplicate_any(pick(x1, x2)),
    res2 = uniquely(x1, x2),
    res3 = n_distinct(x1, x2) == nrow(anscombe)
  )
#>    res res2 res3
#> 1 TRUE TRUE TRUE

anscombe |>
  summarise(
    res = !vec_duplicate_any(pick(x4)),
    res2 = uniquely(x4),
    res3 = n_distinct(x4) == nrow(anscombe)
  )
#>     res  res2  res3
#> 1 FALSE FALSE FALSE
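When a check like this returns FALSE, it can help to see which key values repeat and which rows are affected. A sketch using existing dplyr verbs, assuming dplyr >= 1.1.0 for .by in filter(); the object names counts and dups are only for illustration:

```r
library(dplyr)

# Which key values occur more than once, and how often?
counts <- anscombe |>
  count(x4) |>
  filter(n > 1)
# one row: x4 == 8 with n == 10

# All rows whose x4 value is shared with at least one other row.
dups <- anscombe |>
  filter(n() > 1, .by = x4)
# 10 rows: every row except row 8 (where x4 == 19)
```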

See #6660 for ideas on supporting .by = row.


3 participants