get_dupes has incorrect duplicates and dupe_count when there's a column called "n" in the data frame #162

Closed
eringrand opened this issue Jan 8, 2018 · 2 comments

@eringrand

If the data frame I'm checking for duplicates has a column called n (e.g. from dplyr::count() or dplyr::add_count()), then get_dupes() gets confused about the counts. I believe this is because get_dupes() uses dplyr::count(), which names its count column "nn" instead of "n" when an n column already exists, but get_dupes() still treats n as the count column.

In the toy example below, student 102 has no duplicate for subject = 1, yet get_dupes() returns three rows for student 102 instead of two, each with a dupe_count of 3.

```r
library(tidyverse)
library(janitor)

students <- tibble::tribble(
  ~student_number, ~grade, ~subject,
             100L,      7,        1,
             100L,      6,        1,
             102L,      7,        0,
             102L,      7,        1,
             102L,      8,        0,
             105L,      7,        0
) %>%
  add_count(student_number)

get_dupes(students, student_number, subject)
#> Warning: package 'bindrcpp' was built under R version 3.3.3
#> # A tibble: 5 x 5
#>   student_number subject    nn grade dupe_count
#>            <int>   <dbl> <int> <dbl>      <int>
#> 1            100       1     2     7          2
#> 2            100       1     2     6          2
#> 3            102       0     2     7          3
#> 4            102       0     2     8          3
#> 5            102       1     1     6          3
```

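One way to avoid the name collision is to compute the group sizes without going through a column that might already be called n. The snippet below is a minimal base-R sketch of that idea, assuming nothing beyond base R; it is not the change made in 28b402b and not janitor's API. The function name get_dupes_sketch is hypothetical, and the add_count(student_number) step is mimicked with ave().

```r
# Hypothetical helper, NOT janitor's get_dupes(): keeps the rows whose
# combination of `cols` occurs more than once and stores the group size in
# a column named dupe_count (overwriting one if it already exists).
# Counts come from ave(), so a pre-existing "n" or "nn" column is ignored.
get_dupes_sketch <- function(df, cols) {
  # One factor level per observed combination of the key columns
  key <- interaction(df[cols], drop = TRUE)
  # Size of each row's group, aligned with the rows of df
  counts <- ave(seq_len(nrow(df)), key, FUN = length)
  is_dupe <- counts > 1
  out <- df[is_dupe, , drop = FALSE]
  out$dupe_count <- counts[is_dupe]
  out
}

# Toy data from the report, as a base data.frame
students <- data.frame(
  student_number = c(100L, 100L, 102L, 102L, 102L, 105L),
  grade          = c(7, 6, 7, 7, 8, 7),
  subject        = c(1, 1, 0, 1, 0, 0)
)
# Mimic dplyr::add_count(student_number): a per-student count in column n
students$n <- ave(seq_len(nrow(students)), students$student_number, FUN = length)

res <- get_dupes_sketch(students, c("student_number", "subject"))
print(res)
```

With this data only the (100, 1) and (102, 0) pairs come back, each with dupe_count 2, while the existing n column (2, 2, 3, 3 for those rows) passes through unchanged. Student 102's subject = 1 row is not returned.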
@sfirke closed this as completed in 28b402b on Jan 9, 2018
@sfirke
Owner

sfirke commented Jan 9, 2018

Great bug report! Thank you. Pinpointing the source of the problem (dplyr::count() producing nn) and providing a reprex sure made this easy to fix. I think this is fixed, but let me know if not.

@sfirke
Owner

sfirke commented Jan 9, 2018

(also I had not seen dplyr::add_count(), filing that one away)
