get_dupes has incorrect duplicates and dupe_count when there's a column called "n" in the data frame #162

eringrand · 2018-01-08T21:26:26Z

If the data frame that I'm trying to check for duplicates has a column called n (e.g. from dplyr::count() or dplyr::add_count()), then get_dupes() gets confused about which column holds the counts. I believe this is because get_dupes() uses dplyr::count(), which names its count column "nn" instead of "n" when a column called "n" already exists, but get_dupes() still treats n as the count column.

For example, in the toy data below, student 102 does not have a duplicate for subject = 1, yet get_dupes() returns three duplicate rows for student 102 instead of two.

library(tidyverse)
library(janitor)

students <- tibble::tribble(
                        ~student_number, ~grade, ~subject,
                                   100L,      7,        1,
                                   100L,      6,        1,
                                   102L,      7,        0,
                                   102L,      7,        1,
                                   102L,      8,        0,
                                   105L,      7,        0
                        ) %>%
     add_count(student_number)
 
get_dupes(students, student_number, subject)
#> Warning: package 'bindrcpp' was built under R version 3.3.3
#> # A tibble: 5 x 5
#>   student_number subject    nn grade dupe_count
#>            <int>   <dbl> <int> <dbl>      <int>
#> 1            100       1     2     7          2
#> 2            100       1     2     6          2
#> 3            102       0     2     7          3
#> 4            102       0     2     8          3
#> 5            102       1     1     6          3
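
One way to sidestep the name collision described above is to give the count column an explicit name instead of relying on the default "n"/"nn". The sketch below is only an illustration, not janitor's actual fix: `dupes_by()` is a hypothetical helper, and it assumes a dplyr version whose `add_count()` accepts the `name` argument (which, as far as I know, arrived after this issue was filed).

```r
library(dplyr)

# Same toy data as in the report, including the pre-existing "n" column.
students <- tibble::tribble(
  ~student_number, ~grade, ~subject,
  100L, 7, 1,
  100L, 6, 1,
  102L, 7, 0,
  102L, 7, 1,
  102L, 8, 0,
  105L, 7, 0
) %>%
  add_count(student_number)

# Hypothetical helper (not part of janitor): count each combination of the
# given columns into an explicitly named column, so an existing "n" column
# is never mistaken for the duplicate count.
dupes_by <- function(dat, ...) {
  dat %>%
    add_count(..., name = "dupe_count") %>%  # assumes `name` is supported
    filter(dupe_count > 1) %>%               # keep only duplicated combinations
    arrange(...)
}

dupes <- dupes_by(students, student_number, subject)
dupes
```

With this data, only the two rows for student 100 / subject 1 and the two rows for student 102 / subject 0 should remain, and the original n column is left as it was.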


sfirke · 2018-01-09T02:26:39Z

Great bug report! Thank you. Providing the source of the problem (dplyr::count() producing nn) and a reprex sure made this easy to fix. I think this is fixed but let me know if not.

sfirke · 2018-01-09T02:28:00Z

(also I had not seen dplyr::add_count(), filing that one away)

sfirke closed this as completed in 28b402b Jan 9, 2018
