median and mean within pmap do not respect na.rm = TRUE when first element is NA #790
Comments
@dhslone
df %>%
  slice(1:2) %>%
  mutate(pm_med = pmap_dbl(list(a, b, c), median, na.rm = TRUE),
         pm_mean = pmap_dbl(list(a, b, c), mean, na.rm = TRUE),
         pm_sum = pmap_dbl(list(a, b, c), sum, na.rm = TRUE))
#> # A tibble: 2 x 6
#>       a     b     c pm_med pm_mean pm_sum
#>   <dbl> <dbl> <dbl>  <dbl>   <dbl>  <dbl>
#> 1     1     2     3      1       1      6
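#> 2     4     5     6      4       4     15
You can see above that pm_med and pm_mean return only the first value of each row (1 and 4) even though nothing is missing, while pm_sum is correct. I think users need to be careful where to put the na.rm = TRUE argument.
# Median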
df %>% mutate(ap_med = apply(select(df, c(a, b, c)), 1, median, na.rm = TRUE),
              pm_med = pmap_dbl(list(a, b, c), median, na.rm = TRUE),
              pm_med.1 = pmap_dbl(list(a, b, c), median, na.rm = FALSE),
              pm_med.2 = pmap_dbl(list(a, b, c), ~median(c(...))),
              pm_med.3 = pmap_dbl(list(a, b, c), ~median(c(...)), na.rm = TRUE),
              pm_med.4 = pmap_dbl(list(a, b, c), ~median(c(...), na.rm = TRUE)),
              pm_med.5 = pmap_dbl(list(a, b, c), ~median(c(...), na.rm = TRUE), na.rm = TRUE))
#> # A tibble: 5 x 10
#>       a     b     c ap_med pm_med pm_med.1 pm_med.2 pm_med.3 pm_med.4 pm_med.5
#>   <dbl> <dbl> <dbl>  <dbl>  <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
#> 1     1     2     3      2      1        1        2      1.5        2      1.5
#> 2     4     5     6      5      4        4        5      4.5        5      4.5
#> 3    NA     2    NA      2     NA       NA       NA       NA        2      1.5
#> 4     2    NA    NA      2      2        2       NA       NA        2      1.5
#> 5    NA    NA    NA     NA     NA       NA       NA       NA       NA        1
# Mean
df %>% mutate(ap_mean = apply(select(df, c(a, b, c)), 1, mean, na.rm = TRUE),
              #pm_mean = pmap_dbl(list(a, b, c), mean, na.rm = TRUE), # Doesn't work unlike median
              #pm_mean.1 = pmap_dbl(list(a, b, c), mean, na.rm = FALSE), # Doesn't work unlike median
              pm_mean.2 = pmap_dbl(list(a, b, c), ~mean(c(...))),
              pm_mean.3 = pmap_dbl(list(a, b, c), ~mean(c(...)), na.rm = TRUE),
              pm_mean.4 = pmap_dbl(list(a, b, c), ~mean(c(...), na.rm = TRUE)),
              pm_mean.5 = pmap_dbl(list(a, b, c), ~mean(c(...), na.rm = TRUE), na.rm = TRUE))
#> # A tibble: 5 x 8
#>       a     b     c ap_mean pm_mean.2 pm_mean.3 pm_mean.4 pm_mean.5
#>   <dbl> <dbl> <dbl>   <dbl>     <dbl>     <dbl>     <dbl>     <dbl>
#> 1     1     2     3       2         2      1.75         2      1.75
#> 2     4     5     6       5         5         4         5         4
#> 3    NA     2    NA       2        NA        NA         2       1.5
#> 4     2    NA    NA       2        NA        NA         2       1.5
#> 5    NA    NA    NA     NaN        NA        NA       NaN         1
# Sum
df %>% mutate(ap_sum = apply(select(df, c(a, b, c)), 1, sum, na.rm = TRUE),
              pm_sum = pmap_dbl(list(a, b, c), sum, na.rm = TRUE),
              pm_sum.1 = pmap_dbl(list(a, b, c), sum, na.rm = FALSE),
              pm_sum.2 = pmap_dbl(list(a, b, c), ~sum(c(...))),
              pm_sum.3 = pmap_dbl(list(a, b, c), ~sum(c(...)), na.rm = TRUE),
              pm_sum.4 = pmap_dbl(list(a, b, c), ~sum(c(...), na.rm = TRUE)),
              pm_sum.5 = pmap_dbl(list(a, b, c), ~sum(c(...), na.rm = TRUE), na.rm = TRUE))
#> # A tibble: 5 x 10
#>       a     b     c ap_sum pm_sum pm_sum.1 pm_sum.2 pm_sum.3 pm_sum.4 pm_sum.5
#>   <dbl> <dbl> <dbl>  <dbl>  <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
#> 1     1     2     3      6      6        6        6        7        6        7
#> 2     4     5     6     15     15       15       15       16       15       16
#> 3    NA     2    NA      2      2       NA       NA       NA        2        3
#> 4     2    NA    NA      2      2       NA       NA       NA        2        3
#> 5    NA    NA    NA      0      0       NA       NA       NA        0        1
I feel this needs to be properly highlighted in the documentation with examples where it can go wrong.
Created on 2021-03-06 by the reprex package (v1.0.0)
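To illustrate where a misplaced na.rm ends up, here is a minimal base R sketch. The helpers `row_median`, `row_sum`, and `row_median_narm` are hypothetical stand-ins for the `~median(c(...))` style lambdas; the assumption is that an `na.rm = TRUE` supplied outside the formula is forwarded into the lambda's `...` together with the row values, which would be consistent with the 1.5 and 7 results in the tables above.

```r
# Hypothetical stand-ins for the lambdas ~median(c(...)) and ~sum(c(...)):
# everything they receive, including a stray na.rm, is collected by c().
row_median <- function(...) median(c(...))
row_sum    <- function(...) sum(c(...))

# Inside c(), na.rm = TRUE is just a fourth element and is coerced to 1.
stray <- c(1, 2, 3, na.rm = TRUE)
print(stray)

med_stray <- row_median(1, 2, 3, na.rm = TRUE)  # median of 1, 2, 3, 1
sum_stray <- row_sum(1, 2, 3, na.rm = TRUE)     # 1 + 2 + 3 + 1

# Writing na.rm inside the call hands it to median() itself.
row_median_narm <- function(...) median(c(...), na.rm = TRUE)
med_ok <- row_median_narm(NA, 2, NA)

print(c(med_stray = med_stray, sum_stray = sum_stray, med_ok = med_ok))
```

The design point is that the flag only has its intended effect when it is written inside the summary call, not alongside it.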
I was so focused on the NA behavior that I did not notice the other problems! I have been using the more verbose ~ formulation as much as possible, and you are reinforcing that. I prefer things to break rather than silently give an unexpected result.
Yes, this is an unfortunate problem with median:
median(1, 2, 3, 4)
#> [1] 1
Created on 2022-08-24 by the reprex package (v2.0.1)
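That result follows from R's argument matching: `median()` is declared as `median(x, na.rm = FALSE, ...)`, so only the first positional value is treated as data. A short base R sketch of the difference (the variable names are illustrative only):

```r
# median() has the signature median(x, na.rm = FALSE, ...), so in
# median(1, 2, 3, 4) only 1 is bound to x; 2 is matched to na.rm and
# the remaining values fall into `...`.
print(names(formals(median)))

spread    <- median(1, 2, 3, 4)     # data is just 1
collected <- median(c(1, 2, 3, 4))  # data is the whole vector

# sum() is declared as sum(..., na.rm = FALSE), so it adds up every
# positional value and is unaffected.
total <- sum(1, 2, 3, 4)

print(c(spread = spread, collected = collected, total = total))
```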
I searched GitHub issues, and this may be related to #751.
When passing na.rm = TRUE to mean or median within pmap, the result is NA if the first element is NA. Using apply shows the expected behavior. sum, max, and min all agree between pmap and apply.
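A sketch of why the first element matters, assuming that `pmap()` calls the function once per row with each column value as a separate positional argument (so a row of NA, 2, NA becomes `median(NA, 2, NA, na.rm = TRUE)`). The base R `do.call()` below only imitates that call shape; it is not purrr's implementation, and the variable names are illustrative.

```r
# One row of the data, spread into separate arguments the way a
# row-wise call would receive them (an imitation, not purrr itself).
row_na_first <- list(NA_real_, 2, NA_real_)
row_na_later <- list(2, NA_real_, NA_real_)

# Only the first value is bound to x. na.rm = TRUE then drops it and
# leaves nothing to summarise, so the result is NA.
med_first <- do.call(median, c(row_na_first, na.rm = TRUE))

# Here x is 2, so the call appears to work, but the NAs were never
# looked at.
med_later <- do.call(median, c(row_na_later, na.rm = TRUE))

# sum() takes all values through `...`, so it agrees with apply().
sum_first <- do.call(sum, c(row_na_first, na.rm = TRUE))

# Combining the row into one vector first gives the expected median.
med_fixed <- median(unlist(row_na_first), na.rm = TRUE)

print(c(med_first = med_first, med_later = med_later,
        sum_first = sum_first, med_fixed = med_fixed))
```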