replace_na.data.frame() should be able to handle single values #1108

DanChaltiel · 2021-03-13T19:55:14Z

When using replace_na() on a dataframe, you have to use the columns names, as using a single value will raise an error:

library(tidyr)
x=tibble(x=1:5,y=rep("huh",5))
x[1:2,]=NA
replace_na(x, "Missing")
#> Error in replace_na.data.frame(x, "Missing"): is_list(replace) is not TRUE
replace_na(x, list(x="Missing", y="Missing"))
#> # A tibble: 5 x 2
#>   x       y      
#>   <chr>   <chr>  
#> 1 Missing Missing
#> 2 Missing Missing
#> 3 3       huh    
#> 4 4       huh    
#> 5 5       huh

^{Created on 2021-03-13 by the reprex package (v1.0.0)}

However, this is quite troublesome when you don't know the names of the dataframe beforehand.

Moreover, I see no obvious place where replace_na(x, "Missing") would be error-prone, so I do not understand why replace has to be a list in this case.

Would this behavior be considered as a feature?

The text was updated successfully, but these errors were encountered:

hadley · 2021-08-23T16:55:05Z

I think we're mostly moving away from functions like this in favour of vectorised functions + across, e.g.

library(dplyr, warn.conflicts = FALSE)
df <- tibble(x = 1:5, y = rep("huh",5))
df[1:2,] <- NA

df %>% mutate(across(where(is.character), coalesce, "Missing"))
#> # A tibble: 5 × 2
#>       x y      
#>   <int> <chr>  
#> 1    NA Missing
#> 2    NA Missing
#> 3     3 huh    
#> 4     4 huh    
#> 5     5 huh

^{Created on 2021-08-23 by the reprex package (v2.0.0)}

This isn't to say that replace_na() is going away, just that I'm leery of adding new features to it at the moment.

hadley · 2021-08-23T17:24:19Z

Hmmmm, but we have this api in pivot_longer()/pivot_wider() (e.g. #1111) so we should implement it consistently.

DanChaltiel · 2021-08-24T11:10:17Z

I'm leery of adding new features to it at the moment.

On second thought, this might look more like correcting a bug than adding a feature.

IMHO, the fact that replace_na() has a method for data frames is a syntactic sugar that, although unnecessary, comes often quite handy.

To me, the example you provided is a lot longer and more verbous than the simple call to replace_na(), without being much clearer or tidier.

But of course replace_na.data.frame() could call across() internally.

DavisVaughan · 2021-11-18T21:56:32Z

Note to self: Remember to update complete()'s fill documentation, and add a test there too

DavisVaughan · 2021-11-18T22:17:51Z

Actually, do we really want this? This feels like a step in the wrong direction. With the replace_na() updates in #1219, the original example doesn't work anymore because "Missing" can't be converted to integer (i.e. the type of x).

library(tidyr)

df <- tibble(
  x = 1:5, 
  y = rep("huh",5)
)

replace_na(df, list(x = "Missing", y = "Missing"))
#> Error: Can't convert `replace$x` <character> to match type of `data$x` <integer>.

# So this wouldn't work!
# replace_na(df, "Missing")

Implementing this would mean that every column in df would have to be the same type (because the 1 replace value would be applied to every column), which seems extremely unlikely for the typical analysis.

It would also make code harder to read. i.e. is replace_na(x, 1) replacing values in a vector x or a data frame x?

DanChaltiel · 2021-11-22T09:05:56Z

Well, you are right, this would have to cast every column to the target class.

While it is definitely not as elegant as I initially thought, I don't think this is error-prone and I would still use this feature (for what it's worth).

Moreover, warnings could be implemented in case of multiple classes (e.g. Warning, every column has been coerced to character.).

hadley · 2021-12-03T16:47:22Z

Yeah, based on @DavisVaughan's further exploration I've gone back to my original position and I don't think it's worth changing replace_na().

hadley closed this as completed Aug 23, 2021

hadley reopened this Aug 23, 2021

hadley added feature a feature request or enhancement missing values 💀 labels Aug 23, 2021

hadley mentioned this issue Aug 23, 2021

make fill argument in complete work similarly to value_fill in pivot: assume everything() if a single value is passed #1111

Closed

DavisVaughan added the ask label Nov 18, 2021

hadley closed this as completed Dec 3, 2021

DavisVaughan mentioned this issue Apr 13, 2022

complete(fill) should accept single value #1351

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

replace_na.data.frame() should be able to handle single values #1108

replace_na.data.frame() should be able to handle single values #1108

DanChaltiel commented Mar 13, 2021 •

edited

Loading

hadley commented Aug 23, 2021

hadley commented Aug 23, 2021

DanChaltiel commented Aug 24, 2021 •

edited

Loading

DavisVaughan commented Nov 18, 2021

DavisVaughan commented Nov 18, 2021 •

edited

Loading

DanChaltiel commented Nov 22, 2021

hadley commented Dec 3, 2021

replace_na.data.frame() should be able to handle single values #1108

replace_na.data.frame() should be able to handle single values #1108

Comments

DanChaltiel commented Mar 13, 2021 • edited Loading

hadley commented Aug 23, 2021

hadley commented Aug 23, 2021

DanChaltiel commented Aug 24, 2021 • edited Loading

DavisVaughan commented Nov 18, 2021

DavisVaughan commented Nov 18, 2021 • edited Loading

DanChaltiel commented Nov 22, 2021

hadley commented Dec 3, 2021

DanChaltiel commented Mar 13, 2021 •

edited

Loading

DanChaltiel commented Aug 24, 2021 •

edited

Loading

DavisVaughan commented Nov 18, 2021 •

edited

Loading