Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

replace_na.data.frame() should be able to handle single values #1108

Closed
DanChaltiel opened this issue Mar 13, 2021 · 7 comments
Closed

replace_na.data.frame() should be able to handle single values #1108

DanChaltiel opened this issue Mar 13, 2021 · 7 comments
Labels
ask :bowtie: feature a feature request or enhancement missing values 💀

Comments

@DanChaltiel
Copy link

DanChaltiel commented Mar 13, 2021

When using replace_na() on a dataframe, you have to use the columns names, as using a single value will raise an error:

library(tidyr)
x=tibble(x=1:5,y=rep("huh",5))
x[1:2,]=NA
replace_na(x, "Missing")
#> Error in replace_na.data.frame(x, "Missing"): is_list(replace) is not TRUE
replace_na(x, list(x="Missing", y="Missing"))
#> # A tibble: 5 x 2
#>   x       y      
#>   <chr>   <chr>  
#> 1 Missing Missing
#> 2 Missing Missing
#> 3 3       huh    
#> 4 4       huh    
#> 5 5       huh

Created on 2021-03-13 by the reprex package (v1.0.0)

However, this is quite troublesome when you don't know the names of the dataframe beforehand.

Moreover, I see no obvious place where replace_na(x, "Missing") would be error-prone, so I do not understand why replace has to be a list in this case.

Would this behavior be considered as a feature?

@hadley
Copy link
Member

hadley commented Aug 23, 2021

I think we're mostly moving away from functions like this in favour of vectorised functions + across, e.g.

library(dplyr, warn.conflicts = FALSE)
df <- tibble(x = 1:5, y = rep("huh",5))
df[1:2,] <- NA

df %>% mutate(across(where(is.character), coalesce, "Missing"))
#> # A tibble: 5 × 2
#>       x y      
#>   <int> <chr>  
#> 1    NA Missing
#> 2    NA Missing
#> 3     3 huh    
#> 4     4 huh    
#> 5     5 huh

Created on 2021-08-23 by the reprex package (v2.0.0)

This isn't to say that replace_na() is going away, just that I'm leery of adding new features to it at the moment.

@hadley hadley closed this as completed Aug 23, 2021
@hadley
Copy link
Member

hadley commented Aug 23, 2021

Hmmmm, but we have this api in pivot_longer()/pivot_wider() (e.g. #1111) so we should implement it consistently.

@DanChaltiel
Copy link
Author

DanChaltiel commented Aug 24, 2021

I'm leery of adding new features to it at the moment.

On second thought, this might look more like correcting a bug than adding a feature.

IMHO, the fact that replace_na() has a method for data frames is a syntactic sugar that, although unnecessary, comes often quite handy.

To me, the example you provided is a lot longer and more verbous than the simple call to replace_na(), without being much clearer or tidier.

But of course replace_na.data.frame() could call across() internally.

@DavisVaughan
Copy link
Member

Note to self: Remember to update complete()'s fill documentation, and add a test there too

@DavisVaughan
Copy link
Member

DavisVaughan commented Nov 18, 2021

Actually, do we really want this? This feels like a step in the wrong direction. With the replace_na() updates in #1219, the original example doesn't work anymore because "Missing" can't be converted to integer (i.e. the type of x).

library(tidyr)

df <- tibble(
  x = 1:5, 
  y = rep("huh",5)
)

replace_na(df, list(x = "Missing", y = "Missing"))
#> Error: Can't convert `replace$x` <character> to match type of `data$x` <integer>.

# So this wouldn't work!
# replace_na(df, "Missing")

Implementing this would mean that every column in df would have to be the same type (because the 1 replace value would be applied to every column), which seems extremely unlikely for the typical analysis.

It would also make code harder to read. i.e. is replace_na(x, 1) replacing values in a vector x or a data frame x?

@DanChaltiel
Copy link
Author

Well, you are right, this would have to cast every column to the target class.

While it is definitely not as elegant as I initially thought, I don't think this is error-prone and I would still use this feature (for what it's worth).

Moreover, warnings could be implemented in case of multiple classes (e.g. Warning, every column has been coerced to character.).

@hadley
Copy link
Member

hadley commented Dec 3, 2021

Yeah, based on @DavisVaughan's further exploration I've gone back to my original position and I don't think it's worth changing replace_na().

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ask :bowtie: feature a feature request or enhancement missing values 💀
Projects
None yet
Development

No branches or pull requests

3 participants