Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update replace_na() to utilize vctrs #1219

Merged
merged 3 commits into from
Nov 18, 2021

Conversation

DavisVaughan
Copy link
Member

Revival of #1109
Closes #1168
Closes #912 (by mentioning that you should use across() in the NEWS bullet)

This PR updates replace_na() to utilize vctrs through vec_equal_na() and vec_assign(). This has two breaking changes:

  • vec_assign() always casts the RHS to the LHS, but previously with [<- the reverse could be true. I think this is a step in the right direction

    • This means you can't do replace = 1.5 on an integer vector
    • More practically, it means for lists you can't do replace = "foo", it must be replace = list("foo")
  • vec_equal_na() is now used for detecting missing values. For list-cols, it will only detect NULL as missing, while the current implementation will replace NULL and empty atomics like integer(). Again, I think this is a net positive.

@DavisVaughan
Copy link
Member Author

DavisVaughan commented Nov 12, 2021

Caught a few broken revdeps from this. I still think it is worth pushing forward with this change, sending PRs as required, because these look like real coding errors that we could help fix, and 7 PRs / issues isn't that many. Thoughts @hadley?

(It is especially worth moving forward with this if we also upgrade fill() and drop_na(), as it would be weird to upgrade some but not all to vctrs)

Summary:

  • 13 revdeps affected
  • 8 unique issues. Generally related to incorrectly using either replace_na(<dbl>, replace = <chr>) or the reverse, replace_na(<chr>, replace = <dbl>). This is stricter now through usage of vec_assign() over [<-.
  • 2 unknown issues

Copy link
Member

@hadley hadley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to do the work on the revdeps, I think it's reasonable to fix this.

Prior to release, we should also spend a little time running down the "failures to check" and see if we can align more closely to CRAN.

col_arg <- col_args[[i]]
value_arg <- value_args[[i]]

check_replacement(value, col_arg)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need this? Or can we rely on vec_assign() to apply the recycling rules appropriately?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we still need this, just because of the way replace_na() works.

Otherwise the replace value would have to be either size 1 or the same size as sum(is.na(data)), and that is almost never the case because you don't know how many missing values there are ahead of time.

x <- c(1, 2, NA, 3, NA)

# This works, of course
replace_na(x, 0)
#> [1] 1 2 0 3 0

# This would actually work too if we removed the restriction
# (because there are two missing values)
# (I don't think we want this to work)
replace_na(x, c(0, -1))
#> [1] 1 2 0 3 -1

# This would not work
# (because replace_na() wants to replace with something of length 2)
y <- c(1, 2, 3, 4, 5)
replace_na(x, y)
# Can't recycle `replace` (size 5) to size 2.

@DavisVaughan DavisVaughan force-pushed the feature/replace-na-refresh branch 2 times, most recently from d181a0d to 04493b5 Compare November 18, 2021 21:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

replace_na() will replace any empty input replace_na lacks tidyselect specification
2 participants