Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix glue encoding #81

Merged
merged 8 commits into from
Feb 12, 2018
Merged

Fix glue encoding #81

merged 8 commits into from
Feb 12, 2018

Conversation

dpprdan
Copy link
Contributor

@dpprdan dpprdan commented Feb 9, 2018

glue() does not always return UTF-8 encoded strings

a <- "ä"
Encoding(glue("{a}"))
#> [1] "latin1"

"That is strange, because there is even a test for that", you might think. Well, that test does not actually test for identical encodings, because identical() and by extension expect_identical() do not care about encodings.

x <- "fa\xE7ile"
Encoding(x) <- "latin1"
identical(x, enc2utf8(x))
#> [1] TRUE

So I fixed the test by explicitly testing for Encoding() and added enc2utf8() to both glue() and collapse().

@dpprdan
Copy link
Contributor Author

dpprdan commented Feb 9, 2018

from help(identical):
"Character strings are regarded as identical if they are in different marked encodings but would agree when translated to UTF-8."

FYI I stumbled upon this, while trying to construct a SQL query with glue_sql and RPostgreSQL complains when the query is not encoded as UTF-8.

@jimhester
Copy link
Collaborator

Thanks, you are correct. I think a slightly more robust way to handle this is to add enc2utf8() to as_glue.character

diff --git a/R/glue.R b/R/glue.R
index 472f079..f9ab287 100644
--- a/R/glue.R
+++ b/R/glue.R
@@ -227,7 +227,7 @@ as_glue.glue <- function(x, ...) {
 #' @export
 as_glue.character <- function(x, ...) {
   class(x) <- c("glue", "character")
-  x
+  enc2utf8(x)
 }

Also I would prefer leaving the current expectations as they are and adding an additional expectation explicitly testing for "UTF-8", e.g. expect_equal(Encoding(glue("{x}")), "UTF-8")

@dpprdan
Copy link
Contributor Author

dpprdan commented Feb 10, 2018

Excellent ideas. Actually I had thought about as_glue but wasn't sure about side effects I might not be aware of.

A bit off-topic: When I run devtools::test() I get errors unless I do set the language to English, i.e. Sys.setenv(LANGUAGE = "en"). By default it is set to German, but then some messages are not what they are expected to be. Do you know if this is the intended behaviour (i.e. users should be aware of this) or should I file an issue?

@jimhester
Copy link
Collaborator

Thanks, looks good now!

CRAN always runs tests in English locale, and unfortunately there isn't a robust way to write non-brittle tests that respect the users' non-english locale.

@jimhester jimhester merged commit 3511d30 into tidyverse:master Feb 12, 2018
@dpprdan
Copy link
Contributor Author

dpprdan commented Feb 12, 2018

👍
I assumed that English locale is the default. I was just wondering whether testthat should always Sys.setenv(LANGUAGE = "en") automatically.

@jimhester
Copy link
Collaborator

We experimented with that, but there are some cases where a given string is only translated once, even if you later switch encodings, so even doing that is not enough. R CMD check should always run code in the C locale IIRC, so doing that is one way to ensure the tests pass regardless of local language.

@dpprdan
Copy link
Contributor Author

dpprdan commented Feb 12, 2018

Good to know, thanks!!

@dpprdan dpprdan deleted the fix_glue_encoding branch May 18, 2022 08:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants