Encoding issues on windows #73

t-kalinowski · 2017-11-09T15:19:49Z

Posting as an image since GH seems to mangle it differently.

Not sure the best way to fix this. Seems like the easiest fix is for upstream udunits2 to return strings with the correct Encoding.

edzer · 2018-06-08T12:41:00Z

Did you have a look at how this is now, with the udunits branch (soon master)?

t-kalinowski · 2018-06-09T13:02:04Z

So, as I mentioned in another post, I don't have easy access to a Windows box these days. However, this issue also manifested on Mac.

It seems that parts of it got fixed during the transition, but not completely. What remains to be done is to pull through udunits's encoding information (i.e., what was last passed to units:::ud_set_encoding()) and assign it to the character vector with Encoding<-() before returning to the user. Also, there are some minor differences between how udunits likes its specification string and how R likes it that we should patch over ("UTF-8" vs "utf8"). A simple switch statement should probably do the trick.
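A minimal sketch of that idea, not the package's actual code. The helper names `ud_to_r_encoding()` and `mark_encoding()` are hypothetical, and the set of udunits-style names handled ("utf8", "latin1", "iso-8859-1", "ascii") is an assumption about what gets passed to units:::ud_set_encoding(). The raw bytes stand in for a string coming back from udunits without an encoding mark.

```r
# Hypothetical helper: translate a udunits-style encoding name into one of
# the marks accepted by Encoding<-() ("UTF-8", "latin1", "unknown").
ud_to_r_encoding <- function(ud_enc) {
  switch(tolower(ud_enc),
    "utf8" = ,
    "utf-8" = "UTF-8",
    "latin1" = ,
    "iso-8859-1" = "latin1",
    "ascii" = "unknown",
    stop("unsupported encoding: ", ud_enc)
  )
}

# Hypothetical helper: mark a character vector before returning it to the user.
mark_encoding <- function(x, ud_enc) {
  Encoding(x) <- ud_to_r_encoding(ud_enc)
  x
}

# Stand-in for an unmarked string returned from C code:
# the UTF-8 bytes of the micro sign followed by "m".
unmarked <- rawToChar(as.raw(c(0xc2, 0xb5, 0x6d)))
marked <- mark_encoding(unmarked, "utf8")
Encoding(marked)
```

The switch falls through for the aliases, so "utf8" and "UTF-8" both map to R's "UTF-8" mark.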

Here is a screenshot of how this looks on the Mac currently (this is with the master branch from a few minutes ago):

Enchufa2 · 2018-06-09T13:48:59Z

I don't think so, because character vectors don't seem to have any encoding by default:

y <- "dummy"
Encoding(y)
#> [1] "unknown"

Instead, I think that units should simply set udunits's encoding appropriately according to the current session (I don't know if there's a better way to get the encoding than utils::localeToCharset()). That means, AFAIK, UTF-8 for Unix systems and latin1 for Windows.
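A rough sketch of what picking the encoding from the session could look like. The helper name `charset_to_ud()` and the udunits-style return values ("utf8", "latin1", "ascii") are assumptions for illustration; only utils::localeToCharset() is a real function here, and it can return NA when the charset cannot be determined.

```r
# Hypothetical helper: map the session charset to a udunits-style name.
# localeToCharset() may return several candidates; the first is used.
charset_to_ud <- function(charset = utils::localeToCharset()[1]) {
  if (is.na(charset)) {
    return("ascii")  # assumption: fall back to ASCII when undetermined
  }
  cs <- toupper(charset)
  if (cs %in% c("UTF-8", "UTF8")) {
    "utf8"
  } else if (cs %in% c("ISO8859-1", "ISO-8859-1", "LATIN1")) {
    "latin1"
  } else {
    "ascii"  # assumption: anything else is treated conservatively
  }
}

# The result could then be handed to the package's encoding setter,
# e.g. at load time (not run here, since it needs the units package):
# units:::ud_set_encoding(charset_to_ud())
charset_to_ud()
```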

t-kalinowski · 2018-06-09T14:40:29Z

> Encoding("μ")
[1] "UTF-8"
> Encoding("abc")
[1] "unknown"

R drops the encoding mark for ASCII-only strings and retains the UTF-8 marker only when a string actually contains non-ASCII characters.

t-kalinowski · 2018-06-09T14:42:18Z

Also, setting the "UTF-8" encoding on ASCII-only vectors is safe:

> x <- "abc"
> Encoding(x) <- "UTF-8"
> x
[1] "abc"
> Encoding(x)
[1] "unknown"

Enchufa2 · 2018-06-09T14:44:40Z

You are right. Then, I would do both things: 1) set the proper encoding according to the locale and 2) apply Encoding on every string returned from udunits.
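For point 2, one possible shape is a small wrapper that marks whatever a string-returning function hands back. This is only a sketch: `mark_result()` and `fake_ud_symbol()` are hypothetical names, and the fake function stands in for a call into udunits that returns unmarked UTF-8 bytes.

```r
# Hypothetical wrapper: return a version of `f` whose character results
# carry the given encoding mark; non-character results pass through.
mark_result <- function(f, mark) {
  force(f)
  force(mark)
  function(...) {
    out <- f(...)
    if (is.character(out)) {
      Encoding(out) <- mark
    }
    out
  }
}

# Stand-in for a udunits call: UTF-8 bytes of the micro sign plus "m",
# returned without any encoding mark.
fake_ud_symbol <- function() rawToChar(as.raw(c(0xc2, 0xb5, 0x6d)))

get_symbol <- mark_result(fake_ud_symbol, mark = "UTF-8")
Encoding(fake_ud_symbol())
Encoding(get_symbol())
```

Because R ignores the mark on ASCII-only strings, applying the wrapper to every returned string should not change plain ASCII unit names.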

t-kalinowski · 2018-06-09T14:45:32Z

I agree with both your points.

t-kalinowski mentioned this issue Nov 9, 2017

Encoding issues on windows pacificclimate/Rudunits2#18

Open

Enchufa2 mentioned this issue Dec 21, 2018

missing encoding of character representation of units #183

Closed

Enchufa2 mentioned this issue Jan 19, 2019

Fix #183: set native encoding on load #185

Merged

edzer closed this as completed in #185 Jan 19, 2019
