Add support for Windows UTF-8 version of R #269

Closed
SpecterShell opened this issue Apr 6, 2021 · 30 comments

Comments

@SpecterShell

An experimental build of R using UTF-8 as native encoding on Windows has been released recently, and here is the instruction: https://svn.r-project.org/R-dev-web/trunk/WindowsBuilds/winutf8/ucrt3/howto.html

I'm on Windows 10 20H2 and have set $LANG to zh_CN.UTF-8. After setting the code page to 65001 (UTF-8), the encoding in Rterm successfully switches to UTF-8, but the encoding in radian remains unchanged.

image

Most of the time the terminal and plots still handle CJK characters properly, but sometimes garbled text appears, like the red text in the screenshot.

I thought this issue came from Python itself, so I set $PYTHONUTF8 to 1 to force Python to use UTF-8, and ran python -m radian instead of radian. However, that made no difference in radian. I also tried changing the encoding directly in radian, which made it crash.
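To check what Python itself thinks the encodings are after setting PYTHONUTF8, something like the following sketch can be run with the same interpreter that runs radian. The function name `describe_python_encodings` is made up for illustration, and the report only covers Python's side; the R session embedded by radian decides its own encoding separately.

```python
import locale
import sys


def describe_python_encodings():
    """Collect the encodings Python itself is using in this process."""
    return {
        # 1 when UTF-8 mode is on (PYTHONUTF8=1 or -X utf8), otherwise 0
        "utf8_mode": sys.flags.utf8_mode,
        # Locale-derived default text encoding ('utf-8' in UTF-8 mode)
        "preferred": locale.getpreferredencoding(False),
        # Encoding used to convert file names between str and bytes
        "filesystem": sys.getfilesystemencoding(),
        # Encoding of the stdout stream, if it exposes one
        "stdout": getattr(sys.stdout, "encoding", None),
    }


if __name__ == "__main__":
    for name, value in describe_python_encodings().items():
        print(f"{name}: {value}")
```

If `utf8_mode` is 1 and garbled text still appears, the mismatch is probably not on the Python side.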

@randy3k
Owner

randy3k commented Apr 6, 2021

These are the relevant lines for decoding and encoding: https://github.com/randy3k/rchitect/blob/cb6e77c860ef180bd7f47416c94185fb9db04370/rchitect/utils.py#L142

Is there a prebuilt binary? The build steps seem a bit involved. It seems that there is one.

@SpecterShell
Author

SpecterShell commented Apr 6, 2021

https://www.r-project.org/nosvn/winutf8/ucrt3/
R-devel-win-80146-4517-4499.exe (or newer) is the installer for the prebuilt binary.

I just found that running demo(recursion) also reproduces the issue while R is waiting for the user's response. So there is no need to set up a build environment. Just install and test.
image

@randy3k
Owner

randy3k commented Apr 6, 2021

Does something like Sys.setlocale(locale = "English_United States.utf8") fix it?

@randy3k
Owner

randy3k commented Apr 6, 2021

I have to set this
Screen Shot 2021-04-06 at 2 51 47 PM
to change the default locale of python applications.

But then, things are still not working well in radian:

r$> x = "床前明月光"

r$> print(x)
[1] "�����"

r$> Sys.getlocale()
[1] "LC_COLLATE=English_United States.utf8;LC_CTYPE=English_United States.utf8;LC_MONETARY=English_United States.utf8;LC_NUMERIC=C;LC_TIME=English_United States.utf8"

r$> l10n_info()
$MBCS
[1] TRUE

$`UTF-8`
[1] TRUE

$`Latin-1`
[1] FALSE

$codepage
[1] 65001

$system.codepage
[1] 65001
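For readers unfamiliar with how this kind of output arises, here is a small Python sketch of the general mechanism: bytes written in one encoding and read back in another. It is only an illustration; it does not claim to reproduce the exact path the string takes through radian and R, and the helper name `garble` is invented for the example.

```python
TEXT = "床前明月光"


def garble(text, written_as, read_as):
    """Encode text with one codec and decode the bytes with another."""
    return text.encode(written_as).decode(read_as, errors="replace")


if __name__ == "__main__":
    # Bytes written as GBK (code page 936) but read as UTF-8:
    # invalid sequences become U+FFFD replacement characters
    print(garble(TEXT, "gbk", "utf-8"))
    # The same codec on both sides round-trips cleanly
    print(garble(TEXT, "utf-8", "utf-8"))
```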

@randy3k
Owner

randy3k commented Apr 6, 2021

randy3k/rchitect@7c0d72a should fix the garbled text issue. However, it requires checking the beta UTF-8 setting in Region settings.

@randy3k
Owner

randy3k commented Apr 7, 2021

There is some discussion on enabling UTF-8 codepage on Windows, but it won't happen soon. So for now, the unicode setting in regional settings is the only way to enable UTF-8 on radian.

@SpecterShell
Author

In my opinion, this issue occurs because, when radian calls R, the system uses the system-wide encoding setting instead of the one set in the console, which makes R think that the current code page is not UTF-8.
Enabling the Unicode setting may fix the problem by making all applications use UTF-8, but it can break non-Unicode applications and even the system itself.
Therefore, until there is a better solution, I will use English as the default in radian or roll back R to the stable version.
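The difference between the system-wide and console code pages can be inspected from Python with the Win32 calls `GetACP`, `GetConsoleCP`, and `GetConsoleOutputCP`. The sketch below is a diagnostic aid only, with an illustrative function name; it returns `None` on non-Windows platforms, and it does not show which value R actually consults.

```python
import ctypes
import sys


def windows_code_pages():
    """Return the ANSI and console code pages, or None when not on Windows."""
    if sys.platform != "win32":
        return None
    kernel32 = ctypes.windll.kernel32
    return {
        # Active ANSI code page; 65001 when the Region "beta UTF-8" box is checked
        "ansi": kernel32.GetACP(),
        # Code pages attached to the current console (affected by `chcp`)
        "console_input": kernel32.GetConsoleCP(),
        "console_output": kernel32.GetConsoleOutputCP(),
    }


if __name__ == "__main__":
    print(windows_code_pages())
```

If the hypothesis above is right, one would expect `ansi` to stay at the legacy code page after `chcp 65001` while the console values change.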

@randy3k
Owner

randy3k commented Apr 7, 2021

We have to wait until python supports Unicode codepage on Windows.

See https://groups.google.com/g/python-ideas/c/iPgyhq3_zyI/m/uGXrunPWAQAJ

@SpecterShell
Author

UTF-8 has become the default encoding now. 🤔
https://cran.r-project.org/bin/windows/base/NEWS.R-4.2.0.html

@SpecterShell
Author

Bare Rterm uses English by default until LANG is set. However, radian (or rchitect) always uses the system locale, resulting in garbled text.
A temporary workaround is to set LANG to en_US.UTF-8 at startup.
image

@randy3k
Owner

randy3k commented Apr 25, 2022

I guess we'd need to force UTF-8 encoding for newer versions of R.

@randy3k
Owner

randy3k commented May 3, 2022

Hi @SpecterShell, would it work if you set the variable LANG using Sys.setenv(LANG = "en_US.UTF-8")?

@SpecterShell
Author

Works
image

@randy3k
Owner

randy3k commented May 4, 2022

Then the solution for us is simpler: we could just set LANG if it is unset.
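The "set it if it is unset" idea could look roughly like the sketch below. This is not the code from the actual fix; the function name `ensure_utf8_lang` and the default value are assumptions for illustration, and the call would presumably have to happen before R is initialized so that R sees the variable.

```python
import os


def ensure_utf8_lang(env=None, default="en_US.UTF-8"):
    """Set LANG to a UTF-8 locale only when the user has not set it."""
    env = os.environ if env is None else env
    # Treat an empty value the same as a missing one
    if not env.get("LANG"):
        env["LANG"] = default
    return env["LANG"]


if __name__ == "__main__":
    print(ensure_utf8_lang())
```

Leaving an existing LANG untouched keeps a user's explicit choice, such as zh_CN.UTF-8, intact.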

@randy3k
Owner

randy3k commented May 4, 2022

@SpecterShell
Could you test if f016d46 fixes it.

@SpecterShell
Author

Fixed now. Thank you for the fix!
image

@randy3k
Owner

randy3k commented May 5, 2022

Great, radian 0.6.2 is on the way.

@Yunuuuu

Yunuuuu commented Jun 29, 2022

Setting only the LANG environment variable is not sufficient to make this work well; I had to set the locale manually by adding the following code to my .Rprofile.

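            # Set the locale to UTF-8 when LANG asks for it (R >= 4.2 only)
            is_utf8_supported <- grepl(
                "UTF-8|utf8", Sys.getenv("LANG"),
                ignore.case = TRUE
            ) &&
                # Compare the full version: R.version$major and $minor are
                # character strings, and checking them separately would
                # reject a future 5.0
                getRversion() >= "4.2.0"
            if (is_utf8_supported) {
                # Sys.setlocale() warns when the OS rejects the locale name
                suppressWarnings(Sys.setlocale("LC_ALL", Sys.getenv("LANG")))
            }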

This is my radian version:
image

When I use plain radian without the above code in my .Rprofile, sessionInfo() and l10n_info() indicate that UTF-8 encoding is not in effect. When reading a file containing special characters with data.table::fread, some characters are not parsed correctly.
image
After setting the locale, this works fine, and special characters are parsed correctly when reading the same file with data.table::fread.
image
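A version gate like the one in the .Rprofile snippet above is more robust when whole versions are compared, rather than the major and minor parts independently, since a separate minor check would reject a hypothetical 5.0. The following Python sketch shows the same decision with tuple comparison; the function names are invented for illustration and the version strings are assumed to be purely numeric, as R's are.

```python
def parse_version(text):
    """Turn '4.2.1' into (4, 2, 1) so versions compare as tuples."""
    return tuple(int(part) for part in text.split("."))


def wants_utf8_locale(lang, r_version, minimum="4.2.0"):
    """True when LANG names a UTF-8 locale and R is at least `minimum`."""
    lowered = lang.lower()
    names_utf8 = "utf-8" in lowered or "utf8" in lowered
    return names_utf8 and parse_version(r_version) >= parse_version(minimum)


if __name__ == "__main__":
    print(wants_utf8_locale("en_US.UTF-8", "4.2.1"))
    print(wants_utf8_locale("en_US.UTF-8", "5.0.0"))
```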

@Yunuuuu

Yunuuuu commented Jun 29, 2022

RStudio already sets an appropriate UTF-8 locale for R 4.2.
image

@kristy-hu

Where is the .Rprofile in vscode? I am new to R, and have little idea as to where to configure it... Thanks in advance!

@SpecterShell
Author

SpecterShell commented Aug 6, 2022

Where is the .Rprofile in vscode? I am new to R, and have little idea as to where to configure it... Thanks in advance!

%USERPROFILE%\Documents\.Rprofile

@kristy-hu

kristy-hu commented Aug 6, 2022

Hi @SpecterShell, there was no .Rprofile under my document folder.
However, when I typed ~/.Rprofile in R, it just asked me to create a new file. And after I pasted the code, the UTF-8 was turned on. And the problem was sort of solved.

Yet I have found two things quite similar to .Rprofile in R's directory:

  1. Rprofile under dir R\R-4.2.1\library\base\R
  2. Rprofile.site under dir R\R-4.2.1\etc

Is it possible to configure either of these to make the UTF-8 setting global? Thanks!
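For orientation, here is a sketch of the lookup order as described in R's startup documentation (`?Startup`), to the best of my understanding: the user profile is the file named by R_PROFILE_USER if set, otherwise .Rprofile in the working directory, then in the home directory; the site-wide file is the one named by R_PROFILE if set, otherwise etc/Rprofile.site under R's home. The Rprofile under library\base\R is R's own system profile and is not meant to be edited. The function names below are illustrative, not part of R or radian.

```python
from pathlib import Path


def user_profile_candidates(env, cwd, home):
    """User-level profile paths, in the order R is documented to try them."""
    override = env.get("R_PROFILE_USER")
    if override:
        return [Path(override)]
    return [Path(cwd) / ".Rprofile", Path(home) / ".Rprofile"]


def site_profile(env, r_home):
    """Site-wide profile path, applied for every user of this R install."""
    override = env.get("R_PROFILE")
    if override:
        return Path(override)
    return Path(r_home) / "etc" / "Rprofile.site"


if __name__ == "__main__":
    print(user_profile_candidates({}, ".", Path.home()))
```

Under that reading, Rprofile.site is the place for a machine-wide setting, while a user-level .Rprofile only affects one account.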

@SpecterShell
Author

You can create one if it doesn't exist.

@kristy-hu

You can create one if it doesn't exist.

OK, thanks!

@kristy-hu

I think perhaps the best solution for this is adding Sys.setenv(LANG = "en_US.UTF-8") directly to .Rprofile, and the garbled text will never appear again. Now everything just works fine in English. The Chinese translation is actually a bit misleading...

@kongdd

kongdd commented Sep 1, 2022

After setting the locale with Sys.setlocale("LC_ALL", "Chinese (Simplified)_China.utf8"), I get the following warnings when using devtools::load_all():

There were 15 warnings (use warnings() to see them)

r$> warnings()
Warning messages:
1: In Sys.setlocale("LC_CTYPE", ctype) :
  using locale code page other than 936 may cause problems

@beansrowning

beansrowning commented Oct 20, 2022

Not sure if I'm in the right place here, but R 4.2 + radian seems to be entirely non-functional in my case. No change to environment variables seems to help.

Radian fails on start:

System setup

radian --version
radian version: 0.6.4
r executable: C:\Users\<>\AppData\Local\Programs\R\R-4.2.1\bin\R
r version: 4.2.1
python executable: C:\Users\<>\.pyenv\pyenv-win\versions\3.10.8\python.exe
python version: 3.10.8

Stack trace

radian
Traceback (most recent call last):
  File "C:\Users\<>\.pyenv\pyenv-win\versions\3.10.8\Scripts\radian-script.py", line 33, in <module>
    sys.exit(load_entry_point('radian==0.6.4', 'console_scripts', 'radian')())
  File "C:\Users\<>\.pyenv\pyenv-win\versions\3.10.8\lib\site-packages\radian\app.py", line 108, in main
    RadianApplication(r_home, ver=__version__).run(options, cleanup=cleanup)
  File "C:\Users\<>\.pyenv\pyenv-win\versions\3.10.8\lib\site-packages\radian\app.py", line 217, in run
    self.session = create_radian_prompt_session(options, settings)
  File "C:\Users\<>\.pyenv\pyenv-win\versions\3.10.8\lib\site-packages\radian\prompt_session.py", line 153, in create_radian_prompt_session
    session = RadianPromptSession(
  File "C:\Users\<>\.pyenv\pyenv-win\versions\3.10.8\lib\site-packages\radian\lineedit\prompt.py", line 72, in __init__
    super().__init__(*args, **kwargs)
  File "C:\Users\<>\.pyenv\pyenv-win\versions\3.10.8\lib\site-packages\prompt_toolkit\shortcuts\prompt.py", line 473, in __init__
    self.default_buffer = self._create_default_buffer()
  File "C:\Users\<>\.pyenv\pyenv-win\versions\3.10.8\lib\site-packages\radian\lineedit\prompt.py", line 163, in _create_default_buffer
    return ModalBuffer(
  File "C:\Users\<>\.pyenv\pyenv-win\versions\3.10.8\lib\site-packages\radian\lineedit\buffer.py", line 185, in __init__
    super().__init__(*args, **kwargs)
  File "C:\Users\<>\.pyenv\pyenv-win\versions\3.10.8\lib\site-packages\radian\lineedit\buffer.py", line 22, in __init__
    super().__init__(*args, **kwargs)
  File "C:\Users\<>\.pyenv\pyenv-win\versions\3.10.8\lib\site-packages\prompt_toolkit\buffer.py", line 313, in __init__
    self.reset(document=document)
  File "C:\Users\<>\.pyenv\pyenv-win\versions\3.10.8\lib\site-packages\radian\lineedit\buffer.py", line 264, in reset
    self._reset_history()
  File "C:\Users\<>\.pyenv\pyenv-win\versions\3.10.8\lib\site-packages\radian\lineedit\buffer.py", line 257, in _reset_history
    for m, item in self.history.load():
  File "C:\Users\<>\.pyenv\pyenv-win\versions\3.10.8\lib\site-packages\radian\lineedit\history.py", line 9, in load
    self._loaded_strings = list(self.load_history_strings())
  File "C:\Users\<>\.pyenv\pyenv-win\versions\3.10.8\lib\site-packages\radian\lineedit\history.py", line 68, in load_history_strings
    backup = f.readlines()
  File "C:\Users\<>\.pyenv\pyenv-win\versions\3.10.8\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 2546: character maps to <undefined>

EDIT:

The culprit is here:

with open(self.filename, "r+") as f:

This tries to load ~/.radian_history without specifying UTF-8 encoding. Deleting my old history file seems to resolve it in this case.
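A minimal sketch of why this fails and what an explicit encoding changes, assuming the history file was written as UTF-8: the curly quote U+201D encodes to the bytes E2 80 9D, and cp1252 has no mapping for 0x9d, which matches the byte in the traceback. The helper `read_history` is illustrative and is not radian's actual code; `errors="replace"` is a design choice so that one bad byte cannot block startup.

```python
import os
import tempfile

# Curly quotes; U+201D encodes to E2 80 9D in UTF-8
SAMPLE = "x <- \u201cquoted\u201d\n"


def read_history(path):
    """Read a history file as UTF-8, replacing undecodable bytes."""
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        return f.readlines()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "history")
        with open(path, "w", encoding="utf-8") as f:
            f.write(SAMPLE)
        try:
            # Mimics open() falling back to cp1252 on a Western Windows locale
            SAMPLE.encode("utf-8").decode("cp1252")
        except UnicodeDecodeError as exc:
            print("cp1252 failed:", exc.reason)
        print(read_history(path))
```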

@randy3k
Owner

randy3k commented Oct 20, 2022

Good catch, we should make sure that the history file is in utf-8.

@psychelzh

Any clue as to why the first character of the {targets} package output is not as expected?

image

Should be this instead:

image

@kebuAAA

kebuAAA commented Oct 11, 2023

Where is the history file? I hit the same UnicodeDecodeError, but in my case it happens in the radian/rutils.py file.
