Encoding issue during knitting on Windows #1944

Closed
cderv opened this issue Jan 14, 2021 · 4 comments
Labels
bug Bugs


@cderv
Collaborator

cderv commented Jan 14, 2021

Initial Context

This was reported by @thomasp85 while working on fonts support for graphic design.

Here is the initial document that exposed the issue:
---
title: "Untitled"
author: "C. Dervieux"
date: "13/01/2021"
output: html_document
---

```{r setup}
library(ggplot2)
preview_devices <- function(p, width = 2, height = 1) {
  quartz_file <- fs::path(knitr::fig_path(),  "windows.png")
  if (!dir.exists(dirname(quartz_file))) dir.create(dirname(quartz_file), recursive = TRUE)
  png(quartz_file, width, height, units = 'in', res = 300, type = "windows")
  plot(
    p + 
      ggtitle("  Windows device") + 
      theme(plot.title = element_text(size = 10, hjust = 0.5), plot.title.position = 'plot')
  )
  dev.off()
  cairo_file <- fs::path(knitr::fig_path(),  "cairo.png")
  png(cairo_file, width, height, units = 'in', res = 300, type = "cairo")
  plot(
    p + 
      ggtitle("  Cairo device") + 
      theme(plot.title = element_text(size = 10, hjust = 0.5), plot.title.position = 'plot')
  )
  dev.off()
  ragg_file <- fs::path(knitr::fig_path(),  "ragg.png")
  ragg::agg_png(ragg_file, width, height, units = 'in', res = 300)
  plot(
    p + 
      ggtitle("  Ragg device") + 
      theme(plot.title = element_text(size = 10, hjust = 0.5), plot.title.position = 'plot')
  )
  dev.off()
  list(quartz = quartz_file, cairo = cairo_file, ragg = ragg_file)
}
```

* * *

## Support of non-latin scripts

A device should recognise and properly handle scripts that flow in a
direction other than left-to-right.

- The graphic engine in R does not permit devices to handle vertical text 😞

```{r, fig.show='hold'}
hebrew_text <- "זהו טקסט בעברית"
arabic_text <- "هذا نص باللغة العربية"
Encoding(arabic_text)
p <- ggplot() + 
  geom_text(aes(x = 0, y = 1:2, label = c(arabic_text, hebrew_text)), family = "Arial") + 
  expand_limits(y = c(0, 3)) +
  theme_void() + 
  theme(panel.background = element_rect('gray90', 'white', 3))
files <- preview_devices(p)
knitr::include_graphics(files$quartz)
knitr::include_graphics(files$cairo)
knitr::include_graphics(files$ragg)
```

This leads to the following graphs when executed in the IDE, in the R console:
image
but to the following in the knitted document:
image

Issues with encoding in knitr

Take a test.Rmd file encoded in UTF-8 with this content:

```{r text}
hebrew_text <- "זהו טקסט בעברית"
Encoding(hebrew_text)
arabic_text <- "هذا نص باللغة العربية"
Encoding(arabic_text)
```

```{r}
hebrew_text
arabic_text
```

This leads to a different result in the IDE when executing the chunks (which runs the code in the R console)
image

than when knitted with knitr::knit("test.Rmd"), which results in

```r
hebrew_text <- "זהו טקסט בעברית"
Encoding(hebrew_text)
```

```
## [1] "unknown"
```

```r
arabic_text <- "هذا نص باللغة العربية"
Encoding(arabic_text)
```

```
## [1] "unknown"
```


```r
hebrew_text
```

```
## [1] "<U+05D6><U+05D4><U+05D5> <U+05D8><U+05E7><U+05E1><U+05D8> <U+05D1><U+05E2><U+05D1><U+05E8><U+05D9><U+05EA>"
```

```r
arabic_text
```

```
## [1] "<U+0647><U+0630><U+0627> <U+0646><U+0635> <U+0628><U+0627><U+0644><U+0644><U+063A><U+0629> <U+0627><U+0644><U+0639><U+0631><U+0628><U+064A><U+0629>"
```

It seems that some conversions happen during the evaluation process, leading to incorrect handling of those UTF-8 strings.
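For anyone trying to narrow this down, here is a minimal sketch for inspecting such a string outside knitr. It builds the text from `\u` escapes so the result does not depend on the encoding of the source file, and it uses `iconv()` with `sub = "Unicode"` only to illustrate what an escaped `<U+xxxx>` form looks like when the target encoding cannot represent the characters. That this is the same conversion that happens during knitting is an assumption, not something confirmed in this thread. The variable names are made up for the example.

```r
# First three letters of the Hebrew string, written as escapes so the example
# is independent of the file encoding.
hebrew_word <- "\u05d6\u05d4\u05d5"

# Strings created from \u escapes carry an explicit UTF-8 mark.
print(Encoding(hebrew_word))

# The code points are intact: U+05D6, U+05D4, U+05D5.
print(utf8ToInt(hebrew_word))
print(validUTF8(hebrew_word))

# Converting to an encoding that cannot represent Hebrew (ASCII here, as a
# stand-in for a non-UTF-8 native encoding) substitutes escaped code points.
ascii_form <- iconv(hebrew_word, from = "UTF-8", to = "ASCII", sub = "Unicode")
print(ascii_form)
```

If the strings in the knitted output look like `ascii_form`, that would be consistent with a round trip through the native encoding somewhere in the evaluation.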

@yihui you may already know about these limitations regarding encoding. Are we missing something?

@cderv cderv added the bug Bugs label Jan 14, 2021
@cderv
Collaborator Author

cderv commented Jan 14, 2021

And I think this is known and related to old encoding issues: r-lib/evaluate#59 and r-lib/evaluate#66

There was a very similar one that had a fix, but it seems it did not fix this completely: r-lib/evaluate#74

@yihui
Owner

yihui commented Jan 24, 2021

Yes, it's a known issue: r-lib/evaluate#59. Unfortunately, there's nothing we can do about it except wait until the UTF-8 build of R is officially available: https://developer.r-project.org/Blog/public/2020/07/30/windows/utf-8-build-of-r-and-cran-packages/index.html

Actually there could be a workaround, but it depends on whether your Windows supports the locale. That is, you can call Sys.setlocale(, "LANGUAGE") in .Rprofile. In this case, I don't know what the language name is, since I know nothing about Hebrew. I've tried other languages like Chinese, German, and French.
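A rough sketch of what that `.Rprofile` workaround might look like. `try_set_locale()` is a hypothetical helper, not part of knitr or base R, and the locale name passed to it is a placeholder: the thread does not say which name Windows expects for Hebrew or Arabic, so that has to be looked up for your system. It relies on `Sys.setlocale()` returning an empty string (with a warning) when the requested locale is not available.

```r
# Hypothetical helper: try to switch the locale and report whether it worked.
# The default category "LC_ALL" matches calling Sys.setlocale(, "LANGUAGE").
try_set_locale <- function(locale_name, category = "LC_ALL") {
  result <- suppressWarnings(Sys.setlocale(category, locale_name))
  # An empty string (or NULL) means the locale could not be set.
  isTRUE(nzchar(result))
}

# In .Rprofile, replace "C" with the language name your Windows supports,
# e.g. "Chinese" as mentioned above. "C" is used here only because it is
# available everywhere.
if (!try_set_locale("C")) {
  message("Requested locale is not available; keeping the current one.")
}
```

Checking the return value avoids silently knitting with the old locale when the name is not recognised.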

@yihui
Owner

yihui commented Mar 24, 2022

R 4.2.0 is coming in about a month (https://developer.r-project.org). I guess the current R-devel already works (https://cloud.r-project.org/bin/windows/base/rdevel.html). If not, we can reopen this issue and investigate further.
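To check whether a given session is in the situation described here, a small sketch like the following may help. It only reports the R version and whether the current native encoding is UTF-8, using `getRversion()` and `l10n_info()`. `uses_utf8_natively()` is a made-up helper name, and whether a TRUE result is enough to make the original document knit correctly is an assumption to verify, not something established in this thread.

```r
# Hypothetical helper: TRUE when the session's native encoding is UTF-8.
uses_utf8_natively <- function() {
  isTRUE(l10n_info()[["UTF-8"]])
}

# R 4.2.0 is the release this thread expects to bring UTF-8 support on Windows.
r_at_least_4_2 <- getRversion() >= "4.2.0"

cat("R version:", as.character(getRversion()), "\n")
cat("R >= 4.2.0:", r_at_least_4_2, "\n")
cat("Native encoding is UTF-8:", uses_utf8_natively(), "\n")
```

Including these three lines of output when reporting a related problem makes it easier to tell whether the session is still running in a non-UTF-8 locale.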

@yihui yihui closed this as completed Mar 24, 2022
@github-actions

This old thread has been automatically locked. If you think you have found something related to this, please open a new issue by following the issue guide (https://yihui.org/issue/), and link to this old issue if necessary.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 21, 2022