Unicode symbols in ggplot fail to render, but only in markdown #2256

Open
steveharoz opened this issue Dec 7, 2021 · 7 comments
Labels
bug an unexpected problem or unintended behavior theme: knitr concerns knitr package


@steveharoz
Contributor

In an Rmd file knitted to HTML, Unicode symbols in ggplot plots are rendered incorrectly.

Here's an example:

---
title: "R Notebook"
output: 
  html_document: 
    dev: ragg_png
---

Unicode text in the RMD works fine: arrows 🠜🠞

```{r}
library(ggplot2)
ggplot(mtcars) +
  aes(x=hp, y=mpg) +
  geom_point() +
  labs(y = "Arrows 🠜 🠞")
```

The axis label is mangled in the rendered HTML.

But if I copy the code for that plot into an R file, it works as expected:

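library(ggplot2)
# Draw straight to a ragg PNG device (1000 x 1000 px, scaling = 3)
ragg::agg_png("delete.png", 1000, 1000, scaling = 3)
p <- ggplot(mtcars) +
  aes(x=hp, y=mpg) +
  geom_point() +
  labs(y = "Arrows 🠜 🠞")
print(p)  # explicit print() so the plot is also drawn when the script is source()d
dev.off()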



> xfun::session_info('rmarkdown')
R version 4.1.2 (2021-11-01)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19043), RStudio 2021.9.0.351

Locale: LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252 LC_NUMERIC=C                           LC_TIME=English_United States.1252    

Package version:
  base64enc_0.1.3 digest_0.6.28   evaluate_0.14   fastmap_1.1.0   glue_1.5.1      graphics_4.1.2  grDevices_4.1.2 highr_0.9       htmltools_0.5.2 jquerylib_0.1.4 jsonlite_1.7.2  knitr_1.36     
  magrittr_2.0.1  methods_4.1.2   rlang_0.4.12    rmarkdown_2.11  stats_4.1.2     stringi_1.7.6   stringr_1.4.0   tinytex_0.35    tools_4.1.2     utils_4.1.2     xfun_0.27       yaml_2.2.1     

Pandoc version: 2.14.0.3
@cderv
Collaborator

cderv commented Dec 7, 2021

Thanks for the report. I believe this is related to knitr, and maybe to the ragg_png device.

Running just knitr::knit() on your example reproduces the issue.
It creates a .md file and a PNG file, figure/unnamed-chunk-1-1.png, that shows the same mangled label.

I think there were other reports of this somewhere; I need to find them to cross-reference.

@cderv cderv added theme: knitr concerns knitr package bug an unexpected problem or unintended behavior labels Dec 7, 2021
@steveharoz
Contributor Author

Thanks for confirming. FYI: I get the same problem with other devices, such as dev: svglite, which is why I posted here rather than in the ragg repository.

@yihui
Member

yihui commented Dec 7, 2021

That's weird. When the device is ragg_png, knitr should use ragg::agg_png() to record the plot (https://github.com/yihui/knitr/blob/198ffcc40317035be00a323d35e9d0a7c5605bb9/R/block.R#L374-L377) and then redraw the plot to a file (https://github.com/yihui/knitr/blob/198ffcc40317035be00a323d35e9d0a7c5605bb9/R/plot.R#L156).

I can't see your Unicode symbols on macOS (not even in plain text), so they are probably only available on Windows. @cderv If you want to debug it, you could check whether the recorded plot can be correctly replayed in a second agg_png() device:

ragg::agg_png("f1.png")
dev.control(displaylist = 'enable')
# draw the plot, then
x = recordPlot()
dev.off()

ragg::agg_png("f2.png")
x
dev.off()

@cderv
Collaborator

cderv commented Dec 8, 2021

Thanks for the help @yihui

Yes, doing this works as expected: both files contain the characters.

ragg::agg_png("f1.png")
dev.control(displaylist = 'enable')

library(ggplot2)
p <- ggplot(mtcars) +
  aes(x=hp, y=mpg) +
  geom_point() +
  labs(y = "Arrows 🠜 🠞")
print(p)  # draw on the current device so recordPlot() captures it

x = recordPlot()
dev.off()

ragg::agg_png("f2.png")
x
dev.off()

I can't reproduce the issue on Linux either, so this must be specific to Windows.

I am surprised that something goes wrong during the process. Maybe it is related to encoding, as Windows is not yet UTF-8 native.
I have not found the cause yet.

@cderv
Collaborator

cderv commented Dec 8, 2021

I took a closer look inside the recorded plot.

During knitting, the recorded plot is stored in res[[3]]:

str(res[[3]], 1)
List of 3
 $ :Dotted pair list of 3
 $ : raw [1:35992] 00 00 00 00 ...
  ..- attr(*, "pkgName")= chr "graphics"
 $ :List of 2
  ..- attr(*, "pkgName")= chr "grid"
 - attr(*, "engineVersion")= int 14
 - attr(*, "pid")= int 42956
 - attr(*, "Rversion")=Classes 'R_system_version', 'package_version', 'numeric_version'  hidden list of 1
 - attr(*, "load")= chr(0) 
 - attr(*, "attach")= chr(0) 
 - attr(*, "class")= chr "recordedplot"

Within this object, the label is not correctly encoded:

  .. .. .. .. .. ..$ :List of 7
  .. .. .. .. .. .. ..$ widths       :List of 3
  .. .. .. .. .. .. .. ..$ :Classes 'unit', 'unit_v2'  hidden list of 1
  .. .. .. .. .. .. .. .. ..$ :List of 3
  .. .. .. .. .. .. .. .. .. ..$ : num 0
  .. .. .. .. .. .. .. .. .. ..$ : NULL
  .. .. .. .. .. .. .. .. .. ..$ : int 8
  .. .. .. .. .. .. .. ..$ :Classes 'unit', 'unit_v2'  hidden list of 1
  .. .. .. .. .. .. .. .. ..$ :List of 3
  .. .. .. .. .. .. .. .. .. ..$ : num 1
  .. .. .. .. .. .. .. .. .. ..$ :List of 2
  .. .. .. .. .. .. .. .. .. .. ..$ :Classes 'unit', 'unit_v2'  hidden list of 1
  .. .. .. .. .. .. .. .. .. .. .. ..$ :List of 3
  .. .. .. .. .. .. .. .. .. .. .. .. ..$ : num 1
  .. .. .. .. .. .. .. .. .. .. .. .. ..$ :List of 11
  .. .. .. .. .. .. .. .. .. .. .. .. .. ..$ label        : chr "Arrows <U+0001F81C> <U+0001F81E>"

@yihui could this again be related to the other encoding issues in evaluate (r-lib/evaluate#66 and r-lib/evaluate#59), linked to yihui/knitr#1944?
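When debugging this kind of problem, it helps to tell an intact label apart from one whose symbols were replaced by literal escape text, as in the `str()` output above. The sketch below is an editorial illustration, not code from the thread: `has_escaped_unicode()` is a hypothetical helper that flags strings containing `<U+XXXX>` text, and base R's `utf8ToInt()` lists the code points of the intact label.

```r
# Label written with \U escapes so the source file's encoding does not matter
intact <- "Arrows \U0001F81C \U0001F81E"
# The label as it appears in the recorded plot during knitting (from the str() output)
mangled <- "Arrows <U+0001F81C> <U+0001F81E>"

# Hypothetical helper: TRUE when a string contains literal "<U+XXXX>" escape
# text, i.e. the characters were replaced by their escaped names somewhere
# along the way instead of being kept as real Unicode characters.
has_escaped_unicode <- function(x) {
  grepl("<U\\+[0-9A-Fa-f]{4,8}>", x)
}

has_escaped_unicode(intact)   # FALSE
has_escaped_unicode(mangled)  # TRUE

# Code points of the intact label, formatted as U+XXXX;
# the two arrows should show up as U+1F81C and U+1F81E
sprintf("U+%04X", utf8ToInt(intact))
```

The regex only checks for the escaped form; it does not say where the conversion happened, which is what the `res[[3]]` inspection above narrows down.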

@cderv
Collaborator

cderv commented Dec 8, 2021

I believe this could indeed be the issue.

I tested with the current R-devel UCRT build (https://developer.r-project.org/WindowsBuilds/winutf8/ucrt3/howto.html), and the document renders properly.

We were waiting for this to fix the issue in the evaluate package, and it is coming soon: UTF-8 by default on Windows will be part of the next version, R 4.2, and will land in the regular R-devel builds next Monday:
https://developer.r-project.org/Blog/public/2021/12/07/upcoming-changes-in-r-4.2-on-windows/

So I believe this issue will resolve itself with the next R version.
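To see whether a given setup is likely to hit this, a minimal sketch can report the R version and whether the session's native encoding is UTF-8, which is what the R 4.2 change on Windows concerns. This is an editorial addition, not from the thread; `session_is_utf8()` is a hypothetical helper built on base R's `l10n_info()`.

```r
# Hypothetical helper: TRUE when the running R session uses UTF-8 as its
# native encoding. l10n_info() is base R; its "UTF-8" element is a logical flag.
session_is_utf8 <- function() {
  isTRUE(l10n_info()[["UTF-8"]])
}

# R 4.2 is the release that switches Windows to UTF-8 by default
r_at_least_4_2 <- getRversion() >= "4.2.0"

cat("R version:", as.character(getRversion()), "\n")
cat("R >= 4.2.0:", r_at_least_4_2, "\n")
cat("Native encoding is UTF-8:", session_is_utf8(), "\n")
```

If the last line reports FALSE on Windows, the session is in the non-UTF-8 situation discussed above, where the `<U+...>` mangling was observed; per the comments in this thread, it is expected (not guaranteed) to go away once R runs with UTF-8 as its native encoding.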

@yihui
Member

yihui commented Dec 8, 2021

Yes, I'm excited to learn that the UTF-8 support is finally going into R 4.2! I've been waiting for this for years.
