different results between chunk and knitted markdown / Cyrillic letters #2091

Closed
3 tasks done
werkstattcodes opened this issue Jan 4, 2022 · 3 comments
Labels
question Questions (which should belong to forums instead of Github)

Comments

@werkstattcodes

werkstattcodes commented Jan 4, 2022

I encountered a surprising behavior when knitting an .Rmd document.
The result of stringr::str_count(txt, regex("обн\\.")) differs depending on whether I execute only the chunk(s) or knit the whole document.

When running only the chunk, the result of str_count is 1.
When I knit the document, the result is 0.
When I change the regex pattern to a term without Cyrillic letters, e.g. "decision", the results are identical.

The problem also appears when I try to create a reprex (with the reprex package): the result there is also 0, although the console shows 1.
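One possible workaround, offered only as an untested sketch: it assumes the mismatch comes from how non-ASCII characters in the chunk source are decoded when knitting. Writing the Cyrillic characters as \u escapes keeps the chunk source pure ASCII, so it should parse the same way interactively and under knitr. The string below is a shortened excerpt of the original text, and the count uses base R's gregexpr() in place of stringr.

```r
# Hypothetical workaround, not tested when knitting on Windows:
# Cyrillic characters are written as \u escapes, so the source is pure ASCII
# and does not depend on how the .Rmd file is decoded.
# Shortened excerpt of the original text: "обн., ДВ, бр. 36"
txt <- "\u043e\u0431\u043d., \u0414\u0412, \u0431\u0440. 36"
pattern <- "\u043e\u0431\u043d\\."  # "обн" followed by a literal period

# Base R count of regex matches; stringr::str_count(txt, stringr::regex(pattern))
# is the equivalent call used in the issue.
n <- lengths(regmatches(txt, gregexpr(pattern, txt)))
print(n)
```

Whether this actually sidesteps the knitting problem on a given R/Windows setup would need to be checked.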

I asked a related question on RStudio Community but could not solve the issue (link).

Here is the code for the entire Rmd document:

---
title: "test"
author: ""
date: "1/3/2022"
output: html_document
---

```{r}
knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)

txt <- "30.  In a decision of 18 April 2006 (реш. № 4 от 18 април 2006 г. по конституционно дело № 11 от 2005 г., обн., ДВ, бр. 36 от 2 май 2006 г.) the Constitutional Court declared unconstitutional section 132d(3) of the ESA, which had almost identical wording as the one of section 33(1)(c) but concerned accused detainees. Since the subject-matter of the case was limited to the former provision, section 33(1)(c) was not reviewed for constitutionality."
```

```{r}
stringr::str_count(txt, regex("обн\\."))
```

```{r}
stringr::str_count(txt, regex("decision"))

```

```{r}
xfun::session_info('rmarkdown')
```

Here is the knitted output (copied and pasted):

test
1/3/2022
knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5.9000     v purrr   0.3.4     
## v tibble  3.1.6          v dplyr   1.0.7     
## v tidyr   1.1.4          v stringr 1.4.0     
## v readr   2.1.1          v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
txt <- "30.  In a decision of 18 April 2006 (реш. № 4 от 18 април 2006 г. по конституционно дело № 11 от 2005 г., обн., ДВ, бр. 36 от 2 май 2006 г.) the Constitutional Court declared unconstitutional section 132d(3) of the ESA, which had almost identical wording as the one of section 33(1)(c) but concerned accused detainees. Since the subject-matter of the case was limited to the former provision, section 33(1)(c) was not reviewed for constitutionality."
stringr::str_count(txt, regex("обн\\."))
## [1] 0
stringr::str_count(txt, regex("decision"))
## [1] 1
xfun::session_info('knitr')
## R version 4.1.2 (2021-11-01)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19043)
## 
## Locale:
##   LC_COLLATE=English_Austria.1252  LC_CTYPE=English_Austria.1252   
##   LC_MONETARY=English_Austria.1252 LC_NUMERIC=C                    
##   LC_TIME=English_Austria.1252    
## 
## Package version:
##   evaluate_0.14   glue_1.6.0      graphics_4.1.2  grDevices_4.1.2
##   highr_0.9       knitr_1.37.2    magrittr_2.0.1  methods_4.1.2  
##   stats_4.1.2     stringi_1.7.6   stringr_1.4.0   tools_4.1.2    
##   utils_4.1.2     xfun_0.29       yaml_2.2.1

By filing an issue to this repo, I promise that

  • I have fully read the issue guide at https://yihui.org/issue/.
  • I have provided the necessary information about my issue.
    • If I'm asking a question, I have already asked it on Stack Overflow or RStudio Community, waited for at least 24 hours, and included a link to my question there.
    • If I'm filing a bug report, I have included a minimal, self-contained, and reproducible example, and have also included xfun::session_info('knitr'). I have upgraded all my packages to their latest versions (e.g., R, RStudio, and R packages), and also tried the development version: remotes::install_github('yihui/knitr').
    • If I have posted the same issue elsewhere, I have also mentioned it in this issue.
  • I have learned the Github Markdown syntax, and formatted my issue correctly.

I understand that my issue may be closed if I don't fulfill my promises.

@cderv
Collaborator

cderv commented Jan 5, 2022

I believe this is another issue related to encoding, and to Windows not using UTF-8 by default.

The next R version on Windows will use UTF-8 as the native encoding. This is already the case in the current R-devel build, and I tried it: it works as expected there.
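A minimal sketch for checking which situation a session is in, assuming the native-encoding explanation above applies. l10n_info(), Sys.getlocale(), enc2utf8() and utf8ToInt() are base R functions; on a setup like the R 4.1.2 / English_Austria.1252 one reported in the issue, the first value is expected to be FALSE, whereas R 4.2.0 and later use UTF-8 as the native encoding on recent Windows versions.

```r
# Is the native encoding of this R session UTF-8?
# On Windows builds of R before 4.2.0 this is typically FALSE, and
# LC_CTYPE names a code page (e.g. "English_Austria.1252").
native_utf8 <- isTRUE(l10n_info()[["UTF-8"]])
cat("Native encoding is UTF-8:", native_utf8, "\n")
cat("LC_CTYPE:", Sys.getlocale("LC_CTYPE"), "\n")

# Inspect the Unicode code points R stores for the pattern from the issue.
# The pattern is written with \u escapes so the result does not depend
# on how the source file itself is decoded.
pattern <- "\u043e\u0431\u043d\\."
codepoints <- utf8ToInt(enc2utf8(pattern))
print(codepoints)  # 1086 1073 1085 92 46 (о, б, н, backslash, period)
```

Running the same checks once in the console and once inside a knitted chunk can show whether the two environments see the same locale and characters.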


So I think this is related to #1944 and r-lib/evaluate#59.

Other recent similar issue: rstudio/rmarkdown#2256

So I think this will finally be solved with the next version of R!

You can try R-devel if you want to check: https://cran.r-project.org/bin/windows/base/rdevel.html

@werkstattcodes
Author

Many thanks for the clarification!
I ran the code on RStudio Cloud (which I assume doesn't run on Windows), and the results were as expected.

@cderv cderv added the question Questions (which should belong to forums instead of Github) label Jan 10, 2022
@github-actions

This old thread has been automatically locked. If you think you have found something related to this, please open a new issue by following the issue guide (https://yihui.org/issue/), and link to this old issue if necessary.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 13, 2022