Protect against escapes in strings #12

Closed
MilesMcBain opened this issue Apr 8, 2022 · 6 comments

@MilesMcBain
Owner

I saw a string containing this byte sequence crash paint: 23\xbfC

@MilesMcBain
Owner Author

Some reprex material:

foo <- "23\xbfC"
crayon::col_nchar(foo)
## Error in base::nchar(strip_style(x), ...): invalid multibyte string, element 1
Encoding(foo) <- "latin1"
crayon::col_nchar(foo)
## [1] 4
df <- data.frame(
  bad = "23\xbfC"
)

paint::paint(df)
## Error in base::nchar(strip_style(x), ...): invalid multibyte string, element 1
# Mark the column as latin1, as was done for `foo` above
Encoding(df$bad) <- "latin1"

Idea: an option that sets a default encoding to fall back on when this
error occurs?
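
A minimal sketch of what such a fallback could look like, using only base R. The helper name `repair_encoding` and the option name `paint_fallback_encoding` are hypothetical and not part of paint; the sketch also assumes the invalid strings really were written in the fallback encoding, which cannot be verified from the bytes alone.

```r
# Hypothetical helper, not part of paint: re-encode strings that are invalid
# in their declared encoding, assuming they were written in `fallback`.
repair_encoding <- function(x,
                            fallback = getOption("paint_fallback_encoding", "latin1")) {
  bad <- !validEnc(x)
  # iconv() reinterprets the bytes as `fallback` and converts them to UTF-8
  x[bad] <- iconv(x[bad], from = fallback, to = "UTF-8")
  x
}

foo <- "23\xbfC"
fixed <- repair_encoding(foo)
validEnc(fixed)
nchar(fixed)
```

In a UTF-8 session `validEnc(fixed)` should be `TRUE` and `nchar(fixed)` should be 4, matching the `col_nchar()` result above once the encoding is declared.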

@MilesMcBain
Owner Author

From Dan Wilson: validEnc("23\xbfC") to detect strings like this.
From @joelnitta: stringi::stri_trans_general to convert them.

I think these two together might solve the problem:

library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#>     filter, lag
#> The following objects are masked from 'package:base':
#>
#>     intersect, setdiff, setequal, union
df <- data.frame(
  bad = "23\xbfC"
)

df %>%
  mutate(
    good = case_when(
      !validEnc(bad) ~ stringi::stri_trans_general(bad, "latin-ascii"),
      # keep strings that are already valid; without a default they become NA
      TRUE ~ bad
    )
  ) %>%
  select(good) %>%
  paint::paint()
#> data.frame [1, 1]
#> good chr 23�C

Created on 2022-10-04 with reprex v2.0.2

@danwwilson

The other aspect to think about might be how to identify data that has been transformed to allow printing. Maybe invert background/foreground colours?
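
One way this could be sketched, assuming crayon is used for styling (paint's errors above come from crayon::col_nchar): wrap only the repaired elements in `crayon::inverse()`. The helper name `flag_repaired` and its latin1 default are hypothetical, and crayon only emits the ANSI codes when the terminal supports colour.

```r
# Hypothetical helper, not part of paint: repair invalid strings and
# mark the repaired ones by swapping foreground and background colours.
flag_repaired <- function(x, fallback = "latin1") {
  bad <- !validEnc(x)
  out <- x
  if (any(bad)) {
    repaired <- iconv(x[bad], from = fallback, to = "UTF-8")
    # inverse() is one of crayon's built-in styles
    out[bad] <- crayon::inverse(repaired)
  }
  out
}

res <- flag_repaired(c("23\xbfC", "fine"))
cat(res, sep = "\n")
```

Valid strings pass through unstyled, so any inverted value in the output is one that was altered to make it printable.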

@MilesMcBain
Owner Author

I like this idea @danwwilson!

@MilesMcBain
Owner Author

Just hit this one: "2015 President and Vice-Chancellor\x92s Alumni Scholarship Appeal" 🤪

@danwwilson

Another option is to try enc2utf8():

> enc2utf8("2015 President and Vice-Chancellor\x92s Alumni Scholarship Appeal")
[1] "2015 President and Vice-Chancellor<92>s Alumni Scholarship Appeal"
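
The `<92>` in that output shows the byte was left unconverted, presumably because the string is unmarked and the session is already UTF-8. A hedged alternative is to name the source encoding explicitly with `iconv()`. This sketch assumes the text came from Windows-1252, where byte 0x92 is the right single quotation mark (U+2019); the encoding name "CP1252" is platform-dependent and can be checked against `iconvlist()`.

```r
x <- "2015 President and Vice-Chancellor\x92s Alumni Scholarship Appeal"

# Assumption: the bytes are Windows-1252, not a verified fact about the data
fixed <- iconv(x, from = "CP1252", to = "UTF-8")

validEnc(fixed)
# TRUE if 0x92 was converted to the right single quotation mark (U+2019)
0x2019 %in% utf8ToInt(fixed)
```

If the guess about the source encoding is wrong, `iconv()` may return `NA` or the wrong character, so this works best together with a configurable fallback.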
