missing encoding of character representation of units #183

Closed
edzer opened this issue Dec 21, 2018 · 35 comments

Comments

@edzer
Member

edzer commented Dec 21, 2018

> as.character(units(set_units(1, degree)))
[1] "°"
> Encoding(as.character(units(set_units(1, degree))))
[1] "unknown"

This leads to problems, e.g. reported in r-spatial/sf#931

Something tells me here is the right place to solve it.
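
For readers unfamiliar with R's encoding marks, here is a minimal base-R sketch of what "unknown" means above. It does not use {units}; the variable names are illustrative, and the string is built from raw bytes on the assumption that this mimics a string handed back from compiled code without a declared encoding.

```r
# Build the degree sign from its UTF-8 bytes. rawToChar() returns an
# unmarked string, which Encoding() reports as "unknown".
deg <- rawToChar(as.raw(c(0xc2, 0xb0)))
Encoding(deg)

# Declare the bytes to be UTF-8. This changes only the mark, not the bytes.
deg_marked <- deg
Encoding(deg_marked) <- "UTF-8"
Encoding(deg_marked)

# The underlying bytes are the same before and after marking.
identical(charToRaw(deg), charToRaw(deg_marked))
```

An unmarked string is interpreted in the session's native encoding, which is why the same bytes can display correctly on one platform and not on another.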

@dpprdan

dpprdan commented Dec 21, 2018

Agree. I think I hunted the problem in {sf} down to

units::as_units("arc_degree")
#> 1 [°]
Session info
devtools::session_info()
#> - Session info ----------------------------------------------------------
#>  setting  value                       
#>  version  R version 3.5.1 (2018-07-02)
#>  os       Windows 10 x64              
#>  system   x86_64, mingw32             
#>  ui       RTerm                       
#>  language EN                          
#>  collate  German_Germany.1252         
#>  ctype    German_Germany.1252         
#>  tz       Europe/Berlin               
#>  date     2018-12-21                  
#> 
#> - Packages --------------------------------------------------------------
#>  package     * version    date       lib source                     
#>  assertthat    0.2.0      2017-04-11 [1] CRAN (R 3.5.1)             
#>  backports     1.1.3      2018-12-14 [1] CRAN (R 3.5.1)             
#>  callr         3.1.0      2018-12-10 [1] CRAN (R 3.5.1)             
#>  cli           1.0.1.9000 2018-10-25 [1] Github (r-lib/cli@56538e3) 
#>  crayon        1.3.4      2017-09-16 [1] CRAN (R 3.5.1)             
#>  desc          1.2.0      2018-10-25 [1] Github (r-lib/desc@7c12d36)
#>  devtools      2.0.1      2018-10-26 [1] CRAN (R 3.5.1)             
#>  digest        0.6.18     2018-10-10 [1] CRAN (R 3.5.1)             
#>  evaluate      0.12       2018-10-09 [1] CRAN (R 3.5.1)             
#>  fs            1.2.6      2018-08-23 [1] CRAN (R 3.5.1)             
#>  glue          1.3.0      2018-07-17 [1] CRAN (R 3.5.1)             
#>  highr         0.7        2018-06-09 [1] CRAN (R 3.5.1)             
#>  htmltools     0.3.6      2017-04-28 [1] CRAN (R 3.5.1)             
#>  knitr         1.21       2018-12-10 [1] CRAN (R 3.5.1)             
#>  magrittr      1.5        2014-11-22 [1] CRAN (R 3.5.1)             
#>  memoise       1.1.0      2017-04-21 [1] CRAN (R 3.5.1)             
#>  pkgbuild      1.0.2      2018-10-16 [1] CRAN (R 3.5.1)             
#>  pkgload       1.0.2      2018-10-29 [1] CRAN (R 3.5.1)             
#>  prettyunits   1.0.2      2015-07-13 [1] CRAN (R 3.5.1)             
#>  processx      3.2.1      2018-12-05 [1] CRAN (R 3.5.1)             
#>  ps            1.2.1      2018-11-06 [1] CRAN (R 3.5.1)             
#>  R6            2.3.0      2018-10-04 [1] CRAN (R 3.5.1)             
#>  Rcpp          1.0.0      2018-11-07 [1] CRAN (R 3.5.1)             
#>  remotes       2.0.2      2018-10-30 [1] CRAN (R 3.5.1)             
#>  rlang         0.3.0.1    2018-10-25 [1] CRAN (R 3.5.1)             
#>  rmarkdown     1.11       2018-12-08 [1] CRAN (R 3.5.1)             
#>  rprojroot     1.3-2      2018-01-03 [1] CRAN (R 3.5.1)             
#>  sessioninfo   1.1.1      2018-11-05 [1] CRAN (R 3.5.1)             
#>  stringi       1.2.4      2018-07-20 [1] CRAN (R 3.5.1)             
#>  stringr       1.3.1      2018-05-10 [1] CRAN (R 3.5.1)             
#>  testthat      2.0.1      2018-10-13 [1] CRAN (R 3.5.1)             
#>  units         0.6-2      2018-12-05 [1] CRAN (R 3.5.1)             
#>  usethis       1.4.0      2018-08-14 [1] CRAN (R 3.5.1)             
#>  withr         2.1.2      2018-03-15 [1] CRAN (R 3.5.1)             
#>  xfun          0.4        2018-10-23 [1] CRAN (R 3.5.1)             
#>  yaml          2.2.0      2018-07-25 [1] CRAN (R 3.5.1)             
#> 
#> [1] D:/Users/Daniel/Documents/R/win-library/3.5
#> [2] C:/Program Files/R/R-3.5.1/library

@edzer
Member Author

edzer commented Dec 21, 2018

I just pushed changes to a branch here called "encoding"; could you please install it and test this, and sf?

@edzer
Member Author

edzer commented Dec 21, 2018

(Interesting check error on windows: https://ci.appveyor.com/project/edzer/units/builds/21173766)

@dpprdan

dpprdan commented Dec 21, 2018

Looking good!

units::as_units("arc_degree")
#> 1 [°]
x_tbl <- tibble::tibble(place = "Münster", x = 7.625808, y = 51.96311)
sf::st_as_sf(x_tbl, coords = c("x", "y"), crs = 4326) 
#> Simple feature collection with 1 feature and 1 field
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: 7.625808 ymin: 51.96311 xmax: 7.625808 ymax: 51.96311
#> epsg (SRID):    4326
#> proj4string:    +proj=longlat +datum=WGS84 +no_defs
#> # A tibble: 1 x 2
#>   place              geometry
#>   <chr>           <POINT [°]>
#> 1 Münster (7.625808 51.96311)
Session info
devtools::session_info()
#> - Session info ----------------------------------------------------------
#>  setting  value                       
#>  version  R version 3.5.1 (2018-07-02)
#>  os       Windows 10 x64              
#>  system   x86_64, mingw32             
#>  ui       RTerm                       
#>  language EN                          
#>  collate  German_Germany.1252         
#>  ctype    German_Germany.1252         
#>  tz       Europe/Berlin               
#>  date     2018-12-21                  
#> 
#> - Packages --------------------------------------------------------------
#>  package     * version    date       lib
#>  assertthat    0.2.0      2017-04-11 [1]
#>  backports     1.1.3      2018-12-14 [1]
#>  callr         3.1.0      2018-12-10 [1]
#>  class         7.3-14     2015-08-30 [2]
#>  classInt      0.3-1      2018-12-18 [1]
#>  cli           1.0.1.9000 2018-10-25 [1]
#>  crayon        1.3.4      2017-09-16 [1]
#>  DBI           1.0.0      2018-05-02 [1]
#>  desc          1.2.0      2018-10-25 [1]
#>  devtools      2.0.1      2018-10-26 [1]
#>  digest        0.6.18     2018-10-10 [1]
#>  e1071         1.7-0      2018-07-28 [1]
#>  evaluate      0.12       2018-10-09 [1]
#>  fansi         0.4.0      2018-10-05 [1]
#>  fs            1.2.6      2018-08-23 [1]
#>  glue          1.3.0      2018-07-17 [1]
#>  highr         0.7        2018-06-09 [1]
#>  htmltools     0.3.6      2017-04-28 [1]
#>  knitr         1.21       2018-12-10 [1]
#>  magrittr      1.5        2014-11-22 [1]
#>  memoise       1.1.0      2017-04-21 [1]
#>  pillar        1.3.1      2018-12-15 [1]
#>  pkgbuild      1.0.2      2018-10-16 [1]
#>  pkgload       1.0.2      2018-10-29 [1]
#>  prettyunits   1.0.2      2015-07-13 [1]
#>  processx      3.2.1      2018-12-05 [1]
#>  ps            1.2.1      2018-11-06 [1]
#>  R6            2.3.0      2018-10-04 [1]
#>  Rcpp          1.0.0      2018-11-07 [1]
#>  remotes       2.0.2      2018-10-30 [1]
#>  rlang         0.3.0.1    2018-10-25 [1]
#>  rmarkdown     1.11       2018-12-08 [1]
#>  rprojroot     1.3-2      2018-01-03 [1]
#>  sessioninfo   1.1.1      2018-11-05 [1]
#>  sf            0.7-3      2018-12-21 [1]
#>  stringi       1.2.4      2018-07-20 [1]
#>  stringr       1.3.1      2018-05-10 [1]
#>  testthat      2.0.1      2018-10-13 [1]
#>  tibble        1.4.2      2018-01-22 [1]
#>  units         0.6-3      2018-12-21 [1]
#>  usethis       1.4.0      2018-08-14 [1]
#>  utf8          1.1.4      2018-05-24 [1]
#>  withr         2.1.2      2018-03-15 [1]
#>  xfun          0.4        2018-10-23 [1]
#>  yaml          2.2.0      2018-07-25 [1]
#>  source                             
#>  CRAN (R 3.5.1)                     
#>  CRAN (R 3.5.1)                     
#>  CRAN (R 3.5.1)                     
#>  CRAN (R 3.5.1)                     
#>  CRAN (R 3.5.1)                     
#>  Github (r-lib/cli@56538e3)         
#>  CRAN (R 3.5.1)                     
#>  CRAN (R 3.5.1)                     
#>  Github (r-lib/desc@7c12d36)        
#>  CRAN (R 3.5.1)                     
#>  CRAN (R 3.5.1)                     
#>  CRAN (R 3.5.1)                     
#>  CRAN (R 3.5.1)                     
#>  CRAN (R 3.5.1)                     
#>  CRAN (R 3.5.1)                     
#>  CRAN (R 3.5.1)                     
#>  CRAN (R 3.5.1)                     
#>  CRAN (R 3.5.1)                     
#>  CRAN (R 3.5.1)                     
#>  CRAN (R 3.5.1)                     
#>  CRAN (R 3.5.1)                     
#>  CRAN (R 3.5.1)                     
#>  CRAN (R 3.5.1)                     
#>  CRAN (R 3.5.1)                     
#>  CRAN (R 3.5.1)                     
#>  CRAN (R 3.5.1)                     
#>  CRAN (R 3.5.1)                     
#>  CRAN (R 3.5.1)                     
#>  CRAN (R 3.5.1)                     
#>  CRAN (R 3.5.1)                     
#>  CRAN (R 3.5.1)                     
#>  CRAN (R 3.5.1)                     
#>  CRAN (R 3.5.1)                     
#>  CRAN (R 3.5.1)                     
#>  Github (r-spatial/sf@83157d1)      
#>  CRAN (R 3.5.1)                     
#>  CRAN (R 3.5.1)                     
#>  CRAN (R 3.5.1)                     
#>  CRAN (R 3.5.1)                     
#>  Github (r-quantities/units@01584a4)
#>  CRAN (R 3.5.1)                     
#>  CRAN (R 3.5.1)                     
#>  CRAN (R 3.5.1)                     
#>  CRAN (R 3.5.1)                     
#>  CRAN (R 3.5.1)                     
#> 
#> [1] D:/Users/Daniel/Documents/R/win-library/3.5
#> [2] C:/Program Files/R/R-3.5.1/library

@Enchufa2
Member

@edzer: see #73

@edzer
Member Author

edzer commented Dec 21, 2018

Thanks - I knew there was something... nice job for the start of the new year!

edzer added a commit that referenced this issue Jan 19, 2019
@edzer
Member Author

edzer commented Jan 19, 2019

Not doing this still gives the AppVeyor error: https://ci.appveyor.com/project/edzer/units/builds/21736774

edzer added a commit that referenced this issue Jan 19, 2019
edzer added a commit that referenced this issue Jan 19, 2019
@Enchufa2
Member

BTW, what was the problem with this commit? The AppVeyor build was ok, right? Isn't it enough to add ud_set_encoding(encoding) to .onLoad where encoding="latin1" when the platform is Windows?

@Enchufa2
Member

See a1981e3 in branch encoding2 and AppVeyor build.

@Enchufa2
Member

Better d228f48, which sets ascii if multi-byte character encoding is not set and defaults to utf8.

@edzer edzer closed this as completed in d913b62 Jan 19, 2019
@edzer
Member Author

edzer commented Feb 15, 2019

I still don't get why this fails.

@Enchufa2
Member

Maybe it's a bug in R, because this works ok (note the quotes around the unit):

LC_CTYPE=en_US.iso88591 Rscript -e 'units::set_units(1:10, "μm")'

@edzer
Member Author

edzer commented Feb 17, 2019

That worked indeed: see here. Thanks!

@edzer
Member Author

edzer commented Feb 17, 2019

Interestingly, even that approach doesn't work when used in examples: https://ci.appveyor.com/project/edzer/units/builds/22440023#L763

Do you feel like taking this up to r-pkg-devel or even r-devel?

@Enchufa2
Member

Indeed, this is quite strange, or at least inconsistent, behaviour. I wrote to r-devel.

@dpprdan

dpprdan commented Feb 18, 2019

I am probably missing something really obvious here, but when I try to reproduce what you posted on r-devel, @Enchufa2, I get this error:

library(units)
#> udunits system database from C:/Users/daniel/Documents/.R/win-library/units/share/udunits
units::set_units(1:10, "µm")
#> Error: In 'µm', 'µm' is not recognized by udunits.
#> See a table of valid unit symbols and names with valid_udunits().
#> Add custom user-defined units with install_symbolic_unit().
Session info
devtools::session_info()
#> - Session info ----------------------------------------------------------
#>  setting  value                       
#>  version  R version 3.5.2 (2018-12-20)
#>  os       Windows 10 x64              
#>  system   x86_64, mingw32             
#>  ui       RTerm                       
#>  language en                          
#>  collate  German_Germany.1252         
#>  ctype    German_Germany.1252         
#>  tz       Europe/Berlin               
#>  date     2019-02-18                  
#> 
#> - Packages --------------------------------------------------------------
#>  package     * version date       lib source        
#>  assertthat    0.2.0   2017-04-11 [1] CRAN (R 3.5.1)
#>  backports     1.1.3   2018-12-14 [1] CRAN (R 3.5.1)
#>  callr         3.1.1   2018-12-21 [1] CRAN (R 3.5.2)
#>  cli           1.0.1   2018-09-25 [1] CRAN (R 3.5.1)
#>  crayon        1.3.4   2017-09-16 [1] CRAN (R 3.5.1)
#>  desc          1.2.0   2018-05-01 [1] CRAN (R 3.5.1)
#>  devtools      2.0.1   2018-10-26 [1] CRAN (R 3.5.1)
#>  digest        0.6.18  2018-10-10 [1] CRAN (R 3.5.1)
#>  evaluate      0.13    2019-02-12 [1] CRAN (R 3.5.2)
#>  fs            1.2.6   2018-08-23 [1] CRAN (R 3.5.1)
#>  glue          1.3.0   2018-07-17 [1] CRAN (R 3.5.1)
#>  highr         0.7     2018-06-09 [1] CRAN (R 3.5.1)
#>  htmltools     0.3.6   2017-04-28 [1] CRAN (R 3.5.1)
#>  knitr         1.21    2018-12-10 [1] CRAN (R 3.5.1)
#>  magrittr      1.5     2014-11-22 [1] CRAN (R 3.5.1)
#>  memoise       1.1.0   2017-04-21 [1] CRAN (R 3.5.1)
#>  pkgbuild      1.0.2   2018-10-16 [1] CRAN (R 3.5.1)
#>  pkgload       1.0.2   2018-10-29 [1] CRAN (R 3.5.1)
#>  prettyunits   1.0.2   2015-07-13 [1] CRAN (R 3.5.1)
#>  processx      3.2.1   2018-12-05 [1] CRAN (R 3.5.1)
#>  ps            1.3.0   2018-12-21 [1] CRAN (R 3.5.2)
#>  R6            2.4.0   2019-02-14 [1] CRAN (R 3.5.2)
#>  Rcpp          1.0.0   2018-11-07 [1] CRAN (R 3.5.1)
#>  remotes       2.0.2   2018-10-30 [1] CRAN (R 3.5.1)
#>  rlang         0.3.1   2019-01-08 [1] CRAN (R 3.5.2)
#>  rmarkdown     1.11    2018-12-08 [1] CRAN (R 3.5.1)
#>  rprojroot     1.3-2   2018-01-03 [1] CRAN (R 3.5.1)
#>  sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 3.5.1)
#>  stringi       1.3.1   2019-02-13 [1] CRAN (R 3.5.2)
#>  stringr       1.4.0   2019-02-10 [1] CRAN (R 3.5.2)
#>  testthat      2.0.1   2018-10-13 [1] CRAN (R 3.5.1)
#>  units       * 0.6-2   2018-12-05 [1] CRAN (R 3.5.1)
#>  usethis       1.4.0   2018-08-14 [1] CRAN (R 3.5.1)
#>  withr         2.1.2   2018-03-15 [1] CRAN (R 3.5.1)
#>  xfun          0.4     2018-10-23 [1] CRAN (R 3.5.1)
#>  yaml          2.2.0   2018-07-25 [1] CRAN (R 3.5.1)
#> 
#> [1] C:/Users/daniel/Documents/.R/win-library
#> [2] C:/Program Files/R/R-3.5.2/library

@edzer
Member Author

edzer commented Feb 18, 2019

It might be that this works only in the dev version; please try

units::set_units(1:10, "µm", mode = "standard")

otherwise.

@Enchufa2
Member

It only works in the dev version indeed.

@dpprdan

dpprdan commented Feb 18, 2019

Since I couldn't find a "dev" branch, I assume that you mean the master branch with "dev version". Here I get the same result:

units::set_units(1:10, "µm", mode = "standard")
#> Error: In 'µm', 'µm' is not recognized by udunits.
#> See a table of valid unit symbols and names with valid_udunits().
#> Add custom user-defined units with install_symbolic_unit().
Session info
devtools::session_info()
#> - Session info ----------------------------------------------------------
#>  setting  value                       
#>  version  R version 3.5.2 (2018-12-20)
#>  os       Windows 10 x64              
#>  system   x86_64, mingw32             
#>  ui       RTerm                       
#>  language EN                          
#>  collate  German_Germany.1252         
#>  ctype    German_Germany.1252         
#>  tz       Europe/Berlin               
#>  date     2019-02-18                  
#> 
#> - Packages --------------------------------------------------------------
#>  package     * version    date       lib
#>  assertthat    0.2.0      2017-04-11 [1]
#>  backports     1.1.3      2018-12-14 [1]
#>  callr         3.1.1      2018-12-21 [1]
#>  cli           1.0.1.9000 2019-01-28 [1]
#>  crayon        1.3.4      2017-09-16 [1]
#>  desc          1.2.0      2018-10-25 [1]
#>  devtools      2.0.1      2018-10-26 [1]
#>  digest        0.6.18     2018-10-10 [1]
#>  evaluate      0.12       2018-10-09 [1]
#>  fs            1.2.6      2018-08-23 [1]
#>  glue          1.3.0      2018-07-17 [1]
#>  highr         0.7        2018-06-09 [1]
#>  htmltools     0.3.6      2017-04-28 [1]
#>  knitr         1.21       2018-12-10 [1]
#>  magrittr      1.5        2014-11-22 [1]
#>  memoise       1.1.0      2017-04-21 [1]
#>  pkgbuild      1.0.2      2018-10-16 [1]
#>  pkgload       1.0.2      2018-10-29 [1]
#>  prettyunits   1.0.2      2015-07-13 [1]
#>  processx      3.2.1      2018-12-05 [1]
#>  ps            1.3.0      2018-12-21 [1]
#>  R6            2.3.0      2018-10-04 [1]
#>  Rcpp          1.0.0      2018-11-07 [1]
#>  remotes       2.0.2      2018-10-30 [1]
#>  rlang         0.3.1      2019-01-08 [1]
#>  rmarkdown     1.11       2018-12-08 [1]
#>  rprojroot     1.3-2      2018-01-03 [1]
#>  sessioninfo   1.1.1      2018-11-05 [1]
#>  stringi       1.2.4      2018-07-20 [1]
#>  stringr       1.3.1      2018-05-10 [1]
#>  testthat      2.0.1      2018-10-13 [1]
#>  units         0.6-3      2019-02-18 [1]
#>  usethis       1.4.0      2018-08-14 [1]
#>  withr         2.1.2      2018-03-15 [1]
#>  xfun          0.4        2018-10-23 [1]
#>  yaml          2.2.0      2018-07-25 [1]
#>  source                             
#>  CRAN (R 3.5.1)                     
#>  CRAN (R 3.5.1)                     
#>  CRAN (R 3.5.1)                     
#>  Github (r-lib/cli@94e2fc5)         
#>  CRAN (R 3.5.1)                     
#>  Github (r-lib/desc@7c12d36)        
#>  CRAN (R 3.5.1)                     
#>  CRAN (R 3.5.1)                     
#>  CRAN (R 3.5.1)                     
#>  CRAN (R 3.5.1)                     
#>  CRAN (R 3.5.1)                     
#>  CRAN (R 3.5.1)                     
#>  CRAN (R 3.5.1)                     
#>  CRAN (R 3.5.1)                     
#>  CRAN (R 3.5.1)                     
#>  CRAN (R 3.5.1)                     
#>  CRAN (R 3.5.1)                     
#>  CRAN (R 3.5.1)                     
#>  CRAN (R 3.5.1)                     
#>  CRAN (R 3.5.1)                     
#>  CRAN (R 3.5.2)                     
#>  CRAN (R 3.5.1)                     
#>  CRAN (R 3.5.1)                     
#>  CRAN (R 3.5.1)                     
#>  CRAN (R 3.5.2)                     
#>  CRAN (R 3.5.1)                     
#>  CRAN (R 3.5.1)                     
#>  CRAN (R 3.5.1)                     
#>  CRAN (R 3.5.1)                     
#>  CRAN (R 3.5.1)                     
#>  CRAN (R 3.5.1)                     
#>  Github (r-quantities/units@e21917e)
#>  CRAN (R 3.5.1)                     
#>  CRAN (R 3.5.1)                     
#>  CRAN (R 3.5.1)                     
#>  CRAN (R 3.5.1)                     
#> 
#> [1] D:/Users/Daniel/Documents/R/win-library/3.5
#> [2] C:/Program Files/R/R-3.5.2/library

It seems to me that the bug is caused by the fact that you never convert "µm" to UTF-8 (I've searched the repo for enc2utf8 and got 0 results). On Windows, this means that you are effectively passing a string in the native (non-UTF-8) encoding to the CPP function _units_R_ut_parse which seems to expect UTF-8.

units:::R_ut_parse("µm")
#> Error in units:::R_ut_parse("µm"): Error in function R_ut_parse: string unit representation contains syntax error
units:::R_ut_parse(enc2utf8("µm"))
#> <pointer: 0x0000000014623de0>

So when I add vars <- enc2utf8(vars) before this if statement for example, I get

units::set_units(1:10, "µm")
#> Units: [µm]
#>  [1]  1  2  3  4  5  6  7  8  9 10

units::set_units(1:10, µm) works, too.
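
The byte-level difference behind this can be sketched in base R without {units}. The snippet below builds "µm" as a Latin-1 string (one byte for the micro sign) and converts it with enc2utf8() (two bytes for the micro sign). It assumes that a UTF-8 parser receives the raw bytes unchanged, and the variable names are illustrative only.

```r
# "µm" as it is stored in a Latin-1 / CP1252 session: bytes b5 6d.
mu_latin1 <- rawToChar(as.raw(c(0xb5, 0x6d)))
Encoding(mu_latin1) <- "latin1"

# enc2utf8() re-encodes marked strings to UTF-8: bytes c2 b5 6d.
mu_utf8 <- enc2utf8(mu_latin1)

charToRaw(mu_latin1)
charToRaw(mu_utf8)
Encoding(mu_utf8)
```

A lone 0xb5 byte is not a valid UTF-8 sequence, which would be consistent with a parser that expects UTF-8 reporting a syntax error for the unconverted string.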

edzer added a commit that referenced this issue Feb 19, 2019
edzer added a commit that referenced this issue Feb 19, 2019
@Enchufa2
Member

@dpprdan Could you try, instead,

library(units)

set_units(1:10, "µm")

please? I think that running units::set_units(1:10, "µm") didn't set up the encoding of udunits2 properly.

@Enchufa2
Member

But anyway, enc2utf8 makes no difference in the issue reported in r-devel. As shown there, the safe option, according also to the manual, is to completely avoid "µ" in the package.

@Enchufa2
Member

Mmmmh... there's a bug here. I thought that latin-1 implied multi-byte character set, but in fact it doesn't (!). So the logic is wrong. Let me write a patch.
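
To illustrate the pitfall, here is a hedged sketch of a locale check that does not rely on the multi-byte flag. It is not the patch that was committed: pick_ud_encoding() is a hypothetical helper, and the returned strings are only assumed to match the encoding names mentioned in this thread ("utf8", "latin1", "ascii"). It uses base R's l10n_info(), whose result includes the MBCS, UTF-8 and Latin-1 flags.

```r
# Hypothetical helper: choose an encoding name from the session locale.
# Latin-1 is a single-byte encoding, so testing MBCS alone would wrongly
# send a Latin-1 session down the "ascii" branch.
pick_ud_encoding <- function(info = l10n_info()) {
  if (isTRUE(info[["UTF-8"]])) {
    "utf8"
  } else if (isTRUE(info[["Latin-1"]])) {
    "latin1"
  } else {
    "ascii"
  }
}

# Result for the current session.
pick_ud_encoding()

# Simulated Latin-1 session: MBCS is FALSE, yet "latin1" is selected.
pick_ud_encoding(list(MBCS = FALSE, `UTF-8` = FALSE, `Latin-1` = TRUE))
```

Passing the locale information as an argument keeps the helper testable on any platform, since the flags can be simulated with a plain list.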

Enchufa2 added a commit that referenced this issue Feb 19, 2019
@Enchufa2
Member

@dpprdan Please, reinstall from master and try again.

@dpprdan

dpprdan commented Feb 19, 2019

@Enchufa2 same result with calling library first, before your patch. Note also that units::set_units(1:10, "µm") works with enc2utf8.

UPDATE: looking good:

library(units)
#> udunits system database from C:/Users/daniel/Documents/.R/win-library/units/share/udunits
set_units(1:10, "µm")
#> Units: [µm]
#>  [1]  1  2  3  4  5  6  7  8  9 10
Session info
devtools::session_info()
#> - Session info ----------------------------------------------------------
#>  setting  value                       
#>  version  R version 3.5.2 (2018-12-20)
#>  os       Windows 10 x64              
#>  system   x86_64, mingw32             
#>  ui       RTerm                       
#>  language en                          
#>  collate  German_Germany.1252         
#>  ctype    German_Germany.1252         
#>  tz       Europe/Berlin               
#>  date     2019-02-19                  
#> 
#> - Packages --------------------------------------------------------------
#>  package     * version date       lib source                             
#>  assertthat    0.2.0   2017-04-11 [1] CRAN (R 3.5.1)                     
#>  backports     1.1.3   2018-12-14 [1] CRAN (R 3.5.1)                     
#>  callr         3.1.1   2018-12-21 [1] CRAN (R 3.5.2)                     
#>  cli           1.0.1   2018-09-25 [1] CRAN (R 3.5.1)                     
#>  crayon        1.3.4   2017-09-16 [1] CRAN (R 3.5.1)                     
#>  desc          1.2.0   2018-05-01 [1] CRAN (R 3.5.1)                     
#>  devtools      2.0.1   2018-10-26 [1] CRAN (R 3.5.1)                     
#>  digest        0.6.18  2018-10-10 [1] CRAN (R 3.5.1)                     
#>  evaluate      0.13    2019-02-12 [1] CRAN (R 3.5.2)                     
#>  fs            1.2.6   2018-08-23 [1] CRAN (R 3.5.1)                     
#>  glue          1.3.0   2018-07-17 [1] CRAN (R 3.5.1)                     
#>  highr         0.7     2018-06-09 [1] CRAN (R 3.5.1)                     
#>  htmltools     0.3.6   2017-04-28 [1] CRAN (R 3.5.1)                     
#>  knitr         1.21    2018-12-10 [1] CRAN (R 3.5.1)                     
#>  magrittr      1.5     2014-11-22 [1] CRAN (R 3.5.1)                     
#>  memoise       1.1.0   2017-04-21 [1] CRAN (R 3.5.1)                     
#>  pkgbuild      1.0.2   2018-10-16 [1] CRAN (R 3.5.1)                     
#>  pkgload       1.0.2   2018-10-29 [1] CRAN (R 3.5.1)                     
#>  prettyunits   1.0.2   2015-07-13 [1] CRAN (R 3.5.1)                     
#>  processx      3.2.1   2018-12-05 [1] CRAN (R 3.5.1)                     
#>  ps            1.3.0   2018-12-21 [1] CRAN (R 3.5.2)                     
#>  R6            2.4.0   2019-02-14 [1] CRAN (R 3.5.2)                     
#>  Rcpp          1.0.0   2018-11-07 [1] CRAN (R 3.5.1)                     
#>  remotes       2.0.2   2018-10-30 [1] CRAN (R 3.5.1)                     
#>  rlang         0.3.1   2019-01-08 [1] CRAN (R 3.5.2)                     
#>  rmarkdown     1.11    2018-12-08 [1] CRAN (R 3.5.1)                     
#>  rprojroot     1.3-2   2018-01-03 [1] CRAN (R 3.5.1)                     
#>  sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 3.5.1)                     
#>  stringi       1.3.1   2019-02-13 [1] CRAN (R 3.5.2)                     
#>  stringr       1.4.0   2019-02-10 [1] CRAN (R 3.5.2)                     
#>  testthat      2.0.1   2018-10-13 [1] CRAN (R 3.5.1)                     
#>  units       * 0.6-3   2019-02-19 [1] Github (r-quantities/units@fdb0ea4)
#>  usethis       1.4.0   2018-08-14 [1] CRAN (R 3.5.1)                     
#>  withr         2.1.2   2018-03-15 [1] CRAN (R 3.5.1)                     
#>  xfun          0.4     2018-10-23 [1] CRAN (R 3.5.1)                     
#>  yaml          2.2.0   2018-07-25 [1] CRAN (R 3.5.1)                     
#> 
#> [1] C:/Users/daniel/Documents/.R/win-library
#> [2] C:/Program Files/R/R-3.5.2/library

SE works as well (I have to do this manually, because other packages also have encoding issues, and so do {reprex}/{knitr} in this particular example):

> set_units(1:10, µm)
Units:m]
 [1]  1  2  3  4  5  6  7  8  9 10

@dpprdan

dpprdan commented Feb 19, 2019

Thinking about:

> so do[es ...] {knitr} in this particular example

This essentially means that it won't be possible to render vignettes with the SE version (e.g. set_units(1:10, µm)) on Windows. Actually, one will not be able to use the SE functions in any {knitr}/{rmarkdown} documents on Windows, if I see this correctly. I think this may be one of the reasons why: yihui/knitr#1415 (Update: and this one: r-lib/evaluate#59).

So indeed, the safe(st) option is, as Tomas writes on r-devel, to rely on ASCII characters only.

@Enchufa2
Member

Thanks, @dpprdan. As noted previously, having enc2utf8 and the like around would cause more trouble than it solves. Given that udunits2 is able to handle multiple encodings, it's easier and safer to just set the proper encoding in .onLoad and let things be. But there was a flaw in my initial patch.

It's a pity that we cannot have an example with micrometers though.

@dpprdan

dpprdan commented Feb 19, 2019

Just a minor comment:

In the error message I got above, it says:

#> Error: In 'µm', 'µm' is not recognized by udunits.
#> See a table of valid unit symbols and names with valid_udunits().

It is my understanding that 'µm' is a valid symbol (and it just did not work due to said flaw in the initial patch). However, 'µm' does not appear in valid_udunits().

library(units)
#> udunits system database from C:/Users/daniel/Documents/.R/win-library/units/share/udunits
vunits <- valid_udunits()
#> udunits system database read from C:/Users/daniel/Documents/.R/win-library/units/share/udunits
subset(vunits, name_singular == 'micron', select = symbol:name_singular)
#>    symbol symbol_aliases name_singular
#> 94                              micron
micrm <- enc2utf8('µm')
micrm %in% vunits[['symbol']]
#> [1] FALSE

@edzer
Member Author

edzer commented Feb 19, 2019

Saw that too. Could you please try:

library(units)
ud_units[["µm"]]
# 1 [µm]

I guess that's because this comes from a binary object (ud_units) with character symbols that have no encoding specified; they get interpreted in the local encoding.

@Enchufa2
Member

udunits2 handles prefixes separately; i.e., in "µm", "µ" is the prefix and "m" is the symbol. That's why prefixes are not present in the symbol table.
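
As a toy illustration of that split (not udunits2's actual parser), the sketch below checks a unit string first against a symbol table and then as prefix plus symbol. The tables and the function is_valid_unit() are made up for the example and are far smaller than the real database.

```r
# Tiny, made-up tables; the real ones come from the udunits2 XML database.
prefixes <- c("k", "m", "\u00b5")   # kilo, milli, micro
symbols  <- c("m", "s", "g")        # metre, second, gram

# Hypothetical check: a bare symbol is valid, and so is prefix + symbol.
is_valid_unit <- function(x) {
  if (x %in% symbols) return(TRUE)
  for (p in prefixes) {
    rest <- substring(x, nchar(p) + 1)
    if (startsWith(x, p) && rest %in% symbols) return(TRUE)
  }
  FALSE
}

is_valid_unit("m")          # bare symbol
is_valid_unit("km")         # prefix "k" + symbol "m"
is_valid_unit("\u00b5m")    # prefix micro + symbol "m"
"\u00b5m" %in% symbols      # the combination is not itself in the symbol table
```

This mirrors why a lookup of "µm" in the symbol column alone comes back empty even though the string is accepted as a unit.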

@dpprdan

dpprdan commented Feb 19, 2019

@edzer Your example works.

Re prefix vs symbol. IMHO that nuance does not carry through in

#> See a table of valid unit symbols and names with valid_udunits().

Would it make sense to add a column to the output of valid_udunits() that shows the valid inputs to the value argument of e.g. set_units? When I first saw this, I thought that set_units checks the value argument against the valid_udunits()[["symbol"]] vector, and if it is not in there, it is not a valid input to value.

@Enchufa2
Member

I don't think so. From ?valid_udunits:

Description:

     The returned dataframe is constructed at runtime by reading the
     xml database that powers unit conversion in [package:udunits2].

[...]

Details:

     Any entry listed under ‘symbol’ , ‘symbol_aliases’ , ‘
     name_singular’ , ‘name_singular_aliases’ , ‘name_plural’ , or
     ‘name_plural_aliases’ is valid. Additionally, any entry under
     ‘symbol’ or ‘symbol_aliases’ may can also contain a valid prefix,
     as specified by ‘valid_udunits_prefixes()’ .

     Note, this is primarily intended for interactive use, the exact
     format of the returned dataframe may change in the future.

@dpprdan

dpprdan commented Feb 19, 2019

@Enchufa2 TBH, I didn't RTFM. So you're right, if I had and with what I know now about prefixes and symbols it makes perfect sense.

Actually it may indeed be an edge case because there is a dedicated entry for "micron" (or "1e-6 m" in the def column), and it does not have a symbol.

I'd still maintain, though, that the error message says nothing about prefixes. Maybe this is a compromise?

#> For valid unit symbols and names see valid_udunits(). 
#> For valid symbol prefixes see valid_udunits_prefixes().

Whatever you decide is fine with me, though.

@Enchufa2
Member

> @Enchufa2 TBH, I didn't RTFM. So you're right, if I had and with what I know now about prefixes and symbols it makes perfect sense.

Don't worry; been there too more times than I would be willing to admit. :D

> Actually it may indeed be an edge case because there is a dedicated entry for "micron" (or "1e-6 m" in the def column), and it does not have a symbol.

udunits2 accepts both symbols and names. E.g.,

set_units(1, metre)
#> 1 [m]

And there are edge cases all around the place, because the authors recognise that there are some conventions in some fields that should be covered. One of them, apparently, is the "micron".

> I'd still maintain, though, that the error message says nothing about prefixes. Maybe this is a compromise?
>
> #> For valid unit symbols and names see valid_udunits().
> #> For valid symbol prefixes see valid_udunits_prefixes().
>
> Whatever you decide is fine with me, though.

Fair enough. We may want to modify the error message with something, e.g., like the following (if we want to be comprehensive):

Error: In ‘asdfasdf’, ‘asdfasdf’ is not recognized by udunits.

See a table of valid unit symbols and names with valid_udunits().
Custom user-defined units can be added with install_symbolic_unit().

See a table of valid unit prefixes with valid_udunits_prefixes().
Prefixes will automatically work with any user-defined unit.

Problem with this: it's quite long, and people don't read, that's a fact. What do you think, @edzer?

@edzer
Member Author

edzer commented Feb 19, 2019

Good idea. People don't read documentation, but they do read error messages, especially helpful ones.

@Enchufa2
Member

Great, changed in master!
