Tests 1590.3 & 1590.4 fail with data.table 1.11.2 on Windows 64 R 3.5.0 patched #2856

Closed
aadler opened this issue May 9, 2018 · 9 comments · Fixed by #2903

@aadler

aadler commented May 9, 2018

Package compiled from source using Rtools 3.5.

# Minimal reproducible example

> test.data.table()
Running C:/R/RCurrent/R-3.5.0patched/library/data.table/tests/tests.Rraw 

**** Suggested package bit64 is not installed. Tests using it will be skipped.


**** Suggested package xts is not installed. Tests using it will be skipped.


**** Suggested package nanotime is not installed. Tests using it will be skipped.

Running test id 1590.3      Test 1590.3 ran without errors but failed check that x equals y:
> x = forderv(c(x2, x1, x1, x2)) 
First 6 of 4 (type 'integer'): [1] 1 4 2 3
> y = integer() 
First 6 of 0 (type 'integer'): integer(0)
Numeric: lengths (4, 0) differ
Running test id 1590.4      Test 1590.4 ran without errors but failed check that x equals y:
> x = base::order(c(x2, x1, x1, x2)) 
First 6 of 4 (type 'integer'): [1] 1 4 2 3
> y = 1:4 
First 6 of 4 (type 'integer'): [1] 1 2 3 4
Mean relative difference: 0.4444444
Running test id 1910           
10 longest running tests took 35s (38% of 91s)
      ID time nTest
 1: 1874 5.44     5
 2: 1223 5.36   728
 3: 1253 4.08   485
 4: 1739 3.15     5
 5: 1438 3.09   354
 6: 1648 3.07    45
 7: 1652 2.97    45
 8: 1650 2.87    45
 9: 1848 2.86     1
10: 1835 2.66     1
Error in eval(exprs[i], envir) : 
  2 errors out of 6930 in 00:01:31 on Wed May 09 10:24:14 2018. [endian==little, sizeof(long double)==16, sizeof(pointer)==8, TZ=America/New_York, locale='LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252']. Search inst/tests/tests.Rraw for test numbers: 1590.3, 1590.4.

# Output of sessionInfo()

> sessionInfo()
R version 3.5.0 Patched (2018-05-03 r74693)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows Server >= 2012 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.11.2

loaded via a namespace (and not attached):
[1] compiler_3.5.0 tools_3.5.0    yaml_2.1.19
@aadler
Author

aadler commented May 9, 2018

The previous tests were on Windows Server 2012. Same issue on Win7 64-bit.

> test.data.table()
Running C:/R/RCurrent/R-3.5.0/library/data.table/tests/tests.Rraw 

**** Suggested package bit64 is not installed. Tests using it will be skipped.


**** Suggested package xts is not installed. Tests using it will be skipped.


**** Suggested package nanotime is not installed. Tests using it will be skipped.

Running test id 1590.3      Test 1590.3 ran without errors but failed check that x equals y:
> x = forderv(c(x2, x1, x1, x2)) 
First 6 of 4 (type 'integer'): [1] 1 4 2 3
> y = integer() 
First 6 of 0 (type 'integer'): integer(0)
Numeric: lengths (4, 0) differ
Running test id 1590.4      Test 1590.4 ran without errors but failed check that x equals y:
> x = base::order(c(x2, x1, x1, x2)) 
First 6 of 4 (type 'integer'): [1] 1 4 2 3
> y = 1:4 
First 6 of 4 (type 'integer'): [1] 1 2 3 4
Mean relative difference: 0.4444444
Running test id 1910           
10 longest running tests took 42s (52% of 80s)
      ID  time nTest
 1: 1874 11.13     5
 2: 1835  8.55     1
 3: 1875  5.18     1
 4: 1223  3.71   728
 5: 1253  2.82   485
 6: 1739  2.60     5
 7: 1438  2.29   354
 8: 1848  2.23     1
 9:  895  1.97   165
10: 1648  1.88    45
Error in eval(exprs[i], envir) : 
  2 errors out of 6930 in 00:01:20 on Wed May 09 10:33:19 2018. [endian==little, sizeof(long double)==16, sizeof(pointer)==8, TZ=America/New_York, locale='LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252']. Search inst/tests/tests.Rraw for test numbers: 1590.3, 1590.4.
> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.11.2

loaded via a namespace (and not attached):
[1] compiler_3.5.0 tools_3.5.0    yaml_2.1.19 

@jangorecki
Member

Did you set a different locale in R? This seems to be a duplicate of #2771.

@aadler
Author

aadler commented May 9, 2018

I don't make any specific changes to my environments. What you see in my sessionInfo output is the "natural" locale. FWIW, 1.11.0 didn't have this problem on the same computer.

@jangorecki
Member

jangorecki commented May 10, 2018

Could you retry on 1.11.2 and, if possible, also on 1.11.0, using R --vanilla?
There were very few commits between 1.11.0 and 1.11.2, and none of them were related to this issue AFAIR.

@mattdowle mattdowle added this to the 1.11.4 milestone May 10, 2018
@mattdowle
Member

mattdowle commented May 10, 2018

It is related to #2771, iiuc. My locale is en_US.UTF-8; @aadler's locale is English_United States.1252. I changed that test in PR #2813 (v1.11.0). It looks like both data.table::forderv and base::order are sensitive to locale when passed a vector containing both unknown and Latin-1 encoded strings. I didn't think non-C locales differed from each other here, but en_US.UTF-8 evidently behaves differently from English_United States.1252.
It's a fail for data.table because, as per the extensive comments I added in that PR, data.table ordering is not supposed to be sensitive to locale. And in general it isn't: the rest of the tests in that group pass. The failure is confined to this edge case of mixed unknown/Latin-1 strings. The question now is, where is data.table::forder calling a function that reads the locale?
The expanded test did its job and caught this, so that's something at least.
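
For comparison, base R has a locale-independent ordering of its own: per ?sort, method = "radix" always orders character vectors as in the C locale. Below is a minimal base-R sketch of that contrast. The vector `x` is made up for illustration and is not the x1/x2 used in tests.Rraw, and nothing here touches data.table internals.

```r
# Illustrative vector (an assumption; not the x1/x2 from tests.Rraw).
x <- c("b", "A", "a", "B")

# Radix ordering ignores LC_COLLATE and sorts as in the C locale,
# i.e. uppercase ASCII letters before lowercase ones.
ord_radix <- order(x, method = "radix")
print(ord_radix)  # 2 4 3 1

# The default method for character vectors follows the session's collation
# rules, so this result may differ between locales.
ord_default <- order(x)
print(ord_default)
print(Sys.getlocale("LC_COLLATE"))
```

If `ord_default` differs from `ord_radix` on a given machine, the difference comes from that session's LC_COLLATE setting.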

@aadler
Author

aadler commented May 10, 2018

From the help in base::Comparisons

Character strings can be compared with different marked encodings (see Encoding): they are translated to UTF-8 before comparison.

Perhaps that is the issue, as in #2813, the code says "data.table is deliberately C-locale only"
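
A small base-R sketch of what that help text describes, under the assumption of a made-up Latin-1 string (the real test strings in tests.Rraw may differ): the same text can carry different encoding marks and different bytes, yet `==` treats the two as equal because it translates before comparing.

```r
# Build the bytes of "facile" with a c-cedilla (0xE7 in Latin-1) explicitly,
# so the example does not depend on how the source file is parsed.
x_latin1 <- rawToChar(as.raw(c(0x66, 0x61, 0xE7, 0x69, 0x6C, 0x65)))
Encoding(x_latin1) <- "latin1"   # mark the string as Latin-1

# Same text, re-encoded and marked as UTF-8.
x_utf8 <- enc2utf8(x_latin1)

print(Encoding(c(x_latin1, x_utf8)))  # "latin1" "UTF-8"
print(charToRaw(x_latin1))            # 6 bytes
print(charToRaw(x_utf8))              # 7 bytes: the cedilla takes two bytes

# Comparison translates differently marked strings to UTF-8 first.
same_text <- x_latin1 == x_utf8
print(same_text)                      # TRUE
```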

@aadler
Author

aadler commented May 10, 2018

Hi, @mattdowle.

At line 721 of forder.c, where you define StrCmp, you call strcmp after converting the encodings. That should make sense, since per cplusplus.com: "This function performs a binary comparison of the characters. For a function that takes into account locale-specific rules, see strcoll."

However, since you're still having issues, perhaps not forcing the encoding and using strcoll is a possibility?
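
To make "binary comparison" concrete, here is a rough R sketch of a byte-wise comparison in the spirit of strcmp. The helper `bytewise_cmp` is hypothetical and written only for illustration; it is not the StrCmp from forder.c. It shows why the encodings have to agree before comparing bytes: the same text in Latin-1 and in UTF-8 compares as unequal byte-wise.

```r
# Hypothetical helper: returns -1, 0 or 1 like the sign of strcmp(a, b),
# comparing raw bytes only and ignoring locale and encoding marks.
bytewise_cmp <- function(a, b) {
  ra <- as.integer(charToRaw(a))
  rb <- as.integer(charToRaw(b))
  n <- min(length(ra), length(rb))
  if (n > 0L) {
    d <- ra[seq_len(n)] - rb[seq_len(n)]
    first <- which(d != 0L)
    if (length(first) > 0L) return(sign(d[first[1L]]))
  }
  # All shared bytes are equal, so the shorter string sorts first.
  sign(length(ra) - length(rb))
}

# Made-up example string: same text in two encodings.
x_latin1 <- rawToChar(as.raw(c(0x66, 0x61, 0xE7, 0x69, 0x6C, 0x65)))
Encoding(x_latin1) <- "latin1"
x_utf8 <- enc2utf8(x_latin1)

print(bytewise_cmp("B", "a"))            # -1: 0x42 sorts before 0x61
print(bytewise_cmp(x_latin1, x_utf8))    # 1: byte 0xE7 is larger than 0xC3
print(bytewise_cmp(enc2utf8(x_latin1), x_utf8))  # 0 once both are UTF-8
```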

@ChuliangXiao

> test.data.table()
Running C:/Users/xiao/Documents/R/win-library/3.3/data.table/tests/tests.Rraw 
Running test id 1590.3      Test 1590.3 ran without errors but failed check that x equals y:
> x = forderv(c(x2, x1, x1, x2)) 
First 6 of 4 (type 'integer'): [1] 1 4 2 3
> y = integer() 
First 6 of 0 (type 'integer'): integer(0)
Numeric: lengths (4, 0) differ
Running test id 1590.4      Test 1590.4 ran without errors but failed check that x equals y:
> x = base::order(c(x2, x1, x1, x2)) 
First 6 of 4 (type 'integer'): [1] 1 4 2 3
> y = 1:4 
First 6 of 4 (type 'integer'): [1] 1 2 3 4
Mean relative difference: 0.4444444
Running test id 1910           
10 longest running tests took 23s (44% of 53s)
      ID time nTest
 1: 1438 3.20   738
 2: 1874 2.76     5
 3: 1223 2.68   728
 4: 1650 2.64    91
 5: 1652 2.53    91
 6: 1648 2.48    91
 7: 1835 2.10     1
 8: 1253 2.08   485
 9: 1739 1.75     5
10:  895 1.56   165
Error in eval(expr, envir, enclos) : 
  2 errors out of 7707 in 53.8sec on Fri May 11 10:30:59 2018. [endian==little, sizeof(long double)==16, sizeof(pointer)==8, TZ=America/New_York, locale='LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252']. Search inst/tests/tests.Rraw for test numbers: 1590.3, 1590.4.

Having a similar issue with English_United States.1252

> sessionInfo()
R version 3.3.0 (2016-05-03)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] nanotime_0.2.0    xts_0.10-2        zoo_1.8-1         bit64_0.9-7       bit_1.1-12        data.table_1.11.3

loaded via a namespace (and not attached):
[1] tools_3.3.0     RcppCCTZ_0.2.3  yaml_2.1.18     Rcpp_0.12.16    grid_3.3.0      lattice_0.20-35

@mattdowle
Member

Aside: these tests pass on win-builder and on AppVeyor, because they aren't running there :( The logs show those test numbers aren't running.
Tests 1590.1-1590.4 run when ctype==collate && ctype!="C".
AppVeyor: LC_COLLATE=C; LC_CTYPE=English_United States.1252
win-builder: LC_COLLATE=C; LC_CTYPE=German_Germany.1252
So ctype!=collate on both.
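
The condition above can be sketched as a small predicate. `runs_1590` is a hypothetical helper that only mirrors the quoted condition; the actual guard in tests.Rraw may be written differently. The locale values are the ones reported in this thread.

```r
# Hypothetical helper mirroring: ctype == collate && ctype != "C"
runs_1590 <- function(ctype, collate) {
  ctype == collate && ctype != "C"
}

# CI settings quoted above: LC_COLLATE is "C", so the tests are skipped.
appveyor   <- runs_1590("English_United States.1252", "C")
winbuilder <- runs_1590("German_Germany.1252", "C")

# The reporter's machine: both categories match and are not "C".
reporter <- runs_1590("English_United States.1252",
                      "English_United States.1252")

print(c(appveyor = appveyor, winbuilder = winbuilder, reporter = reporter))

# Check the current session the same way.
print(runs_1590(Sys.getlocale("LC_CTYPE"), Sys.getlocale("LC_COLLATE")))
```

This is consistent with the failures showing up only on user machines where LC_COLLATE and LC_CTYPE are both English_United States.1252.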
