Previously, my production environment used data.table 1.11.4. After upgrading to version 1.12.0 (the CRAN version), I found that non-ASCII strings sometimes cannot be matched. It was very difficult to reproduce, but I finally managed to put together the reproducible example below.
Note that this only happens on Windows, and only when the keyed column is in the native encoding. Strangely, I cannot reproduce it on Windows 10; it can only be reproduced on Windows 7 (I succeeded on two computers, so it should not be specific to my machine).
Moreover, at first it only occurred when options(stringsAsFactors = FALSE) was set. However, that option is not needed in the example code below.
I'll try to debug and fix it...
library(data.table)
x <- '借:Cash|借:损益类-交易费用|借:损益类-价差收入|借:损益类-公允价值变动损益|贷:资产类-公允价值变动|贷:资产类-成本'
v <- c(
x,
"借:Cash|借:损益类-交易费用|借:损益类-价差收入|借:损益类-公允价值变动损益|贷:资产类-公允价值变动|贷:资产类-应计利息|贷:资产类-成本",
"借:Cash|借:损益类-利息收入|贷:资产类-成本", "借:Cash|借:损益类-利息收入|贷:资产类-成本|贷:资产类-应计利息",
"借:Cash|借:损益类-利息收入|贷:资产类-成本|贷:资产类-应计利息|贷:资产类-折溢价",
"借:Cash|借:损益类-利息收入|贷:资产类-成本|贷:资产类-应计利息|借:资产类-折溢价",
"借:Cash|借:损益类-利息收入|贷:资产类-应计利息", "借:Cash|借:损益类-利息收入|贷:资产类-应计利息|贷:资产类-成本",
"借:Cash|借:损益类-利息收入|借:权益类-资本公积|贷:资产类-公允价值变动|贷:资产类-应计利息|贷:资产类-成本"
)
tmp<- data.table(a=v, b=1, key='a')
print(tmp[J(x), b]) # returns NA on 1.12.0 and dev; returns 1 on 1.11.4
print(tmp[, b[v==x]]) # always returns 1
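To check whether the key column really is in the native encoding, one can inspect the encoding marks with base R. The sketch below also shows a possible stopgap: convert the column to UTF-8 before keying. `Encoding()`, `enc2native()` and `enc2utf8()` are base R functions; the short string and the two-row table are made up for illustration, and whether converting up front avoids the NA on an affected Windows 7 machine is an assumption that has not been verified here.

```r
library(data.table)

# Short non-ASCII key written with \u escapes so the script itself stays ASCII
x <- "\u501f:Cash|\u8d37:\u8d44\u4ea7\u7c7b-\u6210\u672c"
v <- c(x, "abc")            # stands in for the key column from the example above

# Native-encoded strings typically carry no mark ("unknown"); UTF-8 ones are marked
print(Encoding(c(enc2native(x), enc2utf8(x))))

# Hypothetical stopgap: key on the UTF-8 version and look up with a UTF-8 value
x_utf8 <- enc2utf8(x)
tmp <- data.table(a = enc2utf8(v), b = 1:2, key = "a")
res <- tmp[J(x_utf8), b]
print(res)
```

If the lookup on the converted column succeeds where the native-encoded one returns NA, that points at the encoding handling during keying rather than at the data itself.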
This is the smallest reprex I could come up with (the resulting orders differ; again, it is only reproducible on Windows 7 with Chinese as the default language). Hopefully, I can find some time this week to sort this out.
This example only triggers the bug when more than one thread is used. In other words, it does not happen after setDTthreads(1). The original example, however, triggers it in all cases.
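A minimal sketch for comparing the keyed lookup under different thread counts is shown here. `setDTthreads()` and `getDTthreads()` are data.table functions; the helper name `lookup_with_threads` is hypothetical, and the ASCII input is only a placeholder that will not trigger the bug.

```r
library(data.table)

# Hypothetical helper: build the keyed table and do the join with a given
# number of data.table threads, restoring the previous setting afterwards
lookup_with_threads <- function(v, x, threads) {
  old <- getDTthreads()
  on.exit(setDTthreads(old))
  setDTthreads(threads)
  tmp <- data.table(a = v, b = 1, key = "a")
  tmp[J(x), b]
}

# Placeholder data; substitute the v and x from the example above
v_demo <- c("b", "a", "c")
single <- lookup_with_threads(v_demo, "a", 1L)
multi  <- lookup_with_threads(v_demo, "a", 2L)
print(identical(single, multi))
```

On an affected machine, a difference between the two results for the same non-ASCII input would be consistent with the bug living in a multi-threaded code path.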
I'm pretty sure the following line causes the bug. At that point, s may not yet be UTF-8 encoded, which results in a different ustr_maxlen, and that value is then used in cradix_r().
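Why the encoding would matter for a maximum string length can be illustrated in plain R, assuming ustr_maxlen is a length in bytes (an assumption; the source line is not shown here). The same text has a different byte length depending on its encoding. GBK is used below as a stand-in for the native code page of a Chinese Windows system, which is also an assumption.

```r
# "\u8d37:\u8d44\u4ea7\u7c7b-\u6210\u672c" is 6 CJK characters plus ":" and "-"
s_utf8 <- "\u8d37:\u8d44\u4ea7\u7c7b-\u6210\u672c"

n_chars      <- nchar(s_utf8, type = "chars")   # 8 characters
n_bytes_utf8 <- nchar(s_utf8, type = "bytes")   # 6 * 3 + 2 = 20 bytes in UTF-8

# Re-encode to GBK; iconv() returns NA if the platform lacks this encoding
s_gbk <- iconv(s_utf8, from = "UTF-8", to = "GBK")
n_bytes_gbk <- if (is.na(s_gbk)) NA_integer_ else nchar(s_gbk, type = "bytes")

# GBK uses 2 bytes per CJK character, so the byte length is shorter than in UTF-8
print(c(chars = n_chars, utf8_bytes = n_bytes_utf8, gbk_bytes = n_bytes_gbk))
```

If a maximum length is computed from native-encoded strings but the sort later runs over their longer UTF-8 form, the trailing bytes would be ignored, which could make distinct strings sharing a long prefix compare as equal.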