as.data.table.xts(foo) gives wrong index values when 'x' is in the column names. #4897

Closed
emilsjoerup opened this issue Feb 11, 2021 · 1 comment · Fixed by #4898


@emilsjoerup

I have stumbled across what I believe to be a bug in as.data.table.xts(foo): the index values, which should correspond to the timestamps of the observations, sometimes come out simply as row numbers. I searched this repository and Stack Overflow but found nothing on this topic.

From my experiments, this occurs whenever a column in foo is named "x". Neither the number of columns nor their order seems to affect the result.

I have written a small example that shows both the expected behavior and the misbehavior, and in which cases each occurs.

Restarting R session...

> ## Pre-fix
> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS

Matrix products: default
BLAS/LAPACK: /opt/intel/compilers_and_libraries_2018.2.199/linux/mkl/lib/intel64_lin/libmkl_rt.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_DK.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_DK.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_DK.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_DK.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] compiler_4.0.3 tools_4.0.3   
> library(xts)
Loading required package: zoo

Attaching package: ‘zoo’

The following objects are masked from ‘package:base’:

    as.Date, as.Date.numeric

> library(data.table)
data.table 1.13.7 IN DEVELOPMENT built 2021-02-11 11:21:19 UTC using 4 threads (see ?getDTthreads).  Latest news: r-datatable.com

Attaching package: ‘data.table’

The following objects are masked from ‘package:xts’:

    first, last

> 
> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS

Matrix products: default
BLAS/LAPACK: /opt/intel/compilers_and_libraries_2018.2.199/linux/mkl/lib/intel64_lin/libmkl_rt.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_DK.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_DK.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_DK.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_DK.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.13.7 xts_0.12.1        zoo_1.8-8        

loaded via a namespace (and not attached):
[1] compiler_4.0.3  tools_4.0.3     grid_4.0.3      lattice_0.20-41
> a <- xts(cbind(1:10, 101:110), as.POSIXct(101:110, origin = "1970-01-01", tz = "UTC"), dimnames = list(NULL, c("AAPL", "MSFT")))
> b <- xts(cbind(1:10), as.POSIXct(101:110, origin = "1970-01-01", tz = "UTC"), dimnames = list(NULL, c("x")))
> c <- xts(cbind(1:10, 101:110), as.POSIXct(101:110, origin = "1970-01-01", tz = "UTC"), dimnames = list(NULL, c("x", "y")))
> d <- xts(cbind(1:10, 101:110), as.POSIXct(101:110, origin = "1970-01-01", tz = "UTC"), dimnames = list(NULL, c("y", "x")))
> e <- xts(cbind(1:10, 101:110), as.POSIXct(101:110, origin = "1970-01-01", tz = "UTC"), dimnames = list(NULL, c("y", "z")))
> as.data.table(a)
                  index  AAPL  MSFT
                 <POSc> <int> <int>
 1: 1970-01-01 00:01:41     1   101
 2: 1970-01-01 00:01:42     2   102
 3: 1970-01-01 00:01:43     3   103
 4: 1970-01-01 00:01:44     4   104
 5: 1970-01-01 00:01:45     5   105
 6: 1970-01-01 00:01:46     6   106
 7: 1970-01-01 00:01:47     7   107
 8: 1970-01-01 00:01:48     8   108
 9: 1970-01-01 00:01:49     9   109
10: 1970-01-01 00:01:50    10   110
> as.data.table(b)
    index     x
    <int> <int>
 1:     1     1
 2:     2     2
 3:     3     3
 4:     4     4
 5:     5     5
 6:     6     6
 7:     7     7
 8:     8     8
 9:     9     9
10:    10    10
> as.data.table(c)
    index     x     y
    <int> <int> <int>
 1:     1     1   101
 2:     2     2   102
 3:     3     3   103
 4:     4     4   104
 5:     5     5   105
 6:     6     6   106
 7:     7     7   107
 8:     8     8   108
 9:     9     9   109
10:    10    10   110
> as.data.table(d)
    index     y     x
    <int> <int> <int>
 1:     1     1   101
 2:     2     2   102
 3:     3     3   103
 4:     4     4   104
 5:     5     5   105
 6:     6     6   106
 7:     7     7   107
 8:     8     8   108
 9:     9     9   109
10:    10    10   110
> as.data.table(e)
                  index     y     z
                 <POSc> <int> <int>
 1: 1970-01-01 00:01:41     1   101
 2: 1970-01-01 00:01:42     2   102
 3: 1970-01-01 00:01:43     3   103
 4: 1970-01-01 00:01:44     4   104
 5: 1970-01-01 00:01:45     5   105
 6: 1970-01-01 00:01:46     6   106
 7: 1970-01-01 00:01:47     7   107
 8: 1970-01-01 00:01:48     8   108
 9: 1970-01-01 00:01:49     9   109
10: 1970-01-01 00:01:50    10   110
> 
> 
> 
> x <- xts(1:10, as.POSIXct(1:10, origin = "1970-01-01", tz = "UTC"))
> as.data.table(x)
                  index    V1
                 <POSc> <int>
 1: 1970-01-01 00:00:01     1
 2: 1970-01-01 00:00:02     2
 3: 1970-01-01 00:00:03     3
 4: 1970-01-01 00:00:04     4
 5: 1970-01-01 00:00:05     5
 6: 1970-01-01 00:00:06     6
 7: 1970-01-01 00:00:07     7
 8: 1970-01-01 00:00:08     8
 9: 1970-01-01 00:00:09     9
10: 1970-01-01 00:00:10    10
> colnames(x) <- "x"
> as.data.table(x)
    index     x
    <int> <int>
 1:     1     1
 2:     2     2
 3:     3     3
 4:     4     4
 5:     5     5
 6:     6     6
 7:     7     7
 8:     8     8
 9:     9     9
10:    10    10

I have implemented a simple fix: in as.data.table.xts(), the index value is now assigned to the output data.table with set() instead of "[.data.table". The output after the fix is:

Restarting R session...

> library(data.table)
> ## Post-fix
> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS

Matrix products: default
BLAS/LAPACK: /opt/intel/compilers_and_libraries_2018.2.199/linux/mkl/lib/intel64_lin/libmkl_rt.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_DK.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_DK.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_DK.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_DK.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.13.7 xts_0.12.1        zoo_1.8-8        

loaded via a namespace (and not attached):
[1] compiler_4.0.3  tools_4.0.3     grid_4.0.3      lattice_0.20-41
> library(xts)
> library(data.table)
> a <- xts(cbind(1:10, 101:110), as.POSIXct(101:110, origin = "1970-01-01", tz = "UTC"), dimnames = list(NULL, c("AAPL", "MSFT")))
> b <- xts(cbind(1:10), as.POSIXct(101:110, origin = "1970-01-01", tz = "UTC"), dimnames = list(NULL, c("x")))
> c <- xts(cbind(1:10, 101:110), as.POSIXct(101:110, origin = "1970-01-01", tz = "UTC"), dimnames = list(NULL, c("x", "y")))
> d <- xts(cbind(1:10, 101:110), as.POSIXct(101:110, origin = "1970-01-01", tz = "UTC"), dimnames = list(NULL, c("y", "x")))
> e <- xts(cbind(1:10, 101:110), as.POSIXct(101:110, origin = "1970-01-01", tz = "UTC"), dimnames = list(NULL, c("y", "z")))
> as.data.table(a)
                  index  AAPL  MSFT
                 <POSc> <int> <int>
 1: 1970-01-01 00:01:41     1   101
 2: 1970-01-01 00:01:42     2   102
 3: 1970-01-01 00:01:43     3   103
 4: 1970-01-01 00:01:44     4   104
 5: 1970-01-01 00:01:45     5   105
 6: 1970-01-01 00:01:46     6   106
 7: 1970-01-01 00:01:47     7   107
 8: 1970-01-01 00:01:48     8   108
 9: 1970-01-01 00:01:49     9   109
10: 1970-01-01 00:01:50    10   110
> as.data.table(b)
                  index     x
                 <POSc> <int>
 1: 1970-01-01 00:01:41     1
 2: 1970-01-01 00:01:42     2
 3: 1970-01-01 00:01:43     3
 4: 1970-01-01 00:01:44     4
 5: 1970-01-01 00:01:45     5
 6: 1970-01-01 00:01:46     6
 7: 1970-01-01 00:01:47     7
 8: 1970-01-01 00:01:48     8
 9: 1970-01-01 00:01:49     9
10: 1970-01-01 00:01:50    10
> as.data.table(c)
                  index     x     y
                 <POSc> <int> <int>
 1: 1970-01-01 00:01:41     1   101
 2: 1970-01-01 00:01:42     2   102
 3: 1970-01-01 00:01:43     3   103
 4: 1970-01-01 00:01:44     4   104
 5: 1970-01-01 00:01:45     5   105
 6: 1970-01-01 00:01:46     6   106
 7: 1970-01-01 00:01:47     7   107
 8: 1970-01-01 00:01:48     8   108
 9: 1970-01-01 00:01:49     9   109
10: 1970-01-01 00:01:50    10   110
> as.data.table(d)
                  index     y     x
                 <POSc> <int> <int>
 1: 1970-01-01 00:01:41     1   101
 2: 1970-01-01 00:01:42     2   102
 3: 1970-01-01 00:01:43     3   103
 4: 1970-01-01 00:01:44     4   104
 5: 1970-01-01 00:01:45     5   105
 6: 1970-01-01 00:01:46     6   106
 7: 1970-01-01 00:01:47     7   107
 8: 1970-01-01 00:01:48     8   108
 9: 1970-01-01 00:01:49     9   109
10: 1970-01-01 00:01:50    10   110
> as.data.table(e)
                  index     y     z
                 <POSc> <int> <int>
 1: 1970-01-01 00:01:41     1   101
 2: 1970-01-01 00:01:42     2   102
 3: 1970-01-01 00:01:43     3   103
 4: 1970-01-01 00:01:44     4   104
 5: 1970-01-01 00:01:45     5   105
 6: 1970-01-01 00:01:46     6   106
 7: 1970-01-01 00:01:47     7   107
 8: 1970-01-01 00:01:48     8   108
 9: 1970-01-01 00:01:49     9   109
10: 1970-01-01 00:01:50    10   110
> 
> 
> 
> x <- xts(1:10, as.POSIXct(1:10, origin = "1970-01-01", tz = "UTC"))
> as.data.table(x)
                  index    V1
                 <POSc> <int>
 1: 1970-01-01 00:00:01     1
 2: 1970-01-01 00:00:02     2
 3: 1970-01-01 00:00:03     3
 4: 1970-01-01 00:00:04     4
 5: 1970-01-01 00:00:05     5
 6: 1970-01-01 00:00:06     6
 7: 1970-01-01 00:00:07     7
 8: 1970-01-01 00:00:08     8
 9: 1970-01-01 00:00:09     9
10: 1970-01-01 00:00:10    10
> colnames(x) <- "x"
> as.data.table(x)
                  index     x
                 <POSc> <int>
 1: 1970-01-01 00:00:01     1
 2: 1970-01-01 00:00:02     2
 3: 1970-01-01 00:00:03     3
 4: 1970-01-01 00:00:04     4
 5: 1970-01-01 00:00:05     5
 6: 1970-01-01 00:00:06     6
 7: 1970-01-01 00:00:07     7
 8: 1970-01-01 00:00:08     8
 9: 1970-01-01 00:00:09     9
10: 1970-01-01 00:00:10    10

As shown, it now gives the correct output.
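A minimal sketch of the set()-based idea, written as a standalone function so it can be tried without patching data.table. The helper name `xts_to_dt_sketch` is hypothetical, and building the table from `coredata()` is an assumption made for illustration; it is not necessarily how as.data.table.xts() constructs its result internally.

```r
library(xts)
library(data.table)

# Hypothetical helper (not part of data.table's API): converts an xts object
# to a data.table, assigning the index with set() rather than `[.data.table`.
xts_to_dt_sketch <- function(obj, index_nm = "index") {
  # Columns keep their original names, e.g. "x"
  r <- as.data.table(zoo::coredata(obj))
  # set() is an ordinary function call, so `value` is evaluated in this
  # function's frame and cannot be masked by a column with the same name
  set(r, j = index_nm, value = zoo::index(obj))
  # Move the index column to the front, like the printed output above
  setcolorder(r, index_nm)
  r[]
}

b <- xts(cbind(1:3), as.POSIXct(101:103, origin = "1970-01-01", tz = "UTC"),
         dimnames = list(NULL, "x"))
res <- xts_to_dt_sketch(b)
```

Here `res$index` holds the POSIXct timestamps even though a column is named "x".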

This is just a workaround. I can't quite figure out what actually causes this behavior; I suspect it has something to do with line 1298 in data.table.R, but I am not familiar enough with the code to be sure.
If I use str2lang() to change the jsub from zoo::index(x) to return(x), I get the integer vector 1:10, whereas I would expect to get the original xts object. I think this is somewhat related to point 2.13 in the FAQ, but it feels more like a bug to me. If it is intended, should this behavior be documented somewhere?
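The observation that return(x) yields 1:10 is consistent with the following interpretation, sketched below (not confirmed in this thread): inside `[.data.table`, names in `j` are looked up among the table's columns first, so a column called `x` masks the function argument `x`. zoo::index() applied to a plain vector then falls back to 1..NROW, which would explain the `<int>` index column. The objects `x` and `DT` here are illustrative only.

```r
library(data.table)

x  <- c(10L, 20L, 30L)      # a variable in the calling scope
DT <- data.table(x = 1:3)   # a column with the same name

# In j, `x` resolves to the column DT$x, not to the variable above
DT[, from_j := x]

# set() evaluates `value` in the calling scope, so `x` is the variable
set(DT, j = "from_set", value = x)

# zoo::index() on a plain (non-zoo) vector returns 1..NROW(x),
# i.e. row numbers rather than timestamps
idx <- zoo::index(c(101L, 102L, 103L))
```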

I hope I have provided enough information to be helpful. If the workaround is deemed acceptable, I can create a PR with the fix and a test or two.
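A regression check along those lines might look like the following sketch. It uses plain stopifnot() calls rather than data.table's own test harness, mirrors object `b` from the examples above but with three rows, and assumes a data.table version that includes the fix; on an affected version the first assertion fails because the index comes back as integers.

```r
library(xts)
library(data.table)

# A single column named "x", which triggered the bug
b <- xts(cbind(1:3), as.POSIXct(101:103, origin = "1970-01-01", tz = "UTC"),
         dimnames = list(NULL, "x"))
dt <- as.data.table(b)

# The index must hold the timestamps, not the row numbers 1:3
stopifnot(
  inherits(dt$index, "POSIXct"),
  identical(as.numeric(dt$index), as.numeric(101:103)),
  identical(dt$x, 1:3)
)
```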

@jangorecki
Member

Thank you for reporting.

library(xts)
library(data.table)
b = xts(cbind(1:3), as.POSIXct(101:103, origin = "1970-01-01", tz = "UTC"), dimnames = list(NULL, c("x")))
b
#                    x
#1970-01-01 00:01:41 1
#1970-01-01 00:01:42 2
#1970-01-01 00:01:43 3
as.data.table(b)
#   index     x
#   <int> <int>
#1:     1     1
#2:     2     2
#3:     3     3
