-
-
Notifications
You must be signed in to change notification settings - Fork 71
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Generalize xts' index so users can provide their own index class #190
Comments
The idea is a right direction, but realizing it will be painfully difficult. I quickly looked at C code and functions like do_merge_xts() or do_rbind_xts() use coredata and index. xts C types expects INTSXP or REALSXP, which most often correspond to Date or POSIXct in R, and using index and coredata in C gives xts its speed (rbind, merge). xtime is based on POSIXct (REALSXP), so it is correctly supported by xts (C). I guess these changes are supposed to allow to use nanotime in the index. I am afraid, that there will be no simple way to operate on an nanotime based index from C (xts), eg. how to distinguish nanotime from POSIXct? - both will be seen as double/REALSXP. |
The point of an API is that there should be no need to distinguish between any potential index class. xts shouldn't know or care the class of the index, as long as the class supplies either an |
@silentbits has a point though, at C level nanotime will look like a REALSXP. But in fact, behind the scenes, it's the 64-bit value of this floating point that will be interpreted as a 64-bit integer. The issue in doing this is that without that cast the comparison operators used in functions like do_merge_xts() or do_rbind_xts() will be incorrect. So nanotime will represent a special case. Ugly to say the least. We can once more lament the fact that R doesn't provide a 64-bit integer and all the hacks that derive from this lack. Here is an example of what I'm saying about ordering: // -*- compile-command: "gcc -std=c99 -g prec.c -o prec; ./prec" -*-
#include <stdio.h>
#include <stdint.h>
int main() {
int64_t a = -1;
int64_t b = 1;
double* ad = (double*)(&a);
double* bd = (double*)(&b);
printf("ad: %lf, bd: %lf\n", *ad, *bd);
printf("ad < bd: %d\n", *ad < *bd);
return 0;
} Which outputs:
|
There is a simple solution that would solve all these problems: we should drop typedef union {
double *r_storage;
int64_t *xindex;
} i64_index_t; and then map https://github.com/joshuaulrich/xts/blob/master/src/any.c SEXP any_negative (SEXP x)
{
int i;
int len = length(x);
int *int_x = NULL;
i64_index_t idx; /* nanotime index union */
switch (TYPEOF(x)) {
case INTSXP:
int_x = INTEGER(x);
for (i = 0; i < len; i++) {
if (int_x[i] >= 0)
continue;
return ScalarLogical(1);
}
break;
case REALSXP:
/* get the REALSXP data pointer */
idx.r_storage = REAL(x);
for (i = 0; i < len; i++) {
/* use index data as int64_t (nanotime) */
if (idx.xindex[i] >= 0)
continue;
return ScalarLogical(1);
}
break;
default:
error("unsupported type");
}
return ScalarLogical(0);
} It's a small change, but the most important is that it does not break the internal logic of C code ( If you are interested, I would be very happy to help with such an implementation. Best regards, |
btw. R's https://github.com/wch/r-source/blob/trunk/src/main/util.c#L196 it means that the mark for a special case for > x <- as.integer64(123)
> x
integer64
[1] 123
> storage.mode(x)
[1] "double" |
@silentbits I'm replying to your comment in issue 241 here, to keep the conversation in one place.
I do like your idea, and I've been using a similar approach to clean up the merge C code. The idea is that using your approach to clean up the existing code should make it easier to add another index type. My changes are in the generic_index branch. You may also be interested in some of the really crazy stuff in the tight-loop branch (which has no hope of being merged without a solution that uses dynamic arrays to store the nodes). I'm sorry, I didn't realize you were willing to help. I'm glad to have whatever help you can give! I will work on writing a roadmap and some coding standards. I've also started a project for this feature, since it's bigger than just one issue, but I'm not sure if it's publicly visible. I'm going to be busy with R/Finance this week, so I won't be able to make much progress. But we will need plenty of unit tests and performance benchmark tests before making any changes, so you could work on those. It would help if you could look into setting up Rperform for benchmark regression testing. I just created an issue (245) for one thing that needs to be done before we can make significant progress on adding nanosecond resolution. I have been working on that in the insane_refactor branch. |
My opinion is a bit controversial: we should drop > getSymbols('AAPL')
[1] "AAPL"
> # extract last close price
> v1 <- AAPL[nrow(AAPL),4]
> v1
AAPL.Close
2018-06-15 188.84
> # get it as numeric
> v1 <- as.numeric(v1)
> v1
[1] 188.84
> # now let's assign '188.84' to v2
> v2 <- 188.84
> cat(v1, v2)
188.84 188.84
> # are they really equal?
> v1 == v2
[1] FALSE
> # so where is the problem?
> as.character(v1)
[1] "188.839996"
> as.character(v2)
[1] "188.84" We had the conversion > Sys.setenv(TZ='UTC')
> require(nanotime)
> # date + 100 ms
> str(nanotime("2018-06-16T00:00:00.100000000+00:00"))
integer64 1529107200100000000
> print(as.numeric(as.POSIXct("2018-06-16 00:00:00.100")), digits = 20)
[1] 1529107200.0999999046 So This is a very big change. That's why I think we should ask others for an opinion on this issue (Brian, Dirk, Jeff etc. etc.). If they also support this idea, then we can get to work. |
The xts class was originally implemented with the index class and index timezone attributes attached to the xts object itself. That means the xts class needs to know about all possible index classes, which makes it difficult to extend xts to use a new index class.
Jeff started to move the index-specific attributes from the xts object to the index object. See the commit message for e4edc53:
Most places currently set attributes on both objects, because it's not clear which components of xts depend on the index-specific attributes being attached to either object (xts or index).
We would first need to ensure that all attempts to access the index class or timezone are getting and returning values from the index object (note that this applies to R and C code). Then we would need to remove all attempts to set attributes on the xts object... and see what breaks.
Next we need to define an API for any potential index to implement. If I recall correctly, this was part of the idea behind xtime. The thing I remember most was that Jeff said, "xtime is for date-time classes what xts is for time-series classes". With an "xtime API", xts could program to it and you could use any class as an xts index as long as it implemented the xtime API. (Note that "xtime" may not be good name, since it is a Cox Automotive brand).
This API may need more than "index class" and "index timezone". The current xts API will need to be examined to see which functions rely on specific attributes and/or functionalities of the index. At the C level, the xts index can be either
INTSXP
,REALSXP
, or an S4 object (thanks to timeDate).The text was updated successfully, but these errors were encountered: