Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Correct time of day subsetting when using hours. #327

Merged

Conversation

claymoremarshall
Copy link

e.g. subsetting with time of day using hours only, e.g. "T01/T03", does not currently give expected
results, returnong the incorrect rows for the time window.

This requires modifying .subsetTimeOfDay(), which did not cover all behaviour previously covered by subsetting which used .parseISO8601().

Careful attention is needed to ensure the second time string (T03 in the above example) covers all bars up to the end of the hour (03:59:59.9999) as was the behaviour previously with .parseISO8601().

An (optional) timeString validation is done to ensure passed in time strings are valid.

Unit tests also updated for time of day subsetting.

Fixes #326

@joshuaulrich
Copy link
Owner

I only had time to take a quick look, but this looks very thorough! I will do a more comprehensive review this weekend.

I did notice you added this test: checkException(x["T0100/T0115"]). That will break existing code.

I won't ask you to add comprehensive test coverage for existing functionality before accepting this PR (though I appreciate any tests you do add!). But it's best practice to write and run your unit tests before making any changes, to ensure they produce expected behavior (including errors you intend to fix). So I would appreciate if you ran your tests against master before adding your changes.

@joshuaulrich joshuaulrich self-requested a review February 19, 2020 12:43
@claymoremarshall
Copy link
Author

Regarding this, do you want to support x["T0100/T0115"]?
I made an assumption that users should be forced to do x["T01:00/T01:15"] instead, but can modify the patch to permit without colons if desired. Just makes string checking a little more tricky

@braverock
Copy link
Contributor

Regarding this, do you want to support x["T0100/T0115"]?
I made an assumption that users should be forced to do x["T01:00/T01:15"] instead, but can modify the patch to permit without colons if desired. Just makes string checking a little more tricky

I would suggest requiring the colon. The ISO standard, as I recall, expects the colon, and forcing a model of ##:##:##.######## seems to leave the smallest number of ways in qhich this could be misinterpreted.

@joshuaulrich
Copy link
Owner

joshuaulrich commented Aug 23, 2020

I would suggest requiring the colon. The ISO standard, as I recall, expects the colon

I just looked at the standard. The basic format does not use a colon, while the extended format does. Both are acceptable.

Edit: also, the standard requires each element be zero-padded, so these are the only supported formats:

Extended Basic
hh:mm:ss.sss or hhmmss.sss
hh:mm:ss or hhmmss
hh:mm or hhmm
hh  hh 

That said, we can still support "T1/T2" because that's unambiguous, but "T12/T23" is not. Even "T1020" is not clear whether it represents 1:02, or 10:02... unless we want to say that the leading zero is not required for hours, but is required for minutes and seconds.

joshuaulrich added a commit to claymoremarshall/xts that referenced this pull request Sep 7, 2020
Remove support for non-zero-padded minutes and seconds. The ISO 8601
standard requires all time components be zero-padded. That said, we
do support hours that aren't zero-padded, for convenience.

Also make the validation regex stricter. The first digit for minutes
and seconds cannot be > 5. Separate regex into multiple pieces, to
make them easier to understand later.

See joshuaulrich#326. See joshuaulrich#327.
joshuaulrich added a commit to claymoremarshall/xts that referenced this pull request Sep 7, 2020
The initial change for this fix only supports the extended format.
(hh:mm:ss.sss). We supported the basic format (hhmmss.sss) previously.
This restores the original functionality, making the colon optional.

I noticed how lastof() was being used to get the last timestamp for
the time string, and thought it would be good to find the first
timestamp in a similar manner. The getTimeComponents() function
extracts the hours, minutes, seconds, and sub-seconds into a list
that we can use with firstof() and lastof().

Note that firstof() returns the first timestamp in 1970, and lastof()
returns a timestamp at the end of 1970. The difference in years,
months, and days does not matter because we're only concerned with the
time of day components.

See joshuaulrich#326. See joshuaulrich#327.
cjm and others added 6 commits September 7, 2020 07:19
e.g.  subsetting with "T01/T03" does not currently give expected
results.

This requires fixing .subsetTimeOfDay()

Unit tests also updated for time of day subsetting.

Fixes joshuaulrich#326
The other file is approaching 300 lines, and better to do this now.
Doing it later would detach the history in the new file from the
original file.
This was removed in the commit with the subject:
  "Correct time of day subsetting when using hours."

It's not clear why it was removed, so I'm restoring it.
Remove support for non-zero-padded minutes and seconds. The ISO 8601
standard requires all time components be zero-padded. That said, we
do support hours that aren't zero-padded, for convenience.

Also make the validation regex stricter. The first digit for minutes
and seconds cannot be > 5. Separate regex into multiple pieces, to
make them easier to understand later.

See joshuaulrich#326. See joshuaulrich#327.
The initial change for this fix only supports the extended format.
(hh:mm:ss.sss). We supported the basic format (hhmmss.sss) previously.
This restores the original functionality, making the colon optional.

I noticed how lastof() was being used to get the last timestamp for
the time string, and thought it would be good to find the first
timestamp in a similar manner. The getTimeComponents() function
extracts the hours, minutes, seconds, and sub-seconds into a list
that we can use with firstof() and lastof().

Note that firstof() returns the first timestamp in 1970, and lastof()
returns a timestamp at the end of 1970. The difference in years,
months, and days does not matter because we're only concerned with the
time of day components.

See joshuaulrich#326. See joshuaulrich#327.
They create noise in the test results file. This makes it harder to
quickly find any problematic tests.
@joshuaulrich joshuaulrich force-pushed the 326_time_of_day_parsing branch from b84483a to f47ebcc Compare September 7, 2020 12:21
@joshuaulrich joshuaulrich marked this pull request as ready for review September 7, 2020 12:21
@joshuaulrich joshuaulrich merged commit 5d161ae into joshuaulrich:master Sep 7, 2020
joshuaulrich added a commit that referenced this pull request Sep 7, 2020
Remove support for non-zero-padded minutes and seconds. The ISO 8601
standard requires all time components be zero-padded. That said, we
do support hours that aren't zero-padded, for convenience.

Also make the validation regex stricter. The first digit for minutes
and seconds cannot be > 5. Separate regex into multiple pieces, to
make them easier to understand later.

See #326. See #327.
joshuaulrich added a commit that referenced this pull request Sep 7, 2020
The initial change for this fix only supports the extended format.
(hh:mm:ss.sss). We supported the basic format (hhmmss.sss) previously.
This restores the original functionality, making the colon optional.

I noticed how lastof() was being used to get the last timestamp for
the time string, and thought it would be good to find the first
timestamp in a similar manner. The getTimeComponents() function
extracts the hours, minutes, seconds, and sub-seconds into a list
that we can use with firstof() and lastof().

Note that firstof() returns the first timestamp in 1970, and lastof()
returns a timestamp at the end of 1970. The difference in years,
months, and days does not matter because we're only concerned with the
time of day components.

See #326. See #327.
@joshuaulrich
Copy link
Owner

@claymoremarshall thanks a lot for taking the initiative to fix this bug! Your code and tests were a really good start, and made it a lot easier for me to make updates.

I restored the 'basic format' functionality and made the regex a bit more robust. I also removed the ability to omit zero-padding on everything except hours. That wasn't supported previously, so we're not losing anything, and it makes the xts time-of-day strings conform to the standard (except that the leading zero can be omitted from hours, which isn't in the standard).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Time of day subsetting by hour only giving unexpected result
3 participants