-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
BUG: Passing ambiguous ndarray[datetime64[ns]] to DatetimeIndex constructor can cause ValueError with wrong offset #5152
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Pretty simple test case: In [88]: infer_freq([datetime(2013, 10, 7), datetime(2013, 10, 8), datetime(2013, 10, 9)])
Out[88]: 'D' which I agree is incorrect since a frequency of 'BD' cannot be ruled out. |
@wesm ? |
no that's not incorrect - it's reasonable and valid as the freq for the sequence, right? It might be ambiguous, but all that matters is inferring some frequency. So either it should return a list of possibilities, or we pass the buck to offsets to handle the ambiguity. |
Okay. Agreed, "incorrect" is not the right word. That being said, what exactly is the spec for infer_freq? There is not quite a total ordering of frequencies by specificity, but in general, should infer_freq return the most specific or the more general frequency? |
@cancan101 it obviously is the most general, it is used quite extensively interally to lazily evaluate frequency when its not already assigned |
Perhaps |
@wesm or at the very least could skip frequency inference when small. |
if you give infer_freq 5 consecutive weekdays, it'll come back with 'D' as its inferred frequency. But if your actual frequency is
BDay
, then, when DatetimeIndex checks that the frequency matches, 'B' != 'D'. (note that verify_integrity=False skips this). This leads to a more general issue aboutinfer_freq
with ambiguous cases. I think it makes the most sense to move these sorts of checks to a method on offset that takes a frequence and an Index or ndarray, and determines whether it is compatible.This matters because you can hit some edge cases when you pass freq and also datetime64[ns] to the DatetimeIndex constructor and more generally because comparing freqstr is probably not the best way to go about checking whether a frequency matches.
Default implementation could be:
and then bday could do something like (and this is totally psuedocode)
This gets more complicated with multiplied offsets, but I think it's worth considering.
produces this Traceback:
cc @cancan101 - this is what we need to deal with in adding your offsets. I believe that every other offset can be returned from infer_freq, so these offsets would be different and therefore could never pass integrity checks. So either we'd need to change infer_freq and/or define some kind of is_compatible method that intelligently covers all the ways in which the frequency could be something different than its freqstr.
The text was updated successfully, but these errors were encountered: