Open
Description
I noticed some potentially unexpected behaviour when converting from string to date. Days that are out of bounds for the given month are rolled over into the following month.
I think the expected behaviour would be to either raise an error (as Python does) or return NULL/NA (as R does), rather than rolling the date over into the following month.
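To make the two expected behaviours concrete, here is a minimal sketch using only the Python standard library. The helper `parse_strict` and its `on_error` parameter are hypothetical names made up for this illustration; they are not part of Arrow's or pandas' API.

```python
from datetime import datetime
from typing import Optional


def parse_strict(text: str, fmt: str = "%Y-%m-%d",
                 on_error: str = "raise") -> Optional[datetime]:
    """Parse `text` with `fmt` without ever rolling an invalid day over.

    Hypothetical helper for illustration only:
    on_error="raise" propagates the ValueError (Python-style behaviour);
    on_error="null" returns None (analogous to R returning NA).
    """
    if on_error not in ("raise", "null"):
        raise ValueError("on_error must be 'raise' or 'null'")
    try:
        # datetime.strptime rejects out-of-range days such as February 30.
        return datetime.strptime(text, fmt)
    except ValueError:
        if on_error == "null":
            return None
        raise
```

With this sketch, `parse_strict("1999-02-30", on_error="null")` gives `None`, while the default mode raises `ValueError`; neither mode produces a date in March.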
library(arrow, warn.conflicts = FALSE)
library(lubridate, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)
df <- tibble::tibble(string_date = "1999-02-30")
# base R returns NA
df %>%
  mutate(date = strptime(string_date, format = "%Y-%m-%d"))
#> # A tibble: 1 × 2
#>   string_date date
#>   <chr>       <dttm>
#> 1 1999-02-30  NA
# arrow rolls over the 30th of February into the 2nd of March
df %>%
  arrow_table() %>%
  mutate(date = strptime(string_date, format = "%Y-%m-%d")) %>%
  collect()
#> # A tibble: 1 × 2
#>   string_date date
#>   <chr>       <dttm>
#> 1 1999-02-30  1999-03-02 00:00:00
Thanks Alenka, Joris and Rok for helping me with the Python examples:
pandas:
>>> import pandas as pd
>>> pd.to_datetime("1999-02-30", format="%Y-%m-%d")
...
ValueError: time data 1999-02-30 doesn't match format specified
datetime:
>>> from datetime import datetime
>>> datetime.strptime("1999-02-30", "%Y-%m-%d")
...
ValueError: day is out of range for month
arrow:
>>> import pyarrow.compute as pc
>>> print(pc.strptime("1999-02-30", format="%Y-%m-%d", unit="s"))
1999-03-02 00:00:00
Reporter: Dragoș Moldovan-Grünfeld / @dragosmg
Watchers: Rok Mihevc / @rok
Related issues:
- [C++] Add error handling option to StrptimeOptions (is related to)
Note: This issue was originally created as ARROW-15948. Please see the migration documentation for further details.