-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Avoid panic on Date32 overflow #9144
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| } | ||
|
|
||
| #[cfg(test)] | ||
| mod tests { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
those tests were added when there was a custom function used for shift_months (https://github.com/apache/arrow-rs/pull/2031/changes#diff-39c31d552939b8478fb676b059d5920a9c71b6659600348681c9ed7c03ea8485R52). Now that we use chrono functions, those tests are not necessary anymore
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, this PR also adds add_months_date that is quite similar to shift_months -- do you think it is valuable to port the test to use add_months_date instead?
That would make it easier to ensure we are not changing behavior (esp with respect to over flow handling / truncation)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good, added them back. Removed the datetime ones as we don't need them and added a couple test for the integer overflow handling
scovich
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approach generally looks good, tho it's a lot harder to review a PR that mixes refactoring with bug fixes.
NOTE: Github isn't handling the diff very gracefully. I'm not sure why -- git diff -b handles it just fine?? Discussion here 🍿
arrow-arith/src/numeric.rs
Outdated
| let epoch = NaiveDate::from_ymd_opt(1970, 1, 1).unwrap(); | ||
| let ms_per_day = 24 * 60 * 60 * 1000i64; | ||
|
|
||
| // Define the boundary dates using NaiveDate::from_ymd_opt | ||
| let max_valid_date = NaiveDate::from_ymd_opt(262142, 12, 31).unwrap(); | ||
| let min_valid_date = NaiveDate::from_ymd_opt(-262143, 1, 1).unwrap(); | ||
|
|
||
| // Calculate their millisecond values from epoch | ||
| let max_valid_millis = (max_valid_date - epoch).num_milliseconds(); | ||
| let min_valid_millis = (min_valid_date - epoch).num_milliseconds(); | ||
| let max_valid_millis = (MAX_VALID_DATE - EPOCH).num_milliseconds(); | ||
| let min_valid_millis = (MIN_VALID_DATE - EPOCH).num_milliseconds(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes like this seem unrelated, better to split out as a "prefactor" PR that can merge quickly? Then this PR would be easier to review by not mixing refactor with bug fixes and API changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| /// # Arguments | ||
| /// | ||
| /// * `i` - The Date32Type to convert | ||
| #[deprecated(since = "58.0.0", note = "Use to_naive_date_opt instead.")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should split out the #[deprecated] annotations to a separate PR so this one can merge immediately while the other waits for next major release?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(FWIW the next release will be major so it is less of a concern)
| let res = res.add(Duration::try_milliseconds(ms as i64).unwrap()); | ||
| Date32Type::from_naive_date(res) | ||
| let res = Date32Type::to_naive_date_opt(date)?; | ||
| let res = res.checked_add_signed(Duration::try_days(days as i64)?)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked that add just calls into checked_add_signed and panics. Thus this is equivalent
https://docs.rs/chrono/latest/src/chrono/naive/date/mod.rs.html#1952
impl Add for NaiveDate {
1953 type Output = NaiveDate;
1954
1955 #[inline]
1956 fn add(self, rhs: TimeDelta) -> NaiveDate {
1957 self.checked_add_signed(rhs).expect("NaiveDate + TimeDelta overflowed")
1958 }
| Date32Type, | ||
| as_primitive, | ||
| Date32Type::to_naive_date, | ||
| |v| Date32Type::to_naive_date_opt(v).unwrap(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This used to panic implicitly and now the panic'ing is explicit. I think that is an improvement
| } | ||
|
|
||
| #[cfg(test)] | ||
| mod tests { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, this PR also adds add_months_date that is quite similar to shift_months -- do you think it is valuable to port the test to use add_months_date instead?
That would make it easier to ensure we are not changing behavior (esp with respect to over flow handling / truncation)
arrow-arith/src/numeric.rs
Outdated
| // Date64Type::to_naive_date_opt has boundaries determined by NaiveDate's supported range. | ||
| // The valid date range is from January 1, -262143 to December 31, 262142 (Gregorian calendar). | ||
|
|
||
| let epoch = NaiveDate::from_ymd_opt(1970, 1, 1).unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you for cleaning up this repetition
|
Thank you for your contribution @cht42 🙏 |
# Which issue does this PR close? - Closes #N/A (internal refactoring - no issue) # Rationale for this change Noticed while writing tests in #9144, that the current tests could be re-written to be easier to read/re-use.⚠️ FIY, I used claude to refactor those tests, I read the changes and we are keeping the same test cases. The Date64 boundary tests in `arrow-arith/src/numeric.rs` had significant code duplication. Each test function for Date64 operations (`to_naive_date_opt`, `add_year_months_opt`, `subtract_year_months_opt`, `add_day_time_opt`, `subtract_day_time_opt`, `add_month_day_nano_opt`, `subtract_month_day_nano_opt`) repeated similar setup code and boundary checks, making the test suite harder to maintain and extend. # What changes are included in this PR? This PR refactors the Date64 boundary tests by: 1. **Introducing shared constants** for commonly used values: - `MAX_VALID_DATE`, `MIN_VALID_DATE`, `EPOCH` - NaiveDate constants for chrono's valid date range 2. **Adding utility functions** to reduce repetition: - `date_to_millis(year, month, day)` - converts a date to milliseconds from epoch - `max_valid_millis()`, `min_valid_millis()`, `year_2000_millis()` - common millisecond values 3. **Consolidating similar test patterns** into parameterized helper functions: - `test_year_month_op()` - tests `add_year_months_opt` and `subtract_year_months_opt` - `test_day_time_op()` - tests `add_day_time_opt` and `subtract_day_time_opt` - `test_month_day_nano_op()` - tests `add_month_day_nano_opt` and `subtract_month_day_nano_opt` 4. **Reducing 8 separate test functions to 4** while maintaining the same test coverage Net result: **-297 lines** (163 added, 460 removed) with equivalent functionality. # Are these changes tested? Yes - this is a refactoring of existing tests. The same boundary conditions and edge cases are still tested, just organized more efficiently. Running `cargo test` confirms all tests pass. # Are there any user-facing changes? No. This is an internal test refactoring with no changes to public APIs or functionality. --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
|
updated with the new tests, had to change them a bit again to account for the types difference |
# Which issue does this PR close? - Closes #N/A (internal refactoring - no issue) # Rationale for this change Noticed while writing tests in apache#9144, that the current tests could be re-written to be easier to read/re-use.⚠️ FIY, I used claude to refactor those tests, I read the changes and we are keeping the same test cases. The Date64 boundary tests in `arrow-arith/src/numeric.rs` had significant code duplication. Each test function for Date64 operations (`to_naive_date_opt`, `add_year_months_opt`, `subtract_year_months_opt`, `add_day_time_opt`, `subtract_day_time_opt`, `add_month_day_nano_opt`, `subtract_month_day_nano_opt`) repeated similar setup code and boundary checks, making the test suite harder to maintain and extend. # What changes are included in this PR? This PR refactors the Date64 boundary tests by: 1. **Introducing shared constants** for commonly used values: - `MAX_VALID_DATE`, `MIN_VALID_DATE`, `EPOCH` - NaiveDate constants for chrono's valid date range 2. **Adding utility functions** to reduce repetition: - `date_to_millis(year, month, day)` - converts a date to milliseconds from epoch - `max_valid_millis()`, `min_valid_millis()`, `year_2000_millis()` - common millisecond values 3. **Consolidating similar test patterns** into parameterized helper functions: - `test_year_month_op()` - tests `add_year_months_opt` and `subtract_year_months_opt` - `test_day_time_op()` - tests `add_day_time_opt` and `subtract_day_time_opt` - `test_month_day_nano_op()` - tests `add_month_day_nano_opt` and `subtract_month_day_nano_opt` 4. **Reducing 8 separate test functions to 4** while maintaining the same test coverage Net result: **-297 lines** (163 added, 460 removed) with equivalent functionality. # Are these changes tested? Yes - this is a refactoring of existing tests. The same boundary conditions and edge cases are still tested, just organized more efficiently. Running `cargo test` confirms all tests pass. # Are there any user-facing changes? No. This is an internal test refactoring with no changes to public APIs or functionality. --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Which issue does this PR close?
This PR does the same thing as what was done in #7737 to handle overflow errors gracely instead of panicking.
Part of #4456.
Rationale for this change
What changes are included in this PR?
Are these changes tested?
yes
Are there any user-facing changes?