Skip to content

Conversation

@cht42
Copy link
Contributor

@cht42 cht42 commented Jan 12, 2026

Which issue does this PR close?

This PR does the same thing as what was done in #7737 to handle overflow errors gracely instead of panicking.

Part of #4456.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

yes

Are there any user-facing changes?

@github-actions github-actions bot added arrow Changes to the arrow crate parquet-variant parquet-variant* crates labels Jan 12, 2026
}

#[cfg(test)]
mod tests {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

those tests were added when there was a custom function used for shift_months (https://github.com/apache/arrow-rs/pull/2031/changes#diff-39c31d552939b8478fb676b059d5920a9c71b6659600348681c9ed7c03ea8485R52). Now that we use chrono functions, those tests are not necessary anymore

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, this PR also adds add_months_date that is quite similar to shift_months -- do you think it is valuable to port the test to use add_months_date instead?

That would make it easier to ensure we are not changing behavior (esp with respect to over flow handling / truncation)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good, added them back. Removed the datetime ones as we don't need them and added a couple test for the integer overflow handling

@alamb alamb changed the title Refactor Date32 manipulation functions for overflow handling Avoid panic on Date32 overflow Jan 13, 2026
Copy link
Contributor

@scovich scovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approach generally looks good, tho it's a lot harder to review a PR that mixes refactoring with bug fixes.

NOTE: Github isn't handling the diff very gracefully. I'm not sure why -- git diff -b handles it just fine?? Discussion here 🍿

Comment on lines 1654 to 2111
let epoch = NaiveDate::from_ymd_opt(1970, 1, 1).unwrap();
let ms_per_day = 24 * 60 * 60 * 1000i64;

// Define the boundary dates using NaiveDate::from_ymd_opt
let max_valid_date = NaiveDate::from_ymd_opt(262142, 12, 31).unwrap();
let min_valid_date = NaiveDate::from_ymd_opt(-262143, 1, 1).unwrap();

// Calculate their millisecond values from epoch
let max_valid_millis = (max_valid_date - epoch).num_milliseconds();
let min_valid_millis = (min_valid_date - epoch).num_milliseconds();
let max_valid_millis = (MAX_VALID_DATE - EPOCH).num_milliseconds();
let min_valid_millis = (MIN_VALID_DATE - EPOCH).num_milliseconds();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes like this seem unrelated, better to split out as a "prefactor" PR that can merge quickly? Then this PR would be easier to review by not mixing refactor with bug fixes and API changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/// # Arguments
///
/// * `i` - The Date32Type to convert
#[deprecated(since = "58.0.0", note = "Use to_naive_date_opt instead.")]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should split out the #[deprecated] annotations to a separate PR so this one can merge immediately while the other waits for next major release?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(FWIW the next release will be major so it is less of a concern)

let res = res.add(Duration::try_milliseconds(ms as i64).unwrap());
Date32Type::from_naive_date(res)
let res = Date32Type::to_naive_date_opt(date)?;
let res = res.checked_add_signed(Duration::try_days(days as i64)?)?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked that add just calls into checked_add_signed and panics. Thus this is equivalent

https://docs.rs/chrono/latest/src/chrono/naive/date/mod.rs.html#1952

impl Add for NaiveDate {
1953 type Output = NaiveDate;
1954
1955 #[inline]
1956 fn add(self, rhs: TimeDelta) -> NaiveDate {
1957 self.checked_add_signed(rhs).expect("NaiveDate + TimeDelta overflowed")
1958 }

Date32Type,
as_primitive,
Date32Type::to_naive_date,
|v| Date32Type::to_naive_date_opt(v).unwrap(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This used to panic implicitly and now the panic'ing is explicit. I think that is an improvement

}

#[cfg(test)]
mod tests {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, this PR also adds add_months_date that is quite similar to shift_months -- do you think it is valuable to port the test to use add_months_date instead?

That would make it easier to ensure we are not changing behavior (esp with respect to over flow handling / truncation)

// Date64Type::to_naive_date_opt has boundaries determined by NaiveDate's supported range.
// The valid date range is from January 1, -262143 to December 31, 262142 (Gregorian calendar).

let epoch = NaiveDate::from_ymd_opt(1970, 1, 1).unwrap();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you for cleaning up this repetition

@alamb
Copy link
Contributor

alamb commented Jan 13, 2026

Thank you for your contribution @cht42 🙏

@cht42
Copy link
Contributor Author

cht42 commented Jan 14, 2026

I followed @scovich recommendation and spun up #9165 to refactor the current tests. With that refactor, it should be easier to integrate similar tests with Date32Type, so it'd be nice to get that merged first

alamb added a commit that referenced this pull request Jan 14, 2026
# Which issue does this PR close?

- Closes #N/A (internal refactoring - no issue)

# Rationale for this change

Noticed while writing tests in #9144, that the current tests could be
re-written to be easier to read/re-use.

⚠️ FIY, I used claude to refactor those tests, I read the changes and we
are keeping the same test cases.

The Date64 boundary tests in `arrow-arith/src/numeric.rs` had
significant code duplication. Each test function for Date64 operations
(`to_naive_date_opt`, `add_year_months_opt`, `subtract_year_months_opt`,
`add_day_time_opt`, `subtract_day_time_opt`, `add_month_day_nano_opt`,
`subtract_month_day_nano_opt`) repeated similar setup code and boundary
checks, making the test suite harder to maintain and extend.

# What changes are included in this PR?

This PR refactors the Date64 boundary tests by:

1. **Introducing shared constants** for commonly used values:
- `MAX_VALID_DATE`, `MIN_VALID_DATE`, `EPOCH` - NaiveDate constants for
chrono's valid date range

2. **Adding utility functions** to reduce repetition:
- `date_to_millis(year, month, day)` - converts a date to milliseconds
from epoch
- `max_valid_millis()`, `min_valid_millis()`, `year_2000_millis()` -
common millisecond values

3. **Consolidating similar test patterns** into parameterized helper
functions:
- `test_year_month_op()` - tests `add_year_months_opt` and
`subtract_year_months_opt`
- `test_day_time_op()` - tests `add_day_time_opt` and
`subtract_day_time_opt`
- `test_month_day_nano_op()` - tests `add_month_day_nano_opt` and
`subtract_month_day_nano_opt`

4. **Reducing 8 separate test functions to 4** while maintaining the
same test coverage

Net result: **-297 lines** (163 added, 460 removed) with equivalent
functionality.

# Are these changes tested?

Yes - this is a refactoring of existing tests. The same boundary
conditions and edge cases are still tested, just organized more
efficiently. Running `cargo test` confirms all tests pass.

# Are there any user-facing changes?

No. This is an internal test refactoring with no changes to public APIs
or functionality.

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
@alamb
Copy link
Contributor

alamb commented Jan 14, 2026

I followed @scovich recommendation and spun up #9165 to refactor the current tests. With that refactor, it should be easier to integrate similar tests with Date32Type, so it'd be nice to get that merged first

Merged! Now there are some conflicts with this PR to address however

@cht42
Copy link
Contributor Author

cht42 commented Jan 15, 2026

updated with the new tests, had to change them a bit again to account for the types difference

Dandandan pushed a commit to Dandandan/arrow-rs that referenced this pull request Jan 15, 2026
# Which issue does this PR close?

- Closes #N/A (internal refactoring - no issue)

# Rationale for this change

Noticed while writing tests in apache#9144, that the current tests could be
re-written to be easier to read/re-use.

⚠️ FIY, I used claude to refactor those tests, I read the changes and we
are keeping the same test cases.

The Date64 boundary tests in `arrow-arith/src/numeric.rs` had
significant code duplication. Each test function for Date64 operations
(`to_naive_date_opt`, `add_year_months_opt`, `subtract_year_months_opt`,
`add_day_time_opt`, `subtract_day_time_opt`, `add_month_day_nano_opt`,
`subtract_month_day_nano_opt`) repeated similar setup code and boundary
checks, making the test suite harder to maintain and extend.

# What changes are included in this PR?

This PR refactors the Date64 boundary tests by:

1. **Introducing shared constants** for commonly used values:
- `MAX_VALID_DATE`, `MIN_VALID_DATE`, `EPOCH` - NaiveDate constants for
chrono's valid date range

2. **Adding utility functions** to reduce repetition:
- `date_to_millis(year, month, day)` - converts a date to milliseconds
from epoch
- `max_valid_millis()`, `min_valid_millis()`, `year_2000_millis()` -
common millisecond values

3. **Consolidating similar test patterns** into parameterized helper
functions:
- `test_year_month_op()` - tests `add_year_months_opt` and
`subtract_year_months_opt`
- `test_day_time_op()` - tests `add_day_time_opt` and
`subtract_day_time_opt`
- `test_month_day_nano_op()` - tests `add_month_day_nano_opt` and
`subtract_month_day_nano_opt`

4. **Reducing 8 separate test functions to 4** while maintaining the
same test coverage

Net result: **-297 lines** (163 added, 460 removed) with equivalent
functionality.

# Are these changes tested?

Yes - this is a refactoring of existing tests. The same boundary
conditions and edge cases are still tested, just organized more
efficiently. Running `cargo test` confirms all tests pass.

# Are there any user-facing changes?

No. This is an internal test refactoring with no changes to public APIs
or functionality.

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

arrow Changes to the arrow crate parquet-variant parquet-variant* crates

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants