Skip to content

18417: Feat: Make extract SQL expression timezone aware#6

Open
martin-augment wants to merge 7 commits intomainfrom
pr-18417-2025-11-06-11-45-38
Open

18417: Feat: Make extract SQL expression timezone aware#6
martin-augment wants to merge 7 commits intomainfrom
pr-18417-2025-11-06-11-45-38

Conversation

@martin-augment
Copy link
Owner

@martin-augment martin-augment commented Nov 6, 2025

18417: To review by AI


Note

Introduce a timezone-aware EXTRACT UDF, enhance DATE_PART and typed timestamp casts with timezone handling, and update planner, tests, and docs accordingly.

  • Datetime UDFs:
    • Add new extract UDF (datafusion/functions/src/datetime/extract.rs) with timezone-aware extraction and epoch/seconds handling.
    • Enhance date_part to adjust timestamp values using array timezone or session timezone; add helpers adjust_to_local_time and adjust_timestamp_array.
  • Planner/Registration:
    • Register extract in datetime::functions() and expose make_udf_function!(extract::ExtractFunc, extract).
    • Route SQL EXTRACT to extract UDF in DatetimeFunctionPlanner.
  • SQL Parser/Casts:
    • In sql/expr/mod.rs, detect timezone in TypedString casts and set Timestamp(_, Some(tz)); add extract_tz_from_string.
  • Tests:
    • Add sqllogictest file extract_tz.slt covering timezone-aware EXTRACT and session TZ.
    • Update expected plans/results to use extract(...) (replacing date_part(...)) and include Z suffix for timezone-aware ranges.
  • Docs:
    • Document extract function and update Time and Date Functions index.

Written by Cursor Bugbot for commit 5f2f5d7. This will update automatically on new commits. Configure here.

@coderabbitai
Copy link

coderabbitai bot commented Nov 6, 2025

Walkthrough

Introduces a new EXTRACT scalar UDF with timezone-aware timestamp handling for both timezone-aware and naive timestamps. Replaces date_part references in query planner and test files. Enhances timezone support in casting and datetime functions. Updates documentation and adds comprehensive timezone tests.

Changes

Cohort / File(s) Summary
EXTRACT UDF and Function Registration
datafusion/functions/src/datetime/extract.rs, datafusion/functions/src/datetime/mod.rs, datafusion/functions/src/datetime/planner.rs
Introduces new ExtractFunc UDF with timezone-aware extraction for timestamps, dates, strings, and intervals. Supports per-element timezone adjustments for timezone-aware data and session timezone conversion for naive timestamps. Registers the function in the datetime module and updates the query planner to use extract() instead of date_part().
Timezone Support Enhancements
datafusion/functions/src/datetime/date_part.rs, datafusion/sql/src/expr/mod.rs
Adds timezone-aware handling to DatePartFunc with per-timestamp timezone adjustments and helper functions for local time conversion. Enhances SQL casting logic to extract and preserve timezone information from string literals when casting to timestamp types.
Test Updates - Plan Rewrites
datafusion/sqllogictest/test_files/group_by.slt, datafusion/sqllogictest/test_files/optimizer_group_by_constant.slt, datafusion/sqllogictest/test_files/tpch/plans/q7.slt.part, datafusion/sqllogictest/test_files/tpch/plans/q8.slt.part, datafusion/sqllogictest/test_files/tpch/plans/q9.slt.part
Updates expected logical and physical plan output to replace date_part() function calls with extract() in projection and grouping expressions.
Test Updates - New Tests and Output Changes
datafusion/sqllogictest/test_files/extract_tz.slt, datafusion/sqllogictest/test_files/table_functions.slt
Adds comprehensive timezone-aware tests for the EXTRACT function covering various extraction units and timestamp formats. Updates timestamp range formatting in table functions output to include UTC designator (Z suffix).
Documentation
docs/source/user-guide/sql/scalar_functions.md
Adds user-facing documentation for the new EXTRACT function under Time and Date Functions, including syntax, arguments, and supported date parts.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch pr-18417-2025-11-06-11-45-38

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

}
arr = Arc::new(builder.finish());
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Double Timezone Adjustment Causes Incorrect Hour Extraction

Double timezone adjustment for hour extraction. The code already adjusts timestamps via adjust_timestamp_array (lines 206-247) which properly converts UTC timestamps to the target timezone. However, lines 282-304 apply an additional manual offset adjustment specifically for hour extraction on timezone-aware timestamps. This creates a double adjustment that produces incorrect results. Additionally, line 289 uses .unwrap() which could panic if the timezone string doesn't match the expected format (e.g., for named timezones instead of offset-based ones like "+04:00"). This special-case code should be removed as the timestamp adjustment is already handled correctly by adjust_timestamp_array.

Fix in Cursor Fix in Web

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:useful; category:bug; feedback:The Cursor AI reviewer is correct that the timestamp value is already adjusted with adjust_timestamp_array(), so there is no need to do it again manually for the hour part. Prevents a double offsetting for the hour part

} else {
let sign = if tz_str.starts_with('+') { 1i32 } else { -1i32 };
let hours_str = &tz_str[1..3];
let hours: i32 = hours_str.parse().unwrap();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Unsafe unwrap for timezone parsing panics on invalid formats

Use of .unwrap() on parsing hours from timezone string. This will panic if the timezone string format is not exactly as expected (e.g., "+04:00"). Named timezones like "America/New_York" or other valid IANA timezone formats would cause a panic here. Should use proper error handling instead of .unwrap().

Fix in Cursor Fix in Web

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:useful; category:bug; feedback:The Cursor AI reviewer is correct that named offsets are not supported in this PR. The finding prevents abort of the application due to the panic caused by the failed parsing of the hour.

Copy link

@augmentcode augmentcode bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review completed. 2 suggestions posted.

"doy" => date_part(array.as_ref(), DatePart::DayOfYear)?,
"dow" => date_part(array.as_ref(), DatePart::DayOfWeekSunday0)?,
"isodow" => date_part(array.as_ref(), DatePart::DayOfWeekMonday0)?,
"epoch" => epoch(array.as_ref())?,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the "epoch" branch, the earlier timezone adjustment shifts the absolute epoch and returns a value offset by the timezone; epoch should be independent of timezone. Consider skipping the adjustment when extracting epoch so it reflects true seconds since Unix epoch.

🤖 React with 👍 or 👎 to let us know if the comment was useful.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:useful; category:bug; feedback:The Augment AI reviewer is correct that there shouldn't be adjusting of the timestamp when the user needs it as epoch seconds. The finding prevents wrong values for the epoch case.

array
};

let part_trim = part_normalization(&part);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that timezone-adjusting the input before computing parts will also affect epoch, yielding an offset result; epoch should not depend on timezone. Consider bypassing the adjustment when the requested part is "epoch".

🤖 React with 👍 or 👎 to let us know if the comment was useful.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:useful; category:bug; feedback:The Augment AI reviewer is correct that there shouldn't be adjusting of the timestamp when the user needs it as epoch seconds. The finding prevents wrong values for the epoch case.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 00bc4b4 and 5f2f5d7.

📒 Files selected for processing (13)
  • datafusion/functions/src/datetime/date_part.rs (6 hunks)
  • datafusion/functions/src/datetime/extract.rs (1 hunks)
  • datafusion/functions/src/datetime/mod.rs (3 hunks)
  • datafusion/functions/src/datetime/planner.rs (1 hunks)
  • datafusion/sql/src/expr/mod.rs (5 hunks)
  • datafusion/sqllogictest/test_files/extract_tz.slt (1 hunks)
  • datafusion/sqllogictest/test_files/group_by.slt (1 hunks)
  • datafusion/sqllogictest/test_files/optimizer_group_by_constant.slt (1 hunks)
  • datafusion/sqllogictest/test_files/table_functions.slt (1 hunks)
  • datafusion/sqllogictest/test_files/tpch/plans/q7.slt.part (2 hunks)
  • datafusion/sqllogictest/test_files/tpch/plans/q8.slt.part (2 hunks)
  • datafusion/sqllogictest/test_files/tpch/plans/q9.slt.part (2 hunks)
  • docs/source/user-guide/sql/scalar_functions.md (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
datafusion/functions/src/datetime/date_part.rs (2)
datafusion/functions/src/datetime/to_local_time.rs (11)
  • as_primitive_array (188-188)
  • None (573-573)
  • adjust_to_local_time (142-142)
  • adjust_to_local_time (153-153)
  • adjust_to_local_time (164-164)
  • adjust_to_local_time (175-175)
  • adjust_to_local_time (194-194)
  • adjust_to_local_time (296-344)
  • adjust_to_local_time (443-443)
  • convert_timestamp (297-312)
  • new (112-116)
datafusion/functions/src/datetime/extract.rs (16)
  • as_primitive_array (375-375)
  • config (230-230)
  • tz_str (209-209)
  • adjust_to_local_time (322-368)
  • adjust_to_local_time (380-380)
  • convert_timestamp (323-338)
  • adjust_timestamp_array (216-216)
  • adjust_timestamp_array (219-219)
  • adjust_timestamp_array (222-222)
  • adjust_timestamp_array (224-224)
  • adjust_timestamp_array (236-236)
  • adjust_timestamp_array (239-239)
  • adjust_timestamp_array (242-242)
  • adjust_timestamp_array (244-244)
  • adjust_timestamp_array (370-386)
  • new (101-134)
datafusion/functions/src/datetime/extract.rs (2)
datafusion/common/src/cast.rs (10)
  • as_date32_array (47-49)
  • as_date64_array (52-54)
  • as_time32_millisecond_array (215-217)
  • as_time32_second_array (210-212)
  • as_time64_microsecond_array (220-222)
  • as_time64_nanosecond_array (225-227)
  • as_timestamp_microsecond_array (244-248)
  • as_timestamp_millisecond_array (237-241)
  • as_timestamp_nanosecond_array (230-234)
  • as_timestamp_second_array (251-253)
datafusion/common/src/utils/mod.rs (1)
  • take_function_args (923-937)
datafusion/sql/src/expr/mod.rs (2)
datafusion/functions-table/src/generate_series.rs (2)
  • value (122-124)
  • tz_str (126-128)
datafusion/functions/src/datetime/date_part.rs (2)
  • new (102-136)
  • tz_str (213-213)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Cursor Bugbot
  • GitHub Check: claude-review
🔇 Additional comments (12)
datafusion/functions/src/datetime/date_part.rs (2)

204-252: LGTM! Timezone-aware extraction logic is well-structured.

The implementation correctly handles three cases:

  1. Timezone-aware timestamps (with explicit timezone)
  2. Naive timestamps (interpreted in session timezone)
  3. Non-timestamp types (no adjustment)

The logic properly adjusts timestamps before extraction, ensuring date parts are computed in the appropriate timezone.


46-46: Both error macros are actively used and should remain imported.

Verification confirms internal_err! is used on line 153 and internal_datafusion_err! is used on lines 333 and 339. Both macros are necessary for the file's error handling and should be retained.

datafusion/sqllogictest/test_files/tpch/plans/q8.slt.part (2)

63-63: LGTM! Test expectations correctly updated.

The logical plan projection now uses extract(Utf8("YEAR"), orders.o_orderdate) instead of date_part(...), which aligns with the new Extract UDF implementation.


100-100: LGTM! Physical plan projection updated consistently.

The physical plan correctly reflects the extract function usage: extract(YEAR, o_orderdate@2).

datafusion/functions/src/datetime/planner.rs (1)

32-32: LGTM! Planner correctly routes to new extract UDF.

The change from crate::datetime::date_part() to crate::datetime::extract() properly integrates the new timezone-aware extract function into the datetime function planner.

docs/source/user-guide/sql/scalar_functions.md (1)

2574-2603: LGTM! Comprehensive documentation for the extract function.

The documentation clearly describes:

  • Syntax: extract(field FROM source)
  • Supported fields (year, quarter, month, etc.)
  • Both field and source arguments

The documentation is consistent with the existing date_part function documentation and properly positioned in the Time and Date Functions section.

datafusion/sqllogictest/test_files/optimizer_group_by_constant.slt (1)

93-93: LGTM! Test output correctly reflects extract function.

The projection now shows NOT extract(Utf8("MONTH"),now()) instead of date_part, which is consistent with the extract UDF migration.

datafusion/sqllogictest/test_files/table_functions.slt (1)

356-357: LGTM! Timestamp output correctly includes timezone indicator.

The timestamps now include the Z suffix (e.g., 2023-01-01T00:00:00Z), which properly indicates UTC timezone. This is consistent with the timezone-aware extraction implementation.

datafusion/sqllogictest/test_files/tpch/plans/q9.slt.part (2)

59-59: LGTM! Logical plan updated to use extract.

The projection correctly uses extract(Utf8("YEAR"), orders.o_orderdate) for YEAR extraction.


85-85: LGTM! Physical plan consistent with logical plan.

The physical plan projection correctly uses extract(YEAR, o_orderdate@4).

datafusion/sqllogictest/test_files/tpch/plans/q7.slt.part (2)

65-65: LGTM! Logical plan correctly uses extract function.

The projection shows extract(Utf8("YEAR"), lineitem.l_shipdate) for extracting the year from the shipdate column.


94-94: LGTM! Physical plan matches logical plan semantics.

The physical plan projection correctly uses extract(YEAR, l_shipdate@2).

Comment on lines +301 to +365
fn adjust_to_local_time<T: ArrowTimestampType>(ts: i64, tz: Tz) -> Result<i64> {
fn convert_timestamp<F>(ts: i64, converter: F) -> Result<DateTime<Utc>>
where
F: Fn(i64) -> MappedLocalTime<DateTime<Utc>>,
{
match converter(ts) {
MappedLocalTime::Ambiguous(earliest, latest) => exec_err!(
"Ambiguous timestamp. Do you mean {:?} or {:?}",
earliest,
latest
),
MappedLocalTime::None => exec_err!(
"The local time does not exist because there is a gap in the local time."
),
MappedLocalTime::Single(date_time) => Ok(date_time),
}
}

let date_time = match T::UNIT {
Nanosecond => Utc.timestamp_nanos(ts),
Microsecond => convert_timestamp(ts, |ts| Utc.timestamp_micros(ts))?,
Millisecond => convert_timestamp(ts, |ts| Utc.timestamp_millis_opt(ts))?,
Second => convert_timestamp(ts, |ts| Utc.timestamp_opt(ts, 0))?,
};

let offset_seconds: i64 = tz
.offset_from_utc_datetime(&date_time.naive_utc())
.fix()
.local_minus_utc() as i64;

let adjusted_date_time = date_time.add(
TimeDelta::try_seconds(offset_seconds)
.ok_or_else(|| internal_datafusion_err!("Offset seconds should be less than i64::MAX / 1_000 or greater than -i64::MAX / 1_000"))?,
);

// convert back to i64
match T::UNIT {
Nanosecond => adjusted_date_time.timestamp_nanos_opt().ok_or_else(|| {
internal_datafusion_err!(
"Failed to convert DateTime to timestamp in nanosecond. This error may occur if the date is out of range. The supported date ranges are between 1677-09-21T00:12:43.145224192 and 2262-04-11T23:47:16.854775807"
)
}),
Microsecond => Ok(adjusted_date_time.timestamp_micros()),
Millisecond => Ok(adjusted_date_time.timestamp_millis()),
Second => Ok(adjusted_date_time.timestamp()),
}
}

fn adjust_timestamp_array<T: ArrowTimestampType>(
array: &ArrayRef,
tz: Tz,
) -> Result<ArrayRef> {
let mut builder = PrimitiveBuilder::<T>::new();
let primitive_array = as_primitive_array::<T>(array)?;
for ts_opt in primitive_array.iter() {
match ts_opt {
None => builder.append_null(),
Some(ts) => {
let adjusted_ts = adjust_to_local_time::<T>(ts, tz)?;
builder.append_value(adjusted_ts);
}
}
}
Ok(Arc::new(builder.finish()))
}
Copy link

@coderabbitai coderabbitai bot Nov 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Refactor to eliminate code duplication.

The helper functions adjust_to_local_time and adjust_timestamp_array are duplicated across three files:

  • datafusion/functions/src/datetime/date_part.rs (this file)
  • datafusion/functions/src/datetime/to_local_time.rs (lines 295-343, 369-385)
  • datafusion/functions/src/datetime/extract.rs (lines 321-367, 369-385)

This violates the DRY principle and creates a maintenance burden. Consider extracting these functions into a shared utility module (e.g., datafusion/functions/src/datetime/utils.rs or similar) and importing them in all three locations.

// In a new shared module (e.g., datetime/timezone_utils.rs):
pub(crate) fn adjust_to_local_time<T: ArrowTimestampType>(ts: i64, tz: Tz) -> Result<i64> {
    // ... existing implementation
}

pub(crate) fn adjust_timestamp_array<T: ArrowTimestampType>(
    array: &ArrayRef,
    tz: Tz,
) -> Result<ArrayRef> {
    // ... existing implementation
}

Then import in each file:

use crate::datetime::timezone_utils::{adjust_to_local_time, adjust_timestamp_array};

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:useful; category:bug; feedback:The CodeRabbit AI reviewer is correct that these helper functions could be reused instead of copied in several files. This will prevent technical dept and maintenance cost.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@martin-augment, thank you for confirming this issue! You're absolutely right that extracting these helpers will prevent technical debt and reduce maintenance burden.

Since this refactoring affects multiple files in the datetime module, would you like to:

  1. Address it in this PR - I can help identify all the duplication points and suggest the refactoring structure
  2. Create a follow-up issue - Track this as a separate refactoring task to keep this PR focused on the timezone-aware functionality

If you'd like to go with option 2, I can open a GitHub issue for you with details about the duplicated functions and the suggested approach for consolidating them into a shared utility module.

What would you prefer?

Comment on lines +282 to +303
if is_timezone_aware && part_trim.to_lowercase() == "hour" {
if let Some(tz_str) = &tz_str_opt {
let offset_hours = if tz_str.as_ref() == "+00:00" {
0
} else {
let sign = if tz_str.starts_with('+') { 1i32 } else { -1i32 };
let hours_str = &tz_str[1..3];
let hours: i32 = hours_str.parse().unwrap();
sign * hours
};
let int_arr = as_int32_array(&arr)?;
let mut builder = PrimitiveBuilder::<Int32Type>::new();
for i in 0..arr.len() {
if arr.is_null(i) {
builder.append_null();
} else {
let v = int_arr.value(i);
builder.append_value(v + offset_hours);
}
}
arr = Arc::new(builder.finish());
}
Copy link

@coderabbitai coderabbitai bot Nov 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Remove the post-adjustment of hours for timezone-aware timestamps

adjust_timestamp_array(...) already converts each timestamp into the correct local time before date_part runs. The extra block that re-adds an offset:

  • panics outright for common zone names like America/New_York (hours_str indexing hits non-digits and the parse().unwrap() aborts),
  • ignores minute offsets (e.g. +05:30 becomes +05), and
  • double-adjusts fixed offsets, yielding wrong hour values.

We should drop this block and rely on the earlier conversion for all parts.

-        if is_timezone_aware && part_trim.to_lowercase() == "hour" {
-            if let Some(tz_str) = &tz_str_opt {
-                let offset_hours = if tz_str.as_ref() == "+00:00" {
-                    0
-                } else {
-                    let sign = if tz_str.starts_with('+') { 1i32 } else { -1i32 };
-                    let hours_str = &tz_str[1..3];
-                    let hours: i32 = hours_str.parse().unwrap();
-                    sign * hours
-                };
-                let int_arr = as_int32_array(&arr)?;
-                let mut builder = PrimitiveBuilder::<Int32Type>::new();
-                for i in 0..arr.len() {
-                    if arr.is_null(i) {
-                        builder.append_null();
-                    } else {
-                        let v = int_arr.value(i);
-                        builder.append_value(v + offset_hours);
-                    }
-                }
-                arr = Arc::new(builder.finish());
-            }
-        }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if is_timezone_aware && part_trim.to_lowercase() == "hour" {
if let Some(tz_str) = &tz_str_opt {
let offset_hours = if tz_str.as_ref() == "+00:00" {
0
} else {
let sign = if tz_str.starts_with('+') { 1i32 } else { -1i32 };
let hours_str = &tz_str[1..3];
let hours: i32 = hours_str.parse().unwrap();
sign * hours
};
let int_arr = as_int32_array(&arr)?;
let mut builder = PrimitiveBuilder::<Int32Type>::new();
for i in 0..arr.len() {
if arr.is_null(i) {
builder.append_null();
} else {
let v = int_arr.value(i);
builder.append_value(v + offset_hours);
}
}
arr = Arc::new(builder.finish());
}
// [preceding code context - lines 1-281]
// [code that follows the removed block - originally after line 303]
🤖 Prompt for AI Agents
In datafusion/functions/src/datetime/extract.rs around lines 282 to 303, remove
the entire post-adjustment block that manually parses tz_str and adds
offset_hours to the hour values; this block panics on named zones, drops minute
offsets, and double-adjusts timestamps because adjust_timestamp_array already
produced local times. Delete that conditional branch (the if let Some(tz_str) {
... } block) and any dependent code so date_part uses the already-converted
values; ensure arr remains the Arc returned from adjust_timestamp_array and no
further manual offset arithmetic is performed.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:useful; category:bug; feedback:The CodeRabbit AI reviewer is correct that the timestamp value is already adjusted with adjust_timestamp_array(), so there is no need to do it again manually for the hour part. Prevents a double offsetting for the hour part

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@martin-augment, thank you for confirming! You're absolutely right—since adjust_timestamp_array() already converts the timestamps to local time by applying the timezone offset, the additional manual hour adjustment at lines 282-303 would indeed cause double offsetting. Removing that block ensures the hour (and all other parts) are extracted from the correctly adjusted timestamps without any redundant offset arithmetic.

If you'd like, I can help verify the fix once you've applied it, or assist with any related testing! 🎉


If you found this review helpful, would you consider giving us a shout-out on X?

Thank you for using CodeRabbit!

@github-actions
Copy link

github-actions bot commented Nov 6, 2025

  • critical – datafusion/functions/src/datetime/extract.rs:261: The extra hour adjustment assumes the timezone string is a fixed ±HH:MM offset, slices tz_str[1..3], and unwraps the parse. This will panic for region names like Europe/Brussels (the common case for Arrow timestamps) and drops any minute component (e.g. +05:30 becomes +05). Either rely on the already-adjusted timestamps or compute the per-row offset via the parsed Tz rather than string slicing.
  • major – datafusion/functions/src/datetime/extract.rs:186 (same pattern in datafusion/functions/src/datetime/date_part.rs:187): The new pre-adjustment rewrites timezone-aware values into the session/local timezone before delegating to epoch. For extract(epoch from timestamp with time zone ...) this shifts the result by the offset, whereas SQL expects seconds since the UTC epoch irrespective of timezone. Please skip the local-time adjustment (or branch before calling epoch) so the epoch stays anchored to UTC.

@martin-augment
Copy link
Owner Author

martin-augment commented Nov 6, 2025

  • major – datafusion/functions/src/datetime/extract.rs:186 (same pattern in datafusion/functions/src/datetime/date_part.rs:187): The new pre-adjustment rewrites timezone-aware values into the session/local timezone before delegating to epoch. For extract(epoch from timestamp with time zone ...) this shifts the result by the offset, whereas SQL expects seconds since the UTC epoch irrespective of timezone. Please skip the local-time adjustment (or branch before calling epoch) so the epoch stays anchored to UTC.

value:useful; category:bug; feedback:The Codex AI reviewer is correct that there shouldn't be adjusting of the timestamp when the user needs it as epoch seconds. The finding prevents wrong values for the epoch case.

@martin-augment
Copy link
Owner Author

martin-augment commented Nov 6, 2025

  • critical – datafusion/functions/src/datetime/extract.rs:261: The extra hour adjustment assumes the timezone string is a fixed ±HH:MM offset, slices tz_str[1..3], and unwraps the parse. This will panic for region names like Europe/Brussels (the common case for Arrow timestamps) and drops any minute component (e.g. +05:30 becomes +05). Either rely on the already-adjusted timestamps or compute the per-row offset via the parsed Tz rather than string slicing.

value:useful; category:bug; feedback:The Codex AI reviewer is correct that currently special timezones like +05:30 and -07:45 won't work as desired. The finding prevents incorrect date part values due to the ignored minutes in the offset. Named offsets are not supported at all in this PR. They will panic and abort the application.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

Comments