Skip to content

19755: Extract out date time parsing to a DateTimeParser trait#177

Open
martin-augment wants to merge 1 commit intomainfrom
pr-19755-2026-01-13-12-51-21
Open

19755: Extract out date time parsing to a DateTimeParser trait#177
martin-augment wants to merge 1 commit intomainfrom
pr-19755-2026-01-13-12-51-21

Conversation

@martin-augment
Copy link
Owner

19755: To review by AI

… jiff implementations. Overall jiff seems slightly faster in the case where parsing has no errors but slower when using multiple formats (higher cost for Error handling).
@coderabbitai
Copy link

coderabbitai bot commented Jan 13, 2026

Walkthrough

This change introduces a pluggable date-time parsing abstraction system to DataFusion. A new DateTimeParser trait is added with two implementations: ChronoDateTimeParser and JiffDateTimeParser, selectable via a new date_time_parser configuration option defaulting to "chrono". The to_date and to_timestamp function families are refactored to use this parser abstraction instead of hard-coded parsing logic. A new jiff Cargo feature is introduced to optionally enable Jiff as a date-time parser. Benchmarks are extended to compare parsing performance across both implementations. Error messages are updated to reflect format arrays rather than single format strings. Configuration infrastructure is expanded to support parser selection.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@claude
Copy link

claude bot commented Jan 13, 2026

PR Review: Extract datetime parsing to DateTimeParser trait

Summary

This PR introduces a DateTimeParser trait to abstract datetime parsing, allowing multiple implementations (Chrono and Jiff). This is a solid architectural improvement that enables configurable parsing backends.

Strengths ✅

  1. Clean abstraction: The DateTimeParser trait provides a well-defined interface with three methods covering the main use cases
  2. Good documentation: The trait and implementations have comprehensive doc comments explaining differences between parsers
  3. Backward compatibility: The PR maintains the default Chrono parser and makes Jiff optional via feature flag
  4. Extensive benchmarks: The benchmark updates show performance comparison between parsers
  5. Configuration support: New date_time_parser config option allows runtime selection

Issues & Concerns ⚠️

1. Panic in production code (High Priority)

Location: datafusion/functions/src/datetime/parser.rs:76

_ => panic!("Unknown/unsupported date time parser: {p}"),

Issue: Using panic! for invalid configuration can crash the application. This should return a proper error.

Recommendation:

_ => return exec_err!("Unknown/unsupported date time parser: {p}. Supported: chrono, jiff"),

2. Trait object equality/hashing concerns (Medium Priority)

Location: Uses dyn-eq and dyn-hash crates for trait object comparison

Issue: The DateTimeParser trait requires DynEq + DynHash to enable comparison of trait objects. While functional, this adds external dependencies and may have subtle edge cases.

Questions:

  • Are these crates widely used/maintained?
  • Is equality comparison actually needed for the parser? Could we use Arc::ptr_eq instead?
  • The trait objects are compared in PartialEq implementations - is this for caching/memoization?

Recommendation: Consider if equality/hash is truly necessary, or if identity comparison would suffice.

3. Inconsistent error messages (Low Priority)

Location: datafusion/functions/src/datetime/parser.rs:218-222

The error messages differ between Chrono and Jiff:

  • Chrono: "Error parsing timestamp from '{s}' using formats: {formats:?}: {e}"
  • Jiff: Just "{e}" (line 587)

Recommendation: Standardize error message format across implementations for better UX.

4. Benchmark data validity (Low Priority)

Location: datafusion/functions/benches/to_timestamp.rs:63

inputs.append_value("1997-01-31T09:26:56UTC");

format1_builder.append_value("%m");  // Won't match full timestamp
format2_builder.append_value("%H");  // Won't match full timestamp  
format3_builder.append_value("%+");

Issue: The first two formats intentionally don't match the input, so they'll always fail. This benchmarks error handling rather than successful parsing.

Recommendation: Use valid formats for all entries, or clearly document that this benchmarks format fallback behavior.

5. Missing feature flag guards (Low Priority)

Location: datafusion/functions/src/datetime/parser.rs:74

#[cfg(feature = "jiff")]
"jiff" => {
    Box::new(jiff::JiffDateTimeParser::new()) as Box<dyn DateTimeParser>
}

Issue: If someone sets date_time_parser = "jiff" without enabling the feature flag, they'll hit the panic mentioned in issue #1.

Recommendation: Add a helpful compile-time error or runtime message:

#[cfg(not(feature = "jiff"))]
"jiff" => {
    return exec_err!("jiff parser requested but 'jiff' feature is not enabled. Enable with --features jiff");
}

6. Typo in comment (Trivial)

Location: datafusion/functions/src/datetime/parser.rs:58

// impl Eq and PartialEaq for dyn DateTimeParser

Should be PartialEq not PartialEaq.

Code Quality Assessment

Architecture: Good ⭐⭐⭐⭐

  • Clean separation of concerns
  • Extensible design for future parsers
  • Proper use of trait abstraction

Error Handling: Needs Improvement ⭐⭐

  • Contains panic! in production code
  • Inconsistent error messages between implementations
  • Missing helpful errors for feature flag issues

Testing: Good ⭐⭐⭐⭐

  • Comprehensive test coverage maintained
  • Benchmark comparisons between implementations
  • SQL logic tests updated

Documentation: Excellent ⭐⭐⭐⭐⭐

  • Clear trait documentation
  • Implementation differences explained
  • Config option documented

Performance Considerations

The benchmarks show Jiff can be faster in some scenarios. The trait abstraction overhead should be minimal since the parser is selected once at function creation, not per invocation.

Security Concerns

No security issues identified. The parsers handle untrusted input appropriately with error returns.

Test Coverage

Test coverage appears comprehensive:

  • Updated existing tests for new error messages
  • Maintains backward compatibility tests
  • Config option changes tested

Recommendations Summary

  1. Must fix: Replace panic! with proper error handling (parser.rs:76)
  2. Should fix: Add helpful error for missing jiff feature flag
  3. Should review: Evaluate necessity of dyn-eq/dyn-hash dependencies
  4. Nice to have: Standardize error messages across implementations
  5. Nice to have: Fix typo in comment
  6. Nice to have: Review benchmark data validity

Overall Assessment: Approve with Changes

This is a well-designed refactoring that improves the codebase architecture. The primary concern is the panic! in production code which should be fixed before merging. The other issues are minor and could be addressed in follow-up work.


Review generated by Claude Code

@augmentcode
Copy link

augmentcode bot commented Jan 13, 2026

🤖 Augment PR Summary

Summary: This PR extracts datetime parsing into a dedicated DateTimeParser trait and wires it through the built-in datetime functions, enabling configurable parsing backends.

Changes:

  • Introduces datafusion_functions::datetime::parser with a DateTimeParser trait and a default Chrono implementation, plus an optional Jiff implementation behind the jiff feature.
  • Adds datafusion.execution.date_time_parser config option (default chrono) and documents it in the user guide and information_schema outputs.
  • Updates to_timestamp* and to_date to select and use the configured parser (and adds constructors that accept ConfigOptions).
  • Refactors formatted parsing paths to pass multiple candidate formats together and updates error messages to include the full list of attempted formats.
  • Updates SQLLogicTests and the datetime example expectations for the new error text.
  • Extends the to_timestamp benchmarks to compare Chrono vs Jiff parsing behavior and performance.

Technical Notes: Jiff support is opt-in via Cargo features, and the parser choice is read from ConfigOptions at UDF construction time.

🤖 Was this summary useful? React with 👍 or 👎

Copy link

@augmentcode augmentcode bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review completed. 2 suggestions posted.

Fix All in Augment

Comment augment review to trigger a new review at any time.

// writing a custom parser.
let tz: Option<chrono_tz::Tz> = if format.ends_with(" %Z") {
// grab the string after the last space as the named timezone
let parts: Vec<&str> = datetime_str.rsplitn(2, ' ').collect();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parts[1] will panic if the input string doesn’t contain a space (e.g. format ends with " %Z" but the value is missing the trailing timezone), turning a parse error into a hard crash. Consider handling the “no separator / missing timezone” case explicitly and returning an error instead of indexing into parts.

Fix This in Augment

🤖 Was this useful? React with 👍 or 👎

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:useful; category:bug; feedback:The Augment AI reviewer is correct! The logic checks that the "format" value has a space character but later tries to split the datetime_str by space and read its second part. Since there is no guarantee that datatime_str has a space in it it may lead to an application crash. Prevents an internal error when the user provides an invalid value.

} else {
val = Some(r);
}
let r = op(x, &v);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If all provided format arguments are NULL for a given row, v becomes empty and op(x, &v) gets called anyway, which likely returns an error; previously this path would propagate NULL without attempting to parse. Consider short-circuiting when v.is_empty() to preserve NULL semantics for nullable format arguments (also applies to other varargs datetime functions using this helper).

Fix This in Augment

🤖 Was this useful? React with 👍 or 👎

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:useful; category:bug; feedback:The Augment AI reviewer is correct! There is no point in calling the parse function if there are no formats to use. The logic should return an Err early with an explanation that a format is missing.

Ok(dt) => {
let dt = dt.fixed_offset();
let result =
Timestamp::from_nanosecond(dt.timestamp() as i128);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Jiff parser uses seconds instead of nanoseconds

High Severity

In the JiffDateTimeParser::string_to_datetime_formatted method, when handling the %c format, chrono's dt.timestamp() returns seconds since epoch but is passed to Timestamp::from_nanosecond() which expects nanoseconds. This causes parsed timestamps to be off by a factor of 1 billion, resulting in completely incorrect dates (e.g., a year 2001 date would become approximately 1970). The code should use dt.timestamp_nanos_opt() instead of dt.timestamp().

Fix in Cursor Fix in Web

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:useful; category:bug; feedback:The Cursor AI reviewer is correct! chrono::DateTime::timestamp() returns the seconds since epoch, not the nanoseconds, so it is invalid input. Prevents using a wrong value for instantiating a Timestamp object and this will lead to wrong calculations afterwards.

// grab the string after the last space as the named timezone
let parts: Vec<&str> = datetime_str.rsplitn(2, ' ').collect();
let timezone_name = parts[0];
datetime_str = parts[1];
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Chrono parser panics when input lacks expected space

High Severity

When the format string ends with " %Z" but the input datetime_str contains no space, rsplitn(2, ' ') returns a single-element vector. Accessing parts[1] then causes an index out of bounds panic. The old code safely used rsplit_once(' ') which returns an Option and handled the no-space case gracefully. This is a regression that can crash query execution when users provide mismatched format strings and input.

Fix in Cursor Fix in Web

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:useful; category:bug; feedback:The Bugbot AI reviewer is correct! The logic checks that the "format" value has a space character but later tries to split the datetime_str by space and read its second part. Since there is no guarantee that datatime_str has a space in it it may lead to an application crash. Prevents an internal error when the user provides an invalid value.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
datafusion/functions/src/datetime/common.rs (1)

167-194: Scalar path passes formats one-at-a-time, inconsistent with array path.

The scalar handling loop calls op(a, &[s.as_str()]) with a single-element slice for each format argument, iterating externally. However, the array path in handle_array_op (line 319) collects all formats into a vector and calls op(x, &v) once with all formats.

This inconsistency means:

  • Scalar inputs: op is invoked multiple times, each with one format
  • Array inputs: op is invoked once with all formats

If the parser implementation depends on seeing all formats at once (e.g., for error messages listing all attempted formats), behavior will diverge between scalar and array inputs.

Proposed fix to align scalar path with array path
         ColumnarValue::Scalar(scalar) => match scalar.try_as_str() {
             Some(a) => {
                 let a = a.as_ref();
                 // ASK: Why do we trust `a` to be non-null at this point?
                 let a = unwrap_or_internal_err!(a);
 
-                let mut ret = None;
+                let mut formats: Vec<&str> = Vec::new();
 
                 for (pos, v) in args.iter().enumerate().skip(1) {
                     let ColumnarValue::Scalar(
                         ScalarValue::Utf8View(x)
                         | ScalarValue::LargeUtf8(x)
                         | ScalarValue::Utf8(x),
                     ) = v
                     else {
                         return exec_err!(
                             "Unsupported data type {v:?} for function {name}, arg # {pos}"
                         );
                     };
 
                     if let Some(s) = x {
-                        match op(a, &[s.as_str()]) {
-                            Ok(r) => {
-                                let result = op2(r).to_i64();
-                                let s = scalar_value(dt, result)?;
-                                ret = Some(Ok(ColumnarValue::Scalar(s)));
-                                break;
-                            }
-                            Err(e) => ret = Some(Err(e)),
-                        }
+                        formats.push(s.as_str());
                     }
                 }
 
-                unwrap_or_internal_err!(ret)
+                if formats.is_empty() {
+                    return exec_err!("No valid format strings provided for function {name}");
+                }
+
+                let r = op(a, &formats)?;
+                let result = op2(r).to_i64();
+                let s = scalar_value(dt, result)?;
+                Ok(ColumnarValue::Scalar(s))
             }
🤖 Fix all issues with AI agents
In @datafusion/functions/src/datetime/parser.rs:
- Around line 24-63: The comment contains a typo and the trait methods use
redundant Result type parameters: fix the comment "PartialEaq" to "PartialEq"
and update the three method signatures on the DateTimeParser trait
(string_to_timestamp_nanos, string_to_timestamp_nanos_formatted,
string_to_timestamp_millis_formatted) to return datafusion_common::Result<i64>
(remove the explicit ", DataFusionError" type argument). Ensure the surrounding
dyn_eq::eq_trait_object!(DateTimeParser) and
dyn_hash::hash_trait_object!(DateTimeParser) usage remains unchanged.
- Around line 120-148: The code assumes a space exists before the timezone when
format.ends_with(" %Z") and directly indexes parts[1], which panics if the input
lacks that space; change the parsing to avoid Vec/indexing by using the rsplitn
iterator and pattern matching: call let mut parts = datetime_str.rsplitn(2, '
'); match (parts.next(), parts.next()) { (Some(timezone_name), Some(rest)) => {
datetime_str = rest; let result: Result<chrono_tz::Tz, ParseError> =
timezone_name.parse(); let Ok(tz) = result else { err =
Some(result.unwrap_err().to_string()); continue; }; format =
&format[..format.len() - 3]; Some(tz) }, _ => { err = Some("expected a space
before timezone when format ends with ' %Z'".into()); continue; } }, ensuring
you reference the existing variables format, datetime_str, tz, result and err.
- Around line 64-80: Change get_date_time_parser to return Result<Box<dyn
DateTimeParser>, DataFusionError> instead of panicking on unknown/unsupported
parser: replace the panic!("Unknown/unsupported date time parser: {p}") branch
with an Err(DataFusionError::Plan(format!(...))) (or other appropriate
DataFusionError) and adjust the None branch to return Ok(Box::new(...)). Update
all call sites that use get_date_time_parser to handle the Result. Additionally
change the constructors/new_with_config() signatures used by
chrono::ChronoDateTimeParser::new(), jiff::JiffDateTimeParser::new() and any
macro-generated constructors (notably new_with_config() used by ToDateFunc and
all ToTimestamp* functions) to return Result<Self, DataFusionError> and
propagate the Result through their callers and macro expansions so the factory
can construct parsers via ?.
🧹 Nitpick comments (6)
datafusion/functions/Cargo.toml (1)

122-124: Consider loosening bench feature requirements (optional).
Requiring jiff for the to_timestamp bench blocks running benches in chrono-only builds; if the bench can conditionally include jiff comparisons when enabled, it’s nicer to keep jiff optional here.

datafusion/common/src/config.rs (1)

508-516: Normalize/validate datafusion.execution.date_time_parser at config parse time.

Right now this is a free-form Option<String> with no transform/validation, so values like "Chrono" (case) or typos will parse fine and only fail later (or silently fall back, depending on the factory). Consider normalizing to lowercase and/or using a small enum for immediate validation.

Proposed minimal change: lowercase normalization
-        pub date_time_parser: Option<String>, default = Some("chrono".to_string())
+        pub date_time_parser: Option<String>, transform = str::to_lowercase, default = Some("chrono".to_string())
datafusion/functions/benches/to_timestamp.rs (2)

189-220: Remove println! from benchmark hot path.

The println! statements inside the benchmark iteration will affect timing accuracy and pollute benchmark output. These should be removed or moved outside the measured loop.

♻️ Suggested fix
                 b.iter(|| {
                     for (input, format3, jiff_format3) in
                         izip!(inputs.iter(), format3.iter(), jiffformat3.iter())
                     {
                         black_box(match parser {
                             "chrono" => {
-                                let t = datetime_parser
+                                datetime_parser
                                     .string_to_timestamp_nanos_formatted(
                                         "UTC",
                                         input.unwrap(),
                                         &[format3.unwrap()],
-                                    );
-                                if t.is_err() {
-                                    println!("Error: {:?}", t);
-                                }
+                                    )
                             }
                             "jiff" => {
-                                let t = datetime_parser
+                                datetime_parser
                                     .string_to_timestamp_nanos_formatted(
                                         "UTC",
                                         input.unwrap(),
                                         &[jiff_format3.unwrap()],
-                                    );
-                                if t.is_err() {
-                                    println!("Error: {:?}", t);
-                                }
+                                    )
                             }
                             _ => unreachable!(),
                         })
                     }
                 })

248-300: Same issue: remove println! from benchmark hot path.

The println! statements in the multiple-formats benchmark also affect timing accuracy.

♻️ Suggested fix
                 b.iter(|| {
                     for (
                         input,
                         format1,
                         format2,
                         format3,
                         jiff_format1,
                         jiff_format2,
                         jiff_format3,
                     ) in izip!(
                         inputs.iter(),
                         format1.iter(),
                         format2.iter(),
                         format3.iter(),
                         jiff_format1.iter(),
                         jiff_format2.iter(),
                         jiff_format3.iter()
                     ) {
                         black_box(match parser {
                             "chrono" => {
-                                let t = datetime_parser
+                                datetime_parser
                                     .string_to_timestamp_nanos_formatted(
                                         "UTC",
                                         input.unwrap(),
                                         &[
                                             format1.unwrap(),
                                             format2.unwrap(),
                                             format3.unwrap(),
                                         ],
-                                    );
-                                if t.is_err() {
-                                    println!("Error: {:?}", t);
-                                }
+                                    )
                             }
                             "jiff" => {
-                                let t = datetime_parser
+                                datetime_parser
                                     .string_to_timestamp_nanos_formatted(
                                         "UTC",
                                         input.unwrap(),
                                         &[
                                             jiff_format1.unwrap(),
                                             jiff_format2.unwrap(),
                                             jiff_format3.unwrap(),
                                         ],
-                                    );
-                                if t.is_err() {
-                                    println!("Error: {:?}", t);
-                                }
+                                    )
                             }
                             _ => unreachable!(),
                         })
                     }
                 })
datafusion/functions/src/datetime/to_date.rs (2)

69-103: Consider returning &mut Self for builder-style chaining.

The with_datetime_parser method takes &mut self but returns &Self, which prevents fluent builder patterns on owned values. Consider returning &mut Self instead for consistency with common Rust builder patterns.

♻️ Suggested fix
     /// Set a custom date time parser.
-    pub fn with_datetime_parser(&mut self, parser: Box<dyn DateTimeParser>) -> &Self {
+    pub fn with_datetime_parser(&mut self, parser: Box<dyn DateTimeParser>) -> &mut Self {
         self.parser = parser;

         self
     }

417-424: Debug print in test should use eprintln! or be removed.

The println! for debugging test failures is fine for development, but consider using eprintln! or including the error in the panic message directly for cleaner test output.

♻️ Suggested fix
-                e => {
-                    println!("e was {e:?}");
-                    panic!(
-                        "Could not convert '{}' with format string '{}' to Date",
-                        tc.formatted_date, tc.format_str
-                    )
-                }
+                e => panic!(
+                    "Could not convert '{}' with format string '{}' to Date: {e:?}",
+                    tc.formatted_date, tc.format_str
+                ),
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 51b9553 and c124860.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (15)
  • datafusion-examples/examples/builtin_functions/date_time.rs
  • datafusion/common/src/config.rs
  • datafusion/core/Cargo.toml
  • datafusion/functions/Cargo.toml
  • datafusion/functions/benches/to_timestamp.rs
  • datafusion/functions/src/datetime/common.rs
  • datafusion/functions/src/datetime/mod.rs
  • datafusion/functions/src/datetime/parser.rs
  • datafusion/functions/src/datetime/to_date.rs
  • datafusion/functions/src/datetime/to_timestamp.rs
  • datafusion/sqllogictest/test_files/datetime/dates.slt
  • datafusion/sqllogictest/test_files/datetime/timestamps.slt
  • datafusion/sqllogictest/test_files/information_schema.slt
  • datafusion/sqllogictest/test_files/to_timestamp_timezone.slt
  • docs/source/user-guide/configs.md
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-06T13:37:46.028Z
Learnt from: martin-augment
Repo: martin-augment/datafusion PR: 5
File: datafusion/core/tests/user_defined/user_defined_scalar_functions.rs:997-1005
Timestamp: 2025-11-06T13:37:46.028Z
Learning: In datafusion/core/tests/user_defined/user_defined_scalar_functions.rs, when implementing ScalarFunctionWrapper::replacement, always bounds-check placeholder_position against defaults.len() before indexing into the defaults array to prevent panics when SQL UDF bodies reference placeholders like $N where N exceeds the declared parameter count.

Applied to files:

  • datafusion/functions/src/datetime/to_timestamp.rs
🧬 Code graph analysis (5)
datafusion/functions/src/datetime/to_date.rs (1)
datafusion/functions/src/datetime/parser.rs (3)
  • new (97-99)
  • new (376-378)
  • get_date_time_parser (64-80)
datafusion/functions/src/datetime/mod.rs (2)
datafusion/functions/src/datetime/to_date.rs (3)
  • to_date (105-136)
  • default (76-78)
  • args (234-237)
datafusion/functions/src/datetime/now.rs (1)
  • default (50-52)
datafusion/functions/src/datetime/parser.rs (1)
datafusion/functions/src/datetime/to_date.rs (1)
  • new (87-89)
datafusion/functions/benches/to_timestamp.rs (2)
datafusion/functions/src/datetime/to_timestamp.rs (1)
  • to_timestamp (848-858)
datafusion/functions/src/datetime/parser.rs (2)
  • new (97-99)
  • new (376-378)
datafusion/functions/src/datetime/to_timestamp.rs (2)
datafusion/functions/src/datetime/parser.rs (3)
  • get_date_time_parser (64-80)
  • new (97-99)
  • new (376-378)
datafusion/functions/src/datetime/to_date.rs (2)
  • with_datetime_parser (99-103)
  • new (87-89)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Cursor Bugbot
  • GitHub Check: Analyze (rust)
🔇 Additional comments (26)
datafusion/functions/src/datetime/common.rs (1)

291-324: LGTM on the array path accumulation logic.

The refactored approach correctly accumulates all format strings into a vector before invoking op once. This allows the parser implementation to try all formats and produce comprehensive error messages.

datafusion/core/Cargo.toml (1)

111-113: jiff feature re-export looks correct.
Not being part of default is the right posture (keeps optional dependency surface small).

datafusion-examples/examples/builtin_functions/date_time.rs (1)

413-422: Example expectation updated appropriately for “formats: [...]”.
No concerns with the new string shape in this example.

docs/source/user-guide/configs.md (1)

80-83: Docs entry for datafusion.execution.date_time_parser looks consistent.
Only nit: consider quoting values consistently (e.g. "chrono", "jiff"), but content is correct.

datafusion/sqllogictest/test_files/information_schema.slt (1)

216-223: SQL logic tests updated correctly for new config parameter visibility.

Also applies to: 356-360

datafusion/sqllogictest/test_files/to_timestamp_timezone.slt (1)

140-147: Updated error expectation matches new error message shape.

datafusion/functions/Cargo.toml (1)

55-57: jiff 0.2.18 is MSRV-compatible, but dyn-eq and dyn-hash lack published MSRVs.
jiff's MSRV (1.70.0) is well below DataFusion's (1.88.0). All three crates carry permissive licenses (jiff: Unlicense OR MIT; dyn-eq: MPL-2.0; dyn-hash: MIT OR Apache-2.0), compatible with Apache-2.0.
However, dyn-eq and dyn-hash do not publish documented MSRVs; verify these crates build on Rust 1.88.0. Additionally, jiff implements POSIX/GNU-compatible strftime with non-standard extensions (like %Q for IANA timezone IDs, %N, %q, %::z / %:::z); strftime format strings are mostly compatible with Chrono's but have differences. Ensure any documentation claiming exact parity between jiff and Chrono datetime formatting is reviewed.

datafusion/sqllogictest/test_files/datetime/dates.slt (2)

294-300: Updated to_date(..., '%q') failure expectation looks consistent with new formats-array messaging.
The escaping for ["%q"] and the input string looks correct for regex matching.


373-378: Potential brittleness: exact error block may diverge across parser backends / error paths.
Given the new date_time_parser selection and two implementations, please verify this query error block remains stable with both chrono default and jiff-enabled runs (if CI exercises both).

datafusion/sqllogictest/test_files/datetime/timestamps.slt (2)

2545-2564: Error expectations updated correctly for formats-array reporting across to_timestamp*.
Regex escaping for ["%q"] is consistent with other patterns in this file.


2666-2670: Potential brittleness: multi-line asserted error output could differ between chrono vs jiff implementations.
Please verify this remains stable (or relax to a regex match) when running tests with the optional jiff feature / parser selection.

datafusion/functions/src/datetime/mod.rs (3)

34-34: LGTM!

New parser module addition is appropriate for the DateTimeParser trait abstraction.


53-53: LGTM!

The to_date function is now consistently wired through config-based construction, matching the pattern used by other timestamp functions in this module.

Also applies to: 271-274


277-301: LGTM!

The functions() registry correctly passes the config reference to to_date, consistent with other config-aware functions like now, to_timestamp, etc.

datafusion/functions/benches/to_timestamp.rs (3)

30-34: LGTM!

Appropriate imports for the DateTimeParser trait and both implementations to enable comparative benchmarking.


53-170: LGTM!

The expanded data_with_formats() function properly handles format differences between chrono and jiff parsers (e.g., %+ vs %Y-%m-%dT%H:%M:%S%Q, %c vs %a %b %e %H:%M:%S %Y).


309-390: LGTM!

The UDF-based benchmarks correctly configure the parser via ConfigOptions and properly set up the ScalarFunctionArgs for both chrono and jiff parsers.

datafusion/functions/src/datetime/to_date.rs (3)

18-32: LGTM!

Appropriate imports added for the DateTimeParser abstraction and ConfigOptions.


105-136: LGTM!

The to_date method correctly delegates to self.parser.string_to_timestamp_millis_formatted for multi-argument cases, properly integrating the parser abstraction.


219-247: LGTM!

Tests appropriately updated to use new_with_config constructor.

datafusion/functions/src/datetime/to_timestamp.rs (6)

22-22: LGTM!

DateTimeParser import and field addition to ToTimestampFunc struct appropriately integrate the parser abstraction.

Also applies to: 88-93


294-339: LGTM with same note on builder pattern.

The macro-generated with_datetime_parser method has the same &mut self -> &Self return type pattern as to_date.rs. Consider changing to &mut Self for consistency with common builder patterns, but this is a minor point.


796-827: LGTM!

The #[expect(clippy::borrowed_box)] annotation is appropriate here - passing &Box<dyn DateTimeParser> is functionally equivalent to &dyn DateTimeParser and the lint suppression is justified for API consistency. The implementation correctly delegates to the parser for both single-argument and multi-format cases.


830-909: LGTM!

Test helper functions properly instantiate ChronoDateTimeParser and pass it through the parser abstraction, ensuring test coverage for the new parser-based implementation.


1520-1520: LGTM!

Updated expected error message correctly reflects the new format - errors now show the array of attempted formats ([\"%s\", \"%c\", \"%H:%M:%S\"]) rather than a single format string.


1567-1638: LGTM!

Test functions updated to use the parser-based API correctly, with ChronoDateTimeParser instantiated and passed to string_to_timestamp_nanos_formatted.

Comment on lines +24 to +63
/// A trait for parsing timestamps from strings. Two implementations are provided:
/// `ChronoDateTimeParser` (the default) which uses [Chrono] to parse timestamps, and
/// `JiffDateTimeParser` (via the `jiff` feature flag) which uses [Jiff] to parse
/// timestamps.
///
/// While both implementations are capable of parsing timestamps, the `ChronoDateTimeParser`
/// is a bit more lenient wrt formats at the cost of being slightly slower than Jiff.
/// Jiff, on the other hand, has slightly less support for some format options than Chrono
/// but is measurably faster than Chrono in some situations.
///
/// [Chrono]: https://docs.rs/chrono/latest/chrono/
/// [Jiff]: https://docs.rs/jiff/latest/jiff/
pub trait DateTimeParser: Debug + DynEq + DynHash + Send + Sync {
fn string_to_timestamp_nanos(
&self,
tz: &str,
s: &str,
) -> datafusion_common::Result<i64, DataFusionError>;

fn string_to_timestamp_nanos_formatted(
&self,
tz: &str,
s: &str,
formats: &[&str],
) -> datafusion_common::Result<i64, DataFusionError>;

fn string_to_timestamp_millis_formatted(
&self,
tz: &str,
s: &str,
formats: &[&str],
) -> datafusion_common::Result<i64, DataFusionError>;
}

// impl Eq and PartialEaq for dyn DateTimeParser
dyn_eq::eq_trait_object!(DateTimeParser);

// Implement std::hash::Hash for dyn DateTimeParser
dyn_hash::hash_trait_object!(DateTimeParser);

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Minor: typos + simplify Result types.

  • “PartialEaq” typo in comment (Line 58).
  • Consider standardizing signatures to datafusion_common::Result<i64> (the extra , DataFusionError is redundant and makes signatures noisier).
🤖 Prompt for AI Agents
In @datafusion/functions/src/datetime/parser.rs around lines 24 - 63, The
comment contains a typo and the trait methods use redundant Result type
parameters: fix the comment "PartialEaq" to "PartialEq" and update the three
method signatures on the DateTimeParser trait (string_to_timestamp_nanos,
string_to_timestamp_nanos_formatted, string_to_timestamp_millis_formatted) to
return datafusion_common::Result<i64> (remove the explicit ", DataFusionError"
type argument). Ensure the surrounding dyn_eq::eq_trait_object!(DateTimeParser)
and dyn_hash::hash_trait_object!(DateTimeParser) usage remains unchanged.

Comment on lines +64 to +80
pub fn get_date_time_parser(config_options: &ConfigOptions) -> Box<dyn DateTimeParser> {
let parser_cfg = config_options.execution.date_time_parser.as_ref();

match parser_cfg {
Some(p) => match p.as_str() {
"chrono" => {
Box::new(chrono::ChronoDateTimeParser::new()) as Box<dyn DateTimeParser>
}
#[cfg(feature = "jiff")]
"jiff" => {
Box::new(jiff::JiffDateTimeParser::new()) as Box<dyn DateTimeParser>
}
_ => panic!("Unknown/unsupported date time parser: {p}"),
},
None => Box::new(chrono::ChronoDateTimeParser::new()) as Box<dyn DateTimeParser>,
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

cat -n datafusion/functions/src/datetime/parser.rs | sed -n '64,80p'

Repository: martin-augment/datafusion

Length of output: 194


🏁 Script executed:

rg -n --type=rust '\bget_date_time_parser\s*\('

Repository: martin-augment/datafusion

Length of output: 454


🏁 Script executed:

ls -la datafusion/functions/src/datetime/

Repository: martin-augment/datafusion

Length of output: 1394


🏁 Script executed:

wc -l datafusion/functions/src/datetime/parser.rs

Repository: martin-augment/datafusion

Length of output: 116


🏁 Script executed:

cat -n datafusion/functions/src/datetime/parser.rs | sed -n '64,80p'

Repository: martin-augment/datafusion

Length of output: 879


🏁 Script executed:

cat -n datafusion/functions/src/datetime/to_date.rs | sed -n '90,98p'

Repository: martin-augment/datafusion

Length of output: 410


🏁 Script executed:

cat -n datafusion/functions/src/datetime/to_timestamp.rs | sed -n '320,328p'

Repository: martin-augment/datafusion

Length of output: 447


🏁 Script executed:

cat -n datafusion/functions/src/datetime/to_date.rs | sed -n '82,105p'

Repository: martin-augment/datafusion

Length of output: 1092


🏁 Script executed:

cat -n datafusion/functions/src/datetime/to_timestamp.rs | sed -n '310,335p'

Repository: martin-augment/datafusion

Length of output: 1255


🏁 Script executed:

rg -A 5 "impl.*ToDate\|struct ToDate" datafusion/functions/src/datetime/to_date.rs | head -20

Repository: martin-augment/datafusion

Length of output: 51


🏁 Script executed:

rg -n --type=rust 'new_with_config' datafusion/functions/src/datetime/ | head -20

Repository: martin-augment/datafusion

Length of output: 2519


🏁 Script executed:

rg -n --type=rust 'panic!' datafusion/functions/src/datetime/parser.rs

Repository: martin-augment/datafusion

Length of output: 141


🏁 Script executed:

rg -B 2 -A 2 'struct.*DateTimeParser\|pub struct' datafusion/functions/src/datetime/to_date.rs | head -30

Repository: martin-augment/datafusion

Length of output: 51


🏁 Script executed:

cat -n datafusion/functions/src/datetime/to_date.rs | sed -n '60,80p'

Repository: martin-augment/datafusion

Length of output: 919


🏁 Script executed:

cat -n datafusion/functions/src/datetime/to_timestamp.rs | sed -n '280,310p'

Repository: martin-augment/datafusion

Length of output: 1703


🏁 Script executed:

rg -n 'exec_err!' datafusion/functions/src/datetime/ | head -10

Repository: martin-augment/datafusion

Length of output: 993


🏁 Script executed:

rg -n 'ConfigOptions' datafusion/functions/src/datetime/parser.rs | head -5

Repository: martin-augment/datafusion

Length of output: 209


🏁 Script executed:

rg -B 5 'ToDateFunc::new_with_config\|ToTimestampFunc::new_with_config' datafusion/functions/src/datetime/ | head -40

Repository: martin-augment/datafusion

Length of output: 51


🏁 Script executed:

rg -n 'date_time_parser' --type=rust datafusion/functions/src/ | grep -E '\.(rs|toml):'

Repository: martin-augment/datafusion

Length of output: 574


🏁 Script executed:

rg -n 'impl.*ToTimestampFunc\|fn.*to_timestamp_impl' datafusion/functions/src/datetime/to_timestamp.rs | head -10

Repository: martin-augment/datafusion

Length of output: 51


🏁 Script executed:

cat -n datafusion/functions/src/datetime/to_timestamp.rs | sed -n '390,410p'

Repository: martin-augment/datafusion

Length of output: 885


🏁 Script executed:

cat -n datafusion/functions/src/datetime/to_timestamp.rs | sed -n '1,30p' | grep -E 'use|mod'

Repository: martin-augment/datafusion

Length of output: 468


🏁 Script executed:

rg -n 'with_updated_config' datafusion/functions/src/datetime/to_timestamp.rs | head -5

Repository: martin-augment/datafusion

Length of output: 360


🏁 Script executed:

grep -n 'date_time_parser' datafusion/core/src/config.rs 2>/dev/null | head -10

Repository: martin-augment/datafusion

Length of output: 51


🏁 Script executed:

rg -n 'date_time_parser.*jiff\|jiff.*date_time_parser' datafusion/ --type=rust 2>/dev/null | head -10

Repository: martin-augment/datafusion

Length of output: 51


🏁 Script executed:

find datafusion -name '*.rs' -type f -exec grep -l 'get_date_time_parser\|new_with_config' {} \; | head -20

Repository: martin-augment/datafusion

Length of output: 2719


Replace panic! with DataFusionError for unknown date_time_parser config.

The panic! at line 76 is triggered by user-controlled config (e.g., date_time_parser="jiff" with the feature disabled or any unknown value). This should return an error instead of crashing. The proposed fix to return Result is correct, but note that this requires changing the return type of new_with_config() methods in ToDateFunc and all ToTimestamp* functions, as these currently return Self directly. The changes would need to propagate through the macro-generated constructors and their call sites.

Proposed fix (change factory to return a Result)
-pub fn get_date_time_parser(config_options: &ConfigOptions) -> Box<dyn DateTimeParser> {
+pub fn get_date_time_parser(
+    config_options: &ConfigOptions,
+) -> datafusion_common::Result<Box<dyn DateTimeParser>> {
     let parser_cfg = config_options.execution.date_time_parser.as_ref();

     match parser_cfg {
         Some(p) => match p.as_str() {
-            "chrono" => {
-                Box::new(chrono::ChronoDateTimeParser::new()) as Box<dyn DateTimeParser>
-            }
+            "chrono" => Ok(Box::new(chrono::ChronoDateTimeParser::new())
+                as Box<dyn DateTimeParser>),
             #[cfg(feature = "jiff")]
-            "jiff" => {
-                Box::new(jiff::JiffDateTimeParser::new()) as Box<dyn DateTimeParser>
-            }
-            _ => panic!("Unknown/unsupported date time parser: {p}"),
+            "jiff" => Ok(Box::new(jiff::JiffDateTimeParser::new())
+                as Box<dyn DateTimeParser>),
+            _ => datafusion_common::exec_err!("Unknown/unsupported date time parser: {p}"),
         },
-        None => Box::new(chrono::ChronoDateTimeParser::new()) as Box<dyn DateTimeParser>,
+        None => Ok(Box::new(chrono::ChronoDateTimeParser::new()) as Box<dyn DateTimeParser>),
     }
 }
🤖 Prompt for AI Agents
In @datafusion/functions/src/datetime/parser.rs around lines 64 - 80, Change
get_date_time_parser to return Result<Box<dyn DateTimeParser>, DataFusionError>
instead of panicking on unknown/unsupported parser: replace the
panic!("Unknown/unsupported date time parser: {p}") branch with an
Err(DataFusionError::Plan(format!(...))) (or other appropriate DataFusionError)
and adjust the None branch to return Ok(Box::new(...)). Update all call sites
that use get_date_time_parser to handle the Result. Additionally change the
constructors/new_with_config() signatures used by
chrono::ChronoDateTimeParser::new(), jiff::JiffDateTimeParser::new() and any
macro-generated constructors (notably new_with_config() used by ToDateFunc and
all ToTimestamp* functions) to return Result<Self, DataFusionError> and
propagate the Result through their callers and macro expansions so the factory
can construct parsers via ?.

Comment on lines +120 to +148
// we manually handle the most common case of a named timezone at the end of the timestamp
// since chrono doesn't support that directly. Note, however, that %+ does handle 'Z' at the
// end of the string.
//
// This code doesn't handle named timezones with no preceding space since that would require
// writing a custom parser.
let tz: Option<chrono_tz::Tz> = if format.ends_with(" %Z") {
// grab the string after the last space as the named timezone
let parts: Vec<&str> = datetime_str.rsplitn(2, ' ').collect();
let timezone_name = parts[0];
datetime_str = parts[1];

// attempt to parse the timezone name
let result: Result<chrono_tz::Tz, ParseError> = timezone_name.parse();
let Ok(tz) = result else {
err = Some(result.unwrap_err().to_string());
continue;
};

// successfully parsed the timezone name, remove the ' %Z' from the format
format = &format[..format.len() - 3];

Some(tz)
} else if format.contains("%Z") {
err = Some("'%Z' is only supported at the end of the format string preceded by a space".into());
continue;
} else {
None
};
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Chrono %Z parsing can panic on malformed input (missing space).
When the format ends with " %Z" (Line 126), the code assumes the input has a space before the timezone and does parts[1] (Line 130). If the input is missing that space, this will panic (index out of bounds) instead of returning an execution error.

Proposed fix (avoid Vec + avoid indexing)
-                    let parts: Vec<&str> = datetime_str.rsplitn(2, ' ').collect();
-                    let timezone_name = parts[0];
-                    datetime_str = parts[1];
+                    let Some((datetime_part, timezone_name)) = datetime_str.rsplit_once(' ') else {
+                        err = Some("expected a space before the timezone name".to_string());
+                        continue;
+                    };
+                    datetime_str = datetime_part;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// we manually handle the most common case of a named timezone at the end of the timestamp
// since chrono doesn't support that directly. Note, however, that %+ does handle 'Z' at the
// end of the string.
//
// This code doesn't handle named timezones with no preceding space since that would require
// writing a custom parser.
let tz: Option<chrono_tz::Tz> = if format.ends_with(" %Z") {
// grab the string after the last space as the named timezone
let parts: Vec<&str> = datetime_str.rsplitn(2, ' ').collect();
let timezone_name = parts[0];
datetime_str = parts[1];
// attempt to parse the timezone name
let result: Result<chrono_tz::Tz, ParseError> = timezone_name.parse();
let Ok(tz) = result else {
err = Some(result.unwrap_err().to_string());
continue;
};
// successfully parsed the timezone name, remove the ' %Z' from the format
format = &format[..format.len() - 3];
Some(tz)
} else if format.contains("%Z") {
err = Some("'%Z' is only supported at the end of the format string preceded by a space".into());
continue;
} else {
None
};
let tz: Option<chrono_tz::Tz> = if format.ends_with(" %Z") {
// grab the string after the last space as the named timezone
let Some((datetime_part, timezone_name)) = datetime_str.rsplit_once(' ') else {
err = Some("expected a space before the timezone name".to_string());
continue;
};
datetime_str = datetime_part;
// attempt to parse the timezone name
let result: Result<chrono_tz::Tz, ParseError> = timezone_name.parse();
let Ok(tz) = result else {
err = Some(result.unwrap_err().to_string());
continue;
};
// successfully parsed the timezone name, remove the ' %Z' from the format
format = &format[..format.len() - 3];
Some(tz)
} else if format.contains("%Z") {
err = Some("'%Z' is only supported at the end of the format string preceded by a space".into());
continue;
} else {
None
};
🤖 Prompt for AI Agents
In @datafusion/functions/src/datetime/parser.rs around lines 120 - 148, The code
assumes a space exists before the timezone when format.ends_with(" %Z") and
directly indexes parts[1], which panics if the input lacks that space; change
the parsing to avoid Vec/indexing by using the rsplitn iterator and pattern
matching: call let mut parts = datetime_str.rsplitn(2, ' '); match
(parts.next(), parts.next()) { (Some(timezone_name), Some(rest)) => {
datetime_str = rest; let result: Result<chrono_tz::Tz, ParseError> =
timezone_name.parse(); let Ok(tz) = result else { err =
Some(result.unwrap_err().to_string()); continue; }; format =
&format[..format.len() - 3]; Some(tz) }, _ => { err = Some("expected a space
before timezone when format ends with ' %Z'".into()); continue; } }, ensuring
you reference the existing variables format, datetime_str, tz, result and err.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:useful; category:bug; feedback:The CodeRabbit AI reviewer is correct! The logic checks that the "format" value has a space character but later tries to split the datetime_str by space and read its second part. Since there is no guarantee that datatime_str has a space in it it may lead to an application crash. Prevents an internal error when the user provides an invalid value.

Comment on lines +357 to +702
#[cfg(feature = "jiff")]
pub mod jiff {
use crate::datetime::parser::DateTimeParser;
use arrow::array::timezone::Tz;
use arrow::error::ArrowError;
use chrono::format::{Parsed, StrftimeItems, parse};
use datafusion_common::{DataFusionError, exec_datafusion_err, exec_err};
use jiff::civil::Time;
use jiff::fmt::temporal::Pieces;
use jiff::tz::{Disambiguation, TimeZone};
use jiff::{Timestamp, Zoned};
use num_traits::ToPrimitive;

const ERR_NANOSECONDS_NOT_SUPPORTED: &str = "The dates that can be represented as nanoseconds have to be between 1677-09-21T00:12:44.0 and 2262-04-11T23:47:16.854775804";

#[derive(Debug, PartialEq, Eq, Hash)]
pub struct JiffDateTimeParser {}

impl JiffDateTimeParser {
pub fn new() -> Self {
Self {}
}

/// Attempts to parse a given string representation of a timestamp into a `Timestamp` object.
/// The function also adjusts the datetime for the specified timezone.
///
/// # Parameters
/// - `timezone`: A reference to the `TimeZone` object used to adjust the parsed datetime to the desired timezone.
/// - `s`: A string slice holding the timestamp to be parsed.
///
/// # Returns
/// - `Ok(Timestamp)`: Contains the parsed timestamp in seconds since the Unix epoch.
/// - `Err(ArrowError)`: Returned in the event of errors in parsing
/// the timestamp string or computing the timezone offset.
///
/// # Errors
/// This function will return an `ArrowError` if:
///
/// - The string `s` is shorter than 10 characters.
/// - The format of the string does not match expected timestamp patterns.
/// - An invalid/unknown time zone or offset is provided during parsing.
/// - Errors occur while converting the datetime to a specific time zone.
fn string_to_datetime(
&self,
timezone: &TimeZone,
s: &str,
) -> datafusion_common::Result<Timestamp, ArrowError> {
let err = |ctx: &str| {
ArrowError::ParseError(format!(
"Error parsing timestamp from '{s}': {ctx}"
))
};

let bytes = s.as_bytes();
if bytes.len() < 10 {
return Err(err("timestamp must contain at least 10 characters"));
}

let pieces = Pieces::parse(bytes).map_err(|e| err(&format!("{e:?}")))?;
let time = pieces.time().unwrap_or_else(Time::midnight);
let dt = pieces.date().to_datetime(time);
let tz = match pieces
.to_time_zone()
.map_err(|e| err(&format!("unknown time zone: {e:?}")))?
{
Some(tz) => tz,
None => match pieces.to_numeric_offset() {
Some(offset) => TimeZone::fixed(offset),
None => timezone.to_owned(),
},
};
let zdt = tz
.to_zoned(dt)
.map_err(|e| err(&format!("error computing timezone offset: {e:?}")))?;

Ok(zdt.timestamp())
}

/// Attempts to parse a given string representation of a timestamp into a `Zoned` datetime object
/// using a list of provided formats. The function also adjusts the datetime for the specified timezone.
///
/// # Parameters
/// - `timezone`: A reference to the `TimeZone` object used to adjust the parsed datetime to the desired timezone.
/// - `s`: A string slice holding the timestamp to be parsed.
/// - `formats`: A slice of string slices representing the accepted datetime formats to use when parsing.
///
/// # Returns
/// - `Ok(Zoned)` if the string is successfully parsed into a `Zoned` datetime object using one of the formats provided.
/// - `Err(DataFusionError)` if the string cannot be parsed or if there are issues related to the timezone or format handling.
///
/// # Behavior
/// 1. Iterates through the list of provided formats.
/// 2. Handles special cases such as `%Z` (timezone) or `%c` (locale-aware datetime representation)
/// by either modifying the format string or switching to different parsing logic.
/// 3. Attempts to resolve timezone-related issues:
/// - Calculates a fixed offset for the given `timezone`.
/// - Handles attempts to parse timezones using IANA names or offsets.
/// 4. If parsing fails for the current format, the function moves to the next format and aggregates any parsing errors.
/// 5. If no formats succeed, an error is returned summarizing the failure.
///
/// # Errors
/// - Returns a descriptive error wrapped in `DataFusionError` if:
/// - The input string is not compatible with any of the provided formats.
/// - There are issues parsing the timezone from the input string or adjusting to the provided `TimeZone`.
/// - There are issues resolving ambiguous datetime representations.
fn string_to_datetime_formatted(
&self,
timezone: &TimeZone,
s: &str,
formats: &[&str],
) -> datafusion_common::Result<Zoned, DataFusionError> {
let mut err: Option<String> = None;

for &format in formats.iter() {
let mut format = if format.contains("%Z") {
&format.replace("%Z", "%Q")
} else {
format
};

if format == "%c" {
// switch to chrono, jiff doesn't support parsing with %c
let mut parsed = Parsed::new();
let result = parse(&mut parsed, s, StrftimeItems::new(format));

if result.is_err() {
err = Some(format!("error parsing timestamp: {result:?}"));
continue;
}

let offset = timezone.to_fixed_offset();
let tz: &str = if let Ok(offset) = offset {
&offset.to_string()
} else {
let result = timezone.iana_name();
match result {
Some(tz) => tz,
None => {
err =
Some(format!("error parsing timezone: {timezone:?}"));
continue;
}
}
};
let tz: Tz = tz.parse::<Tz>().ok().unwrap_or("UTC".parse::<Tz>()?);

match parsed.to_datetime_with_timezone(&tz) {
Ok(dt) => {
let dt = dt.fixed_offset();
let result =
Timestamp::from_nanosecond(dt.timestamp() as i128);
match result {
Ok(result) => {
return Ok(result.to_zoned(timezone.to_owned()));
}
Err(e) => {
err = Some(e.to_string());
continue;
}
}
}
Err(e) => {
err = Some(e.to_string());
continue;
}
};
}

let result = if format == "%+" {
// jiff doesn't support parsing with %+, but the equivalent of %+ is
// somewhat rfc3389 which is what is used by the parse method
let result: Result<Zoned, _> = s.parse();

if let Ok(r) = result {
return Ok(r);
} else {
// however, the above only works with timezone names, not offsets
let s = if s.ends_with("Z") {
&(s.trim_end_matches("Z").to_owned() + "+00:00")
} else {
s
};

// try again with fractional seconds
format = "%Y-%m-%dT%H:%M:%S%.f%:z";

jiff::fmt::strtime::parse(format, s)
}
} else {
jiff::fmt::strtime::parse(format, s)
};

match result {
Ok(bdt) => {
if bdt.iana_time_zone().is_some() || bdt.offset().is_some() {
if let Ok(zoned) = bdt.to_zoned() {
return Ok(zoned.with_time_zone(timezone.to_owned()));
}
}

let result = bdt.to_datetime();
let datetime = match result {
Ok(datetime) => datetime,
Err(e) => {
err = Some(e.to_string());
continue;
}
};

let zoned = timezone
.to_owned()
.into_ambiguous_zoned(datetime)
.disambiguate(Disambiguation::Compatible);

match zoned {
Ok(z) => return Ok(z),
Err(e) => {
err = Some(e.to_string());
continue;
}
}
}
Err(e) => {
err = Some(e.to_string());
continue;
}
}
}

match err {
Some(e) => exec_err!("{e}"),
None => {
exec_err!("Unable to parse timestamp: {s} with formats: {formats:?}")
}
}
}
}

impl DateTimeParser for JiffDateTimeParser {
/// Accepts a string with a `jiff` format and converts it to a
/// nanosecond precision timestamp according to the rules
/// defined by `jiff`.
///
/// See <https://docs.rs/jiff/latest/jiff/fmt/strtime/index.html#conversion-specifications>
/// for the full set of supported formats.
///
/// ## Timestamp Precision
///
/// This function uses the maximum precision timestamps supported by
/// Arrow (nanoseconds stored as a 64-bit integer) timestamps. This
/// means the range of dates that timestamps can represent is ~1677 AD
/// to 2262 AM.
///
/// ## Timezone / Offset Handling
///
/// Numerical values of timestamps are stored compared to offset UTC.
#[inline]
fn string_to_timestamp_nanos(
&self,
tz: &str,
s: &str,
) -> Result<i64, DataFusionError> {
let result = TimeZone::get(tz.as_ref());
let timezone = match result {
Ok(tz) => tz,
Err(e) => return exec_err!("Invalid timezone {tz}: {e}"),
};

let timestamp = self.string_to_datetime(&timezone, s)?;
let parsed = timestamp
.as_nanosecond()
.to_i64()
.ok_or_else(|| exec_datafusion_err!("{ERR_NANOSECONDS_NOT_SUPPORTED}"))?;

Ok(parsed)
}

/// Accepts a string with an array of `jiff` formats and converts it to a
/// nanosecond precision timestamp according to the rules
/// defined by `jiff`.
///
/// See <https://docs.rs/jiff/latest/jiff/fmt/strtime/index.html#conversion-specifications>
/// for the full set of supported formats.
///
/// ## Timestamp Precision
///
/// This function uses the maximum precision timestamps supported by
/// Arrow (nanoseconds stored as a 64-bit integer) timestamps. This
/// means the range of dates that timestamps can represent is ~1677 AD
/// to 2262 AM.
///
/// ## Timezone / Offset Handling
///
/// Numerical values of timestamps are stored compared to offset UTC.
#[inline]
fn string_to_timestamp_nanos_formatted(
&self,
tz: &str,
s: &str,
formats: &[&str],
) -> Result<i64, DataFusionError> {
let result = TimeZone::get(tz.as_ref());
let timezone = match result {
Ok(tz) => tz,
Err(e) => return exec_err!("Invalid timezone {tz}: {e}"),
};

let zoned = self.string_to_datetime_formatted(&timezone, s, formats)?;
let parsed =
zoned.timestamp().as_nanosecond().to_i64().ok_or_else(|| {
exec_datafusion_err!("{ERR_NANOSECONDS_NOT_SUPPORTED}")
})?;

Ok(parsed)
}

/// Accepts a string with an array of `jiff` formats and converts it to a
/// millisecond precision timestamp according to the rules
/// defined by `jiff`.
///
/// See <https://docs.rs/jiff/latest/jiff/fmt/strtime/index.html#conversion-specifications>
/// for the full set of supported formats.
///
/// ## Timezone / Offset Handling
///
/// Numerical values of timestamps are stored compared to offset UTC.
#[inline]
fn string_to_timestamp_millis_formatted(
&self,
tz: &str,
s: &str,
formats: &[&str],
) -> Result<i64, DataFusionError> {
let result = TimeZone::get(tz.as_ref());
let timezone = match result {
Ok(tz) => tz,
Err(e) => return exec_err!("Invalid timezone {tz}: {e}"),
};

let dt = self.string_to_datetime_formatted(&timezone, s, formats)?;
let parsed = dt.timestamp().as_millisecond();

Ok(parsed)
}
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, let's check the file structure and look for the Chrono implementation to compare error messages
cd datafusion/functions && find . -name "parser.rs" -type f | head -5

Repository: martin-augment/datafusion

Length of output: 93


🌐 Web query:

Jiff strtime parsing timezone directive %Q documentation and supported format codes

💡 Result:

Summary — %Q in Jiff

  • %Q (and %:Q) produce or parse an IANA time‑zone identifier when available; if no identifier exists they fall back to a numeric offset (%z or %:z respectively). [1]
  • %Q and %:Q require the crate's alloc feature; they are unavailable when alloc is disabled. [1]

Relevant timezone/offset/zone-format specifiers

  • %Q — IANA tz id, or %z fallback.
  • %:Q — IANA tz id, or %:z fallback (colon in offset).
  • %Z — time‑zone abbreviation (formatting only; not parsable).
  • %z — numeric offset as [+-]HHMM[SS].
  • %:z — numeric offset as [+-]HH:MM[:SS].
  • %::z — numeric offset as [+-]HH:MM:SS.
  • %:::z — compact numeric offset variants like -04 or +05:30.
    (Parsing/formatting support and exact behaviors are documented in the strtime module.) [1][2]

Other notes

  • Jiff supports a broad set of strftime/strptime conversion specifiers (week/day/month/year/time, %s Unix timestamp, %Q/%:Q, etc.); see the conversion‑specifiers table in the fmt::strtime docs for the full list and parsing vs. formatting differences. [1]

Sources
[1] jiff::fmt::strtime documentation (conversion specifiers, %Q notes).
[2] jiff crate overview/examples (shows %Q usage).


🏁 Script executed:

# Let's examine the parser.rs file around the Chrono implementation to see error message format
cat -n datafusion/functions/src/datetime/parser.rs | head -100

Repository: martin-augment/datafusion

Length of output: 4616


🏁 Script executed:

# Let's see the Chrono implementation to compare error message handling
cat -n datafusion/functions/src/datetime/parser.rs | sed -n '82,356p'

Repository: martin-augment/datafusion

Length of output: 14113


🏁 Script executed:

# Now let's check the specific lines of the Jiff implementation mentioned in the review
cat -n datafusion/functions/src/datetime/parser.rs | sed -n '357,702p'

Repository: martin-augment/datafusion

Length of output: 17028


🏁 Script executed:

# Let's verify if the code compiles - check if there are any test files or build artifacts
cd datafusion/functions && cargo check --features jiff 2>&1 | head -200

Repository: martin-augment/datafusion

Length of output: 178


Fix invalid temporary borrows and align error message format with Chrono implementation.

The code has three invalid lifetime issues where references are created to temporary strings:

  • Line 471–475: &format.replace("%Z", "%Q") creates a reference to a temporary
  • Line 488–490: &offset.to_string() creates a reference to a temporary
  • Line 534–539: &(s.trim_end_matches("Z").to_owned() + "+00:00") creates a reference to a temporary

Use Cow<'_, str> for format and owned String for timezone and offset strings to preserve lifetimes. Additionally, unify error messages with Chrono's format. Chrono returns "Error parsing timestamp from '{s}' using formats: {formats:?}: {e}" when an error occurs (lines 217–218), but Jiff returns raw {e} (line 587), which creates inconsistent behavior when users switch parsers.

The %Q directive is correct for Jiff—it parses IANA timezone identifiers with fallback to numeric offset.

Proposed fix
 pub mod jiff {
+    use std::borrow::Cow;
     use crate::datetime::parser::DateTimeParser;
     use arrow::array::timezone::Tz;
     use arrow::error::ArrowError;
     use chrono::format::{Parsed, StrftimeItems, parse};
     use datafusion_common::{DataFusionError, exec_datafusion_err, exec_err};
@@ -469,10 +470,12 @@ pub mod jiff {
             let mut err: Option<String> = None;
 
             for &format in formats.iter() {
-                let mut format = if format.contains("%Z") {
-                    &format.replace("%Z", "%Q")
+                let mut format: Cow<'_, str> = if format.contains("%Z") {
+                    Cow::Owned(format.replace("%Z", "%Q"))
                 } else {
-                    format
+                    Cow::Borrowed(format)
                 };
 
-                if format == "%c" {
+                if format.as_ref() == "%c" {
                     // switch to chrono, jiff doesn't support parsing with %c
                     let mut parsed = Parsed::new();
-                    let result = parse(&mut parsed, s, StrftimeItems::new(format));
+                    let result = parse(&mut parsed, s, StrftimeItems::new(format.as_ref()));
 
                     if result.is_err() {
                         err = Some(format!("error parsing timestamp: {result:?}"));
@@ -486,12 +489,13 @@ pub mod jiff {
                     }
 
                     let offset = timezone.to_fixed_offset();
-                    let tz: &str = if let Ok(offset) = offset {
-                        &offset.to_string()
+                    let tz_owned: String;
+                    let tz: &str = if let Ok(offset) = offset {
+                        tz_owned = offset.to_string();
+                        &tz_owned
                     } else {
                         let result = timezone.iana_name();
                         match result {
                             Some(tz) => tz,
                             None => {
@@ -524,10 +528,12 @@ pub mod jiff {
                     };
                 }
 
-                let result = if format == "%+" {
+                let result = if format.as_ref() == "%+" {
                     // jiff doesn't support parsing with %+, but the equivalent of %+ is
                     // somewhat rfc3389 which is what is used by the parse method
                     let result: Result<Zoned, _> = s.parse();
 
                     if let Ok(r) = result {
                         return Ok(r);
                     } else {
                         // however, the above only works with timezone names, not offsets
-                        let s = if s.ends_with("Z") {
-                            &(s.trim_end_matches("Z").to_owned() + "+00:00")
+                        let s: Cow<'_, str> = if s.ends_with('Z') {
+                            Cow::Owned(format!("{}+00:00", s.trim_end_matches('Z')))
                         } else {
-                            s
+                            Cow::Borrowed(s)
                         };
 
                         // try again with fractional seconds
-                        format = "%Y-%m-%dT%H:%M:%S%.f%:z";
+                        format = Cow::Borrowed("%Y-%m-%dT%H:%M:%S%.f%:z");
 
-                        jiff::fmt::strtime::parse(format, s)
+                        jiff::fmt::strtime::parse(format.as_ref(), s.as_ref())
                     }
                 } else {
-                    jiff::fmt::strtime::parse(format, s)
+                    jiff::fmt::strtime::parse(format.as_ref(), s)
                 };
 
                 match result {
@@ -586,10 +594,12 @@ pub mod jiff {
             }
 
             match err {
-                Some(e) => exec_err!("{e}"),
+                Some(e) => exec_err!(
+                    "Error parsing timestamp from '{s}' using formats: {formats:?}: {e}"
+                ),
                 None => {
-                    exec_err!("Unable to parse timestamp: {s} with formats: {formats:?}")
+                    exec_err!("Error parsing timestamp from '{s}' using formats: {formats:?}")
                 }
             }
         }

@martin-augment
Copy link
Owner Author

1. Panic in production code (High Priority)

Location: datafusion/functions/src/datetime/parser.rs:76

_ => panic!("Unknown/unsupported date time parser: {p}"),

Issue: Using panic! for invalid configuration can crash the application. This should return a proper error.

Recommendation:

_ => return exec_err!("Unknown/unsupported date time parser: {p}. Supported: chrono, jiff"),

value:good-to-have; category:bug; feedback:The Claude AI reviewer is correct! Crashing the application by using panic!() is usually not desired but here the crash is caused due to misconfiguration, so the application will stop early with an explanation that there is no such parser with that name. Still, it would be better to return an Err and fail in more friendly manner than a crash.

@martin-augment
Copy link
Owner Author

2. Trait object equality/hashing concerns (Medium Priority)

Location: Uses dyn-eq and dyn-hash crates for trait object comparison

Issue: The DateTimeParser trait requires DynEq + DynHash to enable comparison of trait objects. While functional, this adds external dependencies and may have subtle edge cases.

Questions:

  • Are these crates widely used/maintained?
  • Is equality comparison actually needed for the parser? Could we use Arc::ptr_eq instead?
  • The trait objects are compared in PartialEq implementations - is this for caching/memoization?

Recommendation: Consider if equality/hash is truly necessary, or if identity comparison would suffice.

value:good-to-have; category:bug; feedback:The Claude AI reviewer is correct! Using Arc::ptr_eq() should be sufficient for this case because the datetime parsers are stateless and there is no need of deep object equality. Both dyn-eq and dyn-hash are relatively known in the Rust ecosystem, so their usage is justified too.

@martin-augment
Copy link
Owner Author

3. Inconsistent error messages (Low Priority)

Location: datafusion/functions/src/datetime/parser.rs:218-222

The error messages differ between Chrono and Jiff:

  • Chrono: "Error parsing timestamp from '{s}' using formats: {formats:?}: {e}"
  • Jiff: Just "{e}" (line 587)

Recommendation: Standardize error message format across implementations for better UX.

value:useful; category:bug; feedback:The Claude AI reviewer is correct! The error message for the Jiff implementation does not provide the problematic datetime and the attempted formats as a context to the Jiff's error message and this makes it harder to debug and fix than the Chrono error message.

@martin-augment
Copy link
Owner Author

4. Benchmark data validity (Low Priority)

Location: datafusion/functions/benches/to_timestamp.rs:63

inputs.append_value("1997-01-31T09:26:56UTC");

format1_builder.append_value("%m");  // Won't match full timestamp
format2_builder.append_value("%H");  // Won't match full timestamp  
format3_builder.append_value("%+");

Issue: The first two formats intentionally don't match the input, so they'll always fail. This benchmarks error handling rather than successful parsing.

Recommendation: Use valid formats for all entries, or clearly document that this benchmarks format fallback behavior.

value:useful; category:bug; feedback:The Claude AI reviewer is correct! The benchmark should not test the error paths. Only the happy path makes sense to be tested in a benchmark. In addition any errors are currently printed with println!() which additionally slows down the benchmark.

@martin-augment
Copy link
Owner Author

5. Missing feature flag guards (Low Priority)

Location: datafusion/functions/src/datetime/parser.rs:74

#[cfg(feature = "jiff")]
"jiff" => {
    Box::new(jiff::JiffDateTimeParser::new()) as Box<dyn DateTimeParser>
}

Issue: If someone sets date_time_parser = "jiff" without enabling the feature flag, they'll hit the panic mentioned in issue #1.

Recommendation: Add a helpful compile-time error or runtime message:

#[cfg(not(feature = "jiff"))]
"jiff" => {
    return exec_err!("jiff parser requested but 'jiff' feature is not enabled. Enable with --features jiff");
}

value:good-to-have; category:bug; feedback:The Claude AI reviewer is correct! Currently the "jiff" march arm is available only if the "jiff" feature is enabled. Trying to use it will fail with an error saying that only "chrono" and "jiff" are supported. Prevents a confusion in the developer/admin of the application who tries to use the "jiff" parser and the error says that there is no such but it is a valid value.

@martin-augment
Copy link
Owner Author

6. Typo in comment (Trivial)

Location: datafusion/functions/src/datetime/parser.rs:58

// impl Eq and PartialEaq for dyn DateTimeParser

Should be PartialEq not PartialEaq.

value:good-to-have; category:documentation; feedback:The Claude AI reviewer is correct! There is a minor typo in the name of the PartialEq trait in an inline comment. It does not do any harm but it would be nice to fix it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

Comments