19755: Extract out date time parsing to a DateTimeParser trait#177
19755: Extract out date time parsing to a DateTimeParser trait#177martin-augment wants to merge 1 commit intomainfrom
Conversation
… jiff implementations. Overall jiff seems slightly faster in the case where parsing has no errors but slower when using multiple formats (higher cost for Error handling).
WalkthroughThis change introduces a pluggable date-time parsing abstraction system to DataFusion. A new ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
PR Review: Extract datetime parsing to DateTimeParser traitSummaryThis PR introduces a Strengths ✅
Issues & Concerns
|
🤖 Augment PR SummarySummary: This PR extracts datetime parsing into a dedicated Changes:
Technical Notes: Jiff support is opt-in via Cargo features, and the parser choice is read from 🤖 Was this summary useful? React with 👍 or 👎 |
| // writing a custom parser. | ||
| let tz: Option<chrono_tz::Tz> = if format.ends_with(" %Z") { | ||
| // grab the string after the last space as the named timezone | ||
| let parts: Vec<&str> = datetime_str.rsplitn(2, ' ').collect(); |
There was a problem hiding this comment.
parts[1] will panic if the input string doesn’t contain a space (e.g. format ends with " %Z" but the value is missing the trailing timezone), turning a parse error into a hard crash. Consider handling the “no separator / missing timezone” case explicitly and returning an error instead of indexing into parts.
🤖 Was this useful? React with 👍 or 👎
There was a problem hiding this comment.
value:useful; category:bug; feedback:The Augment AI reviewer is correct! The logic checks that the "format" value has a space character but later tries to split the datetime_str by space and read its second part. Since there is no guarantee that datatime_str has a space in it it may lead to an application crash. Prevents an internal error when the user provides an invalid value.
| } else { | ||
| val = Some(r); | ||
| } | ||
| let r = op(x, &v); |
There was a problem hiding this comment.
If all provided format arguments are NULL for a given row, v becomes empty and op(x, &v) gets called anyway, which likely returns an error; previously this path would propagate NULL without attempting to parse. Consider short-circuiting when v.is_empty() to preserve NULL semantics for nullable format arguments (also applies to other varargs datetime functions using this helper).
🤖 Was this useful? React with 👍 or 👎
There was a problem hiding this comment.
value:useful; category:bug; feedback:The Augment AI reviewer is correct! There is no point in calling the parse function if there are no formats to use. The logic should return an Err early with an explanation that a format is missing.
| Ok(dt) => { | ||
| let dt = dt.fixed_offset(); | ||
| let result = | ||
| Timestamp::from_nanosecond(dt.timestamp() as i128); |
There was a problem hiding this comment.
Jiff parser uses seconds instead of nanoseconds
High Severity
In the JiffDateTimeParser::string_to_datetime_formatted method, when handling the %c format, chrono's dt.timestamp() returns seconds since epoch but is passed to Timestamp::from_nanosecond() which expects nanoseconds. This causes parsed timestamps to be off by a factor of 1 billion, resulting in completely incorrect dates (e.g., a year 2001 date would become approximately 1970). The code should use dt.timestamp_nanos_opt() instead of dt.timestamp().
There was a problem hiding this comment.
value:useful; category:bug; feedback:The Cursor AI reviewer is correct! chrono::DateTime::timestamp() returns the seconds since epoch, not the nanoseconds, so it is invalid input. Prevents using a wrong value for instantiating a Timestamp object and this will lead to wrong calculations afterwards.
| // grab the string after the last space as the named timezone | ||
| let parts: Vec<&str> = datetime_str.rsplitn(2, ' ').collect(); | ||
| let timezone_name = parts[0]; | ||
| datetime_str = parts[1]; |
There was a problem hiding this comment.
Chrono parser panics when input lacks expected space
High Severity
When the format string ends with " %Z" but the input datetime_str contains no space, rsplitn(2, ' ') returns a single-element vector. Accessing parts[1] then causes an index out of bounds panic. The old code safely used rsplit_once(' ') which returns an Option and handled the no-space case gracefully. This is a regression that can crash query execution when users provide mismatched format strings and input.
There was a problem hiding this comment.
value:useful; category:bug; feedback:The Bugbot AI reviewer is correct! The logic checks that the "format" value has a space character but later tries to split the datetime_str by space and read its second part. Since there is no guarantee that datatime_str has a space in it it may lead to an application crash. Prevents an internal error when the user provides an invalid value.
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
datafusion/functions/src/datetime/common.rs (1)
167-194: Scalar path passes formats one-at-a-time, inconsistent with array path.The scalar handling loop calls
op(a, &[s.as_str()])with a single-element slice for each format argument, iterating externally. However, the array path inhandle_array_op(line 319) collects all formats into a vector and callsop(x, &v)once with all formats.This inconsistency means:
- Scalar inputs:
opis invoked multiple times, each with one format- Array inputs:
opis invoked once with all formatsIf the parser implementation depends on seeing all formats at once (e.g., for error messages listing all attempted formats), behavior will diverge between scalar and array inputs.
Proposed fix to align scalar path with array path
ColumnarValue::Scalar(scalar) => match scalar.try_as_str() { Some(a) => { let a = a.as_ref(); // ASK: Why do we trust `a` to be non-null at this point? let a = unwrap_or_internal_err!(a); - let mut ret = None; + let mut formats: Vec<&str> = Vec::new(); for (pos, v) in args.iter().enumerate().skip(1) { let ColumnarValue::Scalar( ScalarValue::Utf8View(x) | ScalarValue::LargeUtf8(x) | ScalarValue::Utf8(x), ) = v else { return exec_err!( "Unsupported data type {v:?} for function {name}, arg # {pos}" ); }; if let Some(s) = x { - match op(a, &[s.as_str()]) { - Ok(r) => { - let result = op2(r).to_i64(); - let s = scalar_value(dt, result)?; - ret = Some(Ok(ColumnarValue::Scalar(s))); - break; - } - Err(e) => ret = Some(Err(e)), - } + formats.push(s.as_str()); } } - unwrap_or_internal_err!(ret) + if formats.is_empty() { + return exec_err!("No valid format strings provided for function {name}"); + } + + let r = op(a, &formats)?; + let result = op2(r).to_i64(); + let s = scalar_value(dt, result)?; + Ok(ColumnarValue::Scalar(s)) }
🤖 Fix all issues with AI agents
In @datafusion/functions/src/datetime/parser.rs:
- Around line 24-63: The comment contains a typo and the trait methods use
redundant Result type parameters: fix the comment "PartialEaq" to "PartialEq"
and update the three method signatures on the DateTimeParser trait
(string_to_timestamp_nanos, string_to_timestamp_nanos_formatted,
string_to_timestamp_millis_formatted) to return datafusion_common::Result<i64>
(remove the explicit ", DataFusionError" type argument). Ensure the surrounding
dyn_eq::eq_trait_object!(DateTimeParser) and
dyn_hash::hash_trait_object!(DateTimeParser) usage remains unchanged.
- Around line 120-148: The code assumes a space exists before the timezone when
format.ends_with(" %Z") and directly indexes parts[1], which panics if the input
lacks that space; change the parsing to avoid Vec/indexing by using the rsplitn
iterator and pattern matching: call let mut parts = datetime_str.rsplitn(2, '
'); match (parts.next(), parts.next()) { (Some(timezone_name), Some(rest)) => {
datetime_str = rest; let result: Result<chrono_tz::Tz, ParseError> =
timezone_name.parse(); let Ok(tz) = result else { err =
Some(result.unwrap_err().to_string()); continue; }; format =
&format[..format.len() - 3]; Some(tz) }, _ => { err = Some("expected a space
before timezone when format ends with ' %Z'".into()); continue; } }, ensuring
you reference the existing variables format, datetime_str, tz, result and err.
- Around line 64-80: Change get_date_time_parser to return Result<Box<dyn
DateTimeParser>, DataFusionError> instead of panicking on unknown/unsupported
parser: replace the panic!("Unknown/unsupported date time parser: {p}") branch
with an Err(DataFusionError::Plan(format!(...))) (or other appropriate
DataFusionError) and adjust the None branch to return Ok(Box::new(...)). Update
all call sites that use get_date_time_parser to handle the Result. Additionally
change the constructors/new_with_config() signatures used by
chrono::ChronoDateTimeParser::new(), jiff::JiffDateTimeParser::new() and any
macro-generated constructors (notably new_with_config() used by ToDateFunc and
all ToTimestamp* functions) to return Result<Self, DataFusionError> and
propagate the Result through their callers and macro expansions so the factory
can construct parsers via ?.
🧹 Nitpick comments (6)
datafusion/functions/Cargo.toml (1)
122-124: Consider loosening bench feature requirements (optional).
Requiringjifffor theto_timestampbench blocks running benches in chrono-only builds; if the bench can conditionally include jiff comparisons when enabled, it’s nicer to keepjiffoptional here.datafusion/common/src/config.rs (1)
508-516: Normalize/validatedatafusion.execution.date_time_parserat config parse time.Right now this is a free-form
Option<String>with notransform/validation, so values like"Chrono"(case) or typos will parse fine and only fail later (or silently fall back, depending on the factory). Consider normalizing to lowercase and/or using a small enum for immediate validation.Proposed minimal change: lowercase normalization
- pub date_time_parser: Option<String>, default = Some("chrono".to_string()) + pub date_time_parser: Option<String>, transform = str::to_lowercase, default = Some("chrono".to_string())datafusion/functions/benches/to_timestamp.rs (2)
189-220: Removeprintln!from benchmark hot path.The
println!statements inside the benchmark iteration will affect timing accuracy and pollute benchmark output. These should be removed or moved outside the measured loop.♻️ Suggested fix
b.iter(|| { for (input, format3, jiff_format3) in izip!(inputs.iter(), format3.iter(), jiffformat3.iter()) { black_box(match parser { "chrono" => { - let t = datetime_parser + datetime_parser .string_to_timestamp_nanos_formatted( "UTC", input.unwrap(), &[format3.unwrap()], - ); - if t.is_err() { - println!("Error: {:?}", t); - } + ) } "jiff" => { - let t = datetime_parser + datetime_parser .string_to_timestamp_nanos_formatted( "UTC", input.unwrap(), &[jiff_format3.unwrap()], - ); - if t.is_err() { - println!("Error: {:?}", t); - } + ) } _ => unreachable!(), }) } })
248-300: Same issue: removeprintln!from benchmark hot path.The
println!statements in the multiple-formats benchmark also affect timing accuracy.♻️ Suggested fix
b.iter(|| { for ( input, format1, format2, format3, jiff_format1, jiff_format2, jiff_format3, ) in izip!( inputs.iter(), format1.iter(), format2.iter(), format3.iter(), jiff_format1.iter(), jiff_format2.iter(), jiff_format3.iter() ) { black_box(match parser { "chrono" => { - let t = datetime_parser + datetime_parser .string_to_timestamp_nanos_formatted( "UTC", input.unwrap(), &[ format1.unwrap(), format2.unwrap(), format3.unwrap(), ], - ); - if t.is_err() { - println!("Error: {:?}", t); - } + ) } "jiff" => { - let t = datetime_parser + datetime_parser .string_to_timestamp_nanos_formatted( "UTC", input.unwrap(), &[ jiff_format1.unwrap(), jiff_format2.unwrap(), jiff_format3.unwrap(), ], - ); - if t.is_err() { - println!("Error: {:?}", t); - } + ) } _ => unreachable!(), }) } })datafusion/functions/src/datetime/to_date.rs (2)
69-103: Consider returning&mut Selffor builder-style chaining.The
with_datetime_parsermethod takes&mut selfbut returns&Self, which prevents fluent builder patterns on owned values. Consider returning&mut Selfinstead for consistency with common Rust builder patterns.♻️ Suggested fix
/// Set a custom date time parser. - pub fn with_datetime_parser(&mut self, parser: Box<dyn DateTimeParser>) -> &Self { + pub fn with_datetime_parser(&mut self, parser: Box<dyn DateTimeParser>) -> &mut Self { self.parser = parser; self }
417-424: Debug print in test should useeprintln!or be removed.The
println!for debugging test failures is fine for development, but consider usingeprintln!or including the error in the panic message directly for cleaner test output.♻️ Suggested fix
- e => { - println!("e was {e:?}"); - panic!( - "Could not convert '{}' with format string '{}' to Date", - tc.formatted_date, tc.format_str - ) - } + e => panic!( + "Could not convert '{}' with format string '{}' to Date: {e:?}", + tc.formatted_date, tc.format_str + ),
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (15)
datafusion-examples/examples/builtin_functions/date_time.rsdatafusion/common/src/config.rsdatafusion/core/Cargo.tomldatafusion/functions/Cargo.tomldatafusion/functions/benches/to_timestamp.rsdatafusion/functions/src/datetime/common.rsdatafusion/functions/src/datetime/mod.rsdatafusion/functions/src/datetime/parser.rsdatafusion/functions/src/datetime/to_date.rsdatafusion/functions/src/datetime/to_timestamp.rsdatafusion/sqllogictest/test_files/datetime/dates.sltdatafusion/sqllogictest/test_files/datetime/timestamps.sltdatafusion/sqllogictest/test_files/information_schema.sltdatafusion/sqllogictest/test_files/to_timestamp_timezone.sltdocs/source/user-guide/configs.md
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-06T13:37:46.028Z
Learnt from: martin-augment
Repo: martin-augment/datafusion PR: 5
File: datafusion/core/tests/user_defined/user_defined_scalar_functions.rs:997-1005
Timestamp: 2025-11-06T13:37:46.028Z
Learning: In datafusion/core/tests/user_defined/user_defined_scalar_functions.rs, when implementing ScalarFunctionWrapper::replacement, always bounds-check placeholder_position against defaults.len() before indexing into the defaults array to prevent panics when SQL UDF bodies reference placeholders like $N where N exceeds the declared parameter count.
Applied to files:
datafusion/functions/src/datetime/to_timestamp.rs
🧬 Code graph analysis (5)
datafusion/functions/src/datetime/to_date.rs (1)
datafusion/functions/src/datetime/parser.rs (3)
new(97-99)new(376-378)get_date_time_parser(64-80)
datafusion/functions/src/datetime/mod.rs (2)
datafusion/functions/src/datetime/to_date.rs (3)
to_date(105-136)default(76-78)args(234-237)datafusion/functions/src/datetime/now.rs (1)
default(50-52)
datafusion/functions/src/datetime/parser.rs (1)
datafusion/functions/src/datetime/to_date.rs (1)
new(87-89)
datafusion/functions/benches/to_timestamp.rs (2)
datafusion/functions/src/datetime/to_timestamp.rs (1)
to_timestamp(848-858)datafusion/functions/src/datetime/parser.rs (2)
new(97-99)new(376-378)
datafusion/functions/src/datetime/to_timestamp.rs (2)
datafusion/functions/src/datetime/parser.rs (3)
get_date_time_parser(64-80)new(97-99)new(376-378)datafusion/functions/src/datetime/to_date.rs (2)
with_datetime_parser(99-103)new(87-89)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Cursor Bugbot
- GitHub Check: Analyze (rust)
🔇 Additional comments (26)
datafusion/functions/src/datetime/common.rs (1)
291-324: LGTM on the array path accumulation logic.The refactored approach correctly accumulates all format strings into a vector before invoking
oponce. This allows the parser implementation to try all formats and produce comprehensive error messages.datafusion/core/Cargo.toml (1)
111-113:jifffeature re-export looks correct.
Not being part ofdefaultis the right posture (keeps optional dependency surface small).datafusion-examples/examples/builtin_functions/date_time.rs (1)
413-422: Example expectation updated appropriately for “formats: [...]”.
No concerns with the new string shape in this example.docs/source/user-guide/configs.md (1)
80-83: Docs entry fordatafusion.execution.date_time_parserlooks consistent.
Only nit: consider quoting values consistently (e.g."chrono","jiff"), but content is correct.datafusion/sqllogictest/test_files/information_schema.slt (1)
216-223: SQL logic tests updated correctly for new config parameter visibility.Also applies to: 356-360
datafusion/sqllogictest/test_files/to_timestamp_timezone.slt (1)
140-147: Updated error expectation matches new error message shape.datafusion/functions/Cargo.toml (1)
55-57: jiff 0.2.18 is MSRV-compatible, but dyn-eq and dyn-hash lack published MSRVs.
jiff's MSRV (1.70.0) is well below DataFusion's (1.88.0). All three crates carry permissive licenses (jiff: Unlicense OR MIT; dyn-eq: MPL-2.0; dyn-hash: MIT OR Apache-2.0), compatible with Apache-2.0.
However, dyn-eq and dyn-hash do not publish documented MSRVs; verify these crates build on Rust 1.88.0. Additionally, jiff implements POSIX/GNU-compatible strftime with non-standard extensions (like %Q for IANA timezone IDs, %N, %q, %::z / %:::z); strftime format strings are mostly compatible with Chrono's but have differences. Ensure any documentation claiming exact parity between jiff and Chrono datetime formatting is reviewed.datafusion/sqllogictest/test_files/datetime/dates.slt (2)
294-300: Updatedto_date(..., '%q')failure expectation looks consistent with new formats-array messaging.
The escaping for["%q"]and the input string looks correct for regex matching.
373-378: Potential brittleness: exact error block may diverge across parser backends / error paths.
Given the newdate_time_parserselection and two implementations, please verify thisquery errorblock remains stable with both chrono default and jiff-enabled runs (if CI exercises both).datafusion/sqllogictest/test_files/datetime/timestamps.slt (2)
2545-2564: Error expectations updated correctly for formats-array reporting acrossto_timestamp*.
Regex escaping for["%q"]is consistent with other patterns in this file.
2666-2670: Potential brittleness: multi-line asserted error output could differ between chrono vs jiff implementations.
Please verify this remains stable (or relax to a regex match) when running tests with the optionaljifffeature / parser selection.datafusion/functions/src/datetime/mod.rs (3)
34-34: LGTM!New parser module addition is appropriate for the DateTimeParser trait abstraction.
53-53: LGTM!The
to_datefunction is now consistently wired through config-based construction, matching the pattern used by other timestamp functions in this module.Also applies to: 271-274
277-301: LGTM!The
functions()registry correctly passes the config reference toto_date, consistent with other config-aware functions likenow,to_timestamp, etc.datafusion/functions/benches/to_timestamp.rs (3)
30-34: LGTM!Appropriate imports for the DateTimeParser trait and both implementations to enable comparative benchmarking.
53-170: LGTM!The expanded
data_with_formats()function properly handles format differences between chrono and jiff parsers (e.g.,%+vs%Y-%m-%dT%H:%M:%S%Q,%cvs%a %b %e %H:%M:%S %Y).
309-390: LGTM!The UDF-based benchmarks correctly configure the parser via
ConfigOptionsand properly set up theScalarFunctionArgsfor both chrono and jiff parsers.datafusion/functions/src/datetime/to_date.rs (3)
18-32: LGTM!Appropriate imports added for the DateTimeParser abstraction and ConfigOptions.
105-136: LGTM!The
to_datemethod correctly delegates toself.parser.string_to_timestamp_millis_formattedfor multi-argument cases, properly integrating the parser abstraction.
219-247: LGTM!Tests appropriately updated to use
new_with_configconstructor.datafusion/functions/src/datetime/to_timestamp.rs (6)
22-22: LGTM!DateTimeParser import and field addition to
ToTimestampFuncstruct appropriately integrate the parser abstraction.Also applies to: 88-93
294-339: LGTM with same note on builder pattern.The macro-generated
with_datetime_parsermethod has the same&mut self -> &Selfreturn type pattern asto_date.rs. Consider changing to&mut Selffor consistency with common builder patterns, but this is a minor point.
796-827: LGTM!The
#[expect(clippy::borrowed_box)]annotation is appropriate here - passing&Box<dyn DateTimeParser>is functionally equivalent to&dyn DateTimeParserand the lint suppression is justified for API consistency. The implementation correctly delegates to the parser for both single-argument and multi-format cases.
830-909: LGTM!Test helper functions properly instantiate
ChronoDateTimeParserand pass it through the parser abstraction, ensuring test coverage for the new parser-based implementation.
1520-1520: LGTM!Updated expected error message correctly reflects the new format - errors now show the array of attempted formats (
[\"%s\", \"%c\", \"%H:%M:%S\"]) rather than a single format string.
1567-1638: LGTM!Test functions updated to use the parser-based API correctly, with
ChronoDateTimeParserinstantiated and passed tostring_to_timestamp_nanos_formatted.
| /// A trait for parsing timestamps from strings. Two implementations are provided: | ||
| /// `ChronoDateTimeParser` (the default) which uses [Chrono] to parse timestamps, and | ||
| /// `JiffDateTimeParser` (via the `jiff` feature flag) which uses [Jiff] to parse | ||
| /// timestamps. | ||
| /// | ||
| /// While both implementations are capable of parsing timestamps, the `ChronoDateTimeParser` | ||
| /// is a bit more lenient wrt formats at the cost of being slightly slower than Jiff. | ||
| /// Jiff, on the other hand, has slightly less support for some format options than Chrono | ||
| /// but is measurably faster than Chrono in some situations. | ||
| /// | ||
| /// [Chrono]: https://docs.rs/chrono/latest/chrono/ | ||
| /// [Jiff]: https://docs.rs/jiff/latest/jiff/ | ||
| pub trait DateTimeParser: Debug + DynEq + DynHash + Send + Sync { | ||
| fn string_to_timestamp_nanos( | ||
| &self, | ||
| tz: &str, | ||
| s: &str, | ||
| ) -> datafusion_common::Result<i64, DataFusionError>; | ||
|
|
||
| fn string_to_timestamp_nanos_formatted( | ||
| &self, | ||
| tz: &str, | ||
| s: &str, | ||
| formats: &[&str], | ||
| ) -> datafusion_common::Result<i64, DataFusionError>; | ||
|
|
||
| fn string_to_timestamp_millis_formatted( | ||
| &self, | ||
| tz: &str, | ||
| s: &str, | ||
| formats: &[&str], | ||
| ) -> datafusion_common::Result<i64, DataFusionError>; | ||
| } | ||
|
|
||
| // impl Eq and PartialEaq for dyn DateTimeParser | ||
| dyn_eq::eq_trait_object!(DateTimeParser); | ||
|
|
||
| // Implement std::hash::Hash for dyn DateTimeParser | ||
| dyn_hash::hash_trait_object!(DateTimeParser); | ||
|
|
There was a problem hiding this comment.
Minor: typos + simplify Result types.
- “PartialEaq” typo in comment (Line 58).
- Consider standardizing signatures to
datafusion_common::Result<i64>(the extra, DataFusionErroris redundant and makes signatures noisier).
🤖 Prompt for AI Agents
In @datafusion/functions/src/datetime/parser.rs around lines 24 - 63, The
comment contains a typo and the trait methods use redundant Result type
parameters: fix the comment "PartialEaq" to "PartialEq" and update the three
method signatures on the DateTimeParser trait (string_to_timestamp_nanos,
string_to_timestamp_nanos_formatted, string_to_timestamp_millis_formatted) to
return datafusion_common::Result<i64> (remove the explicit ", DataFusionError"
type argument). Ensure the surrounding dyn_eq::eq_trait_object!(DateTimeParser)
and dyn_hash::hash_trait_object!(DateTimeParser) usage remains unchanged.
| pub fn get_date_time_parser(config_options: &ConfigOptions) -> Box<dyn DateTimeParser> { | ||
| let parser_cfg = config_options.execution.date_time_parser.as_ref(); | ||
|
|
||
| match parser_cfg { | ||
| Some(p) => match p.as_str() { | ||
| "chrono" => { | ||
| Box::new(chrono::ChronoDateTimeParser::new()) as Box<dyn DateTimeParser> | ||
| } | ||
| #[cfg(feature = "jiff")] | ||
| "jiff" => { | ||
| Box::new(jiff::JiffDateTimeParser::new()) as Box<dyn DateTimeParser> | ||
| } | ||
| _ => panic!("Unknown/unsupported date time parser: {p}"), | ||
| }, | ||
| None => Box::new(chrono::ChronoDateTimeParser::new()) as Box<dyn DateTimeParser>, | ||
| } | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
cat -n datafusion/functions/src/datetime/parser.rs | sed -n '64,80p'Repository: martin-augment/datafusion
Length of output: 194
🏁 Script executed:
rg -n --type=rust '\bget_date_time_parser\s*\('Repository: martin-augment/datafusion
Length of output: 454
🏁 Script executed:
ls -la datafusion/functions/src/datetime/Repository: martin-augment/datafusion
Length of output: 1394
🏁 Script executed:
wc -l datafusion/functions/src/datetime/parser.rsRepository: martin-augment/datafusion
Length of output: 116
🏁 Script executed:
cat -n datafusion/functions/src/datetime/parser.rs | sed -n '64,80p'Repository: martin-augment/datafusion
Length of output: 879
🏁 Script executed:
cat -n datafusion/functions/src/datetime/to_date.rs | sed -n '90,98p'Repository: martin-augment/datafusion
Length of output: 410
🏁 Script executed:
cat -n datafusion/functions/src/datetime/to_timestamp.rs | sed -n '320,328p'Repository: martin-augment/datafusion
Length of output: 447
🏁 Script executed:
cat -n datafusion/functions/src/datetime/to_date.rs | sed -n '82,105p'Repository: martin-augment/datafusion
Length of output: 1092
🏁 Script executed:
cat -n datafusion/functions/src/datetime/to_timestamp.rs | sed -n '310,335p'Repository: martin-augment/datafusion
Length of output: 1255
🏁 Script executed:
rg -A 5 "impl.*ToDate\|struct ToDate" datafusion/functions/src/datetime/to_date.rs | head -20Repository: martin-augment/datafusion
Length of output: 51
🏁 Script executed:
rg -n --type=rust 'new_with_config' datafusion/functions/src/datetime/ | head -20Repository: martin-augment/datafusion
Length of output: 2519
🏁 Script executed:
rg -n --type=rust 'panic!' datafusion/functions/src/datetime/parser.rsRepository: martin-augment/datafusion
Length of output: 141
🏁 Script executed:
rg -B 2 -A 2 'struct.*DateTimeParser\|pub struct' datafusion/functions/src/datetime/to_date.rs | head -30Repository: martin-augment/datafusion
Length of output: 51
🏁 Script executed:
cat -n datafusion/functions/src/datetime/to_date.rs | sed -n '60,80p'Repository: martin-augment/datafusion
Length of output: 919
🏁 Script executed:
cat -n datafusion/functions/src/datetime/to_timestamp.rs | sed -n '280,310p'Repository: martin-augment/datafusion
Length of output: 1703
🏁 Script executed:
rg -n 'exec_err!' datafusion/functions/src/datetime/ | head -10Repository: martin-augment/datafusion
Length of output: 993
🏁 Script executed:
rg -n 'ConfigOptions' datafusion/functions/src/datetime/parser.rs | head -5Repository: martin-augment/datafusion
Length of output: 209
🏁 Script executed:
rg -B 5 'ToDateFunc::new_with_config\|ToTimestampFunc::new_with_config' datafusion/functions/src/datetime/ | head -40Repository: martin-augment/datafusion
Length of output: 51
🏁 Script executed:
rg -n 'date_time_parser' --type=rust datafusion/functions/src/ | grep -E '\.(rs|toml):'Repository: martin-augment/datafusion
Length of output: 574
🏁 Script executed:
rg -n 'impl.*ToTimestampFunc\|fn.*to_timestamp_impl' datafusion/functions/src/datetime/to_timestamp.rs | head -10Repository: martin-augment/datafusion
Length of output: 51
🏁 Script executed:
cat -n datafusion/functions/src/datetime/to_timestamp.rs | sed -n '390,410p'Repository: martin-augment/datafusion
Length of output: 885
🏁 Script executed:
cat -n datafusion/functions/src/datetime/to_timestamp.rs | sed -n '1,30p' | grep -E 'use|mod'Repository: martin-augment/datafusion
Length of output: 468
🏁 Script executed:
rg -n 'with_updated_config' datafusion/functions/src/datetime/to_timestamp.rs | head -5Repository: martin-augment/datafusion
Length of output: 360
🏁 Script executed:
grep -n 'date_time_parser' datafusion/core/src/config.rs 2>/dev/null | head -10Repository: martin-augment/datafusion
Length of output: 51
🏁 Script executed:
rg -n 'date_time_parser.*jiff\|jiff.*date_time_parser' datafusion/ --type=rust 2>/dev/null | head -10Repository: martin-augment/datafusion
Length of output: 51
🏁 Script executed:
find datafusion -name '*.rs' -type f -exec grep -l 'get_date_time_parser\|new_with_config' {} \; | head -20Repository: martin-augment/datafusion
Length of output: 2719
Replace panic! with DataFusionError for unknown date_time_parser config.
The panic! at line 76 is triggered by user-controlled config (e.g., date_time_parser="jiff" with the feature disabled or any unknown value). This should return an error instead of crashing. The proposed fix to return Result is correct, but note that this requires changing the return type of new_with_config() methods in ToDateFunc and all ToTimestamp* functions, as these currently return Self directly. The changes would need to propagate through the macro-generated constructors and their call sites.
Proposed fix (change factory to return a Result)
-pub fn get_date_time_parser(config_options: &ConfigOptions) -> Box<dyn DateTimeParser> {
+pub fn get_date_time_parser(
+ config_options: &ConfigOptions,
+) -> datafusion_common::Result<Box<dyn DateTimeParser>> {
let parser_cfg = config_options.execution.date_time_parser.as_ref();
match parser_cfg {
Some(p) => match p.as_str() {
- "chrono" => {
- Box::new(chrono::ChronoDateTimeParser::new()) as Box<dyn DateTimeParser>
- }
+ "chrono" => Ok(Box::new(chrono::ChronoDateTimeParser::new())
+ as Box<dyn DateTimeParser>),
#[cfg(feature = "jiff")]
- "jiff" => {
- Box::new(jiff::JiffDateTimeParser::new()) as Box<dyn DateTimeParser>
- }
- _ => panic!("Unknown/unsupported date time parser: {p}"),
+ "jiff" => Ok(Box::new(jiff::JiffDateTimeParser::new())
+ as Box<dyn DateTimeParser>),
+ _ => datafusion_common::exec_err!("Unknown/unsupported date time parser: {p}"),
},
- None => Box::new(chrono::ChronoDateTimeParser::new()) as Box<dyn DateTimeParser>,
+ None => Ok(Box::new(chrono::ChronoDateTimeParser::new()) as Box<dyn DateTimeParser>),
}
}🤖 Prompt for AI Agents
In @datafusion/functions/src/datetime/parser.rs around lines 64 - 80, Change
get_date_time_parser to return Result<Box<dyn DateTimeParser>, DataFusionError>
instead of panicking on unknown/unsupported parser: replace the
panic!("Unknown/unsupported date time parser: {p}") branch with an
Err(DataFusionError::Plan(format!(...))) (or other appropriate DataFusionError)
and adjust the None branch to return Ok(Box::new(...)). Update all call sites
that use get_date_time_parser to handle the Result. Additionally change the
constructors/new_with_config() signatures used by
chrono::ChronoDateTimeParser::new(), jiff::JiffDateTimeParser::new() and any
macro-generated constructors (notably new_with_config() used by ToDateFunc and
all ToTimestamp* functions) to return Result<Self, DataFusionError> and
propagate the Result through their callers and macro expansions so the factory
can construct parsers via ?.
| // we manually handle the most common case of a named timezone at the end of the timestamp | ||
| // since chrono doesn't support that directly. Note, however, that %+ does handle 'Z' at the | ||
| // end of the string. | ||
| // | ||
| // This code doesn't handle named timezones with no preceding space since that would require | ||
| // writing a custom parser. | ||
| let tz: Option<chrono_tz::Tz> = if format.ends_with(" %Z") { | ||
| // grab the string after the last space as the named timezone | ||
| let parts: Vec<&str> = datetime_str.rsplitn(2, ' ').collect(); | ||
| let timezone_name = parts[0]; | ||
| datetime_str = parts[1]; | ||
|
|
||
| // attempt to parse the timezone name | ||
| let result: Result<chrono_tz::Tz, ParseError> = timezone_name.parse(); | ||
| let Ok(tz) = result else { | ||
| err = Some(result.unwrap_err().to_string()); | ||
| continue; | ||
| }; | ||
|
|
||
| // successfully parsed the timezone name, remove the ' %Z' from the format | ||
| format = &format[..format.len() - 3]; | ||
|
|
||
| Some(tz) | ||
| } else if format.contains("%Z") { | ||
| err = Some("'%Z' is only supported at the end of the format string preceded by a space".into()); | ||
| continue; | ||
| } else { | ||
| None | ||
| }; |
There was a problem hiding this comment.
Chrono %Z parsing can panic on malformed input (missing space).
When the format ends with " %Z" (Line 126), the code assumes the input has a space before the timezone and does parts[1] (Line 130). If the input is missing that space, this will panic (index out of bounds) instead of returning an execution error.
Proposed fix (avoid Vec + avoid indexing)
- let parts: Vec<&str> = datetime_str.rsplitn(2, ' ').collect();
- let timezone_name = parts[0];
- datetime_str = parts[1];
+ let Some((datetime_part, timezone_name)) = datetime_str.rsplit_once(' ') else {
+ err = Some("expected a space before the timezone name".to_string());
+ continue;
+ };
+ datetime_str = datetime_part;📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // we manually handle the most common case of a named timezone at the end of the timestamp | |
| // since chrono doesn't support that directly. Note, however, that %+ does handle 'Z' at the | |
| // end of the string. | |
| // | |
| // This code doesn't handle named timezones with no preceding space since that would require | |
| // writing a custom parser. | |
| let tz: Option<chrono_tz::Tz> = if format.ends_with(" %Z") { | |
| // grab the string after the last space as the named timezone | |
| let parts: Vec<&str> = datetime_str.rsplitn(2, ' ').collect(); | |
| let timezone_name = parts[0]; | |
| datetime_str = parts[1]; | |
| // attempt to parse the timezone name | |
| let result: Result<chrono_tz::Tz, ParseError> = timezone_name.parse(); | |
| let Ok(tz) = result else { | |
| err = Some(result.unwrap_err().to_string()); | |
| continue; | |
| }; | |
| // successfully parsed the timezone name, remove the ' %Z' from the format | |
| format = &format[..format.len() - 3]; | |
| Some(tz) | |
| } else if format.contains("%Z") { | |
| err = Some("'%Z' is only supported at the end of the format string preceded by a space".into()); | |
| continue; | |
| } else { | |
| None | |
| }; | |
| let tz: Option<chrono_tz::Tz> = if format.ends_with(" %Z") { | |
| // grab the string after the last space as the named timezone | |
| let Some((datetime_part, timezone_name)) = datetime_str.rsplit_once(' ') else { | |
| err = Some("expected a space before the timezone name".to_string()); | |
| continue; | |
| }; | |
| datetime_str = datetime_part; | |
| // attempt to parse the timezone name | |
| let result: Result<chrono_tz::Tz, ParseError> = timezone_name.parse(); | |
| let Ok(tz) = result else { | |
| err = Some(result.unwrap_err().to_string()); | |
| continue; | |
| }; | |
| // successfully parsed the timezone name, remove the ' %Z' from the format | |
| format = &format[..format.len() - 3]; | |
| Some(tz) | |
| } else if format.contains("%Z") { | |
| err = Some("'%Z' is only supported at the end of the format string preceded by a space".into()); | |
| continue; | |
| } else { | |
| None | |
| }; |
🤖 Prompt for AI Agents
In @datafusion/functions/src/datetime/parser.rs around lines 120 - 148, The code
assumes a space exists before the timezone when format.ends_with(" %Z") and
directly indexes parts[1], which panics if the input lacks that space; change
the parsing to avoid Vec/indexing by using the rsplitn iterator and pattern
matching: call let mut parts = datetime_str.rsplitn(2, ' '); match
(parts.next(), parts.next()) { (Some(timezone_name), Some(rest)) => {
datetime_str = rest; let result: Result<chrono_tz::Tz, ParseError> =
timezone_name.parse(); let Ok(tz) = result else { err =
Some(result.unwrap_err().to_string()); continue; }; format =
&format[..format.len() - 3]; Some(tz) }, _ => { err = Some("expected a space
before timezone when format ends with ' %Z'".into()); continue; } }, ensuring
you reference the existing variables format, datetime_str, tz, result and err.
There was a problem hiding this comment.
value:useful; category:bug; feedback:The CodeRabbit AI reviewer is correct! The logic checks that the "format" value has a space character but later tries to split the datetime_str by space and read its second part. Since there is no guarantee that datatime_str has a space in it it may lead to an application crash. Prevents an internal error when the user provides an invalid value.
| #[cfg(feature = "jiff")] | ||
| pub mod jiff { | ||
| use crate::datetime::parser::DateTimeParser; | ||
| use arrow::array::timezone::Tz; | ||
| use arrow::error::ArrowError; | ||
| use chrono::format::{Parsed, StrftimeItems, parse}; | ||
| use datafusion_common::{DataFusionError, exec_datafusion_err, exec_err}; | ||
| use jiff::civil::Time; | ||
| use jiff::fmt::temporal::Pieces; | ||
| use jiff::tz::{Disambiguation, TimeZone}; | ||
| use jiff::{Timestamp, Zoned}; | ||
| use num_traits::ToPrimitive; | ||
|
|
||
| const ERR_NANOSECONDS_NOT_SUPPORTED: &str = "The dates that can be represented as nanoseconds have to be between 1677-09-21T00:12:44.0 and 2262-04-11T23:47:16.854775804"; | ||
|
|
||
| #[derive(Debug, PartialEq, Eq, Hash)] | ||
| pub struct JiffDateTimeParser {} | ||
|
|
||
| impl JiffDateTimeParser { | ||
| pub fn new() -> Self { | ||
| Self {} | ||
| } | ||
|
|
||
| /// Attempts to parse a given string representation of a timestamp into a `Timestamp` object. | ||
| /// The function also adjusts the datetime for the specified timezone. | ||
| /// | ||
| /// # Parameters | ||
| /// - `timezone`: A reference to the `TimeZone` object used to adjust the parsed datetime to the desired timezone. | ||
| /// - `s`: A string slice holding the timestamp to be parsed. | ||
| /// | ||
| /// # Returns | ||
| /// - `Ok(Timestamp)`: Contains the parsed timestamp in seconds since the Unix epoch. | ||
| /// - `Err(ArrowError)`: Returned in the event of errors in parsing | ||
| /// the timestamp string or computing the timezone offset. | ||
| /// | ||
| /// # Errors | ||
| /// This function will return an `ArrowError` if: | ||
| /// | ||
| /// - The string `s` is shorter than 10 characters. | ||
| /// - The format of the string does not match expected timestamp patterns. | ||
| /// - An invalid/unknown time zone or offset is provided during parsing. | ||
| /// - Errors occur while converting the datetime to a specific time zone. | ||
| fn string_to_datetime( | ||
| &self, | ||
| timezone: &TimeZone, | ||
| s: &str, | ||
| ) -> datafusion_common::Result<Timestamp, ArrowError> { | ||
| let err = |ctx: &str| { | ||
| ArrowError::ParseError(format!( | ||
| "Error parsing timestamp from '{s}': {ctx}" | ||
| )) | ||
| }; | ||
|
|
||
| let bytes = s.as_bytes(); | ||
| if bytes.len() < 10 { | ||
| return Err(err("timestamp must contain at least 10 characters")); | ||
| } | ||
|
|
||
| let pieces = Pieces::parse(bytes).map_err(|e| err(&format!("{e:?}")))?; | ||
| let time = pieces.time().unwrap_or_else(Time::midnight); | ||
| let dt = pieces.date().to_datetime(time); | ||
| let tz = match pieces | ||
| .to_time_zone() | ||
| .map_err(|e| err(&format!("unknown time zone: {e:?}")))? | ||
| { | ||
| Some(tz) => tz, | ||
| None => match pieces.to_numeric_offset() { | ||
| Some(offset) => TimeZone::fixed(offset), | ||
| None => timezone.to_owned(), | ||
| }, | ||
| }; | ||
| let zdt = tz | ||
| .to_zoned(dt) | ||
| .map_err(|e| err(&format!("error computing timezone offset: {e:?}")))?; | ||
|
|
||
| Ok(zdt.timestamp()) | ||
| } | ||
|
|
||
| /// Attempts to parse a given string representation of a timestamp into a `Zoned` datetime object | ||
| /// using a list of provided formats. The function also adjusts the datetime for the specified timezone. | ||
| /// | ||
| /// # Parameters | ||
| /// - `timezone`: A reference to the `TimeZone` object used to adjust the parsed datetime to the desired timezone. | ||
| /// - `s`: A string slice holding the timestamp to be parsed. | ||
| /// - `formats`: A slice of string slices representing the accepted datetime formats to use when parsing. | ||
| /// | ||
| /// # Returns | ||
| /// - `Ok(Zoned)` if the string is successfully parsed into a `Zoned` datetime object using one of the formats provided. | ||
| /// - `Err(DataFusionError)` if the string cannot be parsed or if there are issues related to the timezone or format handling. | ||
| /// | ||
| /// # Behavior | ||
| /// 1. Iterates through the list of provided formats. | ||
| /// 2. Handles special cases such as `%Z` (timezone) or `%c` (locale-aware datetime representation) | ||
| /// by either modifying the format string or switching to different parsing logic. | ||
| /// 3. Attempts to resolve timezone-related issues: | ||
| /// - Calculates a fixed offset for the given `timezone`. | ||
| /// - Handles attempts to parse timezones using IANA names or offsets. | ||
| /// 4. If parsing fails for the current format, the function moves to the next format and aggregates any parsing errors. | ||
| /// 5. If no formats succeed, an error is returned summarizing the failure. | ||
| /// | ||
| /// # Errors | ||
| /// - Returns a descriptive error wrapped in `DataFusionError` if: | ||
| /// - The input string is not compatible with any of the provided formats. | ||
| /// - There are issues parsing the timezone from the input string or adjusting to the provided `TimeZone`. | ||
| /// - There are issues resolving ambiguous datetime representations. | ||
| fn string_to_datetime_formatted( | ||
| &self, | ||
| timezone: &TimeZone, | ||
| s: &str, | ||
| formats: &[&str], | ||
| ) -> datafusion_common::Result<Zoned, DataFusionError> { | ||
| let mut err: Option<String> = None; | ||
|
|
||
| for &format in formats.iter() { | ||
| let mut format = if format.contains("%Z") { | ||
| &format.replace("%Z", "%Q") | ||
| } else { | ||
| format | ||
| }; | ||
|
|
||
| if format == "%c" { | ||
| // switch to chrono, jiff doesn't support parsing with %c | ||
| let mut parsed = Parsed::new(); | ||
| let result = parse(&mut parsed, s, StrftimeItems::new(format)); | ||
|
|
||
| if result.is_err() { | ||
| err = Some(format!("error parsing timestamp: {result:?}")); | ||
| continue; | ||
| } | ||
|
|
||
| let offset = timezone.to_fixed_offset(); | ||
| let tz: &str = if let Ok(offset) = offset { | ||
| &offset.to_string() | ||
| } else { | ||
| let result = timezone.iana_name(); | ||
| match result { | ||
| Some(tz) => tz, | ||
| None => { | ||
| err = | ||
| Some(format!("error parsing timezone: {timezone:?}")); | ||
| continue; | ||
| } | ||
| } | ||
| }; | ||
| let tz: Tz = tz.parse::<Tz>().ok().unwrap_or("UTC".parse::<Tz>()?); | ||
|
|
||
| match parsed.to_datetime_with_timezone(&tz) { | ||
| Ok(dt) => { | ||
| let dt = dt.fixed_offset(); | ||
| let result = | ||
| Timestamp::from_nanosecond(dt.timestamp() as i128); | ||
| match result { | ||
| Ok(result) => { | ||
| return Ok(result.to_zoned(timezone.to_owned())); | ||
| } | ||
| Err(e) => { | ||
| err = Some(e.to_string()); | ||
| continue; | ||
| } | ||
| } | ||
| } | ||
| Err(e) => { | ||
| err = Some(e.to_string()); | ||
| continue; | ||
| } | ||
| }; | ||
| } | ||
|
|
||
| let result = if format == "%+" { | ||
| // jiff doesn't support parsing with %+, but the equivalent of %+ is | ||
| // somewhat rfc3389 which is what is used by the parse method | ||
| let result: Result<Zoned, _> = s.parse(); | ||
|
|
||
| if let Ok(r) = result { | ||
| return Ok(r); | ||
| } else { | ||
| // however, the above only works with timezone names, not offsets | ||
| let s = if s.ends_with("Z") { | ||
| &(s.trim_end_matches("Z").to_owned() + "+00:00") | ||
| } else { | ||
| s | ||
| }; | ||
|
|
||
| // try again with fractional seconds | ||
| format = "%Y-%m-%dT%H:%M:%S%.f%:z"; | ||
|
|
||
| jiff::fmt::strtime::parse(format, s) | ||
| } | ||
| } else { | ||
| jiff::fmt::strtime::parse(format, s) | ||
| }; | ||
|
|
||
| match result { | ||
| Ok(bdt) => { | ||
| if bdt.iana_time_zone().is_some() || bdt.offset().is_some() { | ||
| if let Ok(zoned) = bdt.to_zoned() { | ||
| return Ok(zoned.with_time_zone(timezone.to_owned())); | ||
| } | ||
| } | ||
|
|
||
| let result = bdt.to_datetime(); | ||
| let datetime = match result { | ||
| Ok(datetime) => datetime, | ||
| Err(e) => { | ||
| err = Some(e.to_string()); | ||
| continue; | ||
| } | ||
| }; | ||
|
|
||
| let zoned = timezone | ||
| .to_owned() | ||
| .into_ambiguous_zoned(datetime) | ||
| .disambiguate(Disambiguation::Compatible); | ||
|
|
||
| match zoned { | ||
| Ok(z) => return Ok(z), | ||
| Err(e) => { | ||
| err = Some(e.to_string()); | ||
| continue; | ||
| } | ||
| } | ||
| } | ||
| Err(e) => { | ||
| err = Some(e.to_string()); | ||
| continue; | ||
| } | ||
| } | ||
| } | ||
|
|
||
| match err { | ||
| Some(e) => exec_err!("{e}"), | ||
| None => { | ||
| exec_err!("Unable to parse timestamp: {s} with formats: {formats:?}") | ||
| } | ||
| } | ||
| } | ||
| } | ||
|
|
||
| impl DateTimeParser for JiffDateTimeParser { | ||
| /// Accepts a string with a `jiff` format and converts it to a | ||
| /// nanosecond precision timestamp according to the rules | ||
| /// defined by `jiff`. | ||
| /// | ||
| /// See <https://docs.rs/jiff/latest/jiff/fmt/strtime/index.html#conversion-specifications> | ||
| /// for the full set of supported formats. | ||
| /// | ||
| /// ## Timestamp Precision | ||
| /// | ||
| /// This function uses the maximum precision timestamps supported by | ||
| /// Arrow (nanoseconds stored as a 64-bit integer) timestamps. This | ||
| /// means the range of dates that timestamps can represent is ~1677 AD | ||
| /// to 2262 AM. | ||
| /// | ||
| /// ## Timezone / Offset Handling | ||
| /// | ||
| /// Numerical values of timestamps are stored compared to offset UTC. | ||
| #[inline] | ||
| fn string_to_timestamp_nanos( | ||
| &self, | ||
| tz: &str, | ||
| s: &str, | ||
| ) -> Result<i64, DataFusionError> { | ||
| let result = TimeZone::get(tz.as_ref()); | ||
| let timezone = match result { | ||
| Ok(tz) => tz, | ||
| Err(e) => return exec_err!("Invalid timezone {tz}: {e}"), | ||
| }; | ||
|
|
||
| let timestamp = self.string_to_datetime(&timezone, s)?; | ||
| let parsed = timestamp | ||
| .as_nanosecond() | ||
| .to_i64() | ||
| .ok_or_else(|| exec_datafusion_err!("{ERR_NANOSECONDS_NOT_SUPPORTED}"))?; | ||
|
|
||
| Ok(parsed) | ||
| } | ||
|
|
||
| /// Accepts a string with an array of `jiff` formats and converts it to a | ||
| /// nanosecond precision timestamp according to the rules | ||
| /// defined by `jiff`. | ||
| /// | ||
| /// See <https://docs.rs/jiff/latest/jiff/fmt/strtime/index.html#conversion-specifications> | ||
| /// for the full set of supported formats. | ||
| /// | ||
| /// ## Timestamp Precision | ||
| /// | ||
| /// This function uses the maximum precision timestamps supported by | ||
| /// Arrow (nanoseconds stored as a 64-bit integer) timestamps. This | ||
| /// means the range of dates that timestamps can represent is ~1677 AD | ||
| /// to 2262 AM. | ||
| /// | ||
| /// ## Timezone / Offset Handling | ||
| /// | ||
| /// Numerical values of timestamps are stored compared to offset UTC. | ||
| #[inline] | ||
| fn string_to_timestamp_nanos_formatted( | ||
| &self, | ||
| tz: &str, | ||
| s: &str, | ||
| formats: &[&str], | ||
| ) -> Result<i64, DataFusionError> { | ||
| let result = TimeZone::get(tz.as_ref()); | ||
| let timezone = match result { | ||
| Ok(tz) => tz, | ||
| Err(e) => return exec_err!("Invalid timezone {tz}: {e}"), | ||
| }; | ||
|
|
||
| let zoned = self.string_to_datetime_formatted(&timezone, s, formats)?; | ||
| let parsed = | ||
| zoned.timestamp().as_nanosecond().to_i64().ok_or_else(|| { | ||
| exec_datafusion_err!("{ERR_NANOSECONDS_NOT_SUPPORTED}") | ||
| })?; | ||
|
|
||
| Ok(parsed) | ||
| } | ||
|
|
||
| /// Accepts a string with an array of `jiff` formats and converts it to a | ||
| /// millisecond precision timestamp according to the rules | ||
| /// defined by `jiff`. | ||
| /// | ||
| /// See <https://docs.rs/jiff/latest/jiff/fmt/strtime/index.html#conversion-specifications> | ||
| /// for the full set of supported formats. | ||
| /// | ||
| /// ## Timezone / Offset Handling | ||
| /// | ||
| /// Numerical values of timestamps are stored compared to offset UTC. | ||
| #[inline] | ||
| fn string_to_timestamp_millis_formatted( | ||
| &self, | ||
| tz: &str, | ||
| s: &str, | ||
| formats: &[&str], | ||
| ) -> Result<i64, DataFusionError> { | ||
| let result = TimeZone::get(tz.as_ref()); | ||
| let timezone = match result { | ||
| Ok(tz) => tz, | ||
| Err(e) => return exec_err!("Invalid timezone {tz}: {e}"), | ||
| }; | ||
|
|
||
| let dt = self.string_to_datetime_formatted(&timezone, s, formats)?; | ||
| let parsed = dt.timestamp().as_millisecond(); | ||
|
|
||
| Ok(parsed) | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, let's check the file structure and look for the Chrono implementation to compare error messages
cd datafusion/functions && find . -name "parser.rs" -type f | head -5Repository: martin-augment/datafusion
Length of output: 93
🌐 Web query:
Jiff strtime parsing timezone directive %Q documentation and supported format codes
💡 Result:
Summary — %Q in Jiff
- %Q (and %:Q) produce or parse an IANA time‑zone identifier when available; if no identifier exists they fall back to a numeric offset (%z or %:z respectively). [1]
- %Q and %:Q require the crate's alloc feature; they are unavailable when alloc is disabled. [1]
Relevant timezone/offset/zone-format specifiers
- %Q — IANA tz id, or %z fallback.
- %:Q — IANA tz id, or %:z fallback (colon in offset).
- %Z — time‑zone abbreviation (formatting only; not parsable).
- %z — numeric offset as [+-]HHMM[SS].
- %:z — numeric offset as [+-]HH:MM[:SS].
- %::z — numeric offset as [+-]HH:MM:SS.
- %:::z — compact numeric offset variants like -04 or +05:30.
(Parsing/formatting support and exact behaviors are documented in the strtime module.) [1][2]
Other notes
- Jiff supports a broad set of strftime/strptime conversion specifiers (week/day/month/year/time, %s Unix timestamp, %Q/%:Q, etc.); see the conversion‑specifiers table in the fmt::strtime docs for the full list and parsing vs. formatting differences. [1]
Sources
[1] jiff::fmt::strtime documentation (conversion specifiers, %Q notes).
[2] jiff crate overview/examples (shows %Q usage).
🏁 Script executed:
# Let's examine the parser.rs file around the Chrono implementation to see error message format
cat -n datafusion/functions/src/datetime/parser.rs | head -100Repository: martin-augment/datafusion
Length of output: 4616
🏁 Script executed:
# Let's see the Chrono implementation to compare error message handling
cat -n datafusion/functions/src/datetime/parser.rs | sed -n '82,356p'Repository: martin-augment/datafusion
Length of output: 14113
🏁 Script executed:
# Now let's check the specific lines of the Jiff implementation mentioned in the review
cat -n datafusion/functions/src/datetime/parser.rs | sed -n '357,702p'Repository: martin-augment/datafusion
Length of output: 17028
🏁 Script executed:
# Let's verify if the code compiles - check if there are any test files or build artifacts
cd datafusion/functions && cargo check --features jiff 2>&1 | head -200Repository: martin-augment/datafusion
Length of output: 178
Fix invalid temporary borrows and align error message format with Chrono implementation.
The code has three invalid lifetime issues where references are created to temporary strings:
- Line 471–475:
&format.replace("%Z", "%Q")creates a reference to a temporary - Line 488–490:
&offset.to_string()creates a reference to a temporary - Line 534–539:
&(s.trim_end_matches("Z").to_owned() + "+00:00")creates a reference to a temporary
Use Cow<'_, str> for format and owned String for timezone and offset strings to preserve lifetimes. Additionally, unify error messages with Chrono's format. Chrono returns "Error parsing timestamp from '{s}' using formats: {formats:?}: {e}" when an error occurs (lines 217–218), but Jiff returns raw {e} (line 587), which creates inconsistent behavior when users switch parsers.
The %Q directive is correct for Jiff—it parses IANA timezone identifiers with fallback to numeric offset.
Proposed fix
pub mod jiff {
+ use std::borrow::Cow;
use crate::datetime::parser::DateTimeParser;
use arrow::array::timezone::Tz;
use arrow::error::ArrowError;
use chrono::format::{Parsed, StrftimeItems, parse};
use datafusion_common::{DataFusionError, exec_datafusion_err, exec_err};
@@ -469,10 +470,12 @@ pub mod jiff {
let mut err: Option<String> = None;
for &format in formats.iter() {
- let mut format = if format.contains("%Z") {
- &format.replace("%Z", "%Q")
+ let mut format: Cow<'_, str> = if format.contains("%Z") {
+ Cow::Owned(format.replace("%Z", "%Q"))
} else {
- format
+ Cow::Borrowed(format)
};
- if format == "%c" {
+ if format.as_ref() == "%c" {
// switch to chrono, jiff doesn't support parsing with %c
let mut parsed = Parsed::new();
- let result = parse(&mut parsed, s, StrftimeItems::new(format));
+ let result = parse(&mut parsed, s, StrftimeItems::new(format.as_ref()));
if result.is_err() {
err = Some(format!("error parsing timestamp: {result:?}"));
@@ -486,12 +489,13 @@ pub mod jiff {
}
let offset = timezone.to_fixed_offset();
- let tz: &str = if let Ok(offset) = offset {
- &offset.to_string()
+ let tz_owned: String;
+ let tz: &str = if let Ok(offset) = offset {
+ tz_owned = offset.to_string();
+ &tz_owned
} else {
let result = timezone.iana_name();
match result {
Some(tz) => tz,
None => {
@@ -524,10 +528,12 @@ pub mod jiff {
};
}
- let result = if format == "%+" {
+ let result = if format.as_ref() == "%+" {
// jiff doesn't support parsing with %+, but the equivalent of %+ is
// somewhat rfc3389 which is what is used by the parse method
let result: Result<Zoned, _> = s.parse();
if let Ok(r) = result {
return Ok(r);
} else {
// however, the above only works with timezone names, not offsets
- let s = if s.ends_with("Z") {
- &(s.trim_end_matches("Z").to_owned() + "+00:00")
+ let s: Cow<'_, str> = if s.ends_with('Z') {
+ Cow::Owned(format!("{}+00:00", s.trim_end_matches('Z')))
} else {
- s
+ Cow::Borrowed(s)
};
// try again with fractional seconds
- format = "%Y-%m-%dT%H:%M:%S%.f%:z";
+ format = Cow::Borrowed("%Y-%m-%dT%H:%M:%S%.f%:z");
- jiff::fmt::strtime::parse(format, s)
+ jiff::fmt::strtime::parse(format.as_ref(), s.as_ref())
}
} else {
- jiff::fmt::strtime::parse(format, s)
+ jiff::fmt::strtime::parse(format.as_ref(), s)
};
match result {
@@ -586,10 +594,12 @@ pub mod jiff {
}
match err {
- Some(e) => exec_err!("{e}"),
+ Some(e) => exec_err!(
+ "Error parsing timestamp from '{s}' using formats: {formats:?}: {e}"
+ ),
None => {
- exec_err!("Unable to parse timestamp: {s} with formats: {formats:?}")
+ exec_err!("Error parsing timestamp from '{s}' using formats: {formats:?}")
}
}
}
value:good-to-have; category:bug; feedback:The Claude AI reviewer is correct! Crashing the application by using panic!() is usually not desired but here the crash is caused due to misconfiguration, so the application will stop early with an explanation that there is no such parser with that name. Still, it would be better to return an Err and fail in more friendly manner than a crash. |
value:good-to-have; category:bug; feedback:The Claude AI reviewer is correct! Using Arc::ptr_eq() should be sufficient for this case because the datetime parsers are stateless and there is no need of deep object equality. Both dyn-eq and dyn-hash are relatively known in the Rust ecosystem, so their usage is justified too. |
value:useful; category:bug; feedback:The Claude AI reviewer is correct! The error message for the Jiff implementation does not provide the problematic datetime and the attempted formats as a context to the Jiff's error message and this makes it harder to debug and fix than the Chrono error message. |
value:useful; category:bug; feedback:The Claude AI reviewer is correct! The benchmark should not test the error paths. Only the happy path makes sense to be tested in a benchmark. In addition any errors are currently printed with println!() which additionally slows down the benchmark. |
value:good-to-have; category:bug; feedback:The Claude AI reviewer is correct! Currently the "jiff" march arm is available only if the "jiff" feature is enabled. Trying to use it will fail with an error saying that only "chrono" and "jiff" are supported. Prevents a confusion in the developer/admin of the application who tries to use the "jiff" parser and the error says that there is no such but it is a valid value. |
value:good-to-have; category:documentation; feedback:The Claude AI reviewer is correct! There is a minor typo in the name of the PartialEq trait in an inline comment. It does not do any harm but it would be nice to fix it. |
19755: To review by AI