19078: Respect execution timezone in to_timestamp and related functions#101
19078: Respect execution timezone in to_timestamp and related functions#101martin-augment wants to merge 2 commits intomainfrom
Conversation
Refactor shared helpers to ensure naïve strings are interpreted using the configured execution zone. Extend unit tests to cover naïve and formatted inputs respecting non-UTC execution timezones.
WalkthroughThis pull request introduces timezone awareness to DataFusion's timestamp functions. The changes include adding the chrono-tz dependency, refactoring datetime parsing utilities to support timezone-aware operations via a new generic ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
PR Review: Respect execution timezone in to_timestamp functionsSummaryThis PR adds timezone awareness to the Code Quality and Best PracticesStrengths ✅
Potential Issues🔴 Critical Issues
🟡 Medium Issues
🟢 Minor Issues
Performance ConsiderationsPrimary Concerns:
Recommendations:
Security Concerns✅ No significant security issues identified
Test Coverage✅ Excellent test coverage The new
Suggestions:
Additional Recommendations
Overall AssessmentThis is a well-designed and necessary feature with good code quality. The main concern is the performance impact of repeated timezone parsing. I recommend addressing the critical issues (especially #1 and #2) before merging. Rating: ⭐⭐⭐⭐☆ (4/5) - Good implementation with performance optimization needed Review generated by Claude Code |
| // grab the string after the last space as the named timezone | ||
| let parts: Vec<&str> = datetime_str.rsplitn(2, ' ').collect(); | ||
| let timezone_name = parts[0]; | ||
| datetime_str = parts[1]; |
There was a problem hiding this comment.
Potential panic: datetime_str = parts[1]; assumes a space exists in the input when format ends with %Z. Consider validating parts.len() == 2 and returning an error if the input doesn't contain the expected space-delimited timezone (also applies to the preceding timezone_name = parts[0]).
🤖 Was this useful? React with 👍 or 👎
There was a problem hiding this comment.
value:good-to-have; category:bug; feedback:The Augment AI reviewer is correct! The allocation of the Vector data structure could be avoided by using rsplit_once(' ') instead. This could save some CPU cycles for allocating/deallocating the temporary Vec.
| doc_section(label = "Time and Date Functions"), | ||
| description = "Converts a value to a timestamp (`YYYY-MM-DDT00:00:00.000000000Z`). Supports strings, integer, and unsigned integer types as input. Strings are parsed as RFC3339 (e.g. '2023-07-20T05:44:00') if no [Chrono format](https://docs.rs/chrono/latest/chrono/format/strftime/index.html)s are provided. Integers and unsigned integers are interpreted as nanoseconds since the unix epoch (`1970-01-01T00:00:00Z`). Returns the corresponding timestamp.", | ||
| description = r#" | ||
| Converts a value to a timestamp (`YYYY-MM-DDT00:00:00.000000000Z<TZ>`) in the session time zone. Supports strings, |
There was a problem hiding this comment.
Docs nit: the example format shows YYYY-MM-DDT00:00:00.000000000Z<TZ>, which suggests both Z and a timezone; it should likely be either ...Z or ...<TZ> but not both. Please update to avoid confusion (also applies to the user guide).
🤖 Was this useful? React with 👍 or 👎
There was a problem hiding this comment.
value:good-to-have; category:bug; feedback:The Augment AI reviewer is correct! The extra Z in the datetime pattern should not be there and should be removed because it makes the pattern invalid.
| // grab the string after the last space as the named timezone | ||
| let parts: Vec<&str> = datetime_str.rsplitn(2, ' ').collect(); | ||
| let timezone_name = parts[0]; | ||
| datetime_str = parts[1]; |
There was a problem hiding this comment.
Bug: Potential panic when parsing timestamp with %Z format
When the format string ends with " %Z" but the input datetime_str contains no spaces, rsplitn(2, ' ') returns only one element (the original string). Accessing parts[1] at line 121 will panic with an index-out-of-bounds error. For example, to_timestamp('2020UTC', '%Y %Z') would crash because the input string has no space to split on.
There was a problem hiding this comment.
value:good-to-have; category:bug; feedback:The Bugbot AI reviewer is correct! The allocation of the Vector data structure could be avoided by using rsplit_once(' ') instead. This could save some CPU cycles for allocating/deallocating the temporary Vec.
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
datafusion/functions/src/datetime/to_unixtime.rs (1)
107-125: Add support for Utf8View and LargeUtf8 string types.The implementation only explicitly handles
DataType::Utf8, causingUtf8ViewandLargeUtf8inputs to fall through to the error case. Add these types to the match pattern alongsideUtf8(similar to the pattern into_timestamp.rslines 497, 578, 640, 702, 764).
🧹 Nitpick comments (10)
datafusion/functions/src/datetime/to_unixtime.rs (1)
117-121: Potential performance consideration: new instance created per invocation.The current implementation creates a new
ToTimestampSecondsFuncinstance for eachto_unixtimecall with Utf8 input. While this correctly propagates config options, it may have minor performance implications for high-frequency calls.Consider whether caching or storing the
ToTimestampSecondsFuncwithinToUnixtimeFunc(and updating viawith_updated_config) would be beneficial. However, this is likely a minor concern since:
- Instance creation is cheap (just signature setup)
- The string parsing cost dominates
datafusion/functions/Cargo.toml (1)
74-74: Makechrono-tzoptional behind thedatetime_expressionsfeature for consistency.The
chrono-tzdependency is currently required unconditionally, but it is only used indatafusion/functions/src/datetime/common.rsfor timezone parsing. Following the pattern used for other optional functionality (e.g.,uuidbehindstring_expressions,regexbehindregex_expressions),chrono-tzshould be made optional:-chrono-tz = "0.10.4" +chrono-tz = { version = "0.10.4", optional = true }Then add it to the
datetime_expressionsfeature:# enable datetime functions -datetime_expressions = [] +datetime_expressions = ["chrono-tz"]This reduces compile times for users who don't need datetime functions while maintaining consistency with the feature flag structure.
datafusion/functions/src/datetime/to_date.rs (1)
85-111: Clarify Date32 parse/cast error messagesThe new range checking is good, but both error paths use the message
"Unable to cast to Date32 for converting from i64 to i32 failed", even in the single‑argument branch where no i64→i32 cast is involved. Consider tightening the wording (e.g. distinguish “failed to parse date string” vs “out of Date32 range during i64→i32 cast”) so downstream users can tell parse failures from overflow.datafusion/sqllogictest/test_files/to_timestamp_timezone.slt (1)
64-83: Excellent coverage; fix small comment typo for clarityThe timezone sqllogictests are thorough and nicely exercise naive vs TZ‑bearing inputs and session time zone changes. One tiny nit: the comment for Test 7 says the value “should become
2020-09-08T13:42:29Z-04:00”, but the actual expected output (and format) is2020-09-08T13:42:29-04:00. Dropping the strayZin the comment would avoid confusion when reading the test.datafusion/functions/src/datetime/to_timestamp.rs (2)
287-343: Consider caching parsedTzalongside the string timezone
to_timestamp_implparsesself.timezoneinto aTz(tz.parse::<Tz>()?) on every invocation and then captures it in the closures used byhandle/handle_multiple. Given that aScalarUDFImplinstance is tied to a specificConfigOptions, the timezone doesn’t change for the lifetime of that instance.To avoid repeated parsing overhead, you could store both the original string and a parsed
Option<Tz>in the structs (populated innew_with_config) and pass the cachedOption<Tz>into*_with_timezonehelpers. That would make parsing cost proportional to config changes rather than query invocations.Also applies to: 374-401, 403-430, 783-818
432-438: Update inline comment to mention timezone-aware return typeThe doc comment above
impl ScalarUDFImpl for ToTimestampFuncstill saysto_timestampreturnsTimestamp(Nanosecond), butreturn_typenow returnsTimestamp(Nanosecond, self.timezone.clone()). To avoid confusion for readers, consider updating this comment to reflect the timezone-aware return type (and perhaps just reference theuser_docdescription to avoid duplication).datafusion/functions/src/datetime/common.rs (4)
51-62: Consider caching the UTC timezone constant.Parsing
"UTC"to aTzon every call whentimezoneisNoneis inefficient. Thechrono_tzcrate provideschrono_tz::UTCas a constant.pub(crate) fn string_to_timestamp_nanos_with_timezone( timezone: &Option<Tz>, s: &str, ) -> Result<i64> { - let tz = timezone.unwrap_or("UTC".parse()?); + let tz = timezone.unwrap_or(chrono_tz::UTC); let dt = string_to_datetime(&tz, s)?;
215-227: Same UTC parsing optimization applies here.Line 221 also parses
"UTC"on every call. Apply the samechrono_tz::UTCconstant pattern suggested above.pub(crate) fn string_to_timestamp_nanos_formatted_with_timezone( timezone: &Option<Tz>, s: &str, format: &str, ) -> Result<i64, DataFusionError> { let dt = - string_to_datetime_formatted(&timezone.unwrap_or("UTC".parse()?), s, format)?; + string_to_datetime_formatted(&timezone.unwrap_or(chrono_tz::UTC), s, format)?;
253-286: Unused type parameterTinScalarDataType<T>.The generic parameter
T: PrimIntwithPhantomData<T>is declared but never used in the struct's implementation. Thescalarmethod operates solely onOption<i64>. IfTisn't needed for type-level invariants or future expansion, consider removing it to simplify the API.If
Tis intentionally reserved for future use, add a doc comment explaining the purpose.
45-49: Consider documenting why#[expect(unused)]functions are retained.
string_to_timestamp_nanos_shimandstring_to_timestamp_nanos_formattedare marked as intentionally unused. A brief doc comment explaining whether these are deprecated, reserved for future use, or kept for API stability would improve maintainability.Also applies to: 203-213
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (11)
datafusion/functions/Cargo.toml(1 hunks)datafusion/functions/benches/to_timestamp.rs(9 hunks)datafusion/functions/src/datetime/common.rs(9 hunks)datafusion/functions/src/datetime/mod.rs(4 hunks)datafusion/functions/src/datetime/to_date.rs(3 hunks)datafusion/functions/src/datetime/to_timestamp.rs(31 hunks)datafusion/functions/src/datetime/to_unixtime.rs(1 hunks)datafusion/functions/src/macros.rs(2 hunks)datafusion/sqllogictest/test_files/timestamps.slt(5 hunks)datafusion/sqllogictest/test_files/to_timestamp_timezone.slt(1 hunks)docs/source/user-guide/sql/scalar_functions.md(10 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-06T13:37:46.028Z
Learnt from: martin-augment
Repo: martin-augment/datafusion PR: 5
File: datafusion/core/tests/user_defined/user_defined_scalar_functions.rs:997-1005
Timestamp: 2025-11-06T13:37:46.028Z
Learning: In datafusion/core/tests/user_defined/user_defined_scalar_functions.rs, when implementing ScalarFunctionWrapper::replacement, always bounds-check placeholder_position against defaults.len() before indexing into the defaults array to prevent panics when SQL UDF bodies reference placeholders like $N where N exceeds the declared parameter count.
Applied to files:
datafusion/functions/benches/to_timestamp.rsdatafusion/functions/src/datetime/to_timestamp.rs
🧬 Code graph analysis (6)
datafusion/functions/benches/to_timestamp.rs (1)
datafusion/functions/src/datetime/to_timestamp.rs (1)
to_timestamp(838-841)
datafusion/functions/src/datetime/to_unixtime.rs (1)
datafusion/functions/src/datetime/to_timestamp.rs (5)
new_with_config(304-313)new_with_config(333-342)new_with_config(362-371)new_with_config(391-400)new_with_config(420-429)
datafusion/functions/src/datetime/mod.rs (1)
datafusion/functions/src/datetime/now.rs (1)
default(50-52)
datafusion/functions/src/macros.rs (3)
datafusion/core/src/execution/session_state.rs (3)
config(235-237)config(753-755)config(1618-1620)datafusion/expr/src/udf.rs (1)
call(151-156)datafusion/catalog/src/table.rs (1)
call(458-458)
datafusion/functions/src/datetime/common.rs (3)
datafusion/functions/src/datetime/to_timestamp.rs (6)
tz(796-796)new(300-302)new(329-331)new(358-360)new(387-389)new(416-418)datafusion/core/src/execution/context/mod.rs (1)
err(2308-2308)datafusion/functions-aggregate-common/src/aggregate/groups_accumulator/prim_op.rs (1)
new(67-75)
datafusion/functions/src/datetime/to_timestamp.rs (1)
datafusion/functions/src/datetime/common.rs (2)
string_to_timestamp_nanos_with_timezone(51-62)string_to_timestamp_nanos_formatted_with_timezone(215-227)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Cursor Bugbot
- GitHub Check: Analyze (rust)
🔇 Additional comments (11)
datafusion/sqllogictest/test_files/timestamps.slt (2)
196-197: LGTM - Good test isolation practice.Adding
RESET datafusion.execution.time_zoneafter timezone-specific tests ensures proper test isolation and prevents timezone settings from leaking into subsequent tests.
2700-2702: Verify the expected output format change is intentional.The expected output changed from including timezone info (like
Timestamp(ns, "+00")) to justTimestamp(ns). This aligns with the PR's objective to respect execution timezone, but please confirm this is the intended behavior for::timestamptzcasts.datafusion/functions/src/macros.rs (2)
44-54: LGTM - Clear documentation of configuration construction paths.The added documentation clearly explains the dual construction paths (convenience wrappers with default config vs. session registration with actual config). This will help maintainers understand the design intent.
73-89: LGTM - Macro extensions follow existing patterns.The new config-enabled variants properly mirror the existing non-config patterns:
@config $arg,for vector arguments (parallel to line 92)@config $($arg)*for variadic arguments (parallel to line 100)Both correctly use
ConfigOptions::default()for convenience wrappers.datafusion/functions/benches/to_timestamp.rs (2)
117-118: LGTM - Config-aware UDF initialization.The benchmark correctly initializes the UDF with config options, aligning with the new config-aware construction pattern introduced in this PR.
120-138: Arc clone placement is correct for benchmarking.Cloning the
Arcoutside theb.iter()closure (line 121) is the right approach. This ensures the clone cost isn't included in the measured benchmark time, which should focus only on theinvoke_with_argsexecution.datafusion/functions/src/datetime/mod.rs (1)
52-59: Config-aware registration forto_timestamp*looks consistentWiring the to_timestamp* UDFs through
make_udf_function_with_config!and@configmirrors the existingnowpattern and ensures sessionConfigOptions(includingexecution.time_zone) are consistently propagated. Thefunctions()helper constructing a defaultConfigOptionsand passing it into these builders also looks correct.Also applies to: 111-130, 262-284
datafusion/functions/src/datetime/to_timestamp.rs (1)
879-911: Fix small test issues in UDF coverage for timezone behaviorThe new tests around execution time zone handling are strong and exercise both direct calls and formatted parsing across all time units, but there are a couple of minor issues:
- In
udfs_and_timeunit()and intest_no_tz,ToTimestampSecondsFuncis included twice andToTimestampMicrosFuncis omitted. This means some assertions (e.g. return type / TZ behavior) don’t cover the micros UDF.- In
test_no_tz, the behavior being asserted (that return types and arrays areTimestamp(_, None)whenConfigOptions::default()is used) is correct, but once you addToTimestampMicrosFuncyou should ensure the expectations include it as well.Adding
ToTimestampMicrosFunc::new_with_config(&ConfigOptions::default())to the relevantudfsvectors (and removing the duplicate seconds entry) would fully align coverage with the five implemented ToTimestamp variants.- fn udfs_and_timeunit() -> Vec<(Box<dyn ScalarUDFImpl>, TimeUnit)> { - let udfs: Vec<(Box<dyn ScalarUDFImpl>, TimeUnit)> = vec![ + fn udfs_and_timeunit() -> Vec<(Box<dyn ScalarUDFImpl>, TimeUnit)> { + let udfs: Vec<(Box<dyn ScalarUDFImpl>, TimeUnit)> = vec![ ( Box::new(ToTimestampFunc::new_with_config(&ConfigOptions::default())), Nanosecond, ), ( Box::new(ToTimestampSecondsFunc::new_with_config( &ConfigOptions::default(), )), Second, ), ( Box::new(ToTimestampMillisFunc::new_with_config( &ConfigOptions::default(), )), Millisecond, ), ( - Box::new(ToTimestampMicrosFunc::new_with_config( - &ConfigOptions::default(), - )), - Microsecond, - ), - ( - Box::new(ToTimestampNanosFunc::new_with_config( - &ConfigOptions::default(), - )), - Nanosecond, - ), - ]; + Box::new(ToTimestampMicrosFunc::new_with_config( + &ConfigOptions::default(), + )), + Microsecond, + ), + ( + Box::new(ToTimestampNanosFunc::new_with_config( + &ConfigOptions::default(), + )), + Nanosecond, + ), + ]; udfs }And similarly, update the
udfsvector intest_no_tzto includeToTimestampMicrosFuncinstead of a secondToTimestampSecondsFunc.Also applies to: 1005-1113, 1115-1258, 1260-1365, 1606-1721
⛔ Skipped due to learnings
Learnt from: martin-augment Repo: martin-augment/datafusion PR: 5 File: datafusion/core/tests/user_defined/user_defined_scalar_functions.rs:997-1005 Timestamp: 2025-11-06T13:37:46.028Z Learning: In datafusion/core/tests/user_defined/user_defined_scalar_functions.rs, when implementing ScalarFunctionWrapper::replacement, always bounds-check placeholder_position against defaults.len() before indexing into the defaults array to prevent panics when SQL UDF bodies reference placeholders like $N where N exceeds the declared parameter count.datafusion/functions/src/datetime/common.rs (3)
18-40: Imports align with new timezone-aware functionality.The added imports (
PhantomData,Tz,TimeUnit,PrimInt,ToPrimitive,internal_datafusion_err) are appropriate for the newScalarDataType<T>generic type and timezone-aware parsing helpers.
288-334: Handle function correctly integratesScalarDataType<T>.The scalar path now uses
sdt.scalar()for result construction, with properto_i64()conversion. The conversion fromO::Nativetoi64is safe for the supported timestamp and date types.
339-422: Handle_multiple function correctly mirrors thehandlerefactoring.The scalar path at lines 404-407 uses
sdt.scalar()consistently with thehandlefunction, centralizing result construction.
| let tz: Option<chrono_tz::Tz> = if format.ends_with(" %Z") { | ||
| // grab the string after the last space as the named timezone | ||
| let parts: Vec<&str> = datetime_str.rsplitn(2, ' ').collect(); | ||
| let timezone_name = parts[0]; | ||
| datetime_str = parts[1]; | ||
|
|
||
| // attempt to parse the timezone name | ||
| let result: Result<chrono_tz::Tz, chrono_tz::ParseError> = timezone_name.parse(); | ||
| let Ok(tz) = result else { | ||
| return Err(err(&result.unwrap_err().to_string())); | ||
| }; | ||
|
|
||
| // successfully parsed the timezone name, remove the ' %Z' from the format | ||
| format = format.trim_end_matches(" %Z"); | ||
|
|
||
| Some(tz) |
There was a problem hiding this comment.
Potential panic if input string lacks a space before the timezone.
If datetime_str contains no space (e.g., "UTC" with format " %Z"), rsplitn(2, ' ') returns a single element, and accessing parts[1] at line 121 will panic with an index out-of-bounds error.
Add a bounds check or return an error for malformed input:
let tz: Option<chrono_tz::Tz> = if format.ends_with(" %Z") {
// grab the string after the last space as the named timezone
let parts: Vec<&str> = datetime_str.rsplitn(2, ' ').collect();
+ if parts.len() < 2 {
+ return Err(err("expected a space before the timezone name"));
+ }
let timezone_name = parts[0];
datetime_str = parts[1];📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| let tz: Option<chrono_tz::Tz> = if format.ends_with(" %Z") { | |
| // grab the string after the last space as the named timezone | |
| let parts: Vec<&str> = datetime_str.rsplitn(2, ' ').collect(); | |
| let timezone_name = parts[0]; | |
| datetime_str = parts[1]; | |
| // attempt to parse the timezone name | |
| let result: Result<chrono_tz::Tz, chrono_tz::ParseError> = timezone_name.parse(); | |
| let Ok(tz) = result else { | |
| return Err(err(&result.unwrap_err().to_string())); | |
| }; | |
| // successfully parsed the timezone name, remove the ' %Z' from the format | |
| format = format.trim_end_matches(" %Z"); | |
| Some(tz) | |
| let tz: Option<chrono_tz::Tz> = if format.ends_with(" %Z") { | |
| // grab the string after the last space as the named timezone | |
| let parts: Vec<&str> = datetime_str.rsplitn(2, ' ').collect(); | |
| if parts.len() < 2 { | |
| return Err(err("expected a space before the timezone name")); | |
| } | |
| let timezone_name = parts[0]; | |
| datetime_str = parts[1]; | |
| // attempt to parse the timezone name | |
| let result: Result<chrono_tz::Tz, chrono_tz::ParseError> = timezone_name.parse(); | |
| let Ok(tz) = result else { | |
| return Err(err(&result.unwrap_err().to_string())); | |
| }; | |
| // successfully parsed the timezone name, remove the ' %Z' from the format | |
| format = format.trim_end_matches(" %Z"); | |
| Some(tz) |
🤖 Prompt for AI Agents
In datafusion/functions/src/datetime/common.rs around lines 117 to 132, the code
assumes rsplitn(2, ' ') yields two parts and directly indexes parts[1], which
will panic if datetime_str contains no space; add a bounds check on parts.len()
(or use pattern matching) and return an Err for malformed input when there is no
space-separated timezone, e.g., detect when parts.len() < 2 and call the
existing err(...) with a clear message instead of indexing, otherwise proceed to
parse timezone_name and adjust format as before.
There was a problem hiding this comment.
value:good-to-have; category:bug; feedback:The CodeRabbit AI reviewer is correct! The allocation of the Vector data structure could be avoided by using rsplit_once(' ') instead. This could save some CPU cycles for allocating/deallocating the temporary Vec.
| #[user_doc( | ||
| doc_section(label = "Time and Date Functions"), | ||
| description = r#" | ||
| Converts a value to a timestamp (`YYYY-MM-DDT00:00:00Z`). Supports strings, integer, unsigned integer, and double types as input. Strings are parsed as RFC3339 (e.g. '2023-07-20T05:44:00') if no [Chrono formats] are provided. Integers, unsigned integers, and doubles are interpreted as seconds since the unix epoch (`1970-01-01T00:00:00Z`). Returns the corresponding timestamp. | ||
|
|
||
| Note: `to_timestamp` returns `Timestamp(ns)`. The supported range for integer input is between `-9223372037` and `9223372036`. Supported range for string input is between `1677-09-21T00:12:44.0` and `2262-04-11T23:47:16.0`. Please use `to_timestamp_seconds` for the input outside of supported bounds. | ||
| Converts a value to a timestamp (`YYYY-MM-DDT00:00:00.000000<TZ>`) in the session time zone. Supports strings, | ||
| integer, unsigned integer, and double types as input. Strings are parsed as RFC3339 (e.g. '2023-07-20T05:44:00') | ||
| if no [Chrono formats] are provided. Strings that parse without a time zone are treated as if they are in the | ||
| session time zone, or UTC if no session time zone is set. | ||
| Integers, unsigned integers, and doubles are interpreted as seconds since the unix epoch (`1970-01-01T00:00:00Z`). | ||
|
|
||
| Note: `to_timestamp` returns `Timestamp(ns, TimeZone)` where the time zone is the session time zone. The supported range | ||
| for integer input is between`-9223372037` and `9223372036`. Supported range for string input is between | ||
| `1677-09-21T00:12:44.0` and `2262-04-11T23:47:16.0`. Please use `to_timestamp_seconds` | ||
| for the input outside of supported bounds. | ||
|
|
||
| The session time zone can be set using the statement `SET TIMEZONE = 'desired time zone'`. | ||
| The time zone can be a value like +00:00, 'Europe/London' etc. | ||
| "#, |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Search for both SET TIMEZONE and SET datafusion.execution.time_zone patterns
echo "=== Searching for SET TIMEZONE patterns ==="
rg -n "SET\s+TIMEZONE" datafusion/ -S
echo -e "\n=== Searching for datafusion.execution.time_zone patterns ==="
rg -n "datafusion\.execution\.time_zone" datafusion/ -S
echo -e "\n=== Searching for SET.*time.*zone (case-insensitive) ==="
rg -n "SET.*time.*zone" datafusion/ -i | head -30Repository: martin-augment/datafusion
Length of output: 8594
🏁 Script executed:
#!/bin/bash
# Check the actual file to see the documentation wording and format examples
echo "=== Checking to_timestamp.rs file structure and doc comments ==="
head -300 datafusion/functions/src/datetime/to_timestamp.rs | cat -nRepository: martin-augment/datafusion
Length of output: 19194
🏁 Script executed:
#!/bin/bash
# Look for the doc comment issue about Timestamp(Nanosecond) without TZ
echo "=== Searching for impl ScalarUDFImpl for ToTimestampFunc ==="
rg -A 10 "impl ScalarUDFImpl for ToTimestampFunc" datafusion/functions/src/datetime/to_timestamp.rsRepository: martin-augment/datafusion
Length of output: 348
🏁 Script executed:
#!/bin/bash
# Check test_no_tz for the coverage issue (ToTimestampSecondsFunc twice, missing ToTimestampMicrosFunc)
echo "=== Searching for test_no_tz ==="
rg -A 30 "fn test_no_tz" datafusion/functions/src/datetime/to_timestamp.rsRepository: martin-augment/datafusion
Length of output: 1465
🏁 Script executed:
#!/bin/bash
# Get complete test_no_tz function to verify ToTimestampMicrosFunc presence
rg -A 50 "fn test_no_tz" datafusion/functions/src/datetime/to_timestamp.rs | head -80Repository: martin-augment/datafusion
Length of output: 2375
🏁 Script executed:
#!/bin/bash
# Search for the impl ScalarUDFImpl doc and return_type
rg -B 5 "impl ScalarUDFImpl for ToTimestampFunc" datafusion/functions/src/datetime/to_timestamp.rs | head -20Repository: martin-augment/datafusion
Length of output: 490
🏁 Script executed:
#!/bin/bash
# Search for return_type method implementations to see what they return
rg -A 3 "fn return_type" datafusion/functions/src/datetime/to_timestamp.rs | head -50Repository: martin-augment/datafusion
Length of output: 763
Align timezone configuration syntax and fix format example inconsistencies in user_doc
Two documentation issues and one test issue warrant attention:
-
Format example confusion (line 242, to_timestamp_nanos): The pattern
YYYY-MM-DDT00:00:00.000000000Z<TZ>mixes a literalZwith a<TZ>placeholder. This is inconsistent with the other timestamp functions (lines 41, 95, 144, 193), which use only<TZ>. Change line 242 toYYYY-MM-DDT00:00:00.000000000<TZ>for consistency. -
Documentation syntax inconsistency: Lines 52, 101, 150, 199, and 247 document
SET TIMEZONE = 'desired time zone', but other datetime functions (e.g.,current_time.rs,current_date.rs) documentSET datafusion.execution.time_zone = .... Both syntaxes are supported and work as aliases in the codebase. Consider standardizing on the canonical config syntax (datafusion.execution.time_zone) across all datetime function docs for consistency with system documentation. -
Test coverage bug in
test_no_tz: The udfs vector includesToTimestampSecondsFunctwice but omitsToTimestampMicrosFunc. The test creates amicros_timestampsarray and includes it in the arrays loop, but no corresponding UDF instance tests it. AddBox::new(ToTimestampMicrosFunc::new_with_config(&ConfigOptions::default()))and remove the duplicateToTimestampSecondsFunc.
| Converts a value to a timestamp (`YYYY-MM-DDT00:00:00.000000<TZ>`) in the session time zone. Supports strings, | ||
| integer, unsigned integer, and double types as input. Strings are parsed as RFC3339 (e.g. '2023-07-20T05:44:00') | ||
| if no [Chrono formats] are provided. Strings that parse without a time zone are treated as if they are in the | ||
| session time zone, or UTC if no session time zone is set. | ||
| Integers, unsigned integers, and doubles are interpreted as seconds since the unix epoch (`1970-01-01T00:00:00Z`). | ||
|
|
||
| Note: `to_timestamp` returns `Timestamp(ns)`. The supported range for integer input is between `-9223372037` and `9223372036`. Supported range for string input is between `1677-09-21T00:12:44.0` and `2262-04-11T23:47:16.0`. Please use `to_timestamp_seconds` for the input outside of supported bounds. | ||
| Note: `to_timestamp` returns `Timestamp(ns, TimeZone)` where the time zone is the session time zone. The supported range | ||
| for integer input is between`-9223372037` and `9223372036`. Supported range for string input is between | ||
| `1677-09-21T00:12:44.0` and `2262-04-11T23:47:16.0`. Please use `to_timestamp_seconds` | ||
| for the input outside of supported bounds. |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Search for SET TIMEZONE syntax patterns in the codebase
echo "=== Searching for SET TIMEZONE patterns ==="
rg "SET\s+TIMEZONE" -S -B 1 -A 1 2>/dev/null | head -40
echo -e "\n=== Searching for datafusion.execution.time_zone ==="
rg "datafusion\.execution\.time_zone|execution\.time_zone" -S -B 1 -A 1 2>/dev/null | head -40
echo -e "\n=== Checking format patterns in scalar_functions.md ==="
sed -n '2778,2787p' docs/source/user-guide/sql/scalar_functions.md
echo "---"
sed -n '2869,2877p' docs/source/user-guide/sql/scalar_functions.md
echo "---"
rg "current_time|current_timestamp" docs/source/user-guide/sql/scalar_functions.md -A 10 | head -50Repository: martin-augment/datafusion
Length of output: 9658
🏁 Script executed:
#!/bin/bash
# Find where this markdown file is generated from (look for Rust source)
echo "=== Looking for user_doc attributes in Rust code ==="
find . -name "*.rs" -type f -exec rg -l "user_doc.*to_timestamp" {} \; 2>/dev/null | head -10
echo -e "\n=== Check if md file has generation comment ==="
head -20 docs/source/user-guide/sql/scalar_functions.md
echo -e "\n=== Search in Rust files for SET TIMEZONE syntax ==="
rg "SET.*TIMEZONE|execution.*time_zone" --type rs -B 2 -A 2 2>/dev/null | head -60Repository: martin-augment/datafusion
Length of output: 1024
🏁 Script executed:
#!/bin/bash
# Verify the exact format pattern used - check for Z<TZ> vs just <TZ>
echo "=== Check to_timestamp_nanos pattern at lines 2869-2877 ==="
sed -n '2869,2877p' docs/source/user-guide/sql/scalar_functions.md
echo -e "\n=== Check to_timestamp_micros ==="
sed -n '2826,2834p' docs/source/user-guide/sql/scalar_functions.md
echo -e "\n=== Check to_timestamp_millis ==="
sed -n '2912,2919p' docs/source/user-guide/sql/scalar_functions.md
echo -e "\n=== Check to_timestamp_seconds ==="
sed -n '2954,2962p' docs/source/user-guide/sql/scalar_functions.md
echo -e "\n=== Look for actual Z<TZ> pattern in markdown ==="
rg "000.*Z<TZ>|Z<TZ>" docs/source/user-guide/sql/scalar_functions.mdRepository: martin-augment/datafusion
Length of output: 3021
🏁 Script executed:
#!/bin/bash
# Verify what the Rust source says about format
echo "=== Checking Rust source for to_timestamp functions ==="
rg -A 5 "user_doc.*to_timestamp" datafusion/functions/src/datetime/to_timestamp.rs | head -100Repository: martin-augment/datafusion
Length of output: 124
🏁 Script executed:
#!/bin/bash
# Find the to_timestamp.rs file with more aggressive search
echo "=== Finding datetime files ==="
find . -name "*timestamp*" -type f | grep -v ".git" | grep -v node_modules
echo -e "\n=== Search in datetime directory ==="
ls -la datafusion/functions/src/datetime/ 2>/dev/null || echo "Path not found"
echo -e "\n=== Search for user_doc with to_timestamp ==="
rg -l "to_timestamp" --type rs | head -10Repository: martin-augment/datafusion
Length of output: 2932
🏁 Script executed:
#!/bin/bash
# Get content of to_timestamp.rs to see user_doc attributes
echo "=== Content of to_timestamp.rs (first 200 lines) ==="
head -200 datafusion/functions/src/datetime/to_timestamp.rs 2>/dev/null | cat -n
echo -e "\n=== Look for Z<TZ> pattern in Rust source ==="
rg "Z<TZ>" --type rs -B 2 -A 2Repository: martin-augment/datafusion
Length of output: 12948
🏁 Script executed:
#!/bin/bash
# Search for more to_timestamp_millis and to_timestamp_nanos definitions in Rust
echo "=== Check to_timestamp_millis in Rust (around line 141-189) ==="
sed -n '141,189p' datafusion/functions/src/datetime/to_timestamp.rs
echo -e "\n=== Check to_timestamp_nanos in Rust ==="
rg -n "to_timestamp_nanos" datafusion/functions/src/datetime/to_timestamp.rs -A 15 | head -50
echo -e "\n=== Check if there's a build script that generates the markdown ==="
find . -name "build.rs" -o -name "*generate*" -o -name "*doc*" | grep -E "(build|generate|doc)" | head -20Repository: martin-augment/datafusion
Length of output: 6497
🏁 Script executed:
#!/bin/bash
# Find to_timestamp_nanos description in Rust
echo "=== Search for to_timestamp_nanos description (user_doc) ==="
rg -B 15 "to_timestamp_nanos.*expression" datafusion/functions/src/datetime/to_timestamp.rs | grep -A 15 "user_doc" | head -30
echo -e "\n=== Get lines around 230-250 ==="
sed -n '230,250p' datafusion/functions/src/datetime/to_timestamp.rs
echo -e "\n=== Look at dev/update_function_docs.sh ==="
cat dev/update_function_docs.shRepository: martin-augment/datafusion
Length of output: 15745
Fix timestamp format pattern in to_timestamp_nanos
The format pattern for to_timestamp_nanos in the Rust source contains an erroneous Z<TZ> combination. The pattern should be either a literal Z (for UTC) or the <TZ> placeholder alone, not both. Update the user_doc description in datafusion/functions/src/datetime/to_timestamp.rs to use YYYY-MM-DDT00:00:00.000000000<TZ> instead of YYYY-MM-DDT00:00:00.000000000Z<TZ>, then regenerate the documentation using dev/update_function_docs.sh.
Also applies to: 2826-2834, 2869-2877, 2912-2919, 2954-2962 (note: other functions have correct patterns; only to_timestamp_nanos requires fixes).
🤖 Prompt for AI Agents
In docs/source/user-guide/sql/scalar_functions.md around lines 2778 to 2787, the
documented timestamp pattern for to_timestamp_nanos incorrectly shows a combined
literal Z and timezone placeholder (`Z<TZ>`); update the Rust source user_doc in
datafusion/functions/src/datetime/to_timestamp.rs to use the corrected pattern
YYYY-MM-DDT00:00:00.000000000<TZ> (remove the extra literal Z), then run
dev/update_function_docs.sh to regenerate the docs; repeat the same fix for the
other occurrences noted (lines ~2826-2834, 2869-2877, 2912-2919, 2954-2962) so
the generated documentation no longer contains `Z<TZ>`.
value:useful; category:bug; feedback:The Claude AI reviewer is correct! There is no need to re-parse "UTC" as a Tz (timezone). There is a chrono_tz::UTC constant that could be reused instead. |
value:useful; category:bug; feedback:The CodeRabbit AI reviewer is correct! There is no need to re-parse "UTC" as a Tz (timezone). There is a chrono_tz::UTC constant that could be reused instead. |
value:useful; category:bug; feedback:The CodeRabbit AI reviewer is correct! There is no need to re-parse "UTC" as a Tz (timezone). There is a chrono_tz::UTC constant that could be reused instead. |
value:good-to-have; category:bug; feedback:The Claude AI reviewer is correct! The allocation of the Vector data structure could be avoided by using |
value:good-to-have; category:typo; feedback:The Claude AI reviewer is correct! There is a typo in the comment that looks like it is negating the rest of the sentence. |
value:good-to-have; category:bug; feedback:The Claude AI reviewer is correct! Instead of annotating those functions as unused it would be better to directly remove them since they are not part of the public APIs. Prevents adding technical dept. |
value:good-to-have; category:bug; feedback:The Claude AI reviewer is correct! By using string slicing the new String heap allocation could be avoided. Prevents allocation of a small String for a short period of time. |
value:useful; category:bug; feedback:The CodeRabbit AI reviewer is correct! The pattern match arm should be extended to support LargeUtf8 and Utf8View for consistency with the rest of text-like functionality. |
value:good-to-have; category:bug; feedback:The CodeRabbit AI reviewer is correct! The new dependency should be gated behind the "datetime_expressions" feature. Prevents bloating the library/application with extra dependency when it is not really used. |
value:good-to-have; category:typo; feedback:The CodeRabbit AI reviewer is correct! The example in the comments uses an extra 'Z' that is invalid in this position and it should be removed. |
19078: To review by AI