-
Notifications
You must be signed in to change notification settings - Fork 847
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clean up string casts and improve performance #2284
Conversation
@@ -1247,19 +1251,6 @@ const fn time_unit_multiple(unit: &TimeUnit) -> i64 { | |||
} | |||
} | |||
|
|||
/// Number of seconds in a day |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these were already defined in temporal_conversion.rs
so use those definitions instead of duplicating them)
@@ -1463,35 +1454,28 @@ where | |||
<T as ArrowPrimitiveType>::Native: lexical_core::FromLexical, | |||
{ | |||
if cast_options.safe { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I basically applied the same transformation to all the kernels I modified -- I changed the safe
path to use iter().map(|v| v.and_then)
and I changed the not safe
path to use iter().map(|v| v.map().transpose)
if from.is_null(i) { | ||
Ok(None) | ||
} else { | ||
let string = from.value(i); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know why this code was creating an owned String
for each element, so I removed it. The same pattern was repeated in all the other kernels I modified in this PR
5f14220
to
68df4fa
Compare
} else { | ||
let string = string_array | ||
.value(i); | ||
chrono::Duration::days(3); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems to be an unused random chrono::Duration::days(3);
construction 🤷
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gosh, I thought that was removed from my branch, but there were multiple instances. Sorry about that!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are nice, concise improvements to the previous implementation and will provide good examples for future developers.
+ time.nanosecond() as i64 / NANOS_PER_MICRO) | ||
.map_err(|_| { | ||
ArrowError::CastError( | ||
format!("Cannot cast string '{}' to value of arrow::datatypes::types::Time64MicrosecondType type", v), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally I think Time64MicrosecondType
is enough. But just nitpicking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree -- they are heinous -- here is a proposal to fix them: #2295
As the title says, there should be performance improvement. However, I don't see it when doing benchmarks on my laptop:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's worth noting this will in most cases boil down to the same thing, unfortunately ArrayIter has the same naive null mask handling, but once we fix that this code will benefit 👍
The code is also more concise and easier to read so
Benchmark runs are scheduled for baseline = 1f9973c and contender = ec83638. ec83638 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Thanks for checking @HaoYang670 |
Draft until #2283 is mergedWhich issue does this PR close?
Closes #2285
Rationale for this change
Here are some cleanups I suggested to @stuartcarnie of while reviewing #2251
-- #2251 (comment). Turns out not only did I give a bad suggestion (🤦 ), he was following the existing pattern in this file.
Since I was already working on this, I figured I would finish up the change so that new kernels don't get the same copy/paste code.
I think this both makes the code easier to read (it is less verbose) and it should improve performance by removing bounds checks (and unnecessary
String
construction and allocation)What changes are included in this PR?
Iterator
s rather thanvalue()
andis_null
and remove severalto_string()
callstemporal_conversion.rs
Are there any user-facing changes?
No (maybe faster cast kernels)