-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize make_date (#9089) #9600
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @vojtechtoman for your contribution
This looks like good change, would you mind to write a small benchmark or test to show the difference in make_date
?
I think there is already a benchmark: cargo bench --bench make_date |
Also I think #9601 moves this code and this will cause a conflict |
1c07d91
to
11a17ff
Compare
* replace the expensive calculation of unix_days_from_ce with a constant * do not use PrimitiveArray builder for the scalar case
11a17ff
to
4c3fa93
Compare
I am seeing these results:
|
Ok(value) | ||
} | ||
|
||
fn process_date( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please have more intuitive name for the function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps something like make_date_inner
fn process_date( | |
/// Converts the year/month/day fields to a `i32` | |
/// representing the days from the unix epoch | |
/// and invokes date_consumer_fn with the value | |
fn make_date_inner( |
let Ok(m) = u32::try_from(month) else { | ||
return exec_err!("Month value '{month:?}' is out of range"); | ||
}; | ||
let Ok(d) = u32::try_from(day) else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is u32
from i32
, perhaps its better to align types?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not quite sure what you are suggesting here @comphead I think this code is moved from above
The fact that arrow uses i32
for the subfields rather than u32
is somewhat non ideal I agree, but this code I believe just follows the standard
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
u32:: because NaiveDate::from_ymd_opt(year, m, d) requires that for m & d.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks folks, my point was prob to have the method signature to receive u32
instead of i32
, the method works with dates and u32 more natural for the date imho
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree u32
would be more natural. I think the i32
is used because that is what the underlying APIs require
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @vojtechtoman for improving the PR and @Omega359 for running the bench, 15%-25% increase sounds thrilling
@vojtechtoman please address some minors
make_date migration to functions has been merged to main. |
If you are interested @vojtechtoman to_timestamp likely could use optimization as well (#9090) |
Sure, let me take a look once I am done with this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @vojtechtoman and @Omega359 and @comphead for the reviews
I think this PR could be merged as is, however I agree with @comphead that there are some small cleanups that would make it better.
Thanks again
Ok(value) | ||
} | ||
|
||
fn process_date( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps something like make_date_inner
fn process_date( | |
/// Converts the year/month/day fields to a `i32` | |
/// representing the days from the unix epoch | |
/// and invokes date_consumer_fn with the value | |
fn make_date_inner( |
let Ok(m) = u32::try_from(month) else { | ||
return exec_err!("Month value '{month:?}' is out of range"); | ||
}; | ||
let Ok(d) = u32::try_from(day) else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not quite sure what you are suggesting here @comphead I think this code is moved from above
The fact that arrow uses i32
for the subfields rather than u32
is somewhat non ideal I agree, but this code I believe just follows the standard
I took the liberty of merging this PR up from main to resolve conflicts, and implemented the suggestion in #9600 (comment) while I had it checked out |
Thanks again @vojtechtoman and @Omega359 |
Thanks everyone! |
Which issue does this PR close?
Closes #9089.
What changes are included in this PR?
This PR introduces further optimizations in
make_date
:unix_days_from_ce
with a constantPrimitiveArray
builder for the scalar caseAre these changes tested?
No new tests needed (no changes to functionality, all covered by existing tests).
Performance is tracked by existing benchmarks for
make_date
. Compared to the previous implementation, this PR shows (on my machine) about 10% improvement for the cases involving arrays and about 20% for the scalars-only case.Are there any user-facing changes?
N/A