
Optimize make_date (#9089) #9600

Merged (3 commits), Mar 17, 2024


vojtechtoman
Contributor

Which issue does this PR close?

Closes #9089.

What changes are included in this PR?

This PR introduces further optimizations in make_date:

  • replace the expensive calculation of unix_days_from_ce with a constant
  • do not use PrimitiveArray builder for the scalar case
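A dependency-free sketch of the first bullet, under stated assumptions: chrono's `NaiveDate::num_days_from_ce()` counts 0001-01-01 as day 1, which puts 1970-01-01 at day 719,163. Because that offset never changes, it can be a `const` rather than something computed at runtime. The hand-rolled `days_from_ce` below is a hypothetical stand-in for chrono's method, and `UNIX_DAYS_FROM_CE` / `days_since_epoch` are illustrative names, not necessarily what the PR uses.

```rust
/// Days from 0001-01-01 (counted as day 1, as chrono's `num_days_from_ce` does)
/// to 1970-01-01. It never changes, so it can be a compile-time constant.
const UNIX_DAYS_FROM_CE: i32 = 719_163;

fn is_leap(year: i32) -> bool {
    (year % 4 == 0 && year % 100 != 0) || year % 400 == 0
}

fn days_in_month(year: i32, month: u32) -> u32 {
    match month {
        1 | 3 | 5 | 7 | 8 | 10 | 12 => 31,
        4 | 6 | 9 | 11 => 30,
        2 if is_leap(year) => 29,
        2 => 28,
        _ => 0, // invalid month: no valid days
    }
}

/// Hypothetical stand-in for chrono's `NaiveDate::num_days_from_ce`
/// (proleptic Gregorian calendar, 0001-01-01 => 1; no overflow handling for extreme years).
fn days_from_ce(year: i32, month: u32, day: u32) -> i32 {
    let y = year - 1;
    // Days in all complete years before `year`; div_euclid keeps years <= 0 correct.
    let before_year = y * 365 + y.div_euclid(4) - y.div_euclid(100) + y.div_euclid(400);
    // Days in the complete months before `month` within `year`.
    let before_month: u32 = (1..month).map(|m| days_in_month(year, m)).sum();
    before_year + before_month as i32 + day as i32
}

/// Days since 1970-01-01, or None if the fields do not form a valid date.
fn days_since_epoch(year: i32, month: u32, day: u32) -> Option<i32> {
    if !(1..=12).contains(&month) || day == 0 || day > days_in_month(year, month) {
        return None;
    }
    // Subtract the constant instead of computing the epoch's day count at runtime.
    Some(days_from_ce(year, month, day) - UNIX_DAYS_FROM_CE)
}

fn main() {
    // The constant agrees with the formula applied to the epoch itself.
    assert_eq!(days_from_ce(1970, 1, 1), UNIX_DAYS_FROM_CE);
    println!("{:?}", days_since_epoch(2024, 3, 17)); // Some(19799)
}
```

In DataFusion the per-row work would still go through chrono; the sketch only shows why hoisting the epoch offset into a constant is safe.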

Are these changes tested?

No new tests needed (no changes to functionality, all covered by existing tests).

Performance is tracked by the existing benchmarks for make_date. Compared to the previous implementation, this PR shows (on my machine) an improvement of about 10% for the cases involving arrays and about 20% for the scalars-only case.

Are there any user-facing changes?

N/A

github-actions bot added the physical-expr (Changes to the physical-expr crates) label Mar 13, 2024
vojtechtoman mentioned this pull request Mar 13, 2024
Contributor

comphead left a comment

Thanks @vojtechtoman for your contribution.
This looks like a good change. Would you mind writing a small benchmark or test to show the difference in make_date?

@alamb
Contributor

alamb commented Mar 13, 2024

I think there is already a benchmark:

cargo bench --bench make_date

@alamb
Contributor

alamb commented Mar 13, 2024

Also, I think #9601 moves this code, which will cause a conflict.

vojtechtoman force-pushed the 9089-optimize-make-date branch from 1c07d91 to 11a17ff, March 14, 2024 07:50
vojtechtoman force-pushed the 9089-optimize-make-date branch from 11a17ff to 4c3fa93, March 14, 2024 08:52
@Omega359
Contributor

I am seeing these results:

❯ cargo bench --bench make_date
   Compiling datafusion-functions v36.0.0 (/opt/dev/arrow-datafusion/datafusion/functions)
    Finished bench [optimized] target(s) in 1m 08s
     Running benches/make_date.rs (target/release/deps/make_date-f60eb5f26259ede9)
Gnuplot not found, using plotters backend
make_date_col_col_col_1000
                        time:   [6.7777 µs 6.8467 µs 6.9200 µs]
                        change: [-25.673% -23.503% -21.299%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

make_date_scalar_col_col_1000
                        time:   [7.1573 µs 7.3016 µs 7.4539 µs]
                        change: [-18.183% -16.227% -14.454%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

make_date_scalar_scalar_col_1000
                        time:   [7.2130 µs 7.4215 µs 7.6690 µs]
                        change: [-14.990% -12.493% -9.6101%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe

Ok(value)
}

fn process_date(
Contributor

Please use a more intuitive name for the function.

Contributor

Perhaps something like make_date_inner

Suggested change
fn process_date(
/// Converts the year/month/day fields to an `i32`
/// representing the days from the unix epoch
/// and invokes date_consumer_fn with the value
fn make_date_inner(
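The suggested doc comment describes a shape where one inner function validates the fields, computes the day count, and hands it to a caller-supplied closure. A minimal sketch of that shape follows, assuming plain `Vec<i32>` and `String` errors in place of Arrow arrays and DataFusion's `exec_err!`, and swapping chrono for Howard Hinnant's `days_from_civil` algorithm to stay dependency-free. The signature of `make_date_inner` here is hypothetical. It also shows one way the second bullet of the description could look: the scalar path captures a single value in a local variable, so no array builder is involved, while the array path pushes into a vector.

```rust
use std::convert::TryFrom;

fn is_leap(y: i32) -> bool {
    (y % 4 == 0 && y % 100 != 0) || y % 400 == 0
}

fn days_in_month(y: i32, m: u32) -> u32 {
    match m {
        1 | 3 | 5 | 7 | 8 | 10 | 12 => 31,
        4 | 6 | 9 | 11 => 30,
        2 => if is_leap(y) { 29 } else { 28 },
        _ => 0,
    }
}

/// Days since 1970-01-01 via Howard Hinnant's `days_from_civil` algorithm.
fn days_from_civil(y: i32, m: u32, d: u32) -> i32 {
    let y = if m <= 2 { y - 1 } else { y }; // March-based year
    let era = y.div_euclid(400);
    let yoe = y.rem_euclid(400); // year of era, [0, 399]
    let (m, d) = (m as i32, d as i32);
    let doy = (153 * (if m > 2 { m - 3 } else { m + 9 }) + 2) / 5 + d - 1;
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; // day of era
    era * 146_097 + doe - 719_468
}

/// Converts the year/month/day fields to an `i32` representing the days
/// from the unix epoch and invokes `date_consumer_fn` with the value.
fn make_date_inner<F: FnMut(i32)>(
    year: i32,
    month: i32,
    day: i32,
    mut date_consumer_fn: F,
) -> Result<(), String> {
    let Ok(m) = u32::try_from(month) else {
        return Err(format!("Month value '{month:?}' is out of range"));
    };
    let Ok(d) = u32::try_from(day) else {
        return Err(format!("Day value '{day:?}' is out of range"));
    };
    if !(1..=12).contains(&m) || d == 0 || d > days_in_month(year, m) {
        return Err(format!("Invalid date: {year}-{month}-{day}"));
    }
    date_consumer_fn(days_from_civil(year, m, d));
    Ok(())
}

fn main() -> Result<(), String> {
    // Scalar case: capture a single value; no array builder involved.
    let mut scalar = None;
    make_date_inner(2024, 3, 17, |v| scalar = Some(v))?;
    println!("{scalar:?}"); // Some(19799)

    // Array case: push each row's value into a vector.
    let (years, months, days) = ([1970, 2000], [1, 2], [1, 29]);
    let mut out = Vec::with_capacity(years.len());
    for i in 0..years.len() {
        make_date_inner(years[i], months[i], days[i], |v| out.push(v))?;
    }
    println!("{out:?}"); // [0, 11016]
    Ok(())
}
```

Passing a closure keeps validation and date arithmetic in one place while letting each caller decide where the result goes.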

let Ok(m) = u32::try_from(month) else {
return exec_err!("Month value '{month:?}' is out of range");
};
let Ok(d) = u32::try_from(day) else {
Contributor

There is a conversion from i32 to u32 here; perhaps it's better to align the types?

Contributor

I am not quite sure what you are suggesting here, @comphead. I think this code was moved from above.

I agree that Arrow using i32 rather than u32 for the subfields is somewhat non-ideal, but I believe this code just follows the standard.

Contributor

`u32::try_from` is used because `NaiveDate::from_ymd_opt(year, m, d)` requires `u32` for `m` and `d`.

Contributor

Thanks, folks. My point was probably to have the method signature take u32 instead of i32; the method works with dates, and u32 is more natural for dates, IMHO.

Contributor

I agree u32 would be more natural. I think i32 is used because that is what the underlying APIs require.
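To make the `u32::try_from` point concrete: the fields arrive as `i32`, while chrono's `NaiveDate::from_ymd_opt(year: i32, month: u32, day: u32)` takes `u32` for month and day. A plain `as` cast would silently wrap a negative value into a huge `u32`, whereas `try_from` turns it into an error. The helper `to_u32_field` below is hypothetical and only illustrates the conversion; it is not DataFusion's code.

```rust
use std::convert::TryFrom;

/// Hypothetical helper: convert an `i32` date field into the `u32` a date
/// constructor expects, rejecting negative values instead of wrapping them.
fn to_u32_field(name: &str, value: i32) -> Result<u32, String> {
    u32::try_from(value).map_err(|_| format!("{name} value '{value}' is out of range"))
}

fn main() {
    // `as` reinterprets the bits: -1 becomes 4294967295 (u32::MAX).
    assert_eq!(-1i32 as u32, u32::MAX);

    // `try_from` rejects the negative value instead.
    assert!(to_u32_field("Month", -1).is_err());
    assert_eq!(to_u32_field("Day", 17), Ok(17));
    println!("{:?}", to_u32_field("Month", -1));
}
```

Taking `u32` in the signature, as suggested above, would move this conversion to the caller rather than remove it, since the Arrow inputs are still `i32`.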

Contributor

comphead left a comment

Thanks @vojtechtoman for improving the PR and @Omega359 for running the benchmark; a 15%-25% improvement sounds thrilling.

@vojtechtoman, please address the minor comments.

@Omega359
Contributor

The migration of make_date to the functions crate has been merged to main.

@Omega359
Contributor

If you are interested @vojtechtoman to_timestamp likely could use optimization as well (#9090)

@vojtechtoman
Contributor Author

vojtechtoman commented Mar 15, 2024

If you are interested @vojtechtoman to_timestamp likely could use optimization as well (#9090)

Sure, let me take a look once I am done with this.

Contributor

alamb left a comment

Thank you @vojtechtoman and @Omega359 and @comphead for the reviews

I think this PR could be merged as is; however, I agree with @comphead that there are some small cleanups that would make it better.

Thanks again

github-actions bot removed the physical-expr (Changes to the physical-expr crates) label Mar 16, 2024
@alamb
Contributor

alamb commented Mar 16, 2024

I took the liberty of merging this PR up from main to resolve conflicts, and implemented the suggestion in #9600 (comment) while I had it checked out

alamb merged commit 9f364a4 into apache:main Mar 17, 2024
23 checks passed
@alamb
Contributor

alamb commented Mar 17, 2024

Thanks again @vojtechtoman and @Omega359

@vojtechtoman
Contributor Author

Thanks everyone!


Successfully merging this pull request may close these issues.

Optimize make_date
4 participants