Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize make_date #9089

Closed
Tracked by #10171
alamb opened this issue Jan 31, 2024 · 6 comments · Fixed by #9600
Closed
Tracked by #10171

Optimize make_date #9089

alamb opened this issue Jan 31, 2024 · 6 comments · Fixed by #9600
Labels
enhancement New feature or request good first issue Good for newcomers performance Make DataFusion faster

Comments

@alamb
Copy link
Contributor

alamb commented Jan 31, 2024

Is your feature request related to a problem or challenge?

I think there is significant performance to be had when creating dates and now thanks to @Omega359 there are benchmarks

To run the benchmarks, do:

cargo bench --bench make_date

Describe the solution you'd like

Make the function faster, as defined by the command above

The high level procedure for this is to profile the function and figure out how to make it go faster while still computing the same results

Describe alternatives you've considered

It may not be possible to make this faster and in that case no worries.

Additional context

I think this is actually a pretty good initial project for someone because

  1. The code is relatively straightforward
  2. The benchmark is in place
@alamb alamb added enhancement New feature or request good first issue Good for newcomers performance Make DataFusion faster labels Jan 31, 2024
@r3stl355
Copy link
Contributor

r3stl355 commented Feb 1, 2024

A quick experiment replacing value_fn with pre-computed arrays shows about 50% improvement for benchmarks using columns but about 20% regression for scalar only benchmark (I think I should be able to get that scalar version back to baseline). If you think this is significant I can carry on working on it

@alamb
Copy link
Contributor Author

alamb commented Feb 1, 2024

Sounds pretty great to me -- thanks @r3stl355

@Omega359
Copy link
Contributor

I think with #9112 this issue could be closed @alamb ?

@alamb
Copy link
Contributor Author

alamb commented Mar 11, 2024

I think with #9112 this issue could be closed @alamb ?

I agree -- thank you for the ping

@alamb alamb closed this as completed Mar 11, 2024
vojtechtoman pushed a commit to vojtechtoman/arrow-datafusion that referenced this issue Mar 13, 2024
* replace the expensive calculation of unix_days_from_ce with a constant

* do not use PrimitiveArray builder for the scalar case
@vojtechtoman
Copy link
Contributor

@alamb I was looking at this out of interest and I just noticed that this issue has been closed in the meantime. I introduced some further improvements in #9600 which lead to about 10% performance gain for the cases involving arrays and about 20% for the scalars-only case.

@alamb
Copy link
Contributor Author

alamb commented Mar 13, 2024

Thank you @vojtechtoman -- very exciting

vojtechtoman pushed a commit to vojtechtoman/arrow-datafusion that referenced this issue Mar 14, 2024
* replace the expensive calculation of unix_days_from_ce with a constant

* do not use PrimitiveArray builder for the scalar case
vojtechtoman pushed a commit to vojtechtoman/arrow-datafusion that referenced this issue Mar 14, 2024
* replace the expensive calculation of unix_days_from_ce with a constant

* do not use PrimitiveArray builder for the scalar case
alamb added a commit that referenced this issue Mar 17, 2024
* Optimize make_date (#9089)

* replace the expensive calculation of unix_days_from_ce with a constant

* do not use PrimitiveArray builder for the scalar case

* Improve function name

---------

Co-authored-by: Vojtech Toman <vojtech.toman@graphaware.com>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good first issue Good for newcomers performance Make DataFusion faster
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants