ARROW-3320: [C++] Improve float parsing performance #2625
Is google/double-conversion a small enough codebase to vendor, similar to what you did with Abseil (either as a submodule or simply …
I see this is in conda-forge, so it may not be a big deal.
Codecov Report
@@            Coverage Diff             @@
##           master    #2625      +/-   ##
==========================================
+ Coverage   87.19%   88.24%    +1.04%
==========================================
  Files         381      319       -62
  Lines       59223    55515     -3708
==========================================
- Hits        51642    48990     -2652
+ Misses       7507     6525      -982
+ Partials       74        0       -74
The codebase does look small. OTOH, building double-conversion from scratch (as an external project) is reasonably quick too. What do you suggest?
Let's stick with ExternalProject for now; perhaps this puts more pressure on the upstream project and the conda-forge packages to be maintained.
cpp/src/arrow/util/parsing.h
I wonder how double-conversion handles negative `int` lengths, i.e. when the input length is > 2^31.
Realistically, it probably doesn't and expects positive input.
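If the parser takes an `int` length, a caller can refuse over-long inputs up front instead of relying on the library to cope. A minimal sketch, assuming an `int`-length parsing API as discussed above; `NarrowLength` is a hypothetical helper, not part of Arrow or double-conversion:

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <optional>

// Hypothetical helper (not part of Arrow or double-conversion): narrow a
// size_t length to the int length that an int-based parsing API takes.
// Lengths above INT_MAX are rejected instead of being cast, because a plain
// static_cast<int> does not preserve them and can yield a negative value.
std::optional<int> NarrowLength(std::size_t length) {
  if (length > static_cast<std::size_t>(INT_MAX)) {
    return std::nullopt;  // caller should report a parse failure
  }
  const int narrowed = static_cast<int>(length);
  assert(narrowed >= 0);  // invariant: the result is never negative
  return narrowed;
}
```

A caller would check for `std::nullopt` and report a parse failure before calling into the parser at all.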
da82bee to e3be939
wesm left a comment:
LGTM, except we should make sure that double-conversion does not leak into the public API. I might go so far as to rename it to arrow/util/parser-internal.h.
cpp/src/arrow/util/parsing.h
arrow/util/parsing.h is currently installed. Since it now includes a third-party header, we should stop installing it.
That's a good point.
OK, the header is no longer installed.
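The PR settles this by no longer installing the header. An alternative, if such a header ever had to stay public, is the pimpl idiom: the installed header only forward-declares an implementation struct, and the third-party include lives in the `.cc` file. A minimal single-file sketch; `FloatParser`, `Impl` and `ThirdPartyConverter` are hypothetical names, and `ThirdPartyConverter` merely stands in for the double-conversion type (here it wraps `std::strtod`):

```cpp
#include <cassert>
#include <cstdlib>
#include <memory>
#include <string>

// ---- public header (would be installed) ----
// Only a forward declaration of Impl appears here, so code including this
// part never needs the third-party header.
class FloatParser {
 public:
  FloatParser();
  ~FloatParser();  // defined below, where Impl is a complete type
  // Parses the whole string; returns false if characters are left over.
  bool Parse(const std::string& s, double* out) const;

 private:
  struct Impl;
  std::unique_ptr<Impl> impl_;
};

// ---- implementation file (not installed) ----
// Stand-in for the third-party converter; the real dependency's header
// would be included only in this file.
struct ThirdPartyConverter {
  double Convert(const char* s, char** end) const { return std::strtod(s, end); }
};

struct FloatParser::Impl {
  ThirdPartyConverter converter;
};

FloatParser::FloatParser() : impl_(std::make_unique<Impl>()) {}
FloatParser::~FloatParser() = default;

bool FloatParser::Parse(const std::string& s, double* out) const {
  assert(out != nullptr);  // precondition: caller provides an output slot
  if (s.empty()) return false;
  char* end = nullptr;
  *out = impl_->converter.Convert(s.c_str(), &end);
  return end == s.c_str() + s.size();  // reject trailing garbage
}
```

The trade-off is an extra heap allocation and pointer indirection per parser object, which is why simply keeping the header internal is often the simpler choice.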
This relies on Google's double-conversion library at https://github.com/google/double-conversion

Before:
```
---------------------------------------------------------------------
Benchmark                              Time           CPU Iterations
---------------------------------------------------------------------
[...]
BM_FloatParsing<FloatType>          4621 ns       4620 ns     151802   1.65125M items/s
BM_FloatParsing<DoubleType>         4629 ns       4628 ns     150171   1.64846M items/s
```

After:
```
---------------------------------------------------------------------
Benchmark                              Time           CPU Iterations
---------------------------------------------------------------------
[...]
BM_FloatParsing<FloatType>           563 ns        563 ns    1240915   13.5616M items/s
BM_FloatParsing<DoubleType>          313 ns        313 ns    2227610   24.3934M items/s
```
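The throughput figures above correspond to roughly an 8x speedup for `float` and about 15x for `double`. A small sketch that recomputes those ratios from the items/s column; the constants are copied from the benchmark output, and `Speedup` is a hypothetical helper:

```cpp
#include <cassert>

// Ratio of throughputs (after / before); both inputs must be positive.
double Speedup(double before_items_per_s, double after_items_per_s) {
  assert(before_items_per_s > 0 && after_items_per_s > 0);
  return after_items_per_s / before_items_per_s;
}

// Throughput in millions of items per second, copied from the
// BM_FloatParsing results above.
constexpr double kFloatBefore = 1.65125, kFloatAfter = 13.5616;
constexpr double kDoubleBefore = 1.64846, kDoubleAfter = 24.3934;
```

The per-iteration times give the same picture: 4621 ns / 563 ns and 4629 ns / 313 ns.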
e3be939 to 1044598
wesm left a comment:
+1, thanks! Waiting for the build to run, then I'll merge.
Merging. The build is only failing because of the known flakiness in the Java Flight tests and a new macOS failure that has shown up in the last 24 hours.