-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-3321: [C++] Improve integer parsing performance #2619
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Before: --------------------------------------------------------------------- Benchmark Time CPU Iterations --------------------------------------------------------------------- BM_IntegerParsing<Int8Type> 1388 ns 1387 ns 502861 5.49911M items/s BM_IntegerParsing<Int16Type> 1475 ns 1475 ns 468724 5.17179M items/s BM_IntegerParsing<Int32Type> 1730 ns 1729 ns 405693 4.41194M items/s BM_IntegerParsing<Int64Type> 2131 ns 2131 ns 328192 3.58034M items/s BM_IntegerParsing<UInt8Type> 1238 ns 1238 ns 572573 6.16483M items/s BM_IntegerParsing<UInt16Type> 1302 ns 1301 ns 537960 5.86206M items/s BM_IntegerParsing<UInt32Type> 1391 ns 1391 ns 502859 5.4857M items/s BM_IntegerParsing<UInt64Type> 1637 ns 1637 ns 427832 4.661M items/s BM_FloatParsing<FloatType> 4437 ns 4436 ns 156887 1.71973M items/s BM_FloatParsing<DoubleType> 4593 ns 4592 ns 152459 1.66129M items/s After: --------------------------------------------------------------------- Benchmark Time CPU Iterations --------------------------------------------------------------------- BM_IntegerParsing<Int8Type> 23 ns 23 ns 29800687 324.788M items/s BM_IntegerParsing<Int16Type> 27 ns 27 ns 26593165 287.438M items/s BM_IntegerParsing<Int32Type> 34 ns 34 ns 20689813 226.211M items/s BM_IntegerParsing<Int64Type> 49 ns 49 ns 14256379 155.424M items/s BM_IntegerParsing<UInt8Type> 17 ns 17 ns 42295211 454.911M items/s BM_IntegerParsing<UInt16Type> 16 ns 16 ns 42663172 464.397M items/s BM_IntegerParsing<UInt32Type> 21 ns 21 ns 33372432 363.209M items/s BM_IntegerParsing<UInt64Type> 33 ns 33 ns 21502295 234.255M items/s BM_FloatParsing<FloatType> 4554 ns 4553 ns 153207 1.67565M items/s BM_FloatParsing<DoubleType> 4579 ns 4578 ns 152304 1.66651M items/s
fa5e2be to
9834b28
Compare
Codecov Report
@@ Coverage Diff @@
## master #2619 +/- ##
==========================================
+ Coverage 87.15% 88.22% +1.06%
==========================================
Files 380 318 -62
Lines 59037 55408 -3629
==========================================
- Hits 51456 48883 -2573
+ Misses 7507 6525 -982
+ Partials 74 0 -74Continue to review full report at Codecov.
|
|
That's an impressive speedup! |
wesm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, very nice! If you ever want to write a blog post illustrating your work on this (how does it compare with parsers in other performance-sensitive projects, like RapidJSON?), please go ahead
| #define PARSE_UNSIGNED_ITERATION_LAST(C_TYPE) \ | ||
| if (length > 0) { \ | ||
| if (ARROW_PREDICT_FALSE(result > std::numeric_limits<C_TYPE>::max() / 10U)) { \ | ||
| /* Overflow */ \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At some point do you think we might want to return a "reason" as an out argument when the parser returns false? Could be follow up work
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure, because if we're calling out to an external parser (as we'll keep doing for floating-point parsing), it may not give us a reason.
Before:
After: