Skip to content

Conversation

@pitrou
Copy link
Member

@pitrou pitrou commented Sep 25, 2018

This relies on Google's double-conversion library at https://github.com/google/double-conversion

Before:

---------------------------------------------------------------------
Benchmark                              Time           CPU Iterations
---------------------------------------------------------------------
[...]
BM_FloatParsing<FloatType>          4621 ns       4620 ns     151802   1.65125M items/s
BM_FloatParsing<DoubleType>         4629 ns       4628 ns     150171   1.64846M items/s

After:

---------------------------------------------------------------------
Benchmark                              Time           CPU Iterations
---------------------------------------------------------------------
[...]
BM_FloatParsing<FloatType>           563 ns        563 ns    1240915   13.5616M items/s
BM_FloatParsing<DoubleType>          313 ns        313 ns    2227610   24.3934M items/s

@wesm
Copy link
Member

wesm commented Sep 25, 2018

Is google/double-conversion a small enough codebase to vendor similar to what you did with Abseil (either submodule or simply cp -r && git add?

@wesm
Copy link
Member

wesm commented Sep 25, 2018

I see this is in conda-forge so it may not be a big deal

@codecov-io
Copy link

codecov-io commented Sep 25, 2018

Codecov Report

Merging #2625 into master will increase coverage by 1.04%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2625      +/-   ##
==========================================
+ Coverage   87.19%   88.24%   +1.04%     
==========================================
  Files         381      319      -62     
  Lines       59223    55515    -3708     
==========================================
- Hits        51642    48990    -2652     
+ Misses       7507     6525     -982     
+ Partials       74        0      -74
Impacted Files Coverage Δ
cpp/src/arrow/util/parsing.h 100% <100%> (ø) ⬆️
rust/src/record_batch.rs
go/arrow/datatype_nested.go
rust/src/util/bit_util.rs
go/arrow/math/uint64_amd64.go
go/arrow/internal/testing/tools/bool.go
go/arrow/internal/bitutil/bitutil.go
go/arrow/memory/memory_avx2_amd64.go
go/arrow/array/null.go
rust/src/lib.rs
... and 53 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b796b57...e3be939. Read the comment docs.

@pitrou
Copy link
Member Author

pitrou commented Sep 25, 2018

The codebase looks small indeed. The .git directory is 7.2MB and the source tree itself is 19M.
The conda-forge package is apparently not built for Windows, and it only provides the shared library.

OTOH, building double-conversion from scratch (as an external project) is reasonably quick too. What do you suggest?

@wesm
Copy link
Member

wesm commented Sep 25, 2018

Let's stick with ExternalProject for now; this perhaps puts more pressure for the upstream project and conda-forge packages to be maintained

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder how double-conversion handle negative ints when length > 2^31.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Realistically, it probably doesn't and expects positive input.

@pitrou pitrou force-pushed the ARROW-3320-faster-float-conversion branch from da82bee to e3be939 Compare September 27, 2018 19:19
Copy link
Member

@wesm wesm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except we should make sure that double-conversion does not leak into the public API. I might go so far as to rename as arrow/util/parser-internal.h

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

arrow/util/parsing.h is being installed, since you are including a thirdparty header, we should stop installing the header

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, the header should not be installed anymore.

This relies on Google's double-conversion library at https://github.com/google/double-conversion

Before:
```
---------------------------------------------------------------------
Benchmark                              Time           CPU Iterations
---------------------------------------------------------------------
[...]
BM_FloatParsing<FloatType>          4621 ns       4620 ns     151802   1.65125M items/s
BM_FloatParsing<DoubleType>         4629 ns       4628 ns     150171   1.64846M items/s
```

After:
```
---------------------------------------------------------------------
Benchmark                              Time           CPU Iterations
---------------------------------------------------------------------
[...]
BM_FloatParsing<FloatType>           563 ns        563 ns    1240915   13.5616M items/s
BM_FloatParsing<DoubleType>          313 ns        313 ns    2227610   24.3934M items/s
```
@pitrou pitrou force-pushed the ARROW-3320-faster-float-conversion branch from e3be939 to 1044598 Compare September 28, 2018 08:37
Copy link
Member

@wesm wesm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, thanks! waiting for the build to run then will merge

@wesm
Copy link
Member

wesm commented Sep 28, 2018

Merging. The build only failing because of the known flakiness with the Java Flight tests and the new macOS failure that's shown up in the last 24 hours

@wesm wesm closed this in ceed385 Sep 28, 2018
@pitrou pitrou deleted the ARROW-3320-faster-float-conversion branch September 28, 2018 17:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants