Performance: Enable fast parsing/writing for double #7822

Closed
mgodwan opened this issue May 30, 2023 · 4 comments · Fixed by #7909
Labels
enhancement Enhancement or improvement to existing feature or request Indexing & Search Performance This is for any performance related enhancements or bugs

Comments

@mgodwan
Member

mgodwan commented May 30, 2023

Is your feature request related to a problem? Please describe.
By default, Jackson uses the JDK's Double#parseDouble to convert strings to doubles. Jackson also provides a fast double parser that can speed up this conversion.
We should explore enabling it and check whether it benefits us in general, especially in cases that involve many double values, such as vectors.

Describe the solution you'd like

  • Enable the Jackson features USE_FAST_DOUBLE_PARSER and USE_FAST_DOUBLE_WRITER.
  • If this proves beneficial, as a next step we can explore replacing double parsing in other places in the OpenSearch code as well.
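
As a minimal sketch of the first bullet, the two features can be switched on when building a JsonFactory. This assumes Jackson 2.14 or later, where StreamReadFeature.USE_FAST_DOUBLE_PARSER and StreamWriteFeature.USE_FAST_DOUBLE_WRITER are available. The class name FastDoubleJsonFactory is hypothetical, and the issue does not say where OpenSearch would wire this in.

```java
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonFactoryBuilder;
import com.fasterxml.jackson.core.StreamReadFeature;
import com.fasterxml.jackson.core.StreamWriteFeature;

// Hypothetical helper, not taken from the OpenSearch code base.
final class FastDoubleJsonFactory {
    private FastDoubleJsonFactory() {}

    /** Builds a JsonFactory with the fast double parser and writer enabled. */
    static JsonFactory create() {
        return new JsonFactoryBuilder()
            // Parse floating-point text with the bundled fast parser instead of Double#parseDouble.
            .enable(StreamReadFeature.USE_FAST_DOUBLE_PARSER)
            // Serialize doubles with the faster writer implementation.
            .enable(StreamWriteFeature.USE_FAST_DOUBLE_WRITER)
            .build();
    }
}
```

Parsers and generators created from this factory inherit both settings, so existing call sites that obtain them from the factory would not need to change.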

Additional context
I ran a quick micro-benchmark with these features enabled for Jackson's JsonParser, which yielded the following results:

Setup: MacBook Pro (13-inch, M1, 2020), 16G RAM

Doubles in range 0.0 - 1.0

Benchmark                                                   Mode  Cnt    Score    Error  Units
DocumentParsingBenchmark.doParseDoubleWithFastDoubleParser  avgt   10   30.276 ±  4.795  ns/op
DocumentParsingBenchmark.doParseDoubleWithJdkParser         avgt   10  106.918 ±  5.233  ns/op

Doubles in range -1E9 to +1E9

Benchmark                                                   Mode  Cnt    Score    Error  Units
DocumentParsingBenchmark.doParseDoubleWithFastDoubleParser  avgt   15   32.087 ±  2.897  ns/op
DocumentParsingBenchmark.doParseDoubleWithJdkParser         avgt   15  134.490 ± 25.577  ns/op

mgodwan added the enhancement and untriaged labels on May 30, 2023
@mgodwan
Member Author

mgodwan commented May 30, 2023

cc: @backslasht @shwetathareja

@andrross
Member

@mgodwan What are the tradeoffs here? Seems like the fast parser/writer is indeed faster, so are there any downsides to enabling it?

@mgodwan
Member Author

mgodwan commented Jun 4, 2023

I tested this with the nyc_taxis benchmark, which contains some GeoPoint data that is parsed using the Double/Float parser.

Server Setup: 1 node r5.2xlarge, OpenSearch 2.5, 32G heap
Client Setup: 1 c6g.4xlarge
Workload Params: Default (1p0r, bulk size: 10k, bulk_indexing_clients: 8)
Testing Setup: 2 warmup, 3 test iterations for each baseline and candidate.

I found the following results when comparing the runs with and without fast double parser enabled.

Benchmark                                                Mean indexing throughput   p50 indexing latency
Baseline (default parser setting)                        63996 docs/sec             1121.89 ms
Candidate (Jackson fast double parser feature enabled)   66611 docs/sec (1.04x)     1054.90 ms (0.94x)
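
For readers checking the multipliers, they are the candidate value divided by the baseline value. The sketch below reproduces that arithmetic from the reported figures only; the class and method names are hypothetical.

```java
// Hypothetical helper that recomputes the ratios reported in the table above.
final class ThroughputComparison {
    private ThroughputComparison() {}

    /** Candidate divided by baseline; above 1 means the candidate value is larger. */
    static double ratio(double baseline, double candidate) {
        return candidate / baseline;
    }

    public static void main(String[] args) {
        double throughput = ratio(63996, 66611);   // docs/sec, higher is better
        double latency = ratio(1121.89, 1054.90);  // ms, lower is better
        // prints: throughput 1.04x, p50 latency 0.94x
        System.out.println(String.format(java.util.Locale.ROOT,
            "throughput %.2fx, p50 latency %.2fx", throughput, latency));
    }
}
```

In other words, the candidate indexes roughly 4% more documents per second, with a p50 latency about 6% lower.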

@mgodwan
Member Author

mgodwan commented Jun 4, 2023

> @mgodwan What are the tradeoffs here? Seems like the fast parser/writer is indeed faster, so are there any downsides to enabling it?

@andrross Jackson uses FastDoubleParser, which is a Java port of the fast_float C++ library.
I found an issue in the GitHub repo for FastDoubleParser that reports it being slow for uncommon inputs (numbers with more than 1000 digits): wrandelshofer/FastDoubleParser#32

I believe this kind of input should be very rare for us as well.
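
To make "more than 1000 digits" concrete, here is a small sketch that builds such an input and parses it with the JDK parser. It only shows what the input looks like and that it is a valid double literal. It measures nothing and does not reproduce the linked issue. The class name and the digit count of 2000 are illustrative assumptions.

```java
// Hypothetical illustration of a very long numeric input.
final class LongDigitInput {
    private LongDigitInput() {}

    /** Builds the text "0." followed by the given number of '3' digits. */
    static String longFraction(int digits) {
        StringBuilder sb = new StringBuilder("0.");
        for (int i = 0; i < digits; i++) {
            sb.append('3');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = longFraction(2000);
        // Nearly all of these digits are beyond double precision, yet the parser must still scan them.
        double value = Double.parseDouble(text);
        System.out.println(text.length() + " characters parsed to " + value);
    }
}
```

Typical indexed values such as coordinates or vector components carry far fewer digits, which is consistent with the expectation that this case is rare.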
