Performance compare with native dragonbox::to_chars #3675

zhiqiang-hhhh · 2023-10-09T07:49:39Z

Hello, I am using dragonbox::to_chars to convert floating-point numbers to strings, and I am trying to replace Dragonbox with {fmt} 10.x, since {fmt} has already integrated Dragonbox and offers more control over the output format.

But according to my simple benchmark, Dragonbox is about 1.7x faster than {fmt} at floating-point-to-string conversion.

#include <gtest/gtest.h>
#include <fmt/format.h>
#include <dragonbox/dragonbox_to_chars.h>
#include <random>
#include <vector>

class PerformanceTest : public ::testing::Test {
protected:
    void SetUp() override {
        std::random_device rd;
        std::mt19937 gen(rd());
        std::uniform_real_distribution<double> dis_double(-100000.0, +100000.0);
        std::uniform_real_distribution<float> dis_float(-100000.0, +100000.0);
        
        for (int i = 0; i < 100000000; ++i) {
            values_double.push_back(dis_double(gen));
            values_float.push_back(dis_float(gen));
        }
    }

    void TearDown() override {
    }

    std::vector<double> values_double;
    std::vector<float> values_float;
};

TEST_F(PerformanceTest, FmtPerformanceDouble) {
    char buffer[20];

    for (const auto& value : values_double) {
        auto res = fmt::format_to(buffer, "{}", value);
        *res = '\0';
    }
}

TEST_F(PerformanceTest, DragonboxPerformanceDouble) {
    char buffer[20];

    for (const auto& value : values_double) {
        jkj::dragonbox::to_chars(value, buffer);
    }
}


TEST_F(PerformanceTest, FmtPerformanceFloat) {
    char buffer[20];

    for (const auto& value : values_float) {
        auto res = fmt::format_to(buffer, "{}", value);
        *res = '\0';
    }
}

TEST_F(PerformanceTest, DragonboxPerformanceFloat) {
    char buffer[20];

    for (const auto& value : values_double) {
        jkj::dragonbox::to_chars(value, buffer);
    }
}

int main(int argc, char** argv) {
    ::testing::InitGoogleTest(&argc, argv);
    return RUN_ALL_TESTS();
}

Built in release mode. Result:

[==========] Running 4 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 4 tests from PerformanceTest
[ RUN      ] PerformanceTest.FmtPerformanceDouble
[       OK ] PerformanceTest.FmtPerformanceDouble (11391 ms)
[ RUN      ] PerformanceTest.DragonboxPerformanceDouble
[       OK ] PerformanceTest.DragonboxPerformanceDouble (6636 ms)
[ RUN      ] PerformanceTest.FmtPerformanceFloat
[       OK ] PerformanceTest.FmtPerformanceFloat (10185 ms)
[ RUN      ] PerformanceTest.DragonboxPerformanceFloat
[       OK ] PerformanceTest.DragonboxPerformanceFloat (6649 ms)
[----------] 4 tests from PerformanceTest (34864 ms total)

[----------] Global test environment tear-down
[==========] 4 tests from 1 test suite ran. (34864 ms total)
[  PASSED  ] 4 tests.

As far as I understand, the time spent on floating-point-to-decimal conversion should be almost the same now that {fmt} has integrated Dragonbox. So the determining factor in the performance difference here should be the output formatting control?


vitaut · 2023-10-09T18:28:48Z

It is expected that the (default) runtime formatting will be slightly slower than calling Dragonbox directly because the formatting function has to do some extra work. This overhead can be reduced by using format string compilation: https://fmt.dev/latest/api.html#compile-api. It's hard to say anything more specific and numbers don't look particularly meaningful because your test is quite broken: {fmt} cases do additional nul termination, there is stack corruption and you are using gtest instead of a proper benchmark. I recommend looking at an existing benchmark, e.g. https://github.com/miloyip/dtoa-benchmark. Also keep in mind that {fmt} uses compact Dragonbox tables by default so if you want maximum perf at the cost of binary size you could switch to larger tables.
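To make the points about buffer size and measurement concrete, here is a minimal sketch of a timing loop. It is not the benchmark used in this thread. It uses std::to_chars from the standard library as a bounds-checked stand-in converter, and the names buffer_size, make_inputs, ns_per_call and to_chars_length are hypothetical, chosen only for illustration. The buffer is sized above the longest output length reported in the verification log later in this thread (Max = 25).

```cpp
#include <cassert>
#include <charconv>
#include <chrono>
#include <cstddef>
#include <random>
#include <system_error>
#include <vector>

// Hypothetical size: larger than the longest output reported in this thread
// (25 characters), with room for a terminator.
constexpr std::size_t buffer_size = 32;

// Generates n pseudo-random doubles from a fixed seed so runs are comparable.
std::vector<double> make_inputs(std::size_t n) {
    std::mt19937_64 gen(42);
    std::uniform_real_distribution<double> dist(-100000.0, 100000.0);
    std::vector<double> values(n);
    for (auto& v : values) v = dist(gen);
    return values;
}

// Calls convert(value, buffer, buffer_size) for every input and returns the
// average nanoseconds per call. The returned lengths are summed and stored in
// a volatile sink so the compiler cannot discard the conversions.
template <typename Convert>
double ns_per_call(const std::vector<double>& values, Convert convert) {
    assert(!values.empty());
    char buffer[buffer_size];
    std::size_t total = 0;
    auto start = std::chrono::steady_clock::now();
    for (double v : values) total += convert(v, buffer, buffer_size);
    auto stop = std::chrono::steady_clock::now();
    volatile std::size_t sink = total;
    (void)sink;
    return std::chrono::duration<double, std::nano>(stop - start).count() /
           static_cast<double>(values.size());
}

// Stand-in converter (not {fmt} or Dragonbox): writes the shortest round-trip
// representation and returns its length, or 0 if the buffer is too small.
std::size_t to_chars_length(double value, char* buffer, std::size_t size) {
    auto res = std::to_chars(buffer, buffer + size, value);
    return res.ec == std::errc{} ? static_cast<std::size_t>(res.ptr - buffer) : 0;
}
```

Calling ns_per_call(make_inputs(1000000), to_chars_length) gives an average time per conversion. To measure {fmt} or Dragonbox instead, one would pass a callable of the same shape that wraps the respective call, keeping the terminator handling identical in every case. A single pass like this only shows the shape of a fair loop; a real benchmark such as the dtoa-benchmark linked above repeats trials and reports the spread.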

zhiqiang-hhhh · 2023-10-10T05:36:09Z

I tested again with the optimization you suggested and the benchmark tool you mentioned, but {fmt} still seems to be about 2x slower than Dragonbox.

Verifying doubleconv... OK. Length Avg = 22.426, Max = 25
Verifying dragonbox... OK. Length Avg = 22.027, Max = 24
Verifying dragonbox_comp... OK. Length Avg = 22.027, Max = 24
Verifying fmt... OK. Length Avg = 22.445, Max = 24
Verifying fmt_full_cache_test... OK. Length Avg = 22.445, Max = 24
Verifying ostringstream... OK. Length Avg = 22.940, Max = 24
Verifying ostrstream... OK. Length Avg = 22.940, Max = 24
Verifying sprintf... OK. Length Avg = 22.940, Max = 24
Benchmarking randomdigit doubleconv... Done
Benchmarking randomdigit dragonbox... Done
Benchmarking randomdigit dragonbox_comp... Done
Benchmarking randomdigit fmt... Done
Benchmarking randomdigit fmt_full_cache_test... Done
Benchmarking randomdigit null... Done
Benchmarking randomdigit ostringstream... Done
Benchmarking randomdigit ostrstream... Done
Benchmarking randomdigit sprintf... Done
Function            |  Min ns |  RMS ns  |  Max ns |   Sum ns  | Speedup |
:-------------------|--------:|---------:|--------:|----------:|--------:|
null                |     1.6 |    1.600 |     1.6 |      27.2 | ×597.4  |
dragonbox           |    28.4 |   30.379 |    33.6 |     515.9 | ×31.5   |
dragonbox_comp      |    34.4 |   36.937 |    41.9 |     627.3 | ×25.9   |
fmt_full_cache_test |    53.4 |   59.377 |    68.7 |    1007.5 | ×16.1   |
fmt                 |    53.5 |   59.513 |    68.1 |    1010.0 | ×16.1   |
doubleconv          |    82.9 |  129.439 |   168.7 |    2170.8 | ×7.5    |
sprintf             |   868.0 |  957.211 |  1028.4 |   16249.7 | ×1.0    |
ostrstream          |  1197.1 | 1285.831 |  1357.9 |   21841.8 | ×0.7    |
ostringstream       |  1279.8 | 1377.940 |  1462.4 |   23401.2 | ×0.7    |

Appending null termination is necessary for correctness.
fmt_full_cache_test
fmttest
dragonbox
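The "about 2x" figure can be checked against the RMS column of the table above. The sketch below computes the fmt/dragonbox ratio both as reported and after subtracting the null row, on the assumption that the null row measures the harness overhead of a do-nothing implementation. The function name baseline_corrected_ratio is hypothetical; the constants are copied from the table.

```cpp
#include <cassert>

// Ratio of two timings after subtracting a common baseline, such as the
// overhead measured by a no-op implementation. Pass 0.0 for the raw ratio.
double baseline_corrected_ratio(double slower_ns, double faster_ns,
                                double baseline_ns) {
    assert(faster_ns > baseline_ns);
    return (slower_ns - baseline_ns) / (faster_ns - baseline_ns);
}

// RMS ns values copied from the table above.
constexpr double null_rms = 1.600;
constexpr double dragonbox_rms = 30.379;
constexpr double fmt_rms = 59.513;
```

With these inputs the raw ratio is roughly 1.96 and the baseline-corrected ratio is roughly 2.01, so the gap is close to 2x either way on this run.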

zhiqiang-hhhh · 2023-10-10T05:36:25Z

@vitaut

vitaut · 2023-10-10T16:45:54Z

I will need to look into this in more detail, but one surprising thing is that the full and compact cache results are identical.

vitaut · 2023-10-14T14:59:05Z

So I looked into this in more detail, and one obvious problem with the new benchmark is an ODR violation: you are trying to use {fmt} compiled with different configurations in different TUs. This is UB. If you correctly enable the full Dragonbox cache with

cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS=-DFMT_USE_FULL_CACHE_DRAGONBOX=1 .

you'll get a noticeable speedup from

fmt           |    40.1 |   42.825 |    46.9 |     727.4 | ×17.3   |

to

fmt           |    34.0 |   36.018 |    39.3 |     611.8 | ×20.5   |

on my system.

It is still not as fast as calling Dragonbox directly, which is worth investigating further.

vitaut · 2023-10-14T15:43:44Z

Looking at the CPU profile, ~40% of the time is spent postprocessing and writing the output in do_write_float.

Some of that is inevitable since we have to deal with all the formatting options but it could probably be improved for the common case.

jk-jeon · 2023-10-26T22:26:33Z

@zhiqiang-hhhh If you really want to test {fmt} with multiple different configurations in a single executable, you can do something like this to avoid the ODR issue: https://github.com/jk-jeon/dtoa-benchmark/blob/master/src/fmt_full_cachetest.cpp

This still feels like a terrible hack, but since it is just for testing I think it should be alright.
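The general idea of giving each configuration its own symbol names can be sketched as follows. This is an illustration under assumptions, not the contents of the linked file: DEFINE_CACHE_CONFIG, the namespace names and the entry counts are all hypothetical, and the macro merely stands in for a header whose inline code depends on a configuration macro.

```cpp
#include <cstddef>

// Hypothetical stand-in for a header whose inline functions change with a
// configuration macro. Expanding it once per namespace produces two distinct
// functions, so no single symbol ends up with two conflicting definitions.
#define DEFINE_CACHE_CONFIG(NAMESPACE, ENTRIES)                  \
    namespace NAMESPACE {                                        \
    inline std::size_t cache_entries() { return ENTRIES; }       \
    }

// Illustrative values only; they are not the real table sizes.
DEFINE_CACHE_CONFIG(compact_build, 16)
DEFINE_CACHE_CONFIG(full_build, 512)
```

Here compact_build::cache_entries() and full_build::cache_entries() can coexist in one executable. Had both expansions defined the same inline function in the same namespace from different TUs, the program would be ill-formed with no diagnostic required, and in practice the linker would typically keep just one definition, which would be consistent with the identical full and compact results noted earlier.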

vitaut closed this as completed Oct 9, 2023

vitaut added the question label Oct 9, 2023

vitaut reopened this Oct 15, 2023

vitaut removed the question label Oct 15, 2023
