Performance compared to lua-cjson. #275

Closed
xpol opened this issue Apr 1, 2015 · 10 comments

@xpol

xpol commented Apr 1, 2015

I have written a JSON module for Lua based on RapidJSON.

The performance test results compared with lua-cjson are:

On Windows:

My JSON module is a bit slower than cjson when processing booleans, nulls, and strings, but much faster than cjson when processing numbers.

I guess this is because MSVC is slow at converting between numbers and strings.

On Linux (tested on Travis):

My JSON module is only a bit faster when encoding numbers.

My questions are:

  • Are there any performance tips or best practices for using RapidJSON?
  • Or did I do something wrong when using RapidJSON in my Lua JSON module?
@miloyip
Collaborator

miloyip commented Apr 1, 2015

Wow. This is what I have been planning to do with RapidJSON.

I just peeked at the code and found that it uses AutoUTFInputStream for the file stream. This is slower than using a statically bound stream.
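
To illustrate the difference in general terms, here is a minimal std-only C++ sketch. It is not RapidJSON code: StaticStream, SniffingStream, and CountDigits are hypothetical names, and the encoding handling is deliberately simplified. It assumes that the cost in question comes from selecting the decoder at run time and calling it through a pointer for every character, whereas a statically bound stream lets the compiler inline Take() into the consumer's loop.

```cpp
#include <cstddef>
#include <string>

// Statically bound stream: Take() is a plain inline member, so a templated
// consumer can inline it into its loop. The stream only points into the
// string, so the string must outlive the stream.
class StaticStream {
public:
    explicit StaticStream(const std::string& s)
        : p_(s.data()), end_(s.data() + s.size()) {}
    char Take() { return p_ != end_ ? *p_++ : '\0'; }
private:
    const char* p_;
    const char* end_;
};

// Runtime-bound stream (hypothetical): looks for a UTF-16LE BOM once, then
// routes every Take() through a member-function pointer chosen at construction.
class SniffingStream {
public:
    explicit SniffingStream(const std::string& s)
        : p_(s.data()), end_(s.data() + s.size()),
          take_(&SniffingStream::TakeByte) {
        if (s.size() >= 2 && static_cast<unsigned char>(s[0]) == 0xFF &&
            static_cast<unsigned char>(s[1]) == 0xFE) {
            p_ += 2;  // skip the BOM
            take_ = &SniffingStream::TakeUtf16LeLowByte;
        }
    }
    char Take() { return (this->*take_)(); }  // indirect call per character
private:
    char TakeByte() { return p_ != end_ ? *p_++ : '\0'; }
    // Simplification: returns only the low byte of each 16-bit code unit,
    // which is enough for ASCII-range input.
    char TakeUtf16LeLowByte() {
        if (end_ - p_ < 2) return '\0';
        char c = *p_;
        p_ += 2;
        return c;
    }
    const char* p_;
    const char* end_;
    char (SniffingStream::*take_)();
};

// Generic consumer: the same source works with either stream type.
template <typename Stream>
std::size_t CountDigits(Stream& is) {
    std::size_t n = 0;
    for (char c = is.Take(); c != '\0'; c = is.Take())
        if (c >= '0' && c <= '9') ++n;
    return n;
}
```

Both streams give the same answer for UTF-8 input; only the dispatch differs. Whether the indirect call is measurable depends on the compiler and the workload, so it is worth profiling before drawing conclusions.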

I will need some time to profile your performance tests.

P.S. Testing performance on Travis does not seem to be good practice, as the load on the server node may change all the time.

@miloyip
Collaborator

miloyip commented Apr 3, 2015

VC2013 32-bit /Ox

performance/nulls.json: (x10000)
           module            decoding        encoding
            cjson        0.1020126343    0.1040134430
        rapidjson        0.0870113373    0.1140136719
performance/booleans.json: (x10000)
           module            decoding        encoding
            cjson        0.0930118561    0.1130142212
        rapidjson        0.0810108185    0.0900135040
performance/guids.json: (x10000)
           module            decoding        encoding
            cjson        0.2480316162    0.1820220947
        rapidjson        0.3100395203    0.2260284424
performance/paragraphs.json: (x10000)
           module            decoding        encoding
            cjson        1.2351589203    0.9921264648
        rapidjson        1.2501602173    1.4281826019
performance/floats.json: (x10000)
           module            decoding        encoding
            cjson        0.7150917053    1.1291427612
        rapidjson        0.1330165863    0.3970508575
performance/integers.json: (x10000)
           module            decoding        encoding
            cjson        0.3780479431    0.9661235809
        rapidjson        0.1000137329    0.2930355072
performance/mixed.json: (x10000)
           module            decoding        encoding
            cjson        2.5523262024    2.1712779999
        rapidjson        2.2152843475    1.3771762848

It seems that RapidJSON is slower than lua-cjson at writing strings (e.g. paragraphs.json).
This needs more investigation. It may be related to how characters are put into StringBuffer.

@miloyip
Collaborator

miloyip commented Apr 3, 2015

I reviewed the code for stringifying strings. The two implementations are actually very similar. The difference is that cjson only writes to a string buffer, and it reserves the maximum possible number of encoded characters in the buffer, so the inner loop does not need to check the buffer space.

static void json_append_string(lua_State *l, strbuf_t *json, int lindex)
{
    const char *escstr;
    int i;
    const char *str;
    size_t len;

    str = lua_tolstring(l, lindex, &len);

    /* Worst case is len * 6 (all unicode escapes).
     * This buffer is reused constantly for small strings
     * If there are any excess pages, they won't be hit anyway.
     * This gains ~5% speedup. */
    strbuf_ensure_empty_length(json, len * 6 + 2);

    strbuf_append_char_unsafe(json, '\"');
    for (i = 0; i < len; i++) {
        escstr = char2escape[(unsigned char)str[i]];
        if (escstr)
            strbuf_append_string(json, escstr);
        else
            strbuf_append_char_unsafe(json, str[i]);
    }
    strbuf_append_char_unsafe(json, '\"');
}
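
The same reserve-then-write-unchecked idea can be sketched in standalone C++. This is an illustration only, not the hack that was tried in RapidJSON: EscapeFor and EscapeJsonString are hypothetical names, and a fresh std::vector is allocated per call for simplicity, whereas cjson reuses one buffer across calls.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Returns the short escape sequence for c, or nullptr if there is none.
inline const char* EscapeFor(unsigned char c) {
    switch (c) {
        case '"':  return "\\\"";
        case '\\': return "\\\\";
        case '\b': return "\\b";
        case '\f': return "\\f";
        case '\n': return "\\n";
        case '\r': return "\\r";
        case '\t': return "\\t";
        default:   return nullptr;
    }
}

std::string EscapeJsonString(const std::string& in) {
    // Worst case: every input byte becomes \uXXXX (6 bytes), plus 2 quotes.
    // Reserving this up front means the loop never checks capacity.
    std::vector<char> buf(in.size() * 6 + 2);
    char* out = buf.data();

    *out++ = '"';
    for (unsigned char c : in) {
        if (const char* esc = EscapeFor(c)) {
            const std::size_t n = std::strlen(esc);
            std::memcpy(out, esc, n);
            out += n;
        } else if (c < 0x20) {
            // Remaining control characters use the 6-byte \u00XX form.
            static const char hex[] = "0123456789abcdef";
            *out++ = '\\'; *out++ = 'u'; *out++ = '0'; *out++ = '0';
            *out++ = hex[c >> 4];
            *out++ = hex[c & 0xF];
        } else {
            *out++ = static_cast<char>(c);
        }
    }
    *out++ = '"';

    // Debug-only check that the worst-case bound really held.
    assert(out <= buf.data() + buf.size());
    return std::string(buf.data(), static_cast<std::size_t>(out - buf.data()));
}
```

The trade-off is memory: up to six times the input length is reserved even when nothing needs escaping, which is cheap only when the buffer is reused.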

After trying the same logic in RapidJSON (just a hack, not fully implemented for all streams, and with the transcoding functionality dropped), it gains some improvement but is still slower than cjson:

performance/guids.json: (x10000)
           module            decoding        encoding
            cjson        0.2500324249    0.1980247498
        rapidjson        0.3100395203    0.2050266266
performance/paragraphs.json: (x10000)
           module            decoding        encoding
            cjson        1.2771625519    1.0441341400
        rapidjson        1.2451591492    1.2671623230

@xpol
Author

xpol commented Apr 4, 2015

That's a nice improvement.

@miloyip miloyip added this to the v1.1 Beta milestone Apr 11, 2015
@miloyip
Collaborator

miloyip commented Apr 18, 2015

Today I learnt about __builtin_expect in gcc/clang from chadaustin/sajson#7 (comment)

Just changing one line of code in internal::stack with the RAPIDJSON_UNLIKELY() macro:

    template<typename T>
    RAPIDJSON_FORCEINLINE T* Push(size_t count = 1) {
         // Expand the stack if needed
        if (RAPIDJSON_UNLIKELY(stackTop_ + sizeof(T) * count >= stackEnd_))
            Expand<T>(count);

        T* ret = reinterpret_cast<T*>(stackTop_);
        stackTop_ += sizeof(T) * count;
        return ret;
    }
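
For readers unfamiliar with the technique, here is a minimal standalone sketch of a branch hint on a stack's growth check. It is not the RapidJSON implementation: SKETCH_UNLIKELY and ByteStack are hypothetical names, and the macro simply falls back to the plain condition on compilers without __builtin_expect.

```cpp
#include <cstddef>
#include <cstdlib>

// Hint that the condition is rarely true. On gcc/clang this uses
// __builtin_expect; elsewhere it is just the condition itself.
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define SKETCH_UNLIKELY(x) (x)
#endif

// A tiny growable byte stack (hypothetical, for illustration only).
class ByteStack {
public:
    ByteStack() : begin_(nullptr), top_(nullptr), end_(nullptr) {}
    ~ByteStack() { std::free(begin_); }
    ByteStack(const ByteStack&) = delete;
    ByteStack& operator=(const ByteStack&) = delete;

    // Reserves count bytes and returns a pointer to them. Growth is rare,
    // so the check is marked unlikely to keep the common path straight-line.
    char* Push(std::size_t count) {
        if (SKETCH_UNLIKELY(static_cast<std::size_t>(end_ - top_) < count))
            Expand(count);
        char* ret = top_;
        top_ += count;
        return ret;
    }

    std::size_t Size() const { return static_cast<std::size_t>(top_ - begin_); }
    const char* Data() const { return begin_; }

private:
    void Expand(std::size_t count) {
        const std::size_t size = Size();
        const std::size_t capacity = static_cast<std::size_t>(end_ - begin_);
        std::size_t new_capacity = capacity ? capacity * 2 : 16;
        while (new_capacity < size + count) new_capacity *= 2;
        char* p = static_cast<char*>(std::realloc(begin_, new_capacity));
        if (!p) std::abort();  // out of memory: give up in this sketch
        begin_ = p;
        top_ = p + size;
        end_ = p + new_capacity;
    }

    char* begin_;
    char* top_;
    char* end_;
};
```

The hint changes only code layout and static branch prediction, never behavior, so any gain has to be confirmed by measurement on the target compiler.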

This improves performance for both parsing and stringifying to StringBuffer, as they both use the stack:

Without likely/unlikely
[ RUN      ] RapidJson.ReaderParseInsitu_DummyHandler_SSE42
[       OK ] RapidJson.ReaderParseInsitu_DummyHandler_SSE42 (967 ms)
[ RUN      ] RapidJson.ReaderParseInsitu_DummyHandler_ValidateEncoding_SSE42
[       OK ] RapidJson.ReaderParseInsitu_DummyHandler_ValidateEncoding_SSE42 (1423 ms)
[ RUN      ] RapidJson.ReaderParse_DummyHandler_SSE42
[       OK ] RapidJson.ReaderParse_DummyHandler_SSE42 (923 ms)
[ RUN      ] RapidJson.ReaderParse_DummyHandler_FullPrecision_SSE42
[       OK ] RapidJson.ReaderParse_DummyHandler_FullPrecision_SSE42 (946 ms)

[ RUN      ] RapidJson.Writer_StringBuffer
[       OK ] RapidJson.Writer_StringBuffer (728 ms)
[ RUN      ] RapidJson.PrettyWriter_StringBuffer
[       OK ] RapidJson.PrettyWriter_StringBuffer (802 ms)

[ RUN      ] RapidJson.StringBuffer
[       OK ] RapidJson.StringBuffer (62 ms)

With likely/unlikely for stack
[ RUN      ] RapidJson.ReaderParseInsitu_DummyHandler_SSE42
[       OK ] RapidJson.ReaderParseInsitu_DummyHandler_SSE42 (927 ms)
[ RUN      ] RapidJson.ReaderParseInsitu_DummyHandler_ValidateEncoding_SSE42
[       OK ] RapidJson.ReaderParseInsitu_DummyHandler_ValidateEncoding_SSE42 (1413 ms)
[ RUN      ] RapidJson.ReaderParse_DummyHandler_SSE42
[       OK ] RapidJson.ReaderParse_DummyHandler_SSE42 (885 ms)
[ RUN      ] RapidJson.ReaderParse_DummyHandler_FullPrecision_SSE42
[       OK ] RapidJson.ReaderParse_DummyHandler_FullPrecision_SSE42 (948 ms)

[ RUN      ] RapidJson.Writer_StringBuffer
[       OK ] RapidJson.Writer_StringBuffer (633 ms)
[ RUN      ] RapidJson.PrettyWriter_StringBuffer
[       OK ] RapidJson.PrettyWriter_StringBuffer (724 ms)

[ RUN      ] RapidJson.StringBuffer
[       OK ] RapidJson.StringBuffer (55 ms)

This technique should also be applied to string encoding/decoding.
The optimizations are in the optimization branch, but may only be merged into master after v1.0.

@pah
Contributor

pah commented Apr 21, 2015

Sounds like a good idea. 👍
Other obvious candidates for adding prediction hints could be the error checks (including RAPIDJSON_ASSERT, when building optimized code without defining NDEBUG).

@miloyip
Collaborator

miloyip commented Feb 9, 2016

I reran the benchmark on OS X.

After properly building in the release configuration:

performance/nulls.json: (x10000)
    module      decoding        encoding
    dkjson      1.4447538853    0.0303778648
     cjson      0.0557808876    0.0431759357
    rapidjson   0.0417780876    0.0529091358
performance/booleans.json: (x10000)
    module      decoding        encoding
    dkjson      1.4324688911    0.5549669266
     cjson      0.0541689396    0.0515460968
    rapidjson   0.0495100021    0.0410890579
performance/guids.json: (x10000)
    module      decoding        encoding
    dkjson      2.2196581364    2.8518559933
     cjson      0.1630179882    0.1043770313
    rapidjson   0.1583569050    0.1000590324
performance/paragraphs.json: (x10000)
    module      decoding        encoding
    dkjson      8.0023329258    32.8837590218
     cjson      0.8166460991    0.6511371136
    rapidjson   0.8537089825    0.6238520145
performance/floats.json: (x10000)
    module      decoding        encoding
    dkjson      2.0216770172    1.6640810966
     cjson      0.1006298065    0.2850799561
    rapidjson   0.0698080063    0.1097550392
performance/integers.json: (x10000)
    module      decoding        encoding
    dkjson      1.6559410095    1.2621140480
     cjson      0.0738458633    0.2687399387
    rapidjson   0.0525319576    0.0461258888
performance/mixed.json: (x10000)
    module      decoding        encoding
    dkjson      16.3499078751   13.0333669186
     cjson      1.3119888306    0.7127361298
    rapidjson   1.2910170555    0.4844610691

With -march=native and -DRAPIDJSON_SSE42=1 enabled:

performance/nulls.json: (x10000)
    module      decoding        encoding
    dkjson      1.4519450665    0.0266160965
     cjson      0.0555098057    0.0437109470
    rapidjson   0.0502049923    0.0490658283
performance/booleans.json: (x10000)
    module      decoding        encoding
    dkjson      1.4480140209    0.5391879082
     cjson      0.0528380871    0.0492749214
    rapidjson   0.0455410480    0.0359258652
performance/guids.json: (x10000)
    module      decoding        encoding
    dkjson      2.2088551521    2.7952511311
     cjson      0.1659879684    0.1056418419
    rapidjson   0.1055088043    0.0912489891
performance/paragraphs.json: (x10000)
    module      decoding        encoding
    dkjson      7.9302279949    29.5428390503
     cjson      0.7868468761    0.6402869225
    rapidjson   0.2976741791    0.6354022026
performance/floats.json: (x10000)
    module      decoding        encoding
    dkjson      1.9752140045    1.6544151306
     cjson      0.1088299751    0.2867891788
    rapidjson   0.0695970058    0.1129419804
performance/integers.json: (x10000)
    module      decoding        encoding
    dkjson      1.6462109089    1.2243251801
     cjson      0.0825049877    0.2594978809
    rapidjson   0.0552508831    0.0440261364
performance/mixed.json: (x10000)
    module      decoding        encoding
    dkjson      15.8063831329   12.5736849308
     cjson      1.2482681274    0.6973659992
    rapidjson   1.2607920170    0.4846410751

The most obvious improvement is in decoding paragraphs.json: 0.8537089825 -> 0.2976741791. I think it is due to searching for escape characters with SIMD.

@miloyip
Collaborator

miloyip commented Feb 14, 2016

#544 Optimized Writer::WriteString() with SSE2.

Before:

[ RUN      ] RapidJson.Writer_NullStream
[       OK ] RapidJson.Writer_NullStream (74 ms)
[ RUN      ] RapidJson.Writer_StringBuffer
[       OK ] RapidJson.Writer_StringBuffer (379 ms)
[ RUN      ] RapidJson.Writer_StringBuffer_Booleans
[       OK ] RapidJson.Writer_StringBuffer_Booleans (18 ms)
[ RUN      ] RapidJson.Writer_StringBuffer_Floats
[       OK ] RapidJson.Writer_StringBuffer_Floats (78 ms)
[ RUN      ] RapidJson.Writer_StringBuffer_Guids
[       OK ] RapidJson.Writer_StringBuffer_Guids (79 ms)
[ RUN      ] RapidJson.Writer_StringBuffer_Integers
[       OK ] RapidJson.Writer_StringBuffer_Integers (22 ms)
[ RUN      ] RapidJson.Writer_StringBuffer_Mixed
[       OK ] RapidJson.Writer_StringBuffer_Mixed (249 ms)
[ RUN      ] RapidJson.Writer_StringBuffer_Nulls
[       OK ] RapidJson.Writer_StringBuffer_Nulls (15 ms)
[ RUN      ] RapidJson.Writer_StringBuffer_Paragraphs
[       OK ] RapidJson.Writer_StringBuffer_Paragraphs (596 ms)
[ RUN      ] RapidJson.PrettyWriter_StringBuffer
[       OK ] RapidJson.PrettyWriter_StringBuffer (416 ms)

After:

[ RUN      ] RapidJson.Writer_NullStream
[       OK ] RapidJson.Writer_NullStream (85 ms)
[ RUN      ] RapidJson.Writer_StringBuffer_SSE42
[       OK ] RapidJson.Writer_StringBuffer_SSE42 (283 ms)
[ RUN      ] RapidJson.Writer_StringBuffer_Booleans_SSE42
[       OK ] RapidJson.Writer_StringBuffer_Booleans_SSE42 (24 ms)
[ RUN      ] RapidJson.Writer_StringBuffer_Floats_SSE42
[       OK ] RapidJson.Writer_StringBuffer_Floats_SSE42 (81 ms)
[ RUN      ] RapidJson.Writer_StringBuffer_Guids_SSE42
[       OK ] RapidJson.Writer_StringBuffer_Guids_SSE42 (45 ms)
[ RUN      ] RapidJson.Writer_StringBuffer_Integers_SSE42
[       OK ] RapidJson.Writer_StringBuffer_Integers_SSE42 (20 ms)
[ RUN      ] RapidJson.Writer_StringBuffer_Mixed_SSE42
[       OK ] RapidJson.Writer_StringBuffer_Mixed_SSE42 (188 ms)
[ RUN      ] RapidJson.Writer_StringBuffer_Nulls_SSE42
[       OK ] RapidJson.Writer_StringBuffer_Nulls_SSE42 (16 ms)
[ RUN      ] RapidJson.Writer_StringBuffer_Paragraphs_SSE42
[       OK ] RapidJson.Writer_StringBuffer_Paragraphs_SSE42 (77 ms)
[ RUN      ] RapidJson.PrettyWriter_StringBuffer_SSE42
[       OK ] RapidJson.PrettyWriter_StringBuffer_SSE42 (334 ms)

Among the above, the affected tests are (times in ms):

  • Writer_StringBuffer: 379 -> 283
  • Guids: 79 -> 45
  • Mixed: 249 -> 188
  • Paragraphs: 596 -> 77
  • PrettyWriter_StringBuffer: 416 -> 334

@miloyip
Collaborator

miloyip commented Feb 14, 2016

Rerunning the Lua performance test shows the same kind of improvements:

performance/nulls.json: (x10000)
       module        decoding        encoding
       dkjson    1.5754740238    0.0276830196
        cjson    0.0551309586    0.0487358570
    rapidjson    0.0545330048    0.0497119427
performance/booleans.json: (x10000)
       module        decoding        encoding
       dkjson    1.4916000366    0.5738999844
        cjson    0.0556819439    0.0495460033
    rapidjson    0.0491378307    0.0377278328
performance/guids.json: (x10000)
       module        decoding        encoding
       dkjson    2.3570721149    3.0556299686
        cjson    0.1626739502    0.1092689037
    rapidjson    0.1135668755    0.0830221176
performance/paragraphs.json: (x10000)
       module        decoding        encoding
       dkjson    8.5023150444   31.5211300850
        cjson    0.8555159569    0.6739630699
    rapidjson    0.3181121349    0.1543619633
performance/floats.json: (x10000)
       module        decoding        encoding
       dkjson    2.1173479557    1.7756278515
        cjson    0.1096220016    0.3009288311
    rapidjson    0.0733149052    0.1116719246
performance/integers.json: (x10000)
       module        decoding        encoding
       dkjson    1.7672340870    1.3564231396
        cjson    0.0811710358    0.2808339596
    rapidjson    0.0581030846    0.0484278202
performance/mixed.json: (x10000)
       module        decoding        encoding
       dkjson   16.8118801117   13.2526481152
        cjson    1.3187069893    0.7380268574
    rapidjson    1.3779308796    0.4688320160

@xpol
Author

xpol commented Feb 14, 2016

@miloyip
That's great!

@miloyip miloyip closed this as completed Feb 14, 2016