
jstring_ being really slow #581

Closed · angerman opened this issue Aug 30, 2017 · 18 comments

@angerman

I've run into a case where we parse JSON documents that contain other serialized JSON documents, which in turn contain HTML and Base64-encoded strings. E.g. a JSON document containing HTML pages as well as base64 data with some additional info:

[{"html": "<html string>", "data": "<base64 string>", ... }]

in a JSON envelope.

[{"date": "...", "user": "...", "payload": "<serialized json>", ... }]

These documents range from 1 MB to 100 MB, and anything above 25 MB is essentially impossible to parse with aeson: it consumes upwards of 10 GB of memory and takes minutes to process. All the ToJSON and FromJSON instances are generated with TH, and profiling shows ~95% of the time being spent in jstring_; enabling cffi seems to make no difference.
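
To make the scenario concrete, here is a minimal sketch of the double-decode shape described above. The record types and field names are hypothetical (and the actual instances are TH-generated, not derived generically):

```haskell
{-# LANGUAGE DeriveGeneric #-}
import           Data.Aeson (FromJSON, eitherDecode, eitherDecodeStrict)
import qualified Data.ByteString.Lazy as BL
import           Data.Text (Text)
import           Data.Text.Encoding (encodeUtf8)
import           GHC.Generics (Generic)

-- Hypothetical record types mirroring the shapes above
-- ("data" is a Haskell keyword, so that field is named base64 here).
data Envelope = Envelope { date :: Text, user :: Text, payload :: Text }
  deriving Generic
data Payload = Payload { html :: Text, base64 :: Text }
  deriving Generic

instance FromJSON Envelope -- generically derived here; TH-generated in the report
instance FromJSON Payload

-- Every character of the inner document goes through jstring_ twice:
-- once as the escaped "payload" string of the envelope, and again
-- when that string is re-parsed as JSON.
decodeAll :: BL.ByteString -> Either String [Payload]
decodeAll bs = do
  envelopes <- eitherDecode bs
  mapM (eitherDecodeStrict . encodeUtf8 . payload) envelopes
```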

@bergmark
Collaborator

bergmark commented Sep 4, 2017

Thank you for the report.

@eskimor, @jprider63, @winterland1989 do you have any ideas about this?

@bergmark
Collaborator

bergmark commented Sep 4, 2017

It may also be interesting to see how this behaves in aeson-1.0.0.0, which is the last release before the new string parser was added.

@phadej
Collaborator

phadej commented Sep 4, 2017

I suspect that since jstring_ parses the contents into a strict ByteString, it's just slow.

@eskimor
Contributor

eskimor commented Sep 4, 2017

I honestly have absolutely no idea, sorry :-(

@winterland1989
Contributor

winterland1989 commented Sep 5, 2017

I reviewed the new code path for jstring_ under MIN_VERSION_ghc_prim(0,3,1), i.e. the one that adds a new state to record whether we need to unescape after the string slice is returned. I suspect runScanner may be the problem: its type is runScanner :: s -> (s -> Word8 -> Maybe s) -> Parser (ByteString, s), and the lazy tuple (ByteString, s) is a very bad sign for thunk leaks.
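
For illustration, the pre-scan in question has roughly this shape (a minimal sketch with made-up names; not the actual aeson source):

```haskell
import           Data.ByteString (ByteString)
import qualified Data.Attoparsec.ByteString as A
import           Data.Word (Word8)

-- State: (was the previous byte a backslash?, any escape seen so far?)
data ScanState = ScanState !Bool !Bool

-- Assumes the opening '"' has already been consumed. Scan up to the
-- closing quote, recording whether any escape occurred, so unescaping
-- can be skipped for escape-free slices.
jsonStringSlice :: A.Parser (ByteString, Bool)
jsonStringSlice = do
  (slice, ScanState _ anyEscape) <- A.runScanner (ScanState False False) step
  _ <- A.word8 34           -- consume the closing '"'
  return (slice, anyEscape)
  where
    step (ScanState True  anyE) _ = Just (ScanState False anyE) -- byte after '\' belongs to the escape
    step (ScanState False anyE) w
      | w == 34   = Nothing                                     -- unescaped '"': stop (not consumed)
      | w == 92   = Just (ScanState True True)                  -- '\' starts an escape
      | otherwise = Just (ScanState False anyE)
```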

So @angerman can you try to revert this commit and try again?

@winterland1989
Contributor

winterland1989 commented Sep 5, 2017

BTW, I understand we want to skip the unnecessary unescaping when using the native Haskell code, but for the FFI version this is really not necessary: the FFI version does the unescaping and the UTF-8→UTF-16 decoding together in one pass, which is more or less what my original PR was about.

Edit: I've now checked the native Haskell unescaping code, and it handles unescaping and decoding in one pass too, which raises the question of why we added the escape-skipping logic in 1bca6a1. The fact that decodeUtf8 in text is implemented in C may have improved some benchmarks after that commit, but it's completely the wrong move both for people who want to use the native Haskell implementation (GHCJS folks) and for those who want to use the fast C FFI version.

@angerman
Author

angerman commented Sep 5, 2017 via email

@phadej
Collaborator

phadej commented Sep 5, 2017

@winterland1989 because the benchmarks showed it does matter: just decoding is significantly faster (#561). text's decodeUtf8 is C code, but that one is general-purpose and used across the whole ecosystem.

@winterland1989
Contributor

winterland1989 commented Sep 5, 2017

@phadej I gave the reason behind the benchmark improvement above; I'm questioning how the choice was made:

  • The whole point of a native Haskell unescape code path is to support GHCJS, and under GHCJS the text package is patched so that decodeUtf8 uses a slow JS version. Trying to use decodeUtf8 to optimize jstring_ won't really work for GHCJS.

  • If people do have FFI available, they'd be better off just enabling the cffi flag (see the sketch after this comment) and using the C unescape code path directly, saving the overhead of runScanner: the FFI unescape code path should perform identically to the C FFI decodeUtf8 code when there are no escaped code points.

IMO we have a very strange default code path: it disables the fast C FFI unescape/decode path, yet uses C FFI decoding combined with a slow runScanner preprocess. This path is of course faster than a pure Haskell one (except under GHCJS), but under what circumstances would we want that combination?
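
For reference, a minimal sketch of enabling the flag mentioned above via cabal (assuming a cabal.project-based setup; adjust to your build tool):

```
-- cabal.project
constraints: aeson +cffi
```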

@phadej
Collaborator

phadej commented Sep 5, 2017

I disagree with "The whole point of a haskell native unescape code path is to support GHCJS."

For me, the native unescape exists because the C version has not been audited to be correct (or at least to produce exactly the same results as the Haskell version, in all environments). See #535.

So there are three use cases:

  • conservative "native" Haskell (secure)
  • completely native Haskell (aka. GHCJS)
  • extremely fast (C)

We could make a change so that under GHCJS the second option is always used. However, is it really faster than decodeUtf8? Shouldn't decodeUtf8 be fixed in GHCJS to be fast, or am I missing something?


Note: runScanner is about as slow as it was before my PR on x86_64. We have to scan the input for \ to find the closing ". My patch also scans for isAscii /\ not . isCtrl. I'd have thought GHCJS should produce a fast-ish loop there as well?

I also wondered whether runScanner is fast or slow (it looks slow), but it's all {-# INLINE #-}, so I don't think we actually see the pair or the Maybe. I didn't check the Core though; I'm a bit tight on time, but I can check that.
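
For anyone wanting to verify, inspecting the simplified Core only takes standard GHC dump flags (the module name here is a placeholder):

```
ghc -O2 -ddump-simpl -dsuppress-all -fforce-recomp YourModule.hs
```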

@phadej
Collaborator

phadej commented Sep 5, 2017

Additional note: I'd like to see a PR adding a note on how to run the benchmarks in GHCJS; otherwise it's really hard to care.

@winterland1989
Contributor

OK then, please do check the Core that runScanner compiles to if you get the time. For the record, the last time I was hacking on aeson I tried runScanner (for the same reason as you) and the Core was quite bad; that's why I chose A.scan to do the preprocessing and merged unescaping and decoding into one pass.
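
For context, the two combinators differ only in whether the final scanner state is returned (signatures as exported by Data.Attoparsec.ByteString):

```haskell
scan       :: s -> (s -> Word8 -> Maybe s) -> Parser ByteString       -- discards the final state
runScanner :: s -> (s -> Word8 -> Maybe s) -> Parser (ByteString, s)  -- also returns it, in a lazy pair
```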

From #535, I don't see any concrete example against the C FFI version, only some vague hypotheses. Actually, I wasn't even aware of the switch; I just missed the discussion. At least from my point of view, the current default code path is just slow.

I made further optimizations in the binary-parsers PR, since in binary-parsers I provide new combinators which can parse a chunk based on splitting, but that patch hasn't been accepted yet. In fact, the only right way to do this IMO is to add a similar parser to attoparsec; otherwise we always have to scan the string twice: once to find the end, and once to unescape/decode.

@phadej
Collaborator

phadej commented Sep 5, 2017

Note:

```haskell
runScanner :: s -> (s -> Word8 -> Maybe s) -> Parser (ByteString, s)
runScanner = scan_ $ \s xs -> let !sx = concatReverse xs in return (sx, s)
{-# INLINE runScanner #-}
```

@winterland1989
Contributor

From the code you quoted, I don't believe GHC would be able to optimize the boxing of s away, but I'd be happy to be proven wrong.

@angerman
Author

Alright, so...

I went ahead and tried the latest HEAD, and the excessive memory consumption no longer seems to be reproducible. The profile is here: https://gist.github.com/bbd148529fa3e90c4935b371f9ace986

Reverting the commit @winterland1989 mentioned resulted in worse performance.

a3f3260:

```
	Mon Sep 11 20:53 2017 Time and Allocation Profiling Report  (Final)

	   parser +RTS -T -P -hc -RTS 119001.payload

	total time  =        3.25 secs   (3252 ticks @ 1000 us, 1 processor)
	total alloc = 8,571,926,832 bytes  (excludes profiling overheads)

COST CENTRE    MODULE                     SRC                                              %time %alloc  ticks     bytes
jstring_       Data.Aeson.Parser.Internal Data/Aeson/Parser/Internal.hs:(212,35)-(224,33)   90.9   89.3   2956 7658497688
```

a3f3260 with 1bca6a1 reverted:

```
	Mon Sep 11 21:07 2017 Time and Allocation Profiling Report  (Final)

	   parser +RTS -T -P -hc -RTS 119001.payload

	total time  =        4.84 secs   (4841 ticks @ 1000 us, 1 processor)
	total alloc = 15,299,307,624 bytes  (excludes profiling overheads)

COST CENTRE MODULE                     SRC                                              %time %alloc  ticks     bytes
jstring_    Data.Aeson.Parser.Internal Data/Aeson/Parser/Internal.hs:(211,35)-(215,31)   94.0   95.1   4550 14546709880
```

@winterland1989
Contributor

Ah, it seems the Haskell unescaping code and the new pre-scan are OK then.

@phadej
Collaborator

phadej commented Apr 4, 2020

I think this is resolved with recent changes.

@bergmark
Collaborator

bergmark commented Apr 4, 2020

Closing under that assumption; let me know if anything is still problematic!

@bergmark bergmark closed this as completed Apr 4, 2020