
jstring_ being really slow #581

Closed · angerman opened this issue Aug 30, 2017 · 18 comments

@angerman

I've run into a case where we parse JSON documents that contain other serialized JSON documents, which in turn contain HTML and Base64-encoded strings. E.g. a JSON document containing HTML pages as well as base64 data with some additional info:

[{"html": "<html string>", "data": "<base64 string>", ... }]

in a JSON envelope.

[{"date": "...", "user": "...", "payload": "<serialized json>", ... }]

These documents range from 1 MB to 100 MB, and anything above 25 MB is essentially impossible to parse with aeson: it consumes upwards of 10 GB of memory and takes minutes to process. All the ToJSON and FromJSON instances are generated with TH, and profiling shows ~95% of the time being spent in jstring_; enabling cffi seems to make no difference.
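
To make the scenario concrete, here is a minimal sketch of the double-decode shape described above. The record types and field names are hypothetical (and the actual instances are TH-generated, not derived generically):

```haskell
{-# LANGUAGE DeriveGeneric #-}
import           Data.Aeson (FromJSON, eitherDecode, eitherDecodeStrict)
import qualified Data.ByteString.Lazy as BL
import           Data.Text (Text)
import           Data.Text.Encoding (encodeUtf8)
import           GHC.Generics (Generic)

-- Hypothetical record types mirroring the shapes above
-- ("data" is a Haskell keyword, so that field is named base64 here).
data Envelope = Envelope { date :: Text, user :: Text, payload :: Text }
  deriving Generic
data Payload = Payload { html :: Text, base64 :: Text }
  deriving Generic

instance FromJSON Envelope -- generically derived here; TH-generated in the report
instance FromJSON Payload

-- Every character of the inner document goes through jstring_ twice:
-- once as the escaped "payload" string of the envelope, and again
-- when that string is re-parsed as JSON.
decodeAll :: BL.ByteString -> Either String [Payload]
decodeAll bs = do
  envelopes <- eitherDecode bs
  mapM (eitherDecodeStrict . encodeUtf8 . payload) envelopes
```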

@bergmark
Collaborator

bergmark commented Sep 4, 2017

Thank you for the report.

@eskimor, @jprider63, @winterland1989 do you have any ideas about this?

@bergmark
Collaborator

bergmark commented Sep 4, 2017

It may also be interesting to see how this behaves in aeson-1.0.0.0, which is the last release before the new string parser was added.

@phadej
Collaborator

phadej commented Sep 4, 2017

I suspect that since jstring_ parses the contents into a strict ByteString, it's just slow.

@eskimor
Contributor

eskimor commented Sep 4, 2017

I honestly have absolutely no idea, sorry :-(

@winterland1989
Contributor

winterland1989 commented Sep 5, 2017

I reviewed the new code path for jstring_ under MIN_VERSION_ghc_prim(0,3,1), i.e. the one that adds a new state to record whether we need to unescape after the string slice is returned. I suspect runScanner may be the problem: its type is runScanner :: s -> (s -> Word8 -> Maybe s) -> Parser (ByteString, s), and the lazy tuple (ByteString, s) is a very bad sign for thunk leaks.
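
For illustration, the pre-scan in question has roughly this shape (a minimal sketch with made-up names; not the actual aeson source):

```haskell
import           Data.ByteString (ByteString)
import qualified Data.Attoparsec.ByteString as A
import           Data.Word (Word8)

-- State: (was the previous byte a backslash?, any escape seen so far?)
data ScanState = ScanState !Bool !Bool

-- Assumes the opening '"' has already been consumed. Scan up to the
-- closing quote, recording whether any escape occurred, so unescaping
-- can be skipped for escape-free slices.
jsonStringSlice :: A.Parser (ByteString, Bool)
jsonStringSlice = do
  (slice, ScanState _ anyEscape) <- A.runScanner (ScanState False False) step
  _ <- A.word8 34           -- consume the closing '"'
  return (slice, anyEscape)
  where
    step (ScanState True  anyE) _ = Just (ScanState False anyE) -- byte after '\' belongs to the escape
    step (ScanState False anyE) w
      | w == 34   = Nothing                                     -- unescaped '"': stop (not consumed)
      | w == 92   = Just (ScanState True True)                  -- '\' starts an escape
      | otherwise = Just (ScanState False anyE)
```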

So @angerman can you try to revert this commit and try again?

@winterland1989
Contributor

winterland1989 commented Sep 5, 2017

BTW, I understand we want to skip the unnecessary unescaping when using the native Haskell code, but for the FFI version this is really not necessary: the FFI version does the unescaping and the UTF-8→UTF-16 decoding together in one pass, which is more or less what my original PR was about.

Edit: I've now checked the native Haskell unescaping code, and it handles unescaping and decoding in one pass too, which raises the question of why we added the escape-skipping logic in 1bca6a1. The fact that decodeUtf8 in text is implemented in C may have improved some benchmarks after that commit, but it's completely the wrong move both for people who want to use the native Haskell implementation (GHCJS folks) and for those who want to use the fast C FFI version.

@angerman
Author

angerman commented Sep 5, 2017 via email

@phadej
Collaborator

phadej commented Sep 5, 2017

@winterland1989 because the benchmarks showed it does matter: just decoding is significantly faster (#561). text's decodeUtf8 is C code, but that one is general-purpose and used across the whole ecosystem.

@winterland1989
Contributor

winterland1989 commented Sep 5, 2017

@phadej I gave the reason behind the benchmark improvement above; I'm questioning how the choice was made:

  • The whole point of a native Haskell unescape code path is to support GHCJS, and under GHCJS the text package is patched so that decodeUtf8 uses a slow JS version. Trying to use decodeUtf8 to optimize jstring_ won't really work for GHCJS.

  • If people do have FFI available, they'd be better off just enabling the cffi flag (see the sketch after this comment) and using the C unescape code path directly, saving the overhead of runScanner: the FFI unescape code path should perform identically to the C FFI decodeUtf8 code when there are no escaped code points.

IMO we have a very strange default code path: it disables the fast C FFI unescape/decode path, yet uses C FFI decoding combined with a slow runScanner preprocess. This path is of course faster than a pure Haskell one (except under GHCJS), but under what circumstances would we want that combination?
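
For reference, a minimal sketch of enabling the flag mentioned above via cabal (assuming a cabal.project-based setup; adjust to your build tool):

```
-- cabal.project
constraints: aeson +cffi
```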

@phadej
Collaborator

phadej commented Sep 5, 2017

I disagree with "The whole point of a haskell native unescape code path is to support GHCJS."

For me, the native unescape exists because the C version has not been audited to be correct (or at least to produce exactly the same results as the Haskell version, in all environments). See #535.

So there are three use cases:

  • conservative "native" Haskell (secure)
  • completely native Haskell (aka. GHCJS)
  • extremely fast (C)

We could make a change so that under GHCJS the second option is always used. However, is it really faster than decodeUtf8? Shouldn't decodeUtf8 be fixed in GHCJS to be fast, or am I missing something?


Note: runScanner is about as slow as it was before my PR on x86_64. We have to scan the input for \ to find the closing ". My patch also scans for isAscii /\ not . isCtrl. I'd have thought GHCJS should produce a fast-ish loop there as well?

I also wondered whether runScanner is fast or slow (it looks slow), but it's all {-# INLINE #-}, so I don't think we actually see the pair or the Maybe. I didn't check the Core though; I'm a bit tight on time, but I can check that.
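
For anyone wanting to verify, inspecting the simplified Core only takes standard GHC dump flags (the module name here is a placeholder):

```
ghc -O2 -ddump-simpl -dsuppress-all -fforce-recomp YourModule.hs
```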

@phadej
Collaborator

phadej commented Sep 5, 2017

Additional note: I'd like to see a PR adding a note on how to run the benchmarks in GHCJS; otherwise it's really hard to care.

@winterland1989
Contributor

OK then, please do check the Core that runScanner compiles to if you get the time. For the record, the last time I was hacking on aeson I tried runScanner (for the same reason as you) and the Core was quite bad; that's why I chose A.scan to do the preprocessing and merged unescaping and decoding into one pass.
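
For context, the two combinators differ only in whether the final scanner state is returned (signatures as exported by Data.Attoparsec.ByteString):

```haskell
scan       :: s -> (s -> Word8 -> Maybe s) -> Parser ByteString       -- discards the final state
runScanner :: s -> (s -> Word8 -> Maybe s) -> Parser (ByteString, s)  -- also returns it, in a lazy pair
```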

From #535, I don't see any concrete example against the C FFI version, only some vague hypotheses. Actually, I wasn't even aware of the switch; I just missed the discussion. At least from my point of view, the current default code path is just slow.

I made further optimizations in the binary-parsers PR, since in binary-parsers I provide new combinators which can parse a chunk based on splitting, but that patch hasn't been accepted yet. In fact, the only right way to do this IMO is to add a similar parser to attoparsec; otherwise we always have to scan the string twice: once to find the end, and once to unescape/decode.

@phadej
Collaborator

phadej commented Sep 5, 2017

Note:

```haskell
runScanner :: s -> (s -> Word8 -> Maybe s) -> Parser (ByteString, s)
runScanner = scan_ $ \s xs -> let !sx = concatReverse xs in return (sx, s)
{-# INLINE runScanner #-}
```

@winterland1989
Contributor

From the code you quoted, I don't believe GHC would be able to optimize the boxing of s away, but I'd be happy to be proven wrong.

@angerman
Author

Alright, so...

I went ahead and tried the latest HEAD, and the excessive memory consumption no longer seems to be reproducible. The profile is here: https://gist.github.com/bbd148529fa3e90c4935b371f9ace986

Reverting the commit @winterland1989 mentioned resulted in worse performance.

a3f3260:

```
	Mon Sep 11 20:53 2017 Time and Allocation Profiling Report  (Final)

	   parser +RTS -T -P -hc -RTS 119001.payload

	total time  =        3.25 secs   (3252 ticks @ 1000 us, 1 processor)
	total alloc = 8,571,926,832 bytes  (excludes profiling overheads)

COST CENTRE    MODULE                     SRC                                              %time %alloc  ticks     bytes
jstring_       Data.Aeson.Parser.Internal Data/Aeson/Parser/Internal.hs:(212,35)-(224,33)   90.9   89.3   2956 7658497688
```

a3f3260 with 1bca6a1 reverted:

```
	Mon Sep 11 21:07 2017 Time and Allocation Profiling Report  (Final)

	   parser +RTS -T -P -hc -RTS 119001.payload

	total time  =        4.84 secs   (4841 ticks @ 1000 us, 1 processor)
	total alloc = 15,299,307,624 bytes  (excludes profiling overheads)

COST CENTRE MODULE                     SRC                                              %time %alloc  ticks     bytes
jstring_    Data.Aeson.Parser.Internal Data/Aeson/Parser/Internal.hs:(211,35)-(215,31)   94.0   95.1   4550 14546709880
```

@winterland1989
Contributor

Ah, it seems the Haskell unescaping code and the new pre-scan are OK then.

@phadej
Collaborator

phadej commented Apr 4, 2020

I think this is resolved with recent changes.

@bergmark
Collaborator

bergmark commented Apr 4, 2020

Closing under that assumption; let me know if anything is still problematic!

@bergmark bergmark closed this as completed Apr 4, 2020