jstring_ being really slow #581
Thank you for the report. @eskimor, @jprider63, @winterland1989 do you have any ideas about this?
It may also be interesting to see how this behaves in aeson-1.0.0.0, which is the last release before the new string parser was added.
I suspect that as
I honestly have absolutely no idea, sorry :-(
I reviewed the new code path for `MIN_VERSION_ghc_prim(0,3,1)`, i.e. adding a new state to record whether we need to unescape after the string slice is returned. I suspect `runScanner` may be the problem: its type is `runScanner :: s -> (s -> Word8 -> Maybe s) -> Parser (ByteString, s)`, and the lazy tuple `(ByteString, s)` is a very bad sign for thunk leaking. So @angerman can you try to revert this commit and try again?
BTW, I understand we want to skip this unnecessary unescaping if we use the Haskell native code, but for the FFI version this is really not necessary: the FFI version does unescaping and UTF-8 to UTF-16 decoding all together in one pass, which is more or less what my original PR is about. Edit: now I checked the Haskell native unescaping code, and it handles unescaping and decoding in one pass too, which raises the question of why we added the skip-unescaping logic in 1bca6a1. The fact
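For readers following along, here is a minimal, hypothetical sketch of the "skip the unescape pass" idea under discussion: if the pre-scan saw no backslash, plain UTF-8 decoding suffices; otherwise the full unescape+decode pass (whether the Haskell one or the C FFI one) runs. The names `decodeSlice`, `unescape`, and `sawBackslash` are invented for illustration and are not aeson's actual API.

```haskell
import qualified Data.ByteString as B
import           Data.Text (Text)
import qualified Data.Text.Encoding as TE

-- 'unescape' stands in for the full one-pass unescape+decode routine
-- (Haskell native or C FFI); 'sawBackslash' is the flag a pre-scan of the
-- slice would produce.
decodeSlice
  :: (B.ByteString -> Either String Text)  -- full unescape+decode pass
  -> Bool                                  -- did the pre-scan see a backslash?
  -> B.ByteString                          -- raw bytes between the quotes
  -> Either String Text
decodeSlice unescape sawBackslash s
  | sawBackslash = unescape s                                     -- escapes present
  | otherwise    = either (Left . show) Right (TE.decodeUtf8' s)  -- plain UTF-8
```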
I fear I won't be able to test this until next Monday. I'm currently at ICFP and do not have the code in question with me :-(
@winterland1989 because the benchmark showed it does matter. Just decoding is significantly faster: #561, the
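A hedged sketch of how one might measure this kind of difference with criterion; the input files and the list-of-strings shape are assumptions for illustration, not what #561 actually benchmarks.

```haskell
import           Criterion.Main (bench, bgroup, defaultMain, nf)
import           Data.Aeson (decode)
import qualified Data.ByteString.Lazy as BL
import           Data.Text (Text)

main :: IO ()
main = do
  -- Two documents of comparable size: one with plain strings, one whose
  -- strings are full of escape sequences (\n, \uXXXX, ...).
  plain   <- BL.readFile "string-no-escapes.json"
  escaped <- BL.readFile "string-escaped.json"
  defaultMain
    [ bgroup "jstring"
        [ bench "no escapes" (nf (decode :: BL.ByteString -> Maybe [Text]) plain)
        , bench "escaped"    (nf (decode :: BL.ByteString -> Maybe [Text]) escaped)
        ]
    ]
```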
@phadej I have given the reason behind the benchmark improvement above; I'm questioning how the choice was made:
IMO we have a very strange default code path: disable the fast C FFI unescape/decode code path, but use C FFI decoding combined with a slow
I disagree with "The whole point of a Haskell native unescape code path is to support GHCJS." For me, the native unescape exists because the C version is not audited to be correct (or at least to produce exactly the same results as the Haskell version, in all environments). See #535. So there are three use cases:
We can make a change, so Note: the I also wondered whether
Additional note: I'd like to see a PR adding a note on how to run the benchmarks in GHCJS; otherwise it's really hard to care.
OK then, please do check the core which
From #535, I'm not seeing any concrete example against the C FFI version, only some vague hypotheses. Actually I'm not even aware of the switch; I just missed the discussion. At least from my point of view, the current default code path is just slow. I made further optimization in the
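In case it helps anyone following the "check the core" suggestion: a small sketch, assuming a plain GHC workflow, of asking GHC to dump simplified Core. Note that to see aeson's own `jstring_` Core you would need to build aeson itself with these flags (for example via ghc-options); dumping Core for a downstream module only shows what gets inlined into that module.

```haskell
-- Standard GHC flags: write the simplified Core of this module to a
-- .dump-simpl file next to the object file, with most noise suppressed.
{-# OPTIONS_GHC -ddump-simpl -dsuppress-all -ddump-to-file #-}
module CoreScratch where

import           Data.Aeson (decode)
import qualified Data.ByteString.Lazy as BL
import           Data.Text (Text)

-- A tiny driver; its dumped Core shows only what GHC inlines here, so it is
-- a starting point rather than a view of aeson's internals.
parseStrings :: BL.ByteString -> Maybe [Text]
parseStrings = decode
```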
Note:

```haskell
runScanner :: s -> (s -> Word8 -> Maybe s) -> Parser (ByteString, s)
runScanner = scan_ $ \s xs -> let !sx = concatReverse xs in return (sx, s)
{-# INLINE runScanner #-}
```
From the code you quote, I don't believe GHC would be able to optimize away the boxing on
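As an illustration of the concern about the lazy pair, here is a hedged sketch of one way to keep the scanner state strict and force it as soon as `runScanner` returns; `ScanState` and `scanStringSlice` are hypothetical names for illustration, not aeson's real code.

```haskell
{-# LANGUAGE BangPatterns #-}
import           Data.Attoparsec.ByteString (Parser)
import qualified Data.Attoparsec.ByteString as A
import           Data.ByteString (ByteString)

-- Scanner state with strict fields, so each step is fully evaluated.
data ScanState = ScanState
  { sawEscape  :: !Bool  -- any backslash seen in the slice?
  , escapeNext :: !Bool  -- is the next byte part of an escape sequence?
  }

-- Scan up to the closing quote, returning the raw slice and whether it
-- needs unescaping at all.
scanStringSlice :: Parser (ByteString, Bool)
scanStringSlice = do
  (slice, st) <- A.runScanner (ScanState False False) step
  _ <- A.word8 0x22                  -- consume the terminating '"'
  let !needsUnescape = sawEscape st  -- force the state out of the lazy pair
  return (slice, needsUnescape)
  where
    step (ScanState esc escNext) w
      | escNext   = Just $! ScanState esc False  -- byte after '\' is escaped
      | w == 0x5C = Just $! ScanState True True  -- '\' starts an escape
      | w == 0x22 = Nothing                      -- unescaped '"' ends the slice
      | otherwise = Just $! ScanState esc False
```

The strict fields, the `$!` on every step, and the bang on `needsUnescape` are the point: no per-byte thunks survive into the returned pair.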
Alright, so... I went ahead and tried the latest HEAD, and the excessive memory consumption seems not to be reproducible. The profile is here: https://gist.github.com/bbd148529fa3e90c4935b371f9ace986
Reverting the commit @winterland1989 mentioned resulted in worse performance.
a3f3260 with 1bca6a1 reverted:
Ah, it seems the Haskell unescaping code and the new pre-scan are OK then.
I think this is resolved with recent changes. |
Closing under that assumption; let me know if anything is still problematic!
I've run into the case where we try to parse JSON documents containing other serialized JSON documents, which themselves contain HTML and Base64-encoded strings. E.g. a JSON document containing HTML pages as well as base64 data with some additional info, in a JSON envelope.

These documents vary from 1 MB to 100 MB, and anything above 25 MB ends up being essentially impossible to parse with aeson, as it consumes upwards of 10 GB of memory and takes minutes to process. All the `ToJSON` and `FromJSON` instances are generated with TH, and the profiling shows ~95% of the time to be spent in `jstring_`. Enabling `cffi` seems to make no difference.
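For context, a minimal sketch of the kind of setup described here; the type and field names are invented, not taken from the real project, and the input file is hypothetical.

```haskell
{-# LANGUAGE TemplateHaskell #-}
module Main where

import           Data.Aeson (eitherDecode)
import           Data.Aeson.TH (defaultOptions, deriveJSON)
import qualified Data.ByteString.Lazy as BL
import           Data.Text (Text)
import qualified Data.Text as T

-- A made-up envelope type: each field holds a large embedded string
-- (a serialized JSON document, an HTML page, base64 data).
data Envelope = Envelope
  { payloadJson :: Text
  , htmlPage    :: Text
  , blobBase64  :: Text
  }

-- ToJSON/FromJSON instances generated with Template Haskell.
$(deriveJSON defaultOptions ''Envelope)

main :: IO ()
main = do
  bytes <- BL.readFile "large-envelope.json"   -- e.g. a 25 MB+ document
  case eitherDecode bytes :: Either String Envelope of
    Left err -> putStrLn ("parse failed: " ++ err)
    Right e  -> putStrLn ("embedded JSON field has "
                          ++ show (T.length (payloadJson e)) ++ " characters")
```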