-
Notifications
You must be signed in to change notification settings - Fork 95
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
The recent fix of the (>>=)
memory leak seems to cause an enormous performance degradation.
#171
Comments
ping @wz1000. Now I regret that I merged #128 without requiring a benchmark TBH, I could undo #127 fix, as its motivation is replicating pure parser, i.e. somewhat contrived. This is more substantial. But either way, there should be a benchmark to verify that the change works (e.g. across different GHC versions etc. if it's an issue on all of them to begin with) |
@phadej, thanks for a quick reply. Following your questions: From what There were different benchmarks. In the beginning, the slowdown was noticed for the unit and integration tests we have. There are many of these, so I narrowed the problem down to a few. For quite some time I investigated the problem based on them. Later on another - much more minimized and precise - benchmark was created, but it uses the In essence, every benchmark is just a call to the Both GHC 9.0.2 and 9.2.8 were involved in benchmarking. The issue seems to disappear once you revert the #127 and its fix (#128). The other possible solution is to forcibly keep an old |
A small(ish) reproducer would be good to have.
λ> import GHC.Utils.Encoding
λ> zDecodeString "ghczmprim_GHCziClasses_zdfOrdZMZNzuzdszdc"
"ghc-prim_GHC.Classes_$fOrd[]_$s$c" I haven't tested but this patch might help: diff --git a/src/Text/Parsec/Error.hs b/src/Text/Parsec/Error.hs
index 9cf582a..ce0f0b5 100644
--- a/src/Text/Parsec/Error.hs
+++ b/src/Text/Parsec/Error.hs
@@ -142,8 +142,8 @@ setErrorMessage msg (ParseError pos msgs)
mergeError :: ParseError -> ParseError -> ParseError
mergeError e1@(ParseError pos1 msgs1) e2@(ParseError pos2 msgs2)
-- prefer meaningful errors
- | null msgs2 && not (null msgs1) = e1
- | null msgs1 && not (null msgs2) = e2
+ | null msgs1 = e2
+ | null msgs2 = e1
| otherwise
= case pos1 `compare` pos2 of
-- select the longest match |
We need a small reproducer, I want make another mistake of accepting a performance related patch without a benchmark. Sorry. |
I don't really rely on parsec critically for any performance related code, I put up this diff as well as the original patch to be helpful to others. I did not really need it to be merged. For the record, the leak in #127 was affecting non-trivial |
Interesting! type SourceName = String
type Line = Int
type Column = Int
data SourcePos = SourcePos SourceName !Line !Column
deriving ( Eq, Ord, Data, Typeable) That stock-derived |
Yes, I was going to try to use different comparator in |
Hi all. I'm collaborating with @silencespeakstruth on the same issue. We're still working out a shareable reproducer; sorry for starting the discussion somewhat "prematurely". @wz1000 I tested the patch — doesn't seem to make any difference (both @parsonsmatt good point, I tried this: instance Ord SourcePos where
compare (SourcePos snA lnA colA) (SourcePos snB lnB colB)
= lnA `compare` lnB <> colA `compare` colB <> snA `compare` snB — it does give an improvement in allocations and a few % speedup. Looks like a sensible improvement in general. However, we're measuring 100⨯ slowdown of parsec-3.15 relative to parsec-3.14 — in a use-case involving big-ish inputs to yesodweb/shakespeare (as an overall context; didn't see this clearly stated above). Obviously, a couple % on the background of 100⨯ feels far from addressing the root cause — which we haven't yet pinned down, neither differentiated from a deficiency in shakespeare's parser. Question 1. Any good reason I'm missing for the "empty ok" continuation in This passes all tests of parsec and shakespeare, and cuts down the calls to -- empty-ok case for m
- meok x s err
- | errorIsUnknown err = unParser (k x) s cok cerr eok eerr
- | otherwise =
- let
- -- in these cases, (k x) can return as empty
- pcok = cok
- peok x s err' = eok x s (mergeError err err')
- pcerr = cerr
- peerr err' = eerr (mergeError err err')
- in unParser (k x) s pcok pcerr peok peerr
+ meok x s _err = unParser (k x) s cok cerr eok eerr
+
-- consumed-error case for m
mcerr = cerr Can a parser's "empty ok" ever be called with Question 2. The parsec/src/Text/Parsec/Prim.hs Lines 316 to 318 in c5add8b
The function isn't "small". Perhaps INLINABLE ?
Question 3. This implementation for parsec/src/Text/Parsec/Prim.hs Lines 713 to 716 in c5add8b
I understand it may well predate |
Bind not inlining usually pessimises the demand analyser quite a bit for monadic code, so I would be wary of such a change. Does removing the inline pragma help with your reproducer? |
... not really 😅 With results cherry-picking, there's maybe a 20ms speedup without the The biggest-impact tweak I tested is the |
It's "correct". What's wrong is that The |
@phadej sure, understood. FWIW, I'm measuring no difference in runtime between this |
We've got an approve to share the reproducer. Steps: git clone --branch bench/parsec-3.15 https://github.com/zoominsoftware/shakespeare
cd shakespeare
cabal bench On my machine, that yields:
As a baseline, this same benchmark without #127:
I'm using Local GHC version: 9.2.7 |
That is not a small reproducer. Please try to minimise. In fact I was mistaken, (I'm sincerelly sorry @wz1000, I forgot I added the test-suite), there is a test for #127 in I.e. today if I run
But if b00129b is reverted, then
i.e. a memory leak. So I don't want to decide which is more important. I'd like to fix both. But to do that, I need a small reproducer for this issue. |
Here is a simple reproducer: import Data.Semigroup
import System.CPUTime
import Text.ParserCombinators.Parsec
main :: IO ()
main = do
time0 <- getCPUTime
check $ stimes 10000 "a "
time1 <- getCPUTime
print $ (time1 - time0) `div` 1000000000
parser :: Parser [String]
parser = many (char 'a' <|> char 'b') `sepBy` char ' '
check :: String -> IO ()
check s = putStrLn $ either show (const "") $ parse parser {- important: pass input as SourceName -} s s Takes 20 ms with parsec 3.1.14.0 and 1180ms with parsec 3.1.16.0 |
Just to be clear, I'm not arguing to revert b00129b. The fix by @wz1000 is entirely sensible; the only reason it's mentioned at all — is this empirical case, where reverting it somehow appears to fix a 100x slowdown. ... Minimized. IMO this can get controversial: As also shown by the previous commenter. 👀 This should make sense: the |
See issue haskell#171. Co-Authored-By: Matt Parsons <parsonsmatt@gmail.com>
Somehow I believe that hashing the My standpoint is that whatever Shakespeare has been doing wrong all these ~10 years, it worked just fine and thus Shakespeare has to be ignored in this discussion; it's a separate unrelated issue and to be fixed accordingly. @phadej, can you please confirm that the reproducer made by @aadaa-fgtaa satisfies the expectations? |
I somehow doubt we need a cryptologically hard hash for |
For posterity, shakespeare-2.1.0.1 solves the regression for us. yesodweb/shakespeare#277 Though of course, more users of parsec may be affected. I'm still failing to understand 2 questions. One, all that Why not, perhaps, this? @ src/Text/Parsec/Error.hs:147 @ mergeError e1@(ParseError pos1 msgs1) e2@(ParseError pos2 msgs2)
-- prefer meaningful errors
| null msgs2 && not (null msgs1) = e1
| null msgs1 && not (null msgs2) = e2
+ | pos1 == pos2 = ParseError pos1 (msgs1 ++ msgs2)
| otherwise
- = case pos1 `compare` pos2 of
+ = case length msgs1 `compare` length msgs2 of
-- select the longest match
EQ -> ParseError pos1 (msgs1 ++ msgs2)
GT -> e1
Second, (I asked this above but didn't get any reply), the empty ok continuation (3rd argument to This'd mean that it is never actually called with any errors. Why bother with merging them then, why not this? -- empty-ok case for m
- meok x s err
- | errorIsUnknown err = unParser (k x) s cok cerr eok eerr
- | otherwise =
- let
- -- in these cases, (k x) can return as empty
- pcok = cok
- peok x s err' = eok x s (mergeError err err')
- pcerr = cerr
- peerr err' = eerr (mergeError err err')
- in unParser (k x) s pcok pcerr peok peerr
+ meok x s _err = unParser (k x) s cok cerr eok eerr
+
-- consumed-error case for m
mcerr = cerr This passes all tests. And improves performance for the pathological "40kiB filename" users. |
You can tell shakespear maintainers that they can revert the fix if they don't like it, as Please give people time.
Incorrect. About longest parse. |
@phadej ok. Then the |
That's wrong analysis too. Nothing wrong with |
To whom it may concern...
Short story
An attempt to migrate our production to the LTS-20.26 (GHC 9.2.8) accidentally discovered a significant performance downfall.
Without the #127:
...and with it:
Based on these (and many others, shared down below) observations, I tend to conclude that #127 performs about ~100 times slower.
Long story and more fruitful pieces of evidence
Once unwanted slowness was spotted, a drill-down analysis was carried out. Below I am going to describe every step taken and its outcome.
Step 1:
perf
profilingperf record -F 33 -g -e cycles:u -- $EXE +RTS -g2 -RTS
was used to profile the executable ($EXE
) built with the-O 2
flag to ensure maximum optimization. Both profiles then were compared against each other. The following was observed for the slowed-down version of the code:Cleary,
ghczmprim_GHCziClasses_zdfOrdZMZNzuzdszdc
stands forinstance Ord
implementation.98,53%
stands for "98,54% of all the userspace cycles were spent executing theghczmprim_GHCziClasses_zdfOrdZMZNzuzdszdc
". That directed the investigation towardsRTS
-based flag instrumentation in order to be able to get a deeper insight into where these comparisons come from.Step 2:
RTS
flags$EXE +RTS -P -RTS
produced a Haskell-native runtime profile, which then was visualized with a flame graph using theghc-prof-flamegraph
command line utility.Comparing the flame graphs clearly highlighted that the
parseLines :: HamletSettings -> String -> Result (Maybe NewlineStyle, HamletSettings, [(Int, Line)])
started to eat up significantly more CPU time 1.That pointed here (to the
Parsec
itself) and its changelog, where #127 popped up as a potential root cause. Reverting the #128 only (which fixed the #127) indeed rollbacks performance back to the expected measure. Trying to figure out what's happening it was decided to dump GHC intermediate representation.Step 3:
GHC
intermediate analysisThe problematic piece of code was built with the
-ddump-simpl -dsuppress-all -dno-suppress-type-signatures
and those were compared. It turned out that the misperforming version generates more code of the following pattern:and the
$fOrd[]_$s$ccompare1
method called about 8 times more often. Unfortunately, I lack knowledge of Haskell internals and thus am unable to analyze the intermediate representation further.The conclusion
To my understanding, these 3 steps advocate #127 to be the root cause. Yet I fail to explain how could that happen. As of now, a hacky hotfix allows us to migrate to the
GHC 9.2.8
avoiding the regression, however, I would like to open a discussion on how one could possibly fix it. The open questions are:-ddump-simpl -dsuppress-all -dno-suppress-type-signatures
flags and can it help to make the issues even more concrete?P.S.
At the moment the code involved in instrumentation lies under an NDA. Thus, I do not share it intentionally due to legal restrictions. I hope the observations above will be sufficient for this discussion to progress.
Footnotes
The
shakespeare-2.1.0
is running in the production. ↩The text was updated successfully, but these errors were encountered: