(>>=) leaks memory #127

wz1000 · 2021-03-22T10:44:37Z

I have the following program:

module Main where
  
import Text.Parsec
import System.Environment
import Control.Monad

main :: IO ()
main = do
  n <- read . head <$> getArgs
  print $ runParser (replicateM_ n $ pure ()) () "test" ""

Running it with +RTS -s(n = 500000), I get:

$ ./leak 500000 +RTS -s                                                                                                                                                                                            Right ()
     212,105,040 bytes allocated in the heap
     275,152,536 bytes copied during GC
      56,962,568 bytes maximum residency (8 sample(s))
       1,384,952 bytes maximum slop
             114 MiB total memory in use (0 MB lost due to fragmentation)

  Productivity  21.8% of total user, 21.7% of total elapsed

With n = 5000000:

$ ./leak 5000000 +RTS -s                                                                                                                                                                                                    Right ()
   2,120,105,552 bytes allocated in the heap
   2,977,111,368 bytes copied during GC
     439,420,112 bytes maximum residency (12 sample(s))
      10,239,688 bytes maximum slop
            1259 MiB total memory in use (0 MB lost due to fragmentation)

  Productivity  19.8% of total user, 19.7% of total elapsed

Clearly it consumes memory at least linearly in n.

To investigate, I used (the as yet unreleased) Info Table Profiling, which resulted in the following graph:

The source locations it points to are all in parserReturn and parserBind.

The first ParseError thunk originates in parserReturn, in the call to unknownError:

parserReturn x
    = ParsecT $ \s _ _ eok _ ->
      eok x s (unknownError s)

The second and third FUN closures originate in parserBind, in particular peok and peerr.

parserBind m k
  = ParsecT $ \s cok cerr eok eerr ->
     ...
        -- empty-ok case for m
        meok x s err =
            let
                -- in these cases, (k x) can return as empty
                pcok = cok
                peok x s err' = eok x s (mergeError err err')
                pcerr = cerr
                peerr err' = eerr (mergeError err err')
            in  unParser (k x) s pcok pcerr peok peerr
     ...
    in unParser m s mcok mcerr meok meerr

If we look at the definition of mergeError and unknownError, we see that unknownError s is a neutral element for mergeError. This means that our program doesn't actually need to allocate all these closures, since unknownError is the only kind of ParseError that is generated by the program. In particular, peok and peerr would be the same as eok and eerr respectively under this identity.

After applying these changes:

diff --git a/src/Text/Parsec/Prim.hs b/src/Text/Parsec/Prim.hs
index 378da8d..75b503c 100644
--- a/src/Text/Parsec/Prim.hs
+++ b/src/Text/Parsec/Prim.hs
@@ -318,7 +318,9 @@ parserBind m k
   = ParsecT $ \s cok cerr eok eerr ->
     let
         -- consumed-okay case for m
-        mcok x s err =
+        mcok x s err
+          | errorIsUnknown err = unParser (k x) s cok cerr cok cerr
+          | otherwise =
             let
                  -- if (k x) consumes, those go straigt up
                  pcok = cok
@@ -335,7 +337,9 @@ parserBind m k
             in  unParser (k x) s pcok pcerr peok peerr
 
         -- empty-ok case for m
-        meok x s err =
+        meok x s err
+          | errorIsUnknown err = unParser (k x) s cok cerr eok eerr
+          | otherwise =
             let
                 -- in these cases, (k x) can return as empty
                 pcok = cok

we get

$ ./leak 5000000 +RTS -s
Right ()
   1,680,105,552 bytes allocated in the heap
         122,976 bytes copied during GC
          63,296 bytes maximum residency (2 sample(s))
          30,728 bytes maximum slop
               2 MiB total memory in use (0 MB lost due to fragmentation)

  Productivity  98.8% of total user, 98.6% of total elapsed

and a flat graph:

The text was updated successfully, but these errors were encountered:

wz1000 mentioned this issue Mar 22, 2021

Plug leak in bind #128

Closed

phadej closed this as completed Nov 19, 2021

silencespeakstruth mentioned this issue Sep 7, 2023

The recent fix of the (>>=) memory leak seems to cause an enormous performance degradation. #171

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

(>>=) leaks memory #127

(>>=) leaks memory #127

wz1000 commented Mar 22, 2021 •

edited

Loading

(>>=) leaks memory #127

(>>=) leaks memory #127

Comments

wz1000 commented Mar 22, 2021 • edited Loading

wz1000 commented Mar 22, 2021 •

edited

Loading