Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(>>=) leaks memory #127

Closed
wz1000 opened this issue Mar 22, 2021 · 0 comments
Closed

(>>=) leaks memory #127

wz1000 opened this issue Mar 22, 2021 · 0 comments

Comments

@wz1000
Copy link
Contributor

wz1000 commented Mar 22, 2021

I have the following program:

module Main where
  
import Text.Parsec
import System.Environment
import Control.Monad

main :: IO ()
main = do
  n <- read . head <$> getArgs
  print $ runParser (replicateM_ n $ pure ()) () "test" ""

Running it with +RTS -s(n = 500000), I get:

$ ./leak 500000 +RTS -s                                                                                                                                                                                            Right ()
     212,105,040 bytes allocated in the heap
     275,152,536 bytes copied during GC
      56,962,568 bytes maximum residency (8 sample(s))
       1,384,952 bytes maximum slop
             114 MiB total memory in use (0 MB lost due to fragmentation)

  Productivity  21.8% of total user, 21.7% of total elapsed

With n = 5000000:

$ ./leak 5000000 +RTS -s                                                                                                                                                                                                    Right ()
   2,120,105,552 bytes allocated in the heap
   2,977,111,368 bytes copied during GC
     439,420,112 bytes maximum residency (12 sample(s))
      10,239,688 bytes maximum slop
            1259 MiB total memory in use (0 MB lost due to fragmentation)

  Productivity  19.8% of total user, 19.7% of total elapsed

Clearly it consumes memory at least linearly in n.

To investigate, I used (the as yet unreleased) Info Table Profiling, which resulted in the following graph:

2021-03-22-150349_grab

The source locations it points to are all in parserReturn and parserBind.

The first ParseError thunk originates in parserReturn, in the call to unknownError:

parserReturn x
    = ParsecT $ \s _ _ eok _ ->
      eok x s (unknownError s)

The second and third FUN closures originate in parserBind, in particular peok and peerr.

parserBind m k
  = ParsecT $ \s cok cerr eok eerr ->
     ...
        -- empty-ok case for m
        meok x s err =
            let
                -- in these cases, (k x) can return as empty
                pcok = cok
                peok x s err' = eok x s (mergeError err err')
                pcerr = cerr
                peerr err' = eerr (mergeError err err')
            in  unParser (k x) s pcok pcerr peok peerr
     ...
    in unParser m s mcok mcerr meok meerr

If we look at the definition of mergeError and unknownError, we see that unknownError s is a neutral element for mergeError. This means that our program doesn't actually need to allocate all these closures, since unknownError is the only kind of ParseError that is generated by the program. In particular, peok and peerr would be the same as eok and eerr respectively under this identity.

After applying these changes:

diff --git a/src/Text/Parsec/Prim.hs b/src/Text/Parsec/Prim.hs
index 378da8d..75b503c 100644
--- a/src/Text/Parsec/Prim.hs
+++ b/src/Text/Parsec/Prim.hs
@@ -318,7 +318,9 @@ parserBind m k
   = ParsecT $ \s cok cerr eok eerr ->
     let
         -- consumed-okay case for m
-        mcok x s err =
+        mcok x s err
+          | errorIsUnknown err = unParser (k x) s cok cerr cok cerr
+          | otherwise =
             let
                  -- if (k x) consumes, those go straigt up
                  pcok = cok
@@ -335,7 +337,9 @@ parserBind m k
             in  unParser (k x) s pcok pcerr peok peerr
 
         -- empty-ok case for m
-        meok x s err =
+        meok x s err
+          | errorIsUnknown err = unParser (k x) s cok cerr eok eerr
+          | otherwise =
             let
                 -- in these cases, (k x) can return as empty
                 pcok = cok

we get

$ ./leak 5000000 +RTS -s
Right ()
   1,680,105,552 bytes allocated in the heap
         122,976 bytes copied during GC
          63,296 bytes maximum residency (2 sample(s))
          30,728 bytes maximum slop
               2 MiB total memory in use (0 MB lost due to fragmentation)

  Productivity  98.8% of total user, 98.6% of total elapsed

and a flat graph:

2021-03-22-160340_grab

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants