-
Notifications
You must be signed in to change notification settings - Fork 196
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conduit's internal Pipe is eating my RAM #370
Comments
Thanks for the report. I'm not going to have a chance to look into this yet. A repro that doesn't use |
I've been using conduit-1.2.10 until now, but would like to move to 1.3.1.1, but this bug is holding me back and I don't see how to avoid it. This simple code: {-# OPTIONS_GHC -fno-full-laziness #-}
module Main where
import Data.Conduit (runConduitRes, (.|))
import qualified Data.Conduit.Binary as C
import qualified Data.Conduit.List as C
import System.Directory (getDirectoryContents)
import System.Environment (getArgs)
import qualified Text.XML.Stream.Parse as X
parseXMLFile :: FilePath -> IO ()
parseXMLFile f = do
_ <- runConduitRes (C.sourceFile f
.| X.parseBytes X.def
.| C.sinkNull)
return ()
main :: IO ()
main = do
files <- getArgs
mapM_ parseXMLFile files run with
never even reaches 1MB memory usage on conduit-1.2.10 / xml-conduit-1.5.1, while with conduit-1.3.1 / xml-conduit-1.8.0.1 it goes up to 100MB when I pass it a 23MB xml file. If I pass the same file twice it of course makes sense to retain the result value, but the result should not take 100MB :) The I can avoid the leak with e.g. explicitly |
@unhammer, have you tried applying |
No, I was just applying it to my local project … @zudov Thank you so much, that fixed it =D It now runs in nearly constant memory even in our big pipeline with zero other changes than diff --git a/stack.yaml b/stack.yaml
index abeb271b..f9f511ac 100644
--- a/stack.yaml
+++ b/stack.yaml
@@ -49,6 +49,8 @@
+ghc-options:
+ xml-conduit: -fno-full-laziness
|
@unhammer happy to hear this fixes it for you! (credit goes to folks at Well-Typed for the suggestion for However since the original report we switched to We're now trying to have a small replication without |
@f-f are you using xeno with conduit? I thought there wasn't any conduit-support for xeno yet. Or are you saying the leak is because of a newer GHC? |
Here's an interesting space leak that we have discovered after switching to
lts-11
(andconduit-1.3
).Background
We have a sync job that requests a big XML file over HTTP, parses it into a stream of customer records and then processes them. The processing involves database calls and in some circumstances (when customer record lacks data or are duplicated) HTTP calls with more XML parsing.
Reproducing it
I've made this example that only depends on
conduit
andxml-conduit
.Changes inxml-conduit
don't appear very dangerous after a brief look. I suspect that it's reproducible withoutxml-conduit
, but I might be wrong.For ease of reproduction the
input
here is generated in-place, it can as well be read from network or handle. The input is a stream looking like<ID>0</ID><ID>1</ID><ID>2</ID>...
, it gets parsed into a stream of"0"
,"1"
,"2"
. This is then processed bymapM_
that contains another reference toXML.parseBytes
.Compiling the binary against
conduit-1.3
and running it with+RTS -s -p
gives the following output:The culpit of the leak is in the fact that there's "outer" and "inner" reference to
XML.parseBytes
. The inner reference never gets executed, yet the mere presence of it triggers the issue. Producing the input without xml parsing (Conduit.iterate succ 0 .| Conduit.map show
) fixes the issue. Replacing the inner conduit with justrunConduit $ pure () .| sinkNull
orerror "IMPOSSIBLE HAPPENED"
fixes the issue as well.Trying conduit-1.2
Here's the output of the same code built against
conduit-1.2
Trying to profile
Building it against
conduit-1.3
WITH profiling, makes the problem go away and gives this output (take note of run timing, it's much slower):I haven't yet managed to build the thing with
-prof
, but without havingSCC
s everywhere (since stack is forcing-auto-all
and-auto-cafs
)Attempts to find a fix
Here is a incomplete and fuzzy list of things that I've tried (in combinations as well) and that DIDN't help with the issue:
-O0
-fno-full-laziness
-fno-state-hack
NOINLINE
pragmas, passing dummy arguments to avoid unwantedCAF
sThe biggest suspect that I had is unwanted conduit sharing in CAFs, but so far I couldn't prove that this is the case.
The second big suspicion are the rewrite rules. I haven't yet tried to debug them (e.g. looking at what rules are firing, etc). I've inspected the Core output to ensure that no sharing is happening, but I didn't inspect the changes that happen by enabling/disabling rewrite rules.
The next logical step would be to get reproduce it without
xml-conduit
which would allow to have much better grasp at what's going on.The text was updated successfully, but these errors were encountered: