
Too much memory allocation/poor performance in long run tasty execution #917

Closed
jneira opened this issue Dec 21, 2018 · 17 comments

@jneira
Collaborator

jneira commented Dec 21, 2018

When running tests or benchmarks, eta executables take excessive heap memory.

Description

Detected when running the dhall-eta test suite on Windows and CircleCI (see #915 (comment)).

Expected Behavior

The process should use less memory (not sure exactly how much).

Actual Behavior

The process takes up to 2.5 GB.

Possible Fix

See @rahulmutt's comment: #915 (comment)

Steps to Reproduce

  1. Run the dhall-eta test suite locally or on CircleCI

Context

Setting up the test suite for dhall-eta.

Your Environment

@jneira
Collaborator Author

jneira commented Jan 3, 2019

Hi, I've collected some snapshots made with jvisualvm (dhall.eta.tasty.profiles.zip):

  • CPU profiling
    • dhall.eta.tasty.nps with output in the console
    • dhall.eta.tasty.pipe.to.file.nps with the output redirected to a file
  • dhall.eta.tasty.heap.nps: memory profiling

Taking a quick look:

  • The difference between the console and file output is dramatic. When redirecting, a lot of threads are spawned and the execution is much quicker. With console output, only one thread is used.
  • In the memory snapshot, the class with the most allocation is eta.runtime.thunk.SelectorPUpd.

jneira changed the title from "Too much memory allocation in long run eta processes" to "Too much memory allocation/poor performance in long run tasty execution" on Jan 3, 2019
@jneira
Collaborator Author

jneira commented Jan 3, 2019

Passing --hide-successes to the tasty execution makes the test suite run way faster: https://circleci.com/gh/eta-lang/dhall-eta/75

@rahulmutt
Member

Thanks for the observation. That surely means there's a memory leak to investigate here since holding on to less info made it run faster.

@jneira
Collaborator Author

jneira commented Jan 3, 2019

Yeah, although memory usage and GC overhead are similar between console output and redirecting to a file. Not sure if there is a memory leak, because once the process takes the maximum memory possible, the usage is pretty stable.

[screenshot: VisualVM memory monitor]

dhall.eta.tasty.pipe.to.file.heap.nps.zip

@rahulmutt
Member

I have a suspicion that this has to do with native memory allocation and not the heap memory. Can you check that as well? I think the MemoryManager isn't freeing as often or as well as it should, leading to the native heap growing endlessly.
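
For what it's worth, one generic way to watch the off-heap side with plain JVM APIs is the BufferPoolMXBean, which reports the direct and mapped buffer pools. This is only a rough sketch, under the assumption that the MemoryManager allocates through NIO direct buffers; DirectMemoryProbe is just an illustrative name:

    import java.lang.management.BufferPoolMXBean;
    import java.lang.management.ManagementFactory;
    import java.util.List;

    public class DirectMemoryProbe {
        public static void main(String[] args) throws InterruptedException {
            List<BufferPoolMXBean> pools =
                ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class);
            // Print the "direct" and "mapped" buffer pool usage every 5 seconds.
            while (true) {
                for (BufferPoolMXBean pool : pools) {
                    System.out.printf("%s: count=%d, used=%d bytes, capacity=%d bytes%n",
                        pool.getName(), pool.getCount(),
                        pool.getMemoryUsed(), pool.getTotalCapacity());
                }
                Thread.sleep(5_000);
            }
        }
    }

Running something like that alongside the test suite would show whether the direct pool keeps growing while the Java heap stays flat.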

@jneira
Collaborator Author

jneira commented Jan 3, 2019

In fact, not all console executions are equal; I've taken another one and it was similar to the redirected one 🤔

[screenshot: VisualVM monitor, console output run]

I've monitored native memory, taking some samples as suggested in https://stackoverflow.com/a/30941584/49554:

native.txt

@jneira
Collaborator Author

jneira commented Jan 3, 2019

Another file with native memory samples including time:
native.txt

@rahulmutt
Member

Hmm, well, I was wrong about that - it looks like the native memory usage increases very gradually and in amounts < 1 MB. By the way, you can view native memory in VisualVM by enabling the "VisualVM-BufferMonitor" plugin.

@jneira
Collaborator Author

jneira commented Jan 3, 2019

Wow, thanks for the tip

@nightscape

nightscape commented Jan 3, 2019

Another thing that might be interesting is whether the JVM is taking the memory just because it can, or because it really needs it.
You could test that by pressing the "Perform GC" button when it reaches the peak and checking how much it drops.
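
The same check can be done without the UI, using only standard JVM calls; a minimal sketch (GcProbe is just an illustrative name):

    public class GcProbe {
        // Request a GC and report how much of the used heap was actually reclaimable.
        public static void main(String[] args) {
            Runtime rt = Runtime.getRuntime();
            long usedBefore = rt.totalMemory() - rt.freeMemory();
            System.gc(); // only a request, but HotSpot normally honours it
            long usedAfter = rt.totalMemory() - rt.freeMemory();
            System.out.printf("used before GC: %d MB, after GC: %d MB, freed: %d MB%n",
                usedBefore >> 20, usedAfter >> 20, (usedBefore - usedAfter) >> 20);
        }
    }

If very little is freed at the peak, the live set really is that large; if a lot is freed, the JVM was just holding on to memory because it could.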

@nightscape

Another helpful tool is Eclipse MAT.
MAT operates on JVM memory dumps and you can do all sorts of analyses, e.g. find out which object types consume how much memory, find out by which instances another instance is referenced, etc.
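
If attaching a profiler is inconvenient, a heap dump for MAT can also be triggered programmatically through the HotSpot diagnostic MXBean; a minimal sketch (the output file name is just an example):

    import com.sun.management.HotSpotDiagnosticMXBean;
    import java.lang.management.ManagementFactory;

    public class HeapDumper {
        // Writes an .hprof heap dump that Eclipse MAT can open.
        public static void main(String[] args) throws Exception {
            HotSpotDiagnosticMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
                ManagementFactory.getPlatformMBeanServer(),
                "com.sun.management:type=HotSpotDiagnostic",
                HotSpotDiagnosticMXBean.class);
            // true = dump only live objects (forces a GC first)
            bean.dumpHeap("dhall-eta.hprof", true);
        }
    }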

@rahulmutt
Member

rahulmutt commented Jan 3, 2019

@jneira If the class with the most allocated instances you see is eta.runtime.thunk.SelectorPUpd, then this could mean it's an issue of the Eta runtime's lack of selector thunk optimization.

We probably need to implement this:
#517

A simple way to implement it is to spawn a thread when the runtime system initializes and just have it traverse the weak references to the selector thunks periodically to see if they can be reduced.

More details on how this leak occurs here:
https://homepages.inf.ed.ac.uk/wadler/papers/leak/leak.ps.gz
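
Roughly, that proposal could look something like the sketch below; SelectorThunkScavenger, SelectorThunk, registerSelector and tryReduce are hypothetical names for illustration, not the actual Eta runtime API:

    import java.lang.ref.WeakReference;
    import java.util.Iterator;
    import java.util.Queue;
    import java.util.concurrent.ConcurrentLinkedQueue;

    // Hypothetical sketch: a daemon thread started when the runtime initializes
    // that periodically walks weak references to live selector thunks and tries
    // to reduce them, so they stop retaining the structure they select from.
    final class SelectorThunkScavenger implements Runnable {

        // The runtime would register every selector thunk it allocates here.
        static final Queue<WeakReference<SelectorThunk>> selectors =
            new ConcurrentLinkedQueue<>();

        static void registerSelector(SelectorThunk t) {
            selectors.add(new WeakReference<>(t));
        }

        @Override
        public void run() {
            while (true) {
                Iterator<WeakReference<SelectorThunk>> it = selectors.iterator();
                while (it.hasNext()) {
                    SelectorThunk t = it.next().get();
                    if (t == null) {
                        it.remove();          // thunk already collected
                    } else if (t.tryReduce()) {
                        it.remove();          // selected value extracted; drop it
                    }
                }
                try {
                    Thread.sleep(1_000);      // scan roughly once per second
                } catch (InterruptedException e) {
                    return;
                }
            }
        }

        // Stand-in for the runtime's selector thunk type.
        interface SelectorThunk {
            boolean tryReduce();
        }
    }

Something like new Thread(new SelectorThunkScavenger()), marked as a daemon, would then be started during runtime system initialization.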

@jneira
Collaborator Author

jneira commented Jan 4, 2019

@nightscape thanks for the tip! I'm afraid that doing a GC does not free any significant memory, so the 1500 MB max seems to be needed.

@rahulmutt
Member

Some progress updates:

  • I've implemented a basic form of selector thunk optimization via StgContext-local weak references. The solution doesn't involve multiple threads and automatically bounds the number of weak references created to avoid causing extra GC overhead. It appears to show better memory characteristics than before, but it can still be better. The next step is to short out thunk indirections to let go of even more memory.

    I've been using this code to test the optimization (inspired by the Wadler paper):

    import System.IO
    import System.Directory
    import Data.Function
    
    main :: IO ()
    main = do
      let file = "hello"
          file2 = "hello2"
      contents <- readFile file
      -- 'before' and 'after' are selector thunks over the pair returned by 'break';
      -- without selector thunk optimization, whichever selector is forced later
      -- keeps the whole pair (and hence the file contents) alive.
      let insertb xs = before ++ "b" ++ after
            where (before, after) = break (== 'b') xs
      writeFile file2 (insertb contents)
      removeFile file2

    Where hello is a file with a large number of characters other than 'b'.

  • I'm also going to implement general thunk clearing that is thread-safe so that I can re-enable it by default. Without thunk clearing, severe space leaks can happen so it is absolutely essential that it be done. It can be enabled even now with -Deta.rts.clearThunks=true and in fact I had to do so to even verify that the selector thunk optimization was working.

It will probably take a couple more days to implement what I mentioned above.

@rahulmutt
Member

@jneira I've implemented both selector thunk optimization and re-enabled thunk clearing because it is now thread-safe (verified by running eta-benchmarks, which failed with spurious NPEs before because of thunk clearing and now runs smoothly).

Wait until the Docker image for the current master is ready, then go ahead and re-run the CircleCI build for dhall-eta to see how it fares.

@jneira
Collaborator Author

jneira commented Jan 10, 2019

On the bright side, the local execution showed a simply amazing improvement in both memory and time. Fantastic work @rahulmutt:

[screenshot: local run showing improved memory usage and execution time]

So the main goal has been achieved!

But I'm afraid the CircleCI build hangs anyway, so maybe that is caused by something else. In my Windows test, the openBinaryFile: resource busy (file is locked) error persists.

@jneira
Collaborator Author

jneira commented Jan 10, 2019

I am going to close this one because the memory allocation issue is resolved.

jneira closed this as completed Jan 10, 2019