Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using Data.Text.IO.readFile throws intermitent java.nio.channels.ClosedByInterruptException #915

Closed
jneira opened this issue Dec 12, 2018 · 21 comments

Comments

@jneira
Copy link
Collaborator

jneira commented Dec 12, 2018

Description

dhall-eta tests reads lot of dhall files using Data.Text.IO.readFile and some of the calls throws java.nio.channels.ClosedByInterruptException

Expected Behavior

All files should be readed correctly

Actual Behavior

Some of the calls throws:

normalization/success/prelude/Natural/even/0A.dhall:          FAIL
      Exception: 
      ↳ ./../dhall-lang/Prelude/package.dhall
      JException java.nio.channels.ClosedByInterruptException

In windows i get occasionally

      Exception:
      ↳ ./../dhall-lang/Prelude/package.dhall
        ↳ ./../dhall-lang/Prelude/Integer/package.dhall
          ↳ ./../dhall-lang/Prelude/Integer/toDouble
      D:\ws\eta\dhall\dhall-eta\..\dhall-lang\Prelude\Integer\toDouble: openFile: resource busy (file is locked)

Possible Fix

Not sure where the bug can be fixed.. to investigate

Steps to Reproduce

  1. Trigger a circleci test build for dhall-eta like https://circleci.com/gh/eta-lang/dhall-eta/13
  2. Or checkout dhall-eta and run etlas test in local

Context

Setup test suite for dhall-eta

Your Environment

@rahulmutt
Copy link
Member

I've observed similar spurious exceptions when running a load test against any Eta web framework. This is most likely due to the Eta RTS's usage of Thread.interrupt(). Recently, I recall changing a bunch of places in the code to use a volatile boolean instead of the interrupt flag of the thread. It may be that there are still places where it's still using the old interrupt mechanism.

@rahulmutt
Copy link
Member

When reproducing on my system, I got the following:

    Should import relative
      Importing fieldOrderA:                                      FAIL
        Exception: user error (The import failed but it had to success. Dhall exception: 
        ↳ ./../dhall-lang/tests/import/data/fieldOrder/1.dhall sha256:4a7866b88389e18cf481b525544fd7903325252faf3a86c8fdc981298c788a9b
        JException java.nio.channels.ClosedByInterruptException)
      Importing issue553B:                                        FAIL
        Exception: user error (The import failed but it had to success. Dhall exception: 
        Error: Import integrity check failed
        
        Expected hash:
        
        ↳ e2d014696fb7d773727ae5aa42dc20bbd2447ea82bcb5971ccbb7763906edace
        
        Actual hash:
        
        ↳ 6fb582c043889dd5a4c97176f8a58d2633252b5374cb71e288b93bc3757f9643
        )

and

    Should import
      Importing asTextA:                                          FAIL
        Exception: user error (The import failed but it had to success.)

and

   Should NOT parse
      Parsing annotation:                                         FAIL
        Exception: user error (The parsing succeded but it had to fail)
      Parsing mandatoryNewline:                                   FAIL
        Exception: user error (The parsing succeded but it had to fail)
      Parsing missingSpace:                                       FAIL
        Exception: user error (The parsing succeded but it had to fail)
      Parsing spaceAfterListAppend:                               FAIL
        Exception: user error (The parsing succeded but it had to fail)

@rahulmutt
Copy link
Member

I have verified that the above test results are reproducible. I'll look into them one by one.

@jneira
Copy link
Collaborator Author

jneira commented Dec 18, 2018

Yeah, i've got those result too:

@rahulmutt
Copy link
Member

Ok to be clear, which tests are a problem with Eta itself?

@jneira
Copy link
Collaborator Author

jneira commented Dec 18, 2018

For now errors caused by java.nio.channels.ClosedByInterruptException's in all envs and openFile: resource busy (file is locked) in windows

@rahulmutt
Copy link
Member

Tracked down the exception and pushed a fix to master. Couldn't reproduce the resource busy exception - probably has to do with file handles not being closed properly. Will need to try it out on a windows system.

@jneira
Copy link
Collaborator Author

jneira commented Dec 18, 2018

Great! I am afraid that the test suite now hangs, only in circleci https://circleci.com/gh/eta-lang/dhall-eta/29 😟

@rahulmutt
Copy link
Member

Can you re-run it with SSH and share the output of jstack?

@jneira
Copy link
Collaborator Author

jneira commented Dec 19, 2018

It seems jstack doesnt work:

root@612f58a556a0:~# jps -l
16929 sun.tools.jps.Jps
13618 eta.main
root@612f58a556a0:~# jstack 13618
13618: Unable to open socket file: target process not responding or HotSpot VM not loaded
The -F option can be used when the target process is not responding
root@612f58a556a0:~# jstack 13618 -F
Attaching to core -F from executable 13618, please wait...
Error attaching to core file: cannot open binary file
sun.jvm.hotspot.debugger.DebuggerException: cannot open binary file
        at sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal.attach0(Native Method)
        at sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal.attach(LinuxDebuggerLocal.java:286)
        at sun.jvm.hotspot.HotSpotAgent.attachDebugger(HotSpotAgent.java:673)
        at sun.jvm.hotspot.HotSpotAgent.setupDebuggerLinux(HotSpotAgent.java:611)
        at sun.jvm.hotspot.HotSpotAgent.setupDebugger(HotSpotAgent.java:337)
        at sun.jvm.hotspot.HotSpotAgent.go(HotSpotAgent.java:304)
        at sun.jvm.hotspot.HotSpotAgent.attach(HotSpotAgent.java:156)
        at sun.jvm.hotspot.tools.Tool.start(Tool.java:191)
        at sun.jvm.hotspot.tools.Tool.execute(Tool.java:118)
        at sun.jvm.hotspot.tools.JStack.main(JStack.java:92)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at sun.tools.jstack.JStack.runJStackTool(JStack.java:140)
        at sun.tools.jstack.JStack.main(JStack.java:106)

Same error after su root

@jneira
Copy link
Collaborator Author

jneira commented Dec 19, 2018

Strangely i can see dhall cache files produced by tests i cant see in the log 🤔

root@612f58a556a0:~# ls .cache/dhall
47a62403bcd28107cb0a99b05fc4d8bc73f06277c422d059ba9efdd50307abbc
cf1692707b5330f532768bcbaa7795ee56bf8b74431d063abb8435df77825015
d36b3384c6e0dc1837f5740b14eb20b1c51de1cb16a8008ee577d7d8f7ea115e

@jneira
Copy link
Collaborator Author

jneira commented Dec 19, 2018

@rahulmutt in the bright side, with master version of eta, both error about files (java.nio.channels.ClosedByInterruptException and openFile: resource busy (file is locked)) are gone!
Thank for the fix

@rahulmutt
Copy link
Member

Awesome to hear that the fix worked. So we just need to figure out how to debug the hanging on CircleCI.

@rahulmutt
Copy link
Member

While I can't say why it would take 10 mins to execute w/o output, perhaps adding a longer timeout will give some clue?

@jneira
Copy link
Collaborator Author

jneira commented Dec 20, 2018

Mmm, it doesn't should take so many time but i'll try to increase the timeout to test it

@jneira
Copy link
Collaborator Author

jneira commented Dec 20, 2018

  • I've set the timeout to 30m and the output did seem to progress a little bit more but hanged at some point:
dhall-eta tests
  Parser tests
    Should parse
      Parsing constructorsA:                                      OK (4.24s)
      Parsing singleQuotedStringA:                                OK (4.24s)
      Parsing forallA:                                            OK (4.24s)
      Parsing doubleQuotedStringA:                                OK (4.24s)
      Parsing largeExpressionA:                                   OK (4.52s)
      Parsing labelA:                                             OK (9.85s)
      Parsing quotedLabelA:                                       OK (9.85s)
      Parsing operatorsA:                                         OK (9.86s)
      Parsing collectionImportTypeA:                              OK (4.24s)
      Parsing recordA:                                            OK (9.85s)
      Parsing pathsA:                                             OK (4.24s)
      Parsing quotedPathsA:                                       OK (4.24s)
      Parsing functionTypeA:                                      OK (4.23s)
      Parsing parenthesizeUsingA:                                 OK (9.82s)
      Parsing reservedPrefixA:                                    OK (9.85s)
      Parsing pathTerminationA:                                   OK (4.24s)
  • Otoh i am afraid the windows i've got again the windows error. Strangely the error is not thrown if i pipe the test log output to a file but persist if i let the output write in stdout 😮

@jneira jneira pinned this issue Dec 20, 2018
@jneira
Copy link
Collaborator Author

jneira commented Dec 20, 2018

Ok, i've resolved the hang:

  • I've detected that, in local, the java process was taking 2.5 Gb of ram
  • In circleci i had fixed -Xmx2Gb
  • So the java process was busy all of the time doing garbage collections
  • Setting -Xmx3Gb has resolved the issue: https://circleci.com/gh/eta-lang/dhall-eta/42

Now maybe we should investigate why the java process is using that excessive amount of memory...
And why the java process did not throw the infamous java.lang.OutOfMemoryError: GC overhead limit exceeded

@rahulmutt
Copy link
Member

Great sleuthing @jneira! :)

Facing something similar with eta-benchmarks circleci build. I have a gut feeling that the excessive memory usage may be related to the fact that thunk clearing is disabled by default because it causes correctness errors when the same thunk is evaluated concurrently across threads.

Thunk clearing is a GC optimization that clears out the free variables of a thunk once it's been evaluated since the thunk doesn't need to be evaluated anymore. The long-term fix for this so that we can enable it by default again is to change the code generation for thunks a little bit that passes in free variables as an argument. This code generation change should make switching over to invokedynamic much easier as well.

Since the original issue at hand is resolved, can we continue the discussion in a new thread about memory usage?

@jneira
Copy link
Collaborator Author

jneira commented Dec 20, 2018

Yeah, only left the intermitent openFile: resource busy (file is locked) error for windows. But maybe it should be in another issue too.

@rahulmutt
Copy link
Member

Yes, please file that in a new issue as well. As a tip for debugging exceptions like that:

export ETA_JAVA_ARGS="-Deta.debug.exceptions=true -Deta.debug.exceptionsVerbose=true -Deta.debug.toFile=true"

It'll dump ALL exceptions including those that have been caught in between and re-thrown. Extremely useful when you want to know what's really going on, and can potentially generate huge files.

@jneira
Copy link
Collaborator Author

jneira commented Dec 21, 2018

Filed specifics issues about windows error and excessive memory allocation

@jneira jneira closed this as completed Dec 21, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants