JVM stuck forever at Pointer.physicalBytesInaccurate() #767

Closed
0x6675636b796f75676974687562 opened this issue Jul 9, 2024 · 13 comments

@0x6675636b796f75676974687562

I'm parsing multiple C++ files with llvm from javacpp-presets.

At some point, after parsing ~100 files (the exact threshold varies), the JVM process gets permanently stuck at Pointer.physicalBytesInaccurate(), not returning even after spending an hour inside this method. The JVM stack trace is:

   java.lang.Thread.State: RUNNABLE
	at app//org.bytedeco.javacpp.Pointer.physicalBytesInaccurate(Native Method)
	at app//org.bytedeco.javacpp.Pointer.deallocator(Pointer.java:705)
	- locked <3542162a> (a java.lang.Class)
	at app//org.bytedeco.javacpp.Pointer.init(Pointer.java:127)
	at app//org.bytedeco.llvm.global.clang.clang_getTypeSpelling(Native Method)
	at app//<private code>
	at app//kotlin.sequences.TransformingSequence$iterator$1.next(Sequences.kt:210)
	at app//kotlin.sequences.SequencesKt___SequencesKt.toCollection(_Sequences.kt:787)
	at app//kotlin.sequences.SequencesKt___SequencesKt.toMutableList(_Sequences.kt:817)
	at app//kotlin.sequences.SequencesKt___SequencesKt.toList(_Sequences.kt:808)
	at app//<private code>
	at app//org.bytedeco.llvm.global.clang.clang_visitChildren(Native Method)
	at app//<private code>
	at platform/jdk.httpserver@17.0.3.1/com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:95)
	at platform/jdk.httpserver@17.0.3.1/sun.net.httpserver.AuthFilter.doFilter(AuthFilter.java:82)
	at platform/jdk.httpserver@17.0.3.1/com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:98)
	at platform/jdk.httpserver@17.0.3.1/sun.net.httpserver.ServerImpl$Exchange$LinkHandler.handle(ServerImpl.java:733)
	at platform/jdk.httpserver@17.0.3.1/com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:95)
	at platform/jdk.httpserver@17.0.3.1/sun.net.httpserver.ServerImpl$Exchange.run(ServerImpl.java:700)
	at java.base@17.0.3.1/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base@17.0.3.1/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at app//<private code>
	at app//kotlin.concurrent.ThreadsKt$thread$thread$1.run(Thread.kt:30)

As you can see, my code calls clang.clang_getTypeSpelling(), and the call hangs in the Pointer.init() → Pointer.deallocator() → Pointer.physicalBytesInaccurate() chain.

  • The exact library version is org.bytedeco.javacpp:1.5.9.
  • Java version is 17.0.3.1, from Oracle.

The JVM arguments are as follows:

-server
--enable-preview
-Xss2m
-XX:InitialRAMPercentage=80.0
-XX:MaxRAMPercentage=80.0
-XX:MaxRAM=1073741824
-XX:+UseParallelGC
-XX:+CompactStrings
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError
-XX:+CreateCoredumpOnCrash
-Dorg.bytedeco.javacpp.maxPhysicalBytes=1073741824
-Dfile.encoding=UTF-8
-Djansi.force=true
-Djdk.attach.allowAttachSelf=
-Dsun.stderr.encoding=UTF-8
-Dsun.stdout.encoding=UTF-8

The maximum process memory to be used by either the JVM or Libclang is set to 1 GB (1073741824 bytes), and the maximum heap size is set to 80% of that value (i.e. about 819 MB).
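
As a sanity check on these numbers, the heap ceiling implied by the flags can be worked out directly. This is a minimal sketch: the class and method names are illustrative, and it assumes the JVM simply multiplies MaxRAM by MaxRAMPercentage, ignoring any alignment or rounding the real JVM applies.

```java
final class HeapBudget {
    /** Bytes implied by -XX:MaxRAM and -XX:MaxRAMPercentage (assumption: no JVM alignment). */
    static long heapBytes(long maxRamBytes, double maxRamPercentage) {
        return (long) (maxRamBytes * maxRamPercentage / 100.0);
    }

    public static void main(String[] args) {
        final double mib = 1024.0 * 1024.0;
        long maxRam = 1_073_741_824L;          // -XX:MaxRAM=1073741824
        long heap = heapBytes(maxRam, 80.0);   // -XX:MaxRAMPercentage=80.0

        System.out.printf("heap ceiling: %d bytes (%.1f MiB)%n", heap, heap / mib);
        System.out.printf("remainder under a 1 GiB cap: %.1f MiB%n", (maxRam - heap) / mib);
    }
}
```

If the heap were fully used, a 1 GiB cap on physical bytes would leave only the remainder printed above for everything outside the Java heap, including whatever Libclang allocates.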

The reverse call tree obtained from the profiler:

image

@saudet
Member

saudet commented Jul 10, 2024

Please try to set the "org.bytedeco.javacpp.nopointergc" system property to "true".

@0x6675636b796f75676974687562
Author

0x6675636b796f75676974687562 commented Jul 10, 2024

Please try to set the "org.bytedeco.javacpp.nopointergc" system property to "true".

Thank you Samuel @saudet, I'll try that and get back with my feedback shortly.

For what it's worth, the JVM thread is stuck at NtQueryVirtualMemory@ntdll.dll, probably spinning waiting for some condition to be met:

image

Stack:

image

So I think I'll also play with memory settings (-Xmx and org.bytedeco.javacpp.maxPhysicalBytes) and heap/non-heap/native ratio and see whether there's any change.

@0x6675636b796f75676974687562
Author

0x6675636b796f75676974687562 commented Jul 10, 2024

Samuel @saudet, after some time spent searching, it looks like my issue is similar to tensorflow/java#208, and I have two questions:

  1. What's the difference between setting org.bytedeco.javacpp.noPointerGC to true and org.bytedeco.javacpp.maxPhysicalBytes to zero? According to the code, both effectively disable the JavaCPP-triggered garbage collection.

  2. If the garbage collection is disabled, is it sufficient, for the native memory to get freed, to treat Pointer descendants as regular AutoCloseable instances (i.e. invoke close() in a finally block as soon as I'm done using the object)? The reason I'm asking is that, once I've set noPointerGC, the Working Set size of my Java process continues to grow as the process is running, and the peak Working Set value is now considerably (~1.25x) larger than it used to be with GC enabled (6.2+ GB vs 5.1 GB).

    In my own scenario, I observe the following numbers:

    org.bytedeco.javacpp.noPointerGC  org.bytedeco.javacpp.maxPhysicalBytes  Peak Working Set
    false                             0                                      5040 MB
    false                             4096 MB                                Process hung at Pointer.physicalBytesInaccurate()
    false                             8192 MB                                5128 MB
    true                              0                                      4700 MB
    true                              6144 MB                                6750 MB
    true                              8192 MB                                6250 MB

@saudet
Member

saudet commented Jul 11, 2024

When maxPhysicalBytes is 0 it just doesn't try as hard to release memory, that's all.

Yes, Pointer.close() is for that purpose, but it's easier to use PointerScope:
http://bytedeco.org/news/2018/07/17/bytedeco-as-distribution/

@0x6675636b796f75676974687562
Author

Samuel @saudet, thank you for your response, I really appreciate your feedback.

Unfortunately, adding PointerScope to the mix didn't change much, probably because we were already properly closing all the pointers we controlled. Yet, quite contrary to the experience of your other users, native memory (Working Set) usage doesn't settle at 1 GB, nor at 4 GB. Instead, it keeps growing:

image

even though our used JVM heap is small (< 1 GB):

image

There are indeed minor "drops" in Working Set (and, consequently, Private Bytes) values whenever a Pointer is manually released and/or a PointerScope is exited, but overall the memory keeps growing:

image

Can this issue be caused by the recursive nature of Libclang and its clang_visitChildren() and CXCursorVisitor API? In this case, JVM and native stack frames are heavily interleaved. Is it possible that PointerScope is not "visible" across a native stack frame?
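
One way to reason about that last question is a toy model. The sketch below assumes scopes are tracked in a per-thread stack rather than tied to individual stack frames; that is an assumption about the mechanism, and `ToyScope` and `ToyResource` are hypothetical stand-ins, not JavaCPP classes. In such a model, a resource created inside a callback attaches to the innermost scope open on the current thread, regardless of what kind of frames sit in between.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical stand-in for a native resource; NOT a JavaCPP class. */
final class ToyResource {
    boolean released;

    ToyResource() {
        // Attach to the innermost scope open on this thread, if any.
        ToyScope inner = ToyScope.STACK.get().peek();
        if (inner != null) {
            inner.attached.add(this);
        }
    }
}

/** Hypothetical stand-in for a scope; NOT a JavaCPP class. */
final class ToyScope implements AutoCloseable {
    static final ThreadLocal<Deque<ToyScope>> STACK = ThreadLocal.withInitial(ArrayDeque::new);
    final List<ToyResource> attached = new ArrayList<>();

    ToyScope() {
        STACK.get().push(this);
    }

    @Override
    public void close() {
        // Release everything that was attached while this scope was innermost.
        for (ToyResource resource : attached) {
            resource.released = true;
        }
        attached.clear();
        STACK.get().remove(this);
    }
}

final class ToyScopeDemo {
    /** Simulates a visitor callback that is invoked once per AST node. */
    static ToyResource callback(boolean scopePerCall) {
        if (!scopePerCall) {
            return new ToyResource();      // attaches to whatever outer scope is open
        }
        try (ToyScope perCall = new ToyScope()) {
            return new ToyResource();      // released as soon as the callback returns
        }
    }

    public static void main(String[] args) {
        try (ToyScope outer = new ToyScope()) {
            ToyResource a = callback(false);
            ToyResource b = callback(true);
            System.out.println("outer-scoped resource released early: " + a.released);
            System.out.println("per-call resource released early: " + b.released);
            System.out.println("still pending in outer scope: " + outer.attached.size());
        }
    }
}
```

In this model the outer scope stays visible inside the callback, but everything attached to it accumulates until the whole traversal finishes. It only models the bookkeeping and says nothing about how Libclang itself allocates memory.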

Can you suggest how we can further diagnose the problem?

We've tried what looks like all possible combinations of property values:

  • org.bytedeco.javacpp.maxBytes
  • org.bytedeco.javacpp.maxPhysicalBytes
  • org.bytedeco.javacpp.maxRetries
  • org.bytedeco.javacpp.noPointerGC

but we have made very little progress so far.

@0x6675636b796f75676974687562
Author

0x6675636b796f75676974687562 commented Jul 11, 2024

Samuel @saudet, a few more observations.

Here's the expected growth of the Working Set in the presence of unidentified memory leaks:

image

If I sprinkle the code with more PointerScope instances here and there, memory leaks don't go away -- instead, this merely slows everything down (as you can see, the graph gets scaled horizontally):

image

Finally, even though memory limits and garbage collection are essentially disabled (maxBytes=0, maxPhysicalBytes=0, maxRetries=0, noPointerGC=true), JavaCPP may sometimes still think it has run out of memory; in this case all useful (application) I/O stops and CPU usage hits 80% (in native code):

image

@saudet
Member

saudet commented Jul 11, 2024 via email

@0x6675636b796f75676974687562
Author

By trial and error, I figured out how to prevent memory leaks when using Libclang, even though its documentation is brief and insufficient.

  1. Basically, when parsing C++ code, there are two phases: first we initialize an instance of CXTranslationUnit via clang_parseTranslationUnit() or clang_parseTranslationUnit2(), then we traverse the AST via clang_visitChildren(). In the first phase, the call to clang_parseTranslationUnit2() should be surrounded with a PointerScope:
    	try (final var ignored = new PointerScope()) {
    		clang_parseTranslationUnit2(...);
    	}
  2. When implementing your CXCursorVisitor, even though the documentation says nothing about it, all three call() arguments should be closed when exiting this method, otherwise memory will leak:
    class AstVisitor extends CXCursorVisitor {
    	@Override
    	public int call(final CXCursor cursor, final CXCursor parent, final CXClientData clientData) {
    		try (cursor; parent; clientData) {
    			// ...
    		}
    
    		return CXChildVisit_Recurse;
    	}
    }
  3. Additionally, entering a new PointerScope at the beginning of the call() method is also 100% necessary, probably because the outer ("lower") stack frame is a native one (i.e. call() is invoked directly by the native code). Although previously registered ("outer") pointer scopes are still visible, having only a single scope per translation unit (i.e., per AST) rather than per cursor eventually results in 100% usage of all CPU cores, in native code.
  4. Finally, a buffer of CXToken instances created with clang_tokenize() needs to be properly disposed of via clang_disposeTokens(). If this is not done, memory will leak no matter what. Most Java engineers, myself included, will forget to reset the pointer so that it points to the beginning of the buffer (via CXToken.position(long)). This is very similar in nature to the Buffer.flip() invocation in the Java NIO API. If the pointer is not reset, the call to clang_disposeTokens() will result in a segmentation fault. So the correct usage example would be:
    void forEachToken(
    		final CXCursor cursor,
    		final Consumer<? super CXToken> action
    ) {
    	try (final var extent = clang_getCursorExtent(cursor)) {
    		try (final var translationUnit = clang_Cursor_getTranslationUnit(cursor)) {
    			try (final var tokens = new CXToken()) {
    				final var tokenCountRef = new int[1];
    				clang_tokenize(translationUnit, extent, tokens, tokenCountRef);
    				final var tokenCount = tokenCountRef[0];
    				try {
    					IntStream.range(0, tokenCount)
    						 .mapToObj(tokens::position)
    						 .forEach(action);
    				} finally {
    					tokens.position(0L);
    					clang_disposeTokens(translationUnit, tokens, tokenCount);
    				}
    			}
    		}
    	}
    }
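
The Buffer.flip() analogy in step 4 can be made concrete with plain Java NIO, which needs no native library. This is a minimal sketch with illustrative class and method names: after writing, the buffer's position sits past the data, and anything that consumes the buffer from its current position sees nothing useful until the position is moved back to the start.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class FlipAnalogy {
    /** Decodes whatever lies between the buffer's current position and its limit. */
    static String drain(ByteBuffer buffer) {
        return StandardCharsets.US_ASCII.decode(buffer).toString();
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocate(16);
        buffer.put("token".getBytes(StandardCharsets.US_ASCII));

        // The position has advanced past the bytes that were just written.
        System.out.println("position after writing: " + buffer.position());

        // Move back to the start (and cap the limit at the written length) before handing it on.
        buffer.flip();
        System.out.println("position after flip: " + buffer.position());
        System.out.println("contents: " + drain(buffer));
    }
}
```

In the Libclang case the consequence is harsher than a wrong read: as described in step 4, passing a CXToken whose position was left advanced to clang_disposeTokens() crashes the process, which is why the finally block calls position(0L) first.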

Once the above is done, the Java application can be safely launched with pointer garbage collection disabled, and in my scenario memory usage stabilizes at around 600 to 700 MB (as opposed to 5 GB with memory leaks):

-Dorg.bytedeco.javacpp.maxBytes=0
-Dorg.bytedeco.javacpp.maxPhysicalBytes=0
-Dorg.bytedeco.javacpp.maxRetries=0
-Dorg.bytedeco.javacpp.noPointerGC=true
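
Where the JVM command line is not under your control, the same four settings could in principle be applied programmatically. This is a minimal sketch with illustrative class and method names. It rests on an assumption not confirmed in this thread: that JavaCPP reads these properties once, when its classes are first loaded, so they must be set before any JavaCPP or Libclang class is touched.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class JavaCppMemorySettings {
    /** The four properties listed above, with the values reported to work in this thread. */
    static Map<String, String> settings() {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("org.bytedeco.javacpp.maxBytes", "0");
        settings.put("org.bytedeco.javacpp.maxPhysicalBytes", "0");
        settings.put("org.bytedeco.javacpp.maxRetries", "0");
        settings.put("org.bytedeco.javacpp.noPointerGC", "true");
        return settings;
    }

    /** Assumption: must run before the first JavaCPP class is loaded to have any effect. */
    static void apply() {
        settings().forEach(System::setProperty);
    }

    public static void main(String[] args) {
        apply();
        settings().keySet().forEach(
                key -> System.out.println(key + "=" + System.getProperty(key)));
    }
}
```

Passing the flags on the command line, as shown above, avoids the ordering concern entirely and is the variant actually reported to work here.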

I've set up a sample repo with my findings available as runnable code.

This issue can be closed. Thank you for your support.

@saudet
Member

saudet commented Jul 17, 2024

Thanks for the detailed explanations! It would be great if you could contribute sample code that demonstrates all this.

@0x6675636b796f75676974687562
Author

0x6675636b796f75676974687562 commented Jul 18, 2024

It would be great if you could contribute sample code that demonstrates all this.

Definitely. I could merge my source code into a single self-contained Java file and add it to llvm/samples/clang.

Yet, the existing LLVM samples are currently intended to also run on Java 7.

I backported my own samples from Java 17 to Java 7 and 8.

The question is: which version (7, 8, or 17) do you want added to LLVM samples?

@saudet
Member

saudet commented Jul 19, 2024

It doesn't really matter what version of Java the samples are in, whichever is fine 👍
Although Java 8 is probably the most currently used version, so that would be best I guess?

@0x6675636b796f75676974687562
Author

Samuel @saudet, here you go:

@saudet
Member

saudet commented Aug 3, 2024

Thanks for the contribution!

@saudet saudet closed this as completed Aug 3, 2024