Struct memory allocation is slow #299
Comments
Memory allocation with MSVC is known to be slow; that's not really JavaCPP's fault. JNA and BridJ don't use C++ to allocate memory. We could allocate memory the same way in JavaCPP, for example with Pointer.malloc(), then cast the result to SYSTEMTIME, and that should be faster. Could you give that a try? |
The code below is indeed much faster:
Now the question is, why not generate a struct's default constructor implementation with |
We could, but it wouldn't be C++ :) I think Win32 doesn't throw C++ exceptions though, so we can probably speed this up with a |
Ah, no, we already have
try (SYSTEMTIME systemtime = new SYSTEMTIME()) {
    windows.GetSystemTime(systemtime);
    return systemtime.wSecond();
} |
SYSTEMTIME is a C struct (with no constructor), and I believe that library users would prefer a faster implementation in C over a slower one in C++. I'd suggest modifying the parser to
|
Before we start modifying everything just because MSVC allocation is slow, let's check how the try-with-resources version performs. It should work well enough. |
Actually, no, C++ allocation isn't the bottleneck at all here. It's the deallocator registration that is slow. |
More than half the time seems to be spent by the garbage collector browsing through the doubly-linked list of phantom references. If that's the case, there might not be much we can do about this other than simply not rely on the GC at all. The JDK itself uses doubly-linked lists for its own use of phantom references: |
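To make the cost described above concrete, here is a minimal sketch of deallocator registration backed by a doubly-linked list of phantom references. It is an illustration only, not JavaCPP's actual implementation: the class `PhantomListSketch` and its `register`, `release`, and `drain` methods are hypothetical names. The point is that every allocation creates one extra reference object that the GC has to track, on top of the native allocation itself.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;

// Hypothetical sketch: one phantom reference per allocated object,
// kept strongly reachable through a global doubly-linked list.
public class PhantomListSketch {
    private static final ReferenceQueue<Object> QUEUE = new ReferenceQueue<>();
    private static Node head; // guarded by PhantomListSketch.class
    private static int size;  // guarded by PhantomListSketch.class

    public static final class Node extends PhantomReference<Object> {
        Node prev, next;
        final Runnable deallocator;

        Node(Object owner, Runnable deallocator) {
            super(owner, QUEUE);
            this.deallocator = deallocator;
        }
    }

    // Called once per allocation: links a new reference at the head of the list.
    public static Node register(Object owner, Runnable deallocator) {
        Node n = new Node(owner, deallocator);
        synchronized (PhantomListSketch.class) {
            n.next = head;
            if (head != null) head.prev = n;
            head = n;
            size++;
        }
        return n;
    }

    // Unlinks the node and runs its deallocator; a second call is a no-op.
    public static void release(Node n) {
        synchronized (PhantomListSketch.class) {
            if (n.prev == n) return; // already released
            if (n.prev != null) n.prev.next = n.next; else head = n.next;
            if (n.next != null) n.next.prev = n.prev;
            n.prev = n.next = n;     // self-link marks the node as released
            size--;
        }
        n.clear();
        n.deallocator.run();
    }

    // Releases every node whose owner the GC has already found unreachable.
    public static void drain() {
        Reference<?> r;
        while ((r = QUEUE.poll()) != null) release((Node) r);
    }

    public static synchronized int size() {
        return size;
    }

    public static void main(String[] args) {
        Object owner = new Object();
        Node n = register(owner, () -> System.out.println("deallocated"));
        release(n); // explicit release, as a try-with-resources close() would do
        drain();    // picks up anything left to the GC
    }
}
```

Releasing explicitly keeps the list short, but the registration step itself still runs on every allocation, which matches the observation that avoiding the GC path entirely is the only way to remove this overhead.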
FYI, starting with JavaCPP 1.5.6, we can now skip all that overhead and get very low latency by setting the "org.bytedeco.javacpp.nopointergc" system property to "true", see tensorflow/java#313. |
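As a rough sketch of how that property could be set from code: the property name comes from the comment above, while the class `NoPointerGcSketch` and its method names are hypothetical. This assumes JavaCPP reads the property when its classes are first loaded, so it has to be set before any JavaCPP class is touched; passing `-Dorg.bytedeco.javacpp.nopointergc=true` on the JVM command line is the more robust alternative.

```java
// Hypothetical helper: sets the system property named in the comment above.
public class NoPointerGcSketch {
    static final String PROP = "org.bytedeco.javacpp.nopointergc";

    // Assumption: must run before the first JavaCPP class is loaded,
    // for example as the first statement of main().
    public static void disablePointerGc() {
        System.setProperty(PROP, "true");
    }

    // Reports only what the system property says, not what JavaCPP actually did.
    public static boolean isPropertySet() {
        return Boolean.parseBoolean(System.getProperty(PROP, "false"));
    }

    public static void main(String[] args) {
        disablePointerGc();
        System.out.println(PROP + "=" + System.getProperty(PROP));
    }
}
```

With GC-based deallocation out of the picture, native memory presumably has to be released explicitly, for example through the try-with-resources form shown earlier in the thread.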
I ran a simple benchmark calling the Windows API's GetSystemTime using JavaCPP's built-in Windows API wrappers. This code allocates a struct, calls the native API and fetches some field from the struct:
Profiling shows that the first line takes >90% of the overall execution time.
I believe that there is some room for optimization here. The same thing implemented with BridJ or JNR outperforms JNI+JavaCPP just because of faster allocation; see the benchmark at https://github.com/zakgof/java-native-benchmark.
For example, with BridJ, allocation takes <50% of the overall time: