-
Notifications
You must be signed in to change notification settings - Fork 122
Conversation
@tucek can you please update the README.md of jni with latest introduction of how you installed everything? Most parts there are very outdated. The Debian 9+8 instructions can be safely removed. When I try to compile this branch I get
Probably some wrongly-used cmake debugging message you forgot to remove? On Debian 10 with:
I can also reproduce the segfault. Running with valgrind, the segfault does not happen: the test can be executed successfully. @tucek can you please try different Java versions (e.g. the one of Oracle) on different OSs to see which systems are affected and for which it works (if any)? If we have a problem in our code it may be related to how we execute JNI_CreateJavaVM¹ or in line 120 or 309 (NewObject), as valgrind is pointing there. @kodebach can you take a look about that, please? It would be a great help. Ideally even fixing the TODO in line 201. Would be great if we can give the students a smooth experience when using the JNI plugin! Note for myself, I started valgrind with: |
I do not have time to fully diagnose this problem, but here are my findings from a quick investigation:
Summary @markus2330 For the future, we should probably reconsider #3631 (separate ref counters for KeySet -> Key and Other -> Key, references) and maybe think about even more changes to the reference counting. For example, maybe [*] When I run |
PS. @tucek If you need to debug this further and would prefer a graphical interface for the debugger over GDB, you can use this VSCode config (assumes your build folder is called {
"name": "testmod_jni",
"type": "cppdbg",
"request": "launch",
"program": "${workspaceFolder}/build/bin/testmod_jni",
"args": [
"${workspaceFolder}/src/plugins/jni"
],
"stopAtEntry": false,
"cwd": "${workspaceFolder}/build/bin",
"environment": [
{
"name": "LD_LIBRARY_PATH",
"value": "${workspaceFolder}/build/lib"
}
],
"externalConsole": false,
"MIMode": "gdb",
"setupCommands": [
{
"description": "Enable pretty-printing for gdb",
"text": "-enable-pretty-printing",
"ignoreFailures": true
}
]
} |
Another note: I just saw that the |
I intentionally had not yet removed this debug message. (Since this is still a draft PR...)
I've tested Ubuntu 20.04 and Debian 10 with openjdk-11-jdk and get identical results. The only difference I found was, that i had to set JAVA_HOME on Debian 10 for the FindJNI cmake module to actually find the JNI header file. Are you sure plugin jni was successfully included by cmake and
Since i did not get different results between the two configurations above (in contrary to your report), I do not really see the indication for doing a (quite time consuming) extensive test with different java distribution / os combinations. Nonetheless i've tested the following additional combinations:
All yielded the same result. |
Yes. I've set JAVA_HOME, I am not sure if it was needed, though. Please update the README.md with such information if you know about it.
Only with valgrind, otherwise I got a double free error (with termination of the process).
Which two configurations above?
You also get segfaults when running with valgrind?
Which is? Btw. in previous segfaults of JNI it was usually related to the Java version. If we would find any combination that works for CM, it would already help a lot. This is why I asked for the investigation. Seems like this is not the case this time, though. |
I am pretty sure this issue is entirely independent of Java Version and OS. AFAICT everything is working as intended (i.e. as written in code). We're just telling the JVM to call There is a very similar problem, if you write a plugin in C++ and do something like this: int elektraMyPluginGet (ckdb::Plugin * handle, ckdb::KeySet * returned, ckdb::Key * parentKey)
{
kdb::Key parent(parentKey);
// ...
} At the end of Later on, at some point after The easiest solution for the C++ plugin is to not use int elektraMyPluginGet (ckdb::Plugin * handle, ckdb::KeySet * returned, ckdb::Key * parentKey)
{
keyIncRef (parentKey);
{
kdb::Key parent (parentKey);
// here we can use the parent
// ...
}
keyDecRef (parentKey);
} Something similar probably needs to happen in |
Yes, it looks so. I was only surprised as nearly nothing was changed in the JNI code (within one year basically only a small fix: f630de9) and the previous version was working on (at that time) recent Java versions (it did segfault on old Java versions, though).
This might explain it, maybe older JVMs never collected the objects.
Yes, it would be fine for me. I wonder where you see C++ code? src/plugins/jni/jni.c is pure C?
This discussion seems to be OT here, let us discuss this with the other changes of #3693. (When we have time for this, now we have an urgent fix and a release to do.) |
will do...
With all 5 combinations i've tested before and one new one:
i've got the double free when running I hope this clears it up a little bit. |
I've tried to get rid of the double free by incrementing the reference count of the keys in jni.c, but did not get anywhere... @markus2330
|
A temporary workaround would be to remove the |
Last CM it was working. A git-bisect would be needed to find the last working commit. But maybe the code never worked with recent Javas.
We will discuss this tomorrow.
As you already debugged and found a plausible reason, please let us try this first. It is more important to have a tested&reliable solution than to have something very quick. |
There is no C++ code. It was just an example, because the problem is a bit more obvious with C++ (and easily found with
It seems @tucek already tried with
Maybe the attempt was not thorough enough, but I already suspected that a similar problem might occur with If
The tests have been disabled for at least 4 years now. The double-free may have existed but no code actually triggered a segfault. Technically, although unlikely, a double-free can happen without a segfault and Java Plugins would definitely not be run with
If the issue is indeed what I described then it is entirely independent of Java Language or JVM Version. In fact the C++ example shows that it will happen in any language, where the binding automatically calls However, in Java the issue somewhat depends on the Garbage Collector implementation. The GC does change between some Java Versions, but AFAIK it was never entirely deterministic. So whether or not you actually get a segfault is probably somewhat random. TL;DR if simply adding |
I have done some more investigations:
[*] It seems that 1) the JVM never guaranteed that |
@kodebach Thank you for the thorough analysis! I was also wondering why release method is being called directly. but when i am understanding correctly:
Before removing release in jni plugin, while finalize never being called:
After removing release in jni plugin, while finalize never being called:
it is not freed at all While KeySet is being freed correctly by libelektra/src/libs/elektra/plugin.c Line 313 in d9a8478
Now, if we fix the Key finalization, we would get the same problem again. Ad key finalization:
|
Yes, I missed that removing the
|
Very good! The last peace missing would be to provide an alternative constructor signature for suppressing the finalization of the Key / KeySet. Then we could leave the finalization code as is for now and i would create a ticket to deal with it in the future. @markus2330 If this solution is fine with you, i would update my branch accordingly. |
First and foremost the people using JNI and the CI needs to be happy with the solution. So yes, please create a PR and let us see if the CI is happy. Obviously it would be better to not create further issues and have a solution that fixes the problem without leftovers. Maybe you or @kodebach find a clean&nice way? But if this is not possible now, we'll have to live with one more open issue. |
I think the proposed solution fixed the problem in a relatively clean way. The additional issue i would want to create is not actually directly related to this issue, but to the fact that the finalize mechanism has been deprecated several Java versions ago and we might want to be prepared before it finally gets removed. |
I agree this is an additional issue but still it would be nice to have this fixed within the release, too. But you are right: let us first fix the important&urgent problems like segfaults, and then let us reevaluate. |
@markus2330 means we never really supported Java 8 and therefore we can require Java 9+ and update the clean up procedures right away with this PR. |
PR will also fix #3772 |
* Introduced Optional return values for KeySet.lookup* and Key.getMeta * JNA minimum Java version increased from 8 to 9 * replaced KeySet and Key finalize() by using a Cleaner * Improved Java doc * updated jni.c fixing double free (also required updates to JNA binding) * misc JNA clean-up and improvements * TODO address empty key allocation issue * TODO remove debug output * TODO add contribution notes * TODO update jni plugin README
This reverts commit 1960b03. Revert "retrigger checks" This reverts commit 560cd3c. Revert "applied style" This reverts commit edb1a41. Revert "ElektraInitiative#3825 removed DeleteLocalRef in the hope of change for the better ;)" This reverts commit dcb4c41.
The testmod_jni error does also occur if the build directory is deleted after installing.
|
I'm trying to find the dependency to the build dir using strace... |
testmod_jni wants to access build/src/bindings/jna/libelektra/build/libs/libelektra-0.9.5-all.jar because it is injected into the code at build time in https://github.com/tucek/libelektra/blob/315a909579f5036c39544c65e6caa51cd5bd61fb/src/plugins/jni/testmod_jni.h.in#L12 We should find another way to set the classpath for the test, so it works with installed jna bindings and the build local version... |
I don't think there is currently an easy way to pass such a parameter to the test. There is a mechanism for plugins that need extra executables to work (e.g. I think it is totally fine to just exclude the test in the installed version. Either via some CMake setup, or just by checking the |
@kodebach what about first checking for the build version and then falling back to the install version and failing verbosely when neither has been found? |
This is also just a work around and not a permanent solution. Take this scenario for example:
A similar scenario is possible, if we first check for the installed version. The only permanent solution is, if the installed version always uses the installed JAR and the build folder version always uses the build folder JAR. If you really want to invest the time, look into how libelektra/tests/cframework/tests.c Lines 281 to 289 in 3a3b7af
However, I have to stress again, I do not think it is worth your time right now and disabling the tests in installed versions (probably easiest by just not installing the |
… added sesnible erromessage instead of segfault
…added sensible error message instead of segfault, when jar is not found
The jni / package step problem is hereby resolved. |
The last thing i would like to try is enabling the Cleaner again, while adding a locking so that KeySets and Keys are not being cleaned up in parallel. |
…t key sets from being released in parallel as well as prevent keys from being release in parallel to key sets
CI fails again because of some links being temporary unavailable... IMHO checks for broken links should only be executed manually or maybe automatically by the release pipeline... |
But then we do not know when a PR introduces a broken link until the release day, which is suboptimal. The way it is now we immediately see broken links and can fix or temporarily whitelist them, even if it is not our own fault. |
That's correct, but IMHO the CI build should not depend on all referenced websites being up to succeed. But it's not my place to prioritize such issues for this project. |
A good compromise would probably to set the link checker, such that it only checks links in files that have been modified by the PR. That way we can catch broken links before they are added, but don't annoy people with websites that are temporarily offline. But such a setup is more complicated than the current one and since the Link Check is a separate build job, we can always just ignore a failure and merge anyway. |
Agreed.
Good suggestions are always welcome. ❤️
I also think the simplest is just to ignore it, but be aware of what is going on. Better approaches are definitely welcome but currently probably not worth our time. |
From my perspective, this PR is rdy to be merged. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Amazing job! 💖 Thanks to everyone participating! 💞
Now lets do further testing, especially also with a more real-world plugin (H3) and kdb mount
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work, thank you so much! 🚀
Basics
These points need to be fulfilled for every PR:
(added as entry in
doc/news/_preparation_next_release.md
whichcontains
_(my name)_
)Please always add something to the release notes.
(first line should have
module: short statement
syntax)close #X
, are in the commit messages.doc/news/_preparation_next_release.md
scripts/dev/reformat-all
If you have any troubles fulfilling these criteria, please write
about the trouble as comment in the PR. We will help you.
But we cannot accept PRs that do not fulfill the basics.
Checklist
Check relevant points but please do not remove entries.
For docu fixes, spell checking, and similar none of these points below
need to be checked.
(not in the PR description)
Review
Reviewers will usually check the following:
Labels
If you are already Elektra developer:
say that everything is ready to be merged.