lib/R/etc/ldpaths: No such file or directory #67
This is exactly when the file needs to be updated. Can you not mount your envs on another file system? NFS is often problematic. My only suggestion would be to try to minimize these time windows. You could copy the original file first. Please feel free to submit a PR. Personally I cannot reproduce your error since I do not use NFS.
Referenced in commit: "…r concurrent access. Try to fix conda-forge#67"
The point is that … The best solution would be to have a … An easy but still incomplete solution is to modify the following patch so that the files are only replaced when the new file is different. Here is a proposal that I haven't tested:
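A minimal sketch of that idea (untested, like the proposal above; `generate_ldpaths` stands in for whatever javareconf actually runs, and the paths are illustrative):

```bash
#!/bin/bash
# Sketch: only swap in a regenerated ldpaths when its content actually changed.
ldpaths="${CONDA_PREFIX}/lib/R/etc/ldpaths"
candidate="${ldpaths}.new"

generate_ldpaths > "${candidate}"        # placeholder for the real regeneration step

if cmp -s "${candidate}" "${ldpaths}"; then
    rm -f "${candidate}"                 # nothing changed: leave the original untouched
else
    mv -f "${candidate}" "${ldpaths}"    # content changed: replace it
fi
```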
It is not so simple at all; in fact it's impossible. There is no such thing as a "file that is proper to each conda env", because we deliberately support a range of Javas: from conda packages, from the system package manager, and from third parties. I have gone to extensive lengths to make this work. You've replaced a race condition around ldpaths with one around ldpaths.new, an improvement, but my suggestion is still better. You can get the best of both worlds by replacing .new in your patch with .new.$$. Please work on a PR if you want this to work on NFS.
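For illustration, a sketch of the `.new.$$` suggestion; the regeneration step is again a placeholder, and the exact integration into javareconf would look different:

```bash
#!/bin/bash
# Sketch of the ".new.$$" idea: write to a per-process temporary file, then rename.
# $$ expands to the current shell's PID, so concurrent activations never share a temp file.
ldpaths="${CONDA_PREFIX}/lib/R/etc/ldpaths"
tmp="${ldpaths}.new.$$"

generate_ldpaths > "${tmp}"      # placeholder for the real regeneration step
mv -f "${tmp}" "${ldpaths}"      # rename() is atomic on POSIX filesystems; NFS client caching
                                 # can delay visibility, but readers never see a missing file
```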
@pbordron it is indeed important that conda packages work on NFS. If I can help you with the PR, please let me know. I won't have the time to work on it on my own, but I would be happy to help in any way. Thanks a lot for pointing this out!
For now, I do not have the time nor the environment to look for a fix.
Hello,
The issue began once I created a new environment. I also use a cluster with CentOS 6 and CentOS 7 nodes, all under LSF. Is there any other way to fix this issue other than the patch mentioned above? Any other ideas on how to fix it? Many thanks in advance.
base environment: …
second environment: …
conda install info: …
@RodrigoGM the patch is not tested and replaces one race condition with another.
Referenced in commit: "Rebase patches; try to partially fix conda-forge#67"
Referenced in commit: "Rebase patches; update MIKTEX_VER to 2.9.6942; try to partially fix conda-forge#67"
Hi,
I have to reinstall … But I don't know for how long it will stay up. So, as suggested by @pbordron (we know each other), I edited the bash script …
It should be harmless in the context of Galaxy since we never stack envs. 🤞 (FYI: @fgiacomoni)
I also had this issue, and solved it in a tricky way. In my other conda env, there was another … I may face this problem again when changing conda envs, but at least I could get back to work ASAP. (I already backed up …) Looking forward to someone who can give the perfect solution.
I ran into this issue with Snakemake. I use it with …
Hey, I had the exact same problem (running Snakemake in parallel) and lecorguille's solution helped. Inside the currently used conda directory (it should be in the error log, or just look for the most recent folder), inside …, change the line `R CMD javareconf > /dev/null 2>&1 || true` to `# R CMD javareconf > /dev/null 2>&1 || true`. Best regards
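For anyone wanting to script that workaround, a hedged sketch; the activate-script path is an assumption and may differ between r-base builds, so locate the `R CMD javareconf` call in your own environment first:

```bash
# Comment out the javareconf call in the env's R activation script.
# The path below is an assumption; adjust it to wherever the call lives in your env.
act="${CONDA_PREFIX}/etc/conda/activate.d/activate-r-base.sh"
sed -i.bak 's|^\([[:space:]]*R CMD javareconf.*\)$|# \1|' "${act}"   # keeps a .bak backup
```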
@Finesim97 well, I've just commented out lines 416-432 in env/lib/R/bin/javareconf (the ones responsible for updating files) and everything works great. It is partially equivalent to your solution. Thank you! But what's the point of updating …? @mingwandroid said that activation is …
But as can be seen, users just comment out this step altogether.
If you want to query why the R foundation made this choice you should ask them.
That you do not need this doesn't mean others do not. We need a solution that works correctly in all cases (and it's not difficult either). Reconfiguring Java for each env is not going to be undone, for it is the correct thing to do. If you convince R upstream to remove javareconf from R altogether (say, moving it into rJava itself), then that would also work; however, I don't know enough to say whether this is a good idea or even possible.
I am experiencing the same issue... I have not tested your solution; mine was to remove the environment to force it to be reinstalled. I will try yours when I have time. Maybe something for the Snakemake developers to look at? @johanneskoester ?
So @mingwandroid, it boils down to having something like an environment-wide lock that is acquired before the activate script, ensuring that no two activate scripts are executed at the same time, right? I would think that this could simply be implemented inside …
Yes. Making the lock as short-lived as possible would be ideal.
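A rough sketch of what such a short-lived lock could look like at the shell level, using `flock(1)`; the lock-file location and the guarded command are assumptions rather than an agreed design, and note that `flock` itself is not reliable on every NFS setup:

```bash
#!/bin/bash
# Sketch: serialize only the file-updating part of activation with a per-env lock.
lockfile="${CONDA_PREFIX}/.r-activate.lock"    # hypothetical lock location

(
    flock -w 30 9 || exit 0                    # wait up to 30s for the lock, then skip rather than fail
    R CMD javareconf > /dev/null 2>&1 || true  # the step that rewrites lib/R/etc/ldpaths
) 9> "${lockfile}"
```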
Great. Is this something that somebody on your side wants to do, or would you rather I provide a PR? If it should be me, can you give me a pointer to where that would be in the code base? Sorry for asking; I don't want to bother you more than necessary, but my time is very limited these days.
Mine is also limited. Ping @msarahan
@msarahan, what do you think?
I took only a quick look at this. IIUC, the activation script is just there to give some users the convenience of having … My general advice on (de-)activation scripts: …
Getting back to the case at hand: …
I have no idea how … Those are disruptive changes, of course. I won't expect 1. to happen. However, 2. is a sane compromise in my book. But since I don't really care much, I won't argue or push for that change. Adding locks to Conda environments during activation is not an option for me. Having … Imagine if the …
I'll open a pull request to address "8.1. avoid unnecessary writes" from my recommendation list above.
(See r-base-feedstock/recipe/0017-Foribly-remove-then-forcibly-mv-in-javareconf.in.patch, line 20 at 9934a07.)
IMHO, while we have no Java ecosystem of our own to speak of, allowing different ones per env is important, if not necessary. We can do better around our locking here, but I don't think we should throw the baby out with the bath water. These are decisions for R upstream. I didn't write R CMD javareconf or come up with the idea, but I believe per-env activation is right. Unless you can carefully articulate why not, we're at an impasse on that point. Please try our R with a range of Javas. I went to extreme lengths to maintain good compatibility for our users because we have nothing much for them ourselves. I'm not so busy on R these days, FWIW, but I don't think this is the right track to take.
I disagree with your point that activation scripts shouldn't write files, beyond accepting that such files shouldn't exist when not necessary. Writing env-local files is appropriate, but we should lock things better. At the end of the day, R provides this facility and I thought it was there to take advantage of, that this was the most appropriate place (think of people experimenting with different Javas with R), and that we should be as dynamic and friendly to all the Javas as possible. I believe using it is entirely reasonable (if buggy currently). Be aware we have a few OpenJDKs too: conda-forge's still coming from Azul, I think, and AD's coming from JetBrains, and I want to support system and legacy Javas. I'm just trying to broaden our appeal. I don't use Java at all and R little (ironically, usually when trying to fix issues with rJava), largely to do with trying to be broadly compatible and not surprise or burden our users (which Java should be active? How do I do that?).
I don't disagree with the "per env" part, just the "activation" one. My stance is just that reconfiguration during every environment activation is excessive and can lead to concurrency issues during writes (i.e., this very issue). I'd rather have those changes made on demand, which means when …
Glad to hear it; I'll just take your word on this ;).
I'll leave this to others since I'm with you on "I don't use Java at all and R little" -- for me it is rather "I don't use R at all and Java little" ;).
The thing is just that I believe users likely won't expect environment activations to do many special things, especially not things that might fail. Plus, authors of activation scripts have to be aware of possible concurrency issues and handle them accordingly.
IMO, only in exceptional circumstances for activation scripts. And due to the "exceptional" part, I'd rather not have … Overall, my opinion is just …
and hence I think for now we should just try to reduce the concurrent-write situation here and, if someone is willing to dig into it (i.e., not me), add some proper file locking in …
On Gitter people are expecting to be able to operate on the same conda env from multiple processes at the same time, and that's what is happening here. I'd rather we fixed it at a higher level. In no way should an env be mutable by two condas at once. If this bug helps us shake out issues there, then that's a win. But yes, happy to be reasonable here too; I just want to be flexible and am very concerned about UX and compat.
Moving the conversation from Gitter over here. My points were: …
I would therefore argue that …
If things specific to the local host really, really, really need to be altered, the respective configuration would have to be placed in a temporary directory unique to the activation session. We'd also have to accept that it would not be reliably cleaned away in all cases.
Using locking to protect against things changing underneath running processes (which would lead to undefined results) would require acquiring the lock in …
In the case at hand, I don't see why host-local JVMs need to be supported at all. IMO, the root package requiring the javareconf files should require a JVM installed into the environment and handle the ldpath setup in …
In summary, my $0.02, based on experience parallelizing C/C++ and Python apps, is that …
If environment mutation is locked, which I think it should be (and I'm not sure whether that's a bug or a missing feature), then that would include activation, because we've always allowed env mutation and this problem can happen with anything, env vars for example. If we fix all that, then this is automatically fixed. Though @mbargull's PR will help in the meantime.
I would really like it if activation were assumed to be "const". I agree with Ray, though, that we have existing expectations to contend with. Perhaps with conda/conda#8727 we can have a "safe" way of activation: if any arbitrary shell scripts are present, it is considered unsafe (mutation may happen) and must be locked. If no unsafe scripts are present, though, conda will not require locking for activation, and perhaps can go faster.
@epruesse, it absolutely would not.
I was wondering, would it be possible to give an update regarding (the resolution of) this issue? Thank you very much in advance.
I hit this problem when hundreds of processes tried to activate a single read-only conda environment and spawned hundreds of Java processes, which ended up consuming system resources. I have a suggestion for this situation which follows up on @mbargull's second point:
And it is the least disruptive to the current setup, because it doesn't change the status quo. In the activation script, introduce an env var like …
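A sketch of that opt-out; the variable name `CONDA_R_SKIP_JAVARECONF` is invented here purely for illustration, and the real name would be settled in a PR:

```bash
# In the R activation script: skip the reconfiguration when the user asks for it,
# e.g. for read-only or shared environments. The variable name is hypothetical.
if [ "${CONDA_R_SKIP_JAVARECONF:-0}" != "1" ]; then
    R CMD javareconf > /dev/null 2>&1 || true
fi
```

A read-only or shared deployment could then export that variable once before activating its environments, while everyone else keeps the current behaviour.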
Referenced in commit: "Fix conda-forge#67. Should also work over NFS unless the system is over 15 years old."
Dear all, I also run into this issue frequently when processing a lot of files in parallel using Snakemake with `--use-conda` (on an NFS share). I see that there is a fix: kpalin@9eda35b. But since this issue is still open, does that mean it is not in any release yet? My apologies for my ignorance; I'm not able to determine whether that commit is in any release... By the way, setting my conda envs as read-only would be an option for me (actually I'd prefer it, and keep their management restricted to 1 or 2 accounts only).
Referenced in commit: "Seems to be affected by conda-forge/r-base-feedstock#67 (comment)"
Issue:
I get the following error when using some conda R environments with Galaxy or Snakemake on an HPC infrastructure: …
After tracking the issue, it appears that activating a conda environment with R runs the command `R CMD javareconf`, which updates the file `$CONDA_ENV/lib/R/etc/ldpaths`.

I get this issue on two different cluster infrastructures: … In both cases, conda envs are stored on NFSv3 or BeeGFS shares mounted on the compute nodes.

Concurrent activations, as can happen during a Galaxy training school or with some Snakemake workflows under network latency, lead to two cases:

- `$CONDA_PREFIX/lib/R/etc/ldpaths` doesn't exist for a short window of time, and another activation happening during this gap produces this error for some jobs.
- `$CONDA_PREFIX/lib/R/etc/ldpaths` doesn't exist at all anymore. One way to solve this is to reinstall the r-base package.

IMHO, a way to solve this issue is to replace the file only when needed, not at each environment activation.
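To make the window concrete, the update performed on each activation amounts to something like the following (reconstructed from the patch name cited elsewhere in the thread, `0017-Foribly-remove-then-forcibly-mv-in-javareconf.in.patch`; the exact commands are an approximation, not the literal javareconf code):

```bash
# Approximation of what the patched R CMD javareconf does on every activation:
rm -f "${CONDA_PREFIX}/lib/R/etc/ldpaths"         # from this point the file is gone...
# ...ldpaths is regenerated into a temporary file here...
mv -f "${CONDA_PREFIX}/lib/R/etc/ldpaths.new" \
      "${CONDA_PREFIX}/lib/R/etc/ldpaths"         # ...and only reappears here
# Any concurrent activation (or R startup) that sources ldpaths inside that window
# fails with "lib/R/etc/ldpaths: No such file or directory".
```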
Regards
Philippe
Environment (`conda list`): …

Details about `conda` and system (`conda info`): …