option to use soft links instead of hard links #3308
Comments
I'm hoping @ilanschnell can supplement here with the full history of why hard links are ultimately preferred over all else. There's a lot of engineering and experience behind it. The short answer is that soft links are problematic when working with linked shared libraries. Hard links (or copies) more reliably keep the compiled-in relative paths intact. @ilanschnell, mind adding some more history? |
Hard links are preferred over anything else because they are the cheapest, i.e., no new file objects (inodes) are created, only directory name entries. Soft links involve creating new inodes and are also slower, because the OS has to follow them. Hard links also work well across all OSes, including Windows. Moreover, a hard-linked file is just like the original file: there is no difference between the two. In fact, any file is already a hard link (with reference count 1). This also means that unexpected behavior is avoided, e.g. some programs detect soft links and treat them specially. |
Thanks! Unexpected behavior is worrisome, but I think the cost is affordable (for us at least). We haven't run into issues using soft links for binaries, shared object libraries, and Python directories linked through a directory on the PYTHONPATH (on Linux), but we haven't thought about cross-platform issues. I would still log a request for the feature: conda is very flexible; it will make soft links if it can't make hard links, but lets you turn them off if you want (and turn off hard links too, I think), so this would add to the flexibility. No worries if this doesn't make sense for conda. |
For now, I think users can pass --copy to install all packages as copies instead of hard or soft links. |
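The inode argument above can be checked directly. The following is a minimal sketch, assuming a POSIX filesystem that supports both link types; the helper name `describe_links` and the file names are made up for illustration and are not part of conda.

```python
import os
import tempfile


def describe_links(workdir):
    """Create a file, a hard link and a soft link, then report inode facts."""
    original = os.path.join(workdir, "original.txt")
    hard = os.path.join(workdir, "hard.txt")
    soft = os.path.join(workdir, "soft.txt")

    with open(original, "w") as fh:
        fh.write("payload\n")

    os.link(original, hard)     # new directory entry pointing at the same inode
    os.symlink(original, soft)  # new file object that only stores a path

    orig_stat = os.stat(original)
    hard_stat = os.stat(hard)
    soft_lstat = os.lstat(soft)  # lstat inspects the link itself, not its target

    return {
        "hard_shares_inode": orig_stat.st_ino == hard_stat.st_ino,
        "link_count": orig_stat.st_nlink,
        "soft_has_own_inode": soft_lstat.st_ino != orig_stat.st_ino,
        "hard_detectable": os.path.islink(hard),
        "soft_detectable": os.path.islink(soft),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for key, value in describe_links(tmp).items():
            print(f"{key}: {value}")
```

The `hard_detectable` / `soft_detectable` pair is the part that matters for the "unexpected behavior" point: a program can tell that a path is a soft link, but it has no way to single out one hard link as "the copy".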
From the Conda ML, possibly from @natefoo, and related to this request:
I'm using Conda in CVMFS and am presented with the problem of rapidly growing space usage in the CVMFS repository (filesystem). This is in part because CVMFS does not support cross-directory hardlinks, so installing packages into environments results in multiple copies of the same package contents.
CVMFS is an HTTP-based, FUSE-implemented read-only filesystem designed for software distribution. It stores the "master" copy of a repository in content-addressable storage (CAS) (i.e. hashed file chunks on a local filesystem) on a host called a stratum 0. Changes are propagated out from stratum 0 servers and ultimately to clients via HTTP, which mount the repository via FUSE.
When you make changes to the repository on the stratum 0 (i.e. beginning a transaction), CVMFS first read-only mounts the CAS using the CVMFS FUSE client. It then uses AUFS to mount a writable filesystem that is unioned with the read-only mount. You make your changes and, when done, publish them. The difference between the read-write set and the read-only set becomes a snapshot. This AUFS filesystem supports cross-directory hardlinks just fine. But when the changes are published, cross-directory hardlinks are broken and become multiple copies of the same file.
Conda falls back to symlinks if hardlinks are impossible, but unfortunately this doesn't work in the CVMFS case, since conda is able to create hardlinks at runtime and doesn't know they will be broken later. So, is it possible to force conda to use symlinks? The only config option I can see related to soft/hardlinks is to disable the symlink fallback. I need it to prefer symlinks to hardlinks, not the other way around. |
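To make the fallback problem concrete, here is a rough sketch of a "hard link first, soft link second, copy last" strategy. It is not conda's actual implementation; `link_file` and its `prefer_softlinks` switch are hypothetical names used only to illustrate the request.

```python
import errno
import os
import shutil


def link_file(src, dst, prefer_softlinks=False):
    """Place src at dst and return the method that was used.

    prefer_softlinks is a hypothetical switch, not an existing conda option.
    """
    if prefer_softlinks:
        os.symlink(src, dst)
        return "softlink"
    try:
        os.link(src, dst)
        return "hardlink"
    except OSError as exc:
        # EXDEV: src and dst live on different filesystems.
        # EPERM / EMLINK: the filesystem refuses (more) hard links.
        if exc.errno not in (errno.EXDEV, errno.EPERM, errno.EMLINK):
            raise
    try:
        os.symlink(src, dst)
        return "softlink"
    except OSError:
        shutil.copy2(src, dst)
        return "copy"
```

In the CVMFS case described above, `os.link` succeeds inside the writable AUFS overlay, so a fallback of this shape never reaches the soft link branch. Only an explicit preference, such as the hypothetical `prefer_softlinks=True`, would avoid the duplicated files after publishing.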
Thanks @ijstokes, that was indeed from me. |
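I believe I've (re)discovered a serious limitation with the use of soft links, namely, they break conda's use of $ORIGIN RPATHs in executables. This example differs from the previous one, where my entire conda install is in CVMFS.
The environment has all of the appropriate symlinks, e.g.:
$ ls -lR conda-env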
conda-env:
total 128
drwxr-xr-x 2 galaxy galaxygp 32768 Oct 4 13:35 bin
drwxr-xr-x 2 galaxy galaxygp 32768 Oct 4 11:32 conda-meta
drwxr-xr-x 2 galaxy galaxygp 32768 Oct 4 11:32 include
drwxr-xr-x 3 galaxy galaxygp 32768 Oct 4 11:32 lib
conda-env/bin:
total 2976
...
lrwxrwxrwx 1 galaxy galaxygp 65 Oct 4 11:32 vcfintersect -> /cvmfs/main.galaxyproject.org/deps/_conda/pkgs/vcflib-1.0.0_rc1-0/bin/vcfintersect
...
conda-env/lib:
total 608
lrwxrwxrwx 1 galaxy galaxygp 13 Oct 4 11:32 libgcc_s.so -> libgcc_s.so.1
lrwxrwxrwx 1 galaxy galaxygp 62 Oct 4 11:32 libgcc_s.so.1 -> /cvmfs/main.galaxyproject.org/deps/_conda/pkgs/libgcc-5.2.0-0/lib/libgcc_s.so.1
lrwxrwxrwx 1 galaxy galaxygp 20 Oct 4 11:32 libgfortran.so -> libgfortran.so.3.0.0
lrwxrwxrwx 1 galaxy galaxygp 20 Oct 4 11:32 libgfortran.so.3 -> libgfortran.so.3.0.0
lrwxrwxrwx 1 galaxy galaxygp 69 Oct 4 11:32 libgfortran.so.3.0.0 -> /cvmfs/main.galaxyproject.org/deps/_conda/pkgs/libgcc-5.2.0-0/lib/libgfortran.so.3.0.0
lrwxrwxrwx 1 galaxy galaxygp 16 Oct 4 11:32 libgomp.so -> libgomp.so.1.0.0
lrwxrwxrwx 1 galaxy galaxygp 16 Oct 4 11:32 libgomp.so.1 -> libgomp.so.1.0.0
lrwxrwxrwx 1 galaxy galaxygp 65 Oct 4 11:32 libgomp.so.1.0.0 -> /cvmfs/main.galaxyproject.org/deps/_conda/pkgs/libgcc-5.2.0-0/lib/libgomp.so.1.0.0
lrwxrwxrwx 1 galaxy galaxygp 20 Oct 4 11:32 libquadmath.so -> libquadmath.so.0.0.0
lrwxrwxrwx 1 galaxy galaxygp 20 Oct 4 11:32 libquadmath.so.0 -> libquadmath.so.0.0.0
lrwxrwxrwx 1 galaxy galaxygp 69 Oct 4 11:32 libquadmath.so.0.0.0 -> /cvmfs/main.galaxyproject.org/deps/_conda/pkgs/libgcc-5.2.0-0/lib/libquadmath.so.0.0.0
lrwxrwxrwx 1 galaxy galaxygp 19 Oct 4 11:32 libstdc++.so -> libstdc++.so.6.0.21
lrwxrwxrwx 1 galaxy galaxygp 19 Oct 4 11:32 libstdc++.so.6 -> libstdc++.so.6.0.21
lrwxrwxrwx 1 galaxy galaxygp 68 Oct 4 11:32 libstdc++.so.6.0.21 -> /cvmfs/main.galaxyproject.org/deps/_conda/pkgs/libgcc-5.2.0-0/lib/libstdc++.so.6.0.21
lrwxrwxrwx 1 galaxy galaxygp 53 Oct 4 11:32 libz.a -> /cvmfs/main.galaxyproject.org/deps/_conda/pkgs/zlib-1.2.8-3/lib/libz.a
lrwxrwxrwx 1 galaxy galaxygp 13 Oct 4 11:32 libz.so -> libz.so.1.2.8
lrwxrwxrwx 1 galaxy galaxygp 13 Oct 4 11:32 libz.so.1 -> libz.so.1.2.8
lrwxrwxrwx 1 galaxy galaxygp 60 Oct 4 11:32 libz.so.1.2.8 -> /cvmfs/main.galaxyproject.org/deps/_conda/pkgs/zlib-1.2.8-3/lib/libz.so.1.2.8
However, attempting to use vcfintersect will fail:
$ source activate /galaxy/main/jobs/123456/conda-env
discarding /cvmfs/main.galaxyproject.org/deps/_conda/bin from PATH
prepending /galaxy/main/jobs/123456/conda-env/bin to PATH
$ strace vcfintersect --help
execve("/galaxy-repl/main/jobdir/014/005/14005770/conda-env/bin/vcfintersect", ["vcfintersect", "--help"], [/* 24 vars */]) = 0
brk(0) = 0x1d32000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5f0349f000
readlink("/proc/self/exe", "/cvmfs/main.galaxyproject.org/de"..., 4096) = 82
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
open("/cvmfs/main.galaxyproject.org/deps/_conda/pkgs/vcflib-1.0.0_rc1-0/bin/../lib/tls/x86_64/libpthread.so.0", O_RDONLY) = -1 ENOENT (No such file or directory)
stat("/cvmfs/main.galaxyproject.org/deps/_conda/pkgs/vcflib-1.0.0_rc1-0/bin/../lib/tls/x86_64", 0x7fff5f0b77d0) = -1 ENOENT (No such file or directory)
open("/cvmfs/main.galaxyproject.org/deps/_conda/pkgs/vcflib-1.0.0_rc1-0/bin/../lib/tls/libpthread.so.0", O_RDONLY) = -1 ENOENT (No such file or directory)
stat("/cvmfs/main.galaxyproject.org/deps/_conda/pkgs/vcflib-1.0.0_rc1-0/bin/../lib/tls", 0x7fff5f0b77d0) = -1 ENOENT (No such file or directory)
open("/cvmfs/main.galaxyproject.org/deps/_conda/pkgs/vcflib-1.0.0_rc1-0/bin/../lib/x86_64/libpthread.so.0", O_RDONLY) = -1 ENOENT (No such file or directory)
stat("/cvmfs/main.galaxyproject.org/deps/_conda/pkgs/vcflib-1.0.0_rc1-0/bin/../lib/x86_64", 0x7fff5f0b77d0) = -1 ENOENT (No such file or directory)
open("/cvmfs/main.galaxyproject.org/deps/_conda/pkgs/vcflib-1.0.0_rc1-0/bin/../lib/libpthread.so.0", O_RDONLY) = -1 ENOENT (No such file or directory)
stat("/cvmfs/main.galaxyproject.org/deps/_conda/pkgs/vcflib-1.0.0_rc1-0/bin/../lib", 0x7fff5f0b77d0) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=35290, ...}) = 0
mmap(NULL, 35290, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f5f03496000
close(3) = 0
...
writev(2, [{"vcfintersect", 12}, {": ", 2}, {"error while loading shared libra"..., 36}, {": ", 2}, {"libgomp.so.1", 12}, {": ", 2}, {"cannot open shared object file", 30}, {": ", 2}, {"No such file or directory", 25}, {"\n", 1}], 10) = 124
exit_group(127) = ?
+++ exited with 127 +++
vcfintersect: error while loading shared libraries: libgomp.so.1: cannot open shared object file: No such file or directory
This was discussed in #864, with the answer being, in essence, "no_softlinks should be used more judiciously in package recipes." Unfortunately, the reality is that any executable that has dynamic library dependencies in conda is not symlink-suitable, and since symlinks really only make sense cross-filesystem, hard links are really the only suitable solution. As a result, I suspect conda is not generally usable across filesystems for many packages unless the --copy argument to conda create is used (which is, unfortunately, slow). An option such as --softlink-except-executables (that would be the default if the package and env filesystem are not the same) would make sense for this case and be slightly faster than --copy. |
That's interesting, I guess soft links as a conda feature is a bad idea. I think LD_LIBRARY_PATH takes precedence over RPATH, if you want to try to work around it by managing environment variables. |
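The failure in the trace can be reproduced in miniature. The sketch below assumes, as the readlink("/proc/self/exe") call in the strace output suggests, that $ORIGIN is taken from the fully resolved executable path. The function names and the `tool-1.0` layout are invented for illustration, and the real loader has more rules than this.

```python
import os


def origin_lib_dir(executable, rpath="$ORIGIN/../lib"):
    """Approximate the directory an $ORIGIN-relative RPATH expands to.

    Assumption: $ORIGIN is the directory of the resolved executable,
    so soft links on the way to the binary are followed first.
    """
    origin = os.path.dirname(os.path.realpath(executable))
    return os.path.normpath(rpath.replace("$ORIGIN", origin))


def build_demo(root):
    """Lay out a package cache plus an environment whose bin/ entry is a soft link."""
    pkg_bin = os.path.join(root, "pkgs", "tool-1.0", "bin")
    env_bin = os.path.join(root, "env", "bin")
    env_lib = os.path.join(root, "env", "lib")
    for directory in (pkg_bin, env_bin, env_lib):
        os.makedirs(directory)
    real_exe = os.path.join(pkg_bin, "tool")
    open(real_exe, "w").close()
    env_exe = os.path.join(env_bin, "tool")
    os.symlink(real_exe, env_exe)  # what a soft-linked install would create
    return env_exe, env_lib
```

Calling `origin_lib_dir` on the path returned by `build_demo` yields the `pkgs/tool-1.0/lib` directory rather than `env/lib`, which matches the open() calls under `pkgs/vcflib-1.0.0_rc1-0/bin/../lib` in the trace. With a hard link or a copy, the resolved path stays inside the environment and the lookup lands in `env/lib`.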
Hi, Looking at this issue + #3373, I feel like having the following options would be ok for most use cases:
What do you think? |
This seems like a reasonable criterion to fix in conda-build. When we do the rpath replacement, shouldn't we then also be able to search for any symlinks and make them copies appropriately? Relative symlinks within a folder are probably fine, but anything cross-folder is a no-no, right? |
If I understand correctly the binary |
Pretty sure relative symlinks are generally fine even cross folder as long as they point to another file within the prefix that ultimately ends up in the conda package being built. Am I wrong? |
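One way to test the "relative and inside the prefix" criterion is to walk a build prefix and classify every soft link. This is only a sketch of such a check, not what conda-build does; `classify_symlinks` and its category names are made up for this example.

```python
import os


def classify_symlinks(prefix):
    """Map each soft link under prefix (as a relative path) to a category."""
    prefix = os.path.realpath(prefix)
    report = {}
    # os.walk does not descend into linked directories by default,
    # but it still lists them in dirnames, so check both lists.
    for dirpath, dirnames, filenames in os.walk(prefix):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                continue
            target = os.readlink(path)         # the stored link text
            resolved = os.path.realpath(path)  # where it finally points
            inside = os.path.commonpath([prefix, resolved]) == prefix
            if os.path.isabs(target):
                kind = "absolute"
            elif inside:
                kind = "relative-inside-prefix"
            else:
                kind = "relative-escapes-prefix"
            report[os.path.relpath(path, prefix)] = kind
    return report
```

Under the criterion discussed here, only the `relative-inside-prefix` links would be kept as links; the other two categories are candidates for being turned into copies when the package is built.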
Nice! Looking at conda/conda-build/pull/1521 I think it should be ok for perl too, but I'll test it to confirm and let you know |
Wait, do I understand correctly that #1521 handles symlinks when building the packages? |
Ok, I understand better now, the perl problem is indeed different |
So conda/conda-build#1501 does not address the problem? That was a shot in the dark, more or less, so any feedback would be greatly appreciated. |
Sorry I haven't participated much in this thread. I'm not working with conda at the level of the code base where I can try out things in the master branch or evaluate them. What I can say is that I have a beta version of centrally installed conda working here at LCLS/SLAC, and things are pretty good, but it is awkward for users to create their own environments or clone centrally installed environments, because each user gets a 3GB copy of the packages since they are on a different file system. If the soft link feature develops to the point where a user can add an option like --allow-soft-links when they clone or create a conda environment, or even better, if I could put an allow-soft-links option in the centrally installed condarc so that users don't have to think about it, that would be neat. If the feature develops to the point where you can put it in a conda version in the conda-canary channel, I could try it. Anyway, thanks for working on this issue! Best, David Schneider |
David, I think (hope) this part is done with the recent use-softlinks PR. There will be a canary release out soon if you want to test, maybe in a week or so.
|
Hi there, thank you for your contribution to Conda! This issue has been automatically locked since it has not had recent activity after it was closed. Please open a new issue if needed. |
conda prefers hard links to soft links, and has an option to disallow soft links. I prefer soft links: with an ls I can see what is what, and as I run into weird problems, I trust them more and can debug them more easily. Are there technical reasons why you can't use soft links in site-packages or other places that conda uses them? I understand performance may be a reason to prefer hard links.