Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmake: discover more libs, build shared #411

Closed
joelb123 opened this issue Feb 9, 2021 · 7 comments
Closed

cmake: discover more libs, build shared #411

joelb123 opened this issue Feb 9, 2021 · 7 comments

Comments

@joelb123
Copy link

joelb123 commented Feb 9, 2021

From the packaging POV, it's far better to do cmake discovery for zstd like you do for bz2 and zlib. I am told the Debian build does this already.

xxhash is also a really common dependency on many distros. The other libs do not seem to be common, but if you're adding discovery for two, might as well add them for more.

The other cmake issue is BUILD_SHARED_LIBS. Unless set to OFF, the build fails because of relocation problems against the cacode and alp libraries. Forcing the build to be static means the external dependencies have to be built static also. Most distros don't like static-only builds for reasons widely shared.

@martin-steinegger
Copy link
Member

We can compile MMseqs2 without bz2 and zlib but we need zstd and xxhash. Therefore we included it into the repository. It might be not optimal for shipping but it decreases the dependencies to compile the software.

@joelb123
Copy link
Author

I realize that those are both required libraries, and needed for many platforms.

The request is for adding something like USE_SYSTEM_ZSTD and USE_SYSTEM_XXHASH. The Debian patch for system zstd has the minor changes needed except the cmake ifdefs. Debian also uses system gzstream (Debian is an outlier in having that lib), but no patch for xxhash. xxhash is central to your algorithm and also latest-release in your repo, but it's in pretty active development with fairly major bug fixes and performance improvements in the last 6 months, so it's good to have a pinned version in the repo that can be overridden with system if desired.

Being able to build shared libs is pretty important, in either case.

@milot-mirdita
Copy link
Member

I am still not 100% sure why the shared zstd works in debian. We use the zstd function ZSTD_findDecompressedSize that should only be available if ZSTD_STATIC_LINKING_ONLY is set and therefore zstd is linked statically.
I just spent a few minutes trying to figure out how debian defeats this check, but didn't find where they do that. As we can't make sure that every native libzstd package does something similar, it's probably more appropriate to have distributions patch that themselves if they are confident it will work out fine. If ZSTD_findDecompressedSize becomes part of the stable zstd api, then we can provide as USE_SYSTEM_ZSTD option.

Regarding xxhash I am very uncomfortable to encourage anyone to provide their own library. xxhash is a central part of the linclust algorithm and we have extensive tests to make sure that linclust produces exactly the same results on every platform/distribution/architecture. I think scientific reproducibility outweighs other concerns for our software.

@joelb123
Copy link
Author

TL; DR: Enabling building against shared system libraries gives better modularity and a better end-user experience.

The way that the Debian patch works is that on distros such as Debian (and Fedora and Gentoo) the system zstd library can optionally be built/packaged/installed both static and shared. Packages that can be built shared get built shared and will get marked for updating when the shared library updates to an incompatible version. Packages like MMseqs that will only build static require the static package/flag and should, in principle, get updated any time the static library gets updated. The latter mechanism is not as reliable and requires more rebuilds, which is why distros prefer that executables can be built with shared libraries. This request I'm making to you now is part of the acceptance QC for that the distro gateway person will check off. If upstream (you) says that building shared is a WONTFIX, then I'll probably just put up my hands and move on.

I understand your point about xxhash being pinned to a tested version. But it's a better practice to have the package pinned to a specific, internal version by default, but enable packagers to use the system version at their own risk. Why? Because you can't test and optimize the library and compiler flags for all the architectures out there and also because fixes and optimizations show up all the time. A highly-optimized distro such as Clear Linux may have optimizations and bug fixes that neither you nor a packager like me has access to at your release time. This is especially true for unusual architectures (e.g., AVX512) and for common-but-new architectures (e.g., Zen3 or M1), and especially for widely-used libraries like those for hashing and compression that make heavy use of SIMD optimizations.

Putting on my computational biologist hat for a moment, when one is doing a long-running project where reproducibility is key, I'll put the binaries in a container with all dependencies pinned. Development and short projects always lean heavily on the distro.

As for bugs, you can always strongly suggest that bug reports and issues be filed against your own project-compiled binaries and test against those binaries first when tracking bugs.

@joelb123
Copy link
Author

I'm guessing this comment is the reason you only supported static linking:

We use ZSTD_findDecompressedSize which is only available with ZSTD_STATIC_LINKING_ONLY

This hasn't been true since zstd 1.4.7 when it went from the experimental features into supported features.

@milot-mirdita
Copy link
Member

I see that this function is still defined still inside the static linking only block:
Latest release: https://github.com/facebook/zstd/blob/v1.4.8/lib/zstd.h#L1282
Current git HEAD: https://github.com/facebook/zstd/blob/f39178b445f96b456881268d29779964633f527d/lib/zstd.h#L1297

But I added a cmake flag to use the (static) system provided libzstd. Hope it helps.

@joelb123
Copy link
Author

Oops, I was wrong and you were right, I misread the zstd docs. readelf shows that ZSTD_findDecompressedSize gets built into libzstd.so, but the header won't define it unless built ZSTD_STATIC_LINKING_ONLY is defined. It's a policy matter because the method signature may change in the future.

Yes, the system libzstd does help. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants