Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rebuild for openssl3 #2

Merged
merged 13 commits into from
Dec 21, 2022
Merged

Rebuild for openssl3 #2

merged 13 commits into from
Dec 21, 2022

Conversation

h-vetinari
Copy link
Member

@h-vetinari h-vetinari commented Dec 10, 2022

Let's see why the migration bot is complaining...

Ah... windows+CUDA has been failing from the very start, and not fixed yet (see #1) - disabling it for now.

@h-vetinari h-vetinari requested a review from hadim as a code owner December 10, 2022 22:27
@conda-forge-linter
Copy link

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

@hadim
Copy link
Member

hadim commented Dec 10, 2022

Ah... windows+CUDA has been failing from the very start, and not fixed yet (see #1) - disabling it for now.

yes indeed. I just gave up on this... Let me know if you have an idea.

@h-vetinari
Copy link
Member Author

Does this package actually need openssl?

The link check doesn't seem to think so:

# linux
WARNING (katago): run-exports library package conda-forge::openssl-1.1.1s-h0b41bf4_1 in requirements/run but it is not used (i.e. it is overdepending or perhaps statically linked? If that is what you want then add it to `build/ignore_run_exports`)

# win
WARNING (katago): run-exports library package conda-forge::openssl-1.1.1s-hcfcfb64_1 in requirements/run but it is not used (i.e. it is overdepending or perhaps statically linked? If that is what you want then add it to `build/ignore_run_exports`)

#osx
WARNING (katago): run-exports library package conda-forge::openssl-1.1.1s-hfd90126_1 in requirements/run but it is not used (i.e. it is overdepending or perhaps statically linked? If that is what you want then add it to `build/ignore_run_exports`)

@hadim
Copy link
Member

hadim commented Dec 10, 2022

I haven't tried without but I know it has some networking features (distributed training) so it might be required.

@h-vetinari
Copy link
Member Author

I haven't tried without but I know it has some networking features (distributed training) so it might be required.

Seems to pass fine without openssl. I'm presuming the library doesn't implement networking itself - maybe there's a dependency missing (like requests)?

@pkgw
Copy link
Contributor

pkgw commented Dec 13, 2022

It looks like the distributed stuff is only built if the BUILD_DISTRIBUTED CMake flag is activated, which it is not currently. If it's turned on, OpenSSL should be required:

https://github.com/lightvector/KataGo/blob/3c42da50ec0e1dd7192ea384e9c643981b946d5e/cpp/CMakeLists.txt#L360-L367

@h-vetinari
Copy link
Member Author

Thanks for digging @pkgw! Do you want us to try to build the networking bits @hadim, or how do we proceed?

@hadim
Copy link
Member

hadim commented Dec 14, 2022

Thanks for digging @pkgw! Do you want us to try to build the networking bits @hadim, or how do we proceed?

Yes please. I don't use the distributed mode but other users might want to.

@h-vetinari
Copy link
Member Author

NO_GIT_REVISION is set, but BUILD_DISTRIBUTED is also set and distributed requires git revision, so ignoring NO_GIT_REVISION.
[...]
[  0%] Generating program/gitinfoupdated.h
fatal: not a git repository (or any of the parent directories): .git

@h-vetinari
Copy link
Member Author

So the clang docs say that "compiler-rt contains an implementation of an atomics library", but I cannot find libatomic.{a,dylib} anywhere in our llvm artefacts, and I have no idea which one of these libs (if any) is supposed to fill in for -latomic.

The problem is that we cannot really patch the upstream source without breaking the ability to use distributed builds, because katago makes it abundantly clear that only vanilla, unpatched versions may share artefacts, and if we apply a patch we get a dirty git tag.

I cannot currently see how to square this circle, maybe @isuruf @xhochy have some experience with -latomic on llvm?

PS: Upstream code in question is here; we could propose a fix that actually works with (our) clang and wait for another release. In the meantime I think we should just accept the dirty tag on osx.

@pkgw
Copy link
Contributor

pkgw commented Dec 21, 2022

@h-vetinari I'm not able to test this right now, but maybe we could get away with symlinking libclang_rt.builtins_$ARCH_osx.a to $PREFIX/lib/libatomic.a? Or, if I'm correctly understanding the docs that you linked, we might be able to create some kind of empty libatomic.a just so that the linker can find something.

@h-vetinari
Copy link
Member Author

So you definitely got past the linker error, well done!

But now conda complains that the symlink you created may not be possible to resolve later.

Packaging katago
INFO:conda_build.build:Packaging katago
Packaging katago-1.11.0-cpu_hfce7730_0
INFO:conda_build.build:Packaging katago-1.11.0-cpu_hfce7730_0
number of files: 22
Error: lib/libatomic.a is a symlink to a path that may not exist after the build is completed (/Users/runner/miniforge3/conda-bld/katago_1671600604722/_build_env/lib/clang/14.0.6/lib/libclang_rt.builtins_x86_64_osx.a)
##[error]Bash exited with code '1'.

I think it may be enough to just delete the symlink after the build to make this pass.

@h-vetinari
Copy link
Member Author

So, looks like this worked! :)

From the output of katago CLI, I don't see any particular distributed tests we could be running:


---Testing/debugging subcommands-------------
evalsgf : Utility/debug tool, analyze a single position of a game from an SGF file.

runtests : Test important board algorithms and datastructures
runnnlayertests : Test a few subcomponents of the current neural net backend

runnnontinyboardtest : Run neural net on a tiny board and dump result to stdout
runnnsymmetriestest : Run neural net on a hardcoded rectangle board and dump symmetries result
runownershiptests : Run neural net search on some hardcoded positions and print avg ownership

runoutputtests : Run a bunch of things and dump details to stdout
runsearchtests : Run a bunch of things using a neural net and dump details to stdout
runsearchtestsv3 : Run a bunch more things using a neural net and dump details to stdout
runsearchtestsv8 : Run a bunch more things using a neural net and dump details to stdout
runsearchtestsv9 : Run a bunch more things using a neural net and dump details to stdout
runselfplayinittests : Run some tests involving selfplay training init using a neural net and dump details to stdout
runsekitrainwritetests : Run some tests involving seki train output

@hadim, anything else you can think of for testing distributed support here? I checked upstream and didn't find much (except one that says "you should never need this", and some not very involved mentions in cpp/tests)

I also noticed that there are python bindings upstream. I guess those could/should be packaged as well...? Not a thing for this PR though.

@h-vetinari
Copy link
Member Author

@hadim, anything else you can think of for testing distributed support here? I checked upstream and didn't find much (except one that says "you should never need this", and some not very involved mentions in cpp/tests)

OK, I added what little tests I found that somehow mentioned distributed.

From my side, this PR is about as good as it's going to get now... 🙃

@hadim
Copy link
Member

hadim commented Dec 21, 2022

Thanks a lot @h-vetinari . Not much more to say on my side. Maybe I could try the windows/cuda build again given the changes in that PR.

@hadim hadim merged commit be45632 into conda-forge:main Dec 21, 2022
@h-vetinari h-vetinari deleted the openssl3 branch December 21, 2022 12:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants