OpenSSL 1.x reaches end-of-life in September, and recent distros like Ubuntu 22.04+ (last year) and Debian 12+ (next month) ship only OpenSSL 3.
I have gloo (inside PyTorch) working with OpenSSL 3.x, and as far as I can tell everything works fine. The functions it uses are both API- and ABI-compatible between 1.1 and 3.x. (This is important because PyTorch configures gloo with USE_TCP_OPENSSL_LOAD, i.e., it dlopens the library instead of compiling against it.) But there are a few things to adjust:
In gloo, the cmake logic from "Look for openssl when building tls" (#306) does find_package(OpenSSL 1.1 REQUIRED EXACT), which fails on 3.0. Something like find_package(OpenSSL 1.1...<4.0 REQUIRED) would be better. Alternatively, perhaps this shouldn't be invoked at all in the USE_TCP_OPENSSL_LOAD case, since OpenSSL isn't needed at build time then?
gloo/transport/tcp/tls/openssl.cc attempts to dlopen libssl.so if present, else libssl.so.1.1. The unversioned libssl.so is only available if the development package for OpenSSL is installed, and that development package can be any version (3.x, 4.x, etc.). It's probably safer to try the versioned sonames explicitly: libssl.so.1.1 and libssl.so.3 (all 3.x releases use the same soname).
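To make the soname suggestion concrete, here is a minimal sketch of a fallback loader. It is not gloo's actual code: the helper name `open_first_available` and the choice to try libssl.so.3 before libssl.so.1.1 are assumptions for illustration only.

```cpp
#include <dlfcn.h>

#include <string>
#include <vector>

// Hypothetical helper (not part of gloo): try each candidate soname in
// order and return the handle of the first one that dlopen can load.
// On success, *loaded (if non-null) is set to the name that worked.
// Returns nullptr if none of the candidates can be loaded.
void* open_first_available(const std::vector<std::string>& names,
                           std::string* loaded) {
  for (const auto& name : names) {
    void* handle = dlopen(name.c_str(), RTLD_NOW | RTLD_LOCAL);
    if (handle != nullptr) {
      if (loaded != nullptr) {
        *loaded = name;
      }
      return handle;
    }
  }
  return nullptr;
}

// Candidate list assumed for this sketch: versioned sonames only, so the
// result does not depend on whether a development package is installed.
// The ordering (3.x first) is an assumption, not something gloo does today.
const std::vector<std::string> kLibsslCandidates = {
    "libssl.so.3",
    "libssl.so.1.1",
};
```

A caller would do something like `void* h = open_first_available(kLibsslCandidates, &name);` and then resolve the needed symbols with `dlsym` as before. On older glibc versions this needs linking with `-ldl`.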
If a PR is helpful I can do the CLA dance, but hopefully this is simple enough that the more interesting part is agreeing on what the change should be.
Seems pretty reasonable to me. That said, I'm probably not the right person to review this. Maybe consider opening a linked issue in the PyTorch codebase and tagging it with oncall: distributed to make sure this gets properly reviewed?