Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only look for libcufile.so.0 #203

Merged
merged 2 commits into from
Apr 26, 2023

Conversation

wence-
Copy link
Contributor

@wence- wence- commented Apr 25, 2023

Rather than searching for a list of possible library names for libcufile, only look for the single ABI-compatible version of libcufile that we support. This avoids both ABI incompatibilities and loading stub libraries. It morally does what #127 attempted, but with the correct versioning suffix.

Closes #200.

Rather than searching for a list of possible library names for
libcufile, only look for the single ABI-compatible version of
libcufile that we support. This avoids both ABI incompatibilities and
loading stub libraries. It morally does what rapidsai#127 attempted, but with
the correct versioning suffix.
@wence- wence- added non-breaking Introduces a non-breaking change improvement Improves an existing functionality labels Apr 25, 2023
Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks correct to me.

@madsbk
Copy link
Member

madsbk commented Apr 25, 2023

Is libcufile.so.0 always there?
I seem to remember installations where libcufile.so.1 or libcufile.so was the only one installed?
E.g. looking for libcufile.so.1 exclusively did work when #127 was merged.

@madsbk
Copy link
Member

madsbk commented Apr 25, 2023

CUDA v11.5 has:

mkristensen@mkristensen-workstation:~/apps/cuda-11.5.2/lib64$ ls -l libcufile*
lrwxrwxrwx 1 mkristensen mkristensen       23 Aug  2  2022 libcufile_rdma.so -> libcufile_rdma.so.1.1.1
-rw-r--r-- 1 mkristensen mkristensen    39128 Nov  4  2021 libcufile_rdma.so.1.1.1
-rw-r--r-- 1 mkristensen mkristensen    61778 Nov  4  2021 libcufile_rdma_static.a
lrwxrwxrwx 1 mkristensen mkristensen       18 Aug  2  2022 libcufile.so -> libcufile.so.1.1.1
-rw-r--r-- 1 mkristensen mkristensen  1293192 Nov  4  2021 libcufile.so.1.1.1
-rw-r--r-- 1 mkristensen mkristensen 62822740 Nov  4  2021 libcufile_static.a

@wence-
Copy link
Contributor Author

wence- commented Apr 25, 2023

CUDA v11.5 has:

mkristensen@mkristensen-workstation:~/apps/cuda-11.5.2/lib64$ ls -l libcufile*
lrwxrwxrwx 1 mkristensen mkristensen       23 Aug  2  2022 libcufile_rdma.so -> libcufile_rdma.so.1.1.1
-rw-r--r-- 1 mkristensen mkristensen    39128 Nov  4  2021 libcufile_rdma.so.1.1.1
-rw-r--r-- 1 mkristensen mkristensen    61778 Nov  4  2021 libcufile_rdma_static.a
lrwxrwxrwx 1 mkristensen mkristensen       18 Aug  2  2022 libcufile.so -> libcufile.so.1.1.1
-rw-r--r-- 1 mkristensen mkristensen  1293192 Nov  4  2021 libcufile.so.1.1.1
-rw-r--r-- 1 mkristensen mkristensen 62822740 Nov  4  2021 libcufile_static.a

Oh, ugh. Although, where do you see that? In the nvidia/cuda:11.5.1-runtime-ubuntu20.04 docker image I see:

root@1e1fde448191:/# ls -l /usr/local/cuda/lib64/libcufile*
lrwxrwxrwx 1 root root      18 Dec 14 22:20 /usr/local/cuda/lib64/libcufile.so.0 -> libcufile.so.1.1.1
-rw-r--r-- 1 root root 1293192 Nov  3  2021 /usr/local/cuda/lib64/libcufile.so.1.1.1
lrwxrwxrwx 1 root root      23 Dec 14 22:20 /usr/local/cuda/lib64/libcufile_rdma.so.1 -> libcufile_rdma.so.1.1.1
-rw-r--r-- 1 root root   39128 Nov  3  2021 /usr/local/cuda/lib64/libcufile_rdma.so.1.1.1

@wence-
Copy link
Contributor Author

wence- commented Apr 25, 2023

Is libcufile.so.0 always there?

In all the official docker images from 11.5 up, yes.

I seem to remember installations where libcufile.so.1 or libcufile.so was the only one installed?
E.g. looking for libcufile.so.1 exclusively did work when #127 was merged.

I think someone must have played around with those installations to do that manually.

@madsbk
Copy link
Member

madsbk commented Apr 25, 2023

@bdice
Copy link
Contributor

bdice commented Apr 25, 2023

I can confirm the Ubuntu 20.04 Debian package linked above only contains libcufile.so.1.1.1 and libcufile.so (a symlink to libcufile.so.1.1.1).

@wence-
Copy link
Contributor Author

wence- commented Apr 25, 2023

I claim that package is broken and should never have shipped in that state, but I realise that is not a useful point of view :(.

@wence-
Copy link
Contributor Author

wence- commented Apr 25, 2023

I think then, given that actually the ABI has never changed we could just use libcufile.so.0 and then encode the particular exact library versions as well that we know work.

Prior to 11.7.1 despite being ABI compatible, libcufile packages do not ship with a `.so.0` symlink. To handle this, just explicitly enumerate all the compatible versions we know about.

Co-authored-by: Bradley Dice <bdice@bradleydice.com>
@wence-
Copy link
Contributor Author

wence- commented Apr 26, 2023

OK, I think this is good for another look @madsbk.

@jakirkham jakirkham requested a review from madsbk April 26, 2023 09:36
Copy link
Member

@madsbk madsbk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks all!

@madsbk
Copy link
Member

madsbk commented Apr 26, 2023

/merge

@rapids-bot rapids-bot bot merged commit 31cb5c1 into rapidsai:branch-23.06 Apr 26, 2023
@wence- wence- deleted the wence/fix/issue-200 branch April 26, 2023 12:41
@jakirkham
Copy link
Member

Thanks all! 🙏

rapids-bot bot pushed a commit to rapidsai/cucim that referenced this pull request Apr 27, 2023
Fixes the `dlopen` logic around `libcufile` to load either the SOVERSION library or the fully versioned library. This includes logic to handle the missing SOVERSION library on older CUDA versions. Drops loading the stub library, which typically only shows up in development environments.

<hr>

Fixes #504

Borrows from PR ( rapidsai/cudf#13210 ) and PR ( rapidsai/kvikio#203 ). Also adapts this logic to cuCIM ( rapidsai/kvikio#141 )

Authors:
  - https://github.com/jakirkham
  - Gregory Lee (https://github.com/grlee77)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Gregory Lee (https://github.com/grlee77)
  - Gigon Bae (https://github.com/gigony)

URL: #548
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Improves an existing functionality non-breaking Introduces a non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

KvikIO: Need to canonicalize dlopen'd library names
4 participants