[AMD] 'libamdhip64.so' has too-recent versioned symbols for "manylinux_2014" and "manylinux_2_28" wheels builds #3620

Closed
fkouteib opened this issue Apr 9, 2024 · 3 comments · Fixed by #3646

@fkouteib
Contributor

fkouteib commented Apr 9, 2024

Nightly wheels (4/08/24) are broken at the tip of the main branch with the following error:

auditwheel: error: cannot repair "/tmp/cibuildwheel/built_wheel/triton_nightly-3.0.0.post20240409022326-cp37-cp37m-linux_x86_64.whl" to "manylinux2014_x86_64" ABI because of the presence of too-recent versioned symbols. You'll need to compile the wheel on an older toolchain.

I also ran into this issue in my PR to upgrade the wheels to manylinux_2_28. While debugging it, I found auditwheel-symbols, which was more helpful than the auditwheel error for pinpointing the exact offending symbols.

Here are the incompatible/offending symbols per manylinux version.

manylinux2014 (Triton's current plan of record, POR)

$ auditwheel-symbols -m 2014 triton-3.0.0-cp310-cp310-linux_x86_64.whl
triton/_C/libtriton.so is not manylinux_2_17(aka manylinux2014) compliant because it links the following forbidden libraries:
libm.so.6 offending symbols: exp@GLIBC_2.29, log@GLIBC_2.29, powf@GLIBC_2.27, pow@GLIBC_2.29, expf@GLIBC_2.27, exp2@GLIBC_2.29, log2f@GLIBC_2.27, exp2f@GLIBC_2.27, log2@GLIBC_2.29, logf@GLIBC_2.27
libc.so.6 offending symbols: stat@GLIBC_2.33, pthread_sigmask@GLIBC_2.32, lstat@GLIBC_2.33, pthread_join@GLIBC_2.34, dlerror@GLIBC_2.34, __pthread_key_create@GLIBC_2.34, pthread_rwlock_rdlock@GLIBC_2.34, dlopen@GLIBC_2.34, dlclose@GLIBC_2.34, pthread_create@GLIBC_2.34, dlsym@GLIBC_2.34, pthread_once@GLIBC_2.34, pthread_detach@GLIBC_2.34, pthread_attr_setstacksize@GLIBC_2.34, pthread_getname_np@GLIBC_2.34, pthread_setname_np@GLIBC_2.34, pthread_rwlock_unlock@GLIBC_2.34, dladdr@GLIBC_2.34, pthread_rwlock_wrlock@GLIBC_2.34
libstdc++.so.6 offending symbols: _ZNKSt7__cxx1119basic_ostringstreamIcSt11char_traitsIcESaIcEE3strEv@GLIBCXX_3.4.21, _ZNSt28__atomic_futex_unsigned_base19_M_futex_notify_allEPj@GLIBCXX_3.4.21, _ZSt24__throw_out_of_range_fmtPKcz@GLIBCXX_3.4.20, _ZNSt19_Sp_make_shared_tag5_S_eqERKSt9type_info@GLIBCXX_3.4.26, _ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEC1ERKS4_@GLIBCXX_3.4.21, _ZNKSt3_V214error_category10equivalentEiRKSt15error_condition@GLIBCXX_3.4.21, _ZNKSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE5rfindEPKcmm@GLIBCXX_3.4.21, _ZNSt13runtime_errorC1EPKc@GLIBCXX_3.4.21, _ZNSt13runtime_errorC1ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE@GLIBCXX_3.4.21, _ZNSt6thread6_StateD2Ev@GLIBCXX_3.4.22, _ZNKSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE4findEPKcmm@GLIBCXX_3.4.21, _ZNKSt7__cxx1115basic_stringbufIcSt11char_traitsIcESaIcEE3strEv@GLIBCXX_3.4.21, _ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE14_M_replace_auxEmmmc@GLIBCXX_3.4.21, _ZNSt16invalid_argumentC1ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE@GLIBCXX_3.4.21, _ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE12_M_constructEmc@GLIBCXX_3.4.21, _ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE9_M_createERmm@GLIBCXX_3.4.21, _ZNSt3_V216generic_categoryEv@GLIBCXX_3.4.21, _ZNSt3_V215system_categoryEv@GLIBCXX_3.4.21, _ZNSt6thread15_M_start_threadESt10unique_ptrINS_6_StateESt14default_deleteIS1_EEPFvvE@GLIBCXX_3.4.22, _ZNSt13runtime_errorC2EPKc@GLIBCXX_3.4.21, _ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE9_M_assignERKS4_@GLIBCXX_3.4.21, _ZNSt3_V214error_categoryD2Ev@GLIBCXX_3.4.21, _ZNKSt3_V214error_category10equivalentERKSt10error_codei@GLIBCXX_3.4.21, _ZNSt16invalid_argumentC1EPKc@GLIBCXX_3.4.21, _ZNKSt3_V214error_category23default_error_conditionEi@GLIBCXX_3.4.21, _ZNSt7__cxx1115basic_stringbufIcSt11char_traitsIcESaIcEE7_M_syncEPcmm@GLIBCXX_3.4.21, _ZNSt13runtime_errorC2ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE@GLIBCXX_3.4.21, _ZSt28__throw_bad_array_new_lengthv@GLIBCXX_3.4.29, _ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEaSEOS4_@GLIBCXX_3.4.21, _ZNSt18condition_variable4waitERSt11unique_lockISt5mutexE@GLIBCXX_3.4.30, _ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEED1Ev@GLIBCXX_3.4.21, _ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE9_M_appendEPKcm@GLIBCXX_3.4.21, _ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE10_M_disposeEv@GLIBCXX_3.4.21, _ZNKSt3_V214error_category10_M_messageB5cxx11Ei@GLIBCXX_3.4.21, _ZNSt28__atomic_futex_unsigned_base19_M_futex_wait_untilEPjjbNSt6chrono8durationIlSt5ratioILl1ELl1EEEENS2_IlS3_ILl1ELl1000000000EEEE@GLIBCXX_3.4.21, _ZNSt7__cxx1119basic_ostringstreamIcSt11char_traitsIcESaIcEED1Ev@GLIBCXX_3.4.21, _ZNKSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE7compareEPKc@GLIBCXX_3.4.21, _ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE8_M_eraseEmm@GLIBCXX_3.4.21, _ZNSt12domain_errorC1EPKc@GLIBCXX_3.4.21, _ZNKSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE6substrEmm@GLIBCXX_3.4.21, _ZNKSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE4findEcm@GLIBCXX_3.4.21
libstdc++.so.6 offending symbols: _ZdaPvm@CXXABI_1.3.9, _ZdlPvm@CXXABI_1.3.9, _ZdlPvSt11align_val_t@CXXABI_1.3.11, _ZdlPvmSt11align_val_t@CXXABI_1.3.11, _ZnwmSt11align_val_t@CXXABI_1.3.11, _ZNSt15__exception_ptr13exception_ptr10_M_releaseEv@CXXABI_1.3.13, _ZNSt15__exception_ptr13exception_ptr9_M_addrefEv@CXXABI_1.3.13
triton/backends/amd/lib/libamdhip64.so is not manylinux_2_17(aka manylinux2014) compliant because it links the following forbidden libraries:
libnuma.so.1
libhsa-runtime64.so.1
libamd_comgr.so.2
libm.so.6 offending versions: GLIBC_2.29
libstdc++.so.6 offending versions: GLIBCXX_3.4.20, GLIBCXX_3.4.21, GLIBCXX_3.4.26
libstdc++.so.6 offending versions: CXXABI_1.3.8, CXXABI_1.3.9
triton/backends/nvidia/bin/cuobjdump is manylinux_2_17(aka manylinux2014) compliant.
triton/backends/nvidia/bin/nvdisasm is manylinux_2_17(aka manylinux2014) compliant.
triton/backends/nvidia/bin/ptxas is manylinux_2_17(aka manylinux2014) compliant.

manylinux_2_28 (the latest, and Triton's upgrade target):

$ auditwheel-symbols -m 2_28 triton-3.0.0-cp310-cp310-linux_x86_64.whl
triton/_C/libtriton.so is not manylinux_2_28 compliant because it links the following forbidden libraries:
libstdc++.so.6 offending symbols: _ZNSt19_Sp_make_shared_tag5_S_eqERKSt9type_info@GLIBCXX_3.4.26, _ZSt28__throw_bad_array_new_lengthv@GLIBCXX_3.4.29, _ZNSt18condition_variable4waitERSt11unique_lockISt5mutexE@GLIBCXX_3.4.30
libstdc++.so.6 offending symbols: _ZNSt15__exception_ptr13exception_ptr10_M_releaseEv@CXXABI_1.3.13, _ZNSt15__exception_ptr13exception_ptr9_M_addrefEv@CXXABI_1.3.13
libc.so.6 offending symbols: stat@GLIBC_2.33, pthread_sigmask@GLIBC_2.32, lstat@GLIBC_2.33, pthread_join@GLIBC_2.34, dlerror@GLIBC_2.34, __pthread_key_create@GLIBC_2.34, pthread_rwlock_rdlock@GLIBC_2.34, dlopen@GLIBC_2.34, dlclose@GLIBC_2.34, pthread_create@GLIBC_2.34, dlsym@GLIBC_2.34, pthread_once@GLIBC_2.34, pthread_detach@GLIBC_2.34, pthread_attr_setstacksize@GLIBC_2.34, pthread_getname_np@GLIBC_2.34, pthread_setname_np@GLIBC_2.34, pthread_rwlock_unlock@GLIBC_2.34, dladdr@GLIBC_2.34, pthread_rwlock_wrlock@GLIBC_2.34
libm.so.6 offending symbols: exp@GLIBC_2.29, log@GLIBC_2.29, pow@GLIBC_2.29, exp2@GLIBC_2.29, log2@GLIBC_2.29
triton/backends/amd/lib/libamdhip64.so is not manylinux_2_28 compliant because it links the following forbidden libraries:
libm.so.6 offending versions: GLIBC_2.29
libamd_comgr.so.2
libhsa-runtime64.so.1
libstdc++.so.6 offending versions: GLIBCXX_3.4.26
libnuma.so.1
triton/backends/nvidia/bin/cuobjdump is manylinux_2_28 compliant.
triton/backends/nvidia/bin/nvdisasm is manylinux_2_28 compliant.
triton/backends/nvidia/bin/ptxas is manylinux_2_28 compliant.

manylinux_2_28 uses AlmaLinux 8 (a RHEL 8 derivative), whose newest symbol versions are GLIBCXX 3.4.25 and GLIBC 2.28. By comparison, Ubuntu 22.04 provides GLIBCXX 3.4.30 and GLIBC 2.35. Because a library that requires newer symbol versions still loads on any system that provides them, I assume this is why the issue doesn't block Triton developers' local builds on newer distros. I don't know how to fix this issue completely, but the manylinux_2_28 wheel upgrade would at least narrow the gap.
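
Within each symbol family the comparison above comes down to version ordering. The following is a minimal bash sketch, assuming GNU `sort -V` is available; `offending_versions` is a hypothetical helper (not how auditwheel itself is implemented) that prints every tag in one family that is newer than a given ceiling. The ceilings in the usage lines are the manylinux_2_28 values quoted above.

```bash
#!/bin/bash
# Hypothetical helper: print each "<PREFIX>_<version>" tag newer than <max>.
# Usage: offending_versions <PREFIX> <max> <tag>...
# Requires GNU sort (-V, version ordering).
offending_versions() {
  local prefix="$1" max="$2" tag ver newest
  shift 2
  for tag in "$@"; do
    # Skip other families; "GLIBC_" does not match "GLIBCXX_3.4.26".
    if [[ "$tag" != "${prefix}_"* ]]; then
      continue
    fi
    ver="${tag#"${prefix}_"}"
    if [[ "$ver" == "$max" ]]; then
      continue
    fi
    # If $ver sorts after $max in version order, it is too new for the policy.
    newest="$(printf '%s\n%s\n' "$max" "$ver" | sort -V | tail -n 1)"
    if [[ "$newest" == "$ver" ]]; then
      echo "$tag"
    fi
  done
}

# Tags reported for libamdhip64.so above, checked against the manylinux_2_28 ceilings.
offending_versions GLIBC 2.28 GLIBC_2.29                                        # prints GLIBC_2.29
offending_versions GLIBCXX 3.4.25 GLIBCXX_3.4.20 GLIBCXX_3.4.21 GLIBCXX_3.4.26  # prints GLIBCXX_3.4.26
```

Passing a GLIBCXX ceiling of 3.4.19 instead (an assumed manylinux2014 value, consistent with GLIBCXX_3.4.20 being flagged above) reports all three GLIBCXX tags, matching the manylinux2014 output.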

@fkouteib
Contributor Author

Flagging 'libtriton.so' was a user error on my part: that binary is built from source on every run, so in CI it links against the libraries in the manylinux container image. I ran the checks above against a wheel built natively on Ubuntu 22.04, which is why it got flagged as well.

The 'libamdhip64.so' error is real, because that library is provided in the source repo as a prebuilt binary by the AMD team rather than built in the container. It is the root cause of the wheel build failure in CI.

manylinux build repro script:


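#!/bin/bash
# Run from the Triton repo root; "python" is the package directory cibuildwheel builds.

# HEAD commit timestamp in UTC, used as the wheel version suffix
export LATEST_DATE=$(TZ=UTC0 git show --quiet --date='format-local:%Y%m%d%H%M%S' --format="%cd")
# Environment variables passed into the build container
export CIBW_ENVIRONMENT="MAX_JOBS=4
TRITON_WHEEL_NAME=triton-debug
TRITON_WHEEL_VERSION_SUFFIX=-$LATEST_DATE"
# Same manylinux2014 image the nightly wheels currently use
export CIBW_MANYLINUX_X86_64_IMAGE="quay.io/pypa/manylinux2014_x86_64:latest"
export CIBW_SKIP="cp{35,36}-*"
# Build a single CPython 3.8 wheel to keep the repro short
export CIBW_BUILD="cp38-manylinux_x86_64"

python3 -m cibuildwheel python --output-dir wheelhouse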
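
To keep the unrepaired wheel around for inspection, one option, assuming cibuildwheel behaves as documented and an empty repair command disables the repair step, is to add the override below to the repro script above. The raw `linux_x86_64` wheel should then be written to `wheelhouse` instead of being discarded when `auditwheel repair` fails. This is an untested sketch.

```bash
# Skip auditwheel repair on Linux so the raw wheel is written to --output-dir
# instead of failing at the repair step.
export CIBW_REPAIR_WHEEL_COMMAND_LINUX=""
python3 -m cibuildwheel python --output-dir wheelhouse

# Then inspect the unrepaired wheel against the manylinux2014 policy.
pip install auditwheel-symbols
auditwheel-symbols -m 2014 wheelhouse/*.whl
```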


To find the exact incompatible symbols, install auditwheel-symbols with pip and run 'auditwheel-symbols <wheel_file>'. I couldn't figure out how to retrieve the built wheels from the container before they are deleted when the repair step fails, so I did a native Ubuntu build and ran that command on it to see the incompatible symbols in the AMD library.
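
Without auditwheel-symbols, a rough equivalent is to read the version requirements straight from the library's dynamic symbol table. The sketch below is a hypothetical `max_versions` helper (not part of auditwheel) that reads `objdump -T` output on stdin and prints the newest GLIBC, GLIBCXX and CXXABI tag it references. The wheel file name in the usage comment is an assumption; the library path is the one reported above.

```bash
#!/bin/bash
# Hypothetical helper: read `objdump -T` output on stdin and print the newest
# GLIBC_, GLIBCXX_ and CXXABI_ version tag referenced, one per line.
max_versions() {
  local tags family
  # Keep only tags ending in a numeric version (drops e.g. GLIBC_PRIVATE).
  tags="$(grep -oE '(GLIBC|GLIBCXX|CXXABI)_[0-9][0-9.]*' | sort -u)"
  for family in GLIBC GLIBCXX CXXABI; do
    # "^GLIBC_" does not match GLIBCXX_ tags, so the families stay separate.
    printf '%s\n' "$tags" | grep "^${family}_" | sort -V | tail -n 1
  done
}

# Example usage (wheel name is an assumption; adjust to your build):
#   unzip -p triton-3.0.0-cp310-cp310-linux_x86_64.whl \
#     triton/backends/amd/lib/libamdhip64.so > /tmp/libamdhip64.so
#   objdump -T /tmp/libamdhip64.so | max_versions
```

Each printed tag can then be compared against the target manylinux policy; any tag newer than the policy's ceiling means the library has to be rebuilt on an older toolchain.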

As for the fix, I propose that prebuilt binaries like these be built in the same manylinux container version used for the full wheel build, to avoid this in the future. In this case, Triton currently uses 'manylinux2014' but is trying to move to 'manylinux_2_28', so I am not sure whether it makes sense to fix the library for the former and then rebuild it again for the latter, or to just leapfrog to the latter.

@jlebar, could you please help loop in the right AMD contact to address this? Thank you.

fkouteib changed the title from "Nightly wheels broken" to the title above on Apr 10, 2024
@jlebar
Collaborator

jlebar commented Apr 10, 2024

cc @antiagainst ^^

@antiagainst
Collaborator

Thanks for the report! Yeah, the HIP dependency issue is what I'm about to work on next. Will factor this in.

3 participants