🐛 [Bug] Tests are not being linked properly, fail with 'symbol lookup error' #408
Comments
This is probably an ABI issue. Are you putting the Python package's PyTorch libs in your LD_LIBRARY_PATH? Try using the libtorch distribution downloaded with Bazel. Usually I do something like this:
You could also throw in the compile flag
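One way to check for the situation described in this comment is to scan LD_LIBRARY_PATH for library directories that belong to a Python-package install of PyTorch. The following is a minimal Python sketch, not part of TRTorch or PyTorch: the helper name `find_pip_torch_dirs` is hypothetical, and the heuristic assumes the usual `site-packages/torch/lib` or `dist-packages/torch/lib` layout.

```python
import os


def find_pip_torch_dirs(ld_library_path):
    """Return LD_LIBRARY_PATH entries that look like a Python-package torch lib dir.

    Hypothetical helper: it only recognizes the common
    .../site-packages/torch/lib and .../dist-packages/torch/lib layouts.
    """
    suspicious = []
    # LD_LIBRARY_PATH entries are separated by ':' on Linux.
    for entry in ld_library_path.split(":"):
        normalized = entry.rstrip("/")
        if not normalized:
            continue
        parts = normalized.split("/")
        if (
            len(parts) >= 3
            and parts[-1] == "lib"
            and parts[-2] == "torch"
            and parts[-3] in ("site-packages", "dist-packages")
        ):
            suspicious.append(entry)
    return suspicious


if __name__ == "__main__":
    hits = find_pip_torch_dirs(os.environ.get("LD_LIBRARY_PATH", ""))
    for path in hits:
        print("Python-package torch libs on LD_LIBRARY_PATH:", path)
    if not hits:
        print("No Python-package torch lib directories found on LD_LIBRARY_PATH.")
```

If the script reports a hit, removing that entry (or placing the Bazel-downloaded libtorch directory ahead of it) is the change suggested above.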
Yes, I was using libtorch from the installed package. I have now pointed to the downloaded distribution. It starts up now, but crashes later. Does that ring any bells? The active CUDA on my box is 11.2, so I had to add libnvrtc from 11.1 to the path, which probably did not go well:
Running main() from gmock_main.cc
unknown file: Failure
Hmm, yeah. With PyTorch 1.7.1, try using the CUDA 11.0 libraries. You could also try an NGC container that has PyTorch built with CUDA 11.2.
@narendasan: I tried building and testing in a container based on 21.02, with the same result. I am using local cuDNN and TensorRT. I think we need to make sure that this fairly common configuration works.
Oh, I didn't realize you are running the test suite. Did you download the models for the tests? You can download them by running the
No, I did not. Please do mention it in the README :)
Yeah, I guess we only mention it here: https://github.com/NVIDIA/TRTorch/blob/master/tests/modules/README.md but I'll add a note to the testing README. The timeout issue for elementwise should be fixed in master; you just need to set the testing timeout to moderate like we do here: https://github.com/NVIDIA/TRTorch/blob/d6a3c4561e62d7806b9190c935672ffeaf93e58d/tests/core/conversion/converters/converter_test.bzl#L15. We probably need to start breaking up that file.
Bug Description
To Reproduce
Steps to reproduce the behavior:
You will see all the tests fail. I am using stock 1.7.1 PyTorch.
boris@snikolaev-DGXStation:~/git/TRTorch$ /home/boris/.cache/bazel/_bazel_boris/c6ee020343103959b26b654eb14e89ac/execroot/TRTorch/bazel-out/k8-dbg/bin/tests/core/conversion/converters/test_linear.runfiles/TRTorch/tests/core/conversion/converters/test_linear
/home/boris/.cache/bazel/_bazel_boris/c6ee020343103959b26b654eb14e89ac/execroot/TRTorch/bazel-out/k8-dbg/bin/tests/core/conversion/converters/test_linear.runfiles/TRTorch/tests/core/conversion/converters/test_linear: symbol lookup error: /home/boris/.cache/bazel/_bazel_boris/c6ee020343103959b26b654eb14e89ac/execroot/TRTorch/bazel-out/k8-dbg/bin/tests/core/conversion/converters/../../../../_solib_k8/libcore_Sutil_Slibtrt_Uutil.so: undefined symbol: _ZN3c105ErrorC1ENS_14SourceLocationENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
boris@snikolaev-DGXStation:~/git/TRTorch$ nm /usr/local/lib/python3.6/dist-packages/torch/lib/libc10.so | grep _ZN3c105ErrorC1ENS_14SourceLocationENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
boris@snikolaev-DGXStation:~/git/TRTorch$ nm /usr/local/lib/python3.6/dist-packages/torch/lib/libc10.so | grep SourceLocation
000000000004f130 T _ZN3c1014WarningHandler7processERKNS_14SourceLocationERKSsb
0000000000051870 T _ZN3c105ErrorC1ENS_14SourceLocationESs
0000000000051870 T _ZN3c105ErrorC2ENS_14SourceLocationESs
000000000004f210 T _ZN3c107Warning4warnENS_14SourceLocationERKSsb
00000000000527c0 t _ZN3c10lsERSoRKNS_14SourceLocationE
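The nm output above can be read mechanically. The missing symbol carries the `__cxx11` tag used by the newer libstdc++ string ABI, while the `c10::Error` constructors exported by the pip-installed `libc10.so` take the old-ABI string (`Ss`). Below is a small Python sketch of that check. The names `parse_nm`, `uses_cxx11_abi`, and `check_symbol` are hypothetical, and the code only inspects text in the `nm` format shown here; it does not run `nm` itself.

```python
def parse_nm(nm_output):
    """Parse `nm` text into {symbol: type_letter}, skipping malformed lines."""
    symbols = {}
    for line in nm_output.splitlines():
        fields = line.split()
        if len(fields) == 3:      # address, type letter, symbol name
            symbols[fields[2]] = fields[1]
        elif len(fields) == 2:    # undefined symbols are listed without an address
            symbols[fields[1]] = fields[0]
    return symbols


def uses_cxx11_abi(mangled):
    """True if the mangled name references the std::__cxx11 namespace."""
    return "7__cxx11" in mangled


def check_symbol(wanted, nm_output):
    """Compare a wanted symbol against a library's nm listing."""
    symbols = parse_nm(nm_output)
    return {
        # Defined means listed with a type letter other than 'U'.
        "defined": symbols.get(wanted, "U") != "U",
        "wanted_uses_cxx11": uses_cxx11_abi(wanted),
        "library_uses_cxx11": any(uses_cxx11_abi(name) for name in symbols),
    }


if __name__ == "__main__":
    # Two lines copied from the nm output in this report.
    nm_text = (
        "0000000000051870 T _ZN3c105ErrorC1ENS_14SourceLocationESs\n"
        "0000000000051870 T _ZN3c105ErrorC2ENS_14SourceLocationESs\n"
    )
    wanted = (
        "_ZN3c105ErrorC1ENS_14SourceLocationE"
        "NSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE"
    )
    print(check_symbol(wanted, nm_text))
```

A result where the wanted symbol uses `__cxx11` but the library exports none is consistent with a library built against the old string ABI and a test binary built against the new one.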
Expected behavior
Tests run (or at least start up) successfully.
Environment
How PyTorch was installed (conda, pip, libtorch, source): pip
Additional context