
Building from source on OSX without Cuda fails #35478

Closed
D0miH opened this issue Mar 26, 2020 · 13 comments
Labels
module: build Build system issues module: macos Mac OS related issues triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

@D0miH

D0miH commented Mar 26, 2020

🐛 Bug

I am trying to build PyTorch from source without CUDA on macOS 10.15, following the steps at https://github.com/pytorch/pytorch#from-source. However, I am not sure whether this is a bug in PyTorch or a problem on Apple's side.

To Reproduce

Steps to reproduce the behavior:

  1. Create a fresh conda environment
  2. Follow the manual on how to build from source over at https://github.com/pytorch/pytorch#from-source
  3. Use the following commands to start the build process:
    export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}  
    MACOSX_DEPLOYMENT_TARGET=10.15 CC=clang CXX=clang++ USE_CUDA=0 python setup.py install
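For reference, the CMAKE_PREFIX_PATH line relies on the shell's ${VAR:-default} expansion: it uses $CONDA_PREFIX, which conda sets while an environment is active, and otherwise falls back to the directory above the one containing the conda executable. The sketch below isolates that rule in a hypothetical helper (pick_prefix is not part of PyTorch's build scripts):

```shell
#!/bin/sh
# Hypothetical helper mirroring ${CONDA_PREFIX:-"$(dirname $(which conda))/../"}:
# print $1 (the active environment prefix) if it is non-empty,
# otherwise print the fallback $2.
pick_prefix() {
  env_prefix="$1"
  fallback="$2"
  printf '%s\n' "${env_prefix:-$fallback}"
}
```

For example, pick_prefix "$CONDA_PREFIX" "$(dirname "$(command -v conda)")/../" prints the active environment's prefix, or the base conda install when no environment is active.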
    

Expected behavior

PyTorch builds and installs successfully.

Environment

Collecting environment information...
PyTorch version: N/A
Is debug build: N/A
CUDA used to build PyTorch: N/A

OS: Mac OSX 10.15.3
GCC version: Could not collect
CMake version: version 3.14.0

Python version: 3.7
Is CUDA available: N/A
CUDA runtime version: Could not collect
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Could not collect

Versions of relevant libraries:
[pip] numpy==1.18.1
[conda] blas 1.0 mkl
[conda] mkl 2019.4 233
[conda] mkl-include 2020.0 166
[conda] mkl-service 2.3.0 py37hfbe908c_0
[conda] mkl_fft 1.0.15 py37h5e564d8_0
[conda] mkl_random 1.1.0 py37ha771720_0

Additional context

After retrying several times, I noticed that the build always fails while compiling SobolEngineOps.cpp.
Error during the build process:

[29/339] Building CXX object caffe2/CMakeFiles/torch.dir/__/aten/src/ATen/native/SobolEngineOps.cpp.o
FAILED: caffe2/CMakeFiles/torch.dir/__/aten/src/ATen/native/SobolEngineOps.cpp.o 
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang++  -DAT_PARALLEL_OPENMP=1 -DCAFFE2_BUILD_MAIN_LIB -DCPUINFO_SUPPORTED_PLATFORM=1 -DHAVE_MMAP=1 -DHAVE_SHM_OPEN=1 -DHAVE_SHM_UNLINK=1 -DIDEEP_USE_MKL -DNNP_CONVOLUTION_ONLY=0 -DNNP_INFERENCE_ONLY=0 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DTH_BLAS_MKL -D_FILE_OFFSET_BITS=64 -Dtorch_EXPORTS -Iaten/src -I../aten/src -I. -I../ -I../cmake/../third_party/benchmark/include -Icaffe2/contrib/aten -I../third_party/onnx -Ithird_party/onnx -I../third_party/foxi -Ithird_party/foxi -I../caffe2/../torch/csrc/api -I../caffe2/../torch/csrc/api/include -I../caffe2/aten/src/TH -Icaffe2/aten/src/TH -I../caffe2/../torch/../aten/src -Icaffe2/aten/src -Icaffe2/../aten/src -Icaffe2/../aten/src/ATen -I../caffe2/../torch/csrc -I../caffe2/../torch/../third_party/miniz-2.0.8 -I../aten/src/TH -I../aten/../third_party/catch/single_include -I../aten/src/ATen/.. -Icaffe2/aten/src/ATen -I../third_party/miniz-2.0.8 -I../caffe2/core/nomnigraph/include -I../c10/.. 
-Ithird_party/ideep/mkl-dnn/include -I../third_party/ideep/mkl-dnn/src/../include -I../third_party/QNNPACK/include -I../third_party/pthreadpool/include -I../aten/src/ATen/native/quantized/cpu/qnnpack/include -I../aten/src/ATen/native/quantized/cpu/qnnpack/src -I../third_party/QNNPACK/deps/clog/include -I../third_party/NNPACK/include -I../third_party/cpuinfo/include -I../third_party/fbgemm/include -I../third_party/fbgemm -I../third_party/fbgemm/third_party/asmjit/src -I../third_party/FP16/include -isystem ../cmake/../third_party/googletest/googlemock/include -isystem ../cmake/../third_party/googletest/googletest/include -isystem ../third_party/protobuf/src -isystem ~/anaconda3/envs/pytorch/include -isystem ../third_party/gemmlowp -isystem ../third_party/neon2sse -isystem ../third_party -isystem ../cmake/../third_party/eigen -isystem ~/anaconda3/envs/pytorch/include/python3.7m -isystem ~/anaconda3/envs/pytorch/lib/python3.7/site-packages/numpy/core/include -isystem ../cmake/../third_party/pybind11/include -isystem /opt/rocm/hip/include -isystem /include -isystem ../third_party/ideep/mkl-dnn/include -isystem ../third_party/ideep/include -isystem include -Wno-deprecated -fvisibility-inlines-hidden -Wno-deprecated-declarations -Xpreprocessor -fopenmp -I/usr/local/include -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -Wno-invalid-partial-specialization -Wno-typedef-redefinition -Wno-unknown-warning-option -Wno-unused-private-field -Wno-inconsistent-missing-override -Wno-aligned-allocation-unavailable -Wno-c++14-extensions -Wno-constexpr-not-const -Wno-missing-braces -Qunused-arguments 
-fcolor-diagnostics -fno-math-errno -fno-trapping-math -Wno-unused-private-field -Wno-missing-braces -Wno-c++14-extensions -Wno-constexpr-not-const -DHAVE_AVX_CPU_DEFINITION -DHAVE_AVX2_CPU_DEFINITION -O3  -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.15.sdk -mmacosx-version-min=10.15 -fPIC   -DHAVE_GCC_GET_CPUID -DUSE_AVX -DUSE_AVX2 -DTH_HAVE_THREAD -Wall -Wextra -Wno-unused-parameter -Wno-missing-field-initializers -Wno-write-strings -Wno-unknown-pragmas -Wno-missing-braces -fvisibility=hidden -O2 -DCAFFE2_BUILD_MAIN_LIB -DASMJIT_STATIC -std=gnu++11 -MD -MT caffe2/CMakeFiles/torch.dir/__/aten/src/ATen/native/SobolEngineOps.cpp.o -MF caffe2/CMakeFiles/torch.dir/__/aten/src/ATen/native/SobolEngineOps.cpp.o.d -o caffe2/CMakeFiles/torch.dir/__/aten/src/ATen/native/SobolEngineOps.cpp.o -c ../aten/src/ATen/native/SobolEngineOps.cpp
clang: error: unable to execute command: Segmentation fault: 11
clang: error: clang frontend command failed due to signal (use -v to see invocation)
Apple clang version 11.0.3 (clang-1103.0.32.29)
Target: x86_64-apple-darwin19.3.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
clang: note: diagnostic msg: PLEASE submit a bug report to http://developer.apple.com/bugreporter/ and include the crash backtrace, preprocessed source, and associated run script.
clang: note: diagnostic msg: 
********************

PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
Preprocessed source(s) and associated run script(s) are located at:
clang: note: diagnostic msg: /var/folders/cx/trr2xjp56fs_rq2yzlb8xpb40000gn/T/SobolEngineOps-175bae.cpp
clang: note: diagnostic msg: /var/folders/cx/trr2xjp56fs_rq2yzlb8xpb40000gn/T/SobolEngineOps-175bae.sh
clang: note: diagnostic msg: /var/folders/cx/trr2xjp56fs_rq2yzlb8xpb40000gn/T/SobolEngineOps-175bae.crash
clang: note: diagnostic msg: 

********************
[40/339] Building CXX object caffe2/CMakeFiles/torch.dir/__/aten/src/ATen/native/Unique.cpp.o
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "setup.py", line 755, in <module>
    build_deps()
  File "setup.py", line 316, in build_deps
    cmake=cmake)
  File "~/Desktop/pytorch/tools/build_pytorch_libs.py", line 62, in build_caffe2
    cmake.build(my_env)
  File "~/Desktop/pytorch/tools/setup_helpers/cmake.py", line 335, in build
    self.run(build_args, my_env)
  File "~/Desktop/pytorch/tools/setup_helpers/cmake.py", line 141, in run
    check_call(command, cwd=self.build_dir, env=env)
  File "~/anaconda3/envs/pytorch/lib/python3.7/subprocess.py", line 363, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['cmake', '--build', '.', '--target', 'install', '--config', 'Release', '--', '-j', '12']' returned non-zero exit status 1.

cc @ezyang @gchanan @zou3519

@mruberry mruberry added the triaged and high priority labels Mar 27, 2020
@mruberry mruberry added the module: build Build system issues label Mar 27, 2020
@CaoZhongZ
Contributor

CaoZhongZ commented Mar 27, 2020

Same here. I think it may be a problem with the Apple compiler. 🤔️

Homebrew's clang-9 is not compatible with the 10.15 SDK (no idea why), so the protobuf compiler can't be built with it. A workable solution is to download the LLVM 10.0.0 release (and remove the quarantine attribute with xattr -r -d).

Then compile PyTorch with LLVM 10 like this:

CC=$llvm10/bin/clang CXX=$llvm10/bin/clang++ CFLAGS="-isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.15.sdk" LIBRARY_PATH="$llvm10/lib" LDFLAGS="-L$llvm10/lib -Wl,-rpath,$llvm10/lib" python -m pip install --user -ve .

I'm installing PyTorch in develop mode; feel free to adjust the installation mode. A small perk is that you also get OpenMP working on macOS.
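A minimal sketch of the same setup as a reusable shell function, assuming the LLVM 10.0.0 release has been unpacked to an absolute path. The function name use_llvm_toolchain is hypothetical; it only exports the variables from the command above (CC, CXX, CFLAGS, LIBRARY_PATH, LDFLAGS) after checking that clang and clang++ are present and executable:

```shell
#!/bin/sh
# Hypothetical helper: export the variables from the command above so that
# a later "python -m pip install ..." in the same shell picks them up.
#   $1: absolute path to the unpacked LLVM 10 release (the $llvm10 above)
#   $2: macOS SDK path (the -isysroot argument)
use_llvm_toolchain() {
  llvm_dir="$1"
  sdk_dir="$2"
  # Fail early instead of letting CMake report a missing compiler later.
  if [ ! -x "$llvm_dir/bin/clang" ] || [ ! -x "$llvm_dir/bin/clang++" ]; then
    echo "no executable clang/clang++ under $llvm_dir/bin" >&2
    return 1
  fi
  export CC="$llvm_dir/bin/clang"
  export CXX="$llvm_dir/bin/clang++"
  export CFLAGS="-isysroot $sdk_dir"
  export LIBRARY_PATH="$llvm_dir/lib"
  # -rpath lets the built libraries find LLVM's runtime (e.g. its OpenMP library).
  export LDFLAGS="-L$llvm_dir/lib -Wl,-rpath,$llvm_dir/lib"
}
```

Usage, assuming LLVM was unpacked to $HOME/llvm-10 (an illustrative path): use_llvm_toolchain "$HOME/llvm-10" "$(xcrun --sdk macosx --show-sdk-path)" && python -m pip install --user -ve . Looking up the SDK with xcrun instead of hard-coding the MacOSX10.15.sdk path is an assumption, not part of the original command.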

@mruberry
Collaborator

@D0miH Does @CaoZhongZ's response resolve your problem?

@D0miH
Author

D0miH commented Mar 30, 2020

Hey guys,

thanks for the quick response, and sorry for the late reply. I downloaded the latest LLVM 10.0.0 release, unzipped it, and moved the resulting folder (called llvm below) into the pytorch repository. Then I removed the quarantine attribute using xattr -r -d com.apple.quarantine llvm/.
Then I ran the following command:

CC=llvm/bin/clang CXX=llvm/bin/clang++ CFLAGS="-isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.15.sdk" LIBRARY_PATH="llvm/lib" LDFLAGS="-L./llvm/lib -Wl,-rpath,./llvm/lib" python -m pip install --user -v .

However, I get the following error after a few seconds:

User install by explicit request
Created temporary directory: /private/var/folders/cx/trr2xjp56fs_rq2yzlb8xpb40000gn/T/pip-ephem-wheel-cache-w61ef7vv
Created temporary directory: /private/var/folders/cx/trr2xjp56fs_rq2yzlb8xpb40000gn/T/pip-req-tracker-wf9kblan
Initialized build tracking at /private/var/folders/cx/trr2xjp56fs_rq2yzlb8xpb40000gn/T/pip-req-tracker-wf9kblan
Created build tracker: /private/var/folders/cx/trr2xjp56fs_rq2yzlb8xpb40000gn/T/pip-req-tracker-wf9kblan
Entered build tracker: /private/var/folders/cx/trr2xjp56fs_rq2yzlb8xpb40000gn/T/pip-req-tracker-wf9kblan
Created temporary directory: /private/var/folders/cx/trr2xjp56fs_rq2yzlb8xpb40000gn/T/pip-install-vqpqt3x6
Processing ~/Desktop/pytorch
  Created temporary directory: /private/var/folders/cx/trr2xjp56fs_rq2yzlb8xpb40000gn/T/pip-req-build-x_fl1jdi
  Added file://~/Desktop/pytorch to build tracker '/private/var/folders/cx/trr2xjp56fs_rq2yzlb8xpb40000gn/T/pip-req-tracker-wf9kblan'
    Running setup.py (path:/private/var/folders/cx/trr2xjp56fs_rq2yzlb8xpb40000gn/T/pip-req-build-x_fl1jdi/setup.py) egg_info for package from file://~/Desktop/pytorch
    Running command python setup.py egg_info
    CMake Error at ~/anaconda3/share/cmake-3.14/Modules/CMakeDetermineCXXCompiler.cmake:48 (message):
      Could not find compiler set in environment variable CXX:

      llvm/bin/clang++.

    Call Stack (most recent call first):
      CMakeLists.txt:23 (project)


    CMake Error: CMAKE_CXX_COMPILER not set, after EnableLanguage
    CMake Error: CMAKE_C_COMPILER not set, after EnableLanguage
    -- Configuring incomplete, errors occurred!
    See also "/private/var/folders/cx/trr2xjp56fs_rq2yzlb8xpb40000gn/T/pip-req-build-x_fl1jdi/build/CMakeFiles/CMakeOutput.log".
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/private/var/folders/cx/trr2xjp56fs_rq2yzlb8xpb40000gn/T/pip-req-build-x_fl1jdi/setup.py", line 734, in <module>
        build_deps()
      File "/private/var/folders/cx/trr2xjp56fs_rq2yzlb8xpb40000gn/T/pip-req-build-x_fl1jdi/setup.py", line 316, in build_deps
        cmake=cmake)
      File "/private/var/folders/cx/trr2xjp56fs_rq2yzlb8xpb40000gn/T/pip-req-build-x_fl1jdi/tools/build_pytorch_libs.py", line 59, in build_caffe2
        rerun_cmake)
      File "/private/var/folders/cx/trr2xjp56fs_rq2yzlb8xpb40000gn/T/pip-req-build-x_fl1jdi/tools/setup_helpers/cmake.py", line 323, in generate
        self.run(args, env=my_env)
      File "/private/var/folders/cx/trr2xjp56fs_rq2yzlb8xpb40000gn/T/pip-req-build-x_fl1jdi/tools/setup_helpers/cmake.py", line 141, in run
        check_call(command, cwd=self.build_dir, env=env)
      File "~/anaconda3/lib/python3.7/subprocess.py", line 347, in check_call
        raise CalledProcessError(retcode, cmd)
    subprocess.CalledProcessError: Command '['cmake', '-GNinja', '-DBUILD_PYTHON=True', '-DBUILD_TEST=True', '-DCMAKE_BUILD_TYPE=Release', '-DCMAKE_INSTALL_PREFIX=/private/var/folders/cx/trr2xjp56fs_rq2yzlb8xpb40000gn/T/pip-req-build-x_fl1jdi/torch', '-DCMAKE_PREFIX_PATH=~/anaconda3/lib/python3.7/site-packages', '-DNUMPY_INCLUDE_DIR=~/anaconda3/lib/python3.7/site-packages/numpy/core/include', '-DPYTHON_EXECUTABLE=~/anaconda3/bin/python', '-DPYTHON_INCLUDE_DIR=~/anaconda3/include/python3.7m', '-DPYTHON_LIBRARY=~/anaconda3/lib/libpython3.7m.dylib', '-DTORCH_BUILD_VERSION=1.6.0a0+340048b', '-DUSE_NUMPY=True', '/private/var/folders/cx/trr2xjp56fs_rq2yzlb8xpb40000gn/T/pip-req-build-x_fl1jdi']' returned non-zero exit status 1.
    Building wheel torch-1.6.0a0+340048b
    -- Building version 1.6.0a0+340048b
    cmake -GNinja -DBUILD_PYTHON=True -DBUILD_TEST=True -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/private/var/folders/cx/trr2xjp56fs_rq2yzlb8xpb40000gn/T/pip-req-build-x_fl1jdi/torch -DCMAKE_PREFIX_PATH=~/anaconda3/lib/python3.7/site-packages -DNUMPY_INCLUDE_DIR=~/anaconda3/lib/python3.7/site-packages/numpy/core/include -DPYTHON_EXECUTABLE=~/anaconda3/bin/python -DPYTHON_INCLUDE_DIR=~/anaconda3/include/python3.7m -DPYTHON_LIBRARY=~/anaconda3/lib/libpython3.7m.dylib -DTORCH_BUILD_VERSION=1.6.0a0+340048b -DUSE_NUMPY=True /private/var/folders/cx/trr2xjp56fs_rq2yzlb8xpb40000gn/T/pip-req-build-x_fl1jdi
Cleaning up...
  Removing source in /private/var/folders/cx/trr2xjp56fs_rq2yzlb8xpb40000gn/T/pip-req-build-x_fl1jdi
Removed file://~/Desktop/pytorch from build tracker '/private/var/folders/cx/trr2xjp56fs_rq2yzlb8xpb40000gn/T/pip-req-tracker-wf9kblan'
Removed build tracker: '/private/var/folders/cx/trr2xjp56fs_rq2yzlb8xpb40000gn/T/pip-req-tracker-wf9kblan'
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
Exception information:
Traceback (most recent call last):
  File "~/anaconda3/lib/python3.7/site-packages/pip/_internal/cli/base_command.py", line 186, in _main
    status = self.run(options, args)
  File "~/anaconda3/lib/python3.7/site-packages/pip/_internal/commands/install.py", line 331, in run
    resolver.resolve(requirement_set)
  File "~/anaconda3/lib/python3.7/site-packages/pip/_internal/legacy_resolve.py", line 177, in resolve
    discovered_reqs.extend(self._resolve_one(requirement_set, req))
  File "~/anaconda3/lib/python3.7/site-packages/pip/_internal/legacy_resolve.py", line 333, in _resolve_one
    abstract_dist = self._get_abstract_dist_for(req_to_install)
  File "~/anaconda3/lib/python3.7/site-packages/pip/_internal/legacy_resolve.py", line 282, in _get_abstract_dist_for
    abstract_dist = self.preparer.prepare_linked_requirement(req)
  File "~/anaconda3/lib/python3.7/site-packages/pip/_internal/operations/prepare.py", line 516, in prepare_linked_requirement
    req, self.req_tracker, self.finder, self.build_isolation,
  File "~/anaconda3/lib/python3.7/site-packages/pip/_internal/operations/prepare.py", line 95, in _get_prepared_distribution
    abstract_dist.prepare_distribution_metadata(finder, build_isolation)
  File "~/anaconda3/lib/python3.7/site-packages/pip/_internal/distributions/sdist.py", line 40, in prepare_distribution_metadata
    self.req.prepare_metadata()
  File "~/anaconda3/lib/python3.7/site-packages/pip/_internal/req/req_install.py", line 564, in prepare_metadata
    self.metadata_directory = self._generate_metadata()
  File "~/anaconda3/lib/python3.7/site-packages/pip/_internal/req/req_install.py", line 544, in _generate_metadata
    details=self.name or "from {}".format(self.link)
  File "~/anaconda3/lib/python3.7/site-packages/pip/_internal/operations/build/metadata_legacy.py", line 118, in generate_metadata
    command_desc='python setup.py egg_info',
  File "~/anaconda3/lib/python3.7/site-packages/pip/_internal/utils/subprocess.py", line 242, in call_subprocess
    raise InstallationError(exc_msg)
pip._internal.exceptions.InstallationError: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

What is strange is that the llvm/bin/clang++ binary is in the folder, and I double-checked that the path is correct.

What am I doing wrong?

@CaoZhongZ
Contributor

I think you should specify absolute paths for clang/clang++ as well as for all the -L and -rpath flags. $llvm10 in my script is a shell variable that expands to an absolute path.
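A likely reason the relative paths failed: the log above shows pip copying the source into a temporary pip-req-build directory under /private/var/folders and running setup.py from there, so a path like llvm/bin/clang++ no longer points at the folder inside the checkout. A minimal sketch for turning a relative directory into an absolute one before exporting CC/CXX; abs_dir is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: print the absolute, symlink-resolved path of directory $1.
# The cd happens in a subshell, so the caller's working directory is unchanged.
# Returns non-zero (and prints nothing) if $1 is not an accessible directory.
abs_dir() {
  (cd "$1" 2>/dev/null && pwd -P)
}
```

For example, run from the pytorch checkout: llvm10="$(abs_dir llvm)", then use $llvm10 in CC, CXX, -L and -rpath as in the original command.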

@adamjstewart
Contributor

I've been facing the same issue for the last few days. PyTorch 1.4.0 compiled just fine on macOS 10.15.3 with Clang 11.0.0, but with Clang 11.0.3 I get exactly the same segfault as you. I just reported this issue to Apple's developers, so I'll let you know if I get any feedback.

@D0miH
Author

D0miH commented Mar 31, 2020

> I think you'd better specify an absolute path for your clang/clang++ as well as all the -L, -rpath stuff. $llvm10 in my script is a shell variable that'll expand to an absolute path.

That did the trick! Thank you very much @CaoZhongZ.

> I've been facing the same issue for the last few days. PyTorch 1.4.0 compiled just fine on macOS 10.15.3 with Clang 11.0.0, but with Clang 11.0.3 I get the exact same seg fault as you. I just reported this issue to the Apple Developers, so I'll let you know if I get any feedback.

Great! I hope Apple will find out why this is happening and will fix it.

@mruberry
Collaborator

mruberry commented Apr 3, 2020

Glad to hear a solution was found!

@mruberry mruberry closed this as completed Apr 3, 2020
@adamjstewart
Contributor

More of a workaround than a solution... The workaround is not to use the system compiler and to build your own instead. Dozens of other macOS users will likely stumble on this same problem until Apple Clang is fixed.

P.S. Still haven't heard anything back from Apple.

@adamjstewart
Contributor

For the record, I just tried PyTorch 1.5.0 on macOS 10.15.4 with Xcode 11.4.1 and clang-1103.0.32.59 but the build crashes with the same segfault.

@adamjstewart
Contributor

Still tracking this down. It looks like one difference between Apple Clang 11.0.3 and 11.0.0 is that 11.0.3 is based on LLVM 9.0.0, while 11.0.0 is based on LLVM 8.0.0: https://en.wikipedia.org/wiki/Xcode#Toolchain_versions. Thanks to @ax3l for pointing this out.

So it's possible that this is an LLVM 9.0.0 bug that may or may not have been fixed already, in which case we just need to apply the fix for Apple Clang 11.0.3 as well.
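Based only on the versions reported in this thread (Apple clang 11.0.0 built PyTorch 1.4.0 fine, while 11.0.3, builds clang-1103.0.32.29 and clang-1103.0.32.59, crashed on SobolEngineOps.cpp), a pre-build check could read the clang --version banner and warn on 11.0.3. A sketch with hypothetical function names; it knows nothing about other affected or fixed releases:

```shell
#!/bin/sh
# Print the "X.Y.Z" from a "clang --version" banner read on stdin,
# e.g. "Apple clang version 11.0.3 (clang-1103.0.32.29)" -> 11.0.3.
# Prints nothing for non-Apple clang banners.
apple_clang_version() {
  sed -n 's/^Apple clang version \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Succeed (exit 0) if the banner on stdin is the Apple clang release
# reported in this thread to segfault on SobolEngineOps.cpp.
is_affected_apple_clang() {
  [ "$(apple_clang_version)" = "11.0.3" ]
}
```

Usage: if clang --version | is_affected_apple_clang; then echo "warning: Apple clang 11.0.3 was reported to segfault on SobolEngineOps.cpp" >&2; fi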

@adamjstewart
Contributor

I believe this bug is a duplicate of #36676 and #30584, which should be fixed by #37086. I'm going to try applying this patch to the latest release and see if I can get things building.

@ax3l

ax3l commented May 5, 2020

According to a similar issue in OpenBLAS (OpenMathLib/OpenBLAS#2329), this may affect only LLVM 9.0.0 / AppleClang 11.0.3 and may already be fixed in later versions. It may be worth checking this again for the specific case here.

@adamjstewart
Contributor

I can confirm that #37086 fixes this issue, thanks @malfet!
