up to 2.0.1 #172
Hi! This is the friendly automated conda-forge-linting service. I just wanted to let you know that I linted all conda-recipes in your PR (
@conda-forge-admin, please rerender
…nda-forge-pinning 2023.06.16.18.10.30
GCC 10 builds fail due to OneDNN:
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1686968646633/work/torch/csrc/jit/codegen/onednn/graph_helper.h:3:10: fatal error: oneapi/dnnl/dnnl_graph.hpp: No such file or directory
    3 | #include <oneapi/dnnl/dnnl_graph.hpp>
OneDNN does not seem to be listed in the build requirements: https://github.com/conda-forge/pytorch-cpu-feedstock/blob/f2474772cb070bd3131bbb349e0d6f856c79747a/recipe/meta.yaml
export CXXFLAGS+=' -Wno-maybe-uninitialized -Wno-uninitialized -Wno-free-nonheap-object -Wno-nonnull'
export CFLAGS+=' -Wno-maybe-uninitialized -Wno-uninitialized -Wno-free-nonheap-object -Wno-nonnull'
I want to help, but I do not really know how a feedstock works -.-
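A quick way to confirm, before a long CI run, whether the header the compiler could not find is present in the host environment is a small check like the one below. This is a minimal sketch: `has_header` is a hypothetical helper, and it assumes host headers land under `$PREFIX/include`, as is usual in conda-build.

```shell
# Hypothetical helper: succeed (exit status 0) if HEADER exists under INCLUDE_DIR.
has_header() {
  include_dir="$1"
  header="$2"
  [ -f "${include_dir}/${header}" ]
}

# Example use in a build script ($PREFIX is the conda-build host prefix):
# if ! has_header "${PREFIX}/include" "oneapi/dnnl/dnnl_graph.hpp"; then
#   echo "oneapi/dnnl/dnnl_graph.hpp not found; is onednn in the host requirements?" >&2
#   exit 1
# fi
```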
Thanks, I will follow up with your suggested fixes soon. We all learn the conda-forge ways by watching 😉
Hi! This is the friendly automated conda-forge-webservice. I tried to rerender for you, but it looks like there was nothing to do. This message was generated by GitHub actions workflow run https://github.com/conda-forge/pytorch-cpu-feedstock/actions/runs/5315469498.
recipe/meta.yaml
@@ -70,6 +70,7 @@ outputs:
- cudnn # [cuda_compiler_version != "None"]
- nccl # [cuda_compiler_version != "None"]
- magma # [cuda_compiler_version != "None"]
- onednn # [cuda_compiler_version == "11.2"]
Do we need this for run too? Unsure, but we will assess later.
We still have the onednn issue. Let me think about how we address this...
Inspecting the first lines of the failed builds:
## Package Plan ##
environment location: /home/conda/feedstock_root/build_artifacts/pytorch-recipe_1687227140059/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeh
The following NEW packages will be INSTALLED:
_libgcc_mutex: 0.1-conda_forge conda-forge
_openmp_mutex: 4.5-2_kmp_llvm conda-forge
brotli: 1.0.9-h166bdaf_8 conda-forge
brotli-bin: 1.0.9-h166bdaf_8 conda-forge
bzip2: 1.0.8-h7f98852_4 conda-forge
ca-certificates: 2023.5.7-hbcca054_0 conda-forge
certifi: 2023.5.7-pyhd8ed1ab_0 conda-forge
charset-normalizer: 3.1.0-pyhd8ed1ab_0 conda-forge
cuda-version: 11.8-h70ddcb2_2 conda-forge
cudatoolkit: 11.8.0-h37601d7_11 conda-forge
cudnn: 8.8.0.121-h0800d71_1 conda-forge
future: 0.18.3-pyhd8ed1ab_0 conda-forge
icu: 72.1-hcb278e6_0 conda-forge
idna: 3.4-pyhd8ed1ab_0 conda-forge
ld_impl_linux-64: 2.40-h41732ed_0 conda-forge
libblas: 3.9.0-16_linux64_mkl conda-forge
libbrotlicommon: 1.0.9-h166bdaf_8 conda-forge
libbrotlidec: 1.0.9-h166bdaf_8 conda-forge
libbrotlienc: 1.0.9-h166bdaf_8 conda-forge
libcblas: 3.9.0-16_linux64_mkl conda-forge
libffi: 3.4.2-h7f98852_5 conda-forge
libgcc-ng: 13.1.0-he5830b7_0 conda-forge
libgomp: 13.1.0-he5830b7_0 conda-forge
libhwloc: 2.9.1-nocuda_h7313eea_6 conda-forge
libiconv: 1.17-h166bdaf_0 conda-forge
liblapack: 3.9.0-16_linux64_mkl conda-forge
libmagma: 2.7.1-hc72dce7_3 conda-forge
libmagma_sparse: 2.7.1-hc72dce7_4 conda-forge
libnsl: 2.0.0-h7f98852_0 conda-forge
libprotobuf: 3.21.12-h3eb15da_0 conda-forge
libsqlite: 3.42.0-h2797004_0 conda-forge
libstdcxx-ng: 13.1.0-hfd8a6a1_0 conda-forge
libuuid: 2.38.1-h0b41bf4_0 conda-forge
libuv: 1.44.2-h166bdaf_0 conda-forge
libxml2: 2.11.4-h0d562d8_0 conda-forge
libzlib: 1.2.13-hd590300_5 conda-forge
llvm-openmp: 16.0.6-h4dfa4b3_0 conda-forge
magma: 2.7.1-ha770c72_4 conda-forge
mkl: 2022.2.1-h84fe81f_16997 conda-forge
mkl-devel: 2022.2.1-ha770c72_16998 conda-forge
mkl-include: 2022.2.1-h84fe81f_16997 conda-forge
nccl: 2.18.3.1-h12f7317_0 conda-forge
ncurses: 6.4-hcb278e6_0 conda-forge
numpy: 1.21.6-py310h45f3432_0 conda-forge
openssl: 3.1.1-hd590300_1 conda-forge
pip: 23.1.2-pyhd8ed1ab_0 conda-forge
pkg-config: 0.29.2-h36c2ea0_1008 conda-forge
pysocks: 1.7.1-pyha2e5f31_6 conda-forge
python: 3.10.11-he550d4f_0_cpython conda-forge
python_abi: 3.10-3_cp310 conda-forge
pyyaml: 6.0-py310h5764c6d_5 conda-forge
readline: 8.2-h8228510_1 conda-forge
requests: 2.31.0-pyhd8ed1ab_0 conda-forge
setuptools: 67.7.2-pyhd8ed1ab_0 conda-forge
six: 1.16.0-pyh6c4a22f_0 conda-forge
sleef: 3.5.1-h9b69904_2 conda-forge
tbb: 2021.9.0-hf52228f_0 conda-forge
tk: 8.6.12-h27826a3_0 conda-forge
typing: 3.10.0.0-pyhd8ed1ab_0 conda-forge
typing_extensions: 4.6.3-pyha770c72_0 conda-forge
tzdata: 2023c-h71feb2d_0 conda-forge
urllib3: 2.0.3-pyhd8ed1ab_0 conda-forge
wheel: 0.40.0-pyhd8ed1ab_0 conda-forge
xz: 5.2.6-h166bdaf_0 conda-forge
yaml: 0.2.5-h7f98852_2 conda-forge
zstd: 1.5.2-h3eb15da_6 conda-forge
OneDNN is not there.
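Checking whether a package made it into the plan can be scripted rather than eyeballed. A minimal sketch, assuming the plan entries have the `name: version-build channel` shape shown above; `plan_has_pkg` is a hypothetical helper, not part of conda or conda-build.

```shell
# Hypothetical helper: read a conda "Package Plan" listing on stdin and succeed
# if one of the entries is exactly "<name>:" (leading indentation allowed).
# Note: the name is used as an extended regex, so "." would match any character.
plan_has_pkg() {
  grep -Eq "^[[:space:]]*$1:[[:space:]]"
}

# Usage, assuming the build log was saved to build.log:
# plan_has_pkg onednn < build.log || echo "onednn is not in the plan"
```

Anchoring on the trailing colon keeps `mkl` from matching `mkl-devel:` or `mkl-include:`.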
I see you added mkl-include to meta.yaml, but that is a different package with different headers and names.
I tried adding onednn in 48b2813 and it didn’t work. I got the mkl-include idea from looking at how pytorch builds their own conda package (no onednn, but they use mkl-include).
What was the error when you added onednn? I cannot see the log.
My plan was to add onednn again if mkl-include didn't work (i.e., adding one piece at a time).
…nda-forge-pinning 2023.06.19.09.23.46
The compiler finds oneDNN, but eventually it fails with this error:
It looks to me like a version mismatch between oneDNN and pytorch. Honestly, I have no idea how to solve it -.- These official tips are 4 years old, but it's the best I could find: https://github.com/pytorch/pytorch/blame/main/CONTRIBUTING.md#c-development-tips EDIT:
That was going to be my next question: why didn't we encounter this problem before? I am not sure what else is changing, but on our end it seems nothing changed since the last build. Could anything in here be related: pytorch/pytorch@v2.0.0...v2.0.1?
We can either:
Do you have a preference?
As far as I can tell there is nothing in that diff messing with the building process. |
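Narrowing a release diff down to the files that usually affect the build can make this kind of review quicker. The sketch below rests on an assumption about what counts as build-related (top-level and nested `CMakeLists.txt`, `*.cmake`, the `cmake/` directory, and `setup.py`); `build_related_files` is a hypothetical name.

```shell
# Hypothetical filter: read file paths on stdin, keep those that typically
# influence the build system.
build_related_files() {
  grep -E '(^|/)CMakeLists\.txt$|\.cmake$|^cmake/|^setup\.py$'
}

# Usage inside a pytorch checkout with both tags fetched:
# git diff --name-only v2.0.0 v2.0.1 | build_related_files
```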
I would use the older oneDNN.
OTOH the CPU builds work, and I believe that is due to the USE_MKLDNN setting here: pytorch-cpu-feedstock/recipe/build_pytorch.sh Lines 128 to 139 in 6b2d2b8
This branch only runs when cuda_compiler_version is None, so maybe that is why the CPU builds do not need oneDNN at all.
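Enabling MKLDNN for the CUDA variants as well can be sketched as follows. This is a hedged illustration, not the feedstock's actual script: it assumes `cuda_compiler_version` is `None` for CPU builds (as the conditionals in `build_pytorch.sh` suggest), and `configure_build` is a hypothetical wrapper around the real `export` lines.

```shell
# Hypothetical wrapper mirroring the structure of build_pytorch.sh.
configure_build() {
  if [ "${cuda_compiler_version:-None}" != "None" ]; then
    # CUDA branch: CUDA-specific exports such as MAGMA_HOME live here.
    export MAGMA_HOME="${PREFIX:-}"
  fi
  # Set USE_MKLDNN for every variant instead of only in the CPU branch.
  export USE_MKLDNN=1
}
```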
Smart! Let’s set USE_MKLDNN for the CUDA builds too. I will update the PR with this first, then try the older oneDNN if it doesn’t work.
Btw, do you know if they have added support for Grace Hopper GPUs yet? If so, we may need to edit the compute list we have (
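If Hopper support does need adding, one possible shape is to append compute capability 9.0 only for toolkits that can target it. A minimal sketch, assuming the compute list is the `;`-separated `TORCH_CUDA_ARCH_LIST` that PyTorch's build reads; the feedstock's actual list is not shown here, and `with_hopper_arch` is a hypothetical helper.

```shell
# Hypothetical helper: append compute capability 9.0 (Hopper, the GPU in
# Grace Hopper) to a ";"-separated arch list when the CUDA version is >= 11.8,
# the first toolkit release with sm_90 support. CPU builds ("None") are left alone.
with_hopper_arch() {
  cuda_ver="$1"
  arch_list="$2"
  case "$cuda_ver" in
    None|"") printf '%s\n' "$arch_list"; return 0 ;;
  esac
  # sort -V (GNU coreutils) orders versions; if 11.8 comes first, cuda_ver >= 11.8.
  lowest=$(printf '%s\n' 11.8 "$cuda_ver" | sort -V | head -n 1)
  if [ "$lowest" = "11.8" ]; then
    printf '%s;9.0\n' "$arch_list"
  else
    printf '%s\n' "$arch_list"
  fi
}

# Usage, assuming the list is passed to PyTorch via TORCH_CUDA_ARCH_LIST:
# export TORCH_CUDA_ARCH_LIST="$(with_hopper_arch "$cuda_compiler_version" "$TORCH_CUDA_ARCH_LIST")"
```

The helper does not check for an existing `9.0` entry, so it should be applied only once.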
Don’t hold your breath! The cuda builds won’t finish in 6 hours. What we are hoping for here is that they time out… |
Well damn… they’re finishing in time. Could you test one of the GPU artifacts to see if they work fine locally? Let me know if you don’t know how to do that.
@conda-forge/pytorch-cpu @hmaarrfk @h-vetinari @Tobias-Fischer, since when are we able to build cuda112 on the CI here??? This can't be right! Could you please have a look? Would you like me to incorporate the cuda12 migration in here too? |
@RaulPPelaez I will add you as a coauthor to commits in this PR if you don't mind. Please object if you don't want me to do that. |
I should have checked the logs more carefully:
The local build looks fine to me. The docker container appears to provide CUDA in /usr/local/cuda, and I do not see that message about "CUDA not found".
export CMAKE_TOOLCHAIN_FILE="${RECIPE_DIR}/cross-linux.cmake"
I don’t know how to fix the problem, but we should check whether we can catch this somehow in a test case. It’s problematic that the build still passes. |
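One way to make such a silent fallback fail loudly is to compare the requested CUDA version with what the built package reports. A minimal sketch, assuming the recipe's test step can see `cuda_compiler_version` and run Python; `torch.version.cuda` is `None` in CPU-only builds, and `check_cuda_built` is a hypothetical helper.

```shell
# Hypothetical check: fail when a CUDA variant was requested but the built torch
# reports no CUDA version. torch.version.cuda is None for CPU-only builds,
# which Python prints as the string "None".
check_cuda_built() {
  requested="$1"   # e.g. "$cuda_compiler_version" ("None" for CPU builds)
  reported="$2"    # output of: python -c "import torch; print(torch.version.cuda)"
  if [ "$requested" != "None" ] && [ "$reported" = "None" ]; then
    echo "CUDA build requested ($requested) but torch was built without CUDA" >&2
    return 1
  fi
  return 0
}

# Usage in a recipe test script (assumes cuda_compiler_version is set there):
# check_cuda_built "${cuda_compiler_version:-None}" \
#   "$(python -c 'import torch; print(torch.version.cuda)')"
```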
In my local build, setting CMAKE_TOOLCHAIN_FILE keeps CMake from picking up CUDA; removing it makes the error go away. Not sure how to go about catching this particular mistake.
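If the toolchain file is only meant for cross-compilation, one hedged option is to set it only when the build and target platforms differ. This assumes conda-build's `build_platform` and `target_platform` variables are available in the build script; `maybe_set_toolchain` is a hypothetical name, and whether this alone fixes CUDA discovery on CI is untested.

```shell
# Hypothetical guard: point CMake at the cross toolchain file only when
# cross-compiling, so native builds keep CMake's default CUDA discovery.
maybe_set_toolchain() {
  if [ "${build_platform:-}" != "${target_platform:-}" ]; then
    export CMAKE_TOOLCHAIN_FILE="${RECIPE_DIR}/cross-linux.cmake"
  else
    unset CMAKE_TOOLCHAIN_FILE
  fi
}
```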
What's the status of this? Confusingly for me, I see |
The CUDA builds are giving us a hard time. For some reason CMake is not finding CUDA correctly, and we ran out of ideas. Check the channels in your env; in a base env I only see up to 2.0.0 in conda-forge:
$ mamba search pytorch
...
pytorch 1.13.1 cuda112py39hb0b7ed5_200 conda-forge
pytorch 2.0.0 cpu_py310hd11e9c7_0 conda-forge
pytorch 2.0.0 cpu_py311h410fd25_0 conda-forge
pytorch 2.0.0 cpu_py38h019455c_0 conda-forge
pytorch 2.0.0 cpu_py39he4d1dc0_0 conda-forge
pytorch 2.0.0 cuda112py310he33e0d6_200 conda-forge
pytorch 2.0.0 cuda112py311h13fee9e_200 conda-forge
pytorch 2.0.0 cuda112py38h5e67e12_200 conda-forge
pytorch 2.0.0 cuda112py39ha9981d0_200 conda-forge
Note that the pytorch channel offers 2.0.1.
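Pulling the newest available version out of a listing like the one above can be automated. A minimal sketch, assuming the `name version build channel` column layout shown in that output; `latest_version` is a hypothetical helper, and `sort -V` is the GNU coreutils version sort.

```shell
# Hypothetical helper: read `mamba search <pkg>` output on stdin and print the
# highest version listed for that package (column 2), using version sort.
latest_version() {
  awk -v pkg="$1" '$1 == pkg { print $2 }' | sort -V | tail -n 1
}

# Usage:
# mamba search -c conda-forge pytorch | latest_version pytorch
```

Version sort matters here: a plain lexical sort would rank `2.0.9` above `2.0.10`.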
@conda-forge-admin, please rerender |
…nda-forge-pinning 2023.07.22.19.35.58
Ah yeah I'm actually seeing it in
Yeah, it is definitely available there, and I assume most people install it from that channel. But for the purposes of the conda-forge feedstock I like to make sure everything is available before updating. Those that need the latest can always use
By looking here:
modified recipe/build_pytorch.sh
@@ -125,7 +125,7 @@ if [[ ${cuda_compiler_version} != "None" ]]; then
export USE_STATIC_CUDNN=0
export CUDA_TOOLKIT_ROOT_DIR=$CUDA_HOME
export MAGMA_HOME="${PREFIX}"
- export USE_MKLDNN=1
+ export USE_MKLDNN=0
else
if [[ "$target_platform" == *-64 ]]; then
export BLAS="MKL"
modified recipe/meta.yaml
@@ -79,8 +79,8 @@ outputs:
- requests
- future
- six
- - mkl-devel {{ mkl }} # [x86]
- - mkl-include {{ mkl }} # [x86]
+ - mkl==2021.4.0 # [x86]
+ - mkl-include==2021.4.0 # [x86]
- libcblas * *_mkl # [x86]
- libcblas # [not x86]
- liblapack # [not x86]
The test then fails with:
See #176 for updates.
#176 was closed because the author didn't have enough time to "debug so much for a patch release." |
At this point I think we should go directly for 2.1.0: https://github.com/pytorch/pytorch/releases/tag/v2.1.0 Alas, the problems I had compiling this one are probably still there. Looking at the build instructions, though, they seem somewhat simpler than before:
I'm not too sure what changed in 4 months, but it seems like something got fixed. Working on getting through this. The CPU build passed, so now on to everything else...
Do we still want this given that PR ( #195 ) added 2.1.0? |