Skip to content
This repository was archived by the owner on Aug 15, 2025. It is now read-only.

Conversation

huydhn
Copy link
Contributor

@huydhn huydhn commented Oct 21, 2022

Manywheel binary build currently doesn't have any compiler cache, so it takes more than 2h+ to just build PyTorch https://github.com/pytorch/pytorch/actions/runs/3285968556/jobs/5413580528.

I'll make a similar change to libtorch binary build later in a separate PR (lower priority cause building libtorch is not that slow)

Testing

  • Build the new image locally with GPU_ARCH_TYPE=cuda GPU_ARCH_VERSION=11.6 manywheel/build_docker.sh
  • Build PyTorch binary locally with the locally built image, i.e. pytorch/manylinux-builder:cuda11.6-e7608179efd287af102e40941fc24abff8d8a5bd. Here is the exact command I run inside the container:
export PYTORCH_ROOT=/tmp/pytorch
export BUILDER_ROOT=/tmp/builder
export PACKAGE_TYPE=manywheel
export DESIRED_CUDA=cu116
export GPU_ARCH_VERSION=11.6
export GPU_ARCH_TYPE=cuda
export DESIRED_PYTHON=3.7
export SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2
export SCCACHE_S3_KEY_PREFIX=debug
/tmp/builder/manywheel/build.sh

@huydhn huydhn requested review from a team and atalman October 21, 2022 21:41
@huydhn huydhn self-assigned this Oct 21, 2022
@huydhn huydhn marked this pull request as ready for review October 21, 2022 23:16
@huydhn
Copy link
Contributor Author

huydhn commented Oct 21, 2022

@huydhn
Copy link
Contributor Author

huydhn commented Oct 22, 2022

The ROCm failure https://github.com/pytorch/builder/actions/runs/3301097112/jobs/5446221931 is due to the recent change in #1160. cc @jataylo Could you help do a fix for that? It looks like 5.1.1 is missing from the list (-eq v.s. -ge)

@jataylo
Copy link
Contributor

jataylo commented Oct 24, 2022

@huydhn Will take a look at this, thank you.

CC @jithunnair-amd

@jithunnair-amd
Copy link
Contributor

@huydhn Attempting a fix in PR #1170

@huydhn
Copy link
Contributor Author

huydhn commented Oct 24, 2022

Push to https://github.com/pytorch/builder/tree/add-sccache-support so that the docker image can be published for testing

@huydhn
Copy link
Contributor Author

huydhn commented Oct 24, 2022

The new docker images are published for testing cuda11.6

@malfet
Copy link
Contributor

malfet commented Oct 25, 2022

Not using sccache for binary builds is a design decision:

  • sccache in the past is known to return stale/invalid objects (as it failed to account that file needs to be recompiled)
  • This also a security risk, as one can submit a PR that populates cache with some malicious data that will be incorporated into a nightly/release builds.

@atalman
Copy link
Contributor

atalman commented Oct 25, 2022

I would also add, that for binary builds we want to have predictable dependency versions from what I saw in failure logs it failed to return some dependency - dependency was missing or incorrect version. This maybe a same use case as @malfet point 1.

@huydhn huydhn marked this pull request as draft October 25, 2022 21:18
@huydhn huydhn closed this Oct 26, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants