cog build --use-cog-base-image=false fails on invalid wheel filename #1963

Closed
josephhaaga opened this issue Sep 20, 2024 · 3 comments

@josephhaaga

Looks like cog build --use-cog-base-image=false fails with this particular combination of CUDA 12.3 and Python 3.11 due to:

  • no r8.im base image to start from
  • the nvidia/cuda:12.3.2-cudnn9-devel-ubuntu22.04 base image being picky about wheel filenames

I suspect the generated wheel needs a filename that will pass this regex
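To illustrate why a bare name like cog.whl gets rejected, here is a minimal sketch of a wheel filename check. It follows the standard wheel naming convention ({distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl) but is a simplified approximation, not pip's actual regex. The helper name is_valid_wheel_filename and the example filename cog-0.9.23-py3-none-any.whl are assumptions made for illustration.

```python
import re

# Simplified approximation of the wheel naming convention (NOT pip's own regex):
# {distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl
WHEEL_NAME_RE = re.compile(
    r"^(?P<name>[^\s-]+)-(?P<version>[^\s-]+)"
    r"(?:-(?P<build>\d[^\s-]*))?"  # optional build tag, must start with a digit
    r"-(?P<python>[^\s-]+)-(?P<abi>[^\s-]+)-(?P<platform>[^\s-]+)\.whl$"
)


def is_valid_wheel_filename(filename: str) -> bool:
    """Return True if filename has the dash-separated parts a wheel name needs."""
    return WHEEL_NAME_RE.match(filename) is not None


if __name__ == "__main__":
    # "cog.whl" has no version or compatibility tags, so it does not match.
    print(is_valid_wheel_filename("cog.whl"))
    # Hypothetical well-formed name, used only as an example.
    print(is_valid_wheel_filename("cog-0.9.23-py3-none-any.whl"))
```

Under this sketch, cog.whl fails because it carries no version or compatibility tags, which is consistent with the error pip reports in the logs further down.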

My machine

$ cog --version
cog version 0.9.23 (built 2024-09-13T09:49:08Z)

$ neofetch
                    'c.          josephhaaga@computer
                 ,xNMM.          ---------------------
               .OMMMMo           OS: macOS 14.7 23H124 arm64
               OMMM0,            Host: MacBookPro18,2
     .;loddo:' loolloddol;.      Kernel: 23.6.0
   cKMMMMMMMMMMNWMMMMMMMMMM0:    Uptime: 1 day, 1 hour, 23 mins
 .KMMMMMMMMMMMMMMMMMMMMMMMWd.    Packages: 183 (brew)
 XMMMMMMMMMMMMMMMMMMMMMMMX.      Shell: zsh 5.9
;MMMMMMMMMMMMMMMMMMMMMMMM:       Resolution: 1728x1117
:MMMMMMMMMMMMMMMMMMMMMMMM:       DE: Aqua
.MMMMMMMMMMMMMMMMMMMMMMMMX.      WM: yabai
 kMMMMMMMMMMMMMMMMMMMMMMMMWd.    Terminal: tmux
 .XMMMMMMMMMMMMMMMMMMMMMMMMMMk   CPU: Apple M1 Max
  .XMMMMMMMMMMMMMMMMMMMMMMMMK.   GPU: Apple M1 Max
    kMMMMMMMMMMMMMMMMMMMMMMd     Memory: 2481MiB / 32768MiB
     ;KMMMMMMMWXXWMMMMMMMk.
       .cooc,.    .,coo:.

Docker Desktop's "Use Rosetta for x86_64/amd64 emulation on Apple Silicon" setting is disabled.

details

There doesn't seem to be a CUDA 12.3 + Python 3.11 base image available

# cog.yaml
build:
  cuda: "12.3" # https://www.tensorflow.org/install/source#gpu
  gpu: true
  python_version: "3.11"
  python_packages:
    - "pip==24.2"
    - "pandas==2.2.2"
    - "tensorflow==2.16.2"
    - "tensorflow-datasets==4.9.6"
    - "tensorflow-recommenders==0.7.3"
    - "tf-keras==2.16.0"
    - "scann"
train: "train.py:train"
predict: "predict.py:Predictor"
Logs

$ cog build  -t user-to-buzz:$(git rev-parse HEAD)                                                                  

⚠ Cog doesn't know if CUDA 12.3 is compatible with Tensorflow 2.16.2. This might cause CUDA problems.
Building Docker image from environment in cog.yaml as user-to-buzz:30af21cb81876f3681c190ecbc844d5c8e7c2750...
[+] Building 1.0s (6/6) FINISHED                                                                                                                                        docker:desktop-linux
 => [internal] load build definition from Dockerfile                                                                                                                                    0.0s
 => => transferring dockerfile: 442B                                                                                                                                                    0.0s
 => resolve image config for docker-image://docker.io/docker/dockerfile:1.4                                                                                                             0.3s
 => [auth] docker/dockerfile:pull token for registry-1.docker.io                                                                                                                        0.0s
 => CACHED docker-image://docker.io/docker/dockerfile:1.4@sha256:9ba7531bd80fb0a858632727cf7a112fbfd19b17e94c4e84ced81e24ef1a0dbc                                                       0.0s
 => [internal] load .dockerignore                                                                                                                                                       0.0s
 => => transferring context: 58B                                                                                                                                                        0.0s
 => ERROR [internal] load metadata for r8.im/cog-base:cuda12.3-python3.11                                                                                                               0.5s
------
 > [internal] load metadata for r8.im/cog-base:cuda12.3-python3.11:
------
Dockerfile:2
--------------------
   1 |     #syntax=docker/dockerfile:1.4
   2 | >>> FROM r8.im/cog-base:cuda12.3-python3.11
   3 |     COPY .cog/tmp/build20240920112133.9388834066866224/requirements.txt /tmp/requirements.txt
   4 |     ENV CFLAGS="-O3 -funroll-loops -fno-strict-aliasing -flto -S"
--------------------
ERROR: failed to solve: failed to resolve source metadata for r8.im/cog-base:cuda12.3-python3.11: r8.im/cog-base:cuda12.3-python3.11: not found
ⅹ Failed to build Docker image: exit status 1

Setting --use-cog-base-image=false gets past the missing base image, but the build then fails with an error about how the wheel file is named:

[deps 3/5] RUN --mount=type=cache,target=/root/.cache/pip pip install --no-cache-dir -t /dep /tmp/cog.whl:
9.250 ERROR: cog.whl is not a valid wheel filename.
10.33
10.33 [notice] A new release of pip is available: 24.0 -> 24.2
10.33 [notice] To update, run: pip install --upgrade pip

I suspect this could be fixed by updating pip to 24.2, which I'm already doing via python_packages, but I don't think we even get that far due to the wheel filename issue :-/

Logs

$ cog build --use-cog-base-image=false -t user-to-buzz:$(git rev-parse HEAD)

⚠ Cog doesn't know if CUDA 12.3 is compatible with Tensorflow 2.16.2. This might cause CUDA problems.
Building Docker image from environment in cog.yaml as user-to-buzz:30af21cb81876f3681c190ecbc844d5c8e7c2750...
[+] Building 12.8s (14/24)                                                                                                                           docker:desktop-linux
 => [internal] load build definition from Dockerfile                                                                                                                 0.0s
 => => transferring dockerfile: 2.71kB                                                                                                                               0.0s
 => resolve image config for docker-image://docker.io/docker/dockerfile:1.4                                                                                          0.5s
 => [auth] docker/dockerfile:pull token for registry-1.docker.io                                                                                                     0.0s
 => CACHED docker-image://docker.io/docker/dockerfile:1.4@sha256:9ba7531bd80fb0a858632727cf7a112fbfd19b17e94c4e84ced81e24ef1a0dbc                                    0.0s
 => [internal] load .dockerignore                                                                                                                                    0.0s
 => => transferring context: 58B                                                                                                                                     0.0s
 => [internal] load metadata for docker.io/nvidia/cuda:12.3.2-cudnn9-devel-ubuntu22.04                                                                               0.5s
 => [internal] load metadata for docker.io/library/python:3.11                                                                                                       0.2s
 => [auth] nvidia/cuda:pull token for registry-1.docker.io                                                                                                           0.0s
 => [auth] library/python:pull token for registry-1.docker.io                                                                                                        0.0s
 => [deps 1/5] FROM docker.io/library/python:3.11@sha256:157a371e60389919fe4a72dff71ce86eaa5234f59114c23b0b346d0d02c74d39                                            0.0s
 => [internal] load build context                                                                                                                                    1.0s
 => => transferring context: 2.70MB                                                                                                                                  0.8s
 => CANCELED [stage-1 1/9] FROM docker.io/nvidia/cuda:12.3.2-cudnn9-devel-ubuntu22.04@sha256:fb1ad20f2552f5b3aafb2c9c478ed57da95e2bb027d15218d7a55b3a0e4b4413       11.7s
 => => resolve docker.io/nvidia/cuda:12.3.2-cudnn9-devel-ubuntu22.04@sha256:fb1ad20f2552f5b3aafb2c9c478ed57da95e2bb027d15218d7a55b3a0e4b4413                         0.0s
 => => sha256:5d846bce3f9896ccd22114c9d44658c38798b5bd2660bc3048199d3840a2444d 19.68kB / 19.68kB                                                                     0.0s
 => => sha256:fb1ad20f2552f5b3aafb2c9c478ed57da95e2bb027d15218d7a55b3a0e4b4413 743B / 743B                                                                           0.0s
 => => sha256:4f00d5116a3679bab6bc13318c8555d7207206de2318e77348a9a93f66e73e21 2.84kB / 2.84kB                                                                       0.0s
 => => sha256:01007420e9b005dc14a8c8b0f996a2ad8e0d4af6c3d01e62f123be14fe48eec7 29.54MB / 29.54MB                                                                     1.1s
 => => sha256:bfc08b17629d5dde3f9b4b837997c26fee28c86d20cf6c65834066dff820c8fa 4.62MB / 4.62MB                                                                       0.8s
 => => sha256:86fc789646b553a337ffae04223a669744a8112e6e77d01ec87f9595e83e4b4f 57.07MB / 57.07MB                                                                     5.8s
 => => sha256:6b62141c2a212c553952737153b7ca35189c2fa4e1ba75e88f5e31b50de2c2d7 185B / 185B                                                                           0.9s
 => => sha256:e0e30e504698762f2cab0281477b911293c1b67c7b3b7a45d917b7fc68702c33 6.89kB / 6.89kB                                                                       1.0s
 => => sha256:346eb11560eafe7714b88308af8b0e03b3642b96787d05600dcbe5059b1c34e7 59.77MB / 1.29GB                                                                     11.7s
 => => extracting sha256:01007420e9b005dc14a8c8b0f996a2ad8e0d4af6c3d01e62f123be14fe48eec7                                                                            0.8s
 => => sha256:a011ef94b5587a8899dbd1b4e17f045365a367cf31892800eee099e48c60ddf9 63.93kB / 63.93kB                                                                     1.3s
 => => sha256:7543c096139519189025b8e57e2b2a1f2b1edc7c60b4aa56346063bc15e0cc1f 1.68kB / 1.68kB                                                                       1.4s
 => => sha256:43c77217e0094adc5276f4b8d9f01d9368adb606df79ab43e454477daf9a6b7a 1.52kB / 1.52kB                                                                       1.4s
 => => sha256:8ebe7e080c37469e9f54a4a0506785a20a769e656bd6f745c02d2e034ae5a2f8 138.41MB / 2.57GB                                                                    11.7s
 => => extracting sha256:bfc08b17629d5dde3f9b4b837997c26fee28c86d20cf6c65834066dff820c8fa                                                                            0.1s
 => => sha256:11f6815212a58a0087458334d15e35038ede10c7808b67285af9460e170f6648 88.61kB / 88.61kB                                                                     5.9s
 => => extracting sha256:86fc789646b553a337ffae04223a669744a8112e6e77d01ec87f9595e83e4b4f                                                                            0.6s
 => => sha256:cc5e7ed01d80eaef50bafb4073ac585f37ac8419515e1674746f21d5e10eb82b 55.57MB / 675.30MB                                                                   11.7s
 => => extracting sha256:6b62141c2a212c553952737153b7ca35189c2fa4e1ba75e88f5e31b50de2c2d7                                                                            0.0s
 => => extracting sha256:e0e30e504698762f2cab0281477b911293c1b67c7b3b7a45d917b7fc68702c33                                                                            0.0s
 => CACHED [deps 2/5] COPY .cog/tmp/build20240920115210.3368532414966447/cog.whl /tmp/cog.whl                                                                        0.0s
 => ERROR [deps 3/5] RUN --mount=type=cache,target=/root/.cache/pip pip install --no-cache-dir -t /dep /tmp/cog.whl                                                 10.7s
------
 > [deps 3/5] RUN --mount=type=cache,target=/root/.cache/pip pip install --no-cache-dir -t /dep /tmp/cog.whl:
9.148 ERROR: cog.whl is not a valid wheel filename.
10.32
10.32 [notice] A new release of pip is available: 24.0 -> 24.2
10.32 [notice] To update, run: pip install --upgrade pip
------
Dockerfile:5
--------------------
   3 |     COPY .cog/tmp/build20240920115210.3368532414966447/cog.whl /tmp/cog.whl
   4 |     ENV CFLAGS="-O3 -funroll-loops -fno-strict-aliasing -flto -S"
   5 | >>> RUN --mount=type=cache,target=/root/.cache/pip pip install --no-cache-dir -t /dep /tmp/cog.whl
   6 |     ENV CFLAGS=
   7 |     COPY .cog/tmp/build20240920115210.3368532414966447/requirements.txt /tmp/requirements.txt
--------------------
ERROR: failed to solve: process "/bin/sh -c pip install --no-cache-dir -t /dep /tmp/cog.whl" did not complete successfully: exit code: 1
ⅹ Failed to build Docker image: exit status 1

@jesusmartinoza

Having the same issue. Did you manage to solve it?

@nickstenning
Member

We've tracked down what's causing this and have a patch open to Homebrew to fix it.

@nickstenning
Member

This should now be fixed! Please brew reinstall cog before retrying.
