M1 jobs are all failing #8456

Closed
NicolasHug opened this issue May 30, 2024 · 2 comments

Comments

@NicolasHug
Member

NicolasHug commented May 30, 2024

We moved these jobs to M1 instances 2 days ago. Things were fine in the PR, but the jobs have been extremely flaky since then. E.g. the current failures on b0f9f7b:

  • Conda build: 3.9 is passing, but all other Python versions are failing on import torchvision.
  • Wheel build: all jobs are failing.
 import torchvision
 File "/Users/ec2-user/runner/_work/_temp/conda_environment_9300373488/lib/python3.8/site-packages/torchvision/__init__.py", line 6, in <module>
   from torchvision import _meta_registrations, datasets, io, models, ops, transforms, utils
 File "/Users/ec2-user/runner/_work/_temp/conda_environment_9300373488/lib/python3.8/site-packages/torchvision/_meta_registrations.py", line 164, in <module>
   def meta_nms(dets, scores, iou_threshold):
 File "/Users/ec2-user/runner/_work/_temp/conda_environment_9300373488/lib/python3.8/site-packages/torch/library.py", line 639, in register
   use_lib._register_fake(op_name, func, _stacklevel=stacklevel + 1)
 File "/Users/ec2-user/runner/_work/_temp/conda_environment_9300373488/lib/python3.8/site-packages/torch/library.py", line 139, in _register_fake
   handle = entry.abstract_impl.register(func_to_register, source)
 File "/Users/ec2-user/runner/_work/_temp/conda_environment_9300373488/lib/python3.8/site-packages/torch/_library/abstract_impl.py", line 30, in register
   if torch._C._dispatch_has_kernel_for_dispatch_key(self.qualname, "Meta"):
RuntimeError: operator torchvision::nms does not exist
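
One way to narrow down an error like this is to check whether the compiled extension that registers the torchvision ops is present in the installed package at all, without triggering the failing `import torchvision`. The following is a minimal sketch, not the project's own diagnostic. It assumes the ops live in a compiled file named `_C` inside the package directory, and `find_extension_libs` is a hypothetical helper name.

```python
import importlib.machinery
import importlib.util
from pathlib import Path


def find_extension_libs(package_name, stem="_C"):
    """Return paths of compiled extension files named `stem` in a package.

    Hypothetical helper: it relies on find_spec, which locates a top-level
    package without executing its __init__.py, so it still works when the
    import itself raises.
    """
    try:
        spec = importlib.util.find_spec(package_name)
    except (ImportError, ValueError):
        return []
    if spec is None or not spec.submodule_search_locations:
        return []
    found = []
    for location in spec.submodule_search_locations:
        # EXTENSION_SUFFIXES lists the platform's valid suffixes, e.g. ".so".
        for suffix in importlib.machinery.EXTENSION_SUFFIXES:
            candidate = Path(location) / (stem + suffix)
            if candidate.is_file():
                found.append(candidate)
    return found


if __name__ == "__main__":
    # Assumption: the ops library is called "_C"; adjust if it is not.
    libs = find_extension_libs("torchvision")
    print(libs if libs else "no compiled _C extension found")
```

If a file is found, passing its path to `torch.ops.load_library` in an interpreter may surface a loader error that explains why the operator was never registered. If nothing is found, the build probably did not produce or package the extension.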

Maybe #8407 can help? EDIT: NOPE.


This is probably related to these compilation warnings we're observing for every single op?

  ld: warning: object file (/Users/ec2-user/runner/_work/vision/vision/pytorch/vision/build/temp.macosx-11.1-arm64-cpython-311/Users/ec2-user/runner/_work/vision/vision/pytorch/vision/torchvision/csrc/ops/autocast/nms_kernel.o) was built for newer macOS version (13.0) than being linked (11.1)

It's clear that the installed torch version is built for macOS 11, so if torchvision is being compiled on macOS 13, this might be the cause?

EDIT: NOPE. https://github.com/pytorch/vision/actions/runs/9288139810/job/25565749027?pr=8452 from yesterday has the exact same warnings but it runs fine.
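
A comparison like the one above (same warnings in a passing and a failing run) can be done mechanically rather than by eye. This is a rough sketch that assumes the build logs are available as plain text. The regular expression is modelled on the single warning line quoted above, and the function names are hypothetical.

```python
import re

# Pattern modelled on the ld warning quoted above; other linker versions
# may word the message differently.
LD_WARNING = re.compile(
    r"ld: warning: object file \((?P<path>.+?)\) was built for newer "
    r"macOS version \((?P<built>[\d.]+)\) than being linked "
    r"\((?P<linked>[\d.]+)\)"
)


def parse_ld_version_warnings(log_text):
    """Return a set of (object file name, built-for, linked) tuples."""
    results = set()
    for match in LD_WARNING.finditer(log_text):
        # Keep only the file name so runner-specific directories
        # do not make otherwise identical warnings look different.
        obj_name = match.group("path").rsplit("/", 1)[-1]
        results.add((obj_name, match.group("built"), match.group("linked")))
    return results


def warnings_differ(passing_log, failing_log):
    """True if the two logs contain different version-mismatch warnings."""
    return (parse_ld_version_warnings(passing_log)
            != parse_ld_version_warnings(failing_log))
```

If `warnings_differ` returns False for a passing and a failing run, as was the case here, the version-mismatch warnings are unlikely to explain the failure on their own.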

@malfet
Contributor

malfet commented Jun 7, 2024

@huydhn I guess it has something to do with the macOS runners update. Let me have a look at what is going on.

Though it fails on my Mac even with MPS disabled, so I have no idea what is going on...

@NicolasHug
Member Author

Thanks a lot for looking into this, @malfet and @huydhn. I was about to ping you both, as I saw some related changes in the PyTorch repo from last week. I've opened #8478 to keep track of the MPS issue with more details (I'll close this one if you don't mind, as it contains a bit of noise from my investigations above).
