Add unit-tests for M1 #6111
LGTM
@@ -6,6 +6,7 @@ on:
   push:
     branches:
       - nightly
+      - main
Why do we want to execute this on main? You are already compiling vision inside test-m1.yml.
The test-m1 workflow only tests on py3.8 and executes only part of the checks (no wheel validation). I thought it would be best to run this at least once for each PR merged on main, so that issues are easier to catch. It's a middle-ground tradeoff between executing the tests on every PR/commit and doing so only once a day. Hopefully the volume of commits merged every day on TorchVision isn't large enough to significantly affect the overall resources.
As suspected there are failures on some of the m1 tests:
I'm going to rerun the test with more verbosity to show more details. @pmeier Any initial ideas what could be the problem with the Dataset test? @NicolasHug We had quite a few issues with the specific jpeg decoding tests. I remember you did an investigation previously on Windows and the main issue was the underlying JPEG lib? Any ideas whether we should take these failures seriously?
I don't think it's related to comparison tests. From the log, the error is
The Dataset failures:
@NicolasHug Thanks for checking. You are right, the Dataset is related to the AV dependency. The log is a bit hard to read because it truncates the output. I've added the dependency. @atalman @malfet It seems on #5948, you installed
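A minimal sketch, not taken from this PR, of how a missing optional dependency such as PyAV could be surfaced up front instead of through truncated test logs. The helper name `missing_modules` is hypothetical and the module list is only illustrative; `av` is the import name of the PyAV package.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Illustrative list (assumption): "av" backs the video Dataset tests.
    missing = missing_modules(["av"])
    if missing:
        print("Missing optional test dependencies: " + ", ".join(missing))
    else:
        print("All optional test dependencies are importable.")
```

Running a check like this as an early CI step would report the missing package by name before any test is collected.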
No reason, I just copy-n-pasted requirements from https://github.com/pytorch/builder/blob/bfce31f0d6af712081504ccda5c8d6c6d9ee7c94/build_m1_domains.sh#L12. But I'm a bit surprised that it is not used, as
Hmm, so I can read both png and jpeg images on M1 using the torchvision pip wheels (which were built using OpenJPEG):
But I cannot do it using nightly:
Ok, and the problem should have been obvious, considering https://github.com/pytorch/vision/runs/6669674559?check_suite_focus=true#step:3:126 and https://github.com/pytorch/pytorch/blob/release/1.11/aten/src/ATen/core/ivalue_inl.h#L2088
@malfet I'm sorry, I don't think I follow you. Here is my understanding of the situation:
Please note that we are currently 5 calendar days away from the RC cut on Domains (6/6). Since in the UK the 2nd and 3rd of June are bank holidays, we effectively have 2 biz days (today and the 6th). We currently don't have any proof that the M1 build of TorchVision works properly and unfortunately we are running out of time to test and fix things. This is where we could use your help:
If you have the bandwidth to provide support on the above, once our team is back on the 6th of June we will try to address any broken unit-tests on the M1, cherry-pick the fixes and move them to the release branch. If you are not able to provide support due to bandwidth issues, then our team will revert all the M1 PRs to avoid releasing a broken binary in the next release, and we can review adding M1 support to TorchVision in the weeks ahead. (cc @NicolasHug who is the PoC for this release) I'll close this PR but feel free to copy it and use any part that you find useful to help you in your work. Thanks!
Addressing #5948 (comment)
This PR: