-
Notifications
You must be signed in to change notification settings - Fork 227
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Align diffusers CI tests with examples #1679
Align diffusers CI tests with examples #1679
Conversation
16a1e33
to
39afbd4
Compare
39afbd4
to
3d8a509
Compare
@dsocek thanks for this PR. Great work. Have you run the CI for this PR? Please post the CI results and do the slow test with your modification. I am getting some failure testing. |
The custom op issue has already been addressed in this pull request #1655 |
8ba13e8
to
cc6ca4a
Compare
@yafshar all your changes are incorporated (I squashed many of these into 1 commit), and I also added commit which fixes the quant issue that you found. Thanks for great review! |
@dsocek thanks! Would you please post the test results? I am still seeing some issue. >>> RUN_SLOW=true GAUDI2_CI=1 python -m pytest tests/test_diffusers.py -s -v |
7f663cb
to
cc6ca4a
Compare
@dsocek thanks! Would you please post the test results? Everything else sounds great to me. We can ask regiss to check the PR. |
Signed-off-by: Daniel Socek <daniel.socek@intel.com>
Signed-off-by: Daniel Socek <daniel.socek@intel.com>
Co-authored-by: Yaser Afshar <yaser.afshar@intel.com>
Signed-off-by: Daniel Socek <daniel.socek@intel.com>
Signed-off-by: Daniel Socek <daniel.socek@intel.com>
8ac49f1
to
83561e3
Compare
Signed-off-by: Daniel Socek <daniel.socek@intel.com>
@yafshar Seems there were some issues on 1.19 (I originally tested only 1.18), related to SDP on BF16. After PT 2.5 we now have SDP in full precision as opposed to bf16. I added proper enablement for all diffusers and updated tests to use bf16 sdp. Now all (slow and fast) diffusers tests pass on both 1.18 and 1.19.
Multi-card tests:
|
@dsocek Thank you for your excellent work! I'll complete my review as soon as possible. |
@dsocek I am still getting some random fails from ''textual_inversion" test, but I think that overall everything is great. We should merge this PR and fix anything else later. [rank0]: File "/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/core/weight_sharing.py", line 75, in __torch_function__
[rank0]: return super().__torch_function__(func, types, new_args, kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/hpu/__init__.py", line 81, in _lazy_init
[rank0]: _hpu_C.init()
[rank0]: RuntimeError: synStatus=8 [Device not found] Device acquire failed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Hi @regisss, this PR is ready for your final review. Could you please take a look?
@yafshar actually this is some issue which I noticed. The CI will fail if you run multi-card tests with other tests. However, if you run it alone, it will pass. Should be investigated and fixed independently of this PR |
Yes, this is something I have noticed too and that's actually why the Diffusers slow tests are not run with one command but several specific ones: Line 101 in 3c251d0
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
@dsocek It seems the loss at the end of the training of the textual inversion test (regular SD, not SDXL) is NaN. It is not caused by this PR and it happens with previous commits too. Any idea why this could happen? It doesn't happen with SDXL. |
What does this PR do?
Comprehensive fixes and alignment between CI and examples README.md for diffusers
1st Commit re-organizes sections in README for better clarity and structure, and removes a number of over-represented samples (e.g. for controlnet and inpainting). We also now have text-to-video integrated here rather than in standalone examples folder. Please see improved README doc:
https://github.com/dsocek/optimum-habana/tree/align-diffusers-ci-tests-with-examples/examples/stable-diffusion
2nd Commit modifies tests/test_diffusers.py and closes a gap between README and ci. A number of over-represented tests were removed and all slow tests were aligned to parameters used in README. Also, some tests were merged into single test for efficiency improvement (e.g. separate tests which tested functionality and performance now merged to single test which runs pipeline once and then tests both)