Initial IPEX support for Intel Arc GPU #14171
Conversation
I'd like to not have
Also I assume I wouldn't be able to use it with just an AMD CPU, right?
Thanks for the quick feedback! Will update soon.
Right. At the moment IPEX XPU only works for Intel Arc dGPUs. It doesn't even work for Intel iGPUs (UHD or Iris Xe Graphics).
Got the issue when using this: my CPU is being used to render, not my GPU (Intel Arc SE 16GB).
Same on the Arc A770. I am using --use-ipex on the command line, but only the CPU is used. Not sure if it's because the ReActor plugin is always "preheating" a device and only sees the CPU. The onboard iGPU (UHD 770) is disabled too, so it's not causing any interference.
@gmbhneo @tusharbhutt Are you using the dev branch? And is the iGPU enabled? Ensure the Python version for the webui env is Python 3.10 on Windows. If the iGPU is enabled, add
I just tried the dev branch on Windows and it worked. To monitor GPU utilization on Windows, open Task Manager --> change one of the metrics to
```diff
@@ -352,6 +372,8 @@ def prepare_environment():
     run(f'"{python}" -m {torch_command}', "Installing torch and torchvision", "Couldn't install torch", live=True)
     startup_timer.record("install torch")

+    if args.use_ipex:
+        args.skip_torch_cuda_test = True
```
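For context on what the flag does at install time, here is a minimal sketch of how a `--use-ipex` option could pick the torch install command in `prepare_environment`. The package list and index URL below are placeholders, not the PR's actual values:

```python
import os

def build_torch_command(use_ipex: bool) -> str:
    """Sketch of how --use-ipex could select the torch install command.

    The packages and index URL here are stand-ins; consult Intel's IPEX XPU
    install docs for the real ones.
    """
    if use_ipex:
        # IPEX XPU wheels come from Intel's own package index (placeholder URL).
        return os.environ.get(
            "TORCH_COMMAND",
            "pip install torch torchvision intel-extension-for-pytorch "
            "--extra-index-url https://example.invalid/ipex-xpu-wheels",
        )
    return os.environ.get("TORCH_COMMAND", "pip install torch torchvision")

if __name__ == "__main__":
    print(build_torch_command(use_ipex=True))
```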
It would be good to include a torch version check: if users have other torch packages installed in the env, then run pip install to install the required ipex, torch, and torchvision packages.
```python
if args.use_ipex:
    if is_installed("torch"):
        import torch
        if torch.__version__ != "2.0.0a0+git9ebda2" or not is_installed("intel_extension_for_pytorch"):
            run(f'"{python}" -m {torch_command}', "Installing torch and torchvision", "Couldn't install torch", live=True)
            startup_timer.record("install torch")
```
Or we could call `check_run_python("import torch; import intel_extension_for_pytorch; assert torch.xpu.is_available()")`
to perform a sanity test, so that we don't assume a specific torch version -- Intel may release newer versions, and users could build from source with a custom version.
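For reference, a standalone version of that sanity test. The subprocess call simply mimics running the check in the webui's Python interpreter, and the function name here is illustrative, not the code the PR ends up using:

```python
# Standalone sketch of the suggested sanity test.
import subprocess
import sys

def ipex_xpu_sanity_check() -> bool:
    """Return True if torch and intel_extension_for_pytorch import and an XPU device is visible."""
    code = "import torch; import intel_extension_for_pytorch; assert torch.xpu.is_available()"
    result = subprocess.run([sys.executable, "-c", code], capture_output=True)
    return result.returncode == 0

if __name__ == "__main__":
    print("IPEX XPU usable:", ipex_xpu_sanity_check())
```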
@gmbhneo @tusharbhutt A few tips:
Possibly related issue: #14224
I'll give it a try in a bit; I yanked out the A770 and put the 3060 back in. However, previously I had disabled the iGPU and had "--use-ipex" in the args. I'll try the fresh venv folder next. This is on Python 3.10 in Windows 10 using the dev branch.
Getting
File an issue ticket with more detail, please.
I can confirm it is working! I needed to disable my iGPU (UHD, Iris) in Device Manager and delete my old
--use-ipex has reduced my render time from ~1 minute (512x512, 20 steps) to ~30 seconds. Is there any other command line arg that might explain the difference in performance from what you've seen? I have an A770 16GB.
Try adding --opt-sdp-attention. In the master branch the default is the InvokeAI cross-attention optimization, which causes perf issues; the dev branch has this fixed.
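For reference, --opt-sdp-attention routes cross-attention through PyTorch's fused scaled-dot-product attention kernel. A minimal standalone illustration of that call (the shapes are arbitrary examples, not the webui's actual tensors):

```python
# Minimal illustration of scaled-dot-product attention (PyTorch >= 2.0),
# the fused kernel that --opt-sdp-attention enables.
import torch
import torch.nn.functional as F

# (batch, heads, tokens, head_dim) -- arbitrary example shapes
q = torch.randn(1, 8, 4096, 64)
k = torch.randn(1, 8, 4096, 64)
v = torch.randn(1, 8, 4096, 64)

out = F.scaled_dot_product_attention(q, k, v)  # picks the best available fused kernel
print(out.shape)  # torch.Size([1, 8, 4096, 64])
```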
--opt-sdp-attention worked, reducing the duration to under 20 seconds, going as low as just over 14 seconds. I don't have ReBAR, so I figured that was as fast as it would go, but it suddenly went back up to ~25 seconds.
Do you have the iGPU enabled? If so, please disable it. Perf on an A770 for 512x512 at 20 steps should be about 3 seconds.
ReBAR is the bottleneck for sure. ReBAR-ON is ~5x faster than ReBAR-OFF for IPEX.
@uxdesignerhector Apologies for the late reply. I did get it working in my old machine (the one without ReBAR) as per this thread: #14338. However, it was about 2x slower than my 3060, and I haven't bothered to put the Arc in my new machine in about six weeks simply because I am swamped with work. I'll give it a go once I can wrestle it away from my son. If it works and is materially close to the 3060, at least I'll have 16GB of VRAM instead of 12. Then he can have the 3060 and I'll keep the Arc.
I actually built a new PC for this card to have ReBAR, and it runs 512x512 SD 1.5 in 4 seconds! It's amazing!
Hello, what about using Intel AI Boost NPUs? Is this planned?
Description
This is the initial PR of IPEX Windows support for Intel Arc GPU.
Related feature request: #6417

- Adds `--use-ipex` to use `xpu` as the torch device.
- Adds `xpu_specific` for IPEX XPU specific hijacks.
- Add `--use-ipex` to `COMMANDLINE_ARGS` to use the IPEX backend.

With this PR, an Intel Arc A770 16GB can now generate one 512x512 image (sdp cross attention opt, fp16, DPM++ 2M Karras, 20 steps) in 3~4 seconds (~6it/s).
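As a rough illustration of what an `xpu_specific`-style device check can look like, here is a sketch under the assumption that importing IPEX registers the `xpu` backend with torch; the function names are illustrative, not the PR's exact module:

```python
# Illustrative sketch only -- not the PR's exact modules/xpu_specific.py.
import torch

try:
    # Importing IPEX registers the 'xpu' backend with torch.
    import intel_extension_for_pytorch  # noqa: F401
    _has_ipex = True
except ImportError:
    _has_ipex = False

def check_for_xpu() -> bool:
    """True when IPEX is importable and at least one XPU device is visible."""
    return _has_ipex and hasattr(torch, "xpu") and torch.xpu.is_available()

def get_xpu_device_string() -> str:
    """Device string to pass to torch, falling back to CPU when no XPU is available."""
    return "xpu" if check_for_xpu() else "cpu"

if __name__ == "__main__":
    print(get_xpu_device_string())
```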
Notes: I only verified basic txt2img functionality at the moment. Based on my experience with SD.Next, we will need more hijacks for IPEX to unlock more functionalities, but I'd like to keep this change minimal and address more IPEX issues in follow-up PRs.
Screenshots/videos:
QQ2023122-16130.mp4
Checklist:
test/test_extras.py::test_simple_upscaling_performed PASSED [ 3%]
test/test_extras.py::test_png_info_performed PASSED [ 6%]
test/test_extras.py::test_interrogate_performed PASSED [ 10%]
test/test_img2img.py::test_img2img_simple_performed PASSED [ 13%]
test/test_img2img.py::test_inpainting_masked_performed PASSED [ 17%]
test/test_img2img.py::test_inpainting_with_inverted_masked_performed PASSED [ 20%]
test/test_img2img.py::test_img2img_sd_upscale_performed PASSED [ 24%]
test/test_txt2img.py::test_txt2img_simple_performed PASSED [ 27%]
test/test_txt2img.py::test_txt2img_with_negative_prompt_performed PASSED [ 31%]
test/test_txt2img.py::test_txt2img_with_complex_prompt_performed PASSED [ 34%]
test/test_txt2img.py::test_txt2img_not_square_image_performed PASSED [ 37%]
test/test_txt2img.py::test_txt2img_with_hrfix_performed PASSED [ 41%]
test/test_txt2img.py::test_txt2img_with_tiling_performed PASSED [ 44%]
test/test_txt2img.py::test_txt2img_with_restore_faces_performed PASSED [ 48%]
test/test_txt2img.py::test_txt2img_with_vanilla_sampler_performed[PLMS] PASSED [ 51%]
test/test_txt2img.py::test_txt2img_with_vanilla_sampler_performed[DDIM] PASSED [ 55%]
test/test_txt2img.py::test_txt2img_with_vanilla_sampler_performed[UniPC] PASSED [ 58%]
test/test_txt2img.py::test_txt2img_multiple_batches_performed PASSED [ 62%]
test/test_txt2img.py::test_txt2img_batch_performed PASSED [ 65%]
test/test_utils.py::test_options_write PASSED [ 68%]
test/test_utils.py::test_get_api_url[sdapi/v1/cmd-flags] PASSED [ 72%]
test/test_utils.py::test_get_api_url[sdapi/v1/samplers] PASSED [ 75%]
test/test_utils.py::test_get_api_url[sdapi/v1/upscalers] PASSED [ 79%]
test/test_utils.py::test_get_api_url[sdapi/v1/sd-models] PASSED [ 82%]
test/test_utils.py::test_get_api_url[sdapi/v1/hypernetworks] PASSED [ 86%]
test/test_utils.py::test_get_api_url[sdapi/v1/face-restorers] PASSED [ 89%]
test/test_utils.py::test_get_api_url[sdapi/v1/realesrgan-models] PASSED [ 93%]
test/test_utils.py::test_get_api_url[sdapi/v1/prompt-styles] PASSED [ 96%]
test/test_utils.py::test_get_api_url[sdapi/v1/embeddings] PASSED [100%]
============================================================================================= 29 passed in 8.39s =============================================================================================