For reproduction.
Input Model: https://sharkpublic.blob.core.windows.net/sharkpublic/sai/sdxl-punet/punet.mlir
Input data:
wget https://sharkpublic.blob.core.windows.net/sharkpublic/sai/sdxl-punet/inference_input.0.bin
wget https://sharkpublic.blob.core.windows.net/sharkpublic/sai/sdxl-punet/inference_input.1.bin
wget https://sharkpublic.blob.core.windows.net/sharkpublic/sai/sdxl-punet/inference_input.2.bin
wget https://sharkpublic.blob.core.windows.net/sharkpublic/sai/sdxl-punet/inference_input.3.bin
wget https://sharkpublic.blob.core.windows.net/sharkpublic/sai/sdxl-punet/inference_input.4.bin
wget https://sharkpublic.blob.core.windows.net/sharkpublic/sai/sdxl-punet/inference_input.5.bin
wget https://sharkpublic.blob.core.windows.net/sharkpublic/sai/sdxl-punet/punet_weights.irpa
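If it helps anyone reproducing this, the same seven downloads can be scripted. This is a sketch only: punet_urls and BASE_URL are hypothetical names of mine, not something from the sdxl-scripts repo, and the URL pattern is just the one used in the list above.

```shell
#!/bin/sh
# Hypothetical helper: base URL copied from the download list in this issue.
BASE_URL="https://sharkpublic.blob.core.windows.net/sharkpublic/sai/sdxl-punet"

# Print one URL per line: inference_input.0.bin ... inference_input.5.bin,
# followed by the weights archive.
punet_urls() {
  for i in 0 1 2 3 4 5; do
    printf '%s/inference_input.%s.bin\n' "$BASE_URL" "$i"
  done
  printf '%s/punet_weights.irpa\n' "$BASE_URL"
}
```

Usage: punet_urls | xargs -n1 wget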
I built IREE from main and used the transform dialect (TD) spec at https://github.com/nod-ai/sdxl-scripts/blob/shared/sdxl_on_main/int8-model/specs/attention_and_matmul_spec.mlir
Compilation command for IREE on main:
iree-compile \
  --iree-execution-model=async-external \
  --iree-hal-target-backends=rocm \
  --iree-hip-target=gfx942 \
  --iree-hip-waves-per-eu=2 \
  --iree-codegen-gpu-native-math-precision=true \
  --iree-codegen-llvmgpu-use-vector-distribution \
  --iree-codegen-transform-dialect-library= \
  --iree-dispatch-creation-enable-aggressive-fusion=true \
  --iree-global-opt-propagate-transposes=true \
  --iree-llvmgpu-enable-prefetch=true \
  --iree-opt-aggressively-propagate-transposes=true \
  --iree-opt-const-eval=false \
  --iree-opt-outer-dim-concat=true \
  --iree-opt-data-tiling=false \
  --iree-preprocessing-pass-pipeline="builtin.module(util.func(iree-global-opt-raise-special-ops, iree-flow-canonicalize), iree-preprocessing-transpose-convolution-pipeline, iree-preprocessing-pad-to-intrinsics, util.func(iree-preprocessing-generalize-linalg-matmul-experimental))" \
  --iree-vm-target-truncate-unsupported-floats \
  ${PUNET_MODEL} \
  -o ${VMFB}
Run command:
iree-benchmark-module \
  --device=hip:0 \
  --device_allocator=caching \
  --function=main \
  --hip_allow_inline_execution=true \
  --hip_use_stream=true \
  --input=1x4x128x128xf16=@inference_input.0.bin \
  --input=1xf16=@inference_input.1.bin \
  --input=2x64x2048xf16=@inference_input.2.bin \
  --input=2x1280xf16=@inference_input.3.bin \
  --input=2x6xf16=@inference_input.4.bin \
  --input=1xf16=@inference_input.5.bin \
  --module=${VMFB} \
  --parameters=model=punet_weights.irpa
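A quick sanity check that the downloaded .bin files match the shapes passed to --input. This assumes the files are headerless raw buffers, so each file's size should equal (product of dims) x (bytes per element); expected_bytes and check_inputs are hypothetical helpers, and only a few element types are handled.

```shell
#!/bin/sh
# Byte size of a raw tensor given an IREE-style shape string, e.g. "1x4x128x128xf16".
# Assumption: no file header, so size = product(dims) * sizeof(element).
expected_bytes() {
  spec=$1
  dtype=${spec##*x}   # last field, e.g. f16
  dims=${spec%x*}     # leading dims, e.g. 1x4x128x128
  case $dtype in
    f16|bf16|i16) elem=2 ;;
    f32|i32) elem=4 ;;
    i8) elem=1 ;;
    *) echo "unsupported dtype: $dtype" >&2; return 1 ;;
  esac
  # Turn "1x4x128x128" into "1*4*128*128" and let shell arithmetic multiply it out.
  product=$(( $(printf '%s' "$dims" | tr 'x' '*') ))
  echo $(( product * elem ))
}

# Compare each input file against the shape used for it in the run command.
check_inputs() {
  status=0
  while read -r spec file; do
    want=$(expected_bytes "$spec") || return 1
    if [ ! -f "$file" ]; then
      echo "MISSING  $file"; status=1; continue
    fi
    got=$(wc -c < "$file" | tr -d ' ')
    if [ "$got" -eq "$want" ]; then
      echo "OK       $file ($got bytes)"
    else
      echo "MISMATCH $file: expected $want bytes, found $got"; status=1
    fi
  done <<EOF
1x4x128x128xf16 inference_input.0.bin
1xf16 inference_input.1.bin
2x64x2048xf16 inference_input.2.bin
2x1280xf16 inference_input.3.bin
2x6xf16 inference_input.4.bin
1xf16 inference_input.5.bin
EOF
  return $status
}
```

Run check_inputs from the directory holding the downloaded files; a MISMATCH line would point at a stale or truncated download rather than a compiler regression.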
For the MLPerf build I used the same inputs and weights, but with:
IREE commit: https://github.com/iree-org/iree/tree/mlperf_v4.1_20240726
TD spec: https://github.com/nod-ai/sdxl-scripts/blob/mlperf_v4.1_20240726/int8-model/specs/attention_and_matmul_spec.mlir

iree-compile \
  --iree-execution-model=async-external \
  --iree-hal-target-backends=rocm \
  --iree-rocm-target-chip=gfx942 \
  --iree-rocm-waves-per-eu=2 \
  --iree-codegen-gpu-native-math-precision=true \
  --iree-codegen-llvmgpu-use-vector-distribution \
  --iree-codegen-transform-dialect-library=${TD_SPEC} \
  --iree-flow-enable-aggressive-fusion=true \
  --iree-global-opt-propagate-transposes=true \
  --iree-llvmgpu-enable-prefetch=true \
  --iree-opt-aggressively-propagate-transposes=true \
  --iree-opt-const-eval=false \
  --iree-opt-outer-dim-concat=true \
  --iree-opt-data-tiling=false \
  --iree-preprocessing-pass-pipeline="builtin.module(util.func(iree-global-opt-raise-special-ops, iree-flow-canonicalize), iree-preprocessing-transpose-convolution-pipeline, util.func(iree-preprocessing-pad-to-intrinsics), util.func(iree-preprocessing-generalize-linalg-matmul-experimental))" \
  --iree-vm-target-truncate-unsupported-floats \
  ${PUNET_MODEL} \
  -o ${VMFB}
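Since the two compile commands use slightly different flag names, a small sketch like this can list which flags appear in only one of them. flag_names and diff_flags are hypothetical helpers; they only compare flag names (the text before "="), not values.

```shell
#!/bin/sh
# Print the sorted, unique flag names (text before '=') found in a command line.
flag_names() {
  printf '%s\n' "$1" | tr ' ' '\n' | grep -e '^--' | sed 's/=.*//' | LC_ALL=C sort -u
}

# Show flags present in only one of two command lines.
# Unindented lines: only in the first command; tab-indented: only in the second.
diff_flags() {
  a=$(mktemp) b=$(mktemp)
  flag_names "$1" > "$a"
  flag_names "$2" > "$b"
  LC_ALL=C comm -3 "$a" "$b"
  rm -f "$a" "$b"
}
```

Passing the MLPerf command and the main command (each quoted as one argument) should surface pairs such as --iree-rocm-target-chip vs --iree-hip-target and --iree-flow-enable-aggressive-fusion vs --iree-dispatch-creation-enable-aggressive-fusion, which look like renames rather than behavioral changes.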
The run command is the same as above.
The following dispatches regress (MLPerf branch -> main):
attention_48_*: 41 ms -> 53 ms
attention_146_*: 48 ms -> 56 ms
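For scale, the relative slowdown is (new - old) / old x 100. A one-liner with awk, using the timings above (slowdown_pct is just an illustrative name):

```shell
#!/bin/sh
# Relative slowdown in percent, printed with one decimal place.
slowdown_pct() {
  awk -v old="$1" -v new="$2" 'BEGIN { printf "%.1f\n", (new - old) / old * 100 }'
}

slowdown_pct 41 53   # attention_48_*, prints 29.3
slowdown_pct 48 56   # attention_146_*, prints 16.7
```

So attention_48 regresses by roughly 29% and attention_146 by roughly 17%.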
Below are the IR dumps from the MLPerf branch and ToM (top of main) for the two attention dispatches:
sdxl_mlperf_attention_48.dump.mlir.txt
sdxl_mlperf_attention_146.dump.mlir.txt
sdxl_tom_attention_48.dump.mlir.txt
sdxl_tom_attention_146.dump.mlir.txt
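One way to narrow down the difference between the attached dumps is to diff only the codegen configuration lines. This assumes the dumps carry IREE's lowering_config / translation_info attributes (where tile sizes and pipeline choices usually show up); config_lines and compare_dispatch are hypothetical helpers, and the file names are the attachments listed above.

```shell
#!/bin/sh
# Lines that mention codegen configuration attributes (assumed attribute names).
config_lines() {
  grep -E 'lowering_config|translation_info' "$1"
}

# Unified diff of the config lines for one dispatch id (e.g. 48 or 146),
# MLPerf dump on the "-" side, ToM dump on the "+" side.
compare_dispatch() {
  id=$1
  old=$(mktemp) new=$(mktemp)
  config_lines "sdxl_mlperf_attention_${id}.dump.mlir.txt" > "$old"
  config_lines "sdxl_tom_attention_${id}.dump.mlir.txt" > "$new"
  diff -u "$old" "$new"
  rc=$?
  rm -f "$old" "$new"
  return $rc
}
```

Usage: for id in 48 146; do compare_dispatch "$id"; done. If the config lines are identical, the regression is more likely further down the lowering than in the tiling choices.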
Putting this here for visibility. These are the last commits we were working on for FP8:
IREE: https://github.com/iree-org/iree/commits/shared/sdxl_fp8_model
SDXL: https://github.com/nod-ai/sdxl-scripts/commits/shared/sdxl_fp8_model