SDXL punet_quant.mlir fails to compile at ConvertConvToChannelsLastPass #17643

Closed
aviator19941 opened this issue Jun 11, 2024 · 8 comments
Labels
bug 🐞 Something isn't working

Comments

@aviator19941
Contributor

aviator19941 commented Jun 11, 2024

What happened?

With --mlir-print-ir-after-all enabled, the IR dump shows that ConvertConvToChannelsLastPass failed:
// -----// IR Dump After ConvertConvToChannelsLastPass Failed (iree-preprocessing-convert-conv-to-channels-last) //----- //

After turning off the layout propagation for packs/unpacks, I got this error (full error in the attached file):

punet_quant.mlir:18483:13: error: 'arith.cmpf' op requires attribute 'predicate'
    %6229 = torch.aten.clamp %6228, %int-128_296, %int127_297 : !torch.vtensor<[2,320,128,128],f16>,
    !torch.int, !torch.int -> !torch.vtensor<[2,320,128,128],f16>

Steps to reproduce your issue

  1. Download punet_quant.mlir
  2. Compile on gfx942:
    ../iree-build-trace/tools/iree-compile punet_quant.mlir --iree-global-opt-propagate-transposes=true --iree-opt-const-eval=false --iree-opt-outer-dim-concat=true --iree-vm-target-truncate-unsupported-floats --iree-llvmgpu-enable-prefetch=true --iree-opt-data-tiling=false --iree-codegen-gpu-native-math-precision=true --iree-rocm-waves-per-eu=2 --iree-flow-inline-constants-max-byte-length=1 --iree-preprocessing-pass-pipeline="builtin.module(iree-preprocessing-transpose-convolution-pipeline, util.func(iree-preprocessing-pad-to-intrinsics))" --iree-flow-enable-aggressive-fusion --iree-global-opt-enable-fuse-horizontal-contractions=true --iree-opt-aggressively-propagate-transposes=true --iree-codegen-llvmgpu-use-vector-distribution=true --iree-hal-target-backends=rocm --iree-rocm-target-chip=gfx942 --iree-vm-bytecode-module-output-format=flatbuffer-binary -o punet.vmfb
  3. See error

What component(s) does this issue relate to?

Compiler

Version information

b44581a

Additional context

No response

aviator19941 added the bug 🐞 (Something isn't working) label Jun 11, 2024
@qedawkins
Contributor

cc @hanhanW

@hanhanW
Contributor

hanhanW commented Jun 11, 2024

It looks like there is a bug in ConvertConvToChannelsLastPass. @IanWood1 please help triage when you're available, thank you!

@aviator19941
Contributor Author

After commenting out the layout propagation in ConvertConvToChannelsLastPass, I found that the predicate attribute was missing from the distributedOp in GPUDistributionPatterns.cpp, which is what triggers the 'arith.cmpf' error above. I will open a PR for that fix, but I still need help triaging the ConvertConvToChannelsLastPass failure.

@hanhanW
Contributor

hanhanW commented Jun 12, 2024

It would be good if you could share the IR from before the pass; then @IanWood1 can start from there.

@hanhanW
Contributor

hanhanW commented Jun 12, 2024

The command would be something like iree-compile --mlir-print-ir-before=iree-preprocessing-convert-conv-to-channels-last --mlir-elide-elementsattrs-if-larger=0 --mlir-elide-resource-strings-if-larger=0 ...

@IanWood1
Contributor

IanWood1 commented Jun 12, 2024

@aviator19941 did you get an error before disabling layout propagation for packs/unpacks (other than the output from --mlir-print-ir-after-failure)?

With propagation left enabled, it appears that the DataLayoutPropagationPatterns rewrite patterns fail to converge. I let the greedy rewriter run with no iteration limit and got an error during verification after the pass: https://gist.github.com/IanWood1/269798dffcde630a06ef70b2c5fcdebd/raw/f5001f37673e57ce4d811da4cca897d6822999c8/punet-compile.log
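
To illustrate what "failing to converge" means here, below is a toy fixed-point driver in Python. It is only a sketch under stated assumptions: `apply_greedily` and `sink_pack_one_step` are hypothetical names, the "IR" is a flat list of op names, and the loop is much simpler than MLIR's actual greedy driver. It shows how a pattern that makes steady progress can still hit an iteration cap of 10 and report failure, while the same pattern reaches a fixed point once the cap is lifted.

```python
from typing import Callable, List, Optional

# A rewrite mutates the op list in place and returns True if it changed anything.
Rewrite = Callable[[List[str]], bool]


def sink_pack_one_step(ops: List[str]) -> bool:
    """Toy pattern: move the first 'pack' one position past an 'elementwise' op."""
    for i in range(len(ops) - 1):
        if ops[i] == "pack" and ops[i + 1] == "elementwise":
            ops[i], ops[i + 1] = ops[i + 1], ops[i]
            return True
    return False


def apply_greedily(
    ops: List[str], patterns: List[Rewrite], max_iterations: Optional[int] = 10
) -> bool:
    """Apply patterns until nothing changes; return False if the cap is hit first.

    max_iterations=None plays the role of "no limit" in this toy model.
    """
    iteration = 0
    while max_iterations is None or iteration < max_iterations:
        changed = False
        for pattern in patterns:
            if pattern(ops):
                changed = True
        if not changed:
            return True  # fixed point reached
        iteration += 1
    return False  # still changing when the cap was reached


# One pack op that has to travel past 20 elementwise ops.
program = ["pack"] + ["elementwise"] * 20

capped = list(program)
print(apply_greedily(capped, [sink_pack_one_step], max_iterations=10))

unlimited = list(program)
print(apply_greedily(unlimited, [sink_pack_one_step], max_iterations=None))
```

In this model the capped run stops with the pack op only halfway through the program, which is the analogue of the pass being reported as failed even though no individual rewrite was wrong.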

@aviator19941
Contributor Author

> @aviator19941 did you get an error before disabling layout propagation for packs/unpacks (other than the output from --mlir-print-ir-after-failure)?
>
> With propagation left enabled, it appears that the DataLayoutPropagationPatterns rewrite patterns fail to converge. I let the greedy rewriter run with no iteration limit and got an error during verification after the pass: https://gist.github.com/IanWood1/269798dffcde630a06ef70b2c5fcdebd/raw/f5001f37673e57ce4d811da4cca897d6822999c8/punet-compile.log

Let me check whether I can reproduce this with top-of-main (ToM) IREE, and I'll also get you the IR from before the pass.

@aviator19941
Contributor Author

aviator19941 commented Jun 12, 2024

> @aviator19941 did you get an error before disabling layout propagation for packs/unpacks (other than the output from --mlir-print-ir-after-failure)?
>
> With propagation left enabled, it appears that the DataLayoutPropagationPatterns rewrite patterns fail to converge. I let the greedy rewriter run with no iteration limit and got an error during verification after the pass: https://gist.github.com/IanWood1/269798dffcde630a06ef70b2c5fcdebd/raw/f5001f37673e57ce4d811da4cca897d6822999c8/punet-compile.log

This is the IR I got before disabling layout propagation on the IREE version specified above (b44581a): https://gist.github.com/aviator19941/f76d3e86754517578807a710ed9d1195.

IanWood1 added a commit that referenced this issue Jun 13, 2024
…Last.cpp (#17668)

- Added a `GreedyRewriteConfig` with the iteration limit set to `kNoLimit`, since the patterns were
failing to converge within the default limit of 10 iterations
- Changed the way reassociation indices are calculated for
`GeneralizeOuterUnitDimsPackOps`. (There might be a helper function
somewhere for this, but I couldn't find one.)
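
One way to picture the reassociation calculation is the Python sketch below. It is not the code from the PR: `reassociation_for_unit_dims` is a hypothetical helper, and it assumes the caller already knows which result dimensions are the newly inserted unit dims. Each inserted unit dim is folded into the group of the next original dimension, so every group maps back to exactly one source dimension.

```python
from typing import List, Sequence, Set


def reassociation_for_unit_dims(
    dest_shape: Sequence[int], inserted_unit_dims: Set[int]
) -> List[List[int]]:
    """Group result dims so each group holds exactly one original (non-inserted) dim.

    Hypothetical helper for illustration; not an IREE or MLIR API.
    """
    groups: List[List[int]] = []
    pending: List[int] = []
    for dim, size in enumerate(dest_shape):
        if dim in inserted_unit_dims and size != 1:
            raise ValueError(f"dim {dim} is marked as inserted but has size {size}")
        pending.append(dim)
        if dim not in inserted_unit_dims:
            # An original dim closes the current group.
            groups.append(pending)
            pending = []
    if pending:
        # Trailing inserted unit dims join the last group.
        if not groups:
            raise ValueError("no original dims to attach the unit dims to")
        groups[-1].extend(pending)
    return groups


# Two unit dims prepended to a 3x3x320x4 tensor.
print(reassociation_for_unit_dims([1, 1, 3, 3, 320, 4], {0, 1}))
# prints [[0, 1, 2], [3], [4], [5]]
```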

#### Before (verification error)
```mlir
%30042 = "tensor.expand_shape"(%30041) 
	<{reassociation = [[0, 1], [2, 3], [4], [5]], 
		static_output_shape = array<i64: 1, 1, 3, 3, 320, 4>}> 
	: (tensor<3x3x320x4xi8>) -> tensor<1x1x3x3x320x4xi8>
```

#### After
```mlir
%30042 = "tensor.expand_shape"(%30041) 
	<{reassociation = [[0, 1, 2], [3], [4], [5]], 
		static_output_shape = array<i64: 1, 1, 3, 3, 320, 4>}> 
	: (tensor<3x3x320x4xi8>) -> tensor<1x1x3x3x320x4xi8>
```
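
The difference between the two snippets can be checked by hand: for static shapes, each reassociation group must multiply out to the size of the source dimension it expands. The Python sketch below is a simplified stand-in for that check, not the MLIR verifier itself, and `reassociation_is_valid` is a hypothetical name. In the "before" IR the first group `[0, 1]` covers sizes 1 and 1, giving 1 instead of the source size 3; in the "after" IR the group `[0, 1, 2]` gives 1 * 1 * 3 = 3.

```python
from math import prod
from typing import List, Sequence


def reassociation_is_valid(
    src_shape: Sequence[int],
    dst_shape: Sequence[int],
    reassociation: List[List[int]],
) -> bool:
    """Simplified static-shape check for an expand_shape-style reassociation."""
    # One group per source dimension.
    if len(reassociation) != len(src_shape):
        return False
    # Groups must cover the result dims 0..n-1 in order, with no gaps or repeats.
    flat = [dim for group in reassociation for dim in group]
    if flat != list(range(len(dst_shape))):
        return False
    # Each group's result sizes must multiply to the matching source size.
    return all(
        prod(dst_shape[dim] for dim in group) == src_size
        for group, src_size in zip(reassociation, src_shape)
    )


src = [3, 3, 320, 4]
dst = [1, 1, 3, 3, 320, 4]

print(reassociation_is_valid(src, dst, [[0, 1], [2, 3], [4], [5]]))  # prints False
print(reassociation_is_valid(src, dst, [[0, 1, 2], [3], [4], [5]]))  # prints True
```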



#17643

---------

Signed-off-by: Ian Wood <ianwood2024@u.northwestern.edu>
LLITCHEV pushed a commit to LLITCHEV/iree that referenced this issue Jul 30, 2024
…Last.cpp (iree-org#17668)

Signed-off-by: Ian Wood <ianwood2024@u.northwestern.edu>
Signed-off-by: Lubo Litchev <lubol@google.com>