Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[HVX] Failure on seemingly valid schedule #7806

Closed
rootjalex opened this issue Aug 24, 2023 · 5 comments · Fixed by #7825
Closed

[HVX] Failure on seemingly valid schedule #7806

rootjalex opened this issue Aug 24, 2023 · 5 comments · Fixed by #7825

Comments

@rootjalex
Copy link
Member

I'm getting a failure related to interleaving on a simple HVX schedule:

#include "Halide.h"
#include <iostream>
#include <stdio.h>

using namespace Halide;
using namespace Halide::ConciseCasts;

void test0() {
    Var x("x");
    ImageParam in0(UInt(8), 1);
    Target hvx("hexagon-32-noos-hvx_128-hvx_v66");
    Func out("out");
    out(x) = u8_sat(u16(in0(x)) * 5);
    out.vectorize(x, 128);
    out.compile_to_assembly("hvx-test0.asm", out.infer_arguments(), hvx);
}

int main(int arch, char **argv) {
    test0();
}
ajroot@Alexanders-MacBook-Pro Halide % ./bin/correctness_hvx
Internal Error at /Users/ajroot/projects/Halide/src/CodeGen_Hexagon.cpp:1883 triggered by user code at :
Condition failed: op->type.lanes() % (native_vector_bits() * 2 / op->type.bits()) == 0:
zsh: abort      ./bin/correctness_hvx

@pranavb-ca do you know what might fix this?

@pranavb-ca
Copy link
Contributor

Hey @rootjalex - I am sorry I missed this on Friday. Is this still an issue? Does your PR #7805 fix this? Let me take a look at this on Tuesday.

@rootjalex
Copy link
Member Author

@pranavb-ca No worries! Unfortunately no, that PR does not address this issue

@pranavb-ca
Copy link
Contributor

I think I have a fix for this. This problem sounded vaguely familiar and sure enough, I had fixed this downstream recently but hadn't upstreamed my fkx. My fault entirely. Sorry about that. I'll post the fix later today (or early tomorrow if that's ok with you as I have a series of meetings today)

@rootjalex
Copy link
Member Author

Great, thank you so much! No rush!

@pranavb-ca
Copy link
Contributor

FWIW, this stmt is the probem

  for (out.s0.x.x, 0, out.extent.0/128) {
   out[ramp(out.s0.x.x*128, 1, 128) aligned(128, 0)] = (uint8x128)halide.hexagon.interleave.vb((uint8x128)saturating_cast((uint16x128)halide.hexagon.mpy.vub.ub(p0[ramp((out.s0.x.x*128) + (out.min.0 - p0.min.0), 1, 128)], (uint8)5)))
  }

halide.hexagon.interleave.vb is a native_interleave as defined by CodeGen_Hexagon. So, it works only on multiples of double vectors.

The problem is that there is no saturating downcast of u16 values to u8, that is

u8_sat(u16)

isn't supported on HVX. So, the solution is to pattern match it and then custom expand it in bitcode (hvx_128.ll)

pranavb-ca added a commit that referenced this issue Sep 18, 2023
) (#7825)

* Dump the IR more frequently in HexagonOptimize.cpp

* Fix 8bit unsigned saturating downcasts for HVX

We do not have a way of reliably lowering the following expression
to LLVM bitcode for HVX.

u8_sat(uint16x)

where uint16x is a vector (preferably a HVX double vector) with
element type uint16.
Since there is no native HVX instruction to do this, this patch
introduces two helper functions in hvx_128.ll to perform this
operation. One function interleaves its input (trunc_satub.vuh) and the
other does not (pack_satub.vuh)

This patch also removes declaration of some intrinsics not use any
longer in hvx_128.ll

* Make IR dump messages in HexagonOptimize.cpp consistent with those in CodeGen_Hexagon.cpp

* fix clang-format complaints

---------

Co-authored-by: Steven Johnson <srj@google.com>
ardier pushed a commit to ardier/Halide-mutation that referenced this issue Mar 3, 2024
…lide#7806) (halide#7825)

* Dump the IR more frequently in HexagonOptimize.cpp

* Fix 8bit unsigned saturating downcasts for HVX

We do not have a way of reliably lowering the following expression
to LLVM bitcode for HVX.

u8_sat(uint16x)

where uint16x is a vector (preferably a HVX double vector) with
element type uint16.
Since there is no native HVX instruction to do this, this patch
introduces two helper functions in hvx_128.ll to perform this
operation. One function interleaves its input (trunc_satub.vuh) and the
other does not (pack_satub.vuh)

This patch also removes declaration of some intrinsics not use any
longer in hvx_128.ll

* Make IR dump messages in HexagonOptimize.cpp consistent with those in CodeGen_Hexagon.cpp

* fix clang-format complaints

---------

Co-authored-by: Steven Johnson <srj@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants