[Codegen_LLVM] Directly load scalar that we'd load as vector and reinterpret #6809
Conversation
This only has basic (stride = +1/-1) correctness test coverage,
but no tests that would ensure that the load actually got widened.
If we did this at the Halide IR level, it would be obvious how to check,
but here I'm not really sure how to do that.
bool test_all(Target t) {
    bool success = true;

    success &= test_with_chunk_type<uint8_t>(t);
There is some kind of failure to simplify for the i8 chunk type somewhere in Halide:
let t5 = b0[(store.min.0 + store.s0.x.rebased)*2]
store[store.s0.x.rebased] = uint64((uint16)reinterpret(ramp(t5, b0[((store.min.0 + store.s0.x.rebased)*2) + 1] - t5, 2)))
but suddenly it's fine for larger types:
store$1[store$1.s0.x.rebased] = uint64((uint32)reinterpret(b3[ramp((store$1.min.0 + store$1.s0.x.rebased)*4, 1, 4) aligned(4, 0)]))
Force-pushed from 096e85c to be5cba7.
LoadInst *load = builder->CreateAlignedLoad(
    llvm_dst, ptr, llvm::Align(l->type.bytes()));
// FIXME: can we emit better TBAA for constant indexes here?
add_tbaa_metadata(load, l->name, /*index=*/Expr());
I think it's correct to use the original load index here, because the TBAA metadata is all in terms of the allocated type before the reinterpret.
Ah no, wait, of course it is, because we really don't change which bytes we load.
The implementation makes sense, aside from my concerns about using the reinterpret intrinsic to change vector lanes, but the test is not great: we don't use vector types in front-end code, and I have no idea what might break if we did. I guess it's OK as a temporary measure, though the weird failure with uint8s is alarming and might indicate that some assumption is being violated somewhere.
Right. Well, I'm not sure how else to write that test given what's currently available :)
It's some missed simplification. In the good case, it is simplified during
Where does this PR stand?
I guess,
Is this PR still active?
Yes, it's all connected (©)!
[Codegen_LLVM] Directly load scalar that we'd load as vector and reinterpret

Second (of three) pieces of the load widening puzzle.
Here, the codegen is taught to directly emit scalar loads,
instead of doing a vector load and `bitcast`ing it.

Refs. #6801
Refs. #6756
Refs. #6775