[Codegen_LLVM] Directly load scalar that we'd load as vector and reinterpret #6809

Closed
LebedevRI wants to merge 1 commit.

Conversation

LebedevRI (Contributor) opened this pull request:

[Codegen_LLVM] Directly load scalar that we'd load as vector and reinterpret

Second (of three) pieces of the load-widening puzzle.
Here, the codegen is taught to directly emit scalar loads,
instead of doing a vector load and bitcasting it.

Refs. #6801
Refs. #6756
Refs. #6775
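
Roughly, the change is to emit one scalar load of the reinterpreted type instead of the vector load plus bitcast. A minimal sketch of the shape of the change, assuming LLVM's IRBuilder API (the helper name is illustrative, not the actual patch):

    #include <llvm/IR/IRBuilder.h>

    // Sketch: instead of emitting `load <2 x i8>` followed by a bitcast
    // to i16, emit a single scalar i16 load of the same bytes.
    // `emit_widened_scalar_load` is an illustrative name, not the patch.
    llvm::Value *emit_widened_scalar_load(llvm::IRBuilder<> &builder,
                                          llvm::Type *scalar_ty,  // e.g. i16
                                          llvm::Value *ptr,
                                          unsigned align_bytes) {
        // One scalar load covering exactly the bytes the vector load
        // would have touched; no follow-up bitcast is needed.
        return builder.CreateAlignedLoad(scalar_ty, ptr,
                                         llvm::Align(align_bytes));
    }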

LebedevRI (Contributor, Author) left a comment:

This only has basic (stride = +1/-1) correctness test coverage,
but no tests that would ensure that the load actually got widened.
If we did this at the Halide IR level, it would be obvious how to check,
but here I'm not really sure how to do that.

    bool test_all(Target t) {
        bool success = true;

        success &= test_with_chunk_type<uint8_t>(t);
LebedevRI (Contributor, Author) commented on this hunk:

There is some kind of failure to simplify for the i8 chunk type somewhere in Halide:

   let t5 = b0[(store.min.0 + store.s0.x.rebased)*2]
   store[store.s0.x.rebased] = uint64((uint16)reinterpret(ramp(t5, b0[((store.min.0 + store.s0.x.rebased)*2) + 1] - t5, 2)))

but suddenly it's fine for larger types:

   store$1[store$1.s0.x.rebased] = uint64((uint32)reinterpret(b3[ramp((store$1.min.0 + store$1.s0.x.rebased)*4, 1, 4) aligned(4, 0)]))
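
(For context, the 2xi8 pattern in question can be reconstructed with Halide's internal IR constructors; this is an illustrative sketch, not the actual test code:)

    #include "Halide.h"
    using namespace Halide;
    using namespace Halide::Internal;

    // Illustrative reconstruction of the 2xi8 pattern from the dump above,
    // built with internal IR constructors; not the actual test code.
    Expr make_2xi8_reinterpret_pattern() {
        Var x("x");
        // A 2-lane, stride-1 ramp of u8 loads starting at byte 2*x...
        Expr index = Ramp::make(2 * x, 1, /*lanes=*/2);
        Expr vec = Load::make(UInt(8, 2), "b0", index, Buffer<>(),
                              Parameter(), const_true(2), ModulusRemainder());
        // ...reinterpreted as a single u16 scalar.
        return reinterpret(UInt(16), vec);
    }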

    LoadInst *load = builder->CreateAlignedLoad(
        llvm_dst, ptr, llvm::Align(l->type.bytes()));
    // FIXME: can we emit better TBAA for constant indexes here?
    add_tbaa_metadata(load, l->name, /*index=*/Expr());
Member commented on this hunk:
I think it's correct to use the original load index here, because the TBAA metadata is all in terms of the allocated type before the reinterpret.

LebedevRI (Contributor, Author) replied:

Ah no, wait, of course it is, because we really don't change which bytes we load.

abadams (Member) commented on Jun 14, 2022:

The implementation makes sense, aside from my concerns about using the reinterpret intrinsic to change vector lanes, but the test is not great: we don't use vector types in front-end code, and I have no idea what might break if we did. I guess it's OK as a temporary measure, though the weird failure with uint8s is alarming and might indicate that some assumption is being violated somewhere.

LebedevRI (Contributor, Author) replied:

> The implementation makes sense, aside from my concerns about using the reinterpret intrinsic to change vector lanes, but the test is not great: we don't use vector types in front-end code, and I have no idea what might break if we did.

Right. Well, I'm not sure how else to write that test given what's currently available :)

> the weird failure with uint8s is alarming and might indicate that some assumption is being violated somewhere.

It's a missed simplification. In the good case it is simplified during lowering, after the second simplification pass, but clearly not in the 2xi8 case...
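
For reference, a fully simplified 2xi8 case would presumably mirror the u32 form shown earlier, i.e. something like (the alignment annotation is assumed by analogy):

    store[store.s0.x.rebased] = uint64((uint16)reinterpret(b0[ramp((store.min.0 + store.s0.x.rebased)*2, 1, 2) aligned(2, 0)]))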

steven-johnson (Contributor) asked:

Where does this PR stand?

LebedevRI (Contributor, Author) replied:

I guess the reinterpret intrinsic first needs to be promoted into a first-class IR node.
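
For illustration, such a node could follow the shape of Halide's existing unary IR nodes like Cast; a sketch of the idea, not a committed design:

    // Hypothetical first-class IR node for reinterpret, modeled on the
    // shape of Halide's Cast node; the result type lives in the ExprNode
    // base, as for other Exprs. A sketch, not the actual design.
    struct Reinterpret : public ExprNode<Reinterpret> {
        Expr value;  // operand whose bits are reinterpreted as `type`

        static Expr make(Type t, Expr v);

        static const IRNodeType _node_type = IRNodeType::Reinterpret;
    };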

LebedevRI marked this pull request as a draft on July 8, 2022.
steven-johnson (Contributor) asked:

Is this PR still active?

LebedevRI (Contributor, Author) commented on Jul 26, 2022:

> Is this PR still active?

Yes, it's all connected (©)!
The rough queue is:

LebedevRI closed this on Aug 8, 2022.