[SYCL] Fix address space in casts from derived to base class #512

Fznamznon · 2019-08-14T18:57:26Z

In the case of multiple inheritance from non-empty classes, clang emits two
bitcasts and a GEP to convert a pointer to a derived class into a pointer
to a base class. The first bitcast converts the pointer to the derived
class to an int8 pointer, the GEP then applies the base class offset to
this int8 pointer, and the second bitcast converts the offset int8 pointer
to a base class pointer. Under the new SYCL address space rules, a pointer
to a derived class may be in the generic address space. In this case two
address space casts should be emitted instead of the bitcasts.
This problem caused an assertion failure inside CodeGen and the generation
of an invalid module.

Signed-off-by: Mariya Podchishchaeva <mariya.podchishchaeva@intel.com>
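To make the cast/GEP/cast sequence in the description concrete, here is a minimal source-level sketch of the same three steps. The type and function names (Base1, Base2, Derived, base2Offset, manualUpcast) are hypothetical and exist only for illustration; this is not the code clang runs, just a hand-written analogue of the pattern it emits.

```cpp
#include <cassert>
#include <cstddef>

// Two non-empty bases, so the second base sits at a non-trivial
// position inside Derived on common ABIs.
struct Base1 { int x; };
struct Base2 { int y; };
struct Derived : Base1, Base2 { int z; };

// Byte distance from the start of a Derived object to its Base2 subobject.
std::ptrdiff_t base2Offset(Derived *d) {
  Base2 *b = d; // ordinary derived-to-base conversion done by the compiler
  return reinterpret_cast<char *>(b) - reinterpret_cast<char *>(d);
}

// Hand-written analogue of the emitted sequence:
// cast to a byte pointer, add the base offset, cast to the base type.
Base2 *manualUpcast(Derived *d) {
  char *bytes = reinterpret_cast<char *>(d); // step 1: Derived* -> byte pointer
  bytes += base2Offset(d);                   // step 2: apply base offset (the GEP)
  return reinterpret_cast<Base2 *>(bytes);   // step 3: byte pointer -> Base2*
}
```

In this sketch every pointer lives in a single address space, which is why plain casts suffice. The patch concerns the case where the Derived pointer carries a non-default address space, so the intermediate byte pointer type has to agree with it.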

asavonic · 2019-08-14T19:13:43Z

clang/lib/CodeGen/CGClass.cpp

Shouldn't you cast the pointer to an i8* pointer in the same address space? Casting from the generic address space (or, in fact, from any address space other than private) to private is illegal unless you know the actual address space of the pointer.

Fznamznon · Aug 15, 2019

Could you please provide a link to the SPIR-V/OpenCL specification that says a cast from generic to private is illegal?

asavonic · Aug 15, 2019

There is no spec that defines OpenCL/SYCL address spaces in LLVM IR (unless you consider the old SPIR spec).

Generally, if you have a generic pointer and you want to cast it to global/local/private, you should do this if and only if you know for sure that this generic pointer is in fact global, local, or private, and that it can never be in any other address space. In other words, you have to preserve the original address space of the pointer or leave it generic. Otherwise you are basically lying to the compiler, and this could result in miscompilation, or an address space inference pass could misbehave when it discovers the original address space and finds a mismatch between the source and destination address spaces of an addrspacecast instruction.

Fznamznon · Aug 15, 2019

So it seems that it is not actually illegal, but it could probably lead to an address space inference failure.
I was a little confused, because OpenCL allows casts from the generic address space to named address spaces, and I think it is easy to get an address space cast from the generic address space to a named address space in IR generated from OpenCL.
Okay, I will change this patch to emit an int8* pointer in the same address space, for safety reasons.
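The rule agreed on above (the intermediate byte pointer keeps the address space of the source pointer) can be sketched with a toy model. Everything here is an assumption made for illustration: AddrSpace, PtrType and the three helper functions are hypothetical names and do not correspond to the LLVM or clang API.

```cpp
#include <cassert>
#include <string>

// Hypothetical address space tags; real targets use numeric IDs.
enum class AddrSpace { Private, Global, Local, Generic };

// Toy stand-in for an IR pointer type: pointee name plus address space.
struct PtrType {
  std::string pointee;
  AddrSpace space;
};

// The byte pointer used for the offset computation inherits the
// address space of the pointer it is derived from.
PtrType bytePtrLike(const PtrType &src) { return {"i8", src.space}; }

// A cast that changes only the pointee type can stay a bitcast;
// a cast that changes the address space would need an addrspacecast.
bool needsAddrSpaceCast(const PtrType &from, const PtrType &to) {
  return from.space != to.space;
}

// Narrowing generic to a named space is only sound when the real
// address space of the pointer is known.
bool narrowsGeneric(const PtrType &from, const PtrType &to) {
  return from.space == AddrSpace::Generic && to.space != AddrSpace::Generic;
}
```

With this model, a generic pointer to the derived class yields a generic byte pointer, so no address space change happens at any step of the derived-to-base conversion, and the narrowing case the reviewer warns about never arises.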

In the case of multiple inheritance from non-empty classes, clang emits two bitcasts and a GEP to convert a pointer to a derived class into a pointer to a base class. The first bitcast converts the pointer to the derived class to an int8 pointer, the GEP then applies the base class offset to this int8 pointer, and the second bitcast converts the offset int8 pointer to a base class pointer. Under the new SYCL address space rules, a pointer to a derived class may be in the generic address space. In this case the two bitcasts should go through an int8 pointer in the generic address space. This problem caused an assertion failure inside CodeGen and the generation of an invalid module.

Signed-off-by: Mariya Podchishchaeva <mariya.podchishchaeva@intel.com>

Fznamznon · 2019-08-16T13:34:28Z

@asavonic, are all your concerns addressed?

asavonic

LGTM. Please fix the name in CGClass.cpp

asavonic · 2019-08-16T13:39:26Z

clang/lib/CodeGen/CGClass.cpp

  // Apply the base offset.
  llvm::Value *ptr = addr.getPointer();
-  ptr = CGF.Builder.CreateBitCast(ptr, CGF.Int8PtrTy);
+  llvm::Type *ResTy = llvm::PointerType::getInt8PtrTy(


The name ResTy is not consistent with the other names in this function.

Summary:
The greedy register allocator occasionally decides to insert a large number of unnecessary copies, see below for an example. The -consider-local-interval-cost option (which X86 already enables by default) fixes this. We enable this option for AArch64 only after receiving feedback that this change is not beneficial for PowerPC.

We evaluated the impact of this change on compile time, code size and performance benchmarks.

This option has a small impact on compile time, measured on CTMark. A 0.1% geomean regression on -O1 and -O2, and 0.2% geomean for -O3, with at most 0.5% on individual benchmarks.

The effect on both code size and performance on AArch64 for the LLVM test suite is nil on the geomean with individual outliers (ignoring short exec_times) between:

                  best     worst
    size..text    -3.3%    +0.0%
    exec_time     -5.8%    +2.3%

On SPEC CPU® 2017 (compiled for AArch64) there is a minor reduction (-0.2% at most) in code size on some benchmarks, with a tiny movement (-0.01%) on the geomean. Neither intrate nor fprate show any change in performance.

This patch makes the following changes.

- For the AArch64 target, enableAdvancedRASplitCost() now returns true.
- Ensures that -consider-local-interval-cost=false can disable the new behaviour if necessary.

This matrix multiply example:

    $ cat test.c
    long A[8][8];
    long B[8][8];
    long C[8][8];

    void run_test() {
      for (int k = 0; k < 8; k++) {
        for (int i = 0; i < 8; i++) {
          for (int j = 0; j < 8; j++) {
            C[i][j] += A[i][k] * B[k][j];
          }
        }
      }
    }

results in the following generated code on AArch64:

    $ clang --target=aarch64-arm-none-eabi -O3 -S test.c -o -
    [...]
                                // %for.cond1.preheader
                                // =>This Inner Loop Header: Depth=1
        add     x14, x11, x9
        str     q0, [sp, #16]   // 16-byte Folded Spill
        ldr     q0, [x14]
        mov     v2.16b, v15.16b
        mov     v15.16b, v14.16b
        mov     v14.16b, v13.16b
        mov     v13.16b, v12.16b
        mov     v12.16b, v11.16b
        mov     v11.16b, v10.16b
        mov     v10.16b, v9.16b
        mov     v9.16b, v8.16b
        mov     v8.16b, v31.16b
        mov     v31.16b, v30.16b
        mov     v30.16b, v29.16b
        mov     v29.16b, v28.16b
        mov     v28.16b, v27.16b
        mov     v27.16b, v26.16b
        mov     v26.16b, v25.16b
        mov     v25.16b, v24.16b
        mov     v24.16b, v23.16b
        mov     v23.16b, v22.16b
        mov     v22.16b, v21.16b
        mov     v21.16b, v20.16b
        mov     v20.16b, v19.16b
        mov     v19.16b, v18.16b
        mov     v18.16b, v17.16b
        mov     v17.16b, v16.16b
        mov     v16.16b, v7.16b
        mov     v7.16b, v6.16b
        mov     v6.16b, v5.16b
        mov     v5.16b, v4.16b
        mov     v4.16b, v3.16b
        mov     v3.16b, v1.16b
        mov     x12, v0.d[1]
        fmov    x15, d0
        ldp     q1, q0, [x14, #16]
        ldur    x1, [x10, #-256]
        ldur    x2, [x10, #-192]
        add     x9, x9, #64     // =64
        mov     x13, v1.d[1]
        fmov    x16, d1
        ldr     q1, [x14, #48]
        mul     x3, x15, x1
        mov     x14, v0.d[1]
        fmov    x17, d0
        mov     x18, v1.d[1]
        fmov    x0, d1
        mov     v1.16b, v3.16b
        mov     v3.16b, v4.16b
        mov     v4.16b, v5.16b
        mov     v5.16b, v6.16b
        mov     v6.16b, v7.16b
        mov     v7.16b, v16.16b
        mov     v16.16b, v17.16b
        mov     v17.16b, v18.16b
        mov     v18.16b, v19.16b
        mov     v19.16b, v20.16b
        mov     v20.16b, v21.16b
        mov     v21.16b, v22.16b
        mov     v22.16b, v23.16b
        mov     v23.16b, v24.16b
        mov     v24.16b, v25.16b
        mov     v25.16b, v26.16b
        mov     v26.16b, v27.16b
        mov     v27.16b, v28.16b
        mov     v28.16b, v29.16b
        mov     v29.16b, v30.16b
        mov     v30.16b, v31.16b
        mov     v31.16b, v8.16b
        mov     v8.16b, v9.16b
        mov     v9.16b, v10.16b
        mov     v10.16b, v11.16b
        mov     v11.16b, v12.16b
        mov     v12.16b, v13.16b
        mov     v13.16b, v14.16b
        mov     v14.16b, v15.16b
        mov     v15.16b, v2.16b
        ldr     q2, [sp]        // 16-byte Folded Reload
        fmov    d0, x3
        mul     x3, x12, x1
    [...]

With -consider-local-interval-cost the same section of code results in the following:

    $ clang --target=aarch64-arm-none-eabi -mllvm -consider-local-interval-cost -O3 -S test.c -o -
    [...]
    .LBB0_1:                    // %for.cond1.preheader
                                // =>This Inner Loop Header: Depth=1
        add     x14, x11, x9
        ldp     q0, q1, [x14]
        ldur    x1, [x10, #-256]
        ldur    x2, [x10, #-192]
        add     x9, x9, #64     // =64
        mov     x12, v0.d[1]
        fmov    x15, d0
        mov     x13, v1.d[1]
        fmov    x16, d1
        ldp     q0, q1, [x14, #32]
        mul     x3, x15, x1
        cmp     x9, #512        // =512
        mov     x14, v0.d[1]
        fmov    x17, d0
        fmov    d0, x3
        mul     x3, x12, x1
    [...]

Reviewers: SjoerdMeijer, samparker, dmgreen, qcolombet
Reviewed By: dmgreen
Subscribers: ZhangKang, jsji, wuzish, ppc-slack, lkail, steven.zhang, MatzeB, qcolombet, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69437

Fznamznon requested a review from asavonic August 14, 2019 18:57

Fznamznon assigned asavonic Aug 14, 2019

asavonic reviewed Aug 14, 2019

Fznamznon force-pushed the private/mpodchis/fixasforderived branch from aefb169 to 0f7903f Compare August 15, 2019 16:44

Apply reviewer's comment

afb9089

Signed-off-by: Mariya Podchishchaeva <mariya.podchishchaeva@intel.com>

asavonic previously approved these changes Aug 16, 2019

Fznamznon dismissed asavonic’s stale review via afb9089 August 16, 2019 14:00

asavonic approved these changes Aug 16, 2019

bader merged commit 76e223c into intel:sycl Aug 16, 2019
