[SYCL] [FPGA] Fix num_simd_work_items and reqd_work_group_size argument check #3728

Merged 21 commits on May 16, 2021

136 changes: 129 additions & 7 deletions clang/include/clang/Basic/AttrDocs.td
@@ -2503,26 +2503,77 @@ device kernel, the attribute is not ignored and it is propagated to the kernel.
[[intel::num_simd_work_items(N)]] void operator()() const {}
};

If the ``intel::reqd_work_group_size`` or ``cl::reqd_work_group_size``
attribute is specified on a declaration along with an
``intel::num_simd_work_items`` attribute, the work group size attribute
argument (the first argument) must be evenly divisible by the argument specified
in the ``intel::num_simd_work_items`` attribute.
If the ``reqd_work_group_size`` attribute is specified on a declaration along
with ``num_simd_work_items``, the value specified by the
``num_simd_work_items`` attribute must evenly divide the
``reqd_work_group_size`` argument for the index that increments fastest.

The arguments to ``reqd_work_group_size`` are ordered based on which index
increments the fastest. In OpenCL, the first argument is the index that
increments the fastest, and in SYCL, the last argument is the index that
increments the fastest.

In OpenCL, all three arguments are required.

In SYCL, the attribute accepts either one, two, or three arguments; in each
form, the last (or only) argument is the index that increments fastest.
The number of arguments passed to the attribute must match the dimensionality
of the kernel the attribute is applied to.

.. code-block:: c++

// Note, '64' is evenly divisible by '4'; in SYCL, the last
// argument to the attribute is the one which increments fastest.
struct func {
[[intel::num_simd_work_items(4)]]
[[intel::reqd_work_group_size(64, 64, 64)]]
[[intel::reqd_work_group_size(7, 4, 64)]]
void operator()() const {}
};

// Note, '8' is evenly divisible by '8'; in SYCL, the last
// argument to the attribute is the one which increments fastest.
struct bar {
[[intel::reqd_work_group_size(64, 64, 64)]]
[[intel::reqd_work_group_size(1, 1, 8)]]
[[intel::num_simd_work_items(8)]]
void operator()() const {}
};

// Note, '10' is evenly divisible by '5'; in SYCL, the last
// argument to the attribute is the one which increments fastest.
[[cl::reqd_work_group_size(7, 5, 10)]]
[[intel::num_simd_work_items(5)]] void fun2() {}

// Note, '8' is evenly divisible by '4'; in SYCL, the last
// argument to the attribute is the one which increments fastest.
[[intel::num_simd_work_items(4)]]
[[cl::reqd_work_group_size(5, 4, 8)]] void fun3() {}

// Note, '8' is evenly divisible by '8'; in SYCL, the last
// argument to the attribute is the one which increments fastest.
struct func1 {
[[intel::num_simd_work_items(8)]]
[[cl::reqd_work_group_size(1, 1, 8)]]
void operator()() const {}
};

// Note, '8' is evenly divisible by '4'; in SYCL, the last
// argument to the attribute is the one which increments fastest.
struct bar1 {
[[cl::reqd_work_group_size(7, 4, 8)]]
[[intel::num_simd_work_items(4)]]
void operator()() const {}
};

// Note, '4' is evenly divisible by '2'; in SYCL, the last
// argument to the attribute is the one which increments fastest.
[[intel::num_simd_work_items(2)]]
__attribute__((reqd_work_group_size(3, 2, 4))) void test();

// Note, '8' is evenly divisible by '2'; in SYCL, the last
// argument to the attribute is the one which increments fastest.
__attribute__((reqd_work_group_size(3, 2, 8)))
[[intel::num_simd_work_items(2)]] void test();

}];
}

@@ -2636,6 +2687,77 @@ In OpenCL C, this attribute is available in GNU spelling

__kernel __attribute__((reqd_work_group_size(8, 16, 32))) void test() {}

The arguments to ``reqd_work_group_size`` are ordered based on which index
increments the fastest. In OpenCL, the first argument is the index that
increments the fastest, and in SYCL, the last argument is the index that
increments the fastest.

In OpenCL, all three arguments are required.

In SYCL, the attribute accepts either one, two, or three arguments; in each
form, the last (or only) argument is the index that increments fastest. The
number of arguments passed to the attribute must match the dimensionality of
the kernel the attribute is applied to.

If the ``reqd_work_group_size`` attribute is specified on a declaration along
with ``num_simd_work_items``, the value specified by the
``num_simd_work_items`` attribute must evenly divide the
``reqd_work_group_size`` argument for the index that increments fastest.

.. code-block:: c++

// Note, '64' is evenly divisible by '4'; in SYCL, the last
// argument to the attribute is the one which increments fastest.
struct func {
[[intel::num_simd_work_items(4)]]
[[intel::reqd_work_group_size(7, 4, 64)]]
void operator()() const {}
};

// Note, '8' is evenly divisible by '8'; in SYCL, the last
// argument to the attribute is the one which increments fastest.
struct bar {
[[intel::reqd_work_group_size(1, 1, 8)]]
[[intel::num_simd_work_items(8)]]
void operator()() const {}
};

// Note, '10' is evenly divisible by '5'; in SYCL, the last
// argument to the attribute is the one which increments fastest.
[[cl::reqd_work_group_size(7, 5, 10)]]
[[intel::num_simd_work_items(5)]] void fun2() {}

// Note, '8' is evenly divisible by '4'; in SYCL, the last
// argument to the attribute is the one which increments fastest.
[[intel::num_simd_work_items(4)]]
[[cl::reqd_work_group_size(5, 4, 8)]] void fun3() {}

// Note, '8' is evenly divisible by '8'; in SYCL, the last
// argument to the attribute is the one which increments fastest.
struct func1 {
[[intel::num_simd_work_items(8)]]
[[cl::reqd_work_group_size(1, 1, 8)]]
void operator()() const {}
};

// Note, '8' is evenly divisible by '4'; in SYCL, the last
// argument to the attribute is the one which increments fastest.
struct bar1 {
[[cl::reqd_work_group_size(7, 4, 8)]]
[[intel::num_simd_work_items(4)]]
void operator()() const {}
};

// Note, '4' is evenly divisible by '2'; in SYCL, the last
// argument to the attribute is the one which increments fastest.
[[intel::num_simd_work_items(2)]]
__attribute__((reqd_work_group_size(3, 2, 4))) void test();

// Note, '8' is evenly divisible by '2'; in SYCL, the last
// argument to the attribute is the one which increments fastest.
__attribute__((reqd_work_group_size(3, 2, 8)))
[[intel::num_simd_work_items(2)]] void test();

}];
}

49 changes: 39 additions & 10 deletions clang/lib/Sema/SemaDeclAttr.cpp
@@ -3086,17 +3086,25 @@ static void handleWorkGroupSize(Sema &S, Decl *D, const ParsedAttr &AL) {
return;
ZDimExpr = ZDim.get();

// If the num_simd_work_items attribute is specified on a declaration it
// must evenly divide the index that increments fastest in the
// reqd_work_group_size attribute. In OpenCL, the first argument increments
// the fastest, and in SYCL, the last argument increments the fastest.
if (const auto *A = D->getAttr<SYCLIntelNumSimdWorkItemsAttr>()) {
int64_t NumSimdWorkItems =
A->getValue()->getIntegerConstantExpr(Ctx)->getSExtValue();

if (XDimVal.getZExtValue() % NumSimdWorkItems != 0) {
unsigned WorkGroupSize = S.getLangOpts().OpenCL ? XDimVal.getZExtValue()
: ZDimVal.getZExtValue();

if (WorkGroupSize % NumSimdWorkItems != 0) {
S.Diag(A->getLocation(), diag::err_sycl_num_kernel_wrong_reqd_wg_size)
<< A << AL;
S.Diag(AL.getLoc(), diag::note_conflicting_attribute);
return;
}
}

if (const auto *ExistingAttr = D->getAttr<WorkGroupAttr>()) {
// Compare attribute arguments value and warn for a mismatch.
if (ExistingAttr->getXDimVal(Ctx) != XDimVal ||
@@ -3280,17 +3288,38 @@ void Sema::AddSYCLIntelNumSimdWorkItemsAttr(Decl *D,
}
}

// If the declaration has an [[intel::reqd_work_group_size]] attribute,
// check to see if the first argument can be evenly divided by the
// num_simd_work_items attribute.
// If the reqd_work_group_size attribute is specified on a declaration
// along with num_simd_work_items, the value specified by the
// num_simd_work_items attribute must evenly divide the
// reqd_work_group_size argument for the index that increments fastest.
//
// The arguments to reqd_work_group_size are ordered based on which index
// increments the fastest. In OpenCL, the first argument is the index that
// increments the fastest, and in SYCL, the last argument is the index that
// increments the fastest.
if (const auto *DeclAttr = D->getAttr<ReqdWorkGroupSizeAttr>()) {
Optional<llvm::APSInt> XDimVal = DeclAttr->getXDimVal(Context);
Expr *XDimExpr = DeclAttr->getXDim();
Expr *YDimExpr = DeclAttr->getYDim();
Expr *ZDimExpr = DeclAttr->getZDim();

if (*XDimVal % ArgVal != 0) {
Diag(CI.getLoc(), diag::err_sycl_num_kernel_wrong_reqd_wg_size)
<< CI << DeclAttr;
Diag(DeclAttr->getLocation(), diag::note_conflicting_attribute);
return;
if (!XDimExpr->isValueDependent() && !YDimExpr->isValueDependent() &&
!ZDimExpr->isValueDependent()) {
llvm::APSInt XDimVal, ZDimVal;
ExprResult XDim = VerifyIntegerConstantExpression(XDimExpr, &XDimVal);
ExprResult ZDim = VerifyIntegerConstantExpression(ZDimExpr, &ZDimVal);

if (XDim.isInvalid() || ZDim.isInvalid())
return;

unsigned WorkGroupSize = getLangOpts().OpenCL ? XDimVal.getZExtValue()
: ZDimVal.getZExtValue();

if (WorkGroupSize % ArgVal.getSExtValue() != 0) {
Diag(CI.getLoc(), diag::err_sycl_num_kernel_wrong_reqd_wg_size)
<< CI << DeclAttr;
Diag(DeclAttr->getLocation(), diag::note_conflicting_attribute);
return;
}
}
}
}
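
For readers skimming the diff, the rule both hunks enforce can be modeled outside of Clang. The following is only an illustrative sketch, not code from this PR: `Lang`, `fastestDim`, and `simdDividesWorkGroup` are hypothetical names, and the sketch assumes all three `reqd_work_group_size` arguments are already known integer constants.

```cpp
#include <array>
#include <cstdint>

// Hypothetical stand-in for the language mode; this is not a Clang type.
enum class Lang { OpenCL, SYCL };

// Pick the reqd_work_group_size argument for the index that increments
// fastest: the first argument in OpenCL, the last argument in SYCL.
constexpr std::uint64_t fastestDim(const std::array<std::uint64_t, 3> &dims,
                                   Lang lang) {
  return lang == Lang::OpenCL ? dims[0] : dims[2];
}

// True when the num_simd_work_items value evenly divides the fastest
// dimension. A zero SIMD value is rejected here to avoid dividing by zero
// (an assumption of this sketch).
constexpr bool simdDividesWorkGroup(const std::array<std::uint64_t, 3> &dims,
                                    std::uint64_t numSimd, Lang lang) {
  return numSimd != 0 && fastestDim(dims, lang) % numSimd == 0;
}

// Mirrors the documented example reqd_work_group_size(7, 4, 64) with
// num_simd_work_items(4) in SYCL: 64 % 4 == 0.
static_assert(simdDividesWorkGroup({7, 4, 64}, 4, Lang::SYCL),
              "last argument is checked in SYCL");
// Same arguments reversed: SYCL checks 7, which 4 does not divide,
// while OpenCL checks the first argument, 64.
static_assert(!simdDividesWorkGroup({64, 4, 7}, 4, Lang::SYCL),
              "7 is not divisible by 4");
static_assert(simdDividesWorkGroup({64, 4, 7}, 4, Lang::OpenCL),
              "first argument is checked in OpenCL");
```

The same argument list can therefore pass in one language mode and fail in the other, which is why the updated check selects the dimension based on the language option before testing divisibility.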