diff --git a/clang-tools-extra/docs/clang-tidy/Contributing.rst b/clang-tools-extra/docs/clang-tidy/Contributing.rst index 92074bd4dae8b..b04809c3308f1 100644 --- a/clang-tools-extra/docs/clang-tidy/Contributing.rst +++ b/clang-tools-extra/docs/clang-tidy/Contributing.rst @@ -127,14 +127,15 @@ Writing a clang-tidy Check So you have an idea of a useful check for :program:`clang-tidy`. -First, if you're not familiar with LLVM development, read through the `Getting -Started with LLVM`_ document for instructions on setting up your workflow and +First, if you're not familiar with LLVM development, read through the `Getting Started +with the LLVM System`_ document for instructions on setting up your workflow and the `LLVM Coding Standards`_ document to familiarize yourself with the coding -style used in the project. For code reviews we mostly use `LLVM Phabricator`_. +style used in the project. For code reviews we currently use `LLVM GitHub`_, +though historically we used Phabricator. -.. _Getting Started with LLVM: https://llvm.org/docs/GettingStarted.html +.. _Getting Started with the LLVM System: https://llvm.org/docs/GettingStarted.html .. _LLVM Coding Standards: https://llvm.org/docs/CodingStandards.html -.. _LLVM Phabricator: https://llvm.org/docs/Phabricator.html +.. _LLVM GitHub: https://github.com/llvm/llvm-project Next, you need to decide which module the check belongs to. Modules are located in subdirectories of `clang-tidy/ @@ -336,13 +337,24 @@ a starting point for your test cases. A rough outline of the process looks like The quickest way to prototype your matcher is to use :program:`clang-query` to interactively build up your matcher. For complicated matchers, build up a matching expression incrementally and use :program:`clang-query`'s ``let`` command to save named -matching expressions to simplify your matcher. 
Just like breaking up a huge function -into smaller chunks with intention-revealing names can help you understand a complex -algorithm, breaking up a matcher into smaller matchers with intention-revealing names -can help you understand a complicated matcher. Once you have a working matcher, the -C++ API will be virtually identical to your interactively constructed matcher. You can -use local variables to preserve your intention-revealing names that you applied to -nested matchers. +matching expressions to simplify your matcher. + +.. code-block:: console + + clang-query> let c1 cxxRecordDecl() + clang-query> match c1 + +Alternatively, pressing the tab key after a matcher's opening parenthesis will also +show which matchers can be chained with it, though some matchers that work +may not be listed. + +Just like breaking up a huge function into smaller chunks with intention-revealing names +can help you understand a complex algorithm, breaking up a matcher into smaller matchers +with intention-revealing names can help you understand a complicated matcher. + +Once you have a working :program:`clang-query` matcher, the C++ API matcher will be the same +or similar to your interactively constructed matcher (in some cases they may differ slightly). +You can use local variables to preserve the intention-revealing names that you applied +to nested matchers. Creating private matchers ^^^^^^^^^^^^^^^^^^^^^^^^^ @@ -646,10 +658,13 @@ directory. The path to this directory is available in a lit test with the varia Out-of-tree check plugins ------------------------- + Developing an out-of-tree check as a plugin largely follows the steps -outlined above. The plugin is a shared library whose code lives outside +outlined above, including creating a new module and the steps needed to +register the module. The plugin is a shared library whose code lives outside the clang-tidy build system. 
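+For example, a minimal CMake sketch of such a plugin library might look as
+follows (the target, source file, and library names here are illustrative
+assumptions, not prescribed by clang-tidy):
+
+.. code-block:: cmake
+
+  find_package(Clang REQUIRED CONFIG)
+  # MODULE builds a loadable shared library suitable for clang-tidy's -load.
+  add_library(MyTidyChecks MODULE MyTidyModule.cpp)
+  target_link_libraries(MyTidyChecks PRIVATE clangTidy clangTidyUtils)
+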
Build and link this shared library against -LLVM as done for other kinds of Clang plugins. +LLVM as done for other kinds of Clang plugins. If using CMake, use the keyword +``MODULE`` when invoking ``add_library`` or ``llvm_add_library``. The plugin can be loaded by passing `-load` to `clang-tidy` in addition to the names of the checks to enable. @@ -664,6 +679,19 @@ compiled against the version of clang-tidy that will be loading the plugin. The plugins can use threads, TLS, or any other facilities available to in-tree code which is accessible from the external headers. +Note that testing out-of-tree checks might involve getting ``llvm-lit`` from an LLVM +installation compiled from source. See `Getting Started with the LLVM System`_ for ways +to do so. + +Alternatively, get `lit`_ by following the `test-suite guide`_, get the `FileCheck`_ binary, +and adapt `check_clang_tidy.py`_ to suit your needs. + +.. _Getting Started with the LLVM System: https://llvm.org/docs/GettingStarted.html +.. _test-suite guide: https://llvm.org/docs/TestSuiteGuide.html +.. _lit: https://llvm.org/docs/CommandGuide/lit.html +.. _FileCheck: https://llvm.org/docs/CommandGuide/FileCheck.html +.. _check_clang_tidy.py: https://github.com/llvm/llvm-project/blob/main/clang-tools-extra/test/clang-tidy/check_clang_tidy.py + Running clang-tidy on LLVM -------------------------- @@ -688,10 +716,10 @@ warnings and errors. The script provides multiple configuration flags. * To restrict the files examined you can provide one or more regex arguments that the file names are matched against. - ``run-clang-tidy.py clang-tidy/.*Check\.cpp`` will only analyze clang-tidy + ``run-clang-tidy.py clang-tidy/.*Check\.cpp`` will only analyze ``clang-tidy`` checks. It may also be necessary to restrict the header files that warnings - are displayed from using the ``-header-filter`` flag. It has the same behavior - as the corresponding :program:`clang-tidy` flag. 
+ are displayed from by using the ``-header-filter`` and ``-exclude-header-filter`` flags. + They have the same behavior as the corresponding :program:`clang-tidy` flags. * To apply suggested fixes ``-fix`` can be passed as an argument. This gathers all changes in a temporary directory and applies them. Passing ``-format`` @@ -758,4 +786,4 @@ There is only one argument that controls profile storage: * If you run :program:`clang-tidy` from within ``/foo`` directory, and specify ``-store-check-profile=.``, then the profile will still be saved to - ``/foo/-example.cpp.json`` + ``/foo/-example.cpp.json`` \ No newline at end of file diff --git a/clang/test/Headers/__clang_hip_math.hip b/clang/test/Headers/__clang_hip_math.hip index 6ee10976f1207..9d202e0d04682 100644 --- a/clang/test/Headers/__clang_hip_math.hip +++ b/clang/test/Headers/__clang_hip_math.hip @@ -2361,198 +2361,395 @@ extern "C" __device__ double test_modf(double x, double* y) { return modf(x, y); } -// CHECK-LABEL: @test_nanf( -// CHECK-NEXT: entry: -// CHECK-NEXT: [[TMP0:%.*]] = load i8, ptr [[TAG:%.*]], align 1, !tbaa [[TBAA4]] -// CHECK-NEXT: [[CMP_I_I:%.*]] = icmp eq i8 [[TMP0]], 48 -// CHECK-NEXT: br i1 [[CMP_I_I]], label [[IF_THEN_I_I:%.*]], label [[WHILE_COND_I14_I_I:%.*]] -// CHECK: if.then.i.i: -// CHECK-NEXT: [[INCDEC_PTR_I_I:%.*]] = getelementptr inbounds i8, ptr [[TAG]], i64 1 -// CHECK-NEXT: [[TMP1:%.*]] = load i8, ptr [[INCDEC_PTR_I_I]], align 1, !tbaa [[TBAA4]] -// CHECK-NEXT: switch i8 [[TMP1]], label [[WHILE_COND_I_I_I:%.*]] [ -// CHECK-NEXT: i8 120, label [[WHILE_COND_I30_I_I_PREHEADER:%.*]] -// CHECK-NEXT: i8 88, label [[WHILE_COND_I30_I_I_PREHEADER]] -// CHECK-NEXT: ] -// CHECK: while.cond.i30.i.i.preheader: -// CHECK-NEXT: br label [[WHILE_COND_I30_I_I:%.*]] -// CHECK: while.cond.i30.i.i: -// CHECK-NEXT: [[__TAGP_ADDR_0_I31_I_I:%.*]] = phi ptr [ [[__TAGP_ADDR_1_I37_I_I:%.*]], [[CLEANUP_I36_I_I:%.*]] ], [ [[INCDEC_PTR_I_I]], [[WHILE_COND_I30_I_I_PREHEADER]] ] -// CHECK-NEXT: 
[[__R_0_I32_I_I:%.*]] = phi i64 [ [[__R_2_I_I_I:%.*]], [[CLEANUP_I36_I_I]] ], [ 0, [[WHILE_COND_I30_I_I_PREHEADER]] ] -// CHECK-NEXT: [[TMP2:%.*]] = load i8, ptr [[__TAGP_ADDR_0_I31_I_I]], align 1, !tbaa [[TBAA4]] -// CHECK-NEXT: [[CMP_NOT_I33_I_I:%.*]] = icmp eq i8 [[TMP2]], 0 -// CHECK-NEXT: br i1 [[CMP_NOT_I33_I_I]], label [[_ZL4NANFPKC_EXIT:%.*]], label [[WHILE_BODY_I34_I_I:%.*]] -// CHECK: while.body.i34.i.i: -// CHECK-NEXT: [[TMP3:%.*]] = add i8 [[TMP2]], -48 -// CHECK-NEXT: [[OR_COND_I35_I_I:%.*]] = icmp ult i8 [[TMP3]], 10 -// CHECK-NEXT: br i1 [[OR_COND_I35_I_I]], label [[IF_END31_I_I_I:%.*]], label [[IF_ELSE_I_I_I:%.*]] -// CHECK: if.else.i.i.i: -// CHECK-NEXT: [[TMP4:%.*]] = add i8 [[TMP2]], -97 -// CHECK-NEXT: [[OR_COND33_I_I_I:%.*]] = icmp ult i8 [[TMP4]], 6 -// CHECK-NEXT: br i1 [[OR_COND33_I_I_I]], label [[IF_END31_I_I_I]], label [[IF_ELSE17_I_I_I:%.*]] -// CHECK: if.else17.i.i.i: -// CHECK-NEXT: [[TMP5:%.*]] = add i8 [[TMP2]], -65 -// CHECK-NEXT: [[OR_COND34_I_I_I:%.*]] = icmp ult i8 [[TMP5]], 6 -// CHECK-NEXT: br i1 [[OR_COND34_I_I_I]], label [[IF_END31_I_I_I]], label [[CLEANUP_I36_I_I]] -// CHECK: if.end31.i.i.i: -// CHECK-NEXT: [[DOTSINK:%.*]] = phi i64 [ -48, [[WHILE_BODY_I34_I_I]] ], [ -87, [[IF_ELSE_I_I_I]] ], [ -55, [[IF_ELSE17_I_I_I]] ] -// CHECK-NEXT: [[MUL24_I_I_I:%.*]] = shl i64 [[__R_0_I32_I_I]], 4 -// CHECK-NEXT: [[CONV25_I_I_I:%.*]] = zext nneg i8 [[TMP2]] to i64 -// CHECK-NEXT: [[ADD26_I_I_I:%.*]] = add i64 [[MUL24_I_I_I]], [[DOTSINK]] -// CHECK-NEXT: [[ADD28_I_I_I:%.*]] = add i64 [[ADD26_I_I_I]], [[CONV25_I_I_I]] -// CHECK-NEXT: [[INCDEC_PTR_I40_I_I:%.*]] = getelementptr inbounds i8, ptr [[__TAGP_ADDR_0_I31_I_I]], i64 1 -// CHECK-NEXT: br label [[CLEANUP_I36_I_I]] -// CHECK: cleanup.i36.i.i: -// CHECK-NEXT: [[__TAGP_ADDR_1_I37_I_I]] = phi ptr [ [[INCDEC_PTR_I40_I_I]], [[IF_END31_I_I_I]] ], [ [[__TAGP_ADDR_0_I31_I_I]], [[IF_ELSE17_I_I_I]] ] -// CHECK-NEXT: [[__R_2_I_I_I]] = phi i64 [ [[ADD28_I_I_I]], [[IF_END31_I_I_I]] ], [ 
[[__R_0_I32_I_I]], [[IF_ELSE17_I_I_I]] ] -// CHECK-NEXT: [[COND_I_I_I:%.*]] = phi i1 [ true, [[IF_END31_I_I_I]] ], [ false, [[IF_ELSE17_I_I_I]] ] -// CHECK-NEXT: br i1 [[COND_I_I_I]], label [[WHILE_COND_I30_I_I]], label [[_ZL4NANFPKC_EXIT]], !llvm.loop [[LOOP11]] -// CHECK: while.cond.i.i.i: -// CHECK-NEXT: [[__TAGP_ADDR_0_I_I_I:%.*]] = phi ptr [ [[__TAGP_ADDR_1_I_I_I:%.*]], [[CLEANUP_I_I_I:%.*]] ], [ [[INCDEC_PTR_I_I]], [[IF_THEN_I_I]] ] -// CHECK-NEXT: [[__R_0_I_I_I:%.*]] = phi i64 [ [[__R_1_I_I_I:%.*]], [[CLEANUP_I_I_I]] ], [ 0, [[IF_THEN_I_I]] ] -// CHECK-NEXT: [[TMP6:%.*]] = load i8, ptr [[__TAGP_ADDR_0_I_I_I]], align 1, !tbaa [[TBAA4]] -// CHECK-NEXT: [[CMP_NOT_I_I_I:%.*]] = icmp eq i8 [[TMP6]], 0 -// CHECK-NEXT: br i1 [[CMP_NOT_I_I_I]], label [[_ZL4NANFPKC_EXIT]], label [[WHILE_BODY_I_I_I:%.*]] -// CHECK: while.body.i.i.i: -// CHECK-NEXT: [[TMP7:%.*]] = and i8 [[TMP6]], -8 -// CHECK-NEXT: [[OR_COND_I_I_I:%.*]] = icmp eq i8 [[TMP7]], 48 -// CHECK-NEXT: br i1 [[OR_COND_I_I_I]], label [[IF_THEN_I_I_I:%.*]], label [[CLEANUP_I_I_I]] -// CHECK: if.then.i.i.i: -// CHECK-NEXT: [[MUL_I_I_I:%.*]] = shl i64 [[__R_0_I_I_I]], 3 -// CHECK-NEXT: [[CONV5_I_I_I:%.*]] = zext nneg i8 [[TMP6]] to i64 -// CHECK-NEXT: [[ADD_I_I_I:%.*]] = add i64 [[MUL_I_I_I]], -48 -// CHECK-NEXT: [[SUB_I_I_I:%.*]] = add i64 [[ADD_I_I_I]], [[CONV5_I_I_I]] -// CHECK-NEXT: [[INCDEC_PTR_I_I_I:%.*]] = getelementptr inbounds i8, ptr [[__TAGP_ADDR_0_I_I_I]], i64 1 -// CHECK-NEXT: br label [[CLEANUP_I_I_I]] -// CHECK: cleanup.i.i.i: -// CHECK-NEXT: [[__TAGP_ADDR_1_I_I_I]] = phi ptr [ [[INCDEC_PTR_I_I_I]], [[IF_THEN_I_I_I]] ], [ [[__TAGP_ADDR_0_I_I_I]], [[WHILE_BODY_I_I_I]] ] -// CHECK-NEXT: [[__R_1_I_I_I]] = phi i64 [ [[SUB_I_I_I]], [[IF_THEN_I_I_I]] ], [ [[__R_0_I_I_I]], [[WHILE_BODY_I_I_I]] ] -// CHECK-NEXT: br i1 [[OR_COND_I_I_I]], label [[WHILE_COND_I_I_I]], label [[_ZL4NANFPKC_EXIT]], !llvm.loop [[LOOP7]] -// CHECK: while.cond.i14.i.i: -// CHECK-NEXT: [[__TAGP_ADDR_0_I15_I_I:%.*]] = phi ptr [ 
[[__TAGP_ADDR_1_I21_I_I:%.*]], [[CLEANUP_I20_I_I:%.*]] ], [ [[TAG]], [[ENTRY:%.*]] ] -// CHECK-NEXT: [[__R_0_I16_I_I:%.*]] = phi i64 [ [[__R_1_I22_I_I:%.*]], [[CLEANUP_I20_I_I]] ], [ 0, [[ENTRY]] ] -// CHECK-NEXT: [[TMP8:%.*]] = load i8, ptr [[__TAGP_ADDR_0_I15_I_I]], align 1, !tbaa [[TBAA4]] -// CHECK-NEXT: [[CMP_NOT_I17_I_I:%.*]] = icmp eq i8 [[TMP8]], 0 -// CHECK-NEXT: br i1 [[CMP_NOT_I17_I_I]], label [[_ZL4NANFPKC_EXIT]], label [[WHILE_BODY_I18_I_I:%.*]] -// CHECK: while.body.i18.i.i: -// CHECK-NEXT: [[TMP9:%.*]] = add i8 [[TMP8]], -48 -// CHECK-NEXT: [[OR_COND_I19_I_I:%.*]] = icmp ult i8 [[TMP9]], 10 -// CHECK-NEXT: br i1 [[OR_COND_I19_I_I]], label [[IF_THEN_I24_I_I:%.*]], label [[CLEANUP_I20_I_I]] -// CHECK: if.then.i24.i.i: -// CHECK-NEXT: [[MUL_I25_I_I:%.*]] = mul i64 [[__R_0_I16_I_I]], 10 -// CHECK-NEXT: [[CONV5_I26_I_I:%.*]] = zext nneg i8 [[TMP8]] to i64 -// CHECK-NEXT: [[ADD_I27_I_I:%.*]] = add i64 [[MUL_I25_I_I]], -48 -// CHECK-NEXT: [[SUB_I28_I_I:%.*]] = add i64 [[ADD_I27_I_I]], [[CONV5_I26_I_I]] -// CHECK-NEXT: [[INCDEC_PTR_I29_I_I:%.*]] = getelementptr inbounds i8, ptr [[__TAGP_ADDR_0_I15_I_I]], i64 1 -// CHECK-NEXT: br label [[CLEANUP_I20_I_I]] -// CHECK: cleanup.i20.i.i: -// CHECK-NEXT: [[__TAGP_ADDR_1_I21_I_I]] = phi ptr [ [[INCDEC_PTR_I29_I_I]], [[IF_THEN_I24_I_I]] ], [ [[__TAGP_ADDR_0_I15_I_I]], [[WHILE_BODY_I18_I_I]] ] -// CHECK-NEXT: [[__R_1_I22_I_I]] = phi i64 [ [[SUB_I28_I_I]], [[IF_THEN_I24_I_I]] ], [ [[__R_0_I16_I_I]], [[WHILE_BODY_I18_I_I]] ] -// CHECK-NEXT: br i1 [[OR_COND_I19_I_I]], label [[WHILE_COND_I14_I_I]], label [[_ZL4NANFPKC_EXIT]], !llvm.loop [[LOOP10]] -// CHECK: _ZL4nanfPKc.exit: -// CHECK-NEXT: [[RETVAL_0_I_I:%.*]] = phi i64 [ 0, [[CLEANUP_I_I_I]] ], [ [[__R_0_I_I_I]], [[WHILE_COND_I_I_I]] ], [ 0, [[CLEANUP_I36_I_I]] ], [ [[__R_0_I32_I_I]], [[WHILE_COND_I30_I_I]] ], [ 0, [[CLEANUP_I20_I_I]] ], [ [[__R_0_I16_I_I]], [[WHILE_COND_I14_I_I]] ] -// CHECK-NEXT: [[CONV_I:%.*]] = trunc i64 [[RETVAL_0_I_I]] to i32 -// CHECK-NEXT: 
[[BF_VALUE_I:%.*]] = and i32 [[CONV_I]], 4194303 -// CHECK-NEXT: [[BF_SET9_I:%.*]] = or disjoint i32 [[BF_VALUE_I]], 2143289344 -// CHECK-NEXT: [[TMP10:%.*]] = bitcast i32 [[BF_SET9_I]] to float -// CHECK-NEXT: ret float [[TMP10]] +// DEFAULT-LABEL: @test_nanf( +// DEFAULT-NEXT: entry: +// DEFAULT-NEXT: [[TMP0:%.*]] = load i8, ptr [[TAG:%.*]], align 1, !tbaa [[TBAA4]] +// DEFAULT-NEXT: [[CMP_I_I:%.*]] = icmp eq i8 [[TMP0]], 48 +// DEFAULT-NEXT: br i1 [[CMP_I_I]], label [[IF_THEN_I_I:%.*]], label [[WHILE_COND_I14_I_I:%.*]] +// DEFAULT: if.then.i.i: +// DEFAULT-NEXT: [[INCDEC_PTR_I_I:%.*]] = getelementptr inbounds i8, ptr [[TAG]], i64 1 +// DEFAULT-NEXT: [[TMP1:%.*]] = load i8, ptr [[INCDEC_PTR_I_I]], align 1, !tbaa [[TBAA4]] +// DEFAULT-NEXT: switch i8 [[TMP1]], label [[WHILE_COND_I_I_I:%.*]] [ +// DEFAULT-NEXT: i8 120, label [[WHILE_COND_I30_I_I_PREHEADER:%.*]] +// DEFAULT-NEXT: i8 88, label [[WHILE_COND_I30_I_I_PREHEADER]] +// DEFAULT-NEXT: ] +// DEFAULT: while.cond.i30.i.i.preheader: +// DEFAULT-NEXT: br label [[WHILE_COND_I30_I_I:%.*]] +// DEFAULT: while.cond.i30.i.i: +// DEFAULT-NEXT: [[__TAGP_ADDR_0_I31_I_I:%.*]] = phi ptr [ [[__TAGP_ADDR_1_I37_I_I:%.*]], [[CLEANUP_I36_I_I:%.*]] ], [ [[INCDEC_PTR_I_I]], [[WHILE_COND_I30_I_I_PREHEADER]] ] +// DEFAULT-NEXT: [[__R_0_I32_I_I:%.*]] = phi i64 [ [[__R_2_I_I_I:%.*]], [[CLEANUP_I36_I_I]] ], [ 0, [[WHILE_COND_I30_I_I_PREHEADER]] ] +// DEFAULT-NEXT: [[TMP2:%.*]] = load i8, ptr [[__TAGP_ADDR_0_I31_I_I]], align 1, !tbaa [[TBAA4]] +// DEFAULT-NEXT: [[CMP_NOT_I33_I_I:%.*]] = icmp eq i8 [[TMP2]], 0 +// DEFAULT-NEXT: br i1 [[CMP_NOT_I33_I_I]], label [[_ZL4NANFPKC_EXIT:%.*]], label [[WHILE_BODY_I34_I_I:%.*]] +// DEFAULT: while.body.i34.i.i: +// DEFAULT-NEXT: [[TMP3:%.*]] = add i8 [[TMP2]], -48 +// DEFAULT-NEXT: [[OR_COND_I35_I_I:%.*]] = icmp ult i8 [[TMP3]], 10 +// DEFAULT-NEXT: br i1 [[OR_COND_I35_I_I]], label [[IF_END31_I_I_I:%.*]], label [[IF_ELSE_I_I_I:%.*]] +// DEFAULT: if.else.i.i.i: +// DEFAULT-NEXT: [[TMP4:%.*]] = add 
i8 [[TMP2]], -97 +// DEFAULT-NEXT: [[OR_COND33_I_I_I:%.*]] = icmp ult i8 [[TMP4]], 6 +// DEFAULT-NEXT: br i1 [[OR_COND33_I_I_I]], label [[IF_END31_I_I_I]], label [[IF_ELSE17_I_I_I:%.*]] +// DEFAULT: if.else17.i.i.i: +// DEFAULT-NEXT: [[TMP5:%.*]] = add i8 [[TMP2]], -65 +// DEFAULT-NEXT: [[OR_COND34_I_I_I:%.*]] = icmp ult i8 [[TMP5]], 6 +// DEFAULT-NEXT: br i1 [[OR_COND34_I_I_I]], label [[IF_END31_I_I_I]], label [[CLEANUP_I36_I_I]] +// DEFAULT: if.end31.i.i.i: +// DEFAULT-NEXT: [[DOTSINK:%.*]] = phi i64 [ -48, [[WHILE_BODY_I34_I_I]] ], [ -87, [[IF_ELSE_I_I_I]] ], [ -55, [[IF_ELSE17_I_I_I]] ] +// DEFAULT-NEXT: [[MUL24_I_I_I:%.*]] = shl i64 [[__R_0_I32_I_I]], 4 +// DEFAULT-NEXT: [[CONV25_I_I_I:%.*]] = zext nneg i8 [[TMP2]] to i64 +// DEFAULT-NEXT: [[ADD26_I_I_I:%.*]] = add i64 [[MUL24_I_I_I]], [[DOTSINK]] +// DEFAULT-NEXT: [[ADD28_I_I_I:%.*]] = add i64 [[ADD26_I_I_I]], [[CONV25_I_I_I]] +// DEFAULT-NEXT: [[INCDEC_PTR_I40_I_I:%.*]] = getelementptr inbounds i8, ptr [[__TAGP_ADDR_0_I31_I_I]], i64 1 +// DEFAULT-NEXT: br label [[CLEANUP_I36_I_I]] +// DEFAULT: cleanup.i36.i.i: +// DEFAULT-NEXT: [[__TAGP_ADDR_1_I37_I_I]] = phi ptr [ [[INCDEC_PTR_I40_I_I]], [[IF_END31_I_I_I]] ], [ [[__TAGP_ADDR_0_I31_I_I]], [[IF_ELSE17_I_I_I]] ] +// DEFAULT-NEXT: [[__R_2_I_I_I]] = phi i64 [ [[ADD28_I_I_I]], [[IF_END31_I_I_I]] ], [ [[__R_0_I32_I_I]], [[IF_ELSE17_I_I_I]] ] +// DEFAULT-NEXT: [[COND_I_I_I:%.*]] = phi i1 [ true, [[IF_END31_I_I_I]] ], [ false, [[IF_ELSE17_I_I_I]] ] +// DEFAULT-NEXT: br i1 [[COND_I_I_I]], label [[WHILE_COND_I30_I_I]], label [[_ZL4NANFPKC_EXIT]], !llvm.loop [[LOOP11]] +// DEFAULT: while.cond.i.i.i: +// DEFAULT-NEXT: [[__TAGP_ADDR_0_I_I_I:%.*]] = phi ptr [ [[__TAGP_ADDR_1_I_I_I:%.*]], [[CLEANUP_I_I_I:%.*]] ], [ [[INCDEC_PTR_I_I]], [[IF_THEN_I_I]] ] +// DEFAULT-NEXT: [[__R_0_I_I_I:%.*]] = phi i64 [ [[__R_1_I_I_I:%.*]], [[CLEANUP_I_I_I]] ], [ 0, [[IF_THEN_I_I]] ] +// DEFAULT-NEXT: [[TMP6:%.*]] = load i8, ptr [[__TAGP_ADDR_0_I_I_I]], align 1, !tbaa [[TBAA4]] +// 
DEFAULT-NEXT: [[CMP_NOT_I_I_I:%.*]] = icmp eq i8 [[TMP6]], 0 +// DEFAULT-NEXT: br i1 [[CMP_NOT_I_I_I]], label [[_ZL4NANFPKC_EXIT]], label [[WHILE_BODY_I_I_I:%.*]] +// DEFAULT: while.body.i.i.i: +// DEFAULT-NEXT: [[TMP7:%.*]] = and i8 [[TMP6]], -8 +// DEFAULT-NEXT: [[OR_COND_I_I_I:%.*]] = icmp eq i8 [[TMP7]], 48 +// DEFAULT-NEXT: br i1 [[OR_COND_I_I_I]], label [[IF_THEN_I_I_I:%.*]], label [[CLEANUP_I_I_I]] +// DEFAULT: if.then.i.i.i: +// DEFAULT-NEXT: [[MUL_I_I_I:%.*]] = shl i64 [[__R_0_I_I_I]], 3 +// DEFAULT-NEXT: [[CONV5_I_I_I:%.*]] = zext nneg i8 [[TMP6]] to i64 +// DEFAULT-NEXT: [[ADD_I_I_I:%.*]] = add i64 [[MUL_I_I_I]], -48 +// DEFAULT-NEXT: [[SUB_I_I_I:%.*]] = add i64 [[ADD_I_I_I]], [[CONV5_I_I_I]] +// DEFAULT-NEXT: [[INCDEC_PTR_I_I_I:%.*]] = getelementptr inbounds i8, ptr [[__TAGP_ADDR_0_I_I_I]], i64 1 +// DEFAULT-NEXT: br label [[CLEANUP_I_I_I]] +// DEFAULT: cleanup.i.i.i: +// DEFAULT-NEXT: [[__TAGP_ADDR_1_I_I_I]] = phi ptr [ [[INCDEC_PTR_I_I_I]], [[IF_THEN_I_I_I]] ], [ [[__TAGP_ADDR_0_I_I_I]], [[WHILE_BODY_I_I_I]] ] +// DEFAULT-NEXT: [[__R_1_I_I_I]] = phi i64 [ [[SUB_I_I_I]], [[IF_THEN_I_I_I]] ], [ [[__R_0_I_I_I]], [[WHILE_BODY_I_I_I]] ] +// DEFAULT-NEXT: br i1 [[OR_COND_I_I_I]], label [[WHILE_COND_I_I_I]], label [[_ZL4NANFPKC_EXIT]], !llvm.loop [[LOOP7]] +// DEFAULT: while.cond.i14.i.i: +// DEFAULT-NEXT: [[__TAGP_ADDR_0_I15_I_I:%.*]] = phi ptr [ [[__TAGP_ADDR_1_I21_I_I:%.*]], [[CLEANUP_I20_I_I:%.*]] ], [ [[TAG]], [[ENTRY:%.*]] ] +// DEFAULT-NEXT: [[__R_0_I16_I_I:%.*]] = phi i64 [ [[__R_1_I22_I_I:%.*]], [[CLEANUP_I20_I_I]] ], [ 0, [[ENTRY]] ] +// DEFAULT-NEXT: [[TMP8:%.*]] = load i8, ptr [[__TAGP_ADDR_0_I15_I_I]], align 1, !tbaa [[TBAA4]] +// DEFAULT-NEXT: [[CMP_NOT_I17_I_I:%.*]] = icmp eq i8 [[TMP8]], 0 +// DEFAULT-NEXT: br i1 [[CMP_NOT_I17_I_I]], label [[_ZL4NANFPKC_EXIT]], label [[WHILE_BODY_I18_I_I:%.*]] +// DEFAULT: while.body.i18.i.i: +// DEFAULT-NEXT: [[TMP9:%.*]] = add i8 [[TMP8]], -48 +// DEFAULT-NEXT: [[OR_COND_I19_I_I:%.*]] = icmp ult i8 
[[TMP9]], 10 +// DEFAULT-NEXT: br i1 [[OR_COND_I19_I_I]], label [[IF_THEN_I24_I_I:%.*]], label [[CLEANUP_I20_I_I]] +// DEFAULT: if.then.i24.i.i: +// DEFAULT-NEXT: [[MUL_I25_I_I:%.*]] = mul i64 [[__R_0_I16_I_I]], 10 +// DEFAULT-NEXT: [[CONV5_I26_I_I:%.*]] = zext nneg i8 [[TMP8]] to i64 +// DEFAULT-NEXT: [[ADD_I27_I_I:%.*]] = add i64 [[MUL_I25_I_I]], -48 +// DEFAULT-NEXT: [[SUB_I28_I_I:%.*]] = add i64 [[ADD_I27_I_I]], [[CONV5_I26_I_I]] +// DEFAULT-NEXT: [[INCDEC_PTR_I29_I_I:%.*]] = getelementptr inbounds i8, ptr [[__TAGP_ADDR_0_I15_I_I]], i64 1 +// DEFAULT-NEXT: br label [[CLEANUP_I20_I_I]] +// DEFAULT: cleanup.i20.i.i: +// DEFAULT-NEXT: [[__TAGP_ADDR_1_I21_I_I]] = phi ptr [ [[INCDEC_PTR_I29_I_I]], [[IF_THEN_I24_I_I]] ], [ [[__TAGP_ADDR_0_I15_I_I]], [[WHILE_BODY_I18_I_I]] ] +// DEFAULT-NEXT: [[__R_1_I22_I_I]] = phi i64 [ [[SUB_I28_I_I]], [[IF_THEN_I24_I_I]] ], [ [[__R_0_I16_I_I]], [[WHILE_BODY_I18_I_I]] ] +// DEFAULT-NEXT: br i1 [[OR_COND_I19_I_I]], label [[WHILE_COND_I14_I_I]], label [[_ZL4NANFPKC_EXIT]], !llvm.loop [[LOOP10]] +// DEFAULT: _ZL4nanfPKc.exit: +// DEFAULT-NEXT: [[RETVAL_0_I_I:%.*]] = phi i64 [ 0, [[CLEANUP_I_I_I]] ], [ [[__R_0_I_I_I]], [[WHILE_COND_I_I_I]] ], [ 0, [[CLEANUP_I36_I_I]] ], [ [[__R_0_I32_I_I]], [[WHILE_COND_I30_I_I]] ], [ 0, [[CLEANUP_I20_I_I]] ], [ [[__R_0_I16_I_I]], [[WHILE_COND_I14_I_I]] ] +// DEFAULT-NEXT: [[CONV_I:%.*]] = trunc i64 [[RETVAL_0_I_I]] to i32 +// DEFAULT-NEXT: [[BF_VALUE_I:%.*]] = and i32 [[CONV_I]], 4194303 +// DEFAULT-NEXT: [[BF_SET9_I:%.*]] = or disjoint i32 [[BF_VALUE_I]], 2143289344 +// DEFAULT-NEXT: [[TMP10:%.*]] = bitcast i32 [[BF_SET9_I]] to float +// DEFAULT-NEXT: ret float [[TMP10]] +// +// FINITEONLY-LABEL: @test_nanf( +// FINITEONLY-NEXT: entry: +// FINITEONLY-NEXT: ret float poison +// +// APPROX-LABEL: @test_nanf( +// APPROX-NEXT: entry: +// APPROX-NEXT: [[TMP0:%.*]] = load i8, ptr [[TAG:%.*]], align 1, !tbaa [[TBAA4]] +// APPROX-NEXT: [[CMP_I_I:%.*]] = icmp eq i8 [[TMP0]], 48 +// APPROX-NEXT: br i1 
[[CMP_I_I]], label [[IF_THEN_I_I:%.*]], label [[WHILE_COND_I14_I_I:%.*]] +// APPROX: if.then.i.i: +// APPROX-NEXT: [[INCDEC_PTR_I_I:%.*]] = getelementptr inbounds i8, ptr [[TAG]], i64 1 +// APPROX-NEXT: [[TMP1:%.*]] = load i8, ptr [[INCDEC_PTR_I_I]], align 1, !tbaa [[TBAA4]] +// APPROX-NEXT: switch i8 [[TMP1]], label [[WHILE_COND_I_I_I:%.*]] [ +// APPROX-NEXT: i8 120, label [[WHILE_COND_I30_I_I_PREHEADER:%.*]] +// APPROX-NEXT: i8 88, label [[WHILE_COND_I30_I_I_PREHEADER]] +// APPROX-NEXT: ] +// APPROX: while.cond.i30.i.i.preheader: +// APPROX-NEXT: br label [[WHILE_COND_I30_I_I:%.*]] +// APPROX: while.cond.i30.i.i: +// APPROX-NEXT: [[__TAGP_ADDR_0_I31_I_I:%.*]] = phi ptr [ [[__TAGP_ADDR_1_I37_I_I:%.*]], [[CLEANUP_I36_I_I:%.*]] ], [ [[INCDEC_PTR_I_I]], [[WHILE_COND_I30_I_I_PREHEADER]] ] +// APPROX-NEXT: [[__R_0_I32_I_I:%.*]] = phi i64 [ [[__R_2_I_I_I:%.*]], [[CLEANUP_I36_I_I]] ], [ 0, [[WHILE_COND_I30_I_I_PREHEADER]] ] +// APPROX-NEXT: [[TMP2:%.*]] = load i8, ptr [[__TAGP_ADDR_0_I31_I_I]], align 1, !tbaa [[TBAA4]] +// APPROX-NEXT: [[CMP_NOT_I33_I_I:%.*]] = icmp eq i8 [[TMP2]], 0 +// APPROX-NEXT: br i1 [[CMP_NOT_I33_I_I]], label [[_ZL4NANFPKC_EXIT:%.*]], label [[WHILE_BODY_I34_I_I:%.*]] +// APPROX: while.body.i34.i.i: +// APPROX-NEXT: [[TMP3:%.*]] = add i8 [[TMP2]], -48 +// APPROX-NEXT: [[OR_COND_I35_I_I:%.*]] = icmp ult i8 [[TMP3]], 10 +// APPROX-NEXT: br i1 [[OR_COND_I35_I_I]], label [[IF_END31_I_I_I:%.*]], label [[IF_ELSE_I_I_I:%.*]] +// APPROX: if.else.i.i.i: +// APPROX-NEXT: [[TMP4:%.*]] = add i8 [[TMP2]], -97 +// APPROX-NEXT: [[OR_COND33_I_I_I:%.*]] = icmp ult i8 [[TMP4]], 6 +// APPROX-NEXT: br i1 [[OR_COND33_I_I_I]], label [[IF_END31_I_I_I]], label [[IF_ELSE17_I_I_I:%.*]] +// APPROX: if.else17.i.i.i: +// APPROX-NEXT: [[TMP5:%.*]] = add i8 [[TMP2]], -65 +// APPROX-NEXT: [[OR_COND34_I_I_I:%.*]] = icmp ult i8 [[TMP5]], 6 +// APPROX-NEXT: br i1 [[OR_COND34_I_I_I]], label [[IF_END31_I_I_I]], label [[CLEANUP_I36_I_I]] +// APPROX: if.end31.i.i.i: +// APPROX-NEXT: 
[[DOTSINK:%.*]] = phi i64 [ -48, [[WHILE_BODY_I34_I_I]] ], [ -87, [[IF_ELSE_I_I_I]] ], [ -55, [[IF_ELSE17_I_I_I]] ] +// APPROX-NEXT: [[MUL24_I_I_I:%.*]] = shl i64 [[__R_0_I32_I_I]], 4 +// APPROX-NEXT: [[CONV25_I_I_I:%.*]] = zext nneg i8 [[TMP2]] to i64 +// APPROX-NEXT: [[ADD26_I_I_I:%.*]] = add i64 [[MUL24_I_I_I]], [[DOTSINK]] +// APPROX-NEXT: [[ADD28_I_I_I:%.*]] = add i64 [[ADD26_I_I_I]], [[CONV25_I_I_I]] +// APPROX-NEXT: [[INCDEC_PTR_I40_I_I:%.*]] = getelementptr inbounds i8, ptr [[__TAGP_ADDR_0_I31_I_I]], i64 1 +// APPROX-NEXT: br label [[CLEANUP_I36_I_I]] +// APPROX: cleanup.i36.i.i: +// APPROX-NEXT: [[__TAGP_ADDR_1_I37_I_I]] = phi ptr [ [[INCDEC_PTR_I40_I_I]], [[IF_END31_I_I_I]] ], [ [[__TAGP_ADDR_0_I31_I_I]], [[IF_ELSE17_I_I_I]] ] +// APPROX-NEXT: [[__R_2_I_I_I]] = phi i64 [ [[ADD28_I_I_I]], [[IF_END31_I_I_I]] ], [ [[__R_0_I32_I_I]], [[IF_ELSE17_I_I_I]] ] +// APPROX-NEXT: [[COND_I_I_I:%.*]] = phi i1 [ true, [[IF_END31_I_I_I]] ], [ false, [[IF_ELSE17_I_I_I]] ] +// APPROX-NEXT: br i1 [[COND_I_I_I]], label [[WHILE_COND_I30_I_I]], label [[_ZL4NANFPKC_EXIT]], !llvm.loop [[LOOP11]] +// APPROX: while.cond.i.i.i: +// APPROX-NEXT: [[__TAGP_ADDR_0_I_I_I:%.*]] = phi ptr [ [[__TAGP_ADDR_1_I_I_I:%.*]], [[CLEANUP_I_I_I:%.*]] ], [ [[INCDEC_PTR_I_I]], [[IF_THEN_I_I]] ] +// APPROX-NEXT: [[__R_0_I_I_I:%.*]] = phi i64 [ [[__R_1_I_I_I:%.*]], [[CLEANUP_I_I_I]] ], [ 0, [[IF_THEN_I_I]] ] +// APPROX-NEXT: [[TMP6:%.*]] = load i8, ptr [[__TAGP_ADDR_0_I_I_I]], align 1, !tbaa [[TBAA4]] +// APPROX-NEXT: [[CMP_NOT_I_I_I:%.*]] = icmp eq i8 [[TMP6]], 0 +// APPROX-NEXT: br i1 [[CMP_NOT_I_I_I]], label [[_ZL4NANFPKC_EXIT]], label [[WHILE_BODY_I_I_I:%.*]] +// APPROX: while.body.i.i.i: +// APPROX-NEXT: [[TMP7:%.*]] = and i8 [[TMP6]], -8 +// APPROX-NEXT: [[OR_COND_I_I_I:%.*]] = icmp eq i8 [[TMP7]], 48 +// APPROX-NEXT: br i1 [[OR_COND_I_I_I]], label [[IF_THEN_I_I_I:%.*]], label [[CLEANUP_I_I_I]] +// APPROX: if.then.i.i.i: +// APPROX-NEXT: [[MUL_I_I_I:%.*]] = shl i64 [[__R_0_I_I_I]], 3 +// 
APPROX-NEXT: [[CONV5_I_I_I:%.*]] = zext nneg i8 [[TMP6]] to i64 +// APPROX-NEXT: [[ADD_I_I_I:%.*]] = add i64 [[MUL_I_I_I]], -48 +// APPROX-NEXT: [[SUB_I_I_I:%.*]] = add i64 [[ADD_I_I_I]], [[CONV5_I_I_I]] +// APPROX-NEXT: [[INCDEC_PTR_I_I_I:%.*]] = getelementptr inbounds i8, ptr [[__TAGP_ADDR_0_I_I_I]], i64 1 +// APPROX-NEXT: br label [[CLEANUP_I_I_I]] +// APPROX: cleanup.i.i.i: +// APPROX-NEXT: [[__TAGP_ADDR_1_I_I_I]] = phi ptr [ [[INCDEC_PTR_I_I_I]], [[IF_THEN_I_I_I]] ], [ [[__TAGP_ADDR_0_I_I_I]], [[WHILE_BODY_I_I_I]] ] +// APPROX-NEXT: [[__R_1_I_I_I]] = phi i64 [ [[SUB_I_I_I]], [[IF_THEN_I_I_I]] ], [ [[__R_0_I_I_I]], [[WHILE_BODY_I_I_I]] ] +// APPROX-NEXT: br i1 [[OR_COND_I_I_I]], label [[WHILE_COND_I_I_I]], label [[_ZL4NANFPKC_EXIT]], !llvm.loop [[LOOP7]] +// APPROX: while.cond.i14.i.i: +// APPROX-NEXT: [[__TAGP_ADDR_0_I15_I_I:%.*]] = phi ptr [ [[__TAGP_ADDR_1_I21_I_I:%.*]], [[CLEANUP_I20_I_I:%.*]] ], [ [[TAG]], [[ENTRY:%.*]] ] +// APPROX-NEXT: [[__R_0_I16_I_I:%.*]] = phi i64 [ [[__R_1_I22_I_I:%.*]], [[CLEANUP_I20_I_I]] ], [ 0, [[ENTRY]] ] +// APPROX-NEXT: [[TMP8:%.*]] = load i8, ptr [[__TAGP_ADDR_0_I15_I_I]], align 1, !tbaa [[TBAA4]] +// APPROX-NEXT: [[CMP_NOT_I17_I_I:%.*]] = icmp eq i8 [[TMP8]], 0 +// APPROX-NEXT: br i1 [[CMP_NOT_I17_I_I]], label [[_ZL4NANFPKC_EXIT]], label [[WHILE_BODY_I18_I_I:%.*]] +// APPROX: while.body.i18.i.i: +// APPROX-NEXT: [[TMP9:%.*]] = add i8 [[TMP8]], -48 +// APPROX-NEXT: [[OR_COND_I19_I_I:%.*]] = icmp ult i8 [[TMP9]], 10 +// APPROX-NEXT: br i1 [[OR_COND_I19_I_I]], label [[IF_THEN_I24_I_I:%.*]], label [[CLEANUP_I20_I_I]] +// APPROX: if.then.i24.i.i: +// APPROX-NEXT: [[MUL_I25_I_I:%.*]] = mul i64 [[__R_0_I16_I_I]], 10 +// APPROX-NEXT: [[CONV5_I26_I_I:%.*]] = zext nneg i8 [[TMP8]] to i64 +// APPROX-NEXT: [[ADD_I27_I_I:%.*]] = add i64 [[MUL_I25_I_I]], -48 +// APPROX-NEXT: [[SUB_I28_I_I:%.*]] = add i64 [[ADD_I27_I_I]], [[CONV5_I26_I_I]] +// APPROX-NEXT: [[INCDEC_PTR_I29_I_I:%.*]] = getelementptr inbounds i8, ptr 
[[__TAGP_ADDR_0_I15_I_I]], i64 1 +// APPROX-NEXT: br label [[CLEANUP_I20_I_I]] +// APPROX: cleanup.i20.i.i: +// APPROX-NEXT: [[__TAGP_ADDR_1_I21_I_I]] = phi ptr [ [[INCDEC_PTR_I29_I_I]], [[IF_THEN_I24_I_I]] ], [ [[__TAGP_ADDR_0_I15_I_I]], [[WHILE_BODY_I18_I_I]] ] +// APPROX-NEXT: [[__R_1_I22_I_I]] = phi i64 [ [[SUB_I28_I_I]], [[IF_THEN_I24_I_I]] ], [ [[__R_0_I16_I_I]], [[WHILE_BODY_I18_I_I]] ] +// APPROX-NEXT: br i1 [[OR_COND_I19_I_I]], label [[WHILE_COND_I14_I_I]], label [[_ZL4NANFPKC_EXIT]], !llvm.loop [[LOOP10]] +// APPROX: _ZL4nanfPKc.exit: +// APPROX-NEXT: [[RETVAL_0_I_I:%.*]] = phi i64 [ 0, [[CLEANUP_I_I_I]] ], [ [[__R_0_I_I_I]], [[WHILE_COND_I_I_I]] ], [ 0, [[CLEANUP_I36_I_I]] ], [ [[__R_0_I32_I_I]], [[WHILE_COND_I30_I_I]] ], [ 0, [[CLEANUP_I20_I_I]] ], [ [[__R_0_I16_I_I]], [[WHILE_COND_I14_I_I]] ] +// APPROX-NEXT: [[CONV_I:%.*]] = trunc i64 [[RETVAL_0_I_I]] to i32 +// APPROX-NEXT: [[BF_VALUE_I:%.*]] = and i32 [[CONV_I]], 4194303 +// APPROX-NEXT: [[BF_SET9_I:%.*]] = or disjoint i32 [[BF_VALUE_I]], 2143289344 +// APPROX-NEXT: [[TMP10:%.*]] = bitcast i32 [[BF_SET9_I]] to float +// APPROX-NEXT: ret float [[TMP10]] // extern "C" __device__ float test_nanf(const char *tag) { return nanf(tag); } -// CHECK-LABEL: @test_nan( -// CHECK-NEXT: entry: -// CHECK-NEXT: [[TMP0:%.*]] = load i8, ptr [[TAG:%.*]], align 1, !tbaa [[TBAA4]] -// CHECK-NEXT: [[CMP_I_I:%.*]] = icmp eq i8 [[TMP0]], 48 -// CHECK-NEXT: br i1 [[CMP_I_I]], label [[IF_THEN_I_I:%.*]], label [[WHILE_COND_I14_I_I:%.*]] -// CHECK: if.then.i.i: -// CHECK-NEXT: [[INCDEC_PTR_I_I:%.*]] = getelementptr inbounds i8, ptr [[TAG]], i64 1 -// CHECK-NEXT: [[TMP1:%.*]] = load i8, ptr [[INCDEC_PTR_I_I]], align 1, !tbaa [[TBAA4]] -// CHECK-NEXT: switch i8 [[TMP1]], label [[WHILE_COND_I_I_I:%.*]] [ -// CHECK-NEXT: i8 120, label [[WHILE_COND_I30_I_I_PREHEADER:%.*]] -// CHECK-NEXT: i8 88, label [[WHILE_COND_I30_I_I_PREHEADER]] -// CHECK-NEXT: ] -// CHECK: while.cond.i30.i.i.preheader: -// CHECK-NEXT: br label 
[[WHILE_COND_I30_I_I:%.*]] -// CHECK: while.cond.i30.i.i: -// CHECK-NEXT: [[__TAGP_ADDR_0_I31_I_I:%.*]] = phi ptr [ [[__TAGP_ADDR_1_I37_I_I:%.*]], [[CLEANUP_I36_I_I:%.*]] ], [ [[INCDEC_PTR_I_I]], [[WHILE_COND_I30_I_I_PREHEADER]] ] -// CHECK-NEXT: [[__R_0_I32_I_I:%.*]] = phi i64 [ [[__R_2_I_I_I:%.*]], [[CLEANUP_I36_I_I]] ], [ 0, [[WHILE_COND_I30_I_I_PREHEADER]] ] -// CHECK-NEXT: [[TMP2:%.*]] = load i8, ptr [[__TAGP_ADDR_0_I31_I_I]], align 1, !tbaa [[TBAA4]] -// CHECK-NEXT: [[CMP_NOT_I33_I_I:%.*]] = icmp eq i8 [[TMP2]], 0 -// CHECK-NEXT: br i1 [[CMP_NOT_I33_I_I]], label [[_ZL3NANPKC_EXIT:%.*]], label [[WHILE_BODY_I34_I_I:%.*]] -// CHECK: while.body.i34.i.i: -// CHECK-NEXT: [[TMP3:%.*]] = add i8 [[TMP2]], -48 -// CHECK-NEXT: [[OR_COND_I35_I_I:%.*]] = icmp ult i8 [[TMP3]], 10 -// CHECK-NEXT: br i1 [[OR_COND_I35_I_I]], label [[IF_END31_I_I_I:%.*]], label [[IF_ELSE_I_I_I:%.*]] -// CHECK: if.else.i.i.i: -// CHECK-NEXT: [[TMP4:%.*]] = add i8 [[TMP2]], -97 -// CHECK-NEXT: [[OR_COND33_I_I_I:%.*]] = icmp ult i8 [[TMP4]], 6 -// CHECK-NEXT: br i1 [[OR_COND33_I_I_I]], label [[IF_END31_I_I_I]], label [[IF_ELSE17_I_I_I:%.*]] -// CHECK: if.else17.i.i.i: -// CHECK-NEXT: [[TMP5:%.*]] = add i8 [[TMP2]], -65 -// CHECK-NEXT: [[OR_COND34_I_I_I:%.*]] = icmp ult i8 [[TMP5]], 6 -// CHECK-NEXT: br i1 [[OR_COND34_I_I_I]], label [[IF_END31_I_I_I]], label [[CLEANUP_I36_I_I]] -// CHECK: if.end31.i.i.i: -// CHECK-NEXT: [[DOTSINK:%.*]] = phi i64 [ -48, [[WHILE_BODY_I34_I_I]] ], [ -87, [[IF_ELSE_I_I_I]] ], [ -55, [[IF_ELSE17_I_I_I]] ] -// CHECK-NEXT: [[MUL24_I_I_I:%.*]] = shl i64 [[__R_0_I32_I_I]], 4 -// CHECK-NEXT: [[CONV25_I_I_I:%.*]] = zext nneg i8 [[TMP2]] to i64 -// CHECK-NEXT: [[ADD26_I_I_I:%.*]] = add i64 [[MUL24_I_I_I]], [[DOTSINK]] -// CHECK-NEXT: [[ADD28_I_I_I:%.*]] = add i64 [[ADD26_I_I_I]], [[CONV25_I_I_I]] -// CHECK-NEXT: [[INCDEC_PTR_I40_I_I:%.*]] = getelementptr inbounds i8, ptr [[__TAGP_ADDR_0_I31_I_I]], i64 1 -// CHECK-NEXT: br label [[CLEANUP_I36_I_I]] -// CHECK: cleanup.i36.i.i: 
-// CHECK-NEXT: [[__TAGP_ADDR_1_I37_I_I]] = phi ptr [ [[INCDEC_PTR_I40_I_I]], [[IF_END31_I_I_I]] ], [ [[__TAGP_ADDR_0_I31_I_I]], [[IF_ELSE17_I_I_I]] ] -// CHECK-NEXT: [[__R_2_I_I_I]] = phi i64 [ [[ADD28_I_I_I]], [[IF_END31_I_I_I]] ], [ [[__R_0_I32_I_I]], [[IF_ELSE17_I_I_I]] ] -// CHECK-NEXT: [[COND_I_I_I:%.*]] = phi i1 [ true, [[IF_END31_I_I_I]] ], [ false, [[IF_ELSE17_I_I_I]] ] -// CHECK-NEXT: br i1 [[COND_I_I_I]], label [[WHILE_COND_I30_I_I]], label [[_ZL3NANPKC_EXIT]], !llvm.loop [[LOOP11]] -// CHECK: while.cond.i.i.i: -// CHECK-NEXT: [[__TAGP_ADDR_0_I_I_I:%.*]] = phi ptr [ [[__TAGP_ADDR_1_I_I_I:%.*]], [[CLEANUP_I_I_I:%.*]] ], [ [[INCDEC_PTR_I_I]], [[IF_THEN_I_I]] ] -// CHECK-NEXT: [[__R_0_I_I_I:%.*]] = phi i64 [ [[__R_1_I_I_I:%.*]], [[CLEANUP_I_I_I]] ], [ 0, [[IF_THEN_I_I]] ] -// CHECK-NEXT: [[TMP6:%.*]] = load i8, ptr [[__TAGP_ADDR_0_I_I_I]], align 1, !tbaa [[TBAA4]] -// CHECK-NEXT: [[CMP_NOT_I_I_I:%.*]] = icmp eq i8 [[TMP6]], 0 -// CHECK-NEXT: br i1 [[CMP_NOT_I_I_I]], label [[_ZL3NANPKC_EXIT]], label [[WHILE_BODY_I_I_I:%.*]] -// CHECK: while.body.i.i.i: -// CHECK-NEXT: [[TMP7:%.*]] = and i8 [[TMP6]], -8 -// CHECK-NEXT: [[OR_COND_I_I_I:%.*]] = icmp eq i8 [[TMP7]], 48 -// CHECK-NEXT: br i1 [[OR_COND_I_I_I]], label [[IF_THEN_I_I_I:%.*]], label [[CLEANUP_I_I_I]] -// CHECK: if.then.i.i.i: -// CHECK-NEXT: [[MUL_I_I_I:%.*]] = shl i64 [[__R_0_I_I_I]], 3 -// CHECK-NEXT: [[CONV5_I_I_I:%.*]] = zext nneg i8 [[TMP6]] to i64 -// CHECK-NEXT: [[ADD_I_I_I:%.*]] = add i64 [[MUL_I_I_I]], -48 -// CHECK-NEXT: [[SUB_I_I_I:%.*]] = add i64 [[ADD_I_I_I]], [[CONV5_I_I_I]] -// CHECK-NEXT: [[INCDEC_PTR_I_I_I:%.*]] = getelementptr inbounds i8, ptr [[__TAGP_ADDR_0_I_I_I]], i64 1 -// CHECK-NEXT: br label [[CLEANUP_I_I_I]] -// CHECK: cleanup.i.i.i: -// CHECK-NEXT: [[__TAGP_ADDR_1_I_I_I]] = phi ptr [ [[INCDEC_PTR_I_I_I]], [[IF_THEN_I_I_I]] ], [ [[__TAGP_ADDR_0_I_I_I]], [[WHILE_BODY_I_I_I]] ] -// CHECK-NEXT: [[__R_1_I_I_I]] = phi i64 [ [[SUB_I_I_I]], [[IF_THEN_I_I_I]] ], [ [[__R_0_I_I_I]], 
[[WHILE_BODY_I_I_I]] ] -// CHECK-NEXT: br i1 [[OR_COND_I_I_I]], label [[WHILE_COND_I_I_I]], label [[_ZL3NANPKC_EXIT]], !llvm.loop [[LOOP7]] -// CHECK: while.cond.i14.i.i: -// CHECK-NEXT: [[__TAGP_ADDR_0_I15_I_I:%.*]] = phi ptr [ [[__TAGP_ADDR_1_I21_I_I:%.*]], [[CLEANUP_I20_I_I:%.*]] ], [ [[TAG]], [[ENTRY:%.*]] ] -// CHECK-NEXT: [[__R_0_I16_I_I:%.*]] = phi i64 [ [[__R_1_I22_I_I:%.*]], [[CLEANUP_I20_I_I]] ], [ 0, [[ENTRY]] ] -// CHECK-NEXT: [[TMP8:%.*]] = load i8, ptr [[__TAGP_ADDR_0_I15_I_I]], align 1, !tbaa [[TBAA4]] -// CHECK-NEXT: [[CMP_NOT_I17_I_I:%.*]] = icmp eq i8 [[TMP8]], 0 -// CHECK-NEXT: br i1 [[CMP_NOT_I17_I_I]], label [[_ZL3NANPKC_EXIT]], label [[WHILE_BODY_I18_I_I:%.*]] -// CHECK: while.body.i18.i.i: -// CHECK-NEXT: [[TMP9:%.*]] = add i8 [[TMP8]], -48 -// CHECK-NEXT: [[OR_COND_I19_I_I:%.*]] = icmp ult i8 [[TMP9]], 10 -// CHECK-NEXT: br i1 [[OR_COND_I19_I_I]], label [[IF_THEN_I24_I_I:%.*]], label [[CLEANUP_I20_I_I]] -// CHECK: if.then.i24.i.i: -// CHECK-NEXT: [[MUL_I25_I_I:%.*]] = mul i64 [[__R_0_I16_I_I]], 10 -// CHECK-NEXT: [[CONV5_I26_I_I:%.*]] = zext nneg i8 [[TMP8]] to i64 -// CHECK-NEXT: [[ADD_I27_I_I:%.*]] = add i64 [[MUL_I25_I_I]], -48 -// CHECK-NEXT: [[SUB_I28_I_I:%.*]] = add i64 [[ADD_I27_I_I]], [[CONV5_I26_I_I]] -// CHECK-NEXT: [[INCDEC_PTR_I29_I_I:%.*]] = getelementptr inbounds i8, ptr [[__TAGP_ADDR_0_I15_I_I]], i64 1 -// CHECK-NEXT: br label [[CLEANUP_I20_I_I]] -// CHECK: cleanup.i20.i.i: -// CHECK-NEXT: [[__TAGP_ADDR_1_I21_I_I]] = phi ptr [ [[INCDEC_PTR_I29_I_I]], [[IF_THEN_I24_I_I]] ], [ [[__TAGP_ADDR_0_I15_I_I]], [[WHILE_BODY_I18_I_I]] ] -// CHECK-NEXT: [[__R_1_I22_I_I]] = phi i64 [ [[SUB_I28_I_I]], [[IF_THEN_I24_I_I]] ], [ [[__R_0_I16_I_I]], [[WHILE_BODY_I18_I_I]] ] -// CHECK-NEXT: br i1 [[OR_COND_I19_I_I]], label [[WHILE_COND_I14_I_I]], label [[_ZL3NANPKC_EXIT]], !llvm.loop [[LOOP10]] -// CHECK: _ZL3nanPKc.exit: -// CHECK-NEXT: [[RETVAL_0_I_I:%.*]] = phi i64 [ 0, [[CLEANUP_I_I_I]] ], [ [[__R_0_I_I_I]], [[WHILE_COND_I_I_I]] ], [ 0, 
[[CLEANUP_I36_I_I]] ], [ [[__R_0_I32_I_I]], [[WHILE_COND_I30_I_I]] ], [ 0, [[CLEANUP_I20_I_I]] ], [ [[__R_0_I16_I_I]], [[WHILE_COND_I14_I_I]] ] -// CHECK-NEXT: [[BF_VALUE_I:%.*]] = and i64 [[RETVAL_0_I_I]], 2251799813685247 -// CHECK-NEXT: [[BF_SET9_I:%.*]] = or disjoint i64 [[BF_VALUE_I]], 9221120237041090560 -// CHECK-NEXT: [[TMP10:%.*]] = bitcast i64 [[BF_SET9_I]] to double -// CHECK-NEXT: ret double [[TMP10]] +// DEFAULT-LABEL: @test_nan( +// DEFAULT-NEXT: entry: +// DEFAULT-NEXT: [[TMP0:%.*]] = load i8, ptr [[TAG:%.*]], align 1, !tbaa [[TBAA4]] +// DEFAULT-NEXT: [[CMP_I_I:%.*]] = icmp eq i8 [[TMP0]], 48 +// DEFAULT-NEXT: br i1 [[CMP_I_I]], label [[IF_THEN_I_I:%.*]], label [[WHILE_COND_I14_I_I:%.*]] +// DEFAULT: if.then.i.i: +// DEFAULT-NEXT: [[INCDEC_PTR_I_I:%.*]] = getelementptr inbounds i8, ptr [[TAG]], i64 1 +// DEFAULT-NEXT: [[TMP1:%.*]] = load i8, ptr [[INCDEC_PTR_I_I]], align 1, !tbaa [[TBAA4]] +// DEFAULT-NEXT: switch i8 [[TMP1]], label [[WHILE_COND_I_I_I:%.*]] [ +// DEFAULT-NEXT: i8 120, label [[WHILE_COND_I30_I_I_PREHEADER:%.*]] +// DEFAULT-NEXT: i8 88, label [[WHILE_COND_I30_I_I_PREHEADER]] +// DEFAULT-NEXT: ] +// DEFAULT: while.cond.i30.i.i.preheader: +// DEFAULT-NEXT: br label [[WHILE_COND_I30_I_I:%.*]] +// DEFAULT: while.cond.i30.i.i: +// DEFAULT-NEXT: [[__TAGP_ADDR_0_I31_I_I:%.*]] = phi ptr [ [[__TAGP_ADDR_1_I37_I_I:%.*]], [[CLEANUP_I36_I_I:%.*]] ], [ [[INCDEC_PTR_I_I]], [[WHILE_COND_I30_I_I_PREHEADER]] ] +// DEFAULT-NEXT: [[__R_0_I32_I_I:%.*]] = phi i64 [ [[__R_2_I_I_I:%.*]], [[CLEANUP_I36_I_I]] ], [ 0, [[WHILE_COND_I30_I_I_PREHEADER]] ] +// DEFAULT-NEXT: [[TMP2:%.*]] = load i8, ptr [[__TAGP_ADDR_0_I31_I_I]], align 1, !tbaa [[TBAA4]] +// DEFAULT-NEXT: [[CMP_NOT_I33_I_I:%.*]] = icmp eq i8 [[TMP2]], 0 +// DEFAULT-NEXT: br i1 [[CMP_NOT_I33_I_I]], label [[_ZL3NANPKC_EXIT:%.*]], label [[WHILE_BODY_I34_I_I:%.*]] +// DEFAULT: while.body.i34.i.i: +// DEFAULT-NEXT: [[TMP3:%.*]] = add i8 [[TMP2]], -48 +// DEFAULT-NEXT: [[OR_COND_I35_I_I:%.*]] = icmp ult 
i8 [[TMP3]], 10 +// DEFAULT-NEXT: br i1 [[OR_COND_I35_I_I]], label [[IF_END31_I_I_I:%.*]], label [[IF_ELSE_I_I_I:%.*]] +// DEFAULT: if.else.i.i.i: +// DEFAULT-NEXT: [[TMP4:%.*]] = add i8 [[TMP2]], -97 +// DEFAULT-NEXT: [[OR_COND33_I_I_I:%.*]] = icmp ult i8 [[TMP4]], 6 +// DEFAULT-NEXT: br i1 [[OR_COND33_I_I_I]], label [[IF_END31_I_I_I]], label [[IF_ELSE17_I_I_I:%.*]] +// DEFAULT: if.else17.i.i.i: +// DEFAULT-NEXT: [[TMP5:%.*]] = add i8 [[TMP2]], -65 +// DEFAULT-NEXT: [[OR_COND34_I_I_I:%.*]] = icmp ult i8 [[TMP5]], 6 +// DEFAULT-NEXT: br i1 [[OR_COND34_I_I_I]], label [[IF_END31_I_I_I]], label [[CLEANUP_I36_I_I]] +// DEFAULT: if.end31.i.i.i: +// DEFAULT-NEXT: [[DOTSINK:%.*]] = phi i64 [ -48, [[WHILE_BODY_I34_I_I]] ], [ -87, [[IF_ELSE_I_I_I]] ], [ -55, [[IF_ELSE17_I_I_I]] ] +// DEFAULT-NEXT: [[MUL24_I_I_I:%.*]] = shl i64 [[__R_0_I32_I_I]], 4 +// DEFAULT-NEXT: [[CONV25_I_I_I:%.*]] = zext nneg i8 [[TMP2]] to i64 +// DEFAULT-NEXT: [[ADD26_I_I_I:%.*]] = add i64 [[MUL24_I_I_I]], [[DOTSINK]] +// DEFAULT-NEXT: [[ADD28_I_I_I:%.*]] = add i64 [[ADD26_I_I_I]], [[CONV25_I_I_I]] +// DEFAULT-NEXT: [[INCDEC_PTR_I40_I_I:%.*]] = getelementptr inbounds i8, ptr [[__TAGP_ADDR_0_I31_I_I]], i64 1 +// DEFAULT-NEXT: br label [[CLEANUP_I36_I_I]] +// DEFAULT: cleanup.i36.i.i: +// DEFAULT-NEXT: [[__TAGP_ADDR_1_I37_I_I]] = phi ptr [ [[INCDEC_PTR_I40_I_I]], [[IF_END31_I_I_I]] ], [ [[__TAGP_ADDR_0_I31_I_I]], [[IF_ELSE17_I_I_I]] ] +// DEFAULT-NEXT: [[__R_2_I_I_I]] = phi i64 [ [[ADD28_I_I_I]], [[IF_END31_I_I_I]] ], [ [[__R_0_I32_I_I]], [[IF_ELSE17_I_I_I]] ] +// DEFAULT-NEXT: [[COND_I_I_I:%.*]] = phi i1 [ true, [[IF_END31_I_I_I]] ], [ false, [[IF_ELSE17_I_I_I]] ] +// DEFAULT-NEXT: br i1 [[COND_I_I_I]], label [[WHILE_COND_I30_I_I]], label [[_ZL3NANPKC_EXIT]], !llvm.loop [[LOOP11]] +// DEFAULT: while.cond.i.i.i: +// DEFAULT-NEXT: [[__TAGP_ADDR_0_I_I_I:%.*]] = phi ptr [ [[__TAGP_ADDR_1_I_I_I:%.*]], [[CLEANUP_I_I_I:%.*]] ], [ [[INCDEC_PTR_I_I]], [[IF_THEN_I_I]] ] +// DEFAULT-NEXT: [[__R_0_I_I_I:%.*]] = 
phi i64 [ [[__R_1_I_I_I:%.*]], [[CLEANUP_I_I_I]] ], [ 0, [[IF_THEN_I_I]] ] +// DEFAULT-NEXT: [[TMP6:%.*]] = load i8, ptr [[__TAGP_ADDR_0_I_I_I]], align 1, !tbaa [[TBAA4]] +// DEFAULT-NEXT: [[CMP_NOT_I_I_I:%.*]] = icmp eq i8 [[TMP6]], 0 +// DEFAULT-NEXT: br i1 [[CMP_NOT_I_I_I]], label [[_ZL3NANPKC_EXIT]], label [[WHILE_BODY_I_I_I:%.*]] +// DEFAULT: while.body.i.i.i: +// DEFAULT-NEXT: [[TMP7:%.*]] = and i8 [[TMP6]], -8 +// DEFAULT-NEXT: [[OR_COND_I_I_I:%.*]] = icmp eq i8 [[TMP7]], 48 +// DEFAULT-NEXT: br i1 [[OR_COND_I_I_I]], label [[IF_THEN_I_I_I:%.*]], label [[CLEANUP_I_I_I]] +// DEFAULT: if.then.i.i.i: +// DEFAULT-NEXT: [[MUL_I_I_I:%.*]] = shl i64 [[__R_0_I_I_I]], 3 +// DEFAULT-NEXT: [[CONV5_I_I_I:%.*]] = zext nneg i8 [[TMP6]] to i64 +// DEFAULT-NEXT: [[ADD_I_I_I:%.*]] = add i64 [[MUL_I_I_I]], -48 +// DEFAULT-NEXT: [[SUB_I_I_I:%.*]] = add i64 [[ADD_I_I_I]], [[CONV5_I_I_I]] +// DEFAULT-NEXT: [[INCDEC_PTR_I_I_I:%.*]] = getelementptr inbounds i8, ptr [[__TAGP_ADDR_0_I_I_I]], i64 1 +// DEFAULT-NEXT: br label [[CLEANUP_I_I_I]] +// DEFAULT: cleanup.i.i.i: +// DEFAULT-NEXT: [[__TAGP_ADDR_1_I_I_I]] = phi ptr [ [[INCDEC_PTR_I_I_I]], [[IF_THEN_I_I_I]] ], [ [[__TAGP_ADDR_0_I_I_I]], [[WHILE_BODY_I_I_I]] ] +// DEFAULT-NEXT: [[__R_1_I_I_I]] = phi i64 [ [[SUB_I_I_I]], [[IF_THEN_I_I_I]] ], [ [[__R_0_I_I_I]], [[WHILE_BODY_I_I_I]] ] +// DEFAULT-NEXT: br i1 [[OR_COND_I_I_I]], label [[WHILE_COND_I_I_I]], label [[_ZL3NANPKC_EXIT]], !llvm.loop [[LOOP7]] +// DEFAULT: while.cond.i14.i.i: +// DEFAULT-NEXT: [[__TAGP_ADDR_0_I15_I_I:%.*]] = phi ptr [ [[__TAGP_ADDR_1_I21_I_I:%.*]], [[CLEANUP_I20_I_I:%.*]] ], [ [[TAG]], [[ENTRY:%.*]] ] +// DEFAULT-NEXT: [[__R_0_I16_I_I:%.*]] = phi i64 [ [[__R_1_I22_I_I:%.*]], [[CLEANUP_I20_I_I]] ], [ 0, [[ENTRY]] ] +// DEFAULT-NEXT: [[TMP8:%.*]] = load i8, ptr [[__TAGP_ADDR_0_I15_I_I]], align 1, !tbaa [[TBAA4]] +// DEFAULT-NEXT: [[CMP_NOT_I17_I_I:%.*]] = icmp eq i8 [[TMP8]], 0 +// DEFAULT-NEXT: br i1 [[CMP_NOT_I17_I_I]], label [[_ZL3NANPKC_EXIT]], label 
[[WHILE_BODY_I18_I_I:%.*]] +// DEFAULT: while.body.i18.i.i: +// DEFAULT-NEXT: [[TMP9:%.*]] = add i8 [[TMP8]], -48 +// DEFAULT-NEXT: [[OR_COND_I19_I_I:%.*]] = icmp ult i8 [[TMP9]], 10 +// DEFAULT-NEXT: br i1 [[OR_COND_I19_I_I]], label [[IF_THEN_I24_I_I:%.*]], label [[CLEANUP_I20_I_I]] +// DEFAULT: if.then.i24.i.i: +// DEFAULT-NEXT: [[MUL_I25_I_I:%.*]] = mul i64 [[__R_0_I16_I_I]], 10 +// DEFAULT-NEXT: [[CONV5_I26_I_I:%.*]] = zext nneg i8 [[TMP8]] to i64 +// DEFAULT-NEXT: [[ADD_I27_I_I:%.*]] = add i64 [[MUL_I25_I_I]], -48 +// DEFAULT-NEXT: [[SUB_I28_I_I:%.*]] = add i64 [[ADD_I27_I_I]], [[CONV5_I26_I_I]] +// DEFAULT-NEXT: [[INCDEC_PTR_I29_I_I:%.*]] = getelementptr inbounds i8, ptr [[__TAGP_ADDR_0_I15_I_I]], i64 1 +// DEFAULT-NEXT: br label [[CLEANUP_I20_I_I]] +// DEFAULT: cleanup.i20.i.i: +// DEFAULT-NEXT: [[__TAGP_ADDR_1_I21_I_I]] = phi ptr [ [[INCDEC_PTR_I29_I_I]], [[IF_THEN_I24_I_I]] ], [ [[__TAGP_ADDR_0_I15_I_I]], [[WHILE_BODY_I18_I_I]] ] +// DEFAULT-NEXT: [[__R_1_I22_I_I]] = phi i64 [ [[SUB_I28_I_I]], [[IF_THEN_I24_I_I]] ], [ [[__R_0_I16_I_I]], [[WHILE_BODY_I18_I_I]] ] +// DEFAULT-NEXT: br i1 [[OR_COND_I19_I_I]], label [[WHILE_COND_I14_I_I]], label [[_ZL3NANPKC_EXIT]], !llvm.loop [[LOOP10]] +// DEFAULT: _ZL3nanPKc.exit: +// DEFAULT-NEXT: [[RETVAL_0_I_I:%.*]] = phi i64 [ 0, [[CLEANUP_I_I_I]] ], [ [[__R_0_I_I_I]], [[WHILE_COND_I_I_I]] ], [ 0, [[CLEANUP_I36_I_I]] ], [ [[__R_0_I32_I_I]], [[WHILE_COND_I30_I_I]] ], [ 0, [[CLEANUP_I20_I_I]] ], [ [[__R_0_I16_I_I]], [[WHILE_COND_I14_I_I]] ] +// DEFAULT-NEXT: [[BF_VALUE_I:%.*]] = and i64 [[RETVAL_0_I_I]], 2251799813685247 +// DEFAULT-NEXT: [[BF_SET9_I:%.*]] = or disjoint i64 [[BF_VALUE_I]], 9221120237041090560 +// DEFAULT-NEXT: [[TMP10:%.*]] = bitcast i64 [[BF_SET9_I]] to double +// DEFAULT-NEXT: ret double [[TMP10]] +// +// FINITEONLY-LABEL: @test_nan( +// FINITEONLY-NEXT: entry: +// FINITEONLY-NEXT: ret double poison +// +// APPROX-LABEL: @test_nan( +// APPROX-NEXT: entry: +// APPROX-NEXT: [[TMP0:%.*]] = load i8, ptr 
[[TAG:%.*]], align 1, !tbaa [[TBAA4]] +// APPROX-NEXT: [[CMP_I_I:%.*]] = icmp eq i8 [[TMP0]], 48 +// APPROX-NEXT: br i1 [[CMP_I_I]], label [[IF_THEN_I_I:%.*]], label [[WHILE_COND_I14_I_I:%.*]] +// APPROX: if.then.i.i: +// APPROX-NEXT: [[INCDEC_PTR_I_I:%.*]] = getelementptr inbounds i8, ptr [[TAG]], i64 1 +// APPROX-NEXT: [[TMP1:%.*]] = load i8, ptr [[INCDEC_PTR_I_I]], align 1, !tbaa [[TBAA4]] +// APPROX-NEXT: switch i8 [[TMP1]], label [[WHILE_COND_I_I_I:%.*]] [ +// APPROX-NEXT: i8 120, label [[WHILE_COND_I30_I_I_PREHEADER:%.*]] +// APPROX-NEXT: i8 88, label [[WHILE_COND_I30_I_I_PREHEADER]] +// APPROX-NEXT: ] +// APPROX: while.cond.i30.i.i.preheader: +// APPROX-NEXT: br label [[WHILE_COND_I30_I_I:%.*]] +// APPROX: while.cond.i30.i.i: +// APPROX-NEXT: [[__TAGP_ADDR_0_I31_I_I:%.*]] = phi ptr [ [[__TAGP_ADDR_1_I37_I_I:%.*]], [[CLEANUP_I36_I_I:%.*]] ], [ [[INCDEC_PTR_I_I]], [[WHILE_COND_I30_I_I_PREHEADER]] ] +// APPROX-NEXT: [[__R_0_I32_I_I:%.*]] = phi i64 [ [[__R_2_I_I_I:%.*]], [[CLEANUP_I36_I_I]] ], [ 0, [[WHILE_COND_I30_I_I_PREHEADER]] ] +// APPROX-NEXT: [[TMP2:%.*]] = load i8, ptr [[__TAGP_ADDR_0_I31_I_I]], align 1, !tbaa [[TBAA4]] +// APPROX-NEXT: [[CMP_NOT_I33_I_I:%.*]] = icmp eq i8 [[TMP2]], 0 +// APPROX-NEXT: br i1 [[CMP_NOT_I33_I_I]], label [[_ZL3NANPKC_EXIT:%.*]], label [[WHILE_BODY_I34_I_I:%.*]] +// APPROX: while.body.i34.i.i: +// APPROX-NEXT: [[TMP3:%.*]] = add i8 [[TMP2]], -48 +// APPROX-NEXT: [[OR_COND_I35_I_I:%.*]] = icmp ult i8 [[TMP3]], 10 +// APPROX-NEXT: br i1 [[OR_COND_I35_I_I]], label [[IF_END31_I_I_I:%.*]], label [[IF_ELSE_I_I_I:%.*]] +// APPROX: if.else.i.i.i: +// APPROX-NEXT: [[TMP4:%.*]] = add i8 [[TMP2]], -97 +// APPROX-NEXT: [[OR_COND33_I_I_I:%.*]] = icmp ult i8 [[TMP4]], 6 +// APPROX-NEXT: br i1 [[OR_COND33_I_I_I]], label [[IF_END31_I_I_I]], label [[IF_ELSE17_I_I_I:%.*]] +// APPROX: if.else17.i.i.i: +// APPROX-NEXT: [[TMP5:%.*]] = add i8 [[TMP2]], -65 +// APPROX-NEXT: [[OR_COND34_I_I_I:%.*]] = icmp ult i8 [[TMP5]], 6 +// APPROX-NEXT: br i1 
[[OR_COND34_I_I_I]], label [[IF_END31_I_I_I]], label [[CLEANUP_I36_I_I]] +// APPROX: if.end31.i.i.i: +// APPROX-NEXT: [[DOTSINK:%.*]] = phi i64 [ -48, [[WHILE_BODY_I34_I_I]] ], [ -87, [[IF_ELSE_I_I_I]] ], [ -55, [[IF_ELSE17_I_I_I]] ] +// APPROX-NEXT: [[MUL24_I_I_I:%.*]] = shl i64 [[__R_0_I32_I_I]], 4 +// APPROX-NEXT: [[CONV25_I_I_I:%.*]] = zext nneg i8 [[TMP2]] to i64 +// APPROX-NEXT: [[ADD26_I_I_I:%.*]] = add i64 [[MUL24_I_I_I]], [[DOTSINK]] +// APPROX-NEXT: [[ADD28_I_I_I:%.*]] = add i64 [[ADD26_I_I_I]], [[CONV25_I_I_I]] +// APPROX-NEXT: [[INCDEC_PTR_I40_I_I:%.*]] = getelementptr inbounds i8, ptr [[__TAGP_ADDR_0_I31_I_I]], i64 1 +// APPROX-NEXT: br label [[CLEANUP_I36_I_I]] +// APPROX: cleanup.i36.i.i: +// APPROX-NEXT: [[__TAGP_ADDR_1_I37_I_I]] = phi ptr [ [[INCDEC_PTR_I40_I_I]], [[IF_END31_I_I_I]] ], [ [[__TAGP_ADDR_0_I31_I_I]], [[IF_ELSE17_I_I_I]] ] +// APPROX-NEXT: [[__R_2_I_I_I]] = phi i64 [ [[ADD28_I_I_I]], [[IF_END31_I_I_I]] ], [ [[__R_0_I32_I_I]], [[IF_ELSE17_I_I_I]] ] +// APPROX-NEXT: [[COND_I_I_I:%.*]] = phi i1 [ true, [[IF_END31_I_I_I]] ], [ false, [[IF_ELSE17_I_I_I]] ] +// APPROX-NEXT: br i1 [[COND_I_I_I]], label [[WHILE_COND_I30_I_I]], label [[_ZL3NANPKC_EXIT]], !llvm.loop [[LOOP11]] +// APPROX: while.cond.i.i.i: +// APPROX-NEXT: [[__TAGP_ADDR_0_I_I_I:%.*]] = phi ptr [ [[__TAGP_ADDR_1_I_I_I:%.*]], [[CLEANUP_I_I_I:%.*]] ], [ [[INCDEC_PTR_I_I]], [[IF_THEN_I_I]] ] +// APPROX-NEXT: [[__R_0_I_I_I:%.*]] = phi i64 [ [[__R_1_I_I_I:%.*]], [[CLEANUP_I_I_I]] ], [ 0, [[IF_THEN_I_I]] ] +// APPROX-NEXT: [[TMP6:%.*]] = load i8, ptr [[__TAGP_ADDR_0_I_I_I]], align 1, !tbaa [[TBAA4]] +// APPROX-NEXT: [[CMP_NOT_I_I_I:%.*]] = icmp eq i8 [[TMP6]], 0 +// APPROX-NEXT: br i1 [[CMP_NOT_I_I_I]], label [[_ZL3NANPKC_EXIT]], label [[WHILE_BODY_I_I_I:%.*]] +// APPROX: while.body.i.i.i: +// APPROX-NEXT: [[TMP7:%.*]] = and i8 [[TMP6]], -8 +// APPROX-NEXT: [[OR_COND_I_I_I:%.*]] = icmp eq i8 [[TMP7]], 48 +// APPROX-NEXT: br i1 [[OR_COND_I_I_I]], label [[IF_THEN_I_I_I:%.*]], label 
[[CLEANUP_I_I_I]] +// APPROX: if.then.i.i.i: +// APPROX-NEXT: [[MUL_I_I_I:%.*]] = shl i64 [[__R_0_I_I_I]], 3 +// APPROX-NEXT: [[CONV5_I_I_I:%.*]] = zext nneg i8 [[TMP6]] to i64 +// APPROX-NEXT: [[ADD_I_I_I:%.*]] = add i64 [[MUL_I_I_I]], -48 +// APPROX-NEXT: [[SUB_I_I_I:%.*]] = add i64 [[ADD_I_I_I]], [[CONV5_I_I_I]] +// APPROX-NEXT: [[INCDEC_PTR_I_I_I:%.*]] = getelementptr inbounds i8, ptr [[__TAGP_ADDR_0_I_I_I]], i64 1 +// APPROX-NEXT: br label [[CLEANUP_I_I_I]] +// APPROX: cleanup.i.i.i: +// APPROX-NEXT: [[__TAGP_ADDR_1_I_I_I]] = phi ptr [ [[INCDEC_PTR_I_I_I]], [[IF_THEN_I_I_I]] ], [ [[__TAGP_ADDR_0_I_I_I]], [[WHILE_BODY_I_I_I]] ] +// APPROX-NEXT: [[__R_1_I_I_I]] = phi i64 [ [[SUB_I_I_I]], [[IF_THEN_I_I_I]] ], [ [[__R_0_I_I_I]], [[WHILE_BODY_I_I_I]] ] +// APPROX-NEXT: br i1 [[OR_COND_I_I_I]], label [[WHILE_COND_I_I_I]], label [[_ZL3NANPKC_EXIT]], !llvm.loop [[LOOP7]] +// APPROX: while.cond.i14.i.i: +// APPROX-NEXT: [[__TAGP_ADDR_0_I15_I_I:%.*]] = phi ptr [ [[__TAGP_ADDR_1_I21_I_I:%.*]], [[CLEANUP_I20_I_I:%.*]] ], [ [[TAG]], [[ENTRY:%.*]] ] +// APPROX-NEXT: [[__R_0_I16_I_I:%.*]] = phi i64 [ [[__R_1_I22_I_I:%.*]], [[CLEANUP_I20_I_I]] ], [ 0, [[ENTRY]] ] +// APPROX-NEXT: [[TMP8:%.*]] = load i8, ptr [[__TAGP_ADDR_0_I15_I_I]], align 1, !tbaa [[TBAA4]] +// APPROX-NEXT: [[CMP_NOT_I17_I_I:%.*]] = icmp eq i8 [[TMP8]], 0 +// APPROX-NEXT: br i1 [[CMP_NOT_I17_I_I]], label [[_ZL3NANPKC_EXIT]], label [[WHILE_BODY_I18_I_I:%.*]] +// APPROX: while.body.i18.i.i: +// APPROX-NEXT: [[TMP9:%.*]] = add i8 [[TMP8]], -48 +// APPROX-NEXT: [[OR_COND_I19_I_I:%.*]] = icmp ult i8 [[TMP9]], 10 +// APPROX-NEXT: br i1 [[OR_COND_I19_I_I]], label [[IF_THEN_I24_I_I:%.*]], label [[CLEANUP_I20_I_I]] +// APPROX: if.then.i24.i.i: +// APPROX-NEXT: [[MUL_I25_I_I:%.*]] = mul i64 [[__R_0_I16_I_I]], 10 +// APPROX-NEXT: [[CONV5_I26_I_I:%.*]] = zext nneg i8 [[TMP8]] to i64 +// APPROX-NEXT: [[ADD_I27_I_I:%.*]] = add i64 [[MUL_I25_I_I]], -48 +// APPROX-NEXT: [[SUB_I28_I_I:%.*]] = add i64 [[ADD_I27_I_I]], 
[[CONV5_I26_I_I]] +// APPROX-NEXT: [[INCDEC_PTR_I29_I_I:%.*]] = getelementptr inbounds i8, ptr [[__TAGP_ADDR_0_I15_I_I]], i64 1 +// APPROX-NEXT: br label [[CLEANUP_I20_I_I]] +// APPROX: cleanup.i20.i.i: +// APPROX-NEXT: [[__TAGP_ADDR_1_I21_I_I]] = phi ptr [ [[INCDEC_PTR_I29_I_I]], [[IF_THEN_I24_I_I]] ], [ [[__TAGP_ADDR_0_I15_I_I]], [[WHILE_BODY_I18_I_I]] ] +// APPROX-NEXT: [[__R_1_I22_I_I]] = phi i64 [ [[SUB_I28_I_I]], [[IF_THEN_I24_I_I]] ], [ [[__R_0_I16_I_I]], [[WHILE_BODY_I18_I_I]] ] +// APPROX-NEXT: br i1 [[OR_COND_I19_I_I]], label [[WHILE_COND_I14_I_I]], label [[_ZL3NANPKC_EXIT]], !llvm.loop [[LOOP10]] +// APPROX: _ZL3nanPKc.exit: +// APPROX-NEXT: [[RETVAL_0_I_I:%.*]] = phi i64 [ 0, [[CLEANUP_I_I_I]] ], [ [[__R_0_I_I_I]], [[WHILE_COND_I_I_I]] ], [ 0, [[CLEANUP_I36_I_I]] ], [ [[__R_0_I32_I_I]], [[WHILE_COND_I30_I_I]] ], [ 0, [[CLEANUP_I20_I_I]] ], [ [[__R_0_I16_I_I]], [[WHILE_COND_I14_I_I]] ] +// APPROX-NEXT: [[BF_VALUE_I:%.*]] = and i64 [[RETVAL_0_I_I]], 2251799813685247 +// APPROX-NEXT: [[BF_SET9_I:%.*]] = or disjoint i64 [[BF_VALUE_I]], 9221120237041090560 +// APPROX-NEXT: [[TMP10:%.*]] = bitcast i64 [[BF_SET9_I]] to double +// APPROX-NEXT: ret double [[TMP10]] // extern "C" __device__ double test_nan(const char *tag) { return nan(tag); diff --git a/clang/tools/scan-build-py/CMakeLists.txt b/clang/tools/scan-build-py/CMakeLists.txt index 3aca22c0b0a8d..9273eb5ed977e 100644 --- a/clang/tools/scan-build-py/CMakeLists.txt +++ b/clang/tools/scan-build-py/CMakeLists.txt @@ -88,7 +88,7 @@ foreach(lib ${LibScanbuild}) DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/lib/libscanbuild/${lib}) list(APPEND Depends ${CMAKE_BINARY_DIR}/lib/libscanbuild/${lib}) install(FILES lib/libscanbuild/${lib} - DESTINATION lib${CLANG_LIBDIR_SUFFIX}/libscanbuild + DESTINATION lib/libscanbuild COMPONENT scan-build-py) endforeach() @@ -106,7 +106,7 @@ foreach(resource ${LibScanbuildResources}) DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/lib/libscanbuild/resources/${resource}) list(APPEND Depends 
${CMAKE_BINARY_DIR}/lib/libscanbuild/resources/${resource}) install(FILES lib/libscanbuild/resources/${resource} - DESTINATION lib${CLANG_LIBDIR_SUFFIX}/libscanbuild/resources + DESTINATION lib/libscanbuild/resources COMPONENT scan-build-py) endforeach() @@ -122,7 +122,7 @@ foreach(lib ${LibEar}) DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/lib/libear/${lib}) list(APPEND Depends ${CMAKE_BINARY_DIR}/lib/libear/${lib}) install(FILES lib/libear/${lib} - DESTINATION lib${CLANG_LIBDIR_SUFFIX}/libear + DESTINATION lib/libear COMPONENT scan-build-py) endforeach() diff --git a/compiler-rt/lib/builtins/CMakeLists.txt b/compiler-rt/lib/builtins/CMakeLists.txt index 13adbd6c4d57d..2c3b0fa84a478 100644 --- a/compiler-rt/lib/builtins/CMakeLists.txt +++ b/compiler-rt/lib/builtins/CMakeLists.txt @@ -868,10 +868,12 @@ else () endif() endif() endif() - check_c_source_compiles("_Float16 foo(_Float16 x) { return x; }" + check_c_source_compiles("_Float16 foo(_Float16 x) { return x; } + int main(void) { return 0; }" COMPILER_RT_HAS_${arch}_FLOAT16) append_list_if(COMPILER_RT_HAS_${arch}_FLOAT16 -DCOMPILER_RT_HAS_FLOAT16 BUILTIN_CFLAGS_${arch}) - check_c_source_compiles("__bf16 foo(__bf16 x) { return x; }" + check_c_source_compiles("__bf16 foo(__bf16 x) { return x; } + int main(void) { return 0; }" COMPILER_RT_HAS_${arch}_BFLOAT16) # Build BF16 files only when "__bf16" is available. 
if(COMPILER_RT_HAS_${arch}_BFLOAT16) diff --git a/compiler-rt/lib/rtsan/rtsan.cpp b/compiler-rt/lib/rtsan/rtsan.cpp index 8a7ff03c611c6..b2c4616b5fd0d 100644 --- a/compiler-rt/lib/rtsan/rtsan.cpp +++ b/compiler-rt/lib/rtsan/rtsan.cpp @@ -58,11 +58,11 @@ SANITIZER_INTERFACE_ATTRIBUTE void __rtsan_realtime_exit() { __rtsan::GetContextForThisThread().RealtimePop(); } -SANITIZER_INTERFACE_ATTRIBUTE void __rtsan_off() { +SANITIZER_INTERFACE_ATTRIBUTE void __rtsan_disable() { __rtsan::GetContextForThisThread().BypassPush(); } -SANITIZER_INTERFACE_ATTRIBUTE void __rtsan_on() { +SANITIZER_INTERFACE_ATTRIBUTE void __rtsan_enable() { __rtsan::GetContextForThisThread().BypassPop(); } diff --git a/compiler-rt/lib/rtsan/rtsan.h b/compiler-rt/lib/rtsan/rtsan.h index 3d665c98aed18..ae23609f97d2d 100644 --- a/compiler-rt/lib/rtsan/rtsan.h +++ b/compiler-rt/lib/rtsan/rtsan.h @@ -38,11 +38,11 @@ SANITIZER_INTERFACE_ATTRIBUTE void __rtsan_realtime_exit(); // Disable all RTSan error reporting. // Injected into the code if "nosanitize(realtime)" is on a function. -SANITIZER_INTERFACE_ATTRIBUTE void __rtsan_off(); +SANITIZER_INTERFACE_ATTRIBUTE void __rtsan_disable(); // Re-enable all RTSan error reporting. -// The counterpart to `__rtsan_off`. -SANITIZER_INTERFACE_ATTRIBUTE void __rtsan_on(); +// The counterpart to `__rtsan_disable`. 
+SANITIZER_INTERFACE_ATTRIBUTE void __rtsan_enable(); SANITIZER_INTERFACE_ATTRIBUTE void __rtsan_expect_not_realtime(const char *intercepted_function_name); diff --git a/compiler-rt/lib/rtsan/tests/rtsan_test_functional.cpp b/compiler-rt/lib/rtsan/tests/rtsan_test_functional.cpp index 6e7ab016a4c6b..5a86957170dce 100644 --- a/compiler-rt/lib/rtsan/tests/rtsan_test_functional.cpp +++ b/compiler-rt/lib/rtsan/tests/rtsan_test_functional.cpp @@ -204,10 +204,10 @@ TEST(TestRtsan, ThrowingAnExceptionDiesWhenRealtime) { TEST(TestRtsan, DoesNotDieIfTurnedOff) { std::mutex mutex; auto RealtimeUnsafeFunc = [&]() { - __rtsan_off(); + __rtsan_disable(); mutex.lock(); mutex.unlock(); - __rtsan_on(); + __rtsan_enable(); }; RealtimeInvoke(RealtimeUnsafeFunc); } diff --git a/flang/lib/Evaluate/intrinsics-library.cpp b/flang/lib/Evaluate/intrinsics-library.cpp index 65636b9956e78..ed28d8130808f 100644 --- a/flang/lib/Evaluate/intrinsics-library.cpp +++ b/flang/lib/Evaluate/intrinsics-library.cpp @@ -255,6 +255,25 @@ struct HostRuntimeLibrary { static constexpr HostRuntimeMap map{table}; static_assert(map.Verify(), "map must be sorted"); }; + +// Helpers to map complex std::pow whose resolution in F2{std::pow} is +// ambiguous as of clang++ 20. 
+template <typename HostT> +static std::complex<HostT> StdPowF2( + const std::complex<HostT> &x, const std::complex<HostT> &y) { + return std::pow(x, y); +} +template <typename HostT> +static std::complex<HostT> StdPowF2A( + const HostT &x, const std::complex<HostT> &y) { + return std::pow(x, y); +} +template <typename HostT> +static std::complex<HostT> StdPowF2B( + const std::complex<HostT> &x, const HostT &y) { + return std::pow(x, y); +} + template <typename HostT> struct HostRuntimeLibrary<std::complex<HostT>, LibraryVersion::Libm> { using F = FuncPointer<std::complex<HostT>, const std::complex<HostT> &>; @@ -275,9 +294,9 @@ struct HostRuntimeLibrary<std::complex<HostT>, LibraryVersion::Libm> { FolderFactory<F, F{std::cosh}>::Create("cosh"), FolderFactory<F, F{std::exp}>::Create("exp"), FolderFactory<F, F{std::log}>::Create("log"), - FolderFactory<F2, F2{std::pow}>::Create("pow"), - FolderFactory<F2A, F2A{std::pow}>::Create("pow"), - FolderFactory<F2B, F2B{std::pow}>::Create("pow"), + FolderFactory<F2, F2{StdPowF2<HostT>}>::Create("pow"), + FolderFactory<F2A, F2A{StdPowF2A<HostT>}>::Create("pow"), + FolderFactory<F2B, F2B{StdPowF2B<HostT>}>::Create("pow"), FolderFactory<F, F{std::sin}>::Create("sin"), FolderFactory<F, F{std::sinh}>::Create("sinh"), FolderFactory<F, F{std::sqrt}>::Create("sqrt"), diff --git a/flang/lib/Lower/Bridge.cpp b/flang/lib/Lower/Bridge.cpp index 90943fa92493c..e5ccf659c3f8e 100644 --- a/flang/lib/Lower/Bridge.cpp +++ b/flang/lib/Lower/Bridge.cpp @@ -2349,8 +2349,11 @@ class FirConverter : public Fortran::lower::AbstractConverter { fir::IfOp topIfOp, currentIfOp; for (Fortran::lower::pft::Evaluation &e : eval.getNestedEvaluations()) { auto genIfOp = [&](mlir::Value cond) { - auto ifOp = - builder->create<fir::IfOp>(toLocation(), cond, /*withElse=*/true); + Fortran::lower::pft::Evaluation &succ = *e.controlSuccessor; + bool hasElse = succ.isA<Fortran::parser::ElseIfStmt>() || + succ.isA<Fortran::parser::ElseStmt>(); + auto ifOp = builder->create<fir::IfOp>(toLocation(), cond, + /*withElseRegion=*/hasElse); builder->setInsertionPointToStart(&ifOp.getThenRegion().front()); return ifOp; }; diff --git a/flang/test/HLFIR/assumed_shape_with_value_keyword.f90 b/flang/test/HLFIR/assumed_shape_with_value_keyword.f90 index 197efc08422c6..208f22badda28 100644 --- a/flang/test/HLFIR/assumed_shape_with_value_keyword.f90 +++ b/flang/test/HLFIR/assumed_shape_with_value_keyword.f90 @@ -102,7 +102,6 @@ subroutine test_optional1(x) !
CHECK: %[[VAL_3:.*]] = fir.box_addr %[[VAL_2]]#0 : (!fir.box>) -> !fir.ref> ! CHECK: fir.call @_QPinternal_call7(%[[VAL_3]]) fastmath : (!fir.ref>) -> () ! CHECK: hlfir.copy_out %[[TMP_BOX]], %[[VAL_2]]#1 to %[[VAL_0]]#0 : (!fir.ref>>>, i1, !fir.box>) -> () -! CHECK: } else { ! CHECK: } ! CHECK: return ! CHECK: } @@ -122,7 +121,6 @@ subroutine test_optional2(x) ! CHECK: %[[VAL_3:.*]] = fir.box_addr %[[VAL_2]]#0 : (!fir.box>) -> !fir.ref> ! CHECK: fir.call @_QPinternal_call8(%[[VAL_3]]) fastmath : (!fir.ref>) -> () ! CHECK: hlfir.copy_out %[[TMP_BOX]], %[[VAL_2]]#1 to %[[VAL_0]]#0 : (!fir.ref>>>, i1, !fir.box>) -> () -! CHECK: } else { ! CHECK: } ! CHECK: return ! CHECK: } diff --git a/flang/test/Lower/HLFIR/select-rank.f90 b/flang/test/Lower/HLFIR/select-rank.f90 index 211b7565bab8a..d27a6d732ffc7 100644 --- a/flang/test/Lower/HLFIR/select-rank.f90 +++ b/flang/test/Lower/HLFIR/select-rank.f90 @@ -796,7 +796,6 @@ subroutine test_branching(x) ! CHECK: %[[VAL_10:.*]] = arith.xori %[[VAL_8]], %[[VAL_9]] : i1 ! CHECK: fir.if %[[VAL_10]] { ! CHECK: fir.call @_QPone() fastmath : () -> () -! CHECK: } else { ! CHECK: } ! CHECK: fir.call @_QPrdefault(%[[VAL_6]]#0) fastmath : (!fir.box>) -> () ! CHECK: cf.br ^bb7 diff --git a/flang/test/Lower/Intrinsics/system_clock.f90 b/flang/test/Lower/Intrinsics/system_clock.f90 index ca36920c04eb3..9eae3a58884fa 100644 --- a/flang/test/Lower/Intrinsics/system_clock.f90 +++ b/flang/test/Lower/Intrinsics/system_clock.f90 @@ -104,7 +104,6 @@ subroutine ss(count) ! CHECK: fir.if %[[V_17]] { ! CHECK: %[[C_0:c[0-9a-z_]+]] = arith.constant 0 : i64 ! CHECK: fir.store %[[C_0]] to %arg0 : !fir.ref - ! CHECK: } else { ! CHECK: } ! CHECK: %[[V_18:[0-9]+]] = fir.zero_bits !fir.ptr ! CHECK: fir.store %[[V_18]] to %[[V_4]] : !fir.ref> @@ -137,7 +136,6 @@ subroutine ss(count) ! CHECK: %[[V_32]] = fir.load %arg0 : !fir.ref ! CHECK: %[[V_33]] = fir.call @_FortranAioOutputInteger64(%[[V_31]], %[[V_32]]) {{.*}}: (!fir.ref, i64) -> i1 ! 
CHECK: %[[V_34]] = fir.call @_FortranAioEndIoStatement(%[[V_31]]) {{.*}}: (!fir.ref) -> i32 - ! CHECK: } else { ! CHECK: } ! CHECK: return ! CHECK: } diff --git a/flang/test/Lower/OpenMP/master.f90 b/flang/test/Lower/OpenMP/master.f90 index 7db1be4f005b5..9f98ac89fb1fd 100644 --- a/flang/test/Lower/OpenMP/master.f90 +++ b/flang/test/Lower/OpenMP/master.f90 @@ -91,7 +91,7 @@ subroutine omp_master_parallel() !CHECK: hlfir.assign %{{.*}} to %{{.*}}#0 : i32, !fir.ref beta = alpha + gama end if - !CHECK: else + !CHECK: } !CHECK: omp.terminator !$omp end master diff --git a/flang/test/Lower/OpenMP/unstructured.f90 b/flang/test/Lower/OpenMP/unstructured.f90 index 9c3527eda5bb4..bd030b918033e 100644 --- a/flang/test/Lower/OpenMP/unstructured.f90 +++ b/flang/test/Lower/OpenMP/unstructured.f90 @@ -141,7 +141,6 @@ subroutine ss3(n) ! nested unstructured OpenMP constructs ! CHECK: @_FortranAioBeginExternalListOutput ! CHECK: %[[LOAD:.*]] = fir.load %[[OMP_LOOP_J_DECL]]#0 : !fir.ref ! CHECK: @_FortranAioOutputInteger32(%{{.*}}, %[[LOAD]]) -! CHECK: } else { ! CHECK: } ! CHECK-NEXT: omp.yield ! CHECK-NEXT: } diff --git a/flang/test/Lower/OpenMP/wsloop-reduction-max-byref.f90 b/flang/test/Lower/OpenMP/wsloop-reduction-max-byref.f90 index 7e4890dd00fea..56a43abca42a7 100644 --- a/flang/test/Lower/OpenMP/wsloop-reduction-max-byref.f90 +++ b/flang/test/Lower/OpenMP/wsloop-reduction-max-byref.f90 @@ -118,7 +118,6 @@ ! CHECK: %[[VAL_46:.*]] = hlfir.designate %[[VAL_5]]#0 (%[[VAL_45]]) : (!fir.box>, i64) -> !fir.ref ! CHECK: %[[VAL_47:.*]] = fir.load %[[VAL_46]] : !fir.ref ! CHECK: hlfir.assign %[[VAL_47]] to %[[VAL_37]]#0 : f32, !fir.ref -! CHECK: } else { ! CHECK: } ! CHECK: omp.yield ! 
CHECK: omp.terminator diff --git a/flang/test/Lower/OpenMP/wsloop-reduction-max.f90 b/flang/test/Lower/OpenMP/wsloop-reduction-max.f90 index 9a93c75f5bd1a..775554fd3dcca 100644 --- a/flang/test/Lower/OpenMP/wsloop-reduction-max.f90 +++ b/flang/test/Lower/OpenMP/wsloop-reduction-max.f90 @@ -108,7 +108,6 @@ ! CHECK: %[[VAL_46:.*]] = hlfir.designate %[[VAL_5]]#0 (%[[VAL_45]]) : (!fir.box>, i64) -> !fir.ref ! CHECK: %[[VAL_47:.*]] = fir.load %[[VAL_46]] : !fir.ref ! CHECK: hlfir.assign %[[VAL_47]] to %[[VAL_37]]#0 : f32, !fir.ref -! CHECK: } else { ! CHECK: } ! CHECK: omp.yield ! CHECK: omp.terminator diff --git a/flang/test/Lower/OpenMP/wsloop-reduction-min-byref.f90 b/flang/test/Lower/OpenMP/wsloop-reduction-min-byref.f90 index 41fcc979cdc9d..d16de4a867a24 100644 --- a/flang/test/Lower/OpenMP/wsloop-reduction-min-byref.f90 +++ b/flang/test/Lower/OpenMP/wsloop-reduction-min-byref.f90 @@ -120,7 +120,6 @@ ! CHECK: %[[VAL_46:.*]] = hlfir.designate %[[VAL_5]]#0 (%[[VAL_45]]) : (!fir.box>, i64) -> !fir.ref ! CHECK: %[[VAL_47:.*]] = fir.load %[[VAL_46]] : !fir.ref ! CHECK: hlfir.assign %[[VAL_47]] to %[[VAL_37]]#0 : f32, !fir.ref -! CHECK: } else { ! CHECK: } ! CHECK: omp.yield ! CHECK: omp.terminator diff --git a/flang/test/Lower/OpenMP/wsloop-reduction-min.f90 b/flang/test/Lower/OpenMP/wsloop-reduction-min.f90 index 50b2db9463d23..04957c7287eae 100644 --- a/flang/test/Lower/OpenMP/wsloop-reduction-min.f90 +++ b/flang/test/Lower/OpenMP/wsloop-reduction-min.f90 @@ -110,7 +110,6 @@ ! CHECK: %[[VAL_46:.*]] = hlfir.designate %[[VAL_5]]#0 (%[[VAL_45]]) : (!fir.box>, i64) -> !fir.ref ! CHECK: %[[VAL_47:.*]] = fir.load %[[VAL_46]] : !fir.ref ! CHECK: hlfir.assign %[[VAL_47]] to %[[VAL_37]]#0 : f32, !fir.ref -! CHECK: } else { ! CHECK: } ! CHECK: omp.yield ! 
CHECK: omp.terminator diff --git a/flang/test/Lower/OpenMP/wsloop-variable.f90 b/flang/test/Lower/OpenMP/wsloop-variable.f90 index dc2acf881f482..7bfb9274f389a 100644 --- a/flang/test/Lower/OpenMP/wsloop-variable.f90 +++ b/flang/test/Lower/OpenMP/wsloop-variable.f90 @@ -190,7 +190,6 @@ subroutine wsloop_variable_sub !CHECK: %[[VAL_56:.*]] = fir.load %[[VAL_19]]#0 : !fir.ref !CHECK: %[[VAL_57:.*]] = arith.cmpi eq, %[[VAL_55]], %[[VAL_56]] : i8 !CHECK: fir.if %[[VAL_57]] { -!CHECK: } else { !CHECK: } !CHECK: omp.yield !CHECK: } diff --git a/libcxx/docs/Status/Cxx20Issues.csv b/libcxx/docs/Status/Cxx20Issues.csv index 9c65ff9a53640..e5d2498473ecd 100644 --- a/libcxx/docs/Status/Cxx20Issues.csv +++ b/libcxx/docs/Status/Cxx20Issues.csv @@ -172,7 +172,7 @@ "`LWG3221 `__","Result of ``year_month``\ arithmetic with ``months``\ is ambiguous","2019-11 (Belfast)","|Complete|","8.0","" "`LWG3235 `__","``parse``\ manipulator without abbreviation is not callable","2019-11 (Belfast)","","","" "`LWG3246 `__","LWG3246: What are the constraints on the template parameter of `basic_format_arg`?","2019-11 (Belfast)","|Nothing To Do|","","" -"`LWG3253 `__","``basic_syncbuf::basic_syncbuf()``\ should not be explicit","2019-11 (Belfast)","","","" +"`LWG3253 `__","``basic_syncbuf::basic_syncbuf()``\ should not be explicit","2019-11 (Belfast)","|Complete|","20.0","" "`LWG3245 `__","Unnecessary restriction on ``'%p'``\ parse specifier","2019-11 (Belfast)","","","" "`LWG3244 `__","Constraints for ``Source``\ in |sect|\ [fs.path.req] insufficiently constrainty","2019-11 (Belfast)","","","" "`LWG3241 `__","``chrono-spec``\ grammar ambiguity in |sect|\ [time.format]","2019-11 (Belfast)","|Complete|","16.0","" diff --git a/libcxx/include/syncstream b/libcxx/include/syncstream index e6f35b6f428ed..a0617f4acf5b6 100644 --- a/libcxx/include/syncstream +++ b/libcxx/include/syncstream @@ -46,7 +46,9 @@ namespace std { using streambuf_type = basic_streambuf; // [syncstream.syncbuf.cons], construction 
and destruction - explicit basic_syncbuf(streambuf_type* obuf = nullptr) + basic_syncbuf() + : basic_syncbuf(nullptr) {} + explicit basic_syncbuf(streambuf_type* obuf) + : basic_syncbuf(obuf, Allocator()) {} basic_syncbuf(streambuf_type*, const Allocator&); basic_syncbuf(basic_syncbuf&&); @@ -253,7 +255,10 @@ public: // [syncstream.syncbuf.cons], construction and destruction - _LIBCPP_HIDE_FROM_ABI explicit basic_syncbuf(streambuf_type* __obuf = nullptr) + _LIBCPP_HIDE_FROM_ABI basic_syncbuf() + : basic_syncbuf(nullptr) {} + + _LIBCPP_HIDE_FROM_ABI explicit basic_syncbuf(streambuf_type* __obuf) + : basic_syncbuf(__obuf, _Allocator()) {} _LIBCPP_HIDE_FROM_ABI basic_syncbuf(streambuf_type* __obuf, _Allocator const& __alloc) diff --git a/libcxx/test/std/input.output/syncstream/syncbuf/syncstream.syncbuf.cons/cons.default.pass.cpp b/libcxx/test/std/input.output/syncstream/syncbuf/syncstream.syncbuf.cons/cons.default.pass.cpp index aa0eb2d41e0f0..beebc36c76758 100644 --- a/libcxx/test/std/input.output/syncstream/syncbuf/syncstream.syncbuf.cons/cons.default.pass.cpp +++ b/libcxx/test/std/input.output/syncstream/syncbuf/syncstream.syncbuf.cons/cons.default.pass.cpp @@ -25,8 +25,15 @@ #include "constexpr_char_traits.h" #include "test_allocator.h" +template <class CharT> +std::basic_syncbuf<CharT> lwg3253_default_constructor_is_not_explicit() { + return {}; +} + template <class CharT, class CharTraits, class Allocator> void test() { + lwg3253_default_constructor_is_not_explicit<CharT>(); + { using Buf = std::basic_syncbuf<CharT, CharTraits, Allocator>; static_assert(std::default_initializable<Buf>); diff --git a/libcxx/test/std/language.support/cmp/cmp.alg/strong_order_long_double.verify.cpp b/libcxx/test/std/language.support/cmp/cmp.alg/strong_order_long_double.verify.cpp index c9c2ba2002149..cd032d4864895 100644 --- a/libcxx/test/std/language.support/cmp/cmp.alg/strong_order_long_double.verify.cpp +++ b/libcxx/test/std/language.support/cmp/cmp.alg/strong_order_long_double.verify.cpp @@ -8,21 +8,6 @@ // UNSUPPORTED: c++03, c++11, c++14, c++17 -// The following platforms have sizeof(long
double) == sizeof(double), so this test doesn't apply to them. -// This test does apply to aarch64 where Arm's AAPCS64 is followed. There they are different sizes. -// XFAIL: target={{arm64|arm64e|armv(7|8)(l|m)?|powerpc|powerpc64}}-{{.+}} - -// MSVC configurations have long double equal to regular double on all -// architectures. -// XFAIL: target={{.+}}-pc-windows-msvc - -// ARM/AArch64 MinGW also has got long double equal to regular double, just -// like MSVC (thus match both MinGW and MSVC here, for those architectures). -// XFAIL: target={{aarch64|armv7}}-{{.*}}-windows-{{.+}} - -// Android's 32-bit x86 target has long double equal to regular double. -// XFAIL: target=i686-{{.+}}-android{{.*}} - // // template constexpr strong_ordering strong_order(const T& a, const T& b); @@ -37,5 +22,9 @@ void f() { long double ld = 3.14; +#ifdef TEST_LONG_DOUBLE_IS_DOUBLE + (void)ld; // expected-no-diagnostics +#else (void)std::strong_order(ld, ld); // expected-error@*:* {{std::strong_order is unimplemented for this floating-point type}} +#endif } diff --git a/libcxx/test/std/numerics/bit/bit.cast/bit_cast.pass.cpp b/libcxx/test/std/numerics/bit/bit.cast/bit_cast.pass.cpp index f73877416a717..044589298439c 100644 --- a/libcxx/test/std/numerics/bit/bit.cast/bit_cast.pass.cpp +++ b/libcxx/test/std/numerics/bit/bit.cast/bit_cast.pass.cpp @@ -229,7 +229,7 @@ bool tests() { test_roundtrip_through_nested_T(i); test_roundtrip_through_buffer(i); -#if __SIZEOF_LONG_DOUBLE__ == __SIZEOF_DOUBLE__ +#ifdef TEST_LONG_DOUBLE_IS_DOUBLE test_roundtrip_through(i); #endif #if defined(__SIZEOF_INT128__) && __SIZEOF_LONG_DOUBLE__ == __SIZEOF_INT128__ && \ diff --git a/libcxx/test/support/test_macros.h b/libcxx/test/support/test_macros.h index 6f7ec3aa0c1f9..5d4c1a65cfafb 100644 --- a/libcxx/test/support/test_macros.h +++ b/libcxx/test/support/test_macros.h @@ -511,4 +511,8 @@ inline Tp const& DoNotOptimize(Tp const& value) { # define TEST_CONSTEXPR_OPERATOR_NEW #endif +#if 
__SIZEOF_LONG_DOUBLE__ == __SIZEOF_DOUBLE__ +# define TEST_LONG_DOUBLE_IS_DOUBLE +#endif + #endif // SUPPORT_TEST_MACROS_HPP diff --git a/lld/test/ELF/avr-reloc.s b/lld/test/ELF/avr-reloc.s index ec088eaa149d0..41c32580f63a1 100644 --- a/lld/test/ELF/avr-reloc.s +++ b/lld/test/ELF/avr-reloc.s @@ -76,32 +76,6 @@ adiw r24, b ; R_AVR_6_ADIW in r20, b ; R_AVR_PORT6 sbic b, 1 ; R_AVR_PORT5 -.section .PCREL,"ax",@progbits -; CHECK-LABEL: section .PCREL -; CHECK: rjmp .+30 -; CHECK-NEXT: rjmp .-36 -; CHECK-NEXT: breq .+26 -; CHECK-NEXT: breq .-40 -; CHECK-NEXT: rjmp .-4096 -; CHECK-NEXT: rjmp .+4094 -; CHECK-NEXT: rjmp .+4094 -; CHECK-NEXT: rjmp .-4096 -; CHECK-NEXT: breq .-128 -; CHECK-NEXT: breq .+126 -; HEX-LABEL: section .PCREL: -; HEX-NEXT: 0fc0eecf 69f061f3 -foo: -rjmp foo + 32 ; R_AVR_13_PCREL -rjmp foo - 32 ; R_AVR_13_PCREL -breq foo + 32 ; R_AVR_7_PCREL -breq foo - 32 ; R_AVR_7_PCREL -rjmp 1f - 4096 $ 1: ; R_AVR_13_PCREL -rjmp 1f + 4094 $ 1: ; R_AVR_13_PCREL -rjmp 1f - 4098 $ 1: ; R_AVR_13_PCREL (overflow) -rjmp 1f + 4096 $ 1: ; R_AVR_13_PCREL (overflow) -breq 1f - 128 $ 1: ; R_AVR_7_PCREL -breq 1f + 126 $ 1: ; R_AVR_7_PCREL - .section .LDSSTS,"ax",@progbits ; CHECK-LABEL: section .LDSSTS: ; CHECK: lds r20, 0x1e diff --git a/lldb/include/lldb/Core/SourceManager.h b/lldb/include/lldb/Core/SourceManager.h index 5239ac6f4055f..8feeb4347dd52 100644 --- a/lldb/include/lldb/Core/SourceManager.h +++ b/lldb/include/lldb/Core/SourceManager.h @@ -37,8 +37,8 @@ class SourceManager { const SourceManager::File &rhs); public: - File(const FileSpec &file_spec, lldb::TargetSP target_sp); - File(const FileSpec &file_spec, lldb::DebuggerSP debugger_sp); + File(lldb::SupportFileSP support_file_sp, lldb::TargetSP target_sp); + File(lldb::SupportFileSP support_file_sp, lldb::DebuggerSP debugger_sp); bool ModificationTimeIsStale() const; bool PathRemappingIsStale() const; @@ -56,7 +56,10 @@ class SourceManager { bool LineIsValid(uint32_t line); - const FileSpec &GetFileSpec() { return 
m_file_spec; } + lldb::SupportFileSP GetSupportFile() const { + assert(m_support_file_sp && "SupportFileSP must always be valid"); + return m_support_file_sp; + } uint32_t GetSourceMapModificationID() const { return m_source_map_mod_id; } @@ -70,15 +73,13 @@ class SourceManager { protected: /// Set file and update modification time. - void SetFileSpec(FileSpec file_spec); + void SetSupportFile(lldb::SupportFileSP support_file_sp); bool CalculateLineOffsets(uint32_t line = UINT32_MAX); - FileSpec m_file_spec_orig; // The original file spec that was used (can be - // different from m_file_spec) - FileSpec m_file_spec; // The actually file spec being used (if the target - // has source mappings, this might be different from - // m_file_spec_orig) + /// The support file. If the target has source mappings, this might be + /// different from the original support file passed to the constructor. + lldb::SupportFileSP m_support_file_sp; // Keep the modification time that this file data is valid for llvm::sys::TimePoint<> m_mod_time; @@ -93,7 +94,8 @@ class SourceManager { lldb::TargetWP m_target_wp; private: - void CommonInitializer(const FileSpec &file_spec, lldb::TargetSP target_sp); + void CommonInitializer(lldb::SupportFileSP support_file_sp, + lldb::TargetSP target_sp); }; typedef std::shared_ptr FileSP; diff --git a/lldb/include/lldb/Utility/SupportFile.h b/lldb/include/lldb/Utility/SupportFile.h index 334a0aaac2c27..6a091bb84ada3 100644 --- a/lldb/include/lldb/Utility/SupportFile.h +++ b/lldb/include/lldb/Utility/SupportFile.h @@ -14,10 +14,10 @@ namespace lldb_private { -/// Wraps either a FileSpec that represents a local file or a source -/// file whose contents is known (for example because it can be -/// reconstructed from debug info), but that hasn't been written to a -/// file yet. This also stores an optional checksum of the on-disk content. +/// Wraps a FileSpec and an optional Checksum. 
The FileSpec represents either a +/// path to a file or a source file whose contents is known (for example because +/// it can be reconstructed from debug info), but that hasn't been written to a +/// file yet. class SupportFile { public: SupportFile() : m_file_spec(), m_checksum() {} diff --git a/lldb/source/Commands/CommandObjectSource.cpp b/lldb/source/Commands/CommandObjectSource.cpp index 5ddd46ac5fdc0..1a0629c6765d4 100644 --- a/lldb/source/Commands/CommandObjectSource.cpp +++ b/lldb/source/Commands/CommandObjectSource.cpp @@ -1076,8 +1076,8 @@ class CommandObjectSourceList : public CommandObjectParsed { target.GetSourceManager().GetLastFile()); if (last_file_sp) { const bool show_inlines = true; - m_breakpoint_locations.Reset(last_file_sp->GetFileSpec(), 0, - show_inlines); + m_breakpoint_locations.Reset( + last_file_sp->GetSupportFile()->GetSpecOnly(), 0, show_inlines); SearchFilterForUnconstrainedSearches target_search_filter( target.shared_from_this()); target_search_filter.Search(m_breakpoint_locations); diff --git a/lldb/source/Core/IOHandlerCursesGUI.cpp b/lldb/source/Core/IOHandlerCursesGUI.cpp index d922d32f91058..8f44e3d0cd016 100644 --- a/lldb/source/Core/IOHandlerCursesGUI.cpp +++ b/lldb/source/Core/IOHandlerCursesGUI.cpp @@ -6894,8 +6894,8 @@ class SourceFileWindowDelegate : public WindowDelegate { if (context_changed) m_selected_line = m_pc_line; - if (m_file_sp && - m_file_sp->GetFileSpec() == m_sc.line_entry.GetFile()) { + if (m_file_sp && m_file_sp->GetSupportFile()->GetSpecOnly() == + m_sc.line_entry.GetFile()) { // Same file, nothing to do, we should either have the lines or // not (source file missing) if (m_selected_line >= static_cast(m_first_visible_line)) { @@ -7001,7 +7001,8 @@ class SourceFileWindowDelegate : public WindowDelegate { LineEntry bp_loc_line_entry; if (bp_loc_sp->GetAddress().CalculateSymbolContextLineEntry( bp_loc_line_entry)) { - if (m_file_sp->GetFileSpec() == bp_loc_line_entry.GetFile()) { + if 
(m_file_sp->GetSupportFile()->GetSpecOnly() == + bp_loc_line_entry.GetFile()) { bp_lines.insert(bp_loc_line_entry.line); } } @@ -7332,7 +7333,7 @@ class SourceFileWindowDelegate : public WindowDelegate { if (exe_ctx.HasProcessScope() && exe_ctx.GetProcessRef().IsAlive()) { BreakpointSP bp_sp = exe_ctx.GetTargetRef().CreateBreakpoint( nullptr, // Don't limit the breakpoint to certain modules - m_file_sp->GetFileSpec(), // Source file + m_file_sp->GetSupportFile()->GetSpecOnly(), // Source file m_selected_line + 1, // Source line number (m_selected_line is zero based) 0, // Unspecified column. @@ -7478,7 +7479,8 @@ class SourceFileWindowDelegate : public WindowDelegate { LineEntry bp_loc_line_entry; if (bp_loc_sp->GetAddress().CalculateSymbolContextLineEntry( bp_loc_line_entry)) { - if (m_file_sp->GetFileSpec() == bp_loc_line_entry.GetFile() && + if (m_file_sp->GetSupportFile()->GetSpecOnly() == + bp_loc_line_entry.GetFile() && m_selected_line + 1 == bp_loc_line_entry.line) { bool removed = exe_ctx.GetTargetRef().RemoveBreakpointByID(bp_sp->GetID()); @@ -7492,7 +7494,7 @@ class SourceFileWindowDelegate : public WindowDelegate { // No breakpoint found on the location, add it. BreakpointSP bp_sp = exe_ctx.GetTargetRef().CreateBreakpoint( nullptr, // Don't limit the breakpoint to certain modules - m_file_sp->GetFileSpec(), // Source file + m_file_sp->GetSupportFile()->GetSpecOnly(), // Source file m_selected_line + 1, // Source line number (m_selected_line is zero based) 0, // No column specified. 
diff --git a/lldb/source/Core/SourceManager.cpp b/lldb/source/Core/SourceManager.cpp index 0d70c554e5342..cd0011a25f1c3 100644 --- a/lldb/source/Core/SourceManager.cpp +++ b/lldb/source/Core/SourceManager.cpp @@ -87,8 +87,10 @@ SourceManager::FileSP SourceManager::GetFile(const FileSpec &file_spec) { LLDB_LOG(log, "Source file caching disabled: creating new source file: {0}", file_spec); if (target_sp) - return std::make_shared(file_spec, target_sp); - return std::make_shared(file_spec, debugger_sp); + return std::make_shared(std::make_shared(file_spec), + target_sp); + return std::make_shared(std::make_shared(file_spec), + debugger_sp); } ProcessSP process_sp = target_sp ? target_sp->GetProcessSP() : ProcessSP(); @@ -136,7 +138,8 @@ SourceManager::FileSP SourceManager::GetFile(const FileSpec &file_spec) { } // Check if the file exists on disk. - if (file_sp && !FileSystem::Instance().Exists(file_sp->GetFileSpec())) { + if (file_sp && !FileSystem::Instance().Exists( + file_sp->GetSupportFile()->GetSpecOnly())) { LLDB_LOG(log, "File doesn't exist on disk: {0}", file_spec); file_sp.reset(); } @@ -148,9 +151,11 @@ SourceManager::FileSP SourceManager::GetFile(const FileSpec &file_spec) { // (Re)create the file. if (target_sp) - file_sp = std::make_shared(file_spec, target_sp); + file_sp = std::make_shared(std::make_shared(file_spec), + target_sp); else - file_sp = std::make_shared(file_spec, debugger_sp); + file_sp = std::make_shared(std::make_shared(file_spec), + debugger_sp); // Add the file to the debugger and process cache. If the file was // invalidated, this will overwrite it. 
@@ -444,25 +449,25 @@ void SourceManager::FindLinesMatchingRegex(FileSpec &file_spec, match_lines); } -SourceManager::File::File(const FileSpec &file_spec, +SourceManager::File::File(SupportFileSP support_file_sp, lldb::DebuggerSP debugger_sp) - : m_file_spec_orig(file_spec), m_file_spec(), m_mod_time(), + : m_support_file_sp(std::make_shared()), m_mod_time(), m_debugger_wp(debugger_sp), m_target_wp(TargetSP()) { - CommonInitializer(file_spec, {}); + CommonInitializer(support_file_sp, {}); } -SourceManager::File::File(const FileSpec &file_spec, TargetSP target_sp) - : m_file_spec_orig(file_spec), m_file_spec(), m_mod_time(), +SourceManager::File::File(SupportFileSP support_file_sp, TargetSP target_sp) + : m_support_file_sp(std::make_shared()), m_mod_time(), m_debugger_wp(target_sp ? target_sp->GetDebugger().shared_from_this() : DebuggerSP()), m_target_wp(target_sp) { - CommonInitializer(file_spec, target_sp); + CommonInitializer(support_file_sp, target_sp); } -void SourceManager::File::CommonInitializer(const FileSpec &file_spec, +void SourceManager::File::CommonInitializer(SupportFileSP support_file_sp, TargetSP target_sp) { // Set the file and update the modification time. - SetFileSpec(file_spec); + SetSupportFile(support_file_sp); // Always update the source map modification ID if we have a target. if (target_sp) @@ -472,65 +477,76 @@ void SourceManager::File::CommonInitializer(const FileSpec &file_spec, if (m_mod_time == llvm::sys::TimePoint<>()) { if (target_sp) { // If this is just a file name, try finding it in the target. 
- if (!file_spec.GetDirectory() && file_spec.GetFilename()) { - bool check_inlines = false; - SymbolContextList sc_list; - size_t num_matches = - target_sp->GetImages().ResolveSymbolContextForFilePath( - file_spec.GetFilename().AsCString(), 0, check_inlines, - SymbolContextItem(eSymbolContextModule | - eSymbolContextCompUnit), - sc_list); - bool got_multiple = false; - if (num_matches != 0) { - if (num_matches > 1) { - CompileUnit *test_cu = nullptr; - for (const SymbolContext &sc : sc_list) { - if (sc.comp_unit) { - if (test_cu) { - if (test_cu != sc.comp_unit) - got_multiple = true; - break; - } else - test_cu = sc.comp_unit; + { + FileSpec file_spec = support_file_sp->GetSpecOnly(); + if (!file_spec.GetDirectory() && file_spec.GetFilename()) { + bool check_inlines = false; + SymbolContextList sc_list; + size_t num_matches = + target_sp->GetImages().ResolveSymbolContextForFilePath( + file_spec.GetFilename().AsCString(), 0, check_inlines, + SymbolContextItem(eSymbolContextModule | + eSymbolContextCompUnit), + sc_list); + bool got_multiple = false; + if (num_matches != 0) { + if (num_matches > 1) { + CompileUnit *test_cu = nullptr; + for (const SymbolContext &sc : sc_list) { + if (sc.comp_unit) { + if (test_cu) { + if (test_cu != sc.comp_unit) + got_multiple = true; + break; + } else + test_cu = sc.comp_unit; + } } } - } - if (!got_multiple) { - SymbolContext sc; - sc_list.GetContextAtIndex(0, sc); - if (sc.comp_unit) - SetFileSpec(sc.comp_unit->GetPrimaryFile()); + if (!got_multiple) { + SymbolContext sc; + sc_list.GetContextAtIndex(0, sc); + if (sc.comp_unit) + SetSupportFile(std::make_shared( + sc.comp_unit->GetPrimaryFile())); + } } } } // Try remapping the file if it doesn't exist. - if (!FileSystem::Instance().Exists(m_file_spec)) { - // Check target specific source remappings (i.e., the - // target.source-map setting), then fall back to the module - // specific remapping (i.e., the .dSYM remapping dictionary). 
- auto remapped = target_sp->GetSourcePathMap().FindFile(m_file_spec); - if (!remapped) { - FileSpec new_spec; - if (target_sp->GetImages().FindSourceFile(m_file_spec, new_spec)) - remapped = new_spec; + { + FileSpec file_spec = support_file_sp->GetSpecOnly(); + if (!FileSystem::Instance().Exists(file_spec)) { + // Check target specific source remappings (i.e., the + // target.source-map setting), then fall back to the module + // specific remapping (i.e., the .dSYM remapping dictionary). + auto remapped = target_sp->GetSourcePathMap().FindFile(file_spec); + if (!remapped) { + FileSpec new_spec; + if (target_sp->GetImages().FindSourceFile(file_spec, new_spec)) + remapped = new_spec; + } + if (remapped) + SetSupportFile(std::make_shared( + *remapped, support_file_sp->GetChecksum())); } - if (remapped) - SetFileSpec(*remapped); } } } // If the file exists, read in the data. if (m_mod_time != llvm::sys::TimePoint<>()) - m_data_sp = FileSystem::Instance().CreateDataBuffer(m_file_spec); + m_data_sp = FileSystem::Instance().CreateDataBuffer( + m_support_file_sp->GetSpecOnly()); } -void SourceManager::File::SetFileSpec(FileSpec file_spec) { +void SourceManager::File::SetSupportFile(lldb::SupportFileSP support_file_sp) { + FileSpec file_spec = support_file_sp->GetSpecOnly(); resolve_tilde(file_spec); - m_file_spec = std::move(file_spec); - m_mod_time = FileSystem::Instance().GetModificationTime(m_file_spec); + m_support_file_sp = + std::make_shared(file_spec, support_file_sp->GetChecksum()); + m_mod_time = FileSystem::Instance().GetModificationTime(file_spec); } uint32_t SourceManager::File::GetLineOffset(uint32_t line) { @@ -603,7 +619,8 @@ bool SourceManager::File::ModificationTimeIsStale() const { // TODO: use host API to sign up for file modifications to anything in our // source cache and only update when we determine a file has been updated. // For now we check each time we want to display info for the file. 
- auto curr_mod_time = FileSystem::Instance().GetModificationTime(m_file_spec); + auto curr_mod_time = FileSystem::Instance().GetModificationTime( + m_support_file_sp->GetSpecOnly()); return curr_mod_time != llvm::sys::TimePoint<>() && m_mod_time != curr_mod_time; } @@ -644,7 +661,8 @@ size_t SourceManager::File::DisplaySourceLines(uint32_t line, debugger_sp->GetStopShowColumnAnsiSuffix()); HighlighterManager mgr; - std::string path = GetFileSpec().GetPath(/*denormalize*/ false); + std::string path = + GetSupportFile()->GetSpecOnly().GetPath(/*denormalize*/ false); // FIXME: Find a way to get the definitive language this file was written in // and pass it to the highlighter. const auto &h = mgr.getHighlighterFor(lldb::eLanguageTypeUnknown, path); @@ -698,7 +716,8 @@ void SourceManager::File::FindLinesMatchingRegex( bool lldb_private::operator==(const SourceManager::File &lhs, const SourceManager::File &rhs) { - if (lhs.m_file_spec != rhs.m_file_spec) + if (!lhs.GetSupportFile()->Equal(*rhs.GetSupportFile(), + SupportFile::eEqualChecksumIfSet)) return false; return lhs.m_mod_time == rhs.m_mod_time; } @@ -778,9 +797,9 @@ void SourceManager::SourceFileCache::AddSourceFile(const FileSpec &file_spec, assert(file_sp && "invalid FileSP"); AddSourceFileImpl(file_spec, file_sp); - const FileSpec &resolved_file_spec = file_sp->GetFileSpec(); + const FileSpec &resolved_file_spec = file_sp->GetSupportFile()->GetSpecOnly(); if (file_spec != resolved_file_spec) - AddSourceFileImpl(file_sp->GetFileSpec(), file_sp); + AddSourceFileImpl(file_sp->GetSupportFile()->GetSpecOnly(), file_sp); } void SourceManager::SourceFileCache::RemoveSourceFile(const FileSP &file_sp) { diff --git a/lldb/unittests/Core/SourceManagerTest.cpp b/lldb/unittests/Core/SourceManagerTest.cpp index 58d6f6cb3f850..26ab0edffb398 100644 --- a/lldb/unittests/Core/SourceManagerTest.cpp +++ b/lldb/unittests/Core/SourceManagerTest.cpp @@ -8,6 +8,7 @@ #include "lldb/Core/SourceManager.h" #include 
"lldb/Host/FileSystem.h" +#include "lldb/Utility/SupportFile.h" #include "gtest/gtest.h" #include "TestingSupport/MockTildeExpressionResolver.h" @@ -29,8 +30,8 @@ TEST_F(SourceFileCache, FindSourceFileFound) { // Insert: foo FileSpec foo_file_spec("foo"); - auto foo_file_sp = - std::make_shared(foo_file_spec, lldb::DebuggerSP()); + auto foo_file_sp = std::make_shared( + std::make_shared(foo_file_spec), lldb::DebuggerSP()); cache.AddSourceFile(foo_file_spec, foo_file_sp); // Query: foo, expect found. @@ -43,8 +44,8 @@ TEST_F(SourceFileCache, FindSourceFileNotFound) { // Insert: foo FileSpec foo_file_spec("foo"); - auto foo_file_sp = - std::make_shared(foo_file_spec, lldb::DebuggerSP()); + auto foo_file_sp = std::make_shared( + std::make_shared(foo_file_spec), lldb::DebuggerSP()); cache.AddSourceFile(foo_file_spec, foo_file_sp); // Query: bar, expect not found. @@ -63,7 +64,8 @@ TEST_F(SourceFileCache, FindSourceFileByUnresolvedPath) { // Create the file with the resolved file spec. auto foo_file_sp = std::make_shared( - resolved_foo_file_spec, lldb::DebuggerSP()); + std::make_shared(resolved_foo_file_spec), + lldb::DebuggerSP()); // Cache the result with the unresolved file spec. cache.AddSourceFile(foo_file_spec, foo_file_sp); diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst index cf0a6f96fb012..05bd962fa5dd1 100644 --- a/llvm/docs/LangRef.rst +++ b/llvm/docs/LangRef.rst @@ -2189,10 +2189,6 @@ example: ``nosanitize_coverage`` This attribute indicates that SanitizerCoverage instrumentation is disabled for this function. -``nosanitize_realtime`` - This attribute indicates that the Realtime Sanitizer instrumentation is - disabled for this function. - This attribute is incompatible with the ``sanitize_realtime`` attribute. 
``null_pointer_is_valid`` If ``null_pointer_is_valid`` is set, then the ``null`` address in address-space 0 is considered to be a valid address for memory loads and @@ -2319,7 +2315,11 @@ example: This attribute indicates that RealtimeSanitizer checks (realtime safety analysis - no allocations, syscalls or exceptions) are enabled for this function. - This attribute is incompatible with the ``nosanitize_realtime`` attribute. +``sanitize_realtime_unsafe`` + This attribute indicates that RealtimeSanitizer should error immediately + if the attributed function is called during invocation of a function + attributed with ``sanitize_realtime``. + This attribute is incompatible with the ``sanitize_realtime`` attribute. ``speculative_load_hardening`` This attribute indicates that `Speculative Load Hardening `_ diff --git a/llvm/docs/TestSuiteGuide.md b/llvm/docs/TestSuiteGuide.md index 9552cd89aa1c1..19db0ee7d01b8 100644 --- a/llvm/docs/TestSuiteGuide.md +++ b/llvm/docs/TestSuiteGuide.md @@ -134,6 +134,44 @@ Every program can work as a correctness test. Some programs are unsuitable for performance measurements. Setting the `TEST_SUITE_BENCHMARKING_ONLY` CMake option to `ON` will disable them. 
+The MultiSource benchmarks consist of the following apps and benchmarks: + +| MultiSource | Language | Application Area | Remark | +|----------------------|-----------|-------------------------------|----------------------| +| 7zip | C/C++ | Compression/Decompression | | +| ASCI_Purple | C | SMG2000 benchmark and solver | Memory intensive app | +| ASC_Sequoia | C | Simulation and solver | | +| BitBench | C | uudecode/uuencode utility | Bit Stream benchmark for functional compilers | +| Bullet | C++ | Bullet 2.75 physics engine | | +| DOE-ProxyApps-C++ | C++ | HPC/scientific apps | Small applications, representative of our larger DOE workloads | +| DOE-ProxyApps-C | C | HPC/scientific apps | " | +| Fhourstones | C | Game/solver | Integer benchmark that efficiently solves positions in the game of Connect-4 | +| Fhourstones-3.1 | C | Game/solver | " | +| FreeBench | C | Benchmark suite | Raytracer, four in a row, neural network, file compressor, Fast Fourier/Cosine/Sine Transform | +| llubenchmark | C | Linked-list micro-benchmark | | +| mafft | C | Bioinformatics | A multiple sequence alignment program | +| MallocBench | C | Benchmark suite | cfrac, espresso, gawk, gs, make, p2c, perl | +| McCat | C | Benchmark suite | Quicksort, bubblesort, eigenvalues | +| mediabench | C | Benchmark suite | adpcm, g721, gsm, jpeg, mpeg2 | +| MiBench | C | Embedded benchmark suite | Automotive, consumer, office, security, telecom apps | +| nbench | C | | BYTE Magazine's BYTEmark benchmark program | +| NPB-serial | C | Parallel computing | Serial version of the NPB IS code | +| Olden | C | Data Structures | SGI version of the Olden benchmark | +| OptimizerEval | C | Solver | Preston Brigg's optimizer evaluation framework | +| PAQ8p | C++ | Data compression | | +| Prolangs-C++ | C++ | Benchmark suite | city, employ, life, NP, ocean, primes, simul, vcirc | +| Prolangs-C | C | Benchmark suite | agrep, archie-client, bison, gnugo, unix-smail | +| Ptrdist | C | Pointer-Intensive 
Benchmark Suite | | +| Rodinia | C | Scientific apps | backprop, pathfinder, srad | +| SciMark2-C | C | Scientific apps | FFT, LU, Montecarlo, sparse matmul | +| sim | C | Dynamic programming | A Time-Efficient, Linear-Space Local Similarity Algorithm | +| tramp3d-v4 | C++ | Numerical analysis | Template-intensive numerical program based on FreePOOMA | +| Trimaran | C | Encryption | 3des, md5, crc | +| TSVC | C | Vectorization benchmark | Test Suite for Vectorizing Compilers (TSVC) | +| VersaBench | C | Benchmark suite | 8b10b, beamformer, bmm, dbms, ecbdes | + +All MultiSource applications are suitable for performance measurements +and will run when CMake option `TEST_SUITE_BENCHMARKING_ONLY` is set. Configuration ------------- diff --git a/llvm/include/llvm/Bitcode/LLVMBitCodes.h b/llvm/include/llvm/Bitcode/LLVMBitCodes.h index 8a2e6583af87c..47d86a4725d90 100644 --- a/llvm/include/llvm/Bitcode/LLVMBitCodes.h +++ b/llvm/include/llvm/Bitcode/LLVMBitCodes.h @@ -759,7 +759,7 @@ enum AttributeKindCodes { ATTR_KIND_INITIALIZES = 94, ATTR_KIND_HYBRID_PATCHABLE = 95, ATTR_KIND_SANITIZE_REALTIME = 96, - ATTR_KIND_NO_SANITIZE_REALTIME = 97, + ATTR_KIND_SANITIZE_REALTIME_UNSAFE = 97, }; enum ComdatSelectionKindCodes { diff --git a/llvm/include/llvm/IR/Attributes.td b/llvm/include/llvm/IR/Attributes.td index 80936c0ee8335..eeea5b02ad2ce 100644 --- a/llvm/include/llvm/IR/Attributes.td +++ b/llvm/include/llvm/IR/Attributes.td @@ -212,9 +212,6 @@ def NoSanitizeBounds : EnumAttr<"nosanitize_bounds", [FnAttr]>; /// No SanitizeCoverage instrumentation. def NoSanitizeCoverage : EnumAttr<"nosanitize_coverage", [FnAttr]>; -/// No SanitizeRealtime instrumentation. -def NoSanitizeRealtime : EnumAttr<"nosanitize_realtime", [FnAttr]>; - /// Null pointer in address space zero is valid. def NullPointerIsValid : EnumAttr<"null_pointer_is_valid", [FnAttr]>; @@ -303,6 +300,10 @@ def SanitizeNumericalStability : EnumAttr<"sanitize_numerical_stability", [FnAtt /// RealtimeSanitizer is on. 
def SanitizeRealtime : EnumAttr<"sanitize_realtime", [FnAttr]>; +/// RealtimeSanitizer should error if a real-time unsafe function is invoked +/// during a real-time sanitized function (see `sanitize_realtime`). +def SanitizeRealtimeUnsafe : EnumAttr<"sanitize_realtime_unsafe", [FnAttr]>; + /// Speculative Load Hardening is enabled. /// /// Note that this uses the default compatibility (always compatible during @@ -392,6 +393,7 @@ def : CompatRule<"isEqual">; def : CompatRule<"isEqual">; def : CompatRule<"isEqual">; def : CompatRule<"isEqual">; +def : CompatRule<"isEqual">; def : CompatRule<"isEqual">; def : CompatRule<"isEqual">; def : CompatRule<"isEqual">; diff --git a/llvm/lib/Analysis/ValueTracking.cpp b/llvm/lib/Analysis/ValueTracking.cpp index 173faa32a3878..533fe62fb8cdd 100644 --- a/llvm/lib/Analysis/ValueTracking.cpp +++ b/llvm/lib/Analysis/ValueTracking.cpp @@ -5921,6 +5921,61 @@ void computeKnownFPClass(const Value *V, const APInt &DemandedElts, break; } + case Instruction::BitCast: { + const Value *Src; + if (!match(Op, m_ElementWiseBitCast(m_Value(Src))) || + !Src->getType()->isIntOrIntVectorTy()) + break; + + const Type *Ty = Op->getType()->getScalarType(); + KnownBits Bits(Ty->getScalarSizeInBits()); + computeKnownBits(Src, DemandedElts, Bits, Depth + 1, Q); + + // Transfer information from the sign bit. + if (Bits.isNonNegative()) + Known.signBitMustBeZero(); + else if (Bits.isNegative()) + Known.signBitMustBeOne(); + + if (Ty->isIEEE()) { + // IEEE floats are NaN when all bits of the exponent plus at least one of + // the fraction bits are 1. 
This means: + // - If we assume unknown bits are 0 and the value is NaN, it will + // always be NaN + // - If we assume unknown bits are 1 and the value is not NaN, it can + // never be NaN + if (APFloat(Ty->getFltSemantics(), Bits.One).isNaN()) + Known.KnownFPClasses = fcNan; + else if (!APFloat(Ty->getFltSemantics(), ~Bits.Zero).isNaN()) + Known.knownNot(fcNan); + + // Build KnownBits representing Inf and check if it must be equal or + // unequal to this value. + auto InfKB = KnownBits::makeConstant( + APFloat::getInf(Ty->getFltSemantics()).bitcastToAPInt()); + InfKB.Zero.clearSignBit(); + if (const auto InfResult = KnownBits::eq(Bits, InfKB)) { + assert(!InfResult.value()); + Known.knownNot(fcInf); + } else if (Bits == InfKB) { + Known.KnownFPClasses = fcInf; + } + + // Build KnownBits representing Zero and check if it must be equal or + // unequal to this value. + auto ZeroKB = KnownBits::makeConstant( + APFloat::getZero(Ty->getFltSemantics()).bitcastToAPInt()); + ZeroKB.Zero.clearSignBit(); + if (const auto ZeroResult = KnownBits::eq(Bits, ZeroKB)) { + assert(!ZeroResult.value()); + Known.knownNot(fcZero); + } else if (Bits == ZeroKB) { + Known.KnownFPClasses = fcZero; + } + } + + break; + } default: break; } diff --git a/llvm/lib/Bitcode/Reader/BitcodeReader.cpp b/llvm/lib/Bitcode/Reader/BitcodeReader.cpp index 974a05023c72a..aa74a5abe9d0e 100644 --- a/llvm/lib/Bitcode/Reader/BitcodeReader.cpp +++ b/llvm/lib/Bitcode/Reader/BitcodeReader.cpp @@ -2093,8 +2093,6 @@ static Attribute::AttrKind getAttrFromCode(uint64_t Code) { return Attribute::NoSanitizeBounds; case bitc::ATTR_KIND_NO_SANITIZE_COVERAGE: return Attribute::NoSanitizeCoverage; - case bitc::ATTR_KIND_NO_SANITIZE_REALTIME: - return Attribute::NoSanitizeRealtime; case bitc::ATTR_KIND_NULL_POINTER_IS_VALID: return Attribute::NullPointerIsValid; case bitc::ATTR_KIND_OPTIMIZE_FOR_DEBUGGING: @@ -2145,6 +2143,8 @@ static Attribute::AttrKind getAttrFromCode(uint64_t Code) { return 
Attribute::SanitizeNumericalStability; case bitc::ATTR_KIND_SANITIZE_REALTIME: return Attribute::SanitizeRealtime; + case bitc::ATTR_KIND_SANITIZE_REALTIME_UNSAFE: + return Attribute::SanitizeRealtimeUnsafe; case bitc::ATTR_KIND_SPECULATIVE_LOAD_HARDENING: return Attribute::SpeculativeLoadHardening; case bitc::ATTR_KIND_SWIFT_ERROR: diff --git a/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp b/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp index 3c5097f4af7c5..152293509dbb7 100644 --- a/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp +++ b/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp @@ -795,8 +795,6 @@ static uint64_t getAttrKindEncoding(Attribute::AttrKind Kind) { return bitc::ATTR_KIND_NO_SANITIZE_BOUNDS; case Attribute::NoSanitizeCoverage: return bitc::ATTR_KIND_NO_SANITIZE_COVERAGE; - case llvm::Attribute::NoSanitizeRealtime: - return bitc::ATTR_KIND_NO_SANITIZE_REALTIME; case Attribute::NullPointerIsValid: return bitc::ATTR_KIND_NULL_POINTER_IS_VALID; case Attribute::OptimizeForDebugging: @@ -847,6 +845,8 @@ static uint64_t getAttrKindEncoding(Attribute::AttrKind Kind) { return bitc::ATTR_KIND_SANITIZE_NUMERICAL_STABILITY; case Attribute::SanitizeRealtime: return bitc::ATTR_KIND_SANITIZE_REALTIME; + case Attribute::SanitizeRealtimeUnsafe: + return bitc::ATTR_KIND_SANITIZE_REALTIME_UNSAFE; case Attribute::SpeculativeLoadHardening: return bitc::ATTR_KIND_SPECULATIVE_LOAD_HARDENING; case Attribute::SwiftError: diff --git a/llvm/lib/IR/Verifier.cpp b/llvm/lib/IR/Verifier.cpp index 79b3ca3b6a5a7..449d871576763 100644 --- a/llvm/lib/IR/Verifier.cpp +++ b/llvm/lib/IR/Verifier.cpp @@ -2224,9 +2224,9 @@ void Verifier::verifyFunctionAttrs(FunctionType *FT, AttributeList Attrs, } Check(!(Attrs.hasFnAttr(Attribute::SanitizeRealtime) && - Attrs.hasFnAttr(Attribute::NoSanitizeRealtime)), + Attrs.hasFnAttr(Attribute::SanitizeRealtimeUnsafe)), "Attributes " - "'sanitize_realtime and nosanitize_realtime' are incompatible!", + "'sanitize_realtime and sanitize_realtime_unsafe' are incompatible!", 
V); if (Attrs.hasFnAttr(Attribute::OptimizeForDebugging)) { diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp index 3296f63a9b887..28ad0abf25703 100644 --- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp +++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp @@ -19852,7 +19852,6 @@ static SDValue performConcatVectorsCombine(SDNode *N, // This optimization reduces instruction count. if (N00Opc == AArch64ISD::VLSHR && N10Opc == AArch64ISD::VLSHR && N00->getOperand(1) == N10->getOperand(1)) { - SDValue N000 = N00->getOperand(0); SDValue N100 = N10->getOperand(0); uint64_t N001ConstVal = N00->getConstantOperandVal(1), @@ -19860,7 +19859,8 @@ static SDValue performConcatVectorsCombine(SDNode *N, NScalarSize = N->getValueType(0).getScalarSizeInBits(); if (N001ConstVal == N101ConstVal && N001ConstVal > NScalarSize) { - + N000 = DAG.getNode(AArch64ISD::NVCAST, dl, VT, N000); + N100 = DAG.getNode(AArch64ISD::NVCAST, dl, VT, N100); SDValue Uzp = DAG.getNode(AArch64ISD::UZP2, dl, VT, N000, N100); SDValue NewShiftConstant = DAG.getConstant(N001ConstVal - NScalarSize, dl, MVT::i32); @@ -29344,8 +29344,10 @@ void AArch64TargetLowering::verifyTargetSDNode(const SDNode *N) const { assert(OpVT.getSizeInBits() == VT.getSizeInBits() && "Expected vectors of equal size!"); // TODO: Enable assert once bogus creations have been fixed. - // assert(OpVT.getVectorElementCount() == VT.getVectorElementCount()*2 && - // "Expected result vector with half the lanes of its input!"); + if (VT.isScalableVector()) + break; + assert(OpVT.getVectorElementCount() == VT.getVectorElementCount() * 2 && + "Expected result vector with half the lanes of its input!"); break; } case AArch64ISD::TRN1: @@ -29362,7 +29364,9 @@ void AArch64TargetLowering::verifyTargetSDNode(const SDNode *N) const { assert(VT.isVector() && Op0VT.isVector() && Op1VT.isVector() && "Expected vectors!"); // TODO: Enable assert once bogus creations have been fixed. 
-    // assert(VT == Op0VT && VT == Op1VT && "Expected matching vectors!");
+    if (VT.isScalableVector())
+      break;
+    assert(VT == Op0VT && VT == Op1VT && "Expected matching vectors!");
     break;
   }
   }
diff --git a/llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp b/llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
index 37add682b150e..34c0fad45fc49 100644
--- a/llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
+++ b/llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
@@ -6947,10 +6947,14 @@ static void ExpandCryptoAEK(const AArch64::ArchInfo &ArchInfo,
   }
 }
 
+static SMLoc incrementLoc(SMLoc L, int Offset) {
+  return SMLoc::getFromPointer(L.getPointer() + Offset);
+}
+
 /// parseDirectiveArch
 ///   ::= .arch token
 bool AArch64AsmParser::parseDirectiveArch(SMLoc L) {
-  SMLoc ArchLoc = getLoc();
+  SMLoc CurLoc = getLoc();
 
   StringRef Arch, ExtensionString;
   std::tie(Arch, ExtensionString) =
@@ -6958,7 +6962,7 @@ bool AArch64AsmParser::parseDirectiveArch(SMLoc L) {
 
   const AArch64::ArchInfo *ArchInfo = AArch64::parseArch(Arch);
   if (!ArchInfo)
-    return Error(ArchLoc, "unknown arch name");
+    return Error(CurLoc, "unknown arch name");
 
   if (parseToken(AsmToken::EndOfStatement))
     return true;
@@ -6978,27 +6982,30 @@ bool AArch64AsmParser::parseDirectiveArch(SMLoc L) {
   ExtensionString.split(RequestedExtensions, '+');
 
   ExpandCryptoAEK(*ArchInfo, RequestedExtensions);
+  CurLoc = incrementLoc(CurLoc, Arch.size());
 
-  FeatureBitset Features = STI.getFeatureBits();
-  setAvailableFeatures(ComputeAvailableFeatures(Features));
   for (auto Name : RequestedExtensions) {
+    // Advance source location past '+'.
+    CurLoc = incrementLoc(CurLoc, 1);
+
     bool EnableFeature = !Name.consume_front_insensitive("no");
-    for (const auto &Extension : ExtensionMap) {
-      if (Extension.Name != Name)
-        continue;
+    auto It = llvm::find_if(ExtensionMap, [&Name](const auto &Extension) {
+      return Extension.Name == Name;
+    });
 
-      if (Extension.Features.none())
-        report_fatal_error("unsupported architectural extension: " + Name);
+    if (It == std::end(ExtensionMap))
+      Error(CurLoc, "unsupported architectural extension: " + Name);
 
-      FeatureBitset ToggleFeatures =
-          EnableFeature
-              ? STI.SetFeatureBitsTransitively(~Features & Extension.Features)
-              : STI.ToggleFeature(Features & Extension.Features);
-      setAvailableFeatures(ComputeAvailableFeatures(ToggleFeatures));
-      break;
-    }
+    if (EnableFeature)
+      STI.SetFeatureBitsTransitively(It->Features);
+    else
+      STI.ClearFeatureBitsTransitively(It->Features);
+
+    CurLoc = incrementLoc(CurLoc, Name.size());
   }
+  FeatureBitset Features = ComputeAvailableFeatures(STI.getFeatureBits());
+  setAvailableFeatures(Features);
   return false;
 }
 
@@ -7018,28 +7025,21 @@ bool AArch64AsmParser::parseDirectiveArchExtension(SMLoc L) {
     Name = Name.substr(2);
   }
 
-  MCSubtargetInfo &STI = copySTI();
-  FeatureBitset Features = STI.getFeatureBits();
-  for (const auto &Extension : ExtensionMap) {
-    if (Extension.Name != Name)
-      continue;
-
-    if (Extension.Features.none())
-      return Error(ExtLoc, "unsupported architectural extension: " + Name);
-
-    FeatureBitset ToggleFeatures =
-        EnableFeature
-            ? STI.SetFeatureBitsTransitively(~Features & Extension.Features)
-            : STI.ToggleFeature(Features & Extension.Features);
-    setAvailableFeatures(ComputeAvailableFeatures(ToggleFeatures));
-    return false;
-  }
+  auto It = llvm::find_if(ExtensionMap, [&Name](const auto &Extension) {
+    return Extension.Name == Name;
+  });
 
-  return Error(ExtLoc, "unknown architectural extension: " + Name);
-}
+  if (It == std::end(ExtensionMap))
+    return Error(ExtLoc, "unsupported architectural extension: " + Name);
 
-static SMLoc incrementLoc(SMLoc L, int Offset) {
-  return SMLoc::getFromPointer(L.getPointer() + Offset);
+  MCSubtargetInfo &STI = copySTI();
+  if (EnableFeature)
+    STI.SetFeatureBitsTransitively(It->Features);
+  else
+    STI.ClearFeatureBitsTransitively(It->Features);
+  FeatureBitset Features = ComputeAvailableFeatures(STI.getFeatureBits());
+  setAvailableFeatures(Features);
+  return false;
 }
 
 /// parseDirectiveCPU
@@ -7075,30 +7075,22 @@ bool AArch64AsmParser::parseDirectiveCPU(SMLoc L) {
 
     bool EnableFeature = !Name.consume_front_insensitive("no");
 
-    bool FoundExtension = false;
-    for (const auto &Extension : ExtensionMap) {
-      if (Extension.Name != Name)
-        continue;
-
-      if (Extension.Features.none())
-        report_fatal_error("unsupported architectural extension: " + Name);
-
-      FeatureBitset Features = STI.getFeatureBits();
-      FeatureBitset ToggleFeatures =
-          EnableFeature
-              ? STI.SetFeatureBitsTransitively(~Features & Extension.Features)
-              : STI.ToggleFeature(Features & Extension.Features);
-      setAvailableFeatures(ComputeAvailableFeatures(ToggleFeatures));
-      FoundExtension = true;
+    auto It = llvm::find_if(ExtensionMap, [&Name](const auto &Extension) {
+      return Extension.Name == Name;
+    });
 
-      break;
-    }
+    if (It == std::end(ExtensionMap))
+      Error(CurLoc, "unsupported architectural extension: " + Name);
 
-    if (!FoundExtension)
-      Error(CurLoc, "unsupported architectural extension");
+    if (EnableFeature)
+      STI.SetFeatureBitsTransitively(It->Features);
+    else
+      STI.ClearFeatureBitsTransitively(It->Features);
 
     CurLoc = incrementLoc(CurLoc, Name.size());
   }
+  FeatureBitset Features = ComputeAvailableFeatures(STI.getFeatureBits());
+  setAvailableFeatures(Features);
   return false;
 }
 
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUSplitModule.cpp b/llvm/lib/Target/AMDGPU/AMDGPUSplitModule.cpp
index a5807a70582b3..df084cf41c478 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUSplitModule.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUSplitModule.cpp
@@ -7,36 +7,33 @@
 //===----------------------------------------------------------------------===//
 //
 /// \file Implements a module splitting algorithm designed to support the
-/// FullLTO --lto-partitions option for parallel codegen.
+/// FullLTO --lto-partitions option for parallel codegen. This is completely
+/// different from the common SplitModule pass, as this system is designed with
+/// AMDGPU in mind.
 ///
-/// The role of this module splitting pass is the same as
-/// lib/Transforms/Utils/SplitModule.cpp: load-balance the module's functions
-/// across a set of N partitions to allow for parallel codegen.
+/// The basic idea of this module splitting implementation is the same as
+/// SplitModule: load-balance the module's functions across a set of N
+/// partitions to allow parallel codegen. However, it does it very
+/// differently than the target-agnostic variant:
+///  - The module has "split roots", which are kernels in the vast
+///    majority of cases.
+///  - Each root has a set of dependencies, and when a root and its
+///    dependencies are considered "big", we try to put it in a partition where
+///    most dependencies are already imported, to avoid duplicating large
+///    amounts of code.
+///  - There's special care for indirect calls in order to ensure
+///    AMDGPUResourceUsageAnalysis can work correctly.
 ///
-/// The similarities mostly end here, as this pass achieves load-balancing in a
-/// more elaborate fashion which is targeted towards AMDGPU modules. It can take
-/// advantage of the structure of AMDGPU modules (which are mostly
-/// self-contained) to allow for more efficient splitting without affecting
-/// codegen negatively, or causing innaccurate resource usage analysis.
-///
-/// High-level pass overview:
-///  - SplitGraph & associated classes
-///    - Graph representation of the module and of the dependencies that
-///      matter for splitting.
-///  - RecursiveSearchSplitting
-///    - Core splitting algorithm.
-///  - SplitProposal
-///    - Represents a suggested solution for splitting the input module. These
-///      solutions can be scored to determine the best one when multiple
-///      solutions are available.
-///  - Driver/pass "run" function glues everything together.
+/// This file also includes a more elaborate logging system to enable
+/// users to easily generate logs that (if desired) do not include any value
+/// names, in order to not leak information about the source file.
+/// Such logs are very helpful to understand and fix potential issues with
+/// module splitting.
 #include "AMDGPUSplitModule.h"
 #include "AMDGPUTargetMachine.h"
 #include "Utils/AMDGPUBaseInfo.h"
 #include "llvm/ADT/DenseMap.h"
-#include "llvm/ADT/EquivalenceClasses.h"
-#include "llvm/ADT/GraphTraits.h"
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/ADT/StringExtras.h"
 #include "llvm/ADT/StringRef.h"
@@ -47,56 +44,44 @@
 #include "llvm/IR/Module.h"
 #include "llvm/IR/User.h"
 #include "llvm/IR/Value.h"
-#include "llvm/Support/Allocator.h"
 #include "llvm/Support/Casting.h"
-#include "llvm/Support/DOTGraphTraits.h"
 #include "llvm/Support/Debug.h"
 #include "llvm/Support/FileSystem.h"
-#include "llvm/Support/GraphWriter.h"
 #include "llvm/Support/Path.h"
-#include "llvm/Support/Timer.h"
+#include "llvm/Support/Process.h"
+#include "llvm/Support/SHA256.h"
+#include "llvm/Support/Threading.h"
 #include "llvm/Support/raw_ostream.h"
 #include "llvm/Transforms/Utils/Cloning.h"
 #include
 #include
-#include
 #include
 #include
 #include
 #include
 
-#ifndef NDEBUG
-#include "llvm/Support/LockFileManager.h"
-#endif
+using namespace llvm;
 
 #define DEBUG_TYPE "amdgpu-split-module"
 
-namespace llvm {
 namespace {
 
-static cl::opt MaxDepth(
-    "amdgpu-module-splitting-max-depth",
-    cl::desc(
-        "maximum search depth. 0 forces a greedy approach. "
-        "warning: the algorithm is up to O(2^N), where N is the max depth."),
-    cl::init(8));
-
 static cl::opt LargeFnFactor(
-    "amdgpu-module-splitting-large-threshold", cl::init(2.0f), cl::Hidden,
+    "amdgpu-module-splitting-large-function-threshold", cl::init(2.0f),
+    cl::Hidden,
     cl::desc(
-        "when max depth is reached and we can no longer branch out, this "
-        "value determines if a function is worth merging into an already "
-        "existing partition to reduce code duplication. This is a factor "
-        "of the ideal partition size, e.g. 2.0 means we consider the "
-        "function for merging if its cost (including its callees) is 2x the "
-        "size of an ideal partition."));
+        "consider a function as large and needing special treatment when the "
+        "cost of importing it into a partition "
+        "exceeds the average cost of a partition by this factor; e.g. 2.0 "
+        "means if the function and its dependencies are 2 times bigger than "
+        "an average partition; 0 disables large function handling entirely"));
 
 static cl::opt LargeFnOverlapForMerge(
-    "amdgpu-module-splitting-merge-threshold", cl::init(0.7f), cl::Hidden,
-    cl::desc("when a function is considered for merging into a partition that "
-             "already contains some of its callees, do the merge if at least "
-             "n% of the code it can reach is already present inside the "
-             "partition; e.g. 0.7 means only merge >70%"));
+    "amdgpu-module-splitting-large-function-merge-overlap", cl::init(0.8f),
+    cl::Hidden,
+    cl::desc(
+        "defines how much overlap between two large functions' dependencies "
+        "is needed to put them in the same partition"));
 
 static cl::opt NoExternalizeGlobals(
     "amdgpu-module-splitting-no-externalize-globals", cl::Hidden,
@@ -104,92 +89,142 @@ static cl::opt NoExternalizeGlobals(
              "may cause globals to be duplicated which increases binary size"));
 
 static cl::opt
-    ModuleDotCfgOutput("amdgpu-module-splitting-print-module-dotcfg",
-                       cl::Hidden,
-                       cl::desc("output file to write out the dotgraph "
-                                "representation of the input module"));
+    LogDirOpt("amdgpu-module-splitting-log-dir", cl::Hidden,
+              cl::desc("output directory for AMDGPU module splitting logs"));
 
-static cl::opt PartitionSummariesOutput(
-    "amdgpu-module-splitting-print-partition-summaries", cl::Hidden,
-    cl::desc("output file to write out a summary of "
-             "the partitions created for each module"));
-
-#ifndef NDEBUG
 static cl::opt
-    UseLockFile("amdgpu-module-splitting-serial-execution", cl::Hidden,
-                cl::desc("use a lock file so only one process in the system "
-                         "can run this pass at once. useful to avoid mangled "
-                         "debug output in multithreaded environments."));
+    LogPrivate("amdgpu-module-splitting-log-private", cl::Hidden,
+               cl::desc("hash value names before printing them in the AMDGPU "
+                        "module splitting logs"));
 
-static cl::opt
-    DebugProposalSearch("amdgpu-module-splitting-debug-proposal-search",
-                        cl::Hidden,
-                        cl::desc("print all proposals received and whether "
-                                 "they were rejected or accepted"));
-#endif
+using CostType = InstructionCost::CostType;
+using PartitionID = unsigned;
+using GetTTIFn = function_ref;
 
-struct SplitModuleTimer : NamedRegionTimer {
-  SplitModuleTimer(StringRef Name, StringRef Desc)
-      : NamedRegionTimer(Name, Desc, DEBUG_TYPE, "AMDGPU Module Splitting",
-                         TimePassesIsEnabled) {}
-};
+static bool isEntryPoint(const Function *F) {
+  return AMDGPU::isEntryFunctionCC(F->getCallingConv());
+}
 
-//===----------------------------------------------------------------------===//
-// Utils
-//===----------------------------------------------------------------------===//
+static std::string getName(const Value &V) {
+  static bool HideNames;
 
-using CostType = InstructionCost::CostType;
-using FunctionsCostMap = DenseMap;
-using GetTTIFn = function_ref;
-static constexpr unsigned InvalidPID = -1;
+  static llvm::once_flag HideNameInitFlag;
+  llvm::call_once(HideNameInitFlag, [&]() {
+    if (LogPrivate.getNumOccurrences())
+      HideNames = LogPrivate;
+    else {
+      const auto EV = sys::Process::GetEnv("AMD_SPLIT_MODULE_LOG_PRIVATE");
+      HideNames = (EV.value_or("0") != "0");
+    }
+  });
 
-/// \param Num numerator
-/// \param Dem denominator
-/// \returns a printable object to print (Num/Dem) using "%0.2f".
-static auto formatRatioOf(CostType Num, CostType Dem) {
-  return format("%0.2f", (static_cast(Num) / Dem) * 100);
+  if (!HideNames)
+    return V.getName().str();
+  return toHex(SHA256::hash(arrayRefFromStringRef(V.getName())),
+               /*LowerCase=*/true);
 }
 
-/// Checks whether a given function is non-copyable.
+/// Main logging helper.
 ///
-/// Non-copyable functions cannot be cloned into multiple partitions, and only
-/// one copy of the function can be present across all partitions.
+/// Logging can be configured by the following environment variables.
+///   AMD_SPLIT_MODULE_LOG_DIR=
+///     If set, uses its value as the directory to write logfiles to
+///     each time module splitting is used.
+///   AMD_SPLIT_MODULE_LOG_PRIVATE
+///     If set to anything other than zero, all names are hidden.
 ///
-/// External functions fall into this category. If we were to clone them, we
-/// would end up with multiple symbol definitions and a very unhappy linker.
-static bool isNonCopyable(const Function &F) {
-  assert(AMDGPU::isEntryFunctionCC(F.getCallingConv())
-             ? F.hasExternalLinkage()
-             : true && "Kernel w/o external linkage?");
-  return F.hasExternalLinkage() || !F.isDefinitionExact();
-}
+/// Both environment variables have corresponding CL options which
+/// take priority over them.
+///
+/// Any output printed to the log files is also printed to dbgs() when -debug is
+/// used and LLVM_DEBUG is defined.
+///
+/// This approach has a small disadvantage over LLVM_DEBUG though: logging logic
+/// cannot be removed from the code (by building without debug). This probably
+/// has a small performance cost because if some computation/formatting is
+/// needed for logging purposes, it may be done every time only to be ignored
+/// by the logger.
+///
+/// As this pass only runs once and is not doing anything computationally
+/// expensive, this is likely a reasonable trade-off.
+///
+/// If some computation should really be avoided when unused, users of the class
+/// can check whether any logging will occur by using the bool operator.
+///
+/// \code
+/// if (SML) {
+///   // Executes only if logging to a file or if -debug is available and
+///   used.
+/// }
+/// \endcode
+class SplitModuleLogger {
+public:
+  SplitModuleLogger(const Module &M) {
+    std::string LogDir = LogDirOpt;
+    if (LogDir.empty())
+      LogDir = sys::Process::GetEnv("AMD_SPLIT_MODULE_LOG_DIR").value_or("");
+
+    // No log dir specified means we don't need to log to a file.
+    // We may still log to dbgs(), though.
+    if (LogDir.empty())
+      return;
+
+    // If a log directory is specified, create a new file with a unique name in
+    // that directory.
+    int Fd;
+    SmallString<0> PathTemplate;
+    SmallString<0> RealPath;
+    sys::path::append(PathTemplate, LogDir, "Module-%%-%%-%%-%%-%%-%%-%%.txt");
+    if (auto Err =
+            sys::fs::createUniqueFile(PathTemplate.str(), Fd, RealPath)) {
+      report_fatal_error("Failed to create log file at '" + Twine(LogDir) +
+                             "': " + Err.message(),
+                         /*CrashDiag=*/false);
+    }
 
-/// If \p GV has local linkage, make it external + hidden.
-static void externalize(GlobalValue &GV) {
-  if (GV.hasLocalLinkage()) {
-    GV.setLinkage(GlobalValue::ExternalLinkage);
-    GV.setVisibility(GlobalValue::HiddenVisibility);
+    FileOS = std::make_unique(Fd, /*shouldClose=*/true);
   }
 
-  // Unnamed entities must be named consistently between modules. setName will
-  // give a distinct name to each such entity.
-  if (!GV.hasName())
-    GV.setName("__llvmsplit_unnamed");
+  bool hasLogFile() const { return FileOS != nullptr; }
+
+  raw_ostream &logfile() {
+    assert(FileOS && "no logfile!");
+    return *FileOS;
+  }
+
+  /// \returns true if this SML will log anything either to a file or dbgs().
+  /// Can be used to avoid expensive computations that are ignored when logging
+  /// is disabled.
+  operator bool() const {
+    return hasLogFile() || (DebugFlag && isCurrentDebugType(DEBUG_TYPE));
+  }
+
+private:
+  std::unique_ptr FileOS;
+};
+
+template
+static SplitModuleLogger &operator<<(SplitModuleLogger &SML, const Ty &Val) {
+  static_assert(
+      !std::is_same_v,
+      "do not print values to logs directly, use handleName instead!");
+  LLVM_DEBUG(dbgs() << Val);
+  if (SML.hasLogFile())
+    SML.logfile() << Val;
+  return SML;
 }
 
-/// Cost analysis function. Calculates the cost of each function in \p M
-///
+/// Calculate the cost of each function in \p M
+/// \param SML Log Helper
 /// \param GetTTI Abstract getter for TargetTransformInfo.
 /// \param M Module to analyze.
 /// \param CostMap[out] Resulting Function -> Cost map.
 /// \return The module's total cost.
-static CostType calculateFunctionCosts(GetTTIFn GetTTI, Module &M,
-                                       FunctionsCostMap &CostMap) {
-  SplitModuleTimer SMT("calculateFunctionCosts", "cost analysis");
-
-  LLVM_DEBUG(dbgs() << "[cost analysis] calculating function costs\n");
+static CostType
+calculateFunctionCosts(SplitModuleLogger &SML, GetTTIFn GetTTI, Module &M,
+                       DenseMap &CostMap) {
   CostType ModuleCost = 0;
-  [[maybe_unused]] CostType KernelCost = 0;
+  CostType KernelCost = 0;
 
   for (auto &Fn : M) {
     if (Fn.isDeclaration())
@@ -216,30 +251,23 @@ static CostType calculateFunctionCosts(GetTTIFn GetTTI, Module &M,
     assert((ModuleCost + FnCost) >= ModuleCost && "Overflow!");
     ModuleCost += FnCost;
 
-    if (AMDGPU::isEntryFunctionCC(Fn.getCallingConv()))
+    if (isEntryPoint(&Fn))
       KernelCost += FnCost;
   }
 
-  if (CostMap.empty())
-    return 0;
-
-  assert(ModuleCost);
-  LLVM_DEBUG({
-    const CostType FnCost = ModuleCost - KernelCost;
-    dbgs() << " - total module cost is " << ModuleCost << ". kernels cost "
-           << "" << KernelCost << " ("
-           << format("%0.2f", (float(KernelCost) / ModuleCost) * 100)
-           << "% of the module), functions cost " << FnCost << " ("
-           << format("%0.2f", (float(FnCost) / ModuleCost) * 100)
-           << "% of the module)\n";
-  });
+  CostType FnCost = (ModuleCost - KernelCost);
+  CostType ModuleCostOr1 = ModuleCost ? ModuleCost : 1;
+  SML << "=> Total Module Cost: " << ModuleCost << '\n'
+      << "  => KernelCost: " << KernelCost << " ("
+      << format("%0.2f", (float(KernelCost) / ModuleCostOr1) * 100) << "%)\n"
+      << "  => FnsCost: " << FnCost << " ("
+      << format("%0.2f", (float(FnCost) / ModuleCostOr1) * 100) << "%)\n";
 
   return ModuleCost;
 }
 
-/// \return true if \p F can be indirectly called
 static bool canBeIndirectlyCalled(const Function &F) {
-  if (F.isDeclaration() || AMDGPU::isEntryFunctionCC(F.getCallingConv()))
+  if (F.isDeclaration() || isEntryPoint(&F))
     return false;
   return !F.hasLocalLinkage() ||
          F.hasAddressTaken(/*PutOffender=*/nullptr,
@@ -250,1081 +278,351 @@ static bool canBeIndirectlyCalled(const Function &F) {
                            /*IgnoreCastedDirectCall=*/true);
 }
 
-//===----------------------------------------------------------------------===//
-// Graph-based Module Representation
-//===----------------------------------------------------------------------===//
-
-/// AMDGPUSplitModule's view of the source Module, as a graph of all components
-/// that can be split into different modules.
-///
-/// The most trivial instance of this graph is just the CallGraph of the module,
-/// but it is not guaranteed that the graph is strictly equal to the CG. It
-/// currently always is but it's designed in a way that would eventually allow
-/// us to create abstract nodes, or nodes for different entities such as global
-/// variables or any other meaningful constraint we must consider.
+/// When a function or any of its callees performs an indirect call, this
+/// takes over \ref addAllDependencies and adds all potentially callable
+/// functions to \p Fns so they can be counted as dependencies of the function.
 ///
-/// The graph is only mutable by this class, and is generally not modified
-/// after \ref SplitGraph::buildGraph runs. No consumers of the graph can
-/// mutate it.
-class SplitGraph {
-public:
-  class Node;
-
-  enum class EdgeKind : uint8_t {
-    /// The nodes are related through a direct call. This is a "strong" edge as
-    /// it means the Src will directly reference the Dst.
-    DirectCall,
-    /// The nodes are related through an indirect call.
-    /// This is a "weaker" edge and is only considered when traversing the graph
-    /// starting from a kernel. We need this edge for resource usage analysis.
-    ///
-    /// The reason why we have this edge in the first place is due to how
-    /// AMDGPUResourceUsageAnalysis works. In the presence of an indirect call,
-    /// the resource usage of the kernel containing the indirect call is the
-    /// max resource usage of all functions that can be indirectly called.
-    IndirectCall,
-  };
-
-  /// An edge between two nodes. Edges are directional, and tagged with a
-  /// "kind".
-  struct Edge {
-    Edge(Node *Src, Node *Dst, EdgeKind Kind)
-        : Src(Src), Dst(Dst), Kind(Kind) {}
-
-    Node *Src; ///< Source
-    Node *Dst; ///< Destination
-    EdgeKind Kind;
-  };
-
-  using EdgesVec = SmallVector;
-  using edges_iterator = EdgesVec::const_iterator;
-  using nodes_iterator = const Node *const *;
-
-  SplitGraph(const Module &M, const FunctionsCostMap &CostMap,
-             CostType ModuleCost)
-      : M(M), CostMap(CostMap), ModuleCost(ModuleCost) {}
-
-  void buildGraph(CallGraph &CG);
-
-#ifndef NDEBUG
-  bool verifyGraph() const;
-#endif
-
-  bool empty() const { return Nodes.empty(); }
-  const iterator_range nodes() const {
-    return {Nodes.begin(), Nodes.end()};
+/// This is needed due to how AMDGPUResourceUsageAnalysis operates: in the
+/// presence of an indirect call, the function's resource usage is the same as
+/// the most expensive function in the module.
+/// \param M The module.
+/// \param Fns[out] Resulting list of functions.
+static void addAllIndirectCallDependencies(const Module &M,
+                                           DenseSet &Fns) {
+  for (const auto &Fn : M) {
+    if (canBeIndirectlyCalled(Fn))
+      Fns.insert(&Fn);
   }
-  const Node &getNode(unsigned ID) const { return *Nodes[ID]; }
-
-  unsigned getNumNodes() const { return Nodes.size(); }
-  BitVector createNodesBitVector() const { return BitVector(Nodes.size()); }
-
-  const Module &getModule() const { return M; }
-
-  CostType getModuleCost() const { return ModuleCost; }
-  CostType getCost(const Function &F) const { return CostMap.at(&F); }
-
-  /// \returns the aggregated cost of all nodes in \p BV (bits set to 1 = node
-  /// IDs).
-  CostType calculateCost(const BitVector &BV) const;
-
-private:
-  /// Retrieves the node for \p GV in \p Cache, or creates a new node for it and
-  /// updates \p Cache.
-  Node &getNode(DenseMap &Cache,
-                const GlobalValue &GV);
-
-  // Create a new edge between two nodes and add it to both nodes.
-  const Edge &createEdge(Node &Src, Node &Dst, EdgeKind EK);
-
-  const Module &M;
-  const FunctionsCostMap &CostMap;
-  CostType ModuleCost;
-
-  // Final list of nodes with stable ordering.
-  SmallVector Nodes;
-
-  SpecificBumpPtrAllocator NodesPool;
-
-  // Edges are trivially destructible objects, so as a small optimization we
-  // use a BumpPtrAllocator which avoids destructor calls but also makes
-  // allocation faster.
-  static_assert(
-      std::is_trivially_destructible_v,
-      "Edge must be trivially destructible to use the BumpPtrAllocator");
-  BumpPtrAllocator EdgesPool;
-};
+}
 
-/// Nodes in the SplitGraph contain both incoming, and outgoing edges.
-/// Incoming edges have this node as their Dst, and Outgoing ones have this node
-/// as their Src.
+/// Adds the functions that \p Fn may call to \p Fns, then recurses into each
+/// callee until all reachable functions have been gathered.
 ///
-/// Edge objects are shared by both nodes in Src/Dst. They provide immediate
-/// feedback on how two nodes are related, and in which direction they are
-/// related, which is valuable information to make splitting decisions.
-///
-/// Nodes are fundamentally abstract, and any consumers of the graph should
-/// treat them as such. While a node will be a function most of the time, we
-/// could also create nodes for any other reason. In the future, we could have
-/// single nodes for multiple functions, or nodes for GVs, etc.
-class SplitGraph::Node {
-  friend class SplitGraph;
-
-public:
-  Node(unsigned ID, const GlobalValue &GV, CostType IndividualCost,
-       bool IsNonCopyable)
-      : ID(ID), GV(GV), IndividualCost(IndividualCost),
-        IsNonCopyable(IsNonCopyable), IsEntryFnCC(false), IsGraphEntry(false) {
-    if (auto *Fn = dyn_cast(&GV))
-      IsEntryFnCC = AMDGPU::isEntryFunctionCC(Fn->getCallingConv());
-  }
-
-  /// An 0-indexed ID for the node. The maximum ID (exclusive) is the number of
-  /// nodes in the graph. This ID can be used as an index in a BitVector.
-  unsigned getID() const { return ID; }
-
-  const Function &getFunction() const { return cast(GV); }
-
-  /// \returns the cost to import this component into a given module, not
-  /// accounting for any dependencies that may need to be imported as well.
-  CostType getIndividualCost() const { return IndividualCost; }
-
-  bool isNonCopyable() const { return IsNonCopyable; }
-  bool isEntryFunctionCC() const { return IsEntryFnCC; }
-
-  /// \returns whether this is an entry point in the graph. Entry points are
-  /// defined as follows: if you take all entry points in the graph, and iterate
-  /// their dependencies, you are guaranteed to visit all nodes in the graph at
-  /// least once.
-  bool isGraphEntryPoint() const { return IsGraphEntry; }
-
-  StringRef getName() const { return GV.getName(); }
-
-  bool hasAnyIncomingEdges() const { return IncomingEdges.size(); }
-  bool hasAnyIncomingEdgesOfKind(EdgeKind EK) const {
-    return any_of(IncomingEdges, [&](const auto *E) { return E->Kind == EK; });
-  }
-
-  bool hasAnyOutgoingEdges() const { return OutgoingEdges.size(); }
-  bool hasAnyOutgoingEdgesOfKind(EdgeKind EK) const {
-    return any_of(OutgoingEdges, [&](const auto *E) { return E->Kind == EK; });
-  }
-
-  iterator_range incoming_edges() const {
-    return IncomingEdges;
-  }
-
-  iterator_range outgoing_edges() const {
-    return OutgoingEdges;
-  }
-
-  bool shouldFollowIndirectCalls() const { return isEntryFunctionCC(); }
-
-  /// Visit all children of this node in a recursive fashion. Also visits Self.
-  /// If \ref shouldFollowIndirectCalls returns false, then this only follows
-  /// DirectCall edges.
-  ///
-  /// \param Visitor Visitor Function.
-  void visitAllDependencies(std::function Visitor) const;
-
-  /// Adds the depedencies of this node in \p BV by setting the bit
-  /// corresponding to each node.
-  ///
-  /// Implemented using \ref visitAllDependencies, hence it follows the same
-  /// rules regarding dependencies traversal.
-  ///
-  /// \param[out] BV The bitvector where the bits should be set.
-  void getDependencies(BitVector &BV) const {
-    visitAllDependencies([&](const Node &N) { BV.set(N.getID()); });
-  }
-
-  /// Uses \ref visitAllDependencies to aggregate the individual cost of this
-  /// node and all of its dependencies.
-  ///
-  /// This is cached.
-  CostType getFullCost() const;
-
-private:
-  void markAsGraphEntry() { IsGraphEntry = true; }
-
-  unsigned ID;
-  const GlobalValue &GV;
-  CostType IndividualCost;
-  bool IsNonCopyable : 1;
-  bool IsEntryFnCC : 1;
-  bool IsGraphEntry : 1;
-
-  // TODO: Cache dependencies as well?
-  mutable CostType FullCost = 0;
-
-  // TODO: Use a single sorted vector (with all incoming/outgoing edges grouped
-  // together)
-  EdgesVec IncomingEdges;
-  EdgesVec OutgoingEdges;
-};
-
-void SplitGraph::Node::visitAllDependencies(
-    std::function Visitor) const {
-  const bool FollowIndirect = shouldFollowIndirectCalls();
-  // FIXME: If this can access SplitGraph in the future, use a BitVector
-  // instead.
-  DenseSet Seen;
-  SmallVector WorkList({this});
+/// \param SML Log Helper
+/// \param CG Call graph for \p Fn's module.
+/// \param Fn Current function to look at.
+/// \param Fns[out] Resulting list of functions.
+/// \param OnlyDirect Whether to only consider direct callees.
+/// \param HadIndirectCall[out] Set to true if an indirect call was seen at some
+/// point, either in \p Fn or in one of the functions it calls. When that
+/// happens, we fall back to adding all callable functions inside \p Fn's module
+/// to \p Fns.
+static void addAllDependencies(SplitModuleLogger &SML, const CallGraph &CG,
+                               const Function &Fn,
+                               DenseSet &Fns, bool OnlyDirect,
+                               bool &HadIndirectCall) {
+  assert(!Fn.isDeclaration());
+
+  const Module &M = *Fn.getParent();
+  SmallVector WorkList({&Fn});
   while (!WorkList.empty()) {
-    const Node *CurN = WorkList.pop_back_val();
-    if (auto [It, Inserted] = Seen.insert(CurN); !Inserted)
-      continue;
-
-    Visitor(*CurN);
-
-    for (const Edge *E : CurN->outgoing_edges()) {
-      if (!FollowIndirect && E->Kind == EdgeKind::IndirectCall)
-        continue;
-      WorkList.push_back(E->Dst);
-    }
-  }
-}
-
-CostType SplitGraph::Node::getFullCost() const {
-  if (FullCost)
-    return FullCost;
-
-  assert(FullCost == 0);
-  visitAllDependencies(
-      [&](const Node &N) { FullCost += N.getIndividualCost(); });
-  return FullCost;
-}
+    const auto &CurFn = *WorkList.pop_back_val();
+    assert(!CurFn.isDeclaration());
 
-void SplitGraph::buildGraph(CallGraph &CG) {
-  SplitModuleTimer SMT("buildGraph", "graph construction");
-  LLVM_DEBUG(
-      dbgs()
-      << "[build graph] constructing graph representation of the input\n");
-
-  // We build the graph by just iterating all functions in the module and
-  // working on their direct callees. At the end, all nodes should be linked
-  // together as expected.
-  DenseMap Cache;
-  SmallVector FnsWithIndirectCalls, IndirectlyCallableFns;
-  for (const Function &Fn : M) {
-    if (Fn.isDeclaration())
-      continue;
+    // Scan for an indirect call. If such a call is found, we have to
+    // conservatively assume this can call all non-entrypoint functions in the
+    // module.
 
-    // Look at direct callees and create the necessary edges in the graph.
-    bool HasIndirectCall = false;
-    Node &N = getNode(Cache, Fn);
-    for (auto &CGEntry : *CG[&Fn]) {
+    for (auto &CGEntry : *CG[&CurFn]) {
       auto *CGNode = CGEntry.second;
       auto *Callee = CGNode->getFunction();
       if (!Callee) {
-        // TODO: Don't consider inline assembly as indirect calls.
-        if (CGNode == CG.getCallsExternalNode())
-          HasIndirectCall = true;
+        if (OnlyDirect)
+          continue;
+
+        // Functions have an edge towards CallsExternalNode if they're external
+        // declarations, or if they do an indirect call. As we only process
+        // definitions here, we know this means the function has an indirect
+        // call. We then have to conservatively assume this can call all
+        // non-entrypoint functions in the module.
+        if (CGNode != CG.getCallsExternalNode())
+          continue; // this is another function-less node we don't care about.
+
+        SML << "Indirect call detected in " << getName(CurFn)
+            << " - treating all non-entrypoint functions as "
+               "potential dependencies\n";
+
+        // TODO: Print an ORE as well ?
+        addAllIndirectCallDependencies(M, Fns);
+        HadIndirectCall = true;
         continue;
       }
 
-      if (!Callee->isDeclaration())
-        createEdge(N, getNode(Cache, *Callee), EdgeKind::DirectCall);
-    }
-
-    // Keep track of this function if it contains an indirect call and/or if it
-    // can be indirectly called.
-    if (HasIndirectCall) {
-      LLVM_DEBUG(dbgs() << "indirect call found in " << Fn.getName() << "\n");
-      FnsWithIndirectCalls.push_back(&Fn);
-    }
-
-    if (canBeIndirectlyCalled(Fn))
-      IndirectlyCallableFns.push_back(&Fn);
-  }
+      if (Callee->isDeclaration())
+        continue;
 
-  // Post-process functions with indirect calls.
-  for (const Function *Fn : FnsWithIndirectCalls) {
-    for (const Function *Candidate : IndirectlyCallableFns) {
-      Node &Src = getNode(Cache, *Fn);
-      Node &Dst = getNode(Cache, *Candidate);
-      createEdge(Src, Dst, EdgeKind::IndirectCall);
+      auto [It, Inserted] = Fns.insert(Callee);
+      if (Inserted)
+        WorkList.push_back(Callee);
     }
   }
-
-  // Now, find all entry points.
-  SmallVector CandidateEntryPoints;
-  BitVector NodesReachableByKernels = createNodesBitVector();
-  for (Node *N : Nodes) {
-    // Functions with an Entry CC are always graph entry points too.
- if (N->isEntryFunctionCC()) { - N->markAsGraphEntry(); - N->getDependencies(NodesReachableByKernels); - } else if (!N->hasAnyIncomingEdgesOfKind(EdgeKind::DirectCall)) - CandidateEntryPoints.push_back(N); - } - - for (Node *N : CandidateEntryPoints) { - // This can be another entry point if it's not reachable by a kernel - // TODO: We could sort all of the possible new entries in a stable order - // (e.g. by cost), then consume them one by one until - // NodesReachableByKernels is all 1s. It'd allow us to avoid - // considering some nodes as non-entries in some specific cases. - if (!NodesReachableByKernels.test(N->getID())) - N->markAsGraphEntry(); - } - -#ifndef NDEBUG - assert(verifyGraph()); -#endif } -#ifndef NDEBUG -bool SplitGraph::verifyGraph() const { - unsigned ExpectedID = 0; - // Exceptionally using a set here in case IDs are messed up. - DenseSet SeenNodes; - DenseSet SeenFunctionNodes; - for (const Node *N : Nodes) { - if (N->getID() != (ExpectedID++)) { - errs() << "Node IDs are incorrect!\n"; - return false; - } - - if (!SeenNodes.insert(N).second) { - errs() << "Node seen more than once!\n"; - return false; - } - - if (&getNode(N->getID()) != N) { - errs() << "getNode doesn't return the right node\n"; - return false; - } - - for (const Edge *E : N->IncomingEdges) { - if (!E->Src || !E->Dst || (E->Dst != N) || - (find(E->Src->OutgoingEdges, E) == E->Src->OutgoingEdges.end())) { - errs() << "ill-formed incoming edges\n"; - return false; - } - } - - for (const Edge *E : N->OutgoingEdges) { - if (!E->Src || !E->Dst || (E->Src != N) || - (find(E->Dst->IncomingEdges, E) == E->Dst->IncomingEdges.end())) { - errs() << "ill-formed outgoing edges\n"; - return false; - } - } - - const Function &Fn = N->getFunction(); - if (AMDGPU::isEntryFunctionCC(Fn.getCallingConv())) { - if (N->hasAnyIncomingEdges()) { - errs() << "Kernels cannot have incoming edges\n"; - return false; - } - } - - if (Fn.isDeclaration()) { - errs() << "declarations shouldn't have 
nodes!\n";
-      return false;
-    }
-
-    auto [It, Inserted] = SeenFunctionNodes.insert(&Fn);
-    if (!Inserted) {
-      errs() << "one function has multiple nodes!\n";
-      return false;
+/// Contains information about a function and its dependencies.
+/// This is a splitting root. The splitting algorithm works by
+/// assigning these to partitions.
+struct FunctionWithDependencies {
+  FunctionWithDependencies(SplitModuleLogger &SML, CallGraph &CG,
+                           const DenseMap<const Function *, CostType> &FnCosts,
+                           const Function *Fn)
+      : Fn(Fn) {
+    // When Fn is not a kernel, we don't need to collect indirect callees.
+    // Resource usage analysis is only performed on kernels, and we collect
+    // indirect callees for resource usage analysis.
+    addAllDependencies(SML, CG, *Fn, Dependencies,
+                       /*OnlyDirect*/ !isEntryPoint(Fn), HasIndirectCall);
+    TotalCost = FnCosts.at(Fn);
+    for (const auto *Dep : Dependencies) {
+      TotalCost += FnCosts.at(Dep);
+
+      // We cannot duplicate functions with external linkage, or functions that
+      // may be overridden at runtime.
+      HasNonDuplicatableDependecy |=
+          (Dep->hasExternalLinkage() || !Dep->isDefinitionExact());
     }
   }
-  if (ExpectedID != Nodes.size()) {
-    errs() << "Node IDs out of sync!\n";
-    return false;
-  }
-
-  if (createNodesBitVector().size() != getNumNodes()) {
-    errs() << "nodes bit vector doesn't have the right size!\n";
-    return false;
-  }
-
-  // Check we respect the promise of Node::isKernel
-  BitVector BV = createNodesBitVector();
-  for (const Node *N : nodes()) {
-    if (N->isGraphEntryPoint())
-      N->getDependencies(BV);
-  }
-
-  // Ensure each function in the module has an associated node.
- for (const auto &Fn : M) { - if (!Fn.isDeclaration()) { - if (!SeenFunctionNodes.contains(&Fn)) { - errs() << "Fn has no associated node in the graph!\n"; - return false; - } - } - } - - if (!BV.all()) { - errs() << "not all nodes are reachable through the graph's entry points!\n"; - return false; - } - - return true; -} -#endif - -CostType SplitGraph::calculateCost(const BitVector &BV) const { - CostType Cost = 0; - for (unsigned NodeID : BV.set_bits()) - Cost += getNode(NodeID).getIndividualCost(); - return Cost; -} - -SplitGraph::Node & -SplitGraph::getNode(DenseMap &Cache, - const GlobalValue &GV) { - auto &N = Cache[&GV]; - if (N) - return *N; - - CostType Cost = 0; - bool NonCopyable = false; - if (const Function *Fn = dyn_cast(&GV)) { - NonCopyable = isNonCopyable(*Fn); - Cost = CostMap.at(Fn); - } - N = new (NodesPool.Allocate()) Node(Nodes.size(), GV, Cost, NonCopyable); - Nodes.push_back(N); - assert(&getNode(N->getID()) == N); - return *N; -} - -const SplitGraph::Edge &SplitGraph::createEdge(Node &Src, Node &Dst, - EdgeKind EK) { - const Edge *E = new (EdgesPool.Allocate(1)) Edge(&Src, &Dst, EK); - Src.OutgoingEdges.push_back(E); - Dst.IncomingEdges.push_back(E); - return *E; -} - -//===----------------------------------------------------------------------===// -// Split Proposals -//===----------------------------------------------------------------------===// - -/// Represents a module splitting proposal. -/// -/// Proposals are made of N BitVectors, one for each partition, where each bit -/// set indicates that the node is present and should be copied inside that -/// partition. -/// -/// Proposals have several metrics attached so they can be compared/sorted, -/// which the driver to try multiple strategies resultings in multiple proposals -/// and choose the best one out of them. 
-class SplitProposal {
-public:
-  SplitProposal(const SplitGraph &SG, unsigned MaxPartitions) : SG(&SG) {
-    Partitions.resize(MaxPartitions, {0, SG.createNodesBitVector()});
-  }
+  const Function *Fn = nullptr;
+  DenseSet<const Function *> Dependencies;
+  /// Whether \p Fn or any of its \ref Dependencies contains an indirect call.
+  bool HasIndirectCall = false;
+  /// Whether any of \p Fn's dependencies cannot be duplicated.
+  bool HasNonDuplicatableDependecy = false;
-  void setName(StringRef NewName) { Name = NewName; }
-  StringRef getName() const { return Name; }
-
-  const BitVector &operator[](unsigned PID) const {
-    return Partitions[PID].second;
-  }
-
-  void add(unsigned PID, const BitVector &BV) {
-    Partitions[PID].second |= BV;
-    updateScore(PID);
-  }
-
-  void print(raw_ostream &OS) const;
-  LLVM_DUMP_METHOD void dump() const { print(dbgs()); }
-
-  // Find the cheapest partition (lowest cost). In case of ties, always returns
-  // the highest partition number.
-  unsigned findCheapestPartition() const;
-
-  /// Calculate the CodeSize and Bottleneck scores.
-  void calculateScores();
-
-#ifndef NDEBUG
-  void verifyCompleteness() const;
-#endif
-
-  /// Only available after \ref calculateScores is called.
-  ///
-  /// A positive number indicating the % of code duplication that this proposal
-  /// creates. e.g. 0.2 means this proposal adds roughly 20% code size by
-  /// duplicating some functions across partitions.
-  ///
-  /// Value is always rounded up to 3 decimal places.
-  ///
-  /// A perfect score would be 0.0, and anything approaching 1.0 is very bad.
-  double getCodeSizeScore() const { return CodeSizeScore; }
-
-  /// Only available after \ref calculateScores is called.
-  ///
-  /// A number between [0, 1] which indicates how big of a bottleneck is
-  /// expected from the largest partition.
-  ///
-  /// A score of 1.0 means the biggest partition is as big as the source module,
-  /// so build time will be equal to or greater than the build time of the
-  /// initial input.
- /// - /// Value is always rounded up to 3 decimal places. - /// - /// This is one of the metrics used to estimate this proposal's build time. - double getBottleneckScore() const { return BottleneckScore; } - -private: - void updateScore(unsigned PID) { - assert(SG); - for (auto &[PCost, Nodes] : Partitions) { - TotalCost -= PCost; - PCost = SG->calculateCost(Nodes); - TotalCost += PCost; - } - } - - /// \see getCodeSizeScore - double CodeSizeScore = 0.0; - /// \see getBottleneckScore - double BottleneckScore = 0.0; - /// Aggregated cost of all partitions CostType TotalCost = 0; - const SplitGraph *SG = nullptr; - std::string Name; - - std::vector> Partitions; -}; - -void SplitProposal::print(raw_ostream &OS) const { - assert(SG); - - OS << "[proposal] " << Name << ", total cost:" << TotalCost - << ", code size score:" << format("%0.3f", CodeSizeScore) - << ", bottleneck score:" << format("%0.3f", BottleneckScore) << '\n'; - for (const auto &[PID, Part] : enumerate(Partitions)) { - const auto &[Cost, NodeIDs] = Part; - OS << " - P" << PID << " nodes:" << NodeIDs.count() << " cost: " << Cost - << '|' << formatRatioOf(Cost, SG->getModuleCost()) << "%\n"; - } -} - -unsigned SplitProposal::findCheapestPartition() const { - assert(!Partitions.empty()); - CostType CurCost = std::numeric_limits::max(); - unsigned CurPID = InvalidPID; - for (const auto &[Idx, Part] : enumerate(Partitions)) { - if (Part.first <= CurCost) { - CurPID = Idx; - CurCost = Part.first; - } - } - assert(CurPID != InvalidPID); - return CurPID; -} - -void SplitProposal::calculateScores() { - if (Partitions.empty()) - return; - - assert(SG); - CostType LargestPCost = 0; - for (auto &[PCost, Nodes] : Partitions) { - if (PCost > LargestPCost) - LargestPCost = PCost; + /// \returns true if this function and its dependencies can be considered + /// large according to \p Threshold. 
+ bool isLarge(CostType Threshold) const { + return TotalCost > Threshold && !Dependencies.empty(); } - - CostType ModuleCost = SG->getModuleCost(); - CodeSizeScore = double(TotalCost) / ModuleCost; - assert(CodeSizeScore >= 0.0); - - BottleneckScore = double(LargestPCost) / ModuleCost; - - CodeSizeScore = std::ceil(CodeSizeScore * 100.0) / 100.0; - BottleneckScore = std::ceil(BottleneckScore * 100.0) / 100.0; -} - -#ifndef NDEBUG -void SplitProposal::verifyCompleteness() const { - if (Partitions.empty()) - return; - - BitVector Result = Partitions[0].second; - for (const auto &P : drop_begin(Partitions)) - Result |= P.second; - assert(Result.all() && "some nodes are missing from this proposal!"); -} -#endif - -//===-- RecursiveSearchStrategy -------------------------------------------===// - -/// Partitioning algorithm. -/// -/// This is a recursive search algorithm that can explore multiple possiblities. -/// -/// When a cluster of nodes can go into more than one partition, and we haven't -/// reached maximum search depth, we recurse and explore both options and their -/// consequences. Both branches will yield a proposal, and the driver will grade -/// both and choose the best one. -/// -/// If max depth is reached, we will use some heuristics to make a choice. Most -/// of the time we will just use the least-pressured (cheapest) partition, but -/// if a cluster is particularly big and there is a good amount of overlap with -/// an existing partition, we will choose that partition instead. 
-class RecursiveSearchSplitting { -public: - using SubmitProposalFn = function_ref; - - RecursiveSearchSplitting(const SplitGraph &SG, unsigned NumParts, - SubmitProposalFn SubmitProposal); - - void run(); - -private: - struct WorkListEntry { - WorkListEntry(const BitVector &BV) : Cluster(BV) {} - - unsigned NumNonEntryNodes = 0; - CostType TotalCost = 0; - CostType CostExcludingGraphEntryPoints = 0; - BitVector Cluster; - }; - - /// Collects all graph entry points's clusters and sort them so the most - /// expensive clusters are viewed first. This will merge clusters together if - /// they share a non-copyable dependency. - void setupWorkList(); - - /// Recursive function that assigns the worklist item at \p Idx into a - /// partition of \p SP. - /// - /// \p Depth is the current search depth. When this value is equal to - /// \ref MaxDepth, we can no longer recurse. - /// - /// This function only recurses if there is more than one possible assignment, - /// otherwise it is iterative to avoid creating a call stack that is as big as - /// \ref WorkList. - void pickPartition(unsigned Depth, unsigned Idx, SplitProposal SP); - - /// \return A pair: first element is the PID of the partition that has the - /// most similarities with \p Entry, or \ref InvalidPID if no partition was - /// found with at least one element in common. The second element is the - /// aggregated cost of all dependencies in common between \p Entry and that - /// partition. - std::pair - findMostSimilarPartition(const WorkListEntry &Entry, const SplitProposal &SP); - - const SplitGraph &SG; - unsigned NumParts; - SubmitProposalFn SubmitProposal; - - // A Cluster is considered large when its cost, excluding entry points, - // exceeds this value. 
-  CostType LargeClusterThreshold = 0;
-  unsigned NumProposalsSubmitted = 0;
-  SmallVector<WorkListEntry> WorkList;
 };
-RecursiveSearchSplitting::RecursiveSearchSplitting(
-    const SplitGraph &SG, unsigned NumParts, SubmitProposalFn SubmitProposal)
-    : SG(SG), NumParts(NumParts), SubmitProposal(SubmitProposal) {
-  // arbitrary max value as a safeguard. Anything above 10 will already be
-  // slow, this is just a max value to prevent extreme resource exhaustion or
-  // unbounded run time.
-  if (MaxDepth > 16)
-    report_fatal_error("[amdgpu-split-module] search depth of " +
-                       Twine(MaxDepth) + " is too high!");
-  LargeClusterThreshold =
-      (LargeFnFactor != 0.0)
-          ? CostType(((SG.getModuleCost() / NumParts) * LargeFnFactor))
-          : std::numeric_limits<CostType>::max();
-  LLVM_DEBUG(dbgs() << "[recursive search] large cluster threshold set at "
-                    << LargeClusterThreshold << "\n");
-}
-
-void RecursiveSearchSplitting::run() {
-  {
-    SplitModuleTimer SMT("recursive_search_prepare", "preparing worklist");
-    setupWorkList();
+/// Calculates how much overlap there is between \p A and \p B.
+/// \return A number between 0.0 and 1.0, where 1.0 means A == B and 0.0 means A
+/// and B have no shared elements. Kernels do not count in overlap calculation.
+static float calculateOverlap(const DenseSet<const Function *> &A,
+                              const DenseSet<const Function *> &B) {
+  DenseSet<const Function *> Total;
+  for (const auto *F : A) {
+    if (!isEntryPoint(F))
+      Total.insert(F);
   }
-  {
-    SplitModuleTimer SMT("recursive_search_pick", "partitioning");
-    SplitProposal SP(SG, NumParts);
-    pickPartition(/*BranchDepth=*/0, /*Idx=*/0, SP);
-  }
-}
+  if (Total.empty())
+    return 0.0f;
-void RecursiveSearchSplitting::setupWorkList() {
-  // e.g. if A and B are two worklist item, and they both call a non copyable
-  // dependency C, this does:
-  //    A=C
-  //    B=C
-  // => NodeEC will create a single group (A, B, C) and we create a new
-  // WorkList entry for that group.
-
-  EquivalenceClasses<unsigned> NodeEC;
-  for (const SplitGraph::Node *N : SG.nodes()) {
-    if (!N->isGraphEntryPoint())
+  unsigned NumCommon = 0;
+  for (const auto *F : B) {
+    if (isEntryPoint(F))
       continue;
-    NodeEC.insert(N->getID());
-    N->visitAllDependencies([&](const SplitGraph::Node &Dep) {
-      if (&Dep != N && Dep.isNonCopyable())
-        NodeEC.unionSets(N->getID(), Dep.getID());
-    });
+    auto [It, Inserted] = Total.insert(F);
+    if (!Inserted)
+      ++NumCommon;
   }
-  for (auto I = NodeEC.begin(), E = NodeEC.end(); I != E; ++I) {
-    if (!I->isLeader())
-      continue;
+  return static_cast<float>(NumCommon) / Total.size();
+}
-    BitVector Cluster = SG.createNodesBitVector();
-    for (auto MI = NodeEC.member_begin(I); MI != NodeEC.member_end(); ++MI) {
-      const SplitGraph::Node &N = SG.getNode(*MI);
-      if (N.isGraphEntryPoint())
-        N.getDependencies(Cluster);
-    }
-    WorkList.emplace_back(std::move(Cluster));
-  }
+/// Performs all of the partitioning work on \p M.
+/// \param SML Log Helper
+/// \param M Module to partition.
+/// \param NumParts Number of partitions to create.
+/// \param ModuleCost Total cost of all functions in \p M.
+/// \param FnCosts Map of Function -> Cost
+/// \param WorkList Functions and their dependencies to process in order.
+/// \returns The created partitions (a vector of size \p NumParts)
+static std::vector<DenseSet<const Function *>>
+doPartitioning(SplitModuleLogger &SML, Module &M, unsigned NumParts,
+               CostType ModuleCost,
+               const DenseMap<const Function *, CostType> &FnCosts,
+               const SmallVector<FunctionWithDependencies> &WorkList) {
+
+  SML << "\n--Partitioning Starts--\n";
+
+  // Calculate a "large function threshold". When a function's total import
+  // cost exceeds this value, we will try to assign it to an existing
+  // partition to reduce the amount of duplication needed.
+  //
+  // e.g. if two functions X and Y each have an import cost of ~10% of the
+  // module, we assign X to a partition as usual, but when we get to Y, we
+  // check if it's worth also putting it in X's partition.
+  const CostType LargeFnThreshold =
+      LargeFnFactor ? CostType(((ModuleCost / NumParts) * LargeFnFactor))
+                    : std::numeric_limits<CostType>::max();
+
+  std::vector<DenseSet<const Function *>> Partitions;
+  Partitions.resize(NumParts);
+
+  // Assign functions to partitions, and try to keep the partitions more or
+  // less balanced. We do that through a priority queue sorted in reverse, so we
+  // can always look at the partition with the least content.
+  //
+  // There are some cases where we will be deliberately unbalanced though.
+  //  - Large functions: we try to merge with existing partitions to reduce code
+  //    duplication.
+  //  - Functions with indirect or external calls always go in the first
+  //    partition (P0).
+  auto ComparePartitions = [](const std::pair<PartitionID, CostType> &a,
+                              const std::pair<PartitionID, CostType> &b) {
+    // When two partitions have the same cost, assign to the one with the
+    // biggest ID first. This allows us to put things in P0 last, because P0 may
+    // have other stuff added later.
+    if (a.second == b.second)
+      return a.first < b.first;
+    return a.second > b.second;
+  };
-  // Calculate costs and other useful information.
-  for (WorkListEntry &Entry : WorkList) {
-    for (unsigned NodeID : Entry.Cluster.set_bits()) {
-      const SplitGraph::Node &N = SG.getNode(NodeID);
-      const CostType Cost = N.getIndividualCost();
+  // We can't use priority_queue here because we need to be able to access any
+  // element. This makes this a bit inefficient as we need to sort it again
+  // every time we change it, but it's a very small array anyway (likely under
+  // 64 partitions) so it's a cheap operation.
+  std::vector<std::pair<PartitionID, CostType>> BalancingQueue;
+  for (unsigned I = 0; I < NumParts; ++I)
+    BalancingQueue.emplace_back(I, 0);
+
+  // Helper function to handle assigning a function to a partition. This takes
+  // care of updating the balancing queue.
+ const auto AssignToPartition = [&](PartitionID PID, + const FunctionWithDependencies &FWD) { + auto &FnsInPart = Partitions[PID]; + FnsInPart.insert(FWD.Fn); + FnsInPart.insert(FWD.Dependencies.begin(), FWD.Dependencies.end()); + + SML << "assign " << getName(*FWD.Fn) << " to P" << PID << "\n -> "; + if (!FWD.Dependencies.empty()) { + SML << FWD.Dependencies.size() << " dependencies added\n"; + }; + + // Update the balancing queue. we scan backwards because in the common case + // the partition is at the end. + for (auto &[QueuePID, Cost] : reverse(BalancingQueue)) { + if (QueuePID == PID) { + CostType NewCost = 0; + for (auto *Fn : Partitions[PID]) + NewCost += FnCosts.at(Fn); + + SML << "[Updating P" << PID << " Cost]:" << Cost << " -> " << NewCost; + if (Cost) { + SML << " (" << unsigned(((float(NewCost) / Cost) - 1) * 100) + << "% increase)"; + } + SML << '\n'; - Entry.TotalCost += Cost; - if (!N.isGraphEntryPoint()) { - Entry.CostExcludingGraphEntryPoints += Cost; - ++Entry.NumNonEntryNodes; + Cost = NewCost; } } - } - sort(WorkList, [](const WorkListEntry &LHS, const WorkListEntry &RHS) { - return LHS.TotalCost > RHS.TotalCost; - }); - - LLVM_DEBUG({ - dbgs() << "[recursive search] worklist:\n"; - for (const auto &[Idx, Entry] : enumerate(WorkList)) { - dbgs() << " - [" << Idx << "]: "; - for (unsigned NodeID : Entry.Cluster.set_bits()) - dbgs() << NodeID << " "; - dbgs() << "(total_cost:" << Entry.TotalCost - << ", cost_excl_entries:" << Entry.CostExcludingGraphEntryPoints - << ")\n"; - } - }); -} + sort(BalancingQueue, ComparePartitions); + }; -void RecursiveSearchSplitting::pickPartition(unsigned Depth, unsigned Idx, - SplitProposal SP) { - while (Idx < WorkList.size()) { - // Step 1: Determine candidate PIDs. - // - const WorkListEntry &Entry = WorkList[Idx]; - const BitVector &Cluster = Entry.Cluster; - - // Default option is to do load-balancing, AKA assign to least pressured - // partition. 
- const unsigned CheapestPID = SP.findCheapestPartition(); - assert(CheapestPID != InvalidPID); - - // Explore assigning to the kernel that contains the most dependencies in - // common. - const auto [MostSimilarPID, SimilarDepsCost] = - findMostSimilarPartition(Entry, SP); - - // We can chose to explore only one path if we only have one valid path, or - // if we reached maximum search depth and can no longer branch out. - unsigned SinglePIDToTry = InvalidPID; - if (MostSimilarPID == InvalidPID) // no similar PID found - SinglePIDToTry = CheapestPID; - else if (MostSimilarPID == CheapestPID) // both landed on the same PID - SinglePIDToTry = CheapestPID; - else if (Depth >= MaxDepth) { - // We have to choose one path. Use a heuristic to guess which one will be - // more appropriate. - if (Entry.CostExcludingGraphEntryPoints > LargeClusterThreshold) { - // Check if the amount of code in common makes it worth it. - assert(SimilarDepsCost && Entry.CostExcludingGraphEntryPoints); - const double Ratio = - SimilarDepsCost / Entry.CostExcludingGraphEntryPoints; - assert(Ratio >= 0.0 && Ratio <= 1.0); - if (LargeFnOverlapForMerge > Ratio) { - // For debug, just print "L", so we'll see "L3=P3" for instance, which - // will mean we reached max depth and chose P3 based on this - // heuristic. - LLVM_DEBUG(dbgs() << 'L'); - SinglePIDToTry = MostSimilarPID; - } - } else - SinglePIDToTry = CheapestPID; + for (auto &CurFn : WorkList) { + // When a function has indirect calls, it must stay in the first partition + // alongside every reachable non-entry function. This is a nightmare case + // for splitting as it severely limits what we can do. + if (CurFn.HasIndirectCall) { + SML << "Function with indirect call(s): " << getName(*CurFn.Fn) + << " defaulting to P0\n"; + AssignToPartition(0, CurFn); + continue; } - // Step 2: Explore candidates. - - // When we only explore one possible path, and thus branch depth doesn't - // increase, do not recurse, iterate instead. 
- if (SinglePIDToTry != InvalidPID) { - LLVM_DEBUG(dbgs() << Idx << "=P" << SinglePIDToTry << ' '); - // Only one path to explore, don't clone SP, don't increase depth. - SP.add(SinglePIDToTry, Cluster); - ++Idx; + // When a function has non duplicatable dependencies, we have to keep it in + // the first partition as well. This is a conservative approach, a + // finer-grained approach could keep track of which dependencies are + // non-duplicatable exactly and just make sure they're grouped together. + if (CurFn.HasNonDuplicatableDependecy) { + SML << "Function with externally visible dependency " + << getName(*CurFn.Fn) << " defaulting to P0\n"; + AssignToPartition(0, CurFn); continue; } - assert(MostSimilarPID != InvalidPID); - - // We explore multiple paths: recurse at increased depth, then stop this - // function. - - LLVM_DEBUG(dbgs() << '\n'); - - // lb = load balancing = put in cheapest partition - { - SplitProposal BranchSP = SP; - LLVM_DEBUG(dbgs().indent(Depth) - << " [lb] " << Idx << "=P" << CheapestPID << "? "); - BranchSP.add(CheapestPID, Cluster); - pickPartition(Depth + 1, Idx + 1, BranchSP); - } + // Be smart with large functions to avoid duplicating their dependencies. 
+ if (CurFn.isLarge(LargeFnThreshold)) { + assert(LargeFnOverlapForMerge >= 0.0f && LargeFnOverlapForMerge <= 1.0f); + SML << "Large Function: " << getName(*CurFn.Fn) + << " - looking for partition with at least " + << format("%0.2f", LargeFnOverlapForMerge * 100) << "% overlap\n"; + + bool Assigned = false; + for (const auto &[PID, Fns] : enumerate(Partitions)) { + float Overlap = calculateOverlap(CurFn.Dependencies, Fns); + SML << " => " << format("%0.2f", Overlap * 100) << "% overlap with P" + << PID << '\n'; + if (Overlap > LargeFnOverlapForMerge) { + SML << " selecting P" << PID << '\n'; + AssignToPartition(PID, CurFn); + Assigned = true; + } + } - // ms = most similar = put in partition with the most in common - { - SplitProposal BranchSP = SP; - LLVM_DEBUG(dbgs().indent(Depth) - << " [ms] " << Idx << "=P" << MostSimilarPID << "? "); - BranchSP.add(MostSimilarPID, Cluster); - pickPartition(Depth + 1, Idx + 1, BranchSP); + if (Assigned) + continue; } - return; + // Normal "load-balancing", assign to partition with least pressure. + auto [PID, CurCost] = BalancingQueue.back(); + AssignToPartition(PID, CurFn); } - // Step 3: If we assigned all WorkList items, submit the proposal. - - assert(Idx == WorkList.size()); - assert(NumProposalsSubmitted <= (2u << MaxDepth) && - "Search got out of bounds?"); - SP.setName("recursive_search (depth=" + std::to_string(Depth) + ") #" + - std::to_string(NumProposalsSubmitted++)); - LLVM_DEBUG(dbgs() << '\n'); - SubmitProposal(SP); -} - -std::pair -RecursiveSearchSplitting::findMostSimilarPartition(const WorkListEntry &Entry, - const SplitProposal &SP) { - if (!Entry.NumNonEntryNodes) - return {InvalidPID, 0}; - - // We take the partition that is the most similar using Cost as a metric. - // So we take the set of nodes in common, compute their aggregated cost, and - // pick the partition with the highest cost in common. 
- unsigned ChosenPID = InvalidPID; - CostType ChosenCost = 0; - for (unsigned PID = 0; PID < NumParts; ++PID) { - BitVector BV = SP[PID]; - BV &= Entry.Cluster; // FIXME: & doesn't work between BVs?! - - if (BV.none()) - continue; - - const CostType Cost = SG.calculateCost(BV); - - if (ChosenPID == InvalidPID || ChosenCost < Cost || - (ChosenCost == Cost && PID > ChosenPID)) { - ChosenPID = PID; - ChosenCost = Cost; + if (SML) { + CostType ModuleCostOr1 = ModuleCost ? ModuleCost : 1; + for (const auto &[Idx, Part] : enumerate(Partitions)) { + CostType Cost = 0; + for (auto *Fn : Part) + Cost += FnCosts.at(Fn); + SML << "P" << Idx << " has a total cost of " << Cost << " (" + << format("%0.2f", (float(Cost) / ModuleCostOr1) * 100) + << "% of source module)\n"; } - } - - return {ChosenPID, ChosenCost}; -} -//===----------------------------------------------------------------------===// -// DOTGraph Printing Support -//===----------------------------------------------------------------------===// - -const SplitGraph::Node *mapEdgeToDst(const SplitGraph::Edge *E) { - return E->Dst; -} - -using SplitGraphEdgeDstIterator = - mapped_iterator; - -} // namespace - -template <> struct GraphTraits { - using NodeRef = const SplitGraph::Node *; - using nodes_iterator = SplitGraph::nodes_iterator; - using ChildIteratorType = SplitGraphEdgeDstIterator; - - using EdgeRef = const SplitGraph::Edge *; - using ChildEdgeIteratorType = SplitGraph::edges_iterator; - - static NodeRef getEntryNode(NodeRef N) { return N; } - - static ChildIteratorType child_begin(NodeRef Ref) { - return {Ref->outgoing_edges().begin(), mapEdgeToDst}; - } - static ChildIteratorType child_end(NodeRef Ref) { - return {Ref->outgoing_edges().end(), mapEdgeToDst}; - } - - static nodes_iterator nodes_begin(const SplitGraph &G) { - return G.nodes().begin(); - } - static nodes_iterator nodes_end(const SplitGraph &G) { - return G.nodes().end(); - } -}; - -template <> struct DOTGraphTraits : public DefaultDOTGraphTraits 
{ - DOTGraphTraits(bool IsSimple = false) : DefaultDOTGraphTraits(IsSimple) {} - - static std::string getGraphName(const SplitGraph &SG) { - return SG.getModule().getName().str(); - } - - std::string getNodeLabel(const SplitGraph::Node *N, const SplitGraph &SG) { - return N->getName().str(); - } - - static std::string getNodeDescription(const SplitGraph::Node *N, - const SplitGraph &SG) { - std::string Result; - if (N->isEntryFunctionCC()) - Result += "entry-fn-cc "; - if (N->isNonCopyable()) - Result += "non-copyable "; - Result += "cost:" + std::to_string(N->getIndividualCost()); - return Result; - } - - static std::string getNodeAttributes(const SplitGraph::Node *N, - const SplitGraph &SG) { - return N->hasAnyIncomingEdges() ? "" : "color=\"red\""; + SML << "--Partitioning Done--\n\n"; } - static std::string getEdgeAttributes(const SplitGraph::Node *N, - SplitGraphEdgeDstIterator EI, - const SplitGraph &SG) { + // Check no functions were missed. +#ifndef NDEBUG + DenseSet AllFunctions; + for (const auto &Part : Partitions) + AllFunctions.insert(Part.begin(), Part.end()); - switch ((*EI.getCurrent())->Kind) { - case SplitGraph::EdgeKind::DirectCall: - return ""; - case SplitGraph::EdgeKind::IndirectCall: - return "style=\"dashed\""; + for (auto &Fn : M) { + if (!Fn.isDeclaration() && !AllFunctions.contains(&Fn)) { + assert(AllFunctions.contains(&Fn) && "Missed a function?!"); } - llvm_unreachable("Unknown SplitGraph::EdgeKind enum"); } -}; - -//===----------------------------------------------------------------------===// -// Driver -//===----------------------------------------------------------------------===// - -namespace { +#endif -// If we didn't externalize GVs, then local GVs need to be conservatively -// imported into every module (including their initializers), and then cleaned -// up afterwards. 
-static bool needsConservativeImport(const GlobalValue *GV) { - if (const auto *Var = dyn_cast(GV)) - return Var->hasLocalLinkage(); - return isa(GV); + return Partitions; } -/// Prints a summary of the partition \p N, represented by module \p M, to \p -/// OS. -static void printPartitionSummary(raw_ostream &OS, unsigned N, const Module &M, - unsigned PartCost, unsigned ModuleCost) { - OS << "*** Partition P" << N << " ***\n"; - - for (const auto &Fn : M) { - if (!Fn.isDeclaration()) - OS << " - [function] " << Fn.getName() << "\n"; - } - - for (const auto &GV : M.globals()) { - if (GV.hasInitializer()) - OS << " - [global] " << GV.getName() << "\n"; +static void externalize(GlobalValue &GV) { + if (GV.hasLocalLinkage()) { + GV.setLinkage(GlobalValue::ExternalLinkage); + GV.setVisibility(GlobalValue::HiddenVisibility); } - OS << "Partition contains " << formatRatioOf(PartCost, ModuleCost) - << "% of the source\n"; -} - -static void evaluateProposal(SplitProposal &Best, SplitProposal New) { - SplitModuleTimer SMT("proposal_evaluation", "proposal ranking algorithm"); - - New.calculateScores(); - - LLVM_DEBUG({ - New.verifyCompleteness(); - if (DebugProposalSearch) - New.print(dbgs()); - }); - - const double CurBScore = Best.getBottleneckScore(); - const double CurCSScore = Best.getCodeSizeScore(); - const double NewBScore = New.getBottleneckScore(); - const double NewCSScore = New.getCodeSizeScore(); - - // TODO: Improve this - // We can probably lower the precision of the comparison at first - // e.g. if we have - // - (Current): BScore: 0.489 CSCore 1.105 - // - (New): BScore: 0.475 CSCore 1.305 - // Currently we'd choose the new one because the bottleneck score is - // lower, but the new one duplicates more code. It may be worth it to - // discard the new proposal as the impact on build time is negligible. 
-
-  // Compare them
-  bool IsBest = false;
-  if (NewBScore < CurBScore)
-    IsBest = true;
-  else if (NewBScore == CurBScore)
-    IsBest = (NewCSScore < CurCSScore); // Use code size as tie breaker.
-
-  if (IsBest)
-    Best = std::move(New);
-
-  LLVM_DEBUG(if (DebugProposalSearch) {
-    if (IsBest)
-      dbgs() << "[search] new best proposal!\n";
-    else
-      dbgs() << "[search] discarding - not profitable\n";
-  });
-}
-
-/// Trivial helper to create an identical copy of \p M.
-static std::unique_ptr<Module> cloneAll(const Module &M) {
-  ValueToValueMapTy VMap;
-  return CloneModule(M, VMap, [&](const GlobalValue *GV) { return true; });
+  // Unnamed entities must be named consistently between modules. setName will
+  // give a distinct name to each such entity.
+  if (!GV.hasName())
+    GV.setName("__llvmsplit_unnamed");
 }
-/// Writes \p SG as a DOTGraph to \ref ModuleDotCfgDir if requested.
-static void writeDOTGraph(const SplitGraph &SG) {
-  if (ModuleDotCfgOutput.empty())
-    return;
-
-  std::error_code EC;
-  raw_fd_ostream OS(ModuleDotCfgOutput, EC);
-  if (EC) {
-    errs() << "[" DEBUG_TYPE "]: cannot open '" << ModuleDotCfgOutput
-           << "' - DOTGraph will not be printed\n";
+static bool hasDirectCaller(const Function &Fn) {
+  for (auto &U : Fn.uses()) {
+    if (auto *CB = dyn_cast<CallBase>(U.getUser()); CB && CB->isCallee(&U))
+      return true;
   }
-  WriteGraph(OS, SG, /*ShortName=*/false,
-             /*Title=*/SG.getModule().getName());
+  return false;
 }
 static void splitAMDGPUModule(
-    GetTTIFn GetTTI, Module &M, unsigned NumParts,
+    GetTTIFn GetTTI, Module &M, unsigned N,
     function_ref<void(std::unique_ptr<Module> MPart)> ModuleCallback) {
+
+  SplitModuleLogger SML(M);
+
   CallGraph CG(M);
   // Externalize functions whose address are taken.
@@ -1341,8 +639,8 @@ static void splitAMDGPUModule(
   for (auto &Fn : M) {
     if (Fn.hasAddressTaken()) {
       if (Fn.hasLocalLinkage()) {
-        LLVM_DEBUG(dbgs() << "[externalize] " << Fn.getName()
-                          << " because its address is taken\n");
+        SML << "[externalize] " << Fn.getName()
+            << " because its address is taken\n";
       }
       externalize(Fn);
     }
@@ -1353,179 +651,138 @@ static void splitAMDGPUModule(
   if (!NoExternalizeGlobals) {
     for (auto &GV : M.globals()) {
       if (GV.hasLocalLinkage())
-        LLVM_DEBUG(dbgs() << "[externalize] GV " << GV.getName() << '\n');
+        SML << "[externalize] GV " << GV.getName() << '\n';
       externalize(GV);
     }
   }
 
   // Start by calculating the cost of every function in the module, as well as
   // the module's overall cost.
-  FunctionsCostMap FnCosts;
-  const CostType ModuleCost = calculateFunctionCosts(GetTTI, M, FnCosts);
-
-  // Build the SplitGraph, which represents the module's functions and models
-  // their dependencies accurately.
-  SplitGraph SG(M, FnCosts, ModuleCost);
-  SG.buildGraph(CG);
-
-  if (SG.empty()) {
-    LLVM_DEBUG(
-        dbgs()
-        << "[!] no nodes in graph, input is empty - no splitting possible\n");
-    ModuleCallback(cloneAll(M));
-    return;
+  DenseMap<const Function *, CostType> FnCosts;
+  const CostType ModuleCost = calculateFunctionCosts(SML, GetTTI, M, FnCosts);
+
+  // First, gather every kernel into the worklist.
+  SmallVector<FunctionWithDependencies> WorkList;
+  for (auto &Fn : M) {
+    if (isEntryPoint(&Fn) && !Fn.isDeclaration())
+      WorkList.emplace_back(SML, CG, FnCosts, &Fn);
   }
 
-  LLVM_DEBUG({
-    dbgs() << "[graph] nodes:\n";
-    for (const SplitGraph::Node *N : SG.nodes()) {
-      dbgs() << " - [" << N->getID() << "]: " << N->getName() << " "
-             << (N->isGraphEntryPoint() ? "(entry)" : "") << "\n";
+  // Then, find missing functions that need to be considered as additional
+  // roots. These can't be called in theory, but in practice we still have to
+  // handle them to avoid linker errors.
+  {
+    DenseSet<const Function *> SeenFunctions;
+    for (const auto &FWD : WorkList) {
+      SeenFunctions.insert(FWD.Fn);
+      SeenFunctions.insert(FWD.Dependencies.begin(), FWD.Dependencies.end());
     }
-  });
-  writeDOTGraph(SG);
-
-  LLVM_DEBUG(dbgs() << "[search] testing splitting strategies\n");
-
-  std::optional<SplitProposal> Proposal;
-  const auto EvaluateProposal = [&](SplitProposal SP) {
-    if (!Proposal)
-      Proposal = std::move(SP);
-    else
-      evaluateProposal(*Proposal, std::move(SP));
-  };
-
-  // TODO: It would be very easy to create new strategies by just adding a base
-  // class to RecursiveSearchSplitting and abstracting it away.
-  RecursiveSearchSplitting(SG, NumParts, EvaluateProposal).run();
-  LLVM_DEBUG(if (Proposal) dbgs() << "[search done] selected proposal: "
-                                  << Proposal->getName() << "\n";);
-
-  if (!Proposal) {
-    LLVM_DEBUG(dbgs() << "[!] no proposal made, no splitting possible!\n");
-    ModuleCallback(cloneAll(M));
-    return;
+    for (auto &Fn : M) {
+      // If this function is not part of any kernel's dependencies and isn't
+      // directly called, consider it as a root.
+      if (!Fn.isDeclaration() && !isEntryPoint(&Fn) &&
+          !SeenFunctions.count(&Fn) && !hasDirectCaller(Fn)) {
+        WorkList.emplace_back(SML, CG, FnCosts, &Fn);
+      }
+    }
   }
 
-  LLVM_DEBUG(Proposal->print(dbgs()););
+  // Sort the worklist so the most expensive roots are seen first.
+  sort(WorkList, [&](auto &A, auto &B) {
+    // Sort by total cost, and if the total cost is identical, sort
+    // alphabetically.
+    if (A.TotalCost == B.TotalCost)
+      return A.Fn->getName() < B.Fn->getName();
+    return A.TotalCost > B.TotalCost;
+  });
 
-  std::optional<raw_fd_ostream> SummariesOS;
-  if (!PartitionSummariesOutput.empty()) {
-    std::error_code EC;
-    SummariesOS.emplace(PartitionSummariesOutput, EC);
-    if (EC)
-      errs() << "[" DEBUG_TYPE "]: cannot open '" << PartitionSummariesOutput
-             << "' - Partition summaries will not be printed\n";
+  if (SML) {
+    SML << "Worklist\n";
+    for (const auto &FWD : WorkList) {
+      SML << "[root] " << getName(*FWD.Fn) << " (totalCost:" << FWD.TotalCost
+          << " indirect:" << FWD.HasIndirectCall
+          << " hasNonDuplicatableDep:" << FWD.HasNonDuplicatableDependecy
+          << ")\n";
+      // Sort function names before printing to ensure determinism.
+      SmallVector<std::string> SortedDepNames;
+      SortedDepNames.reserve(FWD.Dependencies.size());
+      for (const auto *Dep : FWD.Dependencies)
+        SortedDepNames.push_back(getName(*Dep));
+      sort(SortedDepNames);
+
+      for (const auto &Name : SortedDepNames)
+        SML << " [dependency] " << Name << '\n';
+    }
   }
 
-  for (unsigned PID = 0; PID < NumParts; ++PID) {
-    SplitModuleTimer SMT2("modules_creation",
-                          "creating modules for each partition");
-    LLVM_DEBUG(dbgs() << "[split] creating new modules\n");
+  // This performs all of the partitioning work.
+  auto Partitions = doPartitioning(SML, M, N, ModuleCost, FnCosts, WorkList);
+  assert(Partitions.size() == N);
+
+  // If we didn't externalize GVs, then local GVs need to be conservatively
+  // imported into every module (including their initializers), and then cleaned
+  // up afterwards.
+  const auto NeedsConservativeImport = [&](const GlobalValue *GV) {
+    // We conservatively import private/internal GVs into every module and clean
+    // them up afterwards.
+    const auto *Var = dyn_cast<GlobalVariable>(GV);
+    return Var && Var->hasLocalLinkage();
+  };
 
-    DenseSet<const Function *> FnsInPart;
-    for (unsigned NodeID : (*Proposal)[PID].set_bits())
-      FnsInPart.insert(&SG.getNode(NodeID).getFunction());
+  SML << "Creating " << N << " modules...\n";
+  unsigned TotalFnImpls = 0;
+  for (unsigned I = 0; I < N; ++I) {
+    const auto &FnsInPart = Partitions[I];
 
     ValueToValueMapTy VMap;
-    CostType PartCost = 0;
     std::unique_ptr<Module> MPart(
         CloneModule(M, VMap, [&](const GlobalValue *GV) {
           // Functions go in their assigned partition.
-          if (const auto *Fn = dyn_cast<Function>(GV)) {
-            if (FnsInPart.contains(Fn)) {
-              PartCost += SG.getCost(*Fn);
-              return true;
-            }
-            return false;
-          }
+          if (const auto *Fn = dyn_cast<Function>(GV))
+            return FnsInPart.contains(Fn);
+
+          if (NeedsConservativeImport(GV))
+            return true;
 
           // Everything else goes in the first partition.
-          return needsConservativeImport(GV) || PID == 0;
+          return I == 0;
         }));
 
-    // FIXME: Aliases aren't seen often, and their handling isn't perfect so
-    // bugs are possible.
-
-    // Clean-up conservatively imported GVs without any users.
-    for (auto &GV : make_early_inc_range(MPart->global_values())) {
-      if (needsConservativeImport(&GV) && GV.use_empty())
+    for (auto &GV : make_early_inc_range(MPart->globals())) {
+      if (NeedsConservativeImport(&GV) && GV.use_empty())
         GV.eraseFromParent();
     }
 
-    if (SummariesOS)
-      printPartitionSummary(*SummariesOS, PID, *MPart, PartCost, ModuleCost);
-
-    LLVM_DEBUG(
-        printPartitionSummary(dbgs(), PID, *MPart, PartCost, ModuleCost));
-
+    unsigned NumAllFns = 0, NumKernels = 0;
+    for (auto &Cur : *MPart) {
+      if (!Cur.isDeclaration()) {
+        ++NumAllFns;
+        if (isEntryPoint(&Cur))
+          ++NumKernels;
+      }
+    }
+    TotalFnImpls += NumAllFns;
+    SML << " - Module " << I << " with " << NumAllFns << " functions ("
+        << NumKernels << " kernels)\n";
     ModuleCallback(std::move(MPart));
   }
+
+  SML << TotalFnImpls << " function definitions across all modules ("
+      << format("%0.2f", (float(TotalFnImpls) / FnCosts.size()) * 100)
+      << "% of original module)\n";
 }
 
 } // namespace
 
 PreservedAnalyses AMDGPUSplitModulePass::run(Module &M,
                                              ModuleAnalysisManager &MAM) {
-  SplitModuleTimer SMT(
-      "total", "total pass runtime (incl. potentially waiting for lockfile)");
-
   FunctionAnalysisManager &FAM =
       MAM.getResult<FunctionAnalysisManagerModuleProxy>(M).getManager();
 
   const auto TTIGetter = [&FAM](Function &F) -> const TargetTransformInfo & {
     return FAM.getResult<TargetIRAnalysis>(F);
   };
-
-  bool Done = false;
-#ifndef NDEBUG
-  if (UseLockFile) {
-    SmallString<128> LockFilePath;
-    sys::path::system_temp_directory(/*ErasedOnReboot=*/true, LockFilePath);
-    sys::path::append(LockFilePath, "amdgpu-split-module-debug");
-    LLVM_DEBUG(dbgs() << DEBUG_TYPE " using lockfile '" << LockFilePath
-                      << "'\n");
-
-    while (true) {
-      llvm::LockFileManager Locked(LockFilePath.str());
-      switch (Locked) {
-      case LockFileManager::LFS_Error:
-        LLVM_DEBUG(
-            dbgs() << "[amdgpu-split-module] unable to acquire lockfile, debug "
-                      "output may be mangled by other processes\n");
-        Locked.unsafeRemoveLockFile();
-        break;
-      case LockFileManager::LFS_Owned:
-        break;
-      case LockFileManager::LFS_Shared: {
-        switch (Locked.waitForUnlock()) {
-        case LockFileManager::Res_Success:
-          break;
-        case LockFileManager::Res_OwnerDied:
-          continue; // try again to get the lock.
-        case LockFileManager::Res_Timeout:
-          LLVM_DEBUG(
-              dbgs()
-              << "[amdgpu-split-module] unable to acquire lockfile, debug "
-                 "output may be mangled by other processes\n");
-          Locked.unsafeRemoveLockFile();
-          break; // give up
-        }
-        break;
-      }
-      }
-
-      splitAMDGPUModule(TTIGetter, M, N, ModuleCallback);
-      Done = true;
-      break;
-    }
-  }
-#endif
-
-  if (!Done)
-    splitAMDGPUModule(TTIGetter, M, N, ModuleCallback);
-
-  // We can change linkage/visibilities in the input, consider that nothing is
-  // preserved just to be safe. This pass runs last anyway.
-  return PreservedAnalyses::none();
+  splitAMDGPUModule(TTIGetter, M, N, ModuleCallback);
+  // We don't change the original module.
+  return PreservedAnalyses::all();
 }
-} // namespace llvm
diff --git a/llvm/lib/Target/AVR/AsmParser/AVRAsmParser.cpp b/llvm/lib/Target/AVR/AsmParser/AVRAsmParser.cpp
index 383dfcc31117c..c016b2dd91dc6 100644
--- a/llvm/lib/Target/AVR/AsmParser/AVRAsmParser.cpp
+++ b/llvm/lib/Target/AVR/AsmParser/AVRAsmParser.cpp
@@ -72,7 +72,7 @@ class AVRAsmParser : public MCTargetAsmParser {
   int parseRegisterName();
   int parseRegister(bool RestoreOnFailure = false);
   bool tryParseRegisterOperand(OperandVector &Operands);
-  bool tryParseExpression(OperandVector &Operands);
+  bool tryParseExpression(OperandVector &Operands, int64_t offset);
   bool tryParseRelocExpression(OperandVector &Operands);
   void eatComma();
@@ -418,7 +418,7 @@ bool AVRAsmParser::tryParseRegisterOperand(OperandVector &Operands) {
   return false;
 }
 
-bool AVRAsmParser::tryParseExpression(OperandVector &Operands) {
+bool AVRAsmParser::tryParseExpression(OperandVector &Operands, int64_t offset) {
   SMLoc S = Parser.getTok().getLoc();
 
   if (!tryParseRelocExpression(Operands))
@@ -437,6 +437,11 @@ bool AVRAsmParser::tryParseExpression(OperandVector &Operands) {
   if (getParser().parseExpression(Expression))
     return true;
 
+  if (offset) {
+    Expression = MCBinaryExpr::createAdd(
+        Expression, MCConstantExpr::create(offset, getContext()), getContext());
+  }
+
   SMLoc E = SMLoc::getFromPointer(Parser.getTok().getLoc().getPointer() - 1);
   Operands.push_back(AVROperand::CreateImm(Expression, S, E));
   return false;
@@ -529,8 +534,9 @@ bool AVRAsmParser::parseOperand(OperandVector &Operands, bool maybeReg) {
     [[fallthrough]];
   case AsmToken::LParen:
   case AsmToken::Integer:
+    return tryParseExpression(Operands, 0);
   case AsmToken::Dot:
-    return tryParseExpression(Operands);
+    return tryParseExpression(Operands, 2);
   case AsmToken::Plus:
   case AsmToken::Minus: {
     // If the sign preceeds a number, parse the number,
@@ -540,7 +546,7 @@ bool AVRAsmParser::parseOperand(OperandVector &Operands, bool maybeReg) {
     case AsmToken::BigNum:
    case AsmToken::Identifier:
    case AsmToken::Real:
-      if (!tryParseExpression(Operands))
+      if (!tryParseExpression(Operands, 0))
        return false;
      break;
    default:
@@ -643,6 +649,7 @@ bool AVRAsmParser::ParseInstruction(ParseInstructionInfo &Info,
   // These specific operands should be treated as addresses/symbols/labels,
   // other than registers.
   bool maybeReg = true;
+
   if (OperandNum == 1) {
     std::array<StringRef, 4> Insts = {"lds", "adiw", "sbiw", "ldi"};
     for (auto Inst : Insts) {
diff --git a/llvm/lib/Target/AVR/MCTargetDesc/AVRAsmBackend.cpp b/llvm/lib/Target/AVR/MCTargetDesc/AVRAsmBackend.cpp
index 0d29912bee264..388d58a82214d 100644
--- a/llvm/lib/Target/AVR/MCTargetDesc/AVRAsmBackend.cpp
+++ b/llvm/lib/Target/AVR/MCTargetDesc/AVRAsmBackend.cpp
@@ -94,6 +94,9 @@ static void adjustRelativeBranch(unsigned Size, const MCFixup &Fixup,
 
   // Rightshifts the value by one.
   AVR::fixups::adjustBranchTarget(Value);
+
+  // Jumps are relative to the current instruction.
+  Value -= 1;
 }
 
 /// 22-bit absolute fixup.
@@ -513,15 +516,10 @@ bool AVRAsmBackend::shouldForceRelocation(const MCAssembler &Asm,
   switch ((unsigned)Fixup.getKind()) {
   default:
     return Fixup.getKind() >= FirstLiteralRelocationKind;
-  // Fixups which should always be recorded as relocations.
   case AVR::fixup_7_pcrel:
   case AVR::fixup_13_pcrel:
-    // Do not force relocation for PC relative branch like 'rjmp .',
-    // 'rcall . - off' and 'breq . + off'.
-    if (const auto *SymA = Target.getSymA())
-      if (SymA->getSymbol().getName().size() == 0)
-        return false;
-    [[fallthrough]];
+    // Always resolve relocations for PC-relative branches
+    return false;
   case AVR::fixup_call:
     return true;
   }
diff --git a/llvm/lib/Transforms/IPO/FunctionAttrs.cpp b/llvm/lib/Transforms/IPO/FunctionAttrs.cpp
index 603a1565e48c4..79746201133bd 100644
--- a/llvm/lib/Transforms/IPO/FunctionAttrs.cpp
+++ b/llvm/lib/Transforms/IPO/FunctionAttrs.cpp
@@ -1762,54 +1762,52 @@ static void addNoReturnAttrs(const SCCNodeSet &SCCNodes,
   }
 }
 
-static bool
-allBBPathsGoThroughCold(BasicBlock *BB,
-                        SmallDenseMap<BasicBlock *, bool, 16> &Visited) {
-  // If BB contains a cold callsite this path through the CG is cold.
-  // Ignore whether the instructions actually are guranteed to transfer
-  // execution. Divergent behavior is considered unlikely.
-  if (any_of(*BB, [](Instruction &I) {
-        if (auto *CB = dyn_cast<CallBase>(&I))
-          return CB->hasFnAttr(Attribute::Cold);
-        return false;
-      })) {
-    Visited[BB] = true;
-    return true;
-  }
-
-  auto Succs = successors(BB);
-  // We found a path that doesn't go through any cold callsite.
-  if (Succs.empty())
-    return false;
+static bool allPathsGoThroughCold(Function &F) {
+  SmallDenseMap<BasicBlock *, bool, 16> ColdPaths;
+  ColdPaths[&F.front()] = false;
+  SmallVector<BasicBlock *> Jobs;
+  Jobs.push_back(&F.front());
+
+  while (!Jobs.empty()) {
+    BasicBlock *BB = Jobs.pop_back_val();
+
+    // If block contains a cold callsite this path through the CG is cold.
+    // Ignore whether the instructions actually are guaranteed to transfer
+    // execution. Divergent behavior is considered unlikely.
+    if (any_of(*BB, [](Instruction &I) {
+          if (auto *CB = dyn_cast<CallBase>(&I))
+            return CB->hasFnAttr(Attribute::Cold);
+          return false;
+        })) {
+      ColdPaths[BB] = true;
+      continue;
+    }
 
-  // We didn't find a cold callsite in this BB, so check that all successors
-  // contain a cold callsite (or that their successors do).
-  // Potential TODO: We could use static branch hints to assume certain
-  // successor paths are inherently cold, irrespective of if they contain a cold
-  // callsite.
-  for (auto *Succ : Succs) {
-    // Start with false, this is necessary to ensure we don't turn loops into
-    // cold.
-    auto R = Visited.try_emplace(Succ, false);
-    if (!R.second) {
-      if (R.first->second)
-        continue;
+    auto Succs = successors(BB);
+    // We found a path that doesn't go through any cold callsite.
+    if (Succs.empty())
       return false;
+
+    // We didn't find a cold callsite in this BB, so check that all successors
+    // contain a cold callsite (or that their successors do).
+    // Potential TODO: We could use static branch hints to assume certain
+    // successor paths are inherently cold, irrespective of if they contain a
+    // cold callsite.
+    for (BasicBlock *Succ : Succs) {
+      // Start with false, this is necessary to ensure we don't turn loops into
+      // cold.
+      auto [Iter, Inserted] = ColdPaths.try_emplace(Succ, false);
+      if (!Inserted) {
+        if (Iter->second)
+          continue;
+        return false;
+      }
+      Jobs.push_back(Succ);
     }
-    if (!allBBPathsGoThroughCold(Succ, Visited))
-      return false;
-    Visited[Succ] = true;
   }
-
   return true;
 }
 
-static bool allPathsGoThroughCold(Function &F) {
-  SmallDenseMap<BasicBlock *, bool, 16> Visited;
-  Visited[&F.front()] = false;
-  return allBBPathsGoThroughCold(&F.front(), Visited);
-}
-
 // Set the cold function attribute if possible.
 static void addColdAttrs(const SCCNodeSet &SCCNodes,
                          SmallSet<Function *, 8> &Changed) {
diff --git a/llvm/lib/Transforms/Scalar/LICM.cpp b/llvm/lib/Transforms/Scalar/LICM.cpp
index 526ae4e883439..86c7dceffc524 100644
--- a/llvm/lib/Transforms/Scalar/LICM.cpp
+++ b/llvm/lib/Transforms/Scalar/LICM.cpp
@@ -2537,14 +2537,19 @@ static bool hoistAdd(ICmpInst::Predicate Pred, Value *VariantLHS,
                      Value *InvariantRHS, ICmpInst &ICmp, Loop &L,
                      ICFLoopSafetyInfo &SafetyInfo, MemorySSAUpdater &MSSAU,
                      AssumptionCache *AC, DominatorTree *DT) {
-  assert(ICmpInst::isSigned(Pred) && "Not supported yet!");
   assert(!L.isLoopInvariant(VariantLHS) && "Precondition.");
   assert(L.isLoopInvariant(InvariantRHS) && "Precondition.");
 
+  bool IsSigned = ICmpInst::isSigned(Pred);
+
   // Try to represent VariantLHS as sum of invariant and variant operands.
   using namespace PatternMatch;
   Value *VariantOp, *InvariantOp;
-  if (!match(VariantLHS, m_NSWAdd(m_Value(VariantOp), m_Value(InvariantOp))))
+  if (IsSigned &&
+      !match(VariantLHS, m_NSWAdd(m_Value(VariantOp), m_Value(InvariantOp))))
+    return false;
+  if (!IsSigned &&
+      !match(VariantLHS, m_NUWAdd(m_Value(VariantOp), m_Value(InvariantOp))))
     return false;
 
   // LHS itself is a loop-variant, try to represent it in the form:
@@ -2559,17 +2564,20 @@ static bool hoistAdd(ICmpInst::Predicate Pred, Value *VariantLHS,
   // normal linear arithmetics). Overflows make things much more complicated, so
   // we want to avoid this.
   auto &DL = L.getHeader()->getDataLayout();
-  bool ProvedNoOverflowAfterReassociate =
-      computeOverflowForSignedSub(InvariantRHS, InvariantOp,
-                                  SimplifyQuery(DL, DT, AC, &ICmp)) ==
-      llvm::OverflowResult::NeverOverflows;
-  if (!ProvedNoOverflowAfterReassociate)
+  SimplifyQuery SQ(DL, DT, AC, &ICmp);
+  if (IsSigned && computeOverflowForSignedSub(InvariantRHS, InvariantOp, SQ) !=
+                      llvm::OverflowResult::NeverOverflows)
+    return false;
+  if (!IsSigned &&
+      computeOverflowForUnsignedSub(InvariantRHS, InvariantOp, SQ) !=
+          llvm::OverflowResult::NeverOverflows)
     return false;
 
   auto *Preheader = L.getLoopPreheader();
   assert(Preheader && "Loop is not in simplify form?");
   IRBuilder<> Builder(Preheader->getTerminator());
-  Value *NewCmpOp = Builder.CreateSub(InvariantRHS, InvariantOp, "invariant.op",
-                                      /*HasNUW*/ false, /*HasNSW*/ true);
+  Value *NewCmpOp =
+      Builder.CreateSub(InvariantRHS, InvariantOp, "invariant.op",
+                        /*HasNUW*/ !IsSigned, /*HasNSW*/ IsSigned);
   ICmp.setPredicate(Pred);
   ICmp.setOperand(0, VariantOp);
   ICmp.setOperand(1, NewCmpOp);
@@ -2584,14 +2592,19 @@ static bool hoistSub(ICmpInst::Predicate Pred, Value *VariantLHS,
                      Value *InvariantRHS, ICmpInst &ICmp, Loop &L,
                      ICFLoopSafetyInfo &SafetyInfo, MemorySSAUpdater &MSSAU,
                      AssumptionCache *AC, DominatorTree *DT) {
-  assert(ICmpInst::isSigned(Pred) && "Not supported yet!");
   assert(!L.isLoopInvariant(VariantLHS) && "Precondition.");
   assert(L.isLoopInvariant(InvariantRHS) && "Precondition.");
 
+  bool IsSigned = ICmpInst::isSigned(Pred);
+
   // Try to represent VariantLHS as sum of invariant and variant operands.
   using namespace PatternMatch;
   Value *VariantOp, *InvariantOp;
-  if (!match(VariantLHS, m_NSWSub(m_Value(VariantOp), m_Value(InvariantOp))))
+  if (IsSigned &&
+      !match(VariantLHS, m_NSWSub(m_Value(VariantOp), m_Value(InvariantOp))))
+    return false;
+  if (!IsSigned &&
+      !match(VariantLHS, m_NUWSub(m_Value(VariantOp), m_Value(InvariantOp))))
     return false;
 
   bool VariantSubtracted = false;
@@ -2613,16 +2626,26 @@ static bool hoistSub(ICmpInst::Predicate Pred, Value *VariantLHS,
   // "C1 - C2" does not overflow.
   auto &DL = L.getHeader()->getDataLayout();
   SimplifyQuery SQ(DL, DT, AC, &ICmp);
-  if (VariantSubtracted) {
+  if (VariantSubtracted && IsSigned) {
     // C1 - LV < C2 --> LV > C1 - C2
     if (computeOverflowForSignedSub(InvariantOp, InvariantRHS, SQ) !=
         llvm::OverflowResult::NeverOverflows)
       return false;
-  } else {
+  } else if (VariantSubtracted && !IsSigned) {
+    // C1 - LV < C2 --> LV > C1 - C2
+    if (computeOverflowForUnsignedSub(InvariantOp, InvariantRHS, SQ) !=
+        llvm::OverflowResult::NeverOverflows)
+      return false;
+  } else if (!VariantSubtracted && IsSigned) {
     // LV - C1 < C2 --> LV < C1 + C2
     if (computeOverflowForSignedAdd(InvariantOp, InvariantRHS, SQ) !=
        llvm::OverflowResult::NeverOverflows)
      return false;
+  } else { // !VariantSubtracted && !IsSigned
+    // LV - C1 < C2 --> LV < C1 + C2
+    if (computeOverflowForUnsignedAdd(InvariantOp, InvariantRHS, SQ) !=
+        llvm::OverflowResult::NeverOverflows)
+      return false;
   }
   auto *Preheader = L.getLoopPreheader();
   assert(Preheader && "Loop is not in simplify form?");
@@ -2630,9 +2653,9 @@ static bool hoistSub(ICmpInst::Predicate Pred, Value *VariantLHS,
   Value *NewCmpOp =
       VariantSubtracted
           ? Builder.CreateSub(InvariantOp, InvariantRHS, "invariant.op",
-                              /*HasNUW*/ false, /*HasNSW*/ true)
+                              /*HasNUW*/ !IsSigned, /*HasNSW*/ IsSigned)
           : Builder.CreateAdd(InvariantOp, InvariantRHS, "invariant.op",
-                              /*HasNUW*/ false, /*HasNSW*/ true);
+                              /*HasNUW*/ !IsSigned, /*HasNSW*/ IsSigned);
   ICmp.setPredicate(Pred);
   ICmp.setOperand(0, VariantOp);
   ICmp.setOperand(1, NewCmpOp);
@@ -2650,10 +2673,6 @@ static bool hoistAddSub(Instruction &I, Loop &L, ICFLoopSafetyInfo &SafetyInfo,
   if (!match(&I, m_ICmp(Pred, m_Value(LHS), m_Value(RHS))))
     return false;
 
-  // TODO: Support unsigned predicates?
-  if (!ICmpInst::isSigned(Pred))
-    return false;
-
   // Put variant operand to LHS position.
   if (L.isLoopInvariant(LHS)) {
     std::swap(LHS, RHS);
diff --git a/llvm/lib/Transforms/Utils/CodeExtractor.cpp b/llvm/lib/Transforms/Utils/CodeExtractor.cpp
index cf00299812bb7..a60d70244110d 100644
--- a/llvm/lib/Transforms/Utils/CodeExtractor.cpp
+++ b/llvm/lib/Transforms/Utils/CodeExtractor.cpp
@@ -937,7 +937,6 @@ Function *CodeExtractor::constructFunction(const ValueSet &inputs,
     case Attribute::NoUnwind:
     case Attribute::NoSanitizeBounds:
    case Attribute::NoSanitizeCoverage:
-    case Attribute::NoSanitizeRealtime:
    case Attribute::NullPointerIsValid:
    case Attribute::OptimizeForDebugging:
    case Attribute::OptForFuzzing:
@@ -952,6 +951,7 @@ Function *CodeExtractor::constructFunction(const ValueSet &inputs,
    case Attribute::SanitizeHWAddress:
    case Attribute::SanitizeMemTag:
    case Attribute::SanitizeRealtime:
+    case Attribute::SanitizeRealtimeUnsafe:
    case Attribute::SpeculativeLoadHardening:
    case Attribute::StackProtect:
    case Attribute::StackProtectReq:
diff --git a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
index edb2567fa057b..4c0a1c4c094b9 100644
--- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
@@ -2605,7 +2605,7 @@ class BoUpSLP {
           int Score = LookAhead.getScoreAtLevelRec(Candidates[I].first,
                                                    Candidates[I].second,
                                                    /*U1=*/nullptr, /*U2=*/nullptr,
-                                                   /*Level=*/1, std::nullopt);
+                                                   /*CurrLevel=*/1, std::nullopt);
           if (Score > BestScore) {
             BestScore = Score;
             Index = I;
@@ -13137,7 +13137,7 @@ Value *BoUpSLP::vectorizeTree(TreeEntry *E, bool PostponedPHIs) {
   }
 
   bool IsReverseOrder = isReverseOrder(E->ReorderIndices);
-  auto FinalShuffle = [&](Value *V, const TreeEntry *E, VectorType *VecTy) {
+  auto FinalShuffle = [&](Value *V, const TreeEntry *E) {
     ShuffleInstructionBuilder ShuffleBuilder(ScalarTy, Builder, *this);
     if (E->getOpcode() == Instruction::Store &&
         E->State == TreeEntry::Vectorize) {
@@ -13197,7 +13197,7 @@ Value *BoUpSLP::vectorizeTree(TreeEntry *E, bool PostponedPHIs) {
             PH->getParent()->getFirstInsertionPt());
         Builder.SetCurrentDebugLocation(PH->getDebugLoc());
 
-        V = FinalShuffle(V, E, VecTy);
+        V = FinalShuffle(V, E);
 
         E->VectorizedValue = V;
         if (PostponedPHIs)
@@ -13249,7 +13249,7 @@ Value *BoUpSLP::vectorizeTree(TreeEntry *E, bool PostponedPHIs) {
         if (const TreeEntry *TE = getTreeEntry(V))
           V = TE->VectorizedValue;
         setInsertPointAfterBundle(E);
-        V = FinalShuffle(V, E, VecTy);
+        V = FinalShuffle(V, E);
         E->VectorizedValue = V;
         return V;
       }
@@ -13259,7 +13259,7 @@ Value *BoUpSLP::vectorizeTree(TreeEntry *E, bool PostponedPHIs) {
       Value *Ptr = LI->getPointerOperand();
       LoadInst *V = Builder.CreateAlignedLoad(VecTy, Ptr, LI->getAlign());
       Value *NewV = propagateMetadata(V, E->Scalars);
-      NewV = FinalShuffle(NewV, E, VecTy);
+      NewV = FinalShuffle(NewV, E);
       E->VectorizedValue = NewV;
       return NewV;
     }
@@ -13474,7 +13474,7 @@ Value *BoUpSLP::vectorizeTree(TreeEntry *E, bool PostponedPHIs) {
       Value *V = (VecOpcode != ShuffleOrOp && VecOpcode == Instruction::BitCast)
                      ? InVec
                      : Builder.CreateCast(VecOpcode, InVec, VecTy);
-      V = FinalShuffle(V, E, VecTy);
+      V = FinalShuffle(V, E);
 
       E->VectorizedValue = V;
       ++NumVectorInstructions;
@@ -13518,7 +13518,7 @@ Value *BoUpSLP::vectorizeTree(TreeEntry *E, bool PostponedPHIs) {
       propagateIRFlags(V, E->Scalars, VL0);
       // Do not cast for cmps.
       VecTy = cast<FixedVectorType>(V->getType());
-      V = FinalShuffle(V, E, VecTy);
+      V = FinalShuffle(V, E);
 
       E->VectorizedValue = V;
       ++NumVectorInstructions;
@@ -13571,7 +13571,7 @@ Value *BoUpSLP::vectorizeTree(TreeEntry *E, bool PostponedPHIs) {
       assert(getNumElements(Cond->getType()) == TrueNumElements &&
              "Cannot vectorize Instruction::Select");
       Value *V = Builder.CreateSelect(Cond, True, False);
-      V = FinalShuffle(V, E, VecTy);
+      V = FinalShuffle(V, E);
 
       E->VectorizedValue = V;
       ++NumVectorInstructions;
@@ -13593,7 +13593,7 @@ Value *BoUpSLP::vectorizeTree(TreeEntry *E, bool PostponedPHIs) {
       if (auto *I = dyn_cast<Instruction>(V))
        V = propagateMetadata(I, E->Scalars);
 
-      V = FinalShuffle(V, E, VecTy);
+      V = FinalShuffle(V, E);
 
       E->VectorizedValue = V;
       ++NumVectorInstructions;
@@ -13611,7 +13611,7 @@ Value *BoUpSLP::vectorizeTree(TreeEntry *E, bool PostponedPHIs) {
       }
 
       Value *V = Builder.CreateFreeze(Op);
-      V = FinalShuffle(V, E, VecTy);
+      V = FinalShuffle(V, E);
 
       E->VectorizedValue = V;
       ++NumVectorInstructions;
@@ -13655,7 +13655,7 @@ Value *BoUpSLP::vectorizeTree(TreeEntry *E, bool PostponedPHIs) {
             auto *CI = dyn_cast<ConstantInt>(Op);
            return CI && CI->getValue().countr_one() >= It->second.first;
          })) {
-        V = FinalShuffle(I == 0 ? RHS : LHS, E, VecTy);
+        V = FinalShuffle(I == 0 ? RHS : LHS, E);
         E->VectorizedValue = V;
         ++NumVectorInstructions;
         return V;
@@ -13688,7 +13688,7 @@ Value *BoUpSLP::vectorizeTree(TreeEntry *E, bool PostponedPHIs) {
           I->setHasNoUnsignedWrap(/*b=*/false);
       }
 
-      V = FinalShuffle(V, E, VecTy);
+      V = FinalShuffle(V, E);
 
       E->VectorizedValue = V;
       ++NumVectorInstructions;
@@ -13780,7 +13780,7 @@ Value *BoUpSLP::vectorizeTree(TreeEntry *E, bool PostponedPHIs) {
       }
       Value *V = propagateMetadata(NewLI, E->Scalars);
-      V = FinalShuffle(V, E, VecTy);
+      V = FinalShuffle(V, E);
       E->VectorizedValue = V;
       ++NumVectorInstructions;
       return V;
@@ -13794,7 +13794,7 @@ Value *BoUpSLP::vectorizeTree(TreeEntry *E, bool PostponedPHIs) {
       if (VecValue->getType() != VecTy)
         VecValue =
             Builder.CreateIntCast(VecValue, VecTy, GetOperandSignedness(0));
-      VecValue = FinalShuffle(VecValue, E, VecTy);
+      VecValue = FinalShuffle(VecValue, E);
 
       Value *Ptr = SI->getPointerOperand();
       Instruction *ST;
@@ -13859,7 +13859,7 @@ Value *BoUpSLP::vectorizeTree(TreeEntry *E, bool PostponedPHIs) {
         V = propagateMetadata(I, GEPs);
       }
 
-      V = FinalShuffle(V, E, VecTy);
+      V = FinalShuffle(V, E);
 
       E->VectorizedValue = V;
       ++NumVectorInstructions;
@@ -13941,7 +13941,7 @@ Value *BoUpSLP::vectorizeTree(TreeEntry *E, bool PostponedPHIs) {
       Value *V = Builder.CreateCall(CF, OpVecs, OpBundles);
 
       propagateIRFlags(V, E->Scalars, VL0);
-      V = FinalShuffle(V, E, VecTy);
+      V = FinalShuffle(V, E);
 
       E->VectorizedValue = V;
       ++NumVectorInstructions;
@@ -14039,6 +14039,7 @@ Value *BoUpSLP::vectorizeTree(TreeEntry *E, bool PostponedPHIs) {
                "Expected same type as operand.");
         if (auto *I = dyn_cast<Instruction>(LHS))
           LHS = propagateMetadata(I, E->Scalars);
+        LHS = FinalShuffle(LHS, E);
         E->VectorizedValue = LHS;
         ++NumVectorInstructions;
         return LHS;
@@ -19148,7 +19149,6 @@ bool SLPVectorizerPass::vectorizeChainsInBlock(BasicBlock *BB, BoUpSLP &R) {
       }
       // Undefs come last.
       assert(U1 && U2 && "The only thing left should be undef & undef.");
-      continue;
     }
     return false;
   };
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index ee7c7cea0b767..9796ee64f6ef9 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -878,6 +878,17 @@ static void simplifyRecipe(VPRecipeBase &R, VPTypeAnalysis &TypeInfo) {
     // value with the others blended into it.
     unsigned StartIndex = 0;
+    for (unsigned I = 0; I != Blend->getNumIncomingValues(); ++I) {
+      // If a value's mask is used only by the blend then it can be deadcoded.
+      // TODO: Find the most expensive mask that can be deadcoded, or a mask
+      // that's used by multiple blends where it can be removed from them all.
+      VPValue *Mask = Blend->getMask(I);
+      if (Mask->getNumUsers() == 1 && !match(Mask, m_False())) {
+        StartIndex = I;
+        break;
+      }
+    }
+
     SmallVector<VPValue *, 4> OperandsWithMask;
     OperandsWithMask.push_back(Blend->getIncomingValue(StartIndex));
 
@@ -956,6 +967,7 @@ static void simplifyRecipe(VPRecipeBase &R, VPTypeAnalysis &TypeInfo) {
                       m_LogicalAnd(m_VPValue(X1), m_Not(m_VPValue(Y1))))) &&
         X == X1 && Y == Y1) {
       R.getVPSingleValue()->replaceAllUsesWith(X);
+      R.eraseFromParent();
       return;
     }
diff --git a/llvm/test/Bitcode/attributes.ll b/llvm/test/Bitcode/attributes.ll
index 835622276ef27..a66eda19ff573 100644
--- a/llvm/test/Bitcode/attributes.ll
+++ b/llvm/test/Bitcode/attributes.ll
@@ -512,8 +512,7 @@ define void @f92() sanitize_realtime
 }
 
 ; CHECK: define void @f93() #54
-define void @f93() nosanitize_realtime
-{
+define void @f93() sanitize_realtime_unsafe {
   ret void;
 }
 
@@ -612,7 +611,7 @@ define void @initializes(ptr initializes((-4, 0), (4, 8)) %a) {
 ; CHECK: attributes #51 = { uwtable(sync) }
 ; CHECK: attributes #52 = { nosanitize_bounds }
 ; CHECK: attributes #53 = { sanitize_realtime }
-; CHECK: attributes #54 = { nosanitize_realtime }
+; CHECK: attributes #54 = { sanitize_realtime_unsafe }
 ; CHECK: attributes [[FNRETTHUNKEXTERN]] = { fn_ret_thunk_extern }
 ; CHECK: attributes [[SKIPPROFILE]] = { skipprofile }
 ; CHECK: attributes [[OPTDEBUG]] = { optdebug }
diff --git a/llvm/test/Bitcode/compatibility.ll b/llvm/test/Bitcode/compatibility.ll
index c401cde8e146e..35c43b7d09446 100644
--- a/llvm/test/Bitcode/compatibility.ll
+++ b/llvm/test/Bitcode/compatibility.ll
@@ -1992,8 +1992,8 @@ declare void @f.sanitize_numerical_stability() sanitize_numerical_stability
 declare void @f.sanitize_realtime() sanitize_realtime
 ; CHECK: declare void @f.sanitize_realtime() #52
 
-declare void @f.nosanitize_realtime() nosanitize_realtime
-; CHECK: declare void @f.nosanitize_realtime() #53
+declare void @f.sanitize_realtime_unsafe() sanitize_realtime_unsafe
+; CHECK: declare void @f.sanitize_realtime_unsafe() #53
 
 ; CHECK: declare nofpclass(snan) float @nofpclass_snan(float nofpclass(snan))
 declare nofpclass(snan) float @nofpclass_snan(float nofpclass(snan))
@@ -2118,7 +2118,7 @@ define float @nofpclass_callsites(float %arg) {
 ; CHECK: attributes #50 = { allockind("alloc,uninitialized") }
 ; CHECK: attributes #51 = { sanitize_numerical_stability }
 ; CHECK: attributes #52 = { sanitize_realtime }
-; CHECK: attributes #53 = { nosanitize_realtime }
+; CHECK: attributes #53 = { sanitize_realtime_unsafe }
 ; CHECK: attributes #54 = { builtin }
 
 ;; Metadata
diff --git a/llvm/test/CodeGen/AVR/jmp.ll b/llvm/test/CodeGen/AVR/jmp.ll
new file mode 100644
index 0000000000000..95dfff4836b4e
--- /dev/null
+++ b/llvm/test/CodeGen/AVR/jmp.ll
@@ -0,0 +1,25 @@
+; RUN: llc -filetype=obj -mtriple=avr < %s | llvm-objdump -dr --no-show-raw-insn - | FileCheck %s
+
+define i8 @foo(i8 %a) {
+bb0:
+  %0 = tail call i8 @bar(i8 %a)
+  %1 = icmp eq i8 %0, 123
+  br i1 %1, label %bb1, label %bb2
+
+bb1:
+  ret i8 100
+
+bb2:
+  ret i8 200
+}
+
+declare i8 @bar(i8);
+
+; CHECK: rcall .-2
+; CHECK-NEXT: 00000000: R_AVR_13_PCREL bar
+; CHECK-NEXT: cpi r24, 0x7b
+; CHECK-NEXT: brne .+4
+; CHECK-NEXT: ldi r24, 0x64
+; CHECK-NEXT: ret
+; CHECK-NEXT: ldi r24, 0xc8
+; CHECK-NEXT: ret
diff --git a/llvm/test/MC/AArch64/SVE/directive-arch-negative.s b/llvm/test/MC/AArch64/SVE/directive-arch-negative.s
new file mode 100644
index 0000000000000..e3029c16ffc8a
--- /dev/null
+++ b/llvm/test/MC/AArch64/SVE/directive-arch-negative.s
@@ -0,0 +1,8 @@
+// RUN: not llvm-mc -triple aarch64 -filetype asm -o - %s 2>&1 | FileCheck %s
+
+// Check that setting +nosve implies +nosve2
+.arch armv9-a+nosve
+
+adclb z0.s, z1.s, z31.s
+// CHECK: error: instruction requires: sve2
+// CHECK-NEXT: adclb z0.s, z1.s, z31.s
diff --git a/llvm/test/MC/AArch64/SVE/directive-arch_extension-negative.s b/llvm/test/MC/AArch64/SVE/directive-arch_extension-negative.s
index 661f13974d0bc..31118f7490d00 100644
--- a/llvm/test/MC/AArch64/SVE/directive-arch_extension-negative.s
+++ b/llvm/test/MC/AArch64/SVE/directive-arch_extension-negative.s
@@ -1,7 +1,12 @@
 // RUN: not llvm-mc -triple aarch64 -filetype asm -o - %s 2>&1 | FileCheck %s
 
-.arch_extension nosve
+.arch_extension sve2+nosve
 
 ptrue p0.b, pow2
 // CHECK: error: instruction requires: sve or sme
 // CHECK-NEXT: ptrue p0.b, pow2
+
+// Check that setting +nosve implies +nosve2
+adclb z0.s, z1.s, z31.s
+// CHECK: error: instruction requires: sve2
+// CHECK-NEXT: adclb z0.s, z1.s, z31.s
diff --git a/llvm/test/MC/AArch64/SVE/directive-cpu-negative.s b/llvm/test/MC/AArch64/SVE/directive-cpu-negative.s
index 82acc1b0b0be9..6ba537ca70609 100644
--- a/llvm/test/MC/AArch64/SVE/directive-cpu-negative.s
+++ b/llvm/test/MC/AArch64/SVE/directive-cpu-negative.s
@@ -1,6 +1,11 @@
 // RUN: not llvm-mc -triple aarch64 -filetype asm -o - %s 2>&1 | FileCheck %s
 
-.cpu generic+sve+nosve
+.cpu generic+sve2+nosve
 
 ptrue p0.b, pow2
 // CHECK: error: instruction requires: sve or sme
 // CHECK-NEXT: ptrue p0.b, pow2
+
+// Check that setting +nosve implies +nosve2
+adclb z0.s, z1.s, z31.s
+// CHECK: error: instruction requires: sve2
+// CHECK-NEXT: adclb z0.s, z1.s, z31.s
diff --git a/llvm/test/MC/AArch64/directive-arch-negative.s b/llvm/test/MC/AArch64/directive-arch-negative.s index f60759899aa6c..406507d5fc8f4 100644 --- a/llvm/test/MC/AArch64/directive-arch-negative.s +++ b/llvm/test/MC/AArch64/directive-arch-negative.s @@ -12,10 +12,13 @@ # CHECK-NEXT: aese v0.8h, v1.8h # CHECK-NEXT: ^ -// We silently ignore invalid features. .arch armv8+foo aese v0.8h, v1.8h +# CHECK: error: unsupported architectural extension: foo +# CHECK-NEXT: .arch armv8+foo +# CHECK-NEXT: ^ + # CHECK: error: invalid operand for instruction # CHECK-NEXT: aese v0.8h, v1.8h # CHECK-NEXT: ^ diff --git a/llvm/test/MC/AArch64/directive-arch_extension-negative.s b/llvm/test/MC/AArch64/directive-arch_extension-negative.s index 1c1cfc9d33e3e..1843af5655546 100644 --- a/llvm/test/MC/AArch64/directive-arch_extension-negative.s +++ b/llvm/test/MC/AArch64/directive-arch_extension-negative.s @@ -4,7 +4,7 @@ // RUN: -filetype asm -o - %s 2>&1 | FileCheck %s .arch_extension axp64 -// CHECK: error: unknown architectural extension: axp64 +// CHECK: error: unsupported architectural extension: axp64 // CHECK-NEXT: .arch_extension axp64 crc32cx w0, w1, x3 @@ -49,6 +49,8 @@ fminnm d0, d0, d1 // CHECK: [[@LINE-1]]:1: error: instruction requires: fp // CHECK-NEXT: fminnm d0, d0, d1 +// nofp implied nosimd, so reinstate it +.arch_extension simd addp v0.4s, v0.4s, v0.4s // CHECK-NOT: [[@LINE-1]]:1: error: instruction requires: neon .arch_extension nosimd @@ -70,6 +72,8 @@ casa w5, w7, [x20] // CHECK: [[@LINE-1]]:1: error: instruction requires: lse // CHECK-NEXT: casa w5, w7, [x20] +// nolse implied nolse128, so reinstate it +.arch_extension lse128 swpp x0, x2, [x3] // CHECK-NOT: [[@LINE-1]]:1: error: instruction requires: lse128 .arch_extension nolse128 @@ -84,6 +88,8 @@ cfp rctx, x0 // CHECK: [[@LINE-1]]:5: error: CFPRCTX requires: predres // CHECK-NEXT: cfp rctx, x0 +// nopredres implied nopredres2, so reinstate it +.arch_extension predres2 cosp rctx, x0 // CHECK-NOT: 
[[@LINE-1]]:6: error: COSP requires: predres2 .arch_extension nopredres2 @@ -133,6 +139,8 @@ ldapr x0, [x1] // CHECK: [[@LINE-1]]:1: error: instruction requires: rcpc // CHECK-NEXT: ldapr x0, [x1] +// norcpc implied norcpc3, so reinstate it +.arch_extension rcpc3 stilp w24, w0, [x16, #-8]! // CHECK-NOT: [[@LINE-1]]:1: error: instruction requires: rcpc3 .arch_extension norcpc3 @@ -169,6 +177,8 @@ cpyfp [x0]!, [x1]!, x2! // CHECK: [[@LINE-1]]:1: error: instruction requires: mops // CHECK-NEXT: cpyfp [x0]!, [x1]!, x2! +// nolse128 implied nod128, so reinstate it +.arch_extension d128 // This needs to come before `.arch_extension nothe` as it uses an instruction // that requires both the and d128 sysp #0, c2, c0, #0, x0, x1 @@ -204,6 +214,8 @@ umax x0, x1, x2 // CHECK: [[@LINE-1]]:1: error: instruction requires: cssc // CHECK-NEXT: umax x0, x1, x2 +// noras implied norasv2, so reinstate it +.arch_extension rasv2 mrs x0, ERXGSR_EL1 // CHECK-NOT: [[@LINE-1]]:9: error: expected readable system register .arch_extension norasv2 diff --git a/llvm/test/MC/AVR/inst-brbc.s b/llvm/test/MC/AVR/inst-brbc.s index 4d7d684da4468..3ef3664cf07bf 100644 --- a/llvm/test/MC/AVR/inst-brbc.s +++ b/llvm/test/MC/AVR/inst-brbc.s @@ -3,7 +3,6 @@ ; RUN: | llvm-objdump -d - | FileCheck --check-prefix=INST %s foo: - brbc 3, .+8 brbc 0, .-16 .short 0xf759 @@ -11,14 +10,16 @@ foo: .short 0xf74c .short 0xf4c7 -; CHECK: brvc .Ltmp0+8 ; encoding: [0bAAAAA011,0b111101AA] -; CHECK: ; fixup A - offset: 0, value: .Ltmp0+8, kind: fixup_7_pcrel -; CHECK: brcc .Ltmp1-16 ; encoding: [0bAAAAA000,0b111101AA] -; CHECK: ; fixup A - offset: 0, value: .Ltmp1-16, kind: fixup_7_pcrel +; CHECK: brvc (.Ltmp0+8)+2 ; encoding: [0bAAAAA011,0b111101AA] +; CHECK-NEXT: ; fixup A - offset: 0, value: (.Ltmp0+8)+2, kind: fixup_7_pcrel +; +; CHECK: brcc (.Ltmp1-16)+2 ; encoding: [0bAAAAA000,0b111101AA] +; CHECK-NEXT: ; fixup A - offset: 0, value: (.Ltmp1-16)+2, kind: fixup_7_pcrel -; INST: 23 f4 brvc .+8 -; INST: c0 f7 brsh .-16 
-; INST: 59 f7 brne .-42 -; INST: 52 f7 brpl .-44 -; INST: 4c f7 brge .-46 -; INST: c7 f4 brid .+48 +; INST-LABEL: : +; INST-NEXT: 23 f4 brvc .+8 +; INST-NEXT: c0 f7 brsh .-16 +; INST-NEXT: 59 f7 brne .-42 +; INST-NEXT: 52 f7 brpl .-44 +; INST-NEXT: 4c f7 brge .-46 +; INST-NEXT: c7 f4 brid .+48 diff --git a/llvm/test/MC/AVR/inst-brbs.s b/llvm/test/MC/AVR/inst-brbs.s index 7987feeec654a..f15a779a53654 100644 --- a/llvm/test/MC/AVR/inst-brbs.s +++ b/llvm/test/MC/AVR/inst-brbs.s @@ -3,7 +3,6 @@ ; RUN: | llvm-objdump -d - | FileCheck --check-prefix=INST %s foo: - brbs 3, .+8 brbs 0, .-12 .short 0xf359 @@ -11,14 +10,15 @@ foo: .short 0xf34c .short 0xf077 -; CHECK: brvs .Ltmp0+8 ; encoding: [0bAAAAA011,0b111100AA] -; CHECK: ; fixup A - offset: 0, value: .Ltmp0+8, kind: fixup_7_pcrel -; CHECK: brcs .Ltmp1-12 ; encoding: [0bAAAAA000,0b111100AA] -; CHECK: ; fixup A - offset: 0, value: .Ltmp1-12, kind: fixup_7_pcrel +; CHECK: brvs (.Ltmp0+8)+2 ; encoding: [0bAAAAA011,0b111100AA] +; CHECK-NEXT: ; fixup A - offset: 0, value: (.Ltmp0+8)+2, kind: fixup_7_pcrel +; CHECK: brcs (.Ltmp1-12)+2 ; encoding: [0bAAAAA000,0b111100AA] +; CHECK-NEXT: ; fixup A - offset: 0, value: (.Ltmp1-12)+2, kind: fixup_7_pcrel -; INST: 23 f0 brvs .+8 -; INST: d0 f3 brlo .-12 -; INST: 59 f3 breq .-42 -; INST: 52 f3 brmi .-44 -; INST: 4c f3 brlt .-46 -; INST: 77 f0 brie .+28 +; INST-LABEL: : +; INST-NEXT: 23 f0 brvs .+8 +; INST-NEXT: d0 f3 brlo .-12 +; INST-NEXT: 59 f3 breq .-42 +; INST-NEXT: 52 f3 brmi .-44 +; INST-NEXT: 4c f3 brlt .-46 +; INST-NEXT: 77 f0 brie .+28 diff --git a/llvm/test/MC/AVR/inst-brcc.s b/llvm/test/MC/AVR/inst-brcc.s new file mode 100644 index 0000000000000..d9218bc61e787 --- /dev/null +++ b/llvm/test/MC/AVR/inst-brcc.s @@ -0,0 +1,28 @@ +; RUN: llvm-mc -triple avr -show-encoding < %s | FileCheck %s +; +; RUN: llvm-mc -filetype=obj -triple avr < %s \ +; RUN: | llvm-objdump -d - \ +; RUN: | FileCheck --check-prefix=INST %s + +foo: + brcc .+66 + brcc .-22 + brbc 0, .+66 + brbc 0, bar + 
+bar: + +; CHECK: brcc (.Ltmp0+66)+2 ; encoding: [0bAAAAA000,0b111101AA] +; CHECK-NEXT: ; fixup A - offset: 0, value: (.Ltmp0+66)+2, kind: fixup_7_pcrel +; CHECK: brcc (.Ltmp1-22)+2 ; encoding: [0bAAAAA000,0b111101AA] +; CHECK-NEXT: ; fixup A - offset: 0, value: (.Ltmp1-22)+2, kind: fixup_7_pcrel +; CHECK: brcc (.Ltmp2+66)+2 ; encoding: [0bAAAAA000,0b111101AA] +; CHECK-NEXT: ; fixup A - offset: 0, value: (.Ltmp2+66)+2, kind: fixup_7_pcrel +; CHECK: brcc bar ; encoding: [0bAAAAA000,0b111101AA] +; CHECK-NEXT: ; fixup A - offset: 0, value: bar, kind: fixup_7_pcrel + +; INST-LABEL: : +; INST-NEXT: 08 f5 brsh .+66 +; INST-NEXT: a8 f7 brsh .-22 +; INST-NEXT: 08 f5 brsh .+66 +; INST-NEXT: 00 f4 brsh .+0 diff --git a/llvm/test/MC/AVR/inst-brcs.s b/llvm/test/MC/AVR/inst-brcs.s new file mode 100644 index 0000000000000..0012cb31f6126 --- /dev/null +++ b/llvm/test/MC/AVR/inst-brcs.s @@ -0,0 +1,28 @@ +; RUN: llvm-mc -triple avr -show-encoding < %s | FileCheck %s +; +; RUN: llvm-mc -filetype=obj -triple avr < %s \ +; RUN: | llvm-objdump -d - \ +; RUN: | FileCheck --check-prefix=INST %s + +foo: + brcs .+8 + brcs .+4 + brbs 0, .+8 + brbs 0, bar + +bar: + +; CHECK: brcs (.Ltmp0+8)+2 ; encoding: [0bAAAAA000,0b111100AA] +; CHECK-NEXT: ; fixup A - offset: 0, value: (.Ltmp0+8)+2, kind: fixup_7_pcrel +; CHECK: brcs (.Ltmp1+4)+2 ; encoding: [0bAAAAA000,0b111100AA] +; CHECK-NEXT: ; fixup A - offset: 0, value: (.Ltmp1+4)+2, kind: fixup_7_pcrel +; CHECK: brcs (.Ltmp2+8)+2 ; encoding: [0bAAAAA000,0b111100AA] +; CHECK-NEXT: ; fixup A - offset: 0, value: (.Ltmp2+8)+2, kind: fixup_7_pcrel +; CHECK: brcs bar ; encoding: [0bAAAAA000,0b111100AA] +; CHECK-NEXT: ; fixup A - offset: 0, value: bar, kind: fixup_7_pcrel + +; INST-LABEL: : +; INST-NEXT: 20 f0 brlo .+8 +; INST-NEXT: 10 f0 brlo .+4 +; INST-NEXT: 20 f0 brlo .+8 +; INST-NEXT: 00 f0 brlo .+0 diff --git a/llvm/test/MC/AVR/inst-breq.s b/llvm/test/MC/AVR/inst-breq.s new file mode 100644 index 0000000000000..f82010f02ba61 --- /dev/null +++ 
b/llvm/test/MC/AVR/inst-breq.s @@ -0,0 +1,28 @@ +; RUN: llvm-mc -triple avr -show-encoding < %s | FileCheck %s +; +; RUN: llvm-mc -filetype=obj -triple avr < %s \ +; RUN: | llvm-objdump -d - \ +; RUN: | FileCheck --check-prefix=INST %s + +foo: + breq .-18 + breq .-12 + brbs 1, .-18 + brbs 1, bar + +bar: + +; CHECK: breq (.Ltmp0-18)+2 ; encoding: [0bAAAAA001,0b111100AA] +; CHECK-NEXT: ; fixup A - offset: 0, value: (.Ltmp0-18)+2, kind: fixup_7_pcrel +; CHECK: breq (.Ltmp1-12)+2 ; encoding: [0bAAAAA001,0b111100AA] +; CHECK-NEXT: ; fixup A - offset: 0, value: (.Ltmp1-12)+2, kind: fixup_7_pcrel +; CHECK: brbs 1, (.Ltmp2-18)+2 ; encoding: [0bAAAAA001,0b111100AA] +; CHECK-NEXT: ; fixup A - offset: 0, value: (.Ltmp2-18)+2, kind: fixup_7_pcrel +; CHECK: brbs 1, bar ; encoding: [0bAAAAA001,0b111100AA] +; CHECK-NEXT: ; fixup A - offset: 0, value: bar, kind: fixup_7_pcrel + +; INST-LABEL: : +; INST-NEXT: b9 f3 breq .-18 +; INST-NEXT: d1 f3 breq .-12 +; INST-NEXT: b9 f3 breq .-18 +; INST-NEXT: 01 f0 breq .+0 diff --git a/llvm/test/MC/AVR/inst-brge.s b/llvm/test/MC/AVR/inst-brge.s new file mode 100644 index 0000000000000..1121284a11468 --- /dev/null +++ b/llvm/test/MC/AVR/inst-brge.s @@ -0,0 +1,24 @@ +; RUN: llvm-mc -triple avr -show-encoding < %s | FileCheck %s +; +; RUN: llvm-mc -filetype=obj -triple avr < %s \ +; RUN: | llvm-objdump -d - \ +; RUN: | FileCheck --check-prefix=INST %s + +foo: + brge .+50 + brge .+42 + brge bar + +bar: + +; CHECK: brge (.Ltmp0+50)+2 ; encoding: [0bAAAAA100,0b111101AA] +; CHECK-NEXT: ; fixup A - offset: 0, value: (.Ltmp0+50)+2, kind: fixup_7_pcrel +; CHECK: brge (.Ltmp1+42)+2 ; encoding: [0bAAAAA100,0b111101AA] +; CHECK-NEXT: ; fixup A - offset: 0, value: (.Ltmp1+42)+2, kind: fixup_7_pcrel +; CHECK: brge bar ; encoding: [0bAAAAA100,0b111101AA] +; CHECK-NEXT: ; fixup A - offset: 0, value: bar, kind: fixup_7_pcrel + +; INST-LABEL: : +; INST-NEXT: cc f4 brge .+50 +; INST-NEXT: ac f4 brge .+42 +; INST-NEXT: 04 f4 brge .+0 diff --git 
a/llvm/test/MC/AVR/inst-brhc.s b/llvm/test/MC/AVR/inst-brhc.s new file mode 100644 index 0000000000000..eb16ac2ef7a64 --- /dev/null +++ b/llvm/test/MC/AVR/inst-brhc.s @@ -0,0 +1,24 @@ +; RUN: llvm-mc -triple avr -show-encoding < %s | FileCheck %s +; +; RUN: llvm-mc -filetype=obj -triple avr < %s \ +; RUN: | llvm-objdump -d - \ +; RUN: | FileCheck --check-prefix=INST %s + +foo: + brhc .+12 + brhc .+14 + brhc bar + +bar: + +; CHECK: brhc (.Ltmp0+12)+2 ; encoding: [0bAAAAA101,0b111101AA] +; CHECK-NEXT: ; fixup A - offset: 0, value: (.Ltmp0+12)+2, kind: fixup_7_pcrel +; CHECK: brhc (.Ltmp1+14)+2 ; encoding: [0bAAAAA101,0b111101AA] +; CHECK-NEXT: ; fixup A - offset: 0, value: (.Ltmp1+14)+2, kind: fixup_7_pcrel +; CHECK: brhc bar ; encoding: [0bAAAAA101,0b111101AA] +; CHECK-NEXT: ; fixup A - offset: 0, value: bar, kind: fixup_7_pcrel + +; INST-LABEL: : +; INST-NEXT: 35 f4 brhc .+12 +; INST-NEXT: 3d f4 brhc .+14 +; INST-NEXT: 05 f4 brhc .+0 diff --git a/llvm/test/MC/AVR/inst-brhs.s b/llvm/test/MC/AVR/inst-brhs.s new file mode 100644 index 0000000000000..77c49596b3b0b --- /dev/null +++ b/llvm/test/MC/AVR/inst-brhs.s @@ -0,0 +1,24 @@ +; RUN: llvm-mc -triple avr -show-encoding < %s | FileCheck %s +; +; RUN: llvm-mc -filetype=obj -triple avr < %s \ +; RUN: | llvm-objdump -d - \ +; RUN: | FileCheck --check-prefix=INST %s + +foo: + brhs .-66 + brhs .+14 + brhs bar + +bar: + +; CHECK: brhs (.Ltmp0-66)+2 ; encoding: [0bAAAAA101,0b111100AA] +; CHECK-NEXT: ; fixup A - offset: 0, value: (.Ltmp0-66)+2, kind: fixup_7_pcrel +; CHECK: brhs (.Ltmp1+14)+2 ; encoding: [0bAAAAA101,0b111100AA] +; CHECK-NEXT: ; fixup A - offset: 0, value: (.Ltmp1+14)+2, kind: fixup_7_pcrel +; CHECK: brhs bar ; encoding: [0bAAAAA101,0b111100AA] +; CHECK-NEXT: ; fixup A - offset: 0, value: bar, kind: fixup_7_pcrel + +; INST-LABEL: : +; INST-NEXT: fd f2 brhs .-66 +; INST-NEXT: 3d f0 brhs .+14 +; INST-NEXT: 05 f0 brhs .+0 diff --git a/llvm/test/MC/AVR/inst-brid.s b/llvm/test/MC/AVR/inst-brid.s new file mode 
100644 index 0000000000000..70d0ea83c49b2 --- /dev/null +++ b/llvm/test/MC/AVR/inst-brid.s @@ -0,0 +1,24 @@ +; RUN: llvm-mc -triple avr -show-encoding < %s | FileCheck %s +; +; RUN: llvm-mc -filetype=obj -triple avr < %s \ +; RUN: | llvm-objdump -d - \ +; RUN: | FileCheck --check-prefix=INST %s + +foo: + brid .+42 + brid .+62 + brid bar + +bar: + +; CHECK: brid (.Ltmp0+42)+2 ; encoding: [0bAAAAA111,0b111101AA] +; CHECK-NEXT: ; fixup A - offset: 0, value: (.Ltmp0+42)+2, kind: fixup_7_pcrel +; CHECK: brid (.Ltmp1+62)+2 ; encoding: [0bAAAAA111,0b111101AA] +; CHECK-NEXT: ; fixup A - offset: 0, value: (.Ltmp1+62)+2, kind: fixup_7_pcrel +; CHECK: brid bar ; encoding: [0bAAAAA111,0b111101AA] +; CHECK-NEXT: ; fixup A - offset: 0, value: bar, kind: fixup_7_pcrel + +; INST-LABEL: : +; INST-NEXT: af f4 brid .+42 +; INST-NEXT: ff f4 brid .+62 +; INST-NEXT: 07 f4 brid .+0 diff --git a/llvm/test/MC/AVR/inst-brie.s b/llvm/test/MC/AVR/inst-brie.s new file mode 100644 index 0000000000000..717c686e2ed44 --- /dev/null +++ b/llvm/test/MC/AVR/inst-brie.s @@ -0,0 +1,24 @@ +; RUN: llvm-mc -triple avr -show-encoding < %s | FileCheck %s +; +; RUN: llvm-mc -filetype=obj -triple avr < %s \ +; RUN: | llvm-objdump -d - \ +; RUN: | FileCheck --check-prefix=INST %s + +foo: + brie .+20 + brie .+40 + brie bar + +bar: + +; CHECK: brie (.Ltmp0+20)+2 ; encoding: [0bAAAAA111,0b111100AA] +; CHECK-NEXT: ; fixup A - offset: 0, value: (.Ltmp0+20)+2, kind: fixup_7_pcrel +; CHECK: brie (.Ltmp1+40)+2 ; encoding: [0bAAAAA111,0b111100AA] +; CHECK-NEXT: ; fixup A - offset: 0, value: (.Ltmp1+40)+2, kind: fixup_7_pcrel +; CHECK: brie bar ; encoding: [0bAAAAA111,0b111100AA] +; CHECK-NEXT: ; fixup A - offset: 0, value: bar, kind: fixup_7_pcrel + +; INST-LABEL: : +; INST-NEXT: 57 f0 brie .+20 +; INST-NEXT: a7 f0 brie .+40 +; INST-NEXT: 07 f0 brie .+0 diff --git a/llvm/test/MC/AVR/inst-brlo.s b/llvm/test/MC/AVR/inst-brlo.s new file mode 100644 index 0000000000000..4b56d66ffdfe0 --- /dev/null +++ 
b/llvm/test/MC/AVR/inst-brlo.s @@ -0,0 +1,24 @@ +; RUN: llvm-mc -triple avr -show-encoding < %s | FileCheck %s +; +; RUN: llvm-mc -filetype=obj -triple avr < %s \ +; RUN: | llvm-objdump -d - \ +; RUN: | FileCheck --check-prefix=INST %s + +foo: + brlo .+12 + brlo .+28 + brlo bar + +bar: + +; CHECK: brlo (.Ltmp0+12)+2 ; encoding: [0bAAAAA000,0b111100AA] +; CHECK-NEXT: ; fixup A - offset: 0, value: (.Ltmp0+12)+2, kind: fixup_7_pcrel +; CHECK: brlo (.Ltmp1+28)+2 ; encoding: [0bAAAAA000,0b111100AA] +; CHECK-NEXT: ; fixup A - offset: 0, value: (.Ltmp1+28)+2, kind: fixup_7_pcrel +; CHECK: brlo bar ; encoding: [0bAAAAA000,0b111100AA] +; CHECK-NEXT: ; fixup A - offset: 0, value: bar, kind: fixup_7_pcrel + +; INST-LABEL: : +; INST-NEXT: 30 f0 brlo .+12 +; INST-NEXT: 70 f0 brlo .+28 +; INST-NEXT: 00 f0 brlo .+0 diff --git a/llvm/test/MC/AVR/inst-brlt.s b/llvm/test/MC/AVR/inst-brlt.s new file mode 100644 index 0000000000000..8a7c543f9444b --- /dev/null +++ b/llvm/test/MC/AVR/inst-brlt.s @@ -0,0 +1,24 @@ +; RUN: llvm-mc -triple avr -show-encoding < %s | FileCheck %s +; +; RUN: llvm-mc -filetype=obj -triple avr < %s \ +; RUN: | llvm-objdump -d - \ +; RUN: | FileCheck --check-prefix=INST %s + +foo: + brlt .+16 + brlt .+2 + brlt bar + +bar: + +; CHECK: brlt (.Ltmp0+16)+2 ; encoding: [0bAAAAA100,0b111100AA] +; CHECK-NEXT: ; fixup A - offset: 0, value: (.Ltmp0+16)+2, kind: fixup_7_pcrel +; CHECK: brlt (.Ltmp1+2)+2 ; encoding: [0bAAAAA100,0b111100AA] +; CHECK-NEXT: ; fixup A - offset: 0, value: (.Ltmp1+2)+2, kind: fixup_7_pcrel +; CHECK: brlt bar ; encoding: [0bAAAAA100,0b111100AA] +; CHECK-NEXT: ; fixup A - offset: 0, value: bar, kind: fixup_7_pcrel + +; INST-LABEL: : +; INST-NEXT: 44 f0 brlt .+16 +; INST-NEXT: 0c f0 brlt .+2 +; INST-NEXT: 04 f0 brlt .+0 diff --git a/llvm/test/MC/AVR/inst-brmi.s b/llvm/test/MC/AVR/inst-brmi.s new file mode 100644 index 0000000000000..878612d294dd9 --- /dev/null +++ b/llvm/test/MC/AVR/inst-brmi.s @@ -0,0 +1,24 @@ +; RUN: llvm-mc -triple avr 
-show-encoding < %s | FileCheck %s +; +; RUN: llvm-mc -filetype=obj -triple avr < %s \ +; RUN: | llvm-objdump -d - \ +; RUN: | FileCheck --check-prefix=INST %s + +foo: + brmi .+66 + brmi .+58 + brmi bar + +bar: + +; CHECK: brmi (.Ltmp0+66)+2 ; encoding: [0bAAAAA010,0b111100AA] +; CHECK-NEXT: ; fixup A - offset: 0, value: (.Ltmp0+66)+2, kind: fixup_7_pcrel +; CHECK: brmi (.Ltmp1+58)+2 ; encoding: [0bAAAAA010,0b111100AA] +; CHECK-NEXT: ; fixup A - offset: 0, value: (.Ltmp1+58)+2, kind: fixup_7_pcrel +; CHECK: brmi bar ; encoding: [0bAAAAA010,0b111100AA] +; CHECK-NEXT: ; fixup A - offset: 0, value: bar, kind: fixup_7_pcrel + +; INST-LABEL: : +; INST-NEXT: 0a f1 brmi .+66 +; INST-NEXT: ea f0 brmi .+58 +; INST-NEXT: 02 f0 brmi .+0 diff --git a/llvm/test/MC/AVR/inst-brne.s b/llvm/test/MC/AVR/inst-brne.s new file mode 100644 index 0000000000000..9d6bee4b754d9 --- /dev/null +++ b/llvm/test/MC/AVR/inst-brne.s @@ -0,0 +1,28 @@ +; RUN: llvm-mc -triple avr -show-encoding < %s | FileCheck %s +; +; RUN: llvm-mc -filetype=obj -triple avr < %s \ +; RUN: | llvm-objdump -d - \ +; RUN: | FileCheck --check-prefix=INST %s + +foo: + brne .+10 + brne .+2 + brbc 1, .+10 + brbc 1, bar + +bar: + +; CHECK: brne (.Ltmp0+10)+2 ; encoding: [0bAAAAA001,0b111101AA] +; CHECK-NEXT: ; fixup A - offset: 0, value: (.Ltmp0+10)+2, kind: fixup_7_pcrel +; CHECK: brne (.Ltmp1+2)+2 ; encoding: [0bAAAAA001,0b111101AA] +; CHECK-NEXT: ; fixup A - offset: 0, value: (.Ltmp1+2)+2, kind: fixup_7_pcrel +; CHECK: brbc 1, (.Ltmp2+10)+2 ; encoding: [0bAAAAA001,0b111101AA] +; CHECK-NEXT: ; fixup A - offset: 0, value: (.Ltmp2+10)+2, kind: fixup_7_pcrel +; CHECK: brbc 1, bar ; encoding: [0bAAAAA001,0b111101AA] +; CHECK-NEXT: ; fixup A - offset: 0, value: bar, kind: fixup_7_pcrel + +; INST-LABEL: : +; INST-NEXT: 29 f4 brne .+10 +; INST-NEXT: 09 f4 brne .+2 +; INST-NEXT: 29 f4 brne .+10 +; INST-NEXT: 01 f4 brne .+0 diff --git a/llvm/test/MC/AVR/inst-brpl.s b/llvm/test/MC/AVR/inst-brpl.s new file mode 100644 index 
0000000000000..393365ee35339 --- /dev/null +++ b/llvm/test/MC/AVR/inst-brpl.s @@ -0,0 +1,24 @@ +; RUN: llvm-mc -triple avr -show-encoding < %s | FileCheck %s +; +; RUN: llvm-mc -filetype=obj -triple avr < %s \ +; RUN: | llvm-objdump -d - \ +; RUN: | FileCheck --check-prefix=INST %s + +foo: + brpl .-12 + brpl .+18 + brpl bar + +bar: + +; CHECK: brpl (.Ltmp0-12)+2 ; encoding: [0bAAAAA010,0b111101AA] +; CHECK-NEXT: ; fixup A - offset: 0, value: (.Ltmp0-12)+2, kind: fixup_7_pcrel +; CHECK: brpl (.Ltmp1+18)+2 ; encoding: [0bAAAAA010,0b111101AA] +; CHECK-NEXT: ; fixup A - offset: 0, value: (.Ltmp1+18)+2, kind: fixup_7_pcrel +; CHECK: brpl bar ; encoding: [0bAAAAA010,0b111101AA] +; CHECK-NEXT: ; fixup A - offset: 0, value: bar, kind: fixup_7_pcrel + +; INST-LABEL: : +; INST-NEXT: d2 f7 brpl .-12 +; INST-NEXT: 4a f4 brpl .+18 +; INST-NEXT: 02 f4 brpl .+0 diff --git a/llvm/test/MC/AVR/inst-brsh.s b/llvm/test/MC/AVR/inst-brsh.s new file mode 100644 index 0000000000000..0bacd64d3d8d0 --- /dev/null +++ b/llvm/test/MC/AVR/inst-brsh.s @@ -0,0 +1,24 @@ +; RUN: llvm-mc -triple avr -show-encoding < %s | FileCheck %s +; +; RUN: llvm-mc -filetype=obj -triple avr < %s \ +; RUN: | llvm-objdump -d - \ +; RUN: | FileCheck --check-prefix=INST %s + +foo: + brsh .+32 + brsh .+70 + brsh bar + +bar: + +; CHECK: brsh (.Ltmp0+32)+2 ; encoding: [0bAAAAA000,0b111101AA] +; CHECK-NEXT: ; fixup A - offset: 0, value: (.Ltmp0+32)+2, kind: fixup_7_pcrel +; CHECK: brsh (.Ltmp1+70)+2 ; encoding: [0bAAAAA000,0b111101AA] +; CHECK-NEXT: ; fixup A - offset: 0, value: (.Ltmp1+70)+2, kind: fixup_7_pcrel +; CHECK: brsh bar ; encoding: [0bAAAAA000,0b111101AA] +; CHECK-NEXT: ; fixup A - offset: 0, value: bar, kind: fixup_7_pcrel + +; INST-LABEL: : +; INST-NEXT: 80 f4 brsh .+32 +; INST-NEXT: 18 f5 brsh .+70 +; INST-NEXT: 00 f4 brsh .+0 diff --git a/llvm/test/MC/AVR/inst-brtc.s b/llvm/test/MC/AVR/inst-brtc.s new file mode 100644 index 0000000000000..eb4ee21162872 --- /dev/null +++ b/llvm/test/MC/AVR/inst-brtc.s @@ 
-0,0 +1,24 @@ +; RUN: llvm-mc -triple avr -show-encoding < %s | FileCheck %s +; +; RUN: llvm-mc -filetype=obj -triple avr < %s \ +; RUN: | llvm-objdump -d - \ +; RUN: | FileCheck --check-prefix=INST %s + +foo: + brtc .+52 + brtc .+50 + brtc bar + +bar: + +; CHECK: brtc (.Ltmp0+52)+2 ; encoding: [0bAAAAA110,0b111101AA] +; CHECK-NEXT: ; fixup A - offset: 0, value: (.Ltmp0+52)+2, kind: fixup_7_pcrel +; CHECK: brtc (.Ltmp1+50)+2 ; encoding: [0bAAAAA110,0b111101AA] +; CHECK-NEXT: ; fixup A - offset: 0, value: (.Ltmp1+50)+2, kind: fixup_7_pcrel +; CHECK: brtc bar ; encoding: [0bAAAAA110,0b111101AA] +; CHECK-NEXT: ; fixup A - offset: 0, value: bar, kind: fixup_7_pcrel + +; INST-LABEL: : +; INST-NEXT: d6 f4 brtc .+52 +; INST-NEXT: ce f4 brtc .+50 +; INST-NEXT: 06 f4 brtc .+0 diff --git a/llvm/test/MC/AVR/inst-brts.s b/llvm/test/MC/AVR/inst-brts.s new file mode 100644 index 0000000000000..ccd794a922589 --- /dev/null +++ b/llvm/test/MC/AVR/inst-brts.s @@ -0,0 +1,24 @@ +; RUN: llvm-mc -triple avr -show-encoding < %s | FileCheck %s +; +; RUN: llvm-mc -filetype=obj -triple avr < %s \ +; RUN: | llvm-objdump -d - \ +; RUN: | FileCheck --check-prefix=INST %s + +foo: + brts .+18 + brts .+22 + brts bar + +bar: + +; CHECK: brts (.Ltmp0+18)+2 ; encoding: [0bAAAAA110,0b111100AA] +; CHECK-NEXT: ; fixup A - offset: 0, value: (.Ltmp0+18)+2, kind: fixup_7_pcrel +; CHECK: brts (.Ltmp1+22)+2 ; encoding: [0bAAAAA110,0b111100AA] +; CHECK-NEXT: ; fixup A - offset: 0, value: (.Ltmp1+22)+2, kind: fixup_7_pcrel +; CHECK: brts bar ; encoding: [0bAAAAA110,0b111100AA] +; CHECK-NEXT: ; fixup A - offset: 0, value: bar, kind: fixup_7_pcrel + +; INST-LABEL: : +; INST-NEXT: 4e f0 brts .+18 +; INST-NEXT: 5e f0 brts .+22 +; INST-NEXT: 06 f0 brts .+0 diff --git a/llvm/test/MC/AVR/inst-brvc.s b/llvm/test/MC/AVR/inst-brvc.s new file mode 100644 index 0000000000000..573f779c0dcd6 --- /dev/null +++ b/llvm/test/MC/AVR/inst-brvc.s @@ -0,0 +1,24 @@ +; RUN: llvm-mc -triple avr -show-encoding < %s | FileCheck %s +; 
+; RUN: llvm-mc -filetype=obj -triple avr < %s \ +; RUN: | llvm-objdump -d - \ +; RUN: | FileCheck --check-prefix=INST %s + +foo: + brvc .-28 + brvc .-62 + brvc bar + +bar: + +; CHECK: brvc (.Ltmp0-28)+2 ; encoding: [0bAAAAA011,0b111101AA] +; CHECK-NEXT: ; fixup A - offset: 0, value: (.Ltmp0-28)+2, kind: fixup_7_pcrel +; CHECK: brvc (.Ltmp1-62)+2 ; encoding: [0bAAAAA011,0b111101AA] +; CHECK-NEXT: ; fixup A - offset: 0, value: (.Ltmp1-62)+2, kind: fixup_7_pcrel +; CHECK: brvc bar ; encoding: [0bAAAAA011,0b111101AA] +; CHECK-NEXT: ; fixup A - offset: 0, value: bar, kind: fixup_7_pcrel + +; INST-LABEL: : +; INST-NEXT: 93 f7 brvc .-28 +; INST-NEXT: 0b f7 brvc .-62 +; INST-NEXT: 03 f4 brvc .+0 diff --git a/llvm/test/MC/AVR/inst-brvs.s b/llvm/test/MC/AVR/inst-brvs.s new file mode 100644 index 0000000000000..d50a1a9ec5b62 --- /dev/null +++ b/llvm/test/MC/AVR/inst-brvs.s @@ -0,0 +1,24 @@ +; RUN: llvm-mc -triple avr -show-encoding < %s | FileCheck %s +; +; RUN: llvm-mc -filetype=obj -triple avr < %s \ +; RUN: | llvm-objdump -d - \ +; RUN: | FileCheck --check-prefix=INST %s + +foo: + brvs .+18 + brvs .+32 + brvs bar + +bar: + +; CHECK: brvs (.Ltmp0+18)+2 ; encoding: [0bAAAAA011,0b111100AA] +; CHECK-NEXT: ; fixup A - offset: 0, value: (.Ltmp0+18)+2, kind: fixup_7_pcrel +; CHECK: brvs (.Ltmp1+32)+2 ; encoding: [0bAAAAA011,0b111100AA] +; CHECK-NEXT: ; fixup A - offset: 0, value: (.Ltmp1+32)+2, kind: fixup_7_pcrel +; CHECK: brvs bar ; encoding: [0bAAAAA011,0b111100AA] +; CHECK-NEXT: ; fixup A - offset: 0, value: bar, kind: fixup_7_pcrel + +; INST-LABEL: : +; INST-NEXT: 4b f0 brvs .+18 +; INST-NEXT: 83 f0 brvs .+32 +; INST-NEXT: 03 f0 brvs .+0 diff --git a/llvm/test/MC/AVR/inst-family-cond-branch.s b/llvm/test/MC/AVR/inst-family-cond-branch.s deleted file mode 100644 index dc36425a884f3..0000000000000 --- a/llvm/test/MC/AVR/inst-family-cond-branch.s +++ /dev/null @@ -1,321 +0,0 @@ -; RUN: llvm-mc -triple avr -show-encoding < %s | FileCheck %s -; RUN: llvm-mc -filetype=obj -triple 
avr < %s \ -; RUN: | llvm-objdump -d - | FileCheck --check-prefix=INST %s - - -foo: - ; BREQ - breq .-18 - breq .-12 - brbs 1, .-18 - brbs 1, baz - -; CHECK: breq .Ltmp0-18 ; encoding: [0bAAAAA001,0b111100AA] -; CHECK: ; fixup A - offset: 0, value: .Ltmp0-18, kind: fixup_7_pcrel -; CHECK: breq .Ltmp1-12 ; encoding: [0bAAAAA001,0b111100AA] -; CHECK: ; fixup A - offset: 0, value: .Ltmp1-12, kind: fixup_7_pcrel -; CHECK: brbs 1, .Ltmp2-18 ; encoding: [0bAAAAA001,0b111100AA] -; CHECK: ; fixup A - offset: 0, value: .Ltmp2-18, kind: fixup_7_pcrel -; CHECK: brbs 1, baz ; encoding: [0bAAAAA001,0b111100AA] -; CHECK: ; fixup A - offset: 0, value: baz, kind: fixup_7_pcrel - -; INST-LABEL: : -; INST: breq .-18 -; INST: breq .-12 -; INST: breq .-18 -; INST: breq .+0 - - ; BRNE - brne .+10 - brne .+2 - brbc 1, .+10 - brbc 1, bar - -; CHECK: brne .Ltmp3+10 ; encoding: [0bAAAAA001,0b111101AA] -; CHECK: ; fixup A - offset: 0, value: .Ltmp3+10, kind: fixup_7_pcrel -; CHECK: brne .Ltmp4+2 ; encoding: [0bAAAAA001,0b111101AA] -; CHECK: ; fixup A - offset: 0, value: .Ltmp4+2, kind: fixup_7_pcrel -; CHECK: brbc 1, .Ltmp5+10 ; encoding: [0bAAAAA001,0b111101AA] -; CHECK: ; fixup A - offset: 0, value: .Ltmp5+10, kind: fixup_7_pcrel -; CHECK: brbc 1, bar ; encoding: [0bAAAAA001,0b111101AA] -; CHECK: ; fixup A - offset: 0, value: bar, kind: fixup_7_pcrel - -; INST: brne .+10 -; INST: brne .+2 -; INST: brne .+10 -; INST: brne .+0 - -bar: - ; BRCS - brcs .+8 - brcs .+4 - brbs 0, .+8 - brbs 0, end - -; CHECK: brcs .Ltmp6+8 ; encoding: [0bAAAAA000,0b111100AA] -; CHECK: ; fixup A - offset: 0, value: .Ltmp6+8, kind: fixup_7_pcrel -; CHECK: brcs .Ltmp7+4 ; encoding: [0bAAAAA000,0b111100AA] -; CHECK: ; fixup A - offset: 0, value: .Ltmp7+4, kind: fixup_7_pcrel -; CHECK: brcs .Ltmp8+8 ; encoding: [0bAAAAA000,0b111100AA] -; CHECK: ; fixup A - offset: 0, value: .Ltmp8+8, kind: fixup_7_pcrel -; CHECK: brcs end ; encoding: [0bAAAAA000,0b111100AA] -; CHECK: ; fixup A - offset: 0, value: end, kind: 
fixup_7_pcrel - -; INST-LABEL: : -; INST: brlo .+8 -; INST: brlo .+4 -; INST: brlo .+8 -; INST: brlo .+0 - - ; BRCC - brcc .+66 - brcc .-22 - brbc 0, .+66 - brbc 0, baz - -; CHECK: brcc .Ltmp9+66 ; encoding: [0bAAAAA000,0b111101AA] -; CHECK: ; fixup A - offset: 0, value: .Ltmp9+66, kind: fixup_7_pcrel -; CHECK: brcc .Ltmp10-22 ; encoding: [0bAAAAA000,0b111101AA] -; CHECK: ; fixup A - offset: 0, value: .Ltmp10-22, kind: fixup_7_pcrel -; CHECK: brcc .Ltmp11+66 ; encoding: [0bAAAAA000,0b111101AA] -; CHECK: ; fixup A - offset: 0, value: .Ltmp11+66, kind: fixup_7_pcrel -; CHECK: brcc baz ; encoding: [0bAAAAA000,0b111101AA] -; CHECK: ; fixup A - offset: 0, value: baz, kind: fixup_7_pcrel - -; INST: brsh .+66 -; INST: brsh .-22 -; INST: brsh .+66 -; INST: brsh .+0 - -; BRSH - brsh .+32 - brsh .+70 - brsh car - -; CHECK: brsh .Ltmp12+32 ; encoding: [0bAAAAA000,0b111101AA] -; CHECK: ; fixup A - offset: 0, value: .Ltmp12+32, kind: fixup_7_pcrel -; CHECK: brsh .Ltmp13+70 ; encoding: [0bAAAAA000,0b111101AA] -; CHECK: ; fixup A - offset: 0, value: .Ltmp13+70, kind: fixup_7_pcrel -; CHECK: brsh car ; encoding: [0bAAAAA000,0b111101AA] -; CHECK: ; fixup A - offset: 0, value: car, kind: fixup_7_pcrel - -; INST: brsh .+32 -; INST: brsh .+70 -; INST: brsh .+0 - -baz: - - ; BRLO - brlo .+12 - brlo .+28 - brlo car - -; CHECK: brlo .Ltmp14+12 ; encoding: [0bAAAAA000,0b111100AA] -; CHECK: ; fixup A - offset: 0, value: .Ltmp14+12, kind: fixup_7_pcrel -; CHECK: brlo .Ltmp15+28 ; encoding: [0bAAAAA000,0b111100AA] -; CHECK: ; fixup A - offset: 0, value: .Ltmp15+28, kind: fixup_7_pcrel -; CHECK: brlo car ; encoding: [0bAAAAA000,0b111100AA] -; CHECK: ; fixup A - offset: 0, value: car, kind: fixup_7_pcrel - -; INST-LABEL: : -; INST: brlo .+12 -; INST: brlo .+28 -; INST: brlo .+0 - - ; BRMI - brmi .+66 - brmi .+58 - brmi car - -; CHECK: brmi .Ltmp16+66 ; encoding: [0bAAAAA010,0b111100AA] -; CHECK: ; fixup A - offset: 0, value: .Ltmp16+66, kind: fixup_7_pcrel -; CHECK: brmi .Ltmp17+58 ; encoding: 
[0bAAAAA010,0b111100AA] -; CHECK: ; fixup A - offset: 0, value: .Ltmp17+58, kind: fixup_7_pcrel -; CHECK: brmi car ; encoding: [0bAAAAA010,0b111100AA] -; CHECK: ; fixup A - offset: 0, value: car, kind: fixup_7_pcrel - -; INST: brmi .+66 -; INST: brmi .+58 -; INST: brmi .+0 - - ; BRPL - brpl .-12 - brpl .+18 - brpl car - -; CHECK: brpl .Ltmp18-12 ; encoding: [0bAAAAA010,0b111101AA] -; CHECK: ; fixup A - offset: 0, value: .Ltmp18-12, kind: fixup_7_pcrel -; CHECK: brpl .Ltmp19+18 ; encoding: [0bAAAAA010,0b111101AA] -; CHECK: ; fixup A - offset: 0, value: .Ltmp19+18, kind: fixup_7_pcrel -; CHECK: brpl car ; encoding: [0bAAAAA010,0b111101AA] -; CHECK: ; fixup A - offset: 0, value: car, kind: fixup_7_pcrel - -; INST: brpl .-12 -; INST: brpl .+18 -; INST: brpl .+0 - -; BRGE - brge .+50 - brge .+42 - brge car - -; CHECK: brge .Ltmp20+50 ; encoding: [0bAAAAA100,0b111101AA] -; CHECK: ; fixup A - offset: 0, value: .Ltmp20+50, kind: fixup_7_pcrel -; CHECK: brge .Ltmp21+42 ; encoding: [0bAAAAA100,0b111101AA] -; CHECK: ; fixup A - offset: 0, value: .Ltmp21+42, kind: fixup_7_pcrel -; CHECK: brge car ; encoding: [0bAAAAA100,0b111101AA] -; CHECK: ; fixup A - offset: 0, value: car, kind: fixup_7_pcrel - -; INST: brge .+50 -; INST: brge .+42 -; INST: brge .+0 - -car: - ; BRLT - brlt .+16 - brlt .+2 - brlt end - -; CHECK: brlt .Ltmp22+16 ; encoding: [0bAAAAA100,0b111100AA] -; CHECK: ; fixup A - offset: 0, value: .Ltmp22+16, kind: fixup_7_pcrel -; CHECK: brlt .Ltmp23+2 ; encoding: [0bAAAAA100,0b111100AA] -; CHECK: ; fixup A - offset: 0, value: .Ltmp23+2, kind: fixup_7_pcrel -; CHECK: brlt end ; encoding: [0bAAAAA100,0b111100AA] -; CHECK: ; fixup A - offset: 0, value: end, kind: fixup_7_pcrel - -; INST-LABEL: : -; INST: brlt .+16 -; INST: brlt .+2 -; INST: brlt .+0 - - ; BRHS - brhs .-66 - brhs .+14 - brhs just_another_label - -; CHECK: brhs .Ltmp24-66 ; encoding: [0bAAAAA101,0b111100AA] -; CHECK: ; fixup A - offset: 0, value: .Ltmp24-66, kind: fixup_7_pcrel -; CHECK: brhs .Ltmp25+14 ; 
encoding: [0bAAAAA101,0b111100AA]
-; CHECK: ; fixup A - offset: 0, value: .Ltmp25+14, kind: fixup_7_pcrel
-; CHECK: brhs just_another_label ; encoding: [0bAAAAA101,0b111100AA]
-; CHECK: ; fixup A - offset: 0, value: just_another_label, kind: fixup_7_pcrel
-
-; INST: brhs .-66
-; INST: brhs .+14
-; INST: brhs .+0
-
- ; BRHC
- brhc .+12
- brhc .+14
- brhc just_another_label
-
-; CHECK: brhc .Ltmp26+12 ; encoding: [0bAAAAA101,0b111101AA]
-; CHECK: ; fixup A - offset: 0, value: .Ltmp26+12, kind: fixup_7_pcrel
-; CHECK: brhc .Ltmp27+14 ; encoding: [0bAAAAA101,0b111101AA]
-; CHECK: ; fixup A - offset: 0, value: .Ltmp27+14, kind: fixup_7_pcrel
-; CHECK: brhc just_another_label ; encoding: [0bAAAAA101,0b111101AA]
-; CHECK: ; fixup A - offset: 0, value: just_another_label, kind: fixup_7_pcrel
-
-; INST: brhc .+12
-; INST: brhc .+14
-; INST: brhc .+0
-
- ; BRTS
- brts .+18
- brts .+22
- brts just_another_label
-
-; CHECK: brts .Ltmp28+18 ; encoding: [0bAAAAA110,0b111100AA]
-; CHECK: ; fixup A - offset: 0, value: .Ltmp28+18, kind: fixup_7_pcrel
-; CHECK: brts .Ltmp29+22 ; encoding: [0bAAAAA110,0b111100AA]
-; CHECK: ; fixup A - offset: 0, value: .Ltmp29+22, kind: fixup_7_pcrel
-; CHECK: brts just_another_label ; encoding: [0bAAAAA110,0b111100AA]
-; CHECK: ; fixup A - offset: 0, value: just_another_label, kind: fixup_7_pcrel
-
-; INST: brts .+18
-; INST: brts .+22
-; INST: brts .+0
-
-just_another_label:
- ; BRTC
- brtc .+52
- brtc .+50
- brtc end
-
-; CHECK: brtc .Ltmp30+52 ; encoding: [0bAAAAA110,0b111101AA]
-; CHECK: ; fixup A - offset: 0, value: .Ltmp30+52, kind: fixup_7_pcrel
-; CHECK: brtc .Ltmp31+50 ; encoding: [0bAAAAA110,0b111101AA]
-; CHECK: ; fixup A - offset: 0, value: .Ltmp31+50, kind: fixup_7_pcrel
-; CHECK: brtc end ; encoding: [0bAAAAA110,0b111101AA]
-; CHECK: ; fixup A - offset: 0, value: end, kind: fixup_7_pcrel
-
-; INST-LABEL: <just_another_label>:
-; INST: brtc .+52
-; INST: brtc .+50
-; INST: brtc .+0
-
- ; BRVS
- brvs .+18
- brvs .+32
- brvs end
-
-; CHECK: brvs .Ltmp32+18 ; encoding: [0bAAAAA011,0b111100AA]
-; CHECK: ; fixup A - offset: 0, value: .Ltmp32+18, kind: fixup_7_pcrel
-; CHECK: brvs .Ltmp33+32 ; encoding: [0bAAAAA011,0b111100AA]
-; CHECK: ; fixup A - offset: 0, value: .Ltmp33+32, kind: fixup_7_pcrel
-; CHECK: brvs end ; encoding: [0bAAAAA011,0b111100AA]
-; CHECK: ; fixup A - offset: 0, value: end, kind: fixup_7_pcrel
-
-; INST: brvs .+18
-; INST: brvs .+32
-; INST: brvs .+0
-
- ; BRVC
- brvc .-28
- brvc .-62
- brvc end
-
-; CHECK: brvc .Ltmp34-28 ; encoding: [0bAAAAA011,0b111101AA]
-; CHECK: ; fixup A - offset: 0, value: .Ltmp34-28, kind: fixup_7_pcrel
-; CHECK: brvc .Ltmp35-62 ; encoding: [0bAAAAA011,0b111101AA]
-; CHECK: ; fixup A - offset: 0, value: .Ltmp35-62, kind: fixup_7_pcrel
-; CHECK: brvc end ; encoding: [0bAAAAA011,0b111101AA]
-; CHECK: ; fixup A - offset: 0, value: end, kind: fixup_7_pcrel
-
-; INST: brvc .-28
-; INST: brvc .-62
-; INST: brvc .+0
-
- ; BRIE
- brie .+20
- brie .+40
- brie end
-
-; CHECK: brie .Ltmp36+20 ; encoding: [0bAAAAA111,0b111100AA]
-; CHECK: ; fixup A - offset: 0, value: .Ltmp36+20, kind: fixup_7_pcrel
-; CHECK: brie .Ltmp37+40 ; encoding: [0bAAAAA111,0b111100AA]
-; CHECK: ; fixup A - offset: 0, value: .Ltmp37+40, kind: fixup_7_pcrel
-; CHECK: brie end ; encoding: [0bAAAAA111,0b111100AA]
-; CHECK: ; fixup A - offset: 0, value: end, kind: fixup_7_pcrel
-
-; INST: brie .+20
-; INST: brie .+40
-; INST: brie .+0
-
- ; BRID
- brid .+42
- brid .+62
- brid end
-
-; CHECK: brid .Ltmp38+42 ; encoding: [0bAAAAA111,0b111101AA]
-; CHECK: ; fixup A - offset: 0, value: .Ltmp38+42, kind: fixup_7_pcrel
-; CHECK: brid .Ltmp39+62 ; encoding: [0bAAAAA111,0b111101AA]
-; CHECK: ; fixup A - offset: 0, value: .Ltmp39+62, kind: fixup_7_pcrel
-; CHECK: brid end ; encoding: [0bAAAAA111,0b111101AA]
-; CHECK: ; fixup A - offset: 0, value: end, kind: fixup_7_pcrel
-
-; INST: brid .+42
-; INST: brid .+62
-; INST: brid .+0
-
-end:
diff --git a/llvm/test/MC/AVR/inst-rcall.s b/llvm/test/MC/AVR/inst-rcall.s
index 006013aa6ea94..a4ec32d05b1a4 100644
--- a/llvm/test/MC/AVR/inst-rcall.s
+++ b/llvm/test/MC/AVR/inst-rcall.s
@@ -1,27 +1,28 @@
 ; RUN: llvm-mc -triple avr -show-encoding < %s | FileCheck %s
+;
 ; RUN: llvm-mc -filetype=obj -triple avr < %s \
-; RUN: | llvm-objdump -d - | FileCheck --check-prefix=INST %s
-
+; RUN: | llvm-objdump -d - \
+; RUN: | FileCheck --check-prefix=INST %s
 foo:
-
 rcall .+0
 rcall .-8
 rcall .+12
 rcall .+46
 .short 0xdfea

-; CHECK: rcall .Ltmp0+0 ; encoding: [A,0b1101AAAA]
-; CHECK: ; fixup A - offset: 0, value: .Ltmp0+0, kind: fixup_13_pcrel
-; CHECK: rcall .Ltmp1-8 ; encoding: [A,0b1101AAAA]
-; CHECK: ; fixup A - offset: 0, value: .Ltmp1-8, kind: fixup_13_pcrel
-; CHECK: rcall .Ltmp2+12 ; encoding: [A,0b1101AAAA]
-; CHECK: ; fixup A - offset: 0, value: .Ltmp2+12, kind: fixup_13_pcrel
-; CHECK: rcall .Ltmp3+46 ; encoding: [A,0b1101AAAA]
-; CHECK: ; fixup A - offset: 0, value: .Ltmp3+46, kind: fixup_13_pcrel
+; CHECK: rcall (.Ltmp0+0)+2 ; encoding: [A,0b1101AAAA]
+; CHECK-NEXT: ; fixup A - offset: 0, value: (.Ltmp0+0)+2, kind: fixup_13_pcrel
+; CHECK: rcall (.Ltmp1-8)+2 ; encoding: [A,0b1101AAAA]
+; CHECK-NEXT: ; fixup A - offset: 0, value: (.Ltmp1-8)+2, kind: fixup_13_pcrel
+; CHECK: rcall (.Ltmp2+12)+2 ; encoding: [A,0b1101AAAA]
+; CHECK-NEXT: ; fixup A - offset: 0, value: (.Ltmp2+12)+2, kind: fixup_13_pcrel
+; CHECK: rcall (.Ltmp3+46)+2 ; encoding: [A,0b1101AAAA]
+; CHECK-NEXT: ; fixup A - offset: 0, value: (.Ltmp3+46)+2, kind: fixup_13_pcrel

-; INST: 00 d0 rcall .+0
-; INST: fc df rcall .-8
-; INST: 06 d0 rcall .+12
-; INST: 17 d0 rcall .+46
-; INST: ea df rcall .-44
+; INST-LABEL: <foo>:
+; INST-NEXT: 00 d0 rcall .+0
+; INST-NEXT: fc df rcall .-8
+; INST-NEXT: 06 d0 rcall .+12
+; INST-NEXT: 17 d0 rcall .+46
+; INST-NEXT: ea df rcall .-44
diff --git a/llvm/test/MC/AVR/inst-rjmp.s b/llvm/test/MC/AVR/inst-rjmp.s
index 3dbac39e055dd..cc843a58b55d2 100644
--- a/llvm/test/MC/AVR/inst-rjmp.s
+++ b/llvm/test/MC/AVR/inst-rjmp.s
@@ -1,49 +1,56 @@
 ; RUN: llvm-mc -triple avr -show-encoding < %s | FileCheck %s
+;
 ; RUN: llvm-mc -filetype=obj -triple avr < %s \
-; RUN: | llvm-objdump -d - | FileCheck --check-prefix=INST %s
-
+; RUN: | llvm-objdump -d - \
+; RUN: | FileCheck --check-prefix=INST %s
 foo:
-
 rjmp .+2
 rjmp .-2
 rjmp foo
 rjmp .+8
 rjmp end
 rjmp .+0
+
 end:
 rjmp .-4
 rjmp .-6
+
 x:
 rjmp x
 .short 0xc00f

-; CHECK: rjmp .Ltmp0+2 ; encoding: [A,0b1100AAAA]
-; CHECK: ; fixup A - offset: 0, value: .Ltmp0+2, kind: fixup_13_pcrel
-; CHECK: rjmp .Ltmp1-2 ; encoding: [A,0b1100AAAA]
-; CHECK: ; fixup A - offset: 0, value: .Ltmp1-2, kind: fixup_13_pcrel
-; CHECK: rjmp foo ; encoding: [A,0b1100AAAA]
-; CHECK: ; fixup A - offset: 0, value: foo, kind: fixup_13_pcrel
-; CHECK: rjmp .Ltmp2+8 ; encoding: [A,0b1100AAAA]
-; CHECK: ; fixup A - offset: 0, value: .Ltmp2+8, kind: fixup_13_pcrel
-; CHECK: rjmp end ; encoding: [A,0b1100AAAA]
-; CHECK: ; fixup A - offset: 0, value: end, kind: fixup_13_pcrel
-; CHECK: rjmp .Ltmp3+0 ; encoding: [A,0b1100AAAA]
-; CHECK: ; fixup A - offset: 0, value: .Ltmp3+0, kind: fixup_13_pcrel
-; CHECK: rjmp .Ltmp4-4 ; encoding: [A,0b1100AAAA]
-; CHECK: ; fixup A - offset: 0, value: .Ltmp4-4, kind: fixup_13_pcrel
-; CHECK: rjmp .Ltmp5-6 ; encoding: [A,0b1100AAAA]
-; CHECK: ; fixup A - offset: 0, value: .Ltmp5-6, kind: fixup_13_pcrel
-; CHECK: rjmp x ; encoding: [A,0b1100AAAA]
-; CHECK: ; fixup A - offset: 0, value: x, kind: fixup_13_pcrel
+; CHECK: rjmp (.Ltmp0+2)+2 ; encoding: [A,0b1100AAAA]
+; CHECK-NEXT: ; fixup A - offset: 0, value: (.Ltmp0+2)+2, kind: fixup_13_pcrel
+; CHECK: rjmp (.Ltmp1-2)+2 ; encoding: [A,0b1100AAAA]
+; CHECK-NEXT: ; fixup A - offset: 0, value: (.Ltmp1-2)+2, kind: fixup_13_pcrel
+; CHECK: rjmp foo ; encoding: [A,0b1100AAAA]
+; CHECK-NEXT: ; fixup A - offset: 0, value: foo, kind: fixup_13_pcrel
+; CHECK: rjmp (.Ltmp2+8)+2 ; encoding: [A,0b1100AAAA]
+; CHECK-NEXT: ; fixup A - offset: 0, value: (.Ltmp2+8)+2, kind: fixup_13_pcrel
+; CHECK: rjmp end ; encoding: [A,0b1100AAAA]
+; CHECK-NEXT: ; fixup A - offset: 0, value: end, kind: fixup_13_pcrel
+; CHECK: rjmp (.Ltmp3+0)+2 ; encoding: [A,0b1100AAAA]
+; CHECK-NEXT: ; fixup A - offset: 0, value: (.Ltmp3+0)+2, kind: fixup_13_pcrel
+; CHECK: rjmp (.Ltmp4-4)+2 ; encoding: [A,0b1100AAAA]
+; CHECK-NEXT: ; fixup A - offset: 0, value: (.Ltmp4-4)+2, kind: fixup_13_pcrel
+; CHECK: rjmp (.Ltmp5-6)+2 ; encoding: [A,0b1100AAAA]
+; CHECK-NEXT: ; fixup A - offset: 0, value: (.Ltmp5-6)+2, kind: fixup_13_pcrel
+; CHECK: rjmp x ; encoding: [A,0b1100AAAA]
+; CHECK-NEXT: ; fixup A - offset: 0, value: x, kind: fixup_13_pcrel

-; INST: 01 c0 rjmp .+2
-; INST: ff cf rjmp .-2
-; INST: 00 c0 rjmp .+0
-; INST: 04 c0 rjmp .+8
-; INST: 00 c0 rjmp .+0
-; INST: 00 c0 rjmp .+0
-; INST: fe cf rjmp .-4
-; INST: fd cf rjmp .-6
-; INST: 00 c0 rjmp .+0
-; INST: 0f c0 rjmp .+30
+; INST-LABEL: <foo>:
+; INST-NEXT: 01 c0 rjmp .+2
+; INST-NEXT: ff cf rjmp .-2
+; INST-NEXT: fd cf rjmp .-6
+; INST-NEXT: 04 c0 rjmp .+8
+; INST-NEXT: 01 c0 rjmp .+2
+; INST-NEXT: 00 c0 rjmp .+0
+; INST-EMPTY:
+; INST-LABEL: <end>:
+; INST-NEXT: fe cf rjmp .-4
+; INST-NEXT: fd cf rjmp .-6
+; INST-EMPTY:
+; INST-LABEL: <x>:
+; INST-NEXT: ff cf rjmp .-2
+; INST-NEXT: 0f c0 rjmp .+30
diff --git a/llvm/test/Transforms/Attributor/nofpclass.ll b/llvm/test/Transforms/Attributor/nofpclass.ll
index 781ba636c3ab3..2a6780b60211c 100644
--- a/llvm/test/Transforms/Attributor/nofpclass.ll
+++ b/llvm/test/Transforms/Attributor/nofpclass.ll
@@ -2685,11 +2685,291 @@ define @scalable_splat_zero() {
 ; See https://github.com/llvm/llvm-project/issues/78507
 define double @call_abs(double noundef %__x) {
+; TUNIT: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
+; TUNIT-LABEL: define noundef nofpclass(ninf nzero nsub nnorm) double @call_abs
+; TUNIT-SAME: (double noundef [[__X:%.*]]) #[[ATTR3]] {
+; TUNIT-NEXT: entry:
+; TUNIT-NEXT: [[ABS:%.*]] = tail call noundef nofpclass(ninf nzero nsub nnorm) double @llvm.fabs.f64(double noundef [[__X]]) #[[ATTR22]]
+; TUNIT-NEXT: ret double [[ABS]]
+;
+; CGSCC: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
+; CGSCC-LABEL: define noundef nofpclass(ninf nzero nsub nnorm) double @call_abs
+; CGSCC-SAME: (double noundef [[__X:%.*]]) #[[ATTR3]] {
+; CGSCC-NEXT: entry:
+; CGSCC-NEXT: [[ABS:%.*]] = tail call noundef nofpclass(ninf nzero nsub nnorm) double @llvm.fabs.f64(double noundef [[__X]]) #[[ATTR19]]
+; CGSCC-NEXT: ret double [[ABS]]
+;
 entry:
 %abs = tail call double @llvm.fabs.f64(double %__x)
 ret double %abs
 }

+define float @bitcast_to_float_sign_0(i32 %arg) {
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
+; CHECK-LABEL: define nofpclass(ninf nzero nsub nnorm) float @bitcast_to_float_sign_0
+; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-NEXT: [[SHR:%.*]] = lshr i32 [[ARG]], 1
+; CHECK-NEXT: [[CAST:%.*]] = bitcast i32 [[SHR]] to float
+; CHECK-NEXT: ret float [[CAST]]
+;
+ %shr = lshr i32 %arg, 1
+ %cast = bitcast i32 %shr to float
+ ret float %cast
+}
+
+define float @bitcast_to_float_nnan(i32 %arg) {
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
+; CHECK-LABEL: define nofpclass(nan inf nzero nsub nnorm) float @bitcast_to_float_nnan
+; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-NEXT: [[SHR:%.*]] = lshr i32 [[ARG]], 2
+; CHECK-NEXT: [[CAST:%.*]] = bitcast i32 [[SHR]] to float
+; CHECK-NEXT: ret float [[CAST]]
+;
+ %shr = lshr i32 %arg, 2
+ %cast = bitcast i32 %shr to float
+ ret float %cast
+}
+
+define float @bitcast_to_float_sign_1(i32 %arg) {
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
+; CHECK-LABEL: define nofpclass(pinf pzero psub pnorm) float @bitcast_to_float_sign_1
+; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-NEXT: [[OR:%.*]] = or i32 [[ARG]], -2147483648
+; CHECK-NEXT: [[CAST:%.*]] = bitcast i32 [[OR]] to float
+; CHECK-NEXT: ret float [[CAST]]
+;
+ %or = or i32 %arg, -2147483648
+ %cast = bitcast i32 %or to float
+ ret float %cast
+}
+
+define float @bitcast_to_float_nan(i32 %arg) {
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
+; CHECK-LABEL: define nofpclass(inf zero sub norm) float @bitcast_to_float_nan
+; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-NEXT: [[OR:%.*]] = or i32 [[ARG]], 2139095041
+; CHECK-NEXT: [[CAST:%.*]] = bitcast i32 [[OR]] to float
+; CHECK-NEXT: ret float [[CAST]]
+;
+ %or = or i32 %arg, 2139095041
+ %cast = bitcast i32 %or to float
+ ret float %cast
+}
+
+define float @bitcast_to_float_zero(i32 %arg) {
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
+; CHECK-LABEL: define nofpclass(nan inf sub norm) float @bitcast_to_float_zero
+; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-NEXT: [[SHL:%.*]] = shl i32 [[ARG]], 31
+; CHECK-NEXT: [[CAST:%.*]] = bitcast i32 [[SHL]] to float
+; CHECK-NEXT: ret float [[CAST]]
+;
+ %shl = shl i32 %arg, 31
+ %cast = bitcast i32 %shl to float
+ ret float %cast
+}
+
+define float @bitcast_to_float_nzero(i32 %arg) {
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
+; CHECK-LABEL: define nofpclass(zero) float @bitcast_to_float_nzero
+; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-NEXT: [[OR:%.*]] = or i32 [[ARG]], 134217728
+; CHECK-NEXT: [[CAST:%.*]] = bitcast i32 [[OR]] to float
+; CHECK-NEXT: ret float [[CAST]]
+;
+ %or = or i32 %arg, 134217728
+ %cast = bitcast i32 %or to float
+ ret float %cast
+}
+
+define float @bitcast_to_float_inf(i32 %arg) {
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
+; CHECK-LABEL: define nofpclass(nan zero sub norm) float @bitcast_to_float_inf
+; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-NEXT: [[SHR:%.*]] = shl i32 [[ARG]], 31
+; CHECK-NEXT: [[OR:%.*]] = or i32 [[SHR]], 2139095040
+; CHECK-NEXT: [[CAST:%.*]] = bitcast i32 [[OR]] to float
+; CHECK-NEXT: ret float [[CAST]]
+;
+ %shr = shl i32 %arg, 31
+ %or = or i32 %shr, 2139095040
+ %cast = bitcast i32 %or to float
+ ret float %cast
+}
+
+define double @bitcast_to_double_sign_0(i64 %arg) {
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
+; CHECK-LABEL: define nofpclass(ninf nzero nsub nnorm) double @bitcast_to_double_sign_0
+; CHECK-SAME: (i64 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-NEXT: [[SHR:%.*]] = lshr i64 [[ARG]], 1
+; CHECK-NEXT: [[CAST:%.*]] = bitcast i64 [[SHR]] to double
+; CHECK-NEXT: ret double [[CAST]]
+;
+ %shr = lshr i64 %arg, 1
+ %cast = bitcast i64 %shr to double
+ ret double %cast
+}
+
+define double @bitcast_to_double_nnan(i64 %arg) {
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
+; CHECK-LABEL: define nofpclass(nan inf nzero nsub nnorm) double @bitcast_to_double_nnan
+; CHECK-SAME: (i64 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-NEXT: [[SHR:%.*]] = lshr i64 [[ARG]], 2
+; CHECK-NEXT: [[CAST:%.*]] = bitcast i64 [[SHR]] to double
+; CHECK-NEXT: ret double [[CAST]]
+;
+ %shr = lshr i64 %arg, 2
+ %cast = bitcast i64 %shr to double
+ ret double %cast
+}
+
+define double @bitcast_to_double_sign_1(i64 %arg) {
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
+; CHECK-LABEL: define nofpclass(pinf pzero psub pnorm) double @bitcast_to_double_sign_1
+; CHECK-SAME: (i64 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-NEXT: [[OR:%.*]] = or i64 [[ARG]], -9223372036854775808
+; CHECK-NEXT: [[CAST:%.*]] = bitcast i64 [[OR]] to double
+; CHECK-NEXT: ret double [[CAST]]
+;
+ %or = or i64 %arg, -9223372036854775808
+ %cast = bitcast i64 %or to double
+ ret double %cast
+}
+
+define double @bitcast_to_double_nan(i64 %arg) {
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
+; CHECK-LABEL: define nofpclass(inf zero sub norm) double @bitcast_to_double_nan
+; CHECK-SAME: (i64 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-NEXT: [[OR:%.*]] = or i64 [[ARG]], -4503599627370495
+; CHECK-NEXT: [[CAST:%.*]] = bitcast i64 [[OR]] to double
+; CHECK-NEXT: ret double [[CAST]]
+;
+ %or = or i64 %arg, -4503599627370495
+ %cast = bitcast i64 %or to double
+ ret double %cast
+}
+
+
+define double @bitcast_to_double_zero(i64 %arg) {
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
+; CHECK-LABEL: define nofpclass(nan inf sub norm) double @bitcast_to_double_zero
+; CHECK-SAME: (i64 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-NEXT: [[SHL:%.*]] = shl i64 [[ARG]], 63
+; CHECK-NEXT: [[CAST:%.*]] = bitcast i64 [[SHL]] to double
+; CHECK-NEXT: ret double [[CAST]]
+;
+ %shl = shl i64 %arg, 63
+ %cast = bitcast i64 %shl to double
+ ret double %cast
+}
+
+define double @bitcast_to_double_nzero(i64 %arg) {
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
+; CHECK-LABEL: define nofpclass(zero) double @bitcast_to_double_nzero
+; CHECK-SAME: (i64 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-NEXT: [[OR:%.*]] = or i64 [[ARG]], 1152921504606846976
+; CHECK-NEXT: [[CAST:%.*]] = bitcast i64 [[OR]] to double
+; CHECK-NEXT: ret double [[CAST]]
+;
+ %or = or i64 %arg, 1152921504606846976
+ %cast = bitcast i64 %or to double
+ ret double %cast
+}
+
+define double @bitcast_to_double_inf(i64 %arg) {
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
+; CHECK-LABEL: define nofpclass(nan zero sub norm) double @bitcast_to_double_inf
+; CHECK-SAME: (i64 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-NEXT: [[SHR:%.*]] = shl i64 [[ARG]], 63
+; CHECK-NEXT: [[OR:%.*]] = or i64 [[SHR]], 9218868437227405312
+; CHECK-NEXT: [[CAST:%.*]] = bitcast i64 [[OR]] to double
+; CHECK-NEXT: ret double [[CAST]]
+;
+ %shr = shl i64 %arg, 63
+ %or = or i64 %shr, 9218868437227405312
+ %cast = bitcast i64 %or to double
+ ret double %cast
+}
+
+
+define <2 x float> @bitcast_to_float_vect_sign_0(<2 x i32> %arg) {
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
+; CHECK-LABEL: define nofpclass(ninf nzero nsub nnorm) <2 x float> @bitcast_to_float_vect_sign_0
+; CHECK-SAME: (<2 x i32> [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-NEXT: [[SHR:%.*]] = lshr <2 x i32> [[ARG]], <i32 1, i32 1>
+; CHECK-NEXT: [[CAST:%.*]] = bitcast <2 x i32> [[SHR]] to <2 x float>
+; CHECK-NEXT: ret <2 x float> [[CAST]]
+;
+ %shr = lshr <2 x i32> %arg, <i32 1, i32 1>
+ %cast = bitcast <2 x i32> %shr to <2 x float>
+ ret <2 x float> %cast
+}
+
+define <2 x float> @bitcast_to_float_vect_nnan(<2 x i32> %arg) {
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
+; CHECK-LABEL: define nofpclass(nan inf nzero nsub nnorm) <2 x float> @bitcast_to_float_vect_nnan
+; CHECK-SAME: (<2 x i32> [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-NEXT: [[SHR:%.*]] = lshr <2 x i32> [[ARG]], <i32 2, i32 2>
+; CHECK-NEXT: [[CAST:%.*]] = bitcast <2 x i32> [[SHR]] to <2 x float>
+; CHECK-NEXT: ret <2 x float> [[CAST]]
+;
+ %shr = lshr <2 x i32> %arg, <i32 2, i32 2>
+ %cast = bitcast <2 x i32> %shr to <2 x float>
+ ret <2 x float> %cast
+}
+
+define <2 x float> @bitcast_to_float_vect_sign_1(<2 x i32> %arg) {
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
+; CHECK-LABEL: define nofpclass(pinf pzero psub pnorm) <2 x float> @bitcast_to_float_vect_sign_1
+; CHECK-SAME: (<2 x i32> [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-NEXT: [[OR:%.*]] = or <2 x i32> [[ARG]], <i32 -2147483648, i32 -2147483648>
+; CHECK-NEXT: [[CAST:%.*]] = bitcast <2 x i32> [[OR]] to <2 x float>
+; CHECK-NEXT: ret <2 x float> [[CAST]]
+;
+ %or = or <2 x i32> %arg, <i32 -2147483648, i32 -2147483648>
+ %cast = bitcast <2 x i32> %or to <2 x float>
+ ret <2 x float> %cast
+}
+
+define <2 x float> @bitcast_to_float_vect_nan(<2 x i32> %arg) {
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
+; CHECK-LABEL: define nofpclass(inf zero sub norm) <2 x float> @bitcast_to_float_vect_nan
+; CHECK-SAME: (<2 x i32> [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-NEXT: [[OR:%.*]] = or <2 x i32> [[ARG]], <i32 2139095041, i32 2139095041>
+; CHECK-NEXT: [[CAST:%.*]] = bitcast <2 x i32> [[OR]] to <2 x float>
+; CHECK-NEXT: ret <2 x float> [[CAST]]
+;
+ %or = or <2 x i32> %arg, <i32 2139095041, i32 2139095041>
+ %cast = bitcast <2 x i32> %or to <2 x float>
+ ret <2 x float> %cast
+}
+
+define <2 x float> @bitcast_to_float_vect_conservative_1(<2 x i32> %arg) {
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
+; CHECK-LABEL: define <2 x float> @bitcast_to_float_vect_conservative_1
+; CHECK-SAME: (<2 x i32> [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-NEXT: [[OR:%.*]] = or <2 x i32> [[ARG]],
+; CHECK-NEXT: [[CAST:%.*]] = bitcast <2 x i32> [[OR]] to <2 x float>
+; CHECK-NEXT: ret <2 x float> [[CAST]]
+;
+ %or = or <2 x i32> %arg,
+ %cast = bitcast <2 x i32> %or to <2 x float>
+ ret <2 x float> %cast
+}
+
+define <2 x float> @bitcast_to_float_vect_conservative_2(<2 x i32> %arg) {
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
+; CHECK-LABEL: define <2 x float> @bitcast_to_float_vect_conservative_2
+; CHECK-SAME: (<2 x i32> [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-NEXT: [[OR:%.*]] = or <2 x i32> [[ARG]],
+; CHECK-NEXT: [[CAST:%.*]] = bitcast <2 x i32> [[OR]] to <2 x float>
+; CHECK-NEXT: ret <2 x float> [[CAST]]
+;
+ %or = or <2 x i32> %arg,
+ %cast = bitcast <2 x i32> %or to <2 x float>
+ ret <2 x float> %cast
+}
+
 declare i64 @_Z13get_global_idj(i32 noundef)

 attributes #0 = { "denormal-fp-math"="preserve-sign,preserve-sign" }
diff --git a/llvm/test/Transforms/Inline/X86/inline-target-cpu-i686.ll b/llvm/test/Transforms/Inline/X86/inline-target-cpu-i686.ll
index bd05cffcaa8b7..187278d1c9035 100644
--- a/llvm/test/Transforms/Inline/X86/inline-target-cpu-i686.ll
+++ b/llvm/test/Transforms/Inline/X86/inline-target-cpu-i686.ll
@@ -1,12 +1,17 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
 ; RUN: opt < %s -mtriple=i686-unknown-unknown -S
-passes=inline | FileCheck %s

 define i32 @func_target_cpu_nocona() #0 {
+; CHECK-LABEL: @func_target_cpu_nocona(
+; CHECK-NEXT: ret i32 0
+;
 ret i32 0
 }

-; CHECK-LABEL: @target_cpu_prescott_call_target_cpu_nocona(
-; CHECK-NEXT: ret i32 0
 define i32 @target_cpu_prescott_call_target_cpu_nocona() #1 {
+; CHECK-LABEL: @target_cpu_prescott_call_target_cpu_nocona(
+; CHECK-NEXT: ret i32 0
+;
 %call = call i32 @func_target_cpu_nocona()
 ret i32 %call
 }
diff --git a/llvm/test/Transforms/Inline/X86/inline-target-cpu-x86_64.ll b/llvm/test/Transforms/Inline/X86/inline-target-cpu-x86_64.ll
index b0a145d54cf59..e6693a637d820 100644
--- a/llvm/test/Transforms/Inline/X86/inline-target-cpu-x86_64.ll
+++ b/llvm/test/Transforms/Inline/X86/inline-target-cpu-x86_64.ll
@@ -1,37 +1,48 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
 ; RUN: opt < %s -mtriple=x86_64-unknown-unknown -S -passes=inline | FileCheck %s

 define i32 @func_target_cpu_base() #0 {
+; CHECK-LABEL: @func_target_cpu_base(
+; CHECK-NEXT: ret i32 0
+;
 ret i32 0
 }

-; CHECK-LABEL: @target_cpu_k8_call_target_cpu_base(
-; CHECK-NEXT: ret i32 0
 define i32 @target_cpu_k8_call_target_cpu_base() #1 {
+; CHECK-LABEL: @target_cpu_k8_call_target_cpu_base(
+; CHECK-NEXT: ret i32 0
+;
 %call = call i32 @func_target_cpu_base()
 ret i32 %call
 }

-; CHECK-LABEL: @target_cpu_target_nehalem_call_target_cpu_base(
-; CHECK-NEXT: ret i32 0
 define i32 @target_cpu_target_nehalem_call_target_cpu_base() #2 {
+; CHECK-LABEL: @target_cpu_target_nehalem_call_target_cpu_base(
+; CHECK-NEXT: ret i32 0
+;
 %call = call i32 @func_target_cpu_base()
 ret i32 %call
 }

-; CHECK-LABEL: @target_cpu_target_goldmont_call_target_cpu_base(
-; CHECK-NEXT: ret i32 0
 define i32 @target_cpu_target_goldmont_call_target_cpu_base() #3 {
+; CHECK-LABEL: @target_cpu_target_goldmont_call_target_cpu_base(
+; CHECK-NEXT: ret i32 0
+;
 %call = call i32 @func_target_cpu_base()
 ret i32 %call
 }

 define i32 @func_target_cpu_nocona() #4 {
+; CHECK-LABEL: @func_target_cpu_nocona(
+; CHECK-NEXT: ret i32 0
+;
 ret i32 0
 }

-; CHECK-LABEL: @target_cpu_target_base_call_target_cpu_nocona(
-; CHECK-NEXT: ret i32 0
 define i32 @target_cpu_target_base_call_target_cpu_nocona() #0 {
+; CHECK-LABEL: @target_cpu_target_base_call_target_cpu_nocona(
+; CHECK-NEXT: ret i32 0
+;
 %call = call i32 @func_target_cpu_nocona()
 ret i32 %call
 }
diff --git a/llvm/test/Transforms/InstCombine/X86/x86-avx512-inseltpoison.ll b/llvm/test/Transforms/InstCombine/X86/x86-avx512-inseltpoison.ll
index 80d8e1b16ed28..3c44da84813fd 100644
--- a/llvm/test/Transforms/InstCombine/X86/x86-avx512-inseltpoison.ll
+++ b/llvm/test/Transforms/InstCombine/X86/x86-avx512-inseltpoison.ll
@@ -1814,1366 +1814,6 @@ define double @test_mask3_vfnmsub_sd_1_unary_fneg(<2 x double> %a, <2 x double>
 ret double %13
 }

-declare <8 x i32> @llvm.x86.avx2.permd(<8 x i32>, <8 x i32>)
-
-define <8 x i32> @identity_test_permvar_si_256(<8 x i32> %a0) {
-;
-; CHECK-LABEL: @identity_test_permvar_si_256(
-; CHECK-NEXT: ret <8 x i32> [[A0:%.*]]
-;
- %1 = call <8 x i32> @llvm.x86.avx2.permd(<8 x i32> %a0, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>)
- ret <8 x i32> %1
-}
-
-define <8 x i32> @identity_test_permvar_si_256_mask(<8 x i32> %a0, <8 x i32> %passthru, i8 %mask) {
-;
-; CHECK-LABEL: @identity_test_permvar_si_256_mask(
-; CHECK-NEXT: [[TMP1:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1>
-; CHECK-NEXT: [[TMP2:%.*]] = select <8 x i1> [[TMP1]], <8 x i32> [[A0:%.*]], <8 x i32> [[PASSTHRU:%.*]]
-; CHECK-NEXT: ret <8 x i32> [[TMP2]]
-;
- %1 = call <8 x i32> @llvm.x86.avx2.permd(<8 x i32> %a0, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>)
- %2 = bitcast i8 %mask to <8 x i1>
- %3 = select <8 x i1> %2, <8 x i32> %1, <8 x i32> %passthru
- ret <8 x i32> %3
-}
-
-define <8 x i32> @zero_test_permvar_si_256(<8 x i32> %a0) {
-;
-; CHECK-LABEL: @zero_test_permvar_si_256(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A0:%.*]], <8 x i32> poison, <8 x i32> zeroinitializer
-; CHECK-NEXT: ret <8 x i32> [[TMP1]]
-;
- %1 = call <8 x i32> @llvm.x86.avx2.permd(<8 x i32> %a0, <8 x i32> zeroinitializer)
- ret <8 x i32> %1
-}
-
-define <8 x i32> @zero_test_permvar_si_256_mask(<8 x i32> %a0, <8 x i32> %passthru, i8 %mask) {
-;
-; CHECK-LABEL: @zero_test_permvar_si_256_mask(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A0:%.*]], <8 x i32> poison, <8 x i32> zeroinitializer
-; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1>
-; CHECK-NEXT: [[TMP3:%.*]] = select <8 x i1> [[TMP2]], <8 x i32> [[TMP1]], <8 x i32> [[PASSTHRU:%.*]]
-; CHECK-NEXT: ret <8 x i32> [[TMP3]]
-;
- %1 = call <8 x i32> @llvm.x86.avx2.permd(<8 x i32> %a0, <8 x i32> zeroinitializer)
- %2 = bitcast i8 %mask to <8 x i1>
- %3 = select <8 x i1> %2, <8 x i32> %1, <8 x i32> %passthru
- ret <8 x i32> %3
-}
-
-define <8 x i32> @shuffle_test_permvar_si_256(<8 x i32> %a0) {
-;
-; CHECK-LABEL: @shuffle_test_permvar_si_256(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A0:%.*]], <8 x i32> poison, <8 x i32>
-; CHECK-NEXT: ret <8 x i32> [[TMP1]]
-;
- %1 = call <8 x i32> @llvm.x86.avx2.permd(<8 x i32> %a0, <8 x i32> )
- ret <8 x i32> %1
-}
-
-define <8 x i32> @shuffle_test_permvar_si_256_mask(<8 x i32> %a0, <8 x i32> %passthru, i8 %mask) {
-;
-; CHECK-LABEL: @shuffle_test_permvar_si_256_mask(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A0:%.*]], <8 x i32> poison, <8 x i32>
-; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1>
-; CHECK-NEXT: [[TMP3:%.*]] = select <8 x i1> [[TMP2]], <8 x i32> [[TMP1]], <8 x i32> [[PASSTHRU:%.*]]
-; CHECK-NEXT: ret <8 x i32> [[TMP3]]
-;
- %1 = call <8 x i32> @llvm.x86.avx2.permd(<8 x i32> %a0, <8 x i32> )
- %2 = bitcast i8 %mask to <8 x i1>
- %3 = select <8 x i1> %2, <8 x i32> %1, <8 x i32> %passthru
- ret <8 x i32> %3
-}
-
-define <8 x i32> @undef_test_permvar_si_256(<8 x i32> %a0) {
-;
-; CHECK-LABEL: @undef_test_permvar_si_256(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A0:%.*]], <8 x i32> poison, <8 x i32>
-; CHECK-NEXT: ret <8 x i32> [[TMP1]]
-;
- %1 = call <8 x i32> @llvm.x86.avx2.permd(<8 x i32> %a0, <8 x i32> )
- ret <8 x i32> %1
-}
-
-define <8 x i32> @undef_test_permvar_si_256_mask(<8 x i32> %a0, <8 x i32> %passthru, i8 %mask) {
-;
-; CHECK-LABEL: @undef_test_permvar_si_256_mask(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A0:%.*]], <8 x i32> poison, <8 x i32>
-; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1>
-; CHECK-NEXT: [[TMP3:%.*]] = select <8 x i1> [[TMP2]], <8 x i32> [[TMP1]], <8 x i32> [[PASSTHRU:%.*]]
-; CHECK-NEXT: ret <8 x i32> [[TMP3]]
-;
- %1 = call <8 x i32> @llvm.x86.avx2.permd(<8 x i32> %a0, <8 x i32> )
- %2 = bitcast i8 %mask to <8 x i1>
- %3 = select <8 x i1> %2, <8 x i32> %1, <8 x i32> %passthru
- ret <8 x i32> %3
-}
-
-declare <8 x float> @llvm.x86.avx2.permps(<8 x float>, <8 x i32>)
-
-define <8 x float> @identity_test_permvar_sf_256(<8 x float> %a0) {
-;
-; CHECK-LABEL: @identity_test_permvar_sf_256(
-; CHECK-NEXT: ret <8 x float> [[A0:%.*]]
-;
- %1 = call <8 x float> @llvm.x86.avx2.permps(<8 x float> %a0, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>)
- ret <8 x float> %1
-}
-
-define <8 x float> @identity_test_permvar_sf_256_mask(<8 x float> %a0, <8 x float> %passthru, i8 %mask) {
-;
-; CHECK-LABEL: @identity_test_permvar_sf_256_mask(
-; CHECK-NEXT: [[TMP1:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1>
-; CHECK-NEXT: [[TMP2:%.*]] = select <8 x i1> [[TMP1]], <8 x float> [[A0:%.*]], <8 x float> [[PASSTHRU:%.*]]
-; CHECK-NEXT: ret <8 x float> [[TMP2]]
-;
- %1 = call <8 x float> @llvm.x86.avx2.permps(<8 x float> %a0, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>)
- %2 = bitcast i8 %mask to <8 x i1>
- %3 = select <8 x i1> %2, <8 x float> %1, <8 x float> %passthru
- ret <8 x float> %3
-}
-
-define <8 x float> @zero_test_permvar_sf_256(<8 x float> %a0) {
-;
-; CHECK-LABEL: @zero_test_permvar_sf_256(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A0:%.*]], <8 x float> poison, <8 x i32> zeroinitializer
-; CHECK-NEXT: ret <8 x float> [[TMP1]]
-;
- %1 = call <8 x float> @llvm.x86.avx2.permps(<8 x float> %a0, <8 x i32> zeroinitializer)
- ret <8 x float> %1
-}
-
-define <8 x float> @zero_test_permvar_sf_256_mask(<8 x float> %a0, <8 x float> %passthru, i8 %mask) {
-;
-; CHECK-LABEL: @zero_test_permvar_sf_256_mask(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A0:%.*]], <8 x float> poison, <8 x i32> zeroinitializer
-; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1>
-; CHECK-NEXT: [[TMP3:%.*]] = select <8 x i1> [[TMP2]], <8 x float> [[TMP1]], <8 x float> [[PASSTHRU:%.*]]
-; CHECK-NEXT: ret <8 x float> [[TMP3]]
-;
- %1 = call <8 x float> @llvm.x86.avx2.permps(<8 x float> %a0, <8 x i32> zeroinitializer)
- %2 = bitcast i8 %mask to <8 x i1>
- %3 = select <8 x i1> %2, <8 x float> %1, <8 x float> %passthru
- ret <8 x float> %3
-}
-
-define <8 x float> @shuffle_test_permvar_sf_256(<8 x float> %a0) {
-;
-; CHECK-LABEL: @shuffle_test_permvar_sf_256(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A0:%.*]], <8 x float> poison, <8 x i32>
-; CHECK-NEXT: ret <8 x float> [[TMP1]]
-;
- %1 = call <8 x float> @llvm.x86.avx2.permps(<8 x float> %a0, <8 x i32> )
- ret <8 x float> %1
-}
-
-define <8 x float> @shuffle_test_permvar_sf_256_mask(<8 x float> %a0, <8 x float> %passthru, i8 %mask) {
-;
-; CHECK-LABEL: @shuffle_test_permvar_sf_256_mask(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A0:%.*]], <8 x float> poison, <8 x i32>
-; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1>
-; CHECK-NEXT: [[TMP3:%.*]] = select <8 x i1> [[TMP2]], <8 x float> [[TMP1]], <8 x float> [[PASSTHRU:%.*]]
-; CHECK-NEXT: ret <8 x float> [[TMP3]]
-;
- %1 = call <8 x float> @llvm.x86.avx2.permps(<8 x float> %a0, <8 x i32> )
- %2 = bitcast i8 %mask to <8 x i1>
- %3 = select <8 x i1> %2, <8 x float> %1, <8 x float> %passthru
- ret <8 x float> %3
-}
-
-define <8 x float> @undef_test_permvar_sf_256(<8 x float> %a0) {
-;
-; CHECK-LABEL: @undef_test_permvar_sf_256(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A0:%.*]], <8 x float> poison, <8 x i32>
-; CHECK-NEXT: ret <8 x float> [[TMP1]]
-;
- %1 = call <8 x float> @llvm.x86.avx2.permps(<8 x float> %a0, <8 x i32> )
- ret <8 x float> %1
-}
-
-define <8 x float> @undef_test_permvar_sf_256_mask(<8 x float> %a0, <8 x float> %passthru, i8 %mask) {
-;
-; CHECK-LABEL: @undef_test_permvar_sf_256_mask(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A0:%.*]], <8 x float> poison, <8 x i32>
-; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1>
-; CHECK-NEXT: [[TMP3:%.*]] = select <8 x i1> [[TMP2]], <8 x float> [[TMP1]], <8 x float> [[PASSTHRU:%.*]]
-; CHECK-NEXT: ret <8 x float> [[TMP3]]
-;
- %1 = call <8 x float> @llvm.x86.avx2.permps(<8 x float> %a0, <8 x i32> )
- %2 = bitcast i8 %mask to <8 x i1>
- %3 = select <8 x i1> %2, <8 x float> %1, <8 x float> %passthru
- ret <8 x float> %3
-}
-
-declare <4 x i64> @llvm.x86.avx512.permvar.di.256(<4 x i64>, <4 x i64>)
-
-define <4 x i64> @identity_test_permvar_di_256(<4 x i64> %a0) {
-;
-; CHECK-LABEL: @identity_test_permvar_di_256(
-; CHECK-NEXT: ret <4 x i64> [[A0:%.*]]
-;
- %1 = call <4 x i64> @llvm.x86.avx512.permvar.di.256(<4 x i64> %a0, <4 x i64> <i64 0, i64 1, i64 2, i64 3>)
- ret <4 x i64> %1
-}
-
-define <4 x i64> @identity_test_permvar_di_256_mask(<4 x i64> %a0, <4 x i64> %passthru, i8 %mask) {
-;
-; CHECK-LABEL: @identity_test_permvar_di_256_mask(
-; CHECK-NEXT: [[TMP1:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1>
-; CHECK-NEXT: [[EXTRACT:%.*]] = shufflevector <8 x i1> [[TMP1]], <8 x i1> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
-; CHECK-NEXT: [[TMP2:%.*]] = select <4 x i1> [[EXTRACT]], <4 x i64> [[A0:%.*]], <4 x i64> [[PASSTHRU:%.*]]
-; CHECK-NEXT: ret <4 x i64> [[TMP2]]
-;
- %1 = call <4 x i64> @llvm.x86.avx512.permvar.di.256(<4 x i64> %a0, <4 x i64> <i64 0, i64 1, i64 2, i64 3>)
- %2 = bitcast i8 %mask to <8 x i1>
- %extract = shufflevector <8 x i1> %2, <8 x i1> %2, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
- %3 = select <4 x i1> %extract, <4 x i64> %1, <4 x i64> %passthru
- ret <4 x i64> %3
-}
-
-define <4 x i64> @zero_test_permvar_di_256(<4 x i64> %a0) {
-;
-; CHECK-LABEL: @zero_test_permvar_di_256(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x i64> [[A0:%.*]], <4 x i64> poison, <4 x i32> zeroinitializer
-; CHECK-NEXT: ret <4 x i64> [[TMP1]]
-;
- %1 = call <4 x i64> @llvm.x86.avx512.permvar.di.256(<4 x i64> %a0, <4 x i64> zeroinitializer)
- ret <4 x i64> %1
-}
-
-define <4 x i64> @zero_test_permvar_di_256_mask(<4 x i64> %a0, <4 x i64> %passthru, i8 %mask) {
-;
-; CHECK-LABEL: @zero_test_permvar_di_256_mask(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x i64> [[A0:%.*]], <4 x i64> poison, <4 x i32> zeroinitializer
-; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1>
-; CHECK-NEXT: [[EXTRACT:%.*]] = shufflevector <8 x i1> [[TMP2]], <8 x i1> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
-; CHECK-NEXT: [[TMP3:%.*]] = select <4 x i1> [[EXTRACT]], <4 x i64> [[TMP1]], <4 x i64> [[PASSTHRU:%.*]]
-; CHECK-NEXT: ret <4 x i64> [[TMP3]]
-;
- %1 = call <4 x i64> @llvm.x86.avx512.permvar.di.256(<4 x i64> %a0, <4 x i64> zeroinitializer)
- %2 = bitcast i8 %mask to <8 x i1>
- %extract = shufflevector <8 x i1> %2, <8 x i1> %2, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
- %3 = select <4 x i1> %extract, <4 x i64> %1, <4 x i64> %passthru
- ret <4 x i64> %3
-}
-
-define <4 x i64> @shuffle_test_permvar_di_256(<4 x i64> %a0) {
-;
-; CHECK-LABEL: @shuffle_test_permvar_di_256(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x i64> [[A0:%.*]], <4 x i64> poison, <4 x i32>
-; CHECK-NEXT: ret <4 x i64> [[TMP1]]
-;
- %1 = call <4 x i64> @llvm.x86.avx512.permvar.di.256(<4 x i64> %a0, <4 x i64> )
- ret <4 x i64> %1
-}
-
-define <4 x i64> @shuffle_test_permvar_di_256_mask(<4 x i64> %a0, <4 x i64> %passthru, i8 %mask) {
-;
-; CHECK-LABEL: @shuffle_test_permvar_di_256_mask(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x i64> [[A0:%.*]], <4 x i64> poison, <4 x i32>
-; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1>
-; CHECK-NEXT: [[EXTRACT:%.*]] = shufflevector <8 x i1> [[TMP2]], <8 x i1> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
-; CHECK-NEXT: [[TMP3:%.*]] = select <4 x i1> [[EXTRACT]], <4 x i64> [[TMP1]], <4 x i64> [[PASSTHRU:%.*]]
-; CHECK-NEXT: ret <4 x i64> [[TMP3]]
-;
- %1 = call <4 x i64> @llvm.x86.avx512.permvar.di.256(<4 x i64> %a0, <4 x i64> )
- %2 = bitcast i8 %mask to <8 x i1>
- %extract = shufflevector <8 x i1> %2, <8 x i1> %2, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
- %3 = select <4 x i1> %extract, <4 x i64> %1, <4 x i64> %passthru
- ret <4 x i64> %3
-}
-
-define <4 x i64> @undef_test_permvar_di_256(<4 x i64> %a0) {
-;
-; CHECK-LABEL: @undef_test_permvar_di_256(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x i64> [[A0:%.*]], <4 x i64> poison, <4 x i32>
-; CHECK-NEXT: ret <4 x i64> [[TMP1]]
-;
- %1 = call <4 x i64> @llvm.x86.avx512.permvar.di.256(<4 x i64> %a0, <4 x i64> )
- ret <4 x i64> %1
-}
-
-define <4 x i64> @undef_test_permvar_di_256_mask(<4 x i64> %a0, <4 x i64> %passthru, i8 %mask) {
-;
-; CHECK-LABEL: @undef_test_permvar_di_256_mask(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x i64> [[A0:%.*]], <4 x i64> poison, <4 x i32>
-; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1>
-; CHECK-NEXT: [[EXTRACT:%.*]] = shufflevector <8 x i1> [[TMP2]], <8 x i1> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
-; CHECK-NEXT: [[TMP3:%.*]] = select <4 x i1> [[EXTRACT]], <4 x i64> [[TMP1]], <4 x i64> [[PASSTHRU:%.*]]
-; CHECK-NEXT: ret <4 x i64> [[TMP3]]
-;
- %1 = call <4 x i64> @llvm.x86.avx512.permvar.di.256(<4 x i64> %a0, <4 x i64> )
- %2 = bitcast i8 %mask to <8 x i1>
- %extract = shufflevector <8 x i1> %2, <8 x i1> %2, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
- %3 = select <4 x i1> %extract, <4 x i64> %1, <4 x i64> %passthru
- ret <4 x i64> %3
-}
-
-declare <4 x double> @llvm.x86.avx512.permvar.df.256(<4 x double>, <4 x i64>)
-
-define <4 x double> @identity_test_permvar_df_256(<4 x double> %a0) {
-;
-; CHECK-LABEL: @identity_test_permvar_df_256(
-; CHECK-NEXT: ret <4 x double> [[A0:%.*]]
-;
- %1 = call <4 x double> @llvm.x86.avx512.permvar.df.256(<4 x double> %a0, <4 x i64> <i64 0, i64 1, i64 2, i64 3>)
- ret <4 x double> %1
-}
-
-define <4 x double> @identity_test_permvar_df_256_mask(<4 x double> %a0, <4 x double> %passthru, i8 %mask) {
-;
-; CHECK-LABEL: @identity_test_permvar_df_256_mask(
-; CHECK-NEXT: [[TMP1:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1>
-; CHECK-NEXT: [[EXTRACT:%.*]] = shufflevector <8 x i1> [[TMP1]], <8 x i1> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
-; CHECK-NEXT: [[TMP2:%.*]] = select <4 x i1> [[EXTRACT]], <4 x double> [[A0:%.*]], <4 x double> [[PASSTHRU:%.*]]
-; CHECK-NEXT: ret <4 x double> [[TMP2]]
-;
- %1 = call <4 x double> @llvm.x86.avx512.permvar.df.256(<4 x double> %a0, <4 x i64> <i64 0, i64 1, i64 2, i64 3>)
- %2 = bitcast i8 %mask to <8 x i1>
- %extract = shufflevector <8 x i1> %2, <8 x i1> %2, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
- %3 = select <4 x i1> %extract, <4 x double> %1, <4 x double> %passthru
- ret <4 x double> %3
-}
-
-define <4 x double> @zero_test_permvar_df_256(<4 x double> %a0) {
-;
-; CHECK-LABEL: @zero_test_permvar_df_256(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[A0:%.*]], <4 x double> poison, <4 x i32> zeroinitializer
-; CHECK-NEXT: ret <4 x double> [[TMP1]]
-;
- %1 = call <4 x double> @llvm.x86.avx512.permvar.df.256(<4 x double> %a0, <4 x i64> zeroinitializer)
- ret <4 x double> %1
-}
-
-define <4 x double> @zero_test_permvar_df_256_mask(<4 x double> %a0, <4 x double> %passthru, i8 %mask) {
-;
-; CHECK-LABEL: @zero_test_permvar_df_256_mask(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[A0:%.*]], <4 x double> poison, <4 x i32> zeroinitializer
-; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1>
-; CHECK-NEXT: [[EXTRACT:%.*]] = shufflevector <8 x i1> [[TMP2]], <8 x i1> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
-; CHECK-NEXT: [[TMP3:%.*]] = select <4 x i1> [[EXTRACT]], <4 x double> [[TMP1]], <4 x double> [[PASSTHRU:%.*]]
-; CHECK-NEXT: ret <4 x double> [[TMP3]]
-;
- %1 = call <4 x double> @llvm.x86.avx512.permvar.df.256(<4 x double> %a0, <4 x i64> zeroinitializer)
- %2 = bitcast i8 %mask to <8 x i1>
- %extract = shufflevector <8 x i1> %2, <8 x i1> %2, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
- %3 = select <4 x i1> %extract, <4 x double> %1, <4 x double> %passthru
- ret <4 x double> %3
-}
-
-define <4 x double> @shuffle_test_permvar_df_256(<4 x double> %a0) {
-;
-; CHECK-LABEL: @shuffle_test_permvar_df_256(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[A0:%.*]], <4 x double> poison, <4 x i32>
-; CHECK-NEXT: ret <4 x double> [[TMP1]]
-;
- %1 = call <4 x double> @llvm.x86.avx512.permvar.df.256(<4 x double> %a0, <4 x i64> )
- ret <4 x double> %1
-}
-
-define <4 x double> @shuffle_test_permvar_df_256_mask(<4 x double> %a0, <4 x double> %passthru, i8 %mask) {
-;
-; CHECK-LABEL: @shuffle_test_permvar_df_256_mask(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[A0:%.*]], <4 x double> poison, <4 x i32>
-; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1>
-; CHECK-NEXT: [[EXTRACT:%.*]] = shufflevector <8 x i1> [[TMP2]], <8 x i1> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
-; CHECK-NEXT: [[TMP3:%.*]] = select <4 x i1> [[EXTRACT]], <4 x double> [[TMP1]], <4 x double> [[PASSTHRU:%.*]]
-; CHECK-NEXT: ret <4 x double> [[TMP3]]
-;
- %1 = call <4 x double> @llvm.x86.avx512.permvar.df.256(<4 x double> %a0, <4 x i64> )
- %2 = bitcast i8 %mask to <8 x i1>
- %extract = shufflevector <8 x i1> %2, <8 x i1> %2, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
- %3 = select <4 x i1> %extract, <4 x double> %1, <4 x double> %passthru
- ret <4 x double> %3
-}
-
-define <4 x double> @undef_test_permvar_df_256(<4 x double> %a0) {
-;
-; CHECK-LABEL: @undef_test_permvar_df_256(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[A0:%.*]], <4 x double> poison, <4 x i32>
-; CHECK-NEXT: ret <4 x double> [[TMP1]]
-;
- %1 = call <4 x double> @llvm.x86.avx512.permvar.df.256(<4 x double> %a0, <4 x i64> )
- ret <4 x double> %1
-}
-
-define <4 x double> @undef_test_permvar_df_256_mask(<4 x double> %a0, <4 x double> %passthru, i8 %mask) {
-;
-; CHECK-LABEL: @undef_test_permvar_df_256_mask(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[A0:%.*]], <4 x double> poison, <4 x i32>
-; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1>
-; CHECK-NEXT: [[EXTRACT:%.*]] = shufflevector <8 x i1> [[TMP2]], <8 x i1> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
-; CHECK-NEXT: [[TMP3:%.*]] = select <4 x i1> [[EXTRACT]], <4 x double> [[TMP1]], <4 x double>
[[PASSTHRU:%.*]] -; CHECK-NEXT: ret <4 x double> [[TMP3]] -; - %1 = call <4 x double> @llvm.x86.avx512.permvar.df.256(<4 x double> %a0, <4 x i64> ) - %2 = bitcast i8 %mask to <8 x i1> - %extract = shufflevector <8 x i1> %2, <8 x i1> %2, <4 x i32> - %3 = select <4 x i1> %extract, <4 x double> %1, <4 x double> %passthru - ret <4 x double> %3 -} - -declare <16 x i32> @llvm.x86.avx512.permvar.si.512(<16 x i32>, <16 x i32>) - -define <16 x i32> @identity_test_permvar_si_512(<16 x i32> %a0) { -; -; CHECK-LABEL: @identity_test_permvar_si_512( -; CHECK-NEXT: ret <16 x i32> [[A0:%.*]] -; - %1 = call <16 x i32> @llvm.x86.avx512.permvar.si.512(<16 x i32> %a0, <16 x i32> ) - ret <16 x i32> %1 -} - -define <16 x i32> @identity_test_permvar_si_512_mask(<16 x i32> %a0, <16 x i32> %passthru, i16 %mask) { -; -; CHECK-LABEL: @identity_test_permvar_si_512_mask( -; CHECK-NEXT: [[TMP1:%.*]] = bitcast i16 [[MASK:%.*]] to <16 x i1> -; CHECK-NEXT: [[TMP2:%.*]] = select <16 x i1> [[TMP1]], <16 x i32> [[A0:%.*]], <16 x i32> [[PASSTHRU:%.*]] -; CHECK-NEXT: ret <16 x i32> [[TMP2]] -; - %1 = call <16 x i32> @llvm.x86.avx512.permvar.si.512(<16 x i32> %a0, <16 x i32> ) - %2 = bitcast i16 %mask to <16 x i1> - %3 = select <16 x i1> %2, <16 x i32> %1, <16 x i32> %passthru - ret <16 x i32> %3 -} - -define <16 x i32> @zero_test_permvar_si_512(<16 x i32> %a0) { -; -; CHECK-LABEL: @zero_test_permvar_si_512( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i32> [[A0:%.*]], <16 x i32> poison, <16 x i32> zeroinitializer -; CHECK-NEXT: ret <16 x i32> [[TMP1]] -; - %1 = call <16 x i32> @llvm.x86.avx512.permvar.si.512(<16 x i32> %a0, <16 x i32> zeroinitializer) - ret <16 x i32> %1 -} - -define <16 x i32> @zero_test_permvar_si_512_mask(<16 x i32> %a0, <16 x i32> %passthru, i16 %mask) { -; -; CHECK-LABEL: @zero_test_permvar_si_512_mask( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i32> [[A0:%.*]], <16 x i32> poison, <16 x i32> zeroinitializer -; CHECK-NEXT: [[TMP2:%.*]] = bitcast i16 [[MASK:%.*]] to <16 
x i1> -; CHECK-NEXT: [[TMP3:%.*]] = select <16 x i1> [[TMP2]], <16 x i32> [[TMP1]], <16 x i32> [[PASSTHRU:%.*]] -; CHECK-NEXT: ret <16 x i32> [[TMP3]] -; - %1 = call <16 x i32> @llvm.x86.avx512.permvar.si.512(<16 x i32> %a0, <16 x i32> zeroinitializer) - %2 = bitcast i16 %mask to <16 x i1> - %3 = select <16 x i1> %2, <16 x i32> %1, <16 x i32> %passthru - ret <16 x i32> %3 -} - -define <16 x i32> @shuffle_test_permvar_si_512(<16 x i32> %a0) { -; -; CHECK-LABEL: @shuffle_test_permvar_si_512( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i32> [[A0:%.*]], <16 x i32> poison, <16 x i32> -; CHECK-NEXT: ret <16 x i32> [[TMP1]] -; - %1 = call <16 x i32> @llvm.x86.avx512.permvar.si.512(<16 x i32> %a0, <16 x i32> ) - ret <16 x i32> %1 -} - -define <16 x i32> @shuffle_test_permvar_si_512_mask(<16 x i32> %a0, <16 x i32> %passthru, i16 %mask) { -; -; CHECK-LABEL: @shuffle_test_permvar_si_512_mask( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i32> [[A0:%.*]], <16 x i32> poison, <16 x i32> -; CHECK-NEXT: [[TMP2:%.*]] = bitcast i16 [[MASK:%.*]] to <16 x i1> -; CHECK-NEXT: [[TMP3:%.*]] = select <16 x i1> [[TMP2]], <16 x i32> [[TMP1]], <16 x i32> [[PASSTHRU:%.*]] -; CHECK-NEXT: ret <16 x i32> [[TMP3]] -; - %1 = call <16 x i32> @llvm.x86.avx512.permvar.si.512(<16 x i32> %a0, <16 x i32> ) - %2 = bitcast i16 %mask to <16 x i1> - %3 = select <16 x i1> %2, <16 x i32> %1, <16 x i32> %passthru - ret <16 x i32> %3 -} - -define <16 x i32> @undef_test_permvar_si_512(<16 x i32> %a0) { -; -; CHECK-LABEL: @undef_test_permvar_si_512( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i32> [[A0:%.*]], <16 x i32> poison, <16 x i32> -; CHECK-NEXT: ret <16 x i32> [[TMP1]] -; - %1 = call <16 x i32> @llvm.x86.avx512.permvar.si.512(<16 x i32> %a0, <16 x i32> ) - ret <16 x i32> %1 -} - -define <16 x i32> @undef_test_permvar_si_512_mask(<16 x i32> %a0, <16 x i32> %passthru, i16 %mask) { -; -; CHECK-LABEL: @undef_test_permvar_si_512_mask( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i32> 
[[A0:%.*]], <16 x i32> poison, <16 x i32> -; CHECK-NEXT: [[TMP2:%.*]] = bitcast i16 [[MASK:%.*]] to <16 x i1> -; CHECK-NEXT: [[TMP3:%.*]] = select <16 x i1> [[TMP2]], <16 x i32> [[TMP1]], <16 x i32> [[PASSTHRU:%.*]] -; CHECK-NEXT: ret <16 x i32> [[TMP3]] -; - %1 = call <16 x i32> @llvm.x86.avx512.permvar.si.512(<16 x i32> %a0, <16 x i32> ) - %2 = bitcast i16 %mask to <16 x i1> - %3 = select <16 x i1> %2, <16 x i32> %1, <16 x i32> %passthru - ret <16 x i32> %3 -} - -declare <16 x float> @llvm.x86.avx512.permvar.sf.512(<16 x float>, <16 x i32>) - -define <16 x float> @identity_test_permvar_sf_512(<16 x float> %a0) { -; -; CHECK-LABEL: @identity_test_permvar_sf_512( -; CHECK-NEXT: ret <16 x float> [[A0:%.*]] -; - %1 = call <16 x float> @llvm.x86.avx512.permvar.sf.512(<16 x float> %a0, <16 x i32> ) - ret <16 x float> %1 -} - -define <16 x float> @identity_test_permvar_sf_512_mask(<16 x float> %a0, <16 x float> %passthru, i16 %mask) { -; -; CHECK-LABEL: @identity_test_permvar_sf_512_mask( -; CHECK-NEXT: [[TMP1:%.*]] = bitcast i16 [[MASK:%.*]] to <16 x i1> -; CHECK-NEXT: [[TMP2:%.*]] = select <16 x i1> [[TMP1]], <16 x float> [[A0:%.*]], <16 x float> [[PASSTHRU:%.*]] -; CHECK-NEXT: ret <16 x float> [[TMP2]] -; - %1 = call <16 x float> @llvm.x86.avx512.permvar.sf.512(<16 x float> %a0, <16 x i32> ) - %2 = bitcast i16 %mask to <16 x i1> - %3 = select <16 x i1> %2, <16 x float> %1, <16 x float> %passthru - ret <16 x float> %3 -} - -define <16 x float> @zero_test_permvar_sf_512(<16 x float> %a0) { -; -; CHECK-LABEL: @zero_test_permvar_sf_512( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x float> [[A0:%.*]], <16 x float> poison, <16 x i32> zeroinitializer -; CHECK-NEXT: ret <16 x float> [[TMP1]] -; - %1 = call <16 x float> @llvm.x86.avx512.permvar.sf.512(<16 x float> %a0, <16 x i32> zeroinitializer) - ret <16 x float> %1 -} - -define <16 x float> @zero_test_permvar_sf_512_mask(<16 x float> %a0, <16 x float> %passthru, i16 %mask) { -; -; CHECK-LABEL: 
@zero_test_permvar_sf_512_mask( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x float> [[A0:%.*]], <16 x float> poison, <16 x i32> zeroinitializer -; CHECK-NEXT: [[TMP2:%.*]] = bitcast i16 [[MASK:%.*]] to <16 x i1> -; CHECK-NEXT: [[TMP3:%.*]] = select <16 x i1> [[TMP2]], <16 x float> [[TMP1]], <16 x float> [[PASSTHRU:%.*]] -; CHECK-NEXT: ret <16 x float> [[TMP3]] -; - %1 = call <16 x float> @llvm.x86.avx512.permvar.sf.512(<16 x float> %a0, <16 x i32> zeroinitializer) - %2 = bitcast i16 %mask to <16 x i1> - %3 = select <16 x i1> %2, <16 x float> %1, <16 x float> %passthru - ret <16 x float> %3 -} - -define <16 x float> @shuffle_test_permvar_sf_512(<16 x float> %a0) { -; -; CHECK-LABEL: @shuffle_test_permvar_sf_512( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x float> [[A0:%.*]], <16 x float> poison, <16 x i32> -; CHECK-NEXT: ret <16 x float> [[TMP1]] -; - %1 = call <16 x float> @llvm.x86.avx512.permvar.sf.512(<16 x float> %a0, <16 x i32> ) - ret <16 x float> %1 -} - -define <16 x float> @shuffle_test_permvar_sf_512_mask(<16 x float> %a0, <16 x float> %passthru, i16 %mask) { -; -; CHECK-LABEL: @shuffle_test_permvar_sf_512_mask( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x float> [[A0:%.*]], <16 x float> poison, <16 x i32> -; CHECK-NEXT: [[TMP2:%.*]] = bitcast i16 [[MASK:%.*]] to <16 x i1> -; CHECK-NEXT: [[TMP3:%.*]] = select <16 x i1> [[TMP2]], <16 x float> [[TMP1]], <16 x float> [[PASSTHRU:%.*]] -; CHECK-NEXT: ret <16 x float> [[TMP3]] -; - %1 = call <16 x float> @llvm.x86.avx512.permvar.sf.512(<16 x float> %a0, <16 x i32> ) - %2 = bitcast i16 %mask to <16 x i1> - %3 = select <16 x i1> %2, <16 x float> %1, <16 x float> %passthru - ret <16 x float> %3 -} - -define <16 x float> @undef_test_permvar_sf_512(<16 x float> %a0) { -; -; CHECK-LABEL: @undef_test_permvar_sf_512( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x float> [[A0:%.*]], <16 x float> poison, <16 x i32> -; CHECK-NEXT: ret <16 x float> [[TMP1]] -; - %1 = call <16 x float> 
@llvm.x86.avx512.permvar.sf.512(<16 x float> %a0, <16 x i32> ) - ret <16 x float> %1 -} - -define <16 x float> @undef_test_permvar_sf_512_mask(<16 x float> %a0, <16 x float> %passthru, i16 %mask) { -; -; CHECK-LABEL: @undef_test_permvar_sf_512_mask( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x float> [[A0:%.*]], <16 x float> poison, <16 x i32> -; CHECK-NEXT: [[TMP2:%.*]] = bitcast i16 [[MASK:%.*]] to <16 x i1> -; CHECK-NEXT: [[TMP3:%.*]] = select <16 x i1> [[TMP2]], <16 x float> [[TMP1]], <16 x float> [[PASSTHRU:%.*]] -; CHECK-NEXT: ret <16 x float> [[TMP3]] -; - %1 = call <16 x float> @llvm.x86.avx512.permvar.sf.512(<16 x float> %a0, <16 x i32> ) - %2 = bitcast i16 %mask to <16 x i1> - %3 = select <16 x i1> %2, <16 x float> %1, <16 x float> %passthru - ret <16 x float> %3 -} - -declare <8 x i64> @llvm.x86.avx512.permvar.di.512(<8 x i64>, <8 x i64>) - -define <8 x i64> @identity_test_permvar_di_512(<8 x i64> %a0) { -; -; CHECK-LABEL: @identity_test_permvar_di_512( -; CHECK-NEXT: ret <8 x i64> [[A0:%.*]] -; - %1 = call <8 x i64> @llvm.x86.avx512.permvar.di.512(<8 x i64> %a0, <8 x i64> ) - ret <8 x i64> %1 -} - -define <8 x i64> @identity_test_permvar_di_512_mask(<8 x i64> %a0, <8 x i64> %passthru, i8 %mask) { -; -; CHECK-LABEL: @identity_test_permvar_di_512_mask( -; CHECK-NEXT: [[TMP1:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1> -; CHECK-NEXT: [[TMP2:%.*]] = select <8 x i1> [[TMP1]], <8 x i64> [[A0:%.*]], <8 x i64> [[PASSTHRU:%.*]] -; CHECK-NEXT: ret <8 x i64> [[TMP2]] -; - %1 = call <8 x i64> @llvm.x86.avx512.permvar.di.512(<8 x i64> %a0, <8 x i64> ) - %2 = bitcast i8 %mask to <8 x i1> - %3 = select <8 x i1> %2, <8 x i64> %1, <8 x i64> %passthru - ret <8 x i64> %3 -} - -define <8 x i64> @zero_test_permvar_di_512(<8 x i64> %a0) { -; -; CHECK-LABEL: @zero_test_permvar_di_512( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i64> [[A0:%.*]], <8 x i64> poison, <8 x i32> zeroinitializer -; CHECK-NEXT: ret <8 x i64> [[TMP1]] -; - %1 = call <8 x i64> 
@llvm.x86.avx512.permvar.di.512(<8 x i64> %a0, <8 x i64> zeroinitializer) - ret <8 x i64> %1 -} - -define <8 x i64> @zero_test_permvar_di_512_mask(<8 x i64> %a0, <8 x i64> %passthru, i8 %mask) { -; -; CHECK-LABEL: @zero_test_permvar_di_512_mask( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i64> [[A0:%.*]], <8 x i64> poison, <8 x i32> zeroinitializer -; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1> -; CHECK-NEXT: [[TMP3:%.*]] = select <8 x i1> [[TMP2]], <8 x i64> [[TMP1]], <8 x i64> [[PASSTHRU:%.*]] -; CHECK-NEXT: ret <8 x i64> [[TMP3]] -; - %1 = call <8 x i64> @llvm.x86.avx512.permvar.di.512(<8 x i64> %a0, <8 x i64> zeroinitializer) - %2 = bitcast i8 %mask to <8 x i1> - %3 = select <8 x i1> %2, <8 x i64> %1, <8 x i64> %passthru - ret <8 x i64> %3 -} - -define <8 x i64> @shuffle_test_permvar_di_512(<8 x i64> %a0) { -; -; CHECK-LABEL: @shuffle_test_permvar_di_512( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i64> [[A0:%.*]], <8 x i64> poison, <8 x i32> -; CHECK-NEXT: ret <8 x i64> [[TMP1]] -; - %1 = call <8 x i64> @llvm.x86.avx512.permvar.di.512(<8 x i64> %a0, <8 x i64> ) - ret <8 x i64> %1 -} - -define <8 x i64> @shuffle_test_permvar_di_512_mask(<8 x i64> %a0, <8 x i64> %passthru, i8 %mask) { -; -; CHECK-LABEL: @shuffle_test_permvar_di_512_mask( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i64> [[A0:%.*]], <8 x i64> poison, <8 x i32> -; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1> -; CHECK-NEXT: [[TMP3:%.*]] = select <8 x i1> [[TMP2]], <8 x i64> [[TMP1]], <8 x i64> [[PASSTHRU:%.*]] -; CHECK-NEXT: ret <8 x i64> [[TMP3]] -; - %1 = call <8 x i64> @llvm.x86.avx512.permvar.di.512(<8 x i64> %a0, <8 x i64> ) - %2 = bitcast i8 %mask to <8 x i1> - %3 = select <8 x i1> %2, <8 x i64> %1, <8 x i64> %passthru - ret <8 x i64> %3 -} - -define <8 x i64> @undef_test_permvar_di_512(<8 x i64> %a0) { -; -; CHECK-LABEL: @undef_test_permvar_di_512( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i64> [[A0:%.*]], <8 x i64> poison, <8 x i32> 
-; CHECK-NEXT: ret <8 x i64> [[TMP1]] -; - %1 = call <8 x i64> @llvm.x86.avx512.permvar.di.512(<8 x i64> %a0, <8 x i64> ) - ret <8 x i64> %1 -} - -define <8 x i64> @undef_test_permvar_di_512_mask(<8 x i64> %a0, <8 x i64> %passthru, i8 %mask) { -; -; CHECK-LABEL: @undef_test_permvar_di_512_mask( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i64> [[A0:%.*]], <8 x i64> poison, <8 x i32> -; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1> -; CHECK-NEXT: [[TMP3:%.*]] = select <8 x i1> [[TMP2]], <8 x i64> [[TMP1]], <8 x i64> [[PASSTHRU:%.*]] -; CHECK-NEXT: ret <8 x i64> [[TMP3]] -; - %1 = call <8 x i64> @llvm.x86.avx512.permvar.di.512(<8 x i64> %a0, <8 x i64> ) - %2 = bitcast i8 %mask to <8 x i1> - %3 = select <8 x i1> %2, <8 x i64> %1, <8 x i64> %passthru - ret <8 x i64> %3 -} - -declare <8 x double> @llvm.x86.avx512.permvar.df.512(<8 x double>, <8 x i64>) - -define <8 x double> @identity_test_permvar_df_512(<8 x double> %a0) { -; -; CHECK-LABEL: @identity_test_permvar_df_512( -; CHECK-NEXT: ret <8 x double> [[A0:%.*]] -; - %1 = call <8 x double> @llvm.x86.avx512.permvar.df.512(<8 x double> %a0, <8 x i64> ) - ret <8 x double> %1 -} - -define <8 x double> @identity_test_permvar_df_512_mask(<8 x double> %a0, <8 x double> %passthru, i8 %mask) { -; -; CHECK-LABEL: @identity_test_permvar_df_512_mask( -; CHECK-NEXT: [[TMP1:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1> -; CHECK-NEXT: [[TMP2:%.*]] = select <8 x i1> [[TMP1]], <8 x double> [[A0:%.*]], <8 x double> [[PASSTHRU:%.*]] -; CHECK-NEXT: ret <8 x double> [[TMP2]] -; - %1 = call <8 x double> @llvm.x86.avx512.permvar.df.512(<8 x double> %a0, <8 x i64> ) - %2 = bitcast i8 %mask to <8 x i1> - %3 = select <8 x i1> %2, <8 x double> %1, <8 x double> %passthru - ret <8 x double> %3 -} - -define <8 x double> @zero_test_permvar_df_512(<8 x double> %a0) { -; -; CHECK-LABEL: @zero_test_permvar_df_512( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x double> [[A0:%.*]], <8 x double> poison, <8 x i32> zeroinitializer -; 
CHECK-NEXT: ret <8 x double> [[TMP1]] -; - %1 = call <8 x double> @llvm.x86.avx512.permvar.df.512(<8 x double> %a0, <8 x i64> zeroinitializer) - ret <8 x double> %1 -} - -define <8 x double> @zero_test_permvar_df_512_mask(<8 x double> %a0, <8 x double> %passthru, i8 %mask) { -; -; CHECK-LABEL: @zero_test_permvar_df_512_mask( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x double> [[A0:%.*]], <8 x double> poison, <8 x i32> zeroinitializer -; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1> -; CHECK-NEXT: [[TMP3:%.*]] = select <8 x i1> [[TMP2]], <8 x double> [[TMP1]], <8 x double> [[PASSTHRU:%.*]] -; CHECK-NEXT: ret <8 x double> [[TMP3]] -; - %1 = call <8 x double> @llvm.x86.avx512.permvar.df.512(<8 x double> %a0, <8 x i64> zeroinitializer) - %2 = bitcast i8 %mask to <8 x i1> - %3 = select <8 x i1> %2, <8 x double> %1, <8 x double> %passthru - ret <8 x double> %3 -} - -define <8 x double> @shuffle_test_permvar_df_512(<8 x double> %a0) { -; -; CHECK-LABEL: @shuffle_test_permvar_df_512( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x double> [[A0:%.*]], <8 x double> poison, <8 x i32> -; CHECK-NEXT: ret <8 x double> [[TMP1]] -; - %1 = call <8 x double> @llvm.x86.avx512.permvar.df.512(<8 x double> %a0, <8 x i64> ) - ret <8 x double> %1 -} - -define <8 x double> @shuffle_test_permvar_df_512_mask(<8 x double> %a0, <8 x double> %passthru, i8 %mask) { -; -; CHECK-LABEL: @shuffle_test_permvar_df_512_mask( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x double> [[A0:%.*]], <8 x double> poison, <8 x i32> -; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1> -; CHECK-NEXT: [[TMP3:%.*]] = select <8 x i1> [[TMP2]], <8 x double> [[TMP1]], <8 x double> [[PASSTHRU:%.*]] -; CHECK-NEXT: ret <8 x double> [[TMP3]] -; - %1 = call <8 x double> @llvm.x86.avx512.permvar.df.512(<8 x double> %a0, <8 x i64> ) - %2 = bitcast i8 %mask to <8 x i1> - %3 = select <8 x i1> %2, <8 x double> %1, <8 x double> %passthru - ret <8 x double> %3 -} - -define <8 x double> 
@undef_test_permvar_df_512(<8 x double> %a0) { -; -; CHECK-LABEL: @undef_test_permvar_df_512( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x double> [[A0:%.*]], <8 x double> poison, <8 x i32> -; CHECK-NEXT: ret <8 x double> [[TMP1]] -; - %1 = call <8 x double> @llvm.x86.avx512.permvar.df.512(<8 x double> %a0, <8 x i64> ) - ret <8 x double> %1 -} - -define <8 x double> @undef_test_permvar_df_512_mask(<8 x double> %a0, <8 x double> %passthru, i8 %mask) { -; -; CHECK-LABEL: @undef_test_permvar_df_512_mask( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x double> [[A0:%.*]], <8 x double> poison, <8 x i32> -; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1> -; CHECK-NEXT: [[TMP3:%.*]] = select <8 x i1> [[TMP2]], <8 x double> [[TMP1]], <8 x double> [[PASSTHRU:%.*]] -; CHECK-NEXT: ret <8 x double> [[TMP3]] -; - %1 = call <8 x double> @llvm.x86.avx512.permvar.df.512(<8 x double> %a0, <8 x i64> ) - %2 = bitcast i8 %mask to <8 x i1> - %3 = select <8 x i1> %2, <8 x double> %1, <8 x double> %passthru - ret <8 x double> %3 -} - -declare <8 x i16> @llvm.x86.avx512.permvar.hi.128(<8 x i16>, <8 x i16>) - -define <8 x i16> @identity_test_permvar_hi_128(<8 x i16> %a0) { -; -; CHECK-LABEL: @identity_test_permvar_hi_128( -; CHECK-NEXT: ret <8 x i16> [[A0:%.*]] -; - %1 = call <8 x i16> @llvm.x86.avx512.permvar.hi.128(<8 x i16> %a0, <8 x i16> ) - ret <8 x i16> %1 -} - -define <8 x i16> @identity_test_permvar_hi_128_mask(<8 x i16> %a0, <8 x i16> %passthru, i8 %mask) { -; -; CHECK-LABEL: @identity_test_permvar_hi_128_mask( -; CHECK-NEXT: [[TMP1:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1> -; CHECK-NEXT: [[TMP2:%.*]] = select <8 x i1> [[TMP1]], <8 x i16> [[A0:%.*]], <8 x i16> [[PASSTHRU:%.*]] -; CHECK-NEXT: ret <8 x i16> [[TMP2]] -; - %1 = call <8 x i16> @llvm.x86.avx512.permvar.hi.128(<8 x i16> %a0, <8 x i16> ) - %2 = bitcast i8 %mask to <8 x i1> - %3 = select <8 x i1> %2, <8 x i16> %1, <8 x i16> %passthru - ret <8 x i16> %3 -} - -define <8 x i16> 
@zero_test_permvar_hi_128(<8 x i16> %a0) { -; -; CHECK-LABEL: @zero_test_permvar_hi_128( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i16> [[A0:%.*]], <8 x i16> poison, <8 x i32> zeroinitializer -; CHECK-NEXT: ret <8 x i16> [[TMP1]] -; - %1 = call <8 x i16> @llvm.x86.avx512.permvar.hi.128(<8 x i16> %a0, <8 x i16> zeroinitializer) - ret <8 x i16> %1 -} - -define <8 x i16> @zero_test_permvar_hi_128_mask(<8 x i16> %a0, <8 x i16> %passthru, i8 %mask) { -; -; CHECK-LABEL: @zero_test_permvar_hi_128_mask( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i16> [[A0:%.*]], <8 x i16> poison, <8 x i32> zeroinitializer -; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1> -; CHECK-NEXT: [[TMP3:%.*]] = select <8 x i1> [[TMP2]], <8 x i16> [[TMP1]], <8 x i16> [[PASSTHRU:%.*]] -; CHECK-NEXT: ret <8 x i16> [[TMP3]] -; - %1 = call <8 x i16> @llvm.x86.avx512.permvar.hi.128(<8 x i16> %a0, <8 x i16> zeroinitializer) - %2 = bitcast i8 %mask to <8 x i1> - %3 = select <8 x i1> %2, <8 x i16> %1, <8 x i16> %passthru - ret <8 x i16> %3 -} - -define <8 x i16> @shuffle_test_permvar_hi_128(<8 x i16> %a0) { -; -; CHECK-LABEL: @shuffle_test_permvar_hi_128( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i16> [[A0:%.*]], <8 x i16> poison, <8 x i32> -; CHECK-NEXT: ret <8 x i16> [[TMP1]] -; - %1 = call <8 x i16> @llvm.x86.avx512.permvar.hi.128(<8 x i16> %a0, <8 x i16> ) - ret <8 x i16> %1 -} - -define <8 x i16> @shuffle_test_permvar_hi_128_mask(<8 x i16> %a0, <8 x i16> %passthru, i8 %mask) { -; -; CHECK-LABEL: @shuffle_test_permvar_hi_128_mask( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i16> [[A0:%.*]], <8 x i16> poison, <8 x i32> -; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1> -; CHECK-NEXT: [[TMP3:%.*]] = select <8 x i1> [[TMP2]], <8 x i16> [[TMP1]], <8 x i16> [[PASSTHRU:%.*]] -; CHECK-NEXT: ret <8 x i16> [[TMP3]] -; - %1 = call <8 x i16> @llvm.x86.avx512.permvar.hi.128(<8 x i16> %a0, <8 x i16> ) - %2 = bitcast i8 %mask to <8 x i1> - %3 = select <8 x i1> %2, 
<8 x i16> %1, <8 x i16> %passthru - ret <8 x i16> %3 -} - -define <8 x i16> @undef_test_permvar_hi_128(<8 x i16> %a0) { -; -; CHECK-LABEL: @undef_test_permvar_hi_128( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i16> [[A0:%.*]], <8 x i16> poison, <8 x i32> -; CHECK-NEXT: ret <8 x i16> [[TMP1]] -; - %1 = call <8 x i16> @llvm.x86.avx512.permvar.hi.128(<8 x i16> %a0, <8 x i16> ) - ret <8 x i16> %1 -} - -define <8 x i16> @undef_test_permvar_hi_128_mask(<8 x i16> %a0, <8 x i16> %passthru, i8 %mask) { -; -; CHECK-LABEL: @undef_test_permvar_hi_128_mask( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i16> [[A0:%.*]], <8 x i16> poison, <8 x i32> -; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1> -; CHECK-NEXT: [[TMP3:%.*]] = select <8 x i1> [[TMP2]], <8 x i16> [[TMP1]], <8 x i16> [[PASSTHRU:%.*]] -; CHECK-NEXT: ret <8 x i16> [[TMP3]] -; - %1 = call <8 x i16> @llvm.x86.avx512.permvar.hi.128(<8 x i16> %a0, <8 x i16> ) - %2 = bitcast i8 %mask to <8 x i1> - %3 = select <8 x i1> %2, <8 x i16> %1, <8 x i16> %passthru - ret <8 x i16> %3 -} - -declare <16 x i16> @llvm.x86.avx512.permvar.hi.256(<16 x i16>, <16 x i16>) - -define <16 x i16> @identity_test_permvar_hi_256(<16 x i16> %a0) { -; -; CHECK-LABEL: @identity_test_permvar_hi_256( -; CHECK-NEXT: ret <16 x i16> [[A0:%.*]] -; - %1 = call <16 x i16> @llvm.x86.avx512.permvar.hi.256(<16 x i16> %a0, <16 x i16> ) - ret <16 x i16> %1 -} - -define <16 x i16> @identity_test_permvar_hi_256_mask(<16 x i16> %a0, <16 x i16> %passthru, i16 %mask) { -; -; CHECK-LABEL: @identity_test_permvar_hi_256_mask( -; CHECK-NEXT: [[TMP1:%.*]] = bitcast i16 [[MASK:%.*]] to <16 x i1> -; CHECK-NEXT: [[TMP2:%.*]] = select <16 x i1> [[TMP1]], <16 x i16> [[A0:%.*]], <16 x i16> [[PASSTHRU:%.*]] -; CHECK-NEXT: ret <16 x i16> [[TMP2]] -; - %1 = call <16 x i16> @llvm.x86.avx512.permvar.hi.256(<16 x i16> %a0, <16 x i16> ) - %2 = bitcast i16 %mask to <16 x i1> - %3 = select <16 x i1> %2, <16 x i16> %1, <16 x i16> %passthru - ret <16 x i16> %3 -} - 
-define <16 x i16> @zero_test_permvar_hi_256(<16 x i16> %a0) { -; -; CHECK-LABEL: @zero_test_permvar_hi_256( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i16> [[A0:%.*]], <16 x i16> poison, <16 x i32> zeroinitializer -; CHECK-NEXT: ret <16 x i16> [[TMP1]] -; - %1 = call <16 x i16> @llvm.x86.avx512.permvar.hi.256(<16 x i16> %a0, <16 x i16> zeroinitializer) - ret <16 x i16> %1 -} - -define <16 x i16> @zero_test_permvar_hi_256_mask(<16 x i16> %a0, <16 x i16> %passthru, i16 %mask) { -; -; CHECK-LABEL: @zero_test_permvar_hi_256_mask( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i16> [[A0:%.*]], <16 x i16> poison, <16 x i32> zeroinitializer -; CHECK-NEXT: [[TMP2:%.*]] = bitcast i16 [[MASK:%.*]] to <16 x i1> -; CHECK-NEXT: [[TMP3:%.*]] = select <16 x i1> [[TMP2]], <16 x i16> [[TMP1]], <16 x i16> [[PASSTHRU:%.*]] -; CHECK-NEXT: ret <16 x i16> [[TMP3]] -; - %1 = call <16 x i16> @llvm.x86.avx512.permvar.hi.256(<16 x i16> %a0, <16 x i16> zeroinitializer) - %2 = bitcast i16 %mask to <16 x i1> - %3 = select <16 x i1> %2, <16 x i16> %1, <16 x i16> %passthru - ret <16 x i16> %3 -} - -define <16 x i16> @shuffle_test_permvar_hi_256(<16 x i16> %a0) { -; -; CHECK-LABEL: @shuffle_test_permvar_hi_256( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i16> [[A0:%.*]], <16 x i16> poison, <16 x i32> -; CHECK-NEXT: ret <16 x i16> [[TMP1]] -; - %1 = call <16 x i16> @llvm.x86.avx512.permvar.hi.256(<16 x i16> %a0, <16 x i16> ) - ret <16 x i16> %1 -} - -define <16 x i16> @shuffle_test_permvar_hi_256_mask(<16 x i16> %a0, <16 x i16> %passthru, i16 %mask) { -; -; CHECK-LABEL: @shuffle_test_permvar_hi_256_mask( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i16> [[A0:%.*]], <16 x i16> poison, <16 x i32> -; CHECK-NEXT: [[TMP2:%.*]] = bitcast i16 [[MASK:%.*]] to <16 x i1> -; CHECK-NEXT: [[TMP3:%.*]] = select <16 x i1> [[TMP2]], <16 x i16> [[TMP1]], <16 x i16> [[PASSTHRU:%.*]] -; CHECK-NEXT: ret <16 x i16> [[TMP3]] -; - %1 = call <16 x i16> @llvm.x86.avx512.permvar.hi.256(<16 x i16> %a0, 
<16 x i16> ) - %2 = bitcast i16 %mask to <16 x i1> - %3 = select <16 x i1> %2, <16 x i16> %1, <16 x i16> %passthru - ret <16 x i16> %3 -} - -define <16 x i16> @undef_test_permvar_hi_256(<16 x i16> %a0) { -; -; CHECK-LABEL: @undef_test_permvar_hi_256( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i16> [[A0:%.*]], <16 x i16> poison, <16 x i32> -; CHECK-NEXT: ret <16 x i16> [[TMP1]] -; - %1 = call <16 x i16> @llvm.x86.avx512.permvar.hi.256(<16 x i16> %a0, <16 x i16> ) - ret <16 x i16> %1 -} - -define <16 x i16> @undef_test_permvar_hi_256_mask(<16 x i16> %a0, <16 x i16> %passthru, i16 %mask) { -; -; CHECK-LABEL: @undef_test_permvar_hi_256_mask( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i16> [[A0:%.*]], <16 x i16> poison, <16 x i32> -; CHECK-NEXT: [[TMP2:%.*]] = bitcast i16 [[MASK:%.*]] to <16 x i1> -; CHECK-NEXT: [[TMP3:%.*]] = select <16 x i1> [[TMP2]], <16 x i16> [[TMP1]], <16 x i16> [[PASSTHRU:%.*]] -; CHECK-NEXT: ret <16 x i16> [[TMP3]] -; - %1 = call <16 x i16> @llvm.x86.avx512.permvar.hi.256(<16 x i16> %a0, <16 x i16> ) - %2 = bitcast i16 %mask to <16 x i1> - %3 = select <16 x i1> %2, <16 x i16> %1, <16 x i16> %passthru - ret <16 x i16> %3 -} - -declare <32 x i16> @llvm.x86.avx512.permvar.hi.512(<32 x i16>, <32 x i16>) - -define <32 x i16> @identity_test_permvar_hi_512(<32 x i16> %a0) { -; -; CHECK-LABEL: @identity_test_permvar_hi_512( -; CHECK-NEXT: ret <32 x i16> [[A0:%.*]] -; - %1 = call <32 x i16> @llvm.x86.avx512.permvar.hi.512(<32 x i16> %a0, <32 x i16> ) - ret <32 x i16> %1 -} - -define <32 x i16> @identity_test_permvar_hi_512_mask(<32 x i16> %a0, <32 x i16> %passthru, i32 %mask) { -; -; CHECK-LABEL: @identity_test_permvar_hi_512_mask( -; CHECK-NEXT: [[TMP1:%.*]] = bitcast i32 [[MASK:%.*]] to <32 x i1> -; CHECK-NEXT: [[TMP2:%.*]] = select <32 x i1> [[TMP1]], <32 x i16> [[A0:%.*]], <32 x i16> [[PASSTHRU:%.*]] -; CHECK-NEXT: ret <32 x i16> [[TMP2]] -; - %1 = call <32 x i16> @llvm.x86.avx512.permvar.hi.512(<32 x i16> %a0, <32 x i16> ) - %2 = 
bitcast i32 %mask to <32 x i1> - %3 = select <32 x i1> %2, <32 x i16> %1, <32 x i16> %passthru - ret <32 x i16> %3 -} - -define <32 x i16> @zero_test_permvar_hi_512(<32 x i16> %a0) { -; -; CHECK-LABEL: @zero_test_permvar_hi_512( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <32 x i16> [[A0:%.*]], <32 x i16> poison, <32 x i32> zeroinitializer -; CHECK-NEXT: ret <32 x i16> [[TMP1]] -; - %1 = call <32 x i16> @llvm.x86.avx512.permvar.hi.512(<32 x i16> %a0, <32 x i16> zeroinitializer) - ret <32 x i16> %1 -} - -define <32 x i16> @zero_test_permvar_hi_512_mask(<32 x i16> %a0, <32 x i16> %passthru, i32 %mask) { -; -; CHECK-LABEL: @zero_test_permvar_hi_512_mask( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <32 x i16> [[A0:%.*]], <32 x i16> poison, <32 x i32> zeroinitializer -; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32 [[MASK:%.*]] to <32 x i1> -; CHECK-NEXT: [[TMP3:%.*]] = select <32 x i1> [[TMP2]], <32 x i16> [[TMP1]], <32 x i16> [[PASSTHRU:%.*]] -; CHECK-NEXT: ret <32 x i16> [[TMP3]] -; - %1 = call <32 x i16> @llvm.x86.avx512.permvar.hi.512(<32 x i16> %a0, <32 x i16> zeroinitializer) - %2 = bitcast i32 %mask to <32 x i1> - %3 = select <32 x i1> %2, <32 x i16> %1, <32 x i16> %passthru - ret <32 x i16> %3 -} - -define <32 x i16> @shuffle_test_permvar_hi_512(<32 x i16> %a0) { -; -; CHECK-LABEL: @shuffle_test_permvar_hi_512( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <32 x i16> [[A0:%.*]], <32 x i16> poison, <32 x i32> -; CHECK-NEXT: ret <32 x i16> [[TMP1]] -; - %1 = call <32 x i16> @llvm.x86.avx512.permvar.hi.512(<32 x i16> %a0, <32 x i16> ) - ret <32 x i16> %1 -} - -define <32 x i16> @shuffle_test_permvar_hi_512_mask(<32 x i16> %a0, <32 x i16> %passthru, i32 %mask) { -; -; CHECK-LABEL: @shuffle_test_permvar_hi_512_mask( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <32 x i16> [[A0:%.*]], <32 x i16> poison, <32 x i32> -; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32 [[MASK:%.*]] to <32 x i1> -; CHECK-NEXT: [[TMP3:%.*]] = select <32 x i1> [[TMP2]], <32 x i16> [[TMP1]], <32 x i16> 
[[PASSTHRU:%.*]]
-; CHECK-NEXT: ret <32 x i16> [[TMP3]]
-;
- %1 = call <32 x i16> @llvm.x86.avx512.permvar.hi.512(<32 x i16> %a0, <32 x i16> )
- %2 = bitcast i32 %mask to <32 x i1>
- %3 = select <32 x i1> %2, <32 x i16> %1, <32 x i16> %passthru
- ret <32 x i16> %3
-}
-
-define <32 x i16> @undef_test_permvar_hi_512(<32 x i16> %a0) {
-;
-; CHECK-LABEL: @undef_test_permvar_hi_512(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <32 x i16> [[A0:%.*]], <32 x i16> poison, <32 x i32>
-; CHECK-NEXT: ret <32 x i16> [[TMP1]]
-;
- %1 = call <32 x i16> @llvm.x86.avx512.permvar.hi.512(<32 x i16> %a0, <32 x i16> )
- ret <32 x i16> %1
-}
-
-define <32 x i16> @undef_test_permvar_hi_512_mask(<32 x i16> %a0, <32 x i16> %passthru, i32 %mask) {
-;
-; CHECK-LABEL: @undef_test_permvar_hi_512_mask(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <32 x i16> [[A0:%.*]], <32 x i16> poison, <32 x i32>
-; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32 [[MASK:%.*]] to <32 x i1>
-; CHECK-NEXT: [[TMP3:%.*]] = select <32 x i1> [[TMP2]], <32 x i16> [[TMP1]], <32 x i16> [[PASSTHRU:%.*]]
-; CHECK-NEXT: ret <32 x i16> [[TMP3]]
-;
- %1 = call <32 x i16> @llvm.x86.avx512.permvar.hi.512(<32 x i16> %a0, <32 x i16> )
- %2 = bitcast i32 %mask to <32 x i1>
- %3 = select <32 x i1> %2, <32 x i16> %1, <32 x i16> %passthru
- ret <32 x i16> %3
-}
-
-declare <16 x i8> @llvm.x86.avx512.permvar.qi.128(<16 x i8>, <16 x i8>)
-
-define <16 x i8> @identity_test_permvar_qi_128(<16 x i8> %a0) {
-;
-; CHECK-LABEL: @identity_test_permvar_qi_128(
-; CHECK-NEXT: ret <16 x i8> [[A0:%.*]]
-;
- %1 = call <16 x i8> @llvm.x86.avx512.permvar.qi.128(<16 x i8> %a0, <16 x i8> )
- ret <16 x i8> %1
-}
-
-define <16 x i8> @identity_test_permvar_qi_128_mask(<16 x i8> %a0, <16 x i8> %passthru, i16 %mask) {
-;
-; CHECK-LABEL: @identity_test_permvar_qi_128_mask(
-; CHECK-NEXT: [[TMP1:%.*]] = bitcast i16 [[MASK:%.*]] to <16 x i1>
-; CHECK-NEXT: [[TMP2:%.*]] = select <16 x i1> [[TMP1]], <16 x i8> [[A0:%.*]], <16 x i8> [[PASSTHRU:%.*]]
-; CHECK-NEXT: ret <16 x i8> [[TMP2]]
-;
- %1 = call <16 x i8> @llvm.x86.avx512.permvar.qi.128(<16 x i8> %a0, <16 x i8> )
- %2 = bitcast i16 %mask to <16 x i1>
- %3 = select <16 x i1> %2, <16 x i8> %1, <16 x i8> %passthru
- ret <16 x i8> %3
-}
-
-define <16 x i8> @zero_test_permvar_qi_128(<16 x i8> %a0) {
-;
-; CHECK-LABEL: @zero_test_permvar_qi_128(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i8> [[A0:%.*]], <16 x i8> poison, <16 x i32> zeroinitializer
-; CHECK-NEXT: ret <16 x i8> [[TMP1]]
-;
- %1 = call <16 x i8> @llvm.x86.avx512.permvar.qi.128(<16 x i8> %a0, <16 x i8> zeroinitializer)
- ret <16 x i8> %1
-}
-
-define <16 x i8> @zero_test_permvar_qi_128_mask(<16 x i8> %a0, <16 x i8> %passthru, i16 %mask) {
-;
-; CHECK-LABEL: @zero_test_permvar_qi_128_mask(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i8> [[A0:%.*]], <16 x i8> poison, <16 x i32> zeroinitializer
-; CHECK-NEXT: [[TMP2:%.*]] = bitcast i16 [[MASK:%.*]] to <16 x i1>
-; CHECK-NEXT: [[TMP3:%.*]] = select <16 x i1> [[TMP2]], <16 x i8> [[TMP1]], <16 x i8> [[PASSTHRU:%.*]]
-; CHECK-NEXT: ret <16 x i8> [[TMP3]]
-;
- %1 = call <16 x i8> @llvm.x86.avx512.permvar.qi.128(<16 x i8> %a0, <16 x i8> zeroinitializer)
- %2 = bitcast i16 %mask to <16 x i1>
- %3 = select <16 x i1> %2, <16 x i8> %1, <16 x i8> %passthru
- ret <16 x i8> %3
-}
-
-define <16 x i8> @shuffle_test_permvar_qi_128(<16 x i8> %a0) {
-;
-; CHECK-LABEL: @shuffle_test_permvar_qi_128(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i8> [[A0:%.*]], <16 x i8> poison, <16 x i32>
-; CHECK-NEXT: ret <16 x i8> [[TMP1]]
-;
- %1 = call <16 x i8> @llvm.x86.avx512.permvar.qi.128(<16 x i8> %a0, <16 x i8> )
- ret <16 x i8> %1
-}
-
-define <16 x i8> @shuffle_test_permvar_qi_128_mask(<16 x i8> %a0, <16 x i8> %passthru, i16 %mask) {
-;
-; CHECK-LABEL: @shuffle_test_permvar_qi_128_mask(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i8> [[A0:%.*]], <16 x i8> poison, <16 x i32>
-; CHECK-NEXT: [[TMP2:%.*]] = bitcast i16 [[MASK:%.*]] to <16 x i1>
-; CHECK-NEXT: [[TMP3:%.*]] = select <16 x i1> [[TMP2]], <16 x i8> [[TMP1]], <16 x i8> [[PASSTHRU:%.*]]
-; CHECK-NEXT: ret <16 x i8> [[TMP3]]
-;
- %1 = call <16 x i8> @llvm.x86.avx512.permvar.qi.128(<16 x i8> %a0, <16 x i8> )
- %2 = bitcast i16 %mask to <16 x i1>
- %3 = select <16 x i1> %2, <16 x i8> %1, <16 x i8> %passthru
- ret <16 x i8> %3
-}
-
-define <16 x i8> @undef_test_permvar_qi_128(<16 x i8> %a0) {
-;
-; CHECK-LABEL: @undef_test_permvar_qi_128(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i8> [[A0:%.*]], <16 x i8> poison, <16 x i32>
-; CHECK-NEXT: ret <16 x i8> [[TMP1]]
-;
- %1 = call <16 x i8> @llvm.x86.avx512.permvar.qi.128(<16 x i8> %a0, <16 x i8> )
- ret <16 x i8> %1
-}
-
-define <16 x i8> @undef_test_permvar_qi_128_mask(<16 x i8> %a0, <16 x i8> %passthru, i16 %mask) {
-;
-; CHECK-LABEL: @undef_test_permvar_qi_128_mask(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i8> [[A0:%.*]], <16 x i8> poison, <16 x i32>
-; CHECK-NEXT: [[TMP2:%.*]] = bitcast i16 [[MASK:%.*]] to <16 x i1>
-; CHECK-NEXT: [[TMP3:%.*]] = select <16 x i1> [[TMP2]], <16 x i8> [[TMP1]], <16 x i8> [[PASSTHRU:%.*]]
-; CHECK-NEXT: ret <16 x i8> [[TMP3]]
-;
- %1 = call <16 x i8> @llvm.x86.avx512.permvar.qi.128(<16 x i8> %a0, <16 x i8> )
- %2 = bitcast i16 %mask to <16 x i1>
- %3 = select <16 x i1> %2, <16 x i8> %1, <16 x i8> %passthru
- ret <16 x i8> %3
-}
-
-declare <32 x i8> @llvm.x86.avx512.permvar.qi.256(<32 x i8>, <32 x i8>)
-
-define <32 x i8> @identity_test_permvar_qi_256(<32 x i8> %a0) {
-;
-; CHECK-LABEL: @identity_test_permvar_qi_256(
-; CHECK-NEXT: ret <32 x i8> [[A0:%.*]]
-;
- %1 = call <32 x i8> @llvm.x86.avx512.permvar.qi.256(<32 x i8> %a0, <32 x i8> )
- ret <32 x i8> %1
-}
-
-define <32 x i8> @identity_test_permvar_qi_256_mask(<32 x i8> %a0, <32 x i8> %passthru, i32 %mask) {
-;
-; CHECK-LABEL: @identity_test_permvar_qi_256_mask(
-; CHECK-NEXT: [[TMP1:%.*]] = bitcast i32 [[MASK:%.*]] to <32 x i1>
-; CHECK-NEXT: [[TMP2:%.*]] = select <32 x i1> [[TMP1]], <32 x i8> [[A0:%.*]], <32 x i8> [[PASSTHRU:%.*]]
-; CHECK-NEXT: ret <32 x i8> [[TMP2]]
-;
- %1 = call <32 x i8> @llvm.x86.avx512.permvar.qi.256(<32 x i8> %a0, <32 x i8> )
- %2 = bitcast i32 %mask to <32 x i1>
- %3 = select <32 x i1> %2, <32 x i8> %1, <32 x i8> %passthru
- ret <32 x i8> %3
-}
-
-define <32 x i8> @zero_test_permvar_qi_256(<32 x i8> %a0) {
-;
-; CHECK-LABEL: @zero_test_permvar_qi_256(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <32 x i8> [[A0:%.*]], <32 x i8> poison, <32 x i32> zeroinitializer
-; CHECK-NEXT: ret <32 x i8> [[TMP1]]
-;
- %1 = call <32 x i8> @llvm.x86.avx512.permvar.qi.256(<32 x i8> %a0, <32 x i8> zeroinitializer)
- ret <32 x i8> %1
-}
-
-define <32 x i8> @zero_test_permvar_qi_256_mask(<32 x i8> %a0, <32 x i8> %passthru, i32 %mask) {
-;
-; CHECK-LABEL: @zero_test_permvar_qi_256_mask(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <32 x i8> [[A0:%.*]], <32 x i8> poison, <32 x i32> zeroinitializer
-; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32 [[MASK:%.*]] to <32 x i1>
-; CHECK-NEXT: [[TMP3:%.*]] = select <32 x i1> [[TMP2]], <32 x i8> [[TMP1]], <32 x i8> [[PASSTHRU:%.*]]
-; CHECK-NEXT: ret <32 x i8> [[TMP3]]
-;
- %1 = call <32 x i8> @llvm.x86.avx512.permvar.qi.256(<32 x i8> %a0, <32 x i8> zeroinitializer)
- %2 = bitcast i32 %mask to <32 x i1>
- %3 = select <32 x i1> %2, <32 x i8> %1, <32 x i8> %passthru
- ret <32 x i8> %3
-}
-
-define <32 x i8> @shuffle_test_permvar_qi_256(<32 x i8> %a0) {
-;
-; CHECK-LABEL: @shuffle_test_permvar_qi_256(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <32 x i8> [[A0:%.*]], <32 x i8> poison, <32 x i32>
-; CHECK-NEXT: ret <32 x i8> [[TMP1]]
-;
- %1 = call <32 x i8> @llvm.x86.avx512.permvar.qi.256(<32 x i8> %a0, <32 x i8> )
- ret <32 x i8> %1
-}
-
-define <32 x i8> @shuffle_test_permvar_qi_256_mask(<32 x i8> %a0, <32 x i8> %passthru, i32 %mask) {
-;
-; CHECK-LABEL: @shuffle_test_permvar_qi_256_mask(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <32 x i8> [[A0:%.*]], <32 x i8> poison, <32 x i32>
-; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32 [[MASK:%.*]] to <32 x i1>
-; CHECK-NEXT: [[TMP3:%.*]] = select <32 x i1> [[TMP2]], <32 x i8> [[TMP1]], <32 x i8> [[PASSTHRU:%.*]]
-; CHECK-NEXT: ret <32 x i8> [[TMP3]]
-;
- %1 = call <32 x i8> @llvm.x86.avx512.permvar.qi.256(<32 x i8> %a0, <32 x i8> )
- %2 = bitcast i32 %mask to <32 x i1>
- %3 = select <32 x i1> %2, <32 x i8> %1, <32 x i8> %passthru
- ret <32 x i8> %3
-}
-
-define <32 x i8> @undef_test_permvar_qi_256(<32 x i8> %a0) {
-;
-; CHECK-LABEL: @undef_test_permvar_qi_256(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <32 x i8> [[A0:%.*]], <32 x i8> poison, <32 x i32>
-; CHECK-NEXT: ret <32 x i8> [[TMP1]]
-;
- %1 = call <32 x i8> @llvm.x86.avx512.permvar.qi.256(<32 x i8> %a0, <32 x i8> )
- ret <32 x i8> %1
-}
-
-define <32 x i8> @undef_test_permvar_qi_256_mask(<32 x i8> %a0, <32 x i8> %passthru, i32 %mask) {
-;
-; CHECK-LABEL: @undef_test_permvar_qi_256_mask(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <32 x i8> [[A0:%.*]], <32 x i8> poison, <32 x i32>
-; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32 [[MASK:%.*]] to <32 x i1>
-; CHECK-NEXT: [[TMP3:%.*]] = select <32 x i1> [[TMP2]], <32 x i8> [[TMP1]], <32 x i8> [[PASSTHRU:%.*]]
-; CHECK-NEXT: ret <32 x i8> [[TMP3]]
-;
- %1 = call <32 x i8> @llvm.x86.avx512.permvar.qi.256(<32 x i8> %a0, <32 x i8> )
- %2 = bitcast i32 %mask to <32 x i1>
- %3 = select <32 x i1> %2, <32 x i8> %1, <32 x i8> %passthru
- ret <32 x i8> %3
-}
-
-declare <64 x i8> @llvm.x86.avx512.permvar.qi.512(<64 x i8>, <64 x i8>)
-
-define <64 x i8> @identity_test_permvar_qi_512(<64 x i8> %a0) {
-;
-; CHECK-LABEL: @identity_test_permvar_qi_512(
-; CHECK-NEXT: ret <64 x i8> [[A0:%.*]]
-;
- %1 = call <64 x i8> @llvm.x86.avx512.permvar.qi.512(<64 x i8> %a0, <64 x i8> )
- ret <64 x i8> %1
-}
-
-define <64 x i8> @identity_test_permvar_qi_512_mask(<64 x i8> %a0, <64 x i8> %passthru, i64 %mask) {
-;
-; CHECK-LABEL: @identity_test_permvar_qi_512_mask(
-; CHECK-NEXT: [[TMP1:%.*]] = bitcast i64 [[MASK:%.*]] to <64 x i1>
-; CHECK-NEXT: [[TMP2:%.*]] = select <64 x i1> [[TMP1]], <64 x i8> [[A0:%.*]], <64 x i8> [[PASSTHRU:%.*]]
-; CHECK-NEXT: ret <64 x i8> [[TMP2]]
-;
- %1 = call <64 x i8> @llvm.x86.avx512.permvar.qi.512(<64 x i8> %a0, <64 x i8> )
- %2 = bitcast i64 %mask to <64 x i1>
- %3 = select <64 x i1> %2, <64 x i8> %1, <64 x i8> %passthru
- ret <64 x i8> %3
-}
-
-define <64 x i8> @zero_test_permvar_qi_512(<64 x i8> %a0) {
-;
-; CHECK-LABEL: @zero_test_permvar_qi_512(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <64 x i8> [[A0:%.*]], <64 x i8> poison, <64 x i32> zeroinitializer
-; CHECK-NEXT: ret <64 x i8> [[TMP1]]
-;
- %1 = call <64 x i8> @llvm.x86.avx512.permvar.qi.512(<64 x i8> %a0, <64 x i8> zeroinitializer)
- ret <64 x i8> %1
-}
-
-define <64 x i8> @zero_test_permvar_qi_512_mask(<64 x i8> %a0, <64 x i8> %passthru, i64 %mask) {
-;
-; CHECK-LABEL: @zero_test_permvar_qi_512_mask(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <64 x i8> [[A0:%.*]], <64 x i8> poison, <64 x i32> zeroinitializer
-; CHECK-NEXT: [[TMP2:%.*]] = bitcast i64 [[MASK:%.*]] to <64 x i1>
-; CHECK-NEXT: [[TMP3:%.*]] = select <64 x i1> [[TMP2]], <64 x i8> [[TMP1]], <64 x i8> [[PASSTHRU:%.*]]
-; CHECK-NEXT: ret <64 x i8> [[TMP3]]
-;
- %1 = call <64 x i8> @llvm.x86.avx512.permvar.qi.512(<64 x i8> %a0, <64 x i8> zeroinitializer)
- %2 = bitcast i64 %mask to <64 x i1>
- %3 = select <64 x i1> %2, <64 x i8> %1, <64 x i8> %passthru
- ret <64 x i8> %3
-}
-
-define <64 x i8> @shuffle_test_permvar_qi_512(<64 x i8> %a0) {
-;
-; CHECK-LABEL: @shuffle_test_permvar_qi_512(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <64 x i8> [[A0:%.*]], <64 x i8> poison, <64 x i32>
-; CHECK-NEXT: ret <64 x i8> [[TMP1]]
-;
- %1 = call <64 x i8> @llvm.x86.avx512.permvar.qi.512(<64 x i8> %a0, <64 x i8> )
- ret <64 x i8> %1
-}
-
-define <64 x i8> @shuffle_test_permvar_qi_512_mask(<64 x i8> %a0, <64 x i8> %passthru, i64 %mask) {
-;
-; CHECK-LABEL: @shuffle_test_permvar_qi_512_mask(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <64 x i8> [[A0:%.*]], <64 x i8> poison, <64 x i32>
-; CHECK-NEXT: [[TMP2:%.*]] = bitcast i64 [[MASK:%.*]] to <64 x i1>
-; CHECK-NEXT: [[TMP3:%.*]] = select <64 x i1> [[TMP2]], <64 x i8> [[TMP1]], <64 x i8> [[PASSTHRU:%.*]]
-; CHECK-NEXT: ret <64 x i8> [[TMP3]]
-;
- %1 = call <64 x i8> @llvm.x86.avx512.permvar.qi.512(<64 x i8> %a0, <64 x i8> )
- %2 = bitcast i64 %mask to <64 x i1>
- %3 = select <64 x i1> %2, <64 x i8> %1, <64 x i8> %passthru
- ret <64 x i8> %3
-}
-
-define <64 x i8> @undef_test_permvar_qi_512(<64 x i8> %a0) {
-;
-; CHECK-LABEL: @undef_test_permvar_qi_512(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <64 x i8> [[A0:%.*]], <64 x i8> poison, <64 x i32>
-; CHECK-NEXT: ret <64 x i8> [[TMP1]]
-;
- %1 = call <64 x i8> @llvm.x86.avx512.permvar.qi.512(<64 x i8> %a0, <64 x i8> )
- ret <64 x i8> %1
-}
-
-define <64 x i8> @undef_test_permvar_qi_512_mask(<64 x i8> %a0, <64 x i8> %passthru, i64 %mask) {
-;
-; CHECK-LABEL: @undef_test_permvar_qi_512_mask(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <64 x i8> [[A0:%.*]], <64 x i8> poison, <64 x i32>
-; CHECK-NEXT: [[TMP2:%.*]] = bitcast i64 [[MASK:%.*]] to <64 x i1>
-; CHECK-NEXT: [[TMP3:%.*]] = select <64 x i1> [[TMP2]], <64 x i8> [[TMP1]], <64 x i8> [[PASSTHRU:%.*]]
-; CHECK-NEXT: ret <64 x i8> [[TMP3]]
-;
- %1 = call <64 x i8> @llvm.x86.avx512.permvar.qi.512(<64 x i8> %a0, <64 x i8> )
- %2 = bitcast i64 %mask to <64 x i1>
- %3 = select <64 x i1> %2, <64 x i8> %1, <64 x i8> %passthru
- ret <64 x i8> %3
-}
-
 declare <16 x float> @llvm.x86.avx512.add.ps.512(<16 x float>, <16 x float>, i32)

 define <16 x float> @test_add_ps(<16 x float> %a, <16 x float> %b) {
diff --git a/llvm/test/Transforms/InstCombine/X86/x86-avx512.ll b/llvm/test/Transforms/InstCombine/X86/x86-avx512.ll
index 906e84b607481..d89cf6b0bb986 100644
--- a/llvm/test/Transforms/InstCombine/X86/x86-avx512.ll
+++ b/llvm/test/Transforms/InstCombine/X86/x86-avx512.ll
@@ -1814,1366 +1814,6 @@ define double @test_mask3_vfnmsub_sd_1_unary_fneg(<2 x double> %a, <2 x double>
 ret double %13
 }
-declare <8 x i32> @llvm.x86.avx2.permd(<8 x i32>, <8 x i32>)
-
-define <8 x i32> @identity_test_permvar_si_256(<8 x i32> %a0) {
-;
-; CHECK-LABEL: @identity_test_permvar_si_256(
-; CHECK-NEXT: ret <8 x i32> [[A0:%.*]]
-;
- %1 = call <8 x i32> @llvm.x86.avx2.permd(<8 x i32> %a0, <8 x i32> )
- ret <8 x i32> %1
-}
-
-define <8 x i32> @identity_test_permvar_si_256_mask(<8 x i32> %a0, <8 x i32> %passthru, i8 %mask) {
-;
-; CHECK-LABEL: @identity_test_permvar_si_256_mask(
-; CHECK-NEXT: [[TMP1:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1>
-; CHECK-NEXT: [[TMP2:%.*]] = select <8 x i1> [[TMP1]], <8 x i32> [[A0:%.*]], <8 x i32> [[PASSTHRU:%.*]]
-; CHECK-NEXT: ret <8 x i32> [[TMP2]]
-;
- %1 = call <8 x i32> @llvm.x86.avx2.permd(<8 x i32> %a0, <8 x i32> )
- %2 = bitcast i8 %mask to <8 x i1>
- %3 = select <8 x i1> %2, <8 x i32> %1, <8 x i32> %passthru
- ret <8 x i32> %3
-}
-
-define <8 x i32> @zero_test_permvar_si_256(<8 x i32> %a0) {
-;
-; CHECK-LABEL: @zero_test_permvar_si_256(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A0:%.*]], <8 x i32> poison, <8 x i32> zeroinitializer
-; CHECK-NEXT: ret <8 x i32> [[TMP1]]
-;
- %1 = call <8 x i32> @llvm.x86.avx2.permd(<8 x i32> %a0, <8 x i32> zeroinitializer)
- ret <8 x i32> %1
-}
-
-define <8 x i32> @zero_test_permvar_si_256_mask(<8 x i32> %a0, <8 x i32> %passthru, i8 %mask) {
-;
-; CHECK-LABEL: @zero_test_permvar_si_256_mask(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A0:%.*]], <8 x i32> poison, <8 x i32> zeroinitializer
-; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1>
-; CHECK-NEXT: [[TMP3:%.*]] = select <8 x i1> [[TMP2]], <8 x i32> [[TMP1]], <8 x i32> [[PASSTHRU:%.*]]
-; CHECK-NEXT: ret <8 x i32> [[TMP3]]
-;
- %1 = call <8 x i32> @llvm.x86.avx2.permd(<8 x i32> %a0, <8 x i32> zeroinitializer)
- %2 = bitcast i8 %mask to <8 x i1>
- %3 = select <8 x i1> %2, <8 x i32> %1, <8 x i32> %passthru
- ret <8 x i32> %3
-}
-
-define <8 x i32> @shuffle_test_permvar_si_256(<8 x i32> %a0) {
-;
-; CHECK-LABEL: @shuffle_test_permvar_si_256(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A0:%.*]], <8 x i32> poison, <8 x i32>
-; CHECK-NEXT: ret <8 x i32> [[TMP1]]
-;
- %1 = call <8 x i32> @llvm.x86.avx2.permd(<8 x i32> %a0, <8 x i32> )
- ret <8 x i32> %1
-}
-
-define <8 x i32> @shuffle_test_permvar_si_256_mask(<8 x i32> %a0, <8 x i32> %passthru, i8 %mask) {
-;
-; CHECK-LABEL: @shuffle_test_permvar_si_256_mask(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A0:%.*]], <8 x i32> poison, <8 x i32>
-; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1>
-; CHECK-NEXT: [[TMP3:%.*]] = select <8 x i1> [[TMP2]], <8 x i32> [[TMP1]], <8 x i32> [[PASSTHRU:%.*]]
-; CHECK-NEXT: ret <8 x i32> [[TMP3]]
-;
- %1 = call <8 x i32> @llvm.x86.avx2.permd(<8 x i32> %a0, <8 x i32> )
- %2 = bitcast i8 %mask to <8 x i1>
- %3 = select <8 x i1> %2, <8 x i32> %1, <8 x i32> %passthru
- ret <8 x i32> %3
-}
-
-define <8 x i32> @undef_test_permvar_si_256(<8 x i32> %a0) {
-;
-; CHECK-LABEL: @undef_test_permvar_si_256(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A0:%.*]], <8 x i32> poison, <8 x i32>
-; CHECK-NEXT: ret <8 x i32> [[TMP1]]
-;
- %1 = call <8 x i32> @llvm.x86.avx2.permd(<8 x i32> %a0, <8 x i32> )
- ret <8 x i32> %1
-}
-
-define <8 x i32> @undef_test_permvar_si_256_mask(<8 x i32> %a0, <8 x i32> %passthru, i8 %mask) {
-;
-; CHECK-LABEL: @undef_test_permvar_si_256_mask(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A0:%.*]], <8 x i32> poison, <8 x i32>
-; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1>
-; CHECK-NEXT: [[TMP3:%.*]] = select <8 x i1> [[TMP2]], <8 x i32> [[TMP1]], <8 x i32> [[PASSTHRU:%.*]]
-; CHECK-NEXT: ret <8 x i32> [[TMP3]]
-;
- %1 = call <8 x i32> @llvm.x86.avx2.permd(<8 x i32> %a0, <8 x i32> )
- %2 = bitcast i8 %mask to <8 x i1>
- %3 = select <8 x i1> %2, <8 x i32> %1, <8 x i32> %passthru
- ret <8 x i32> %3
-}
-
-declare <8 x float> @llvm.x86.avx2.permps(<8 x float>, <8 x i32>)
-
-define <8 x float> @identity_test_permvar_sf_256(<8 x float> %a0) {
-;
-; CHECK-LABEL: @identity_test_permvar_sf_256(
-; CHECK-NEXT: ret <8 x float> [[A0:%.*]]
-;
- %1 = call <8 x float> @llvm.x86.avx2.permps(<8 x float> %a0, <8 x i32> )
- ret <8 x float> %1
-}
-
-define <8 x float> @identity_test_permvar_sf_256_mask(<8 x float> %a0, <8 x float> %passthru, i8 %mask) {
-;
-; CHECK-LABEL: @identity_test_permvar_sf_256_mask(
-; CHECK-NEXT: [[TMP1:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1>
-; CHECK-NEXT: [[TMP2:%.*]] = select <8 x i1> [[TMP1]], <8 x float> [[A0:%.*]], <8 x float> [[PASSTHRU:%.*]]
-; CHECK-NEXT: ret <8 x float> [[TMP2]]
-;
- %1 = call <8 x float> @llvm.x86.avx2.permps(<8 x float> %a0, <8 x i32> )
- %2 = bitcast i8 %mask to <8 x i1>
- %3 = select <8 x i1> %2, <8 x float> %1, <8 x float> %passthru
- ret <8 x float> %3
-}
-
-define <8 x float> @zero_test_permvar_sf_256(<8 x float> %a0) {
-;
-; CHECK-LABEL: @zero_test_permvar_sf_256(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A0:%.*]], <8 x float> poison, <8 x i32> zeroinitializer
-; CHECK-NEXT: ret <8 x float> [[TMP1]]
-;
- %1 = call <8 x float> @llvm.x86.avx2.permps(<8 x float> %a0, <8 x i32> zeroinitializer)
- ret <8 x float> %1
-}
-
-define <8 x float> @zero_test_permvar_sf_256_mask(<8 x float> %a0, <8 x float> %passthru, i8 %mask) {
-;
-; CHECK-LABEL: @zero_test_permvar_sf_256_mask(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A0:%.*]], <8 x float> poison, <8 x i32> zeroinitializer
-; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1>
-; CHECK-NEXT: [[TMP3:%.*]] = select <8 x i1> [[TMP2]], <8 x float> [[TMP1]], <8 x float> [[PASSTHRU:%.*]]
-; CHECK-NEXT: ret <8 x float> [[TMP3]]
-;
- %1 = call <8 x float> @llvm.x86.avx2.permps(<8 x float> %a0, <8 x i32> zeroinitializer)
- %2 = bitcast i8 %mask to <8 x i1>
- %3 = select <8 x i1> %2, <8 x float> %1, <8 x float> %passthru
- ret <8 x float> %3
-}
-
-define <8 x float> @shuffle_test_permvar_sf_256(<8 x float> %a0) {
-;
-; CHECK-LABEL: @shuffle_test_permvar_sf_256(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A0:%.*]], <8 x float> poison, <8 x i32>
-; CHECK-NEXT: ret <8 x float> [[TMP1]]
-;
- %1 = call <8 x float> @llvm.x86.avx2.permps(<8 x float> %a0, <8 x i32> )
- ret <8 x float> %1
-}
-
-define <8 x float> @shuffle_test_permvar_sf_256_mask(<8 x float> %a0, <8 x float> %passthru, i8 %mask) {
-;
-; CHECK-LABEL: @shuffle_test_permvar_sf_256_mask(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A0:%.*]], <8 x float> poison, <8 x i32>
-; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1>
-; CHECK-NEXT: [[TMP3:%.*]] = select <8 x i1> [[TMP2]], <8 x float> [[TMP1]], <8 x float> [[PASSTHRU:%.*]]
-; CHECK-NEXT: ret <8 x float> [[TMP3]]
-;
- %1 = call <8 x float> @llvm.x86.avx2.permps(<8 x float> %a0, <8 x i32> )
- %2 = bitcast i8 %mask to <8 x i1>
- %3 = select <8 x i1> %2, <8 x float> %1, <8 x float> %passthru
- ret <8 x float> %3
-}
-
-define <8 x float> @undef_test_permvar_sf_256(<8 x float> %a0) {
-;
-; CHECK-LABEL: @undef_test_permvar_sf_256(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A0:%.*]], <8 x float> poison, <8 x i32>
-; CHECK-NEXT: ret <8 x float> [[TMP1]]
-;
- %1 = call <8 x float> @llvm.x86.avx2.permps(<8 x float> %a0, <8 x i32> )
- ret <8 x float> %1
-}
-
-define <8 x float> @undef_test_permvar_sf_256_mask(<8 x float> %a0, <8 x float> %passthru, i8 %mask) {
-;
-; CHECK-LABEL: @undef_test_permvar_sf_256_mask(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A0:%.*]], <8 x float> poison, <8 x i32>
-; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1>
-; CHECK-NEXT: [[TMP3:%.*]] = select <8 x i1> [[TMP2]], <8 x float> [[TMP1]], <8 x float> [[PASSTHRU:%.*]]
-; CHECK-NEXT: ret <8 x float> [[TMP3]]
-;
- %1 = call <8 x float> @llvm.x86.avx2.permps(<8 x float> %a0, <8 x i32> )
- %2 = bitcast i8 %mask to <8 x i1>
- %3 = select <8 x i1> %2, <8 x float> %1, <8 x float> %passthru
- ret <8 x float> %3
-}
-
-declare <4 x i64> @llvm.x86.avx512.permvar.di.256(<4 x i64>, <4 x i64>)
-
-define <4 x i64> @identity_test_permvar_di_256(<4 x i64> %a0) {
-;
-; CHECK-LABEL: @identity_test_permvar_di_256(
-; CHECK-NEXT: ret <4 x i64> [[A0:%.*]]
-;
- %1 = call <4 x i64> @llvm.x86.avx512.permvar.di.256(<4 x i64> %a0, <4 x i64> )
- ret <4 x i64> %1
-}
-
-define <4 x i64> @identity_test_permvar_di_256_mask(<4 x i64> %a0, <4 x i64> %passthru, i8 %mask) {
-;
-; CHECK-LABEL: @identity_test_permvar_di_256_mask(
-; CHECK-NEXT: [[TMP1:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1>
-; CHECK-NEXT: [[EXTRACT:%.*]] = shufflevector <8 x i1> [[TMP1]], <8 x i1> poison, <4 x i32>
-; CHECK-NEXT: [[TMP2:%.*]] = select <4 x i1> [[EXTRACT]], <4 x i64> [[A0:%.*]], <4 x i64> [[PASSTHRU:%.*]]
-; CHECK-NEXT: ret <4 x i64> [[TMP2]]
-;
- %1 = call <4 x i64> @llvm.x86.avx512.permvar.di.256(<4 x i64> %a0, <4 x i64> )
- %2 = bitcast i8 %mask to <8 x i1>
- %extract = shufflevector <8 x i1> %2, <8 x i1> %2, <4 x i32>
- %3 = select <4 x i1> %extract, <4 x i64> %1, <4 x i64> %passthru
- ret <4 x i64> %3
-}
-
-define <4 x i64> @zero_test_permvar_di_256(<4 x i64> %a0) {
-;
-; CHECK-LABEL: @zero_test_permvar_di_256(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x i64> [[A0:%.*]], <4 x i64> poison, <4 x i32> zeroinitializer
-; CHECK-NEXT: ret <4 x i64> [[TMP1]]
-;
- %1 = call <4 x i64> @llvm.x86.avx512.permvar.di.256(<4 x i64> %a0, <4 x i64> zeroinitializer)
- ret <4 x i64> %1
-}
-
-define <4 x i64> @zero_test_permvar_di_256_mask(<4 x i64> %a0, <4 x i64> %passthru, i8 %mask) {
-;
-; CHECK-LABEL: @zero_test_permvar_di_256_mask(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x i64> [[A0:%.*]], <4 x i64> poison, <4 x i32> zeroinitializer
-; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1>
-; CHECK-NEXT: [[EXTRACT:%.*]] = shufflevector <8 x i1> [[TMP2]], <8 x i1> poison, <4 x i32>
-; CHECK-NEXT: [[TMP3:%.*]] = select <4 x i1> [[EXTRACT]], <4 x i64> [[TMP1]], <4 x i64> [[PASSTHRU:%.*]]
-; CHECK-NEXT: ret <4 x i64> [[TMP3]]
-;
- %1 = call <4 x i64> @llvm.x86.avx512.permvar.di.256(<4 x i64> %a0, <4 x i64> zeroinitializer)
- %2 = bitcast i8 %mask to <8 x i1>
- %extract = shufflevector <8 x i1> %2, <8 x i1> %2, <4 x i32>
- %3 = select <4 x i1> %extract, <4 x i64> %1, <4 x i64> %passthru
- ret <4 x i64> %3
-}
-
-define <4 x i64> @shuffle_test_permvar_di_256(<4 x i64> %a0) {
-;
-; CHECK-LABEL: @shuffle_test_permvar_di_256(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x i64> [[A0:%.*]], <4 x i64> poison, <4 x i32>
-; CHECK-NEXT: ret <4 x i64> [[TMP1]]
-;
- %1 = call <4 x i64> @llvm.x86.avx512.permvar.di.256(<4 x i64> %a0, <4 x i64> )
- ret <4 x i64> %1
-}
-
-define <4 x i64> @shuffle_test_permvar_di_256_mask(<4 x i64> %a0, <4 x i64> %passthru, i8 %mask) {
-;
-; CHECK-LABEL: @shuffle_test_permvar_di_256_mask(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x i64> [[A0:%.*]], <4 x i64> poison, <4 x i32>
-; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1>
-; CHECK-NEXT: [[EXTRACT:%.*]] = shufflevector <8 x i1> [[TMP2]], <8 x i1> poison, <4 x i32>
-; CHECK-NEXT: [[TMP3:%.*]] = select <4 x i1> [[EXTRACT]], <4 x i64> [[TMP1]], <4 x i64> [[PASSTHRU:%.*]]
-; CHECK-NEXT: ret <4 x i64> [[TMP3]]
-;
- %1 = call <4 x i64> @llvm.x86.avx512.permvar.di.256(<4 x i64> %a0, <4 x i64> )
- %2 = bitcast i8 %mask to <8 x i1>
- %extract = shufflevector <8 x i1> %2, <8 x i1> %2, <4 x i32>
- %3 = select <4 x i1> %extract, <4 x i64> %1, <4 x i64> %passthru
- ret <4 x i64> %3
-}
-
-define <4 x i64> @undef_test_permvar_di_256(<4 x i64> %a0) {
-;
-; CHECK-LABEL: @undef_test_permvar_di_256(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x i64> [[A0:%.*]], <4 x i64> poison, <4 x i32>
-; CHECK-NEXT: ret <4 x i64> [[TMP1]]
-;
- %1 = call <4 x i64> @llvm.x86.avx512.permvar.di.256(<4 x i64> %a0, <4 x i64> )
- ret <4 x i64> %1
-}
-
-define <4 x i64> @undef_test_permvar_di_256_mask(<4 x i64> %a0, <4 x i64> %passthru, i8 %mask) {
-;
-; CHECK-LABEL: @undef_test_permvar_di_256_mask(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x i64> [[A0:%.*]], <4 x i64> poison, <4 x i32>
-; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1>
-; CHECK-NEXT: [[EXTRACT:%.*]] = shufflevector <8 x i1> [[TMP2]], <8 x i1> poison, <4 x i32>
-; CHECK-NEXT: [[TMP3:%.*]] = select <4 x i1> [[EXTRACT]], <4 x i64> [[TMP1]], <4 x i64> [[PASSTHRU:%.*]]
-; CHECK-NEXT: ret <4 x i64> [[TMP3]]
-;
- %1 = call <4 x i64> @llvm.x86.avx512.permvar.di.256(<4 x i64> %a0, <4 x i64> )
- %2 = bitcast i8 %mask to <8 x i1>
- %extract = shufflevector <8 x i1> %2, <8 x i1> %2, <4 x i32>
- %3 = select <4 x i1> %extract, <4 x i64> %1, <4 x i64> %passthru
- ret <4 x i64> %3
-}
-
-declare <4 x double> @llvm.x86.avx512.permvar.df.256(<4 x double>, <4 x i64>)
-
-define <4 x double> @identity_test_permvar_df_256(<4 x double> %a0) {
-;
-; CHECK-LABEL: @identity_test_permvar_df_256(
-; CHECK-NEXT: ret <4 x double> [[A0:%.*]]
-;
- %1 = call <4 x double> @llvm.x86.avx512.permvar.df.256(<4 x double> %a0, <4 x i64> )
- ret <4 x double> %1
-}
-
-define <4 x double> @identity_test_permvar_df_256_mask(<4 x double> %a0, <4 x double> %passthru, i8 %mask) {
-;
-; CHECK-LABEL: @identity_test_permvar_df_256_mask(
-; CHECK-NEXT: [[TMP1:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1>
-; CHECK-NEXT: [[EXTRACT:%.*]] = shufflevector <8 x i1> [[TMP1]], <8 x i1> poison, <4 x i32>
-; CHECK-NEXT: [[TMP2:%.*]] = select <4 x i1> [[EXTRACT]], <4 x double> [[A0:%.*]], <4 x double> [[PASSTHRU:%.*]]
-; CHECK-NEXT: ret <4 x double> [[TMP2]]
-;
- %1 = call <4 x double> @llvm.x86.avx512.permvar.df.256(<4 x double> %a0, <4 x i64> )
- %2 = bitcast i8 %mask to <8 x i1>
- %extract = shufflevector <8 x i1> %2, <8 x i1> %2, <4 x i32>
- %3 = select <4 x i1> %extract, <4 x double> %1, <4 x double> %passthru
- ret <4 x double> %3
-}
-
-define <4 x double> @zero_test_permvar_df_256(<4 x double> %a0) {
-;
-; CHECK-LABEL: @zero_test_permvar_df_256(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[A0:%.*]], <4 x double> poison, <4 x i32> zeroinitializer
-; CHECK-NEXT: ret <4 x double> [[TMP1]]
-;
- %1 = call <4 x double> @llvm.x86.avx512.permvar.df.256(<4 x double> %a0, <4 x i64> zeroinitializer)
- ret <4 x double> %1
-}
-
-define <4 x double> @zero_test_permvar_df_256_mask(<4 x double> %a0, <4 x double> %passthru, i8 %mask) {
-;
-; CHECK-LABEL: @zero_test_permvar_df_256_mask(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[A0:%.*]], <4 x double> poison, <4 x i32> zeroinitializer
-; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1>
-; CHECK-NEXT: [[EXTRACT:%.*]] = shufflevector <8 x i1> [[TMP2]], <8 x i1> poison, <4 x i32>
-; CHECK-NEXT: [[TMP3:%.*]] = select <4 x i1> [[EXTRACT]], <4 x double> [[TMP1]], <4 x double> [[PASSTHRU:%.*]]
-; CHECK-NEXT: ret <4 x double> [[TMP3]]
-;
- %1 = call <4 x double> @llvm.x86.avx512.permvar.df.256(<4 x double> %a0, <4 x i64> zeroinitializer)
- %2 = bitcast i8 %mask to <8 x i1>
- %extract = shufflevector <8 x i1> %2, <8 x i1> %2, <4 x i32>
- %3 = select <4 x i1> %extract, <4 x double> %1, <4 x double> %passthru
- ret <4 x double> %3
-}
-
-define <4 x double> @shuffle_test_permvar_df_256(<4 x double> %a0) {
-;
-; CHECK-LABEL: @shuffle_test_permvar_df_256(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[A0:%.*]], <4 x double> poison, <4 x i32>
-; CHECK-NEXT: ret <4 x double> [[TMP1]]
-;
- %1 = call <4 x double> @llvm.x86.avx512.permvar.df.256(<4 x double> %a0, <4 x i64> )
- ret <4 x double> %1
-}
-
-define <4 x double> @shuffle_test_permvar_df_256_mask(<4 x double> %a0, <4 x double> %passthru, i8 %mask) {
-;
-; CHECK-LABEL: @shuffle_test_permvar_df_256_mask(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[A0:%.*]], <4 x double> poison, <4 x i32>
-; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1>
-; CHECK-NEXT: [[EXTRACT:%.*]] = shufflevector <8 x i1> [[TMP2]], <8 x i1> poison, <4 x i32>
-; CHECK-NEXT: [[TMP3:%.*]] = select <4 x i1> [[EXTRACT]], <4 x double> [[TMP1]], <4 x double> [[PASSTHRU:%.*]]
-; CHECK-NEXT: ret <4 x double> [[TMP3]]
-;
- %1 = call <4 x double> @llvm.x86.avx512.permvar.df.256(<4 x double> %a0, <4 x i64> )
- %2 = bitcast i8 %mask to <8 x i1>
- %extract = shufflevector <8 x i1> %2, <8 x i1> %2, <4 x i32>
- %3 = select <4 x i1> %extract, <4 x double> %1, <4 x double> %passthru
- ret <4 x double> %3
-}
-
-define <4 x double> @undef_test_permvar_df_256(<4 x double> %a0) {
-;
-; CHECK-LABEL: @undef_test_permvar_df_256(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[A0:%.*]], <4 x double> poison, <4 x i32>
-; CHECK-NEXT: ret <4 x double> [[TMP1]]
-;
- %1 = call <4 x double> @llvm.x86.avx512.permvar.df.256(<4 x double> %a0, <4 x i64> )
- ret <4 x double> %1
-}
-
-define <4 x double> @undef_test_permvar_df_256_mask(<4 x double> %a0, <4 x double> %passthru, i8 %mask) {
-;
-; CHECK-LABEL: @undef_test_permvar_df_256_mask(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[A0:%.*]], <4 x double> poison, <4 x i32>
-; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1>
-; CHECK-NEXT: [[EXTRACT:%.*]] = shufflevector <8 x i1> [[TMP2]], <8 x i1> poison, <4 x i32>
-; CHECK-NEXT: [[TMP3:%.*]] = select <4 x i1> [[EXTRACT]], <4 x double> [[TMP1]], <4 x double> [[PASSTHRU:%.*]]
-; CHECK-NEXT: ret <4 x double> [[TMP3]]
-;
- %1 = call <4 x double> @llvm.x86.avx512.permvar.df.256(<4 x double> %a0, <4 x i64> )
- %2 = bitcast i8 %mask to <8 x i1>
- %extract = shufflevector <8 x i1> %2, <8 x i1> %2, <4 x i32>
- %3 = select <4 x i1> %extract, <4 x double> %1, <4 x double> %passthru
- ret <4 x double> %3
-}
-
-declare <16 x i32> @llvm.x86.avx512.permvar.si.512(<16 x i32>, <16 x i32>)
-
-define <16 x i32> @identity_test_permvar_si_512(<16 x i32> %a0) {
-;
-; CHECK-LABEL: @identity_test_permvar_si_512(
-; CHECK-NEXT: ret <16 x i32> [[A0:%.*]]
-;
- %1 = call <16 x i32> @llvm.x86.avx512.permvar.si.512(<16 x i32> %a0, <16 x i32> )
- ret <16 x i32> %1
-}
-
-define <16 x i32> @identity_test_permvar_si_512_mask(<16 x i32> %a0, <16 x i32> %passthru, i16 %mask) {
-;
-; CHECK-LABEL: @identity_test_permvar_si_512_mask(
-; CHECK-NEXT: [[TMP1:%.*]] = bitcast i16 [[MASK:%.*]] to <16 x i1>
-; CHECK-NEXT: [[TMP2:%.*]] = select <16 x i1> [[TMP1]], <16 x i32> [[A0:%.*]], <16 x i32> [[PASSTHRU:%.*]]
-; CHECK-NEXT: ret <16 x i32> [[TMP2]]
-;
- %1 = call <16 x i32> @llvm.x86.avx512.permvar.si.512(<16 x i32> %a0, <16 x i32> )
- %2 = bitcast i16 %mask to <16 x i1>
- %3 = select <16 x i1> %2, <16 x i32> %1, <16 x i32> %passthru
- ret <16 x i32> %3
-}
-
-define <16 x i32> @zero_test_permvar_si_512(<16 x i32> %a0) {
-;
-; CHECK-LABEL: @zero_test_permvar_si_512(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i32> [[A0:%.*]], <16 x i32> poison, <16 x i32> zeroinitializer
-; CHECK-NEXT: ret <16 x i32> [[TMP1]]
-;
- %1 = call <16 x i32> @llvm.x86.avx512.permvar.si.512(<16 x i32> %a0, <16 x i32> zeroinitializer)
- ret <16 x i32> %1
-}
-
-define <16 x i32> @zero_test_permvar_si_512_mask(<16 x i32> %a0, <16 x i32> %passthru, i16 %mask) {
-;
-; CHECK-LABEL: @zero_test_permvar_si_512_mask(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i32> [[A0:%.*]], <16 x i32> poison, <16 x i32> zeroinitializer
-; CHECK-NEXT: [[TMP2:%.*]] = bitcast i16 [[MASK:%.*]] to <16 x i1>
-; CHECK-NEXT: [[TMP3:%.*]] = select <16 x i1> [[TMP2]], <16 x i32> [[TMP1]], <16 x i32> [[PASSTHRU:%.*]]
-; CHECK-NEXT: ret <16 x i32> [[TMP3]]
-;
- %1 = call <16 x i32> @llvm.x86.avx512.permvar.si.512(<16 x i32> %a0, <16 x i32> zeroinitializer)
- %2 = bitcast i16 %mask to <16 x i1>
- %3 = select <16 x i1> %2, <16 x i32> %1, <16 x i32> %passthru
- ret <16 x i32> %3
-}
-
-define <16 x i32> @shuffle_test_permvar_si_512(<16 x i32> %a0) {
-;
-; CHECK-LABEL: @shuffle_test_permvar_si_512(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i32> [[A0:%.*]], <16 x i32> poison, <16 x i32>
-; CHECK-NEXT: ret <16 x i32> [[TMP1]]
-;
- %1 = call <16 x i32> @llvm.x86.avx512.permvar.si.512(<16 x i32> %a0, <16 x i32> )
- ret <16 x i32> %1
-}
-
-define <16 x i32> @shuffle_test_permvar_si_512_mask(<16 x i32> %a0, <16 x i32> %passthru, i16 %mask) {
-;
-; CHECK-LABEL: @shuffle_test_permvar_si_512_mask(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i32> [[A0:%.*]], <16 x i32> poison, <16 x i32>
-; CHECK-NEXT: [[TMP2:%.*]] = bitcast i16 [[MASK:%.*]] to <16 x i1>
-; CHECK-NEXT: [[TMP3:%.*]] = select <16 x i1> [[TMP2]], <16 x i32> [[TMP1]], <16 x i32> [[PASSTHRU:%.*]]
-; CHECK-NEXT: ret <16 x i32> [[TMP3]]
-;
- %1 = call <16 x i32> @llvm.x86.avx512.permvar.si.512(<16 x i32> %a0, <16 x i32> )
- %2 = bitcast i16 %mask to <16 x i1>
- %3 = select <16 x i1> %2, <16 x i32> %1, <16 x i32> %passthru
- ret <16 x i32> %3
-}
-
-define <16 x i32> @undef_test_permvar_si_512(<16 x i32> %a0) {
-;
-; CHECK-LABEL: @undef_test_permvar_si_512(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i32> [[A0:%.*]], <16 x i32> poison, <16 x i32>
-; CHECK-NEXT: ret <16 x i32> [[TMP1]]
-;
- %1 = call <16 x i32> @llvm.x86.avx512.permvar.si.512(<16 x i32> %a0, <16 x i32> )
- ret <16 x i32> %1
-}
-
-define <16 x i32> @undef_test_permvar_si_512_mask(<16 x i32> %a0, <16 x i32> %passthru, i16 %mask) {
-;
-; CHECK-LABEL: @undef_test_permvar_si_512_mask(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i32> [[A0:%.*]], <16 x i32> poison, <16 x i32>
-; CHECK-NEXT: [[TMP2:%.*]] = bitcast i16 [[MASK:%.*]] to <16 x i1>
-; CHECK-NEXT: [[TMP3:%.*]] = select <16 x i1> [[TMP2]], <16 x i32> [[TMP1]], <16 x i32> [[PASSTHRU:%.*]]
-; CHECK-NEXT: ret <16 x i32> [[TMP3]]
-;
- %1 = call <16 x i32> @llvm.x86.avx512.permvar.si.512(<16 x i32> %a0, <16 x i32> )
- %2 = bitcast i16 %mask to <16 x i1>
- %3 = select <16 x i1> %2, <16 x i32> %1, <16 x i32> %passthru
- ret <16 x i32> %3
-}
-
-declare <16 x float> @llvm.x86.avx512.permvar.sf.512(<16 x float>, <16 x i32>)
-
-define <16 x float> @identity_test_permvar_sf_512(<16 x float> %a0) {
-;
-; CHECK-LABEL: @identity_test_permvar_sf_512(
-; CHECK-NEXT: ret <16 x float> [[A0:%.*]]
-;
- %1 = call <16 x float> @llvm.x86.avx512.permvar.sf.512(<16 x float> %a0, <16 x i32> )
- ret <16 x float> %1
-}
-
-define <16 x float> @identity_test_permvar_sf_512_mask(<16 x float> %a0, <16 x float> %passthru, i16 %mask) {
-;
-; CHECK-LABEL: @identity_test_permvar_sf_512_mask(
-; CHECK-NEXT: [[TMP1:%.*]] = bitcast i16 [[MASK:%.*]] to <16 x i1>
-; CHECK-NEXT: [[TMP2:%.*]] = select <16 x i1> [[TMP1]], <16 x float> [[A0:%.*]], <16 x float> [[PASSTHRU:%.*]]
-; CHECK-NEXT: ret <16 x float> [[TMP2]]
-;
- %1 = call <16 x float> @llvm.x86.avx512.permvar.sf.512(<16 x float> %a0, <16 x i32> )
- %2 = bitcast i16 %mask to <16 x i1>
- %3 = select <16 x i1> %2, <16 x float> %1, <16 x float> %passthru
- ret <16 x float> %3
-}
-
-define <16 x float> @zero_test_permvar_sf_512(<16 x float> %a0) {
-;
-; CHECK-LABEL: @zero_test_permvar_sf_512(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x float> [[A0:%.*]], <16 x float> poison, <16 x i32> zeroinitializer
-; CHECK-NEXT: ret <16 x float> [[TMP1]]
-;
- %1 = call <16 x float> @llvm.x86.avx512.permvar.sf.512(<16 x float> %a0, <16 x i32> zeroinitializer)
- ret <16 x float> %1
-}
-
-define <16 x float> @zero_test_permvar_sf_512_mask(<16 x float> %a0, <16 x float> %passthru, i16 %mask) {
-;
-; CHECK-LABEL: @zero_test_permvar_sf_512_mask(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x float> [[A0:%.*]], <16 x float> poison, <16 x i32> zeroinitializer
-; CHECK-NEXT: [[TMP2:%.*]] = bitcast i16 [[MASK:%.*]] to <16 x i1>
-; CHECK-NEXT: [[TMP3:%.*]] = select <16 x i1> [[TMP2]], <16 x float> [[TMP1]], <16 x float> [[PASSTHRU:%.*]]
-; CHECK-NEXT: ret <16 x float> [[TMP3]]
-;
- %1 = call <16 x float> @llvm.x86.avx512.permvar.sf.512(<16 x float> %a0, <16 x i32> zeroinitializer)
- %2 = bitcast i16 %mask to <16 x i1>
- %3 = select <16 x i1> %2, <16 x float> %1, <16 x float> %passthru
- ret <16 x float> %3
-}
-
-define <16 x float> @shuffle_test_permvar_sf_512(<16 x float> %a0) {
-;
-; CHECK-LABEL: @shuffle_test_permvar_sf_512(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x float> [[A0:%.*]], <16 x float> poison, <16 x i32>
-; CHECK-NEXT: ret <16
x float> [[TMP1]] -; - %1 = call <16 x float> @llvm.x86.avx512.permvar.sf.512(<16 x float> %a0, <16 x i32> ) - ret <16 x float> %1 -} - -define <16 x float> @shuffle_test_permvar_sf_512_mask(<16 x float> %a0, <16 x float> %passthru, i16 %mask) { -; -; CHECK-LABEL: @shuffle_test_permvar_sf_512_mask( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x float> [[A0:%.*]], <16 x float> poison, <16 x i32> -; CHECK-NEXT: [[TMP2:%.*]] = bitcast i16 [[MASK:%.*]] to <16 x i1> -; CHECK-NEXT: [[TMP3:%.*]] = select <16 x i1> [[TMP2]], <16 x float> [[TMP1]], <16 x float> [[PASSTHRU:%.*]] -; CHECK-NEXT: ret <16 x float> [[TMP3]] -; - %1 = call <16 x float> @llvm.x86.avx512.permvar.sf.512(<16 x float> %a0, <16 x i32> ) - %2 = bitcast i16 %mask to <16 x i1> - %3 = select <16 x i1> %2, <16 x float> %1, <16 x float> %passthru - ret <16 x float> %3 -} - -define <16 x float> @undef_test_permvar_sf_512(<16 x float> %a0) { -; -; CHECK-LABEL: @undef_test_permvar_sf_512( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x float> [[A0:%.*]], <16 x float> poison, <16 x i32> -; CHECK-NEXT: ret <16 x float> [[TMP1]] -; - %1 = call <16 x float> @llvm.x86.avx512.permvar.sf.512(<16 x float> %a0, <16 x i32> ) - ret <16 x float> %1 -} - -define <16 x float> @undef_test_permvar_sf_512_mask(<16 x float> %a0, <16 x float> %passthru, i16 %mask) { -; -; CHECK-LABEL: @undef_test_permvar_sf_512_mask( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x float> [[A0:%.*]], <16 x float> poison, <16 x i32> -; CHECK-NEXT: [[TMP2:%.*]] = bitcast i16 [[MASK:%.*]] to <16 x i1> -; CHECK-NEXT: [[TMP3:%.*]] = select <16 x i1> [[TMP2]], <16 x float> [[TMP1]], <16 x float> [[PASSTHRU:%.*]] -; CHECK-NEXT: ret <16 x float> [[TMP3]] -; - %1 = call <16 x float> @llvm.x86.avx512.permvar.sf.512(<16 x float> %a0, <16 x i32> ) - %2 = bitcast i16 %mask to <16 x i1> - %3 = select <16 x i1> %2, <16 x float> %1, <16 x float> %passthru - ret <16 x float> %3 -} - -declare <8 x i64> @llvm.x86.avx512.permvar.di.512(<8 x i64>, <8 x i64>) - 
-define <8 x i64> @identity_test_permvar_di_512(<8 x i64> %a0) { -; -; CHECK-LABEL: @identity_test_permvar_di_512( -; CHECK-NEXT: ret <8 x i64> [[A0:%.*]] -; - %1 = call <8 x i64> @llvm.x86.avx512.permvar.di.512(<8 x i64> %a0, <8 x i64> ) - ret <8 x i64> %1 -} - -define <8 x i64> @identity_test_permvar_di_512_mask(<8 x i64> %a0, <8 x i64> %passthru, i8 %mask) { -; -; CHECK-LABEL: @identity_test_permvar_di_512_mask( -; CHECK-NEXT: [[TMP1:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1> -; CHECK-NEXT: [[TMP2:%.*]] = select <8 x i1> [[TMP1]], <8 x i64> [[A0:%.*]], <8 x i64> [[PASSTHRU:%.*]] -; CHECK-NEXT: ret <8 x i64> [[TMP2]] -; - %1 = call <8 x i64> @llvm.x86.avx512.permvar.di.512(<8 x i64> %a0, <8 x i64> ) - %2 = bitcast i8 %mask to <8 x i1> - %3 = select <8 x i1> %2, <8 x i64> %1, <8 x i64> %passthru - ret <8 x i64> %3 -} - -define <8 x i64> @zero_test_permvar_di_512(<8 x i64> %a0) { -; -; CHECK-LABEL: @zero_test_permvar_di_512( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i64> [[A0:%.*]], <8 x i64> poison, <8 x i32> zeroinitializer -; CHECK-NEXT: ret <8 x i64> [[TMP1]] -; - %1 = call <8 x i64> @llvm.x86.avx512.permvar.di.512(<8 x i64> %a0, <8 x i64> zeroinitializer) - ret <8 x i64> %1 -} - -define <8 x i64> @zero_test_permvar_di_512_mask(<8 x i64> %a0, <8 x i64> %passthru, i8 %mask) { -; -; CHECK-LABEL: @zero_test_permvar_di_512_mask( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i64> [[A0:%.*]], <8 x i64> poison, <8 x i32> zeroinitializer -; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1> -; CHECK-NEXT: [[TMP3:%.*]] = select <8 x i1> [[TMP2]], <8 x i64> [[TMP1]], <8 x i64> [[PASSTHRU:%.*]] -; CHECK-NEXT: ret <8 x i64> [[TMP3]] -; - %1 = call <8 x i64> @llvm.x86.avx512.permvar.di.512(<8 x i64> %a0, <8 x i64> zeroinitializer) - %2 = bitcast i8 %mask to <8 x i1> - %3 = select <8 x i1> %2, <8 x i64> %1, <8 x i64> %passthru - ret <8 x i64> %3 -} - -define <8 x i64> @shuffle_test_permvar_di_512(<8 x i64> %a0) { -; -; CHECK-LABEL: 
@shuffle_test_permvar_di_512( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i64> [[A0:%.*]], <8 x i64> poison, <8 x i32> -; CHECK-NEXT: ret <8 x i64> [[TMP1]] -; - %1 = call <8 x i64> @llvm.x86.avx512.permvar.di.512(<8 x i64> %a0, <8 x i64> ) - ret <8 x i64> %1 -} - -define <8 x i64> @shuffle_test_permvar_di_512_mask(<8 x i64> %a0, <8 x i64> %passthru, i8 %mask) { -; -; CHECK-LABEL: @shuffle_test_permvar_di_512_mask( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i64> [[A0:%.*]], <8 x i64> poison, <8 x i32> -; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1> -; CHECK-NEXT: [[TMP3:%.*]] = select <8 x i1> [[TMP2]], <8 x i64> [[TMP1]], <8 x i64> [[PASSTHRU:%.*]] -; CHECK-NEXT: ret <8 x i64> [[TMP3]] -; - %1 = call <8 x i64> @llvm.x86.avx512.permvar.di.512(<8 x i64> %a0, <8 x i64> ) - %2 = bitcast i8 %mask to <8 x i1> - %3 = select <8 x i1> %2, <8 x i64> %1, <8 x i64> %passthru - ret <8 x i64> %3 -} - -define <8 x i64> @undef_test_permvar_di_512(<8 x i64> %a0) { -; -; CHECK-LABEL: @undef_test_permvar_di_512( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i64> [[A0:%.*]], <8 x i64> poison, <8 x i32> -; CHECK-NEXT: ret <8 x i64> [[TMP1]] -; - %1 = call <8 x i64> @llvm.x86.avx512.permvar.di.512(<8 x i64> %a0, <8 x i64> ) - ret <8 x i64> %1 -} - -define <8 x i64> @undef_test_permvar_di_512_mask(<8 x i64> %a0, <8 x i64> %passthru, i8 %mask) { -; -; CHECK-LABEL: @undef_test_permvar_di_512_mask( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i64> [[A0:%.*]], <8 x i64> poison, <8 x i32> -; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1> -; CHECK-NEXT: [[TMP3:%.*]] = select <8 x i1> [[TMP2]], <8 x i64> [[TMP1]], <8 x i64> [[PASSTHRU:%.*]] -; CHECK-NEXT: ret <8 x i64> [[TMP3]] -; - %1 = call <8 x i64> @llvm.x86.avx512.permvar.di.512(<8 x i64> %a0, <8 x i64> ) - %2 = bitcast i8 %mask to <8 x i1> - %3 = select <8 x i1> %2, <8 x i64> %1, <8 x i64> %passthru - ret <8 x i64> %3 -} - -declare <8 x double> @llvm.x86.avx512.permvar.df.512(<8 x 
double>, <8 x i64>) - -define <8 x double> @identity_test_permvar_df_512(<8 x double> %a0) { -; -; CHECK-LABEL: @identity_test_permvar_df_512( -; CHECK-NEXT: ret <8 x double> [[A0:%.*]] -; - %1 = call <8 x double> @llvm.x86.avx512.permvar.df.512(<8 x double> %a0, <8 x i64> ) - ret <8 x double> %1 -} - -define <8 x double> @identity_test_permvar_df_512_mask(<8 x double> %a0, <8 x double> %passthru, i8 %mask) { -; -; CHECK-LABEL: @identity_test_permvar_df_512_mask( -; CHECK-NEXT: [[TMP1:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1> -; CHECK-NEXT: [[TMP2:%.*]] = select <8 x i1> [[TMP1]], <8 x double> [[A0:%.*]], <8 x double> [[PASSTHRU:%.*]] -; CHECK-NEXT: ret <8 x double> [[TMP2]] -; - %1 = call <8 x double> @llvm.x86.avx512.permvar.df.512(<8 x double> %a0, <8 x i64> ) - %2 = bitcast i8 %mask to <8 x i1> - %3 = select <8 x i1> %2, <8 x double> %1, <8 x double> %passthru - ret <8 x double> %3 -} - -define <8 x double> @zero_test_permvar_df_512(<8 x double> %a0) { -; -; CHECK-LABEL: @zero_test_permvar_df_512( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x double> [[A0:%.*]], <8 x double> poison, <8 x i32> zeroinitializer -; CHECK-NEXT: ret <8 x double> [[TMP1]] -; - %1 = call <8 x double> @llvm.x86.avx512.permvar.df.512(<8 x double> %a0, <8 x i64> zeroinitializer) - ret <8 x double> %1 -} - -define <8 x double> @zero_test_permvar_df_512_mask(<8 x double> %a0, <8 x double> %passthru, i8 %mask) { -; -; CHECK-LABEL: @zero_test_permvar_df_512_mask( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x double> [[A0:%.*]], <8 x double> poison, <8 x i32> zeroinitializer -; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1> -; CHECK-NEXT: [[TMP3:%.*]] = select <8 x i1> [[TMP2]], <8 x double> [[TMP1]], <8 x double> [[PASSTHRU:%.*]] -; CHECK-NEXT: ret <8 x double> [[TMP3]] -; - %1 = call <8 x double> @llvm.x86.avx512.permvar.df.512(<8 x double> %a0, <8 x i64> zeroinitializer) - %2 = bitcast i8 %mask to <8 x i1> - %3 = select <8 x i1> %2, <8 x double> %1, <8 x double> 
%passthru - ret <8 x double> %3 -} - -define <8 x double> @shuffle_test_permvar_df_512(<8 x double> %a0) { -; -; CHECK-LABEL: @shuffle_test_permvar_df_512( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x double> [[A0:%.*]], <8 x double> poison, <8 x i32> -; CHECK-NEXT: ret <8 x double> [[TMP1]] -; - %1 = call <8 x double> @llvm.x86.avx512.permvar.df.512(<8 x double> %a0, <8 x i64> ) - ret <8 x double> %1 -} - -define <8 x double> @shuffle_test_permvar_df_512_mask(<8 x double> %a0, <8 x double> %passthru, i8 %mask) { -; -; CHECK-LABEL: @shuffle_test_permvar_df_512_mask( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x double> [[A0:%.*]], <8 x double> poison, <8 x i32> -; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1> -; CHECK-NEXT: [[TMP3:%.*]] = select <8 x i1> [[TMP2]], <8 x double> [[TMP1]], <8 x double> [[PASSTHRU:%.*]] -; CHECK-NEXT: ret <8 x double> [[TMP3]] -; - %1 = call <8 x double> @llvm.x86.avx512.permvar.df.512(<8 x double> %a0, <8 x i64> ) - %2 = bitcast i8 %mask to <8 x i1> - %3 = select <8 x i1> %2, <8 x double> %1, <8 x double> %passthru - ret <8 x double> %3 -} - -define <8 x double> @undef_test_permvar_df_512(<8 x double> %a0) { -; -; CHECK-LABEL: @undef_test_permvar_df_512( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x double> [[A0:%.*]], <8 x double> poison, <8 x i32> -; CHECK-NEXT: ret <8 x double> [[TMP1]] -; - %1 = call <8 x double> @llvm.x86.avx512.permvar.df.512(<8 x double> %a0, <8 x i64> ) - ret <8 x double> %1 -} - -define <8 x double> @undef_test_permvar_df_512_mask(<8 x double> %a0, <8 x double> %passthru, i8 %mask) { -; -; CHECK-LABEL: @undef_test_permvar_df_512_mask( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x double> [[A0:%.*]], <8 x double> poison, <8 x i32> -; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1> -; CHECK-NEXT: [[TMP3:%.*]] = select <8 x i1> [[TMP2]], <8 x double> [[TMP1]], <8 x double> [[PASSTHRU:%.*]] -; CHECK-NEXT: ret <8 x double> [[TMP3]] -; - %1 = call <8 x double> 
@llvm.x86.avx512.permvar.df.512(<8 x double> %a0, <8 x i64> ) - %2 = bitcast i8 %mask to <8 x i1> - %3 = select <8 x i1> %2, <8 x double> %1, <8 x double> %passthru - ret <8 x double> %3 -} - -declare <8 x i16> @llvm.x86.avx512.permvar.hi.128(<8 x i16>, <8 x i16>) - -define <8 x i16> @identity_test_permvar_hi_128(<8 x i16> %a0) { -; -; CHECK-LABEL: @identity_test_permvar_hi_128( -; CHECK-NEXT: ret <8 x i16> [[A0:%.*]] -; - %1 = call <8 x i16> @llvm.x86.avx512.permvar.hi.128(<8 x i16> %a0, <8 x i16> ) - ret <8 x i16> %1 -} - -define <8 x i16> @identity_test_permvar_hi_128_mask(<8 x i16> %a0, <8 x i16> %passthru, i8 %mask) { -; -; CHECK-LABEL: @identity_test_permvar_hi_128_mask( -; CHECK-NEXT: [[TMP1:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1> -; CHECK-NEXT: [[TMP2:%.*]] = select <8 x i1> [[TMP1]], <8 x i16> [[A0:%.*]], <8 x i16> [[PASSTHRU:%.*]] -; CHECK-NEXT: ret <8 x i16> [[TMP2]] -; - %1 = call <8 x i16> @llvm.x86.avx512.permvar.hi.128(<8 x i16> %a0, <8 x i16> ) - %2 = bitcast i8 %mask to <8 x i1> - %3 = select <8 x i1> %2, <8 x i16> %1, <8 x i16> %passthru - ret <8 x i16> %3 -} - -define <8 x i16> @zero_test_permvar_hi_128(<8 x i16> %a0) { -; -; CHECK-LABEL: @zero_test_permvar_hi_128( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i16> [[A0:%.*]], <8 x i16> poison, <8 x i32> zeroinitializer -; CHECK-NEXT: ret <8 x i16> [[TMP1]] -; - %1 = call <8 x i16> @llvm.x86.avx512.permvar.hi.128(<8 x i16> %a0, <8 x i16> zeroinitializer) - ret <8 x i16> %1 -} - -define <8 x i16> @zero_test_permvar_hi_128_mask(<8 x i16> %a0, <8 x i16> %passthru, i8 %mask) { -; -; CHECK-LABEL: @zero_test_permvar_hi_128_mask( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i16> [[A0:%.*]], <8 x i16> poison, <8 x i32> zeroinitializer -; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1> -; CHECK-NEXT: [[TMP3:%.*]] = select <8 x i1> [[TMP2]], <8 x i16> [[TMP1]], <8 x i16> [[PASSTHRU:%.*]] -; CHECK-NEXT: ret <8 x i16> [[TMP3]] -; - %1 = call <8 x i16> 
@llvm.x86.avx512.permvar.hi.128(<8 x i16> %a0, <8 x i16> zeroinitializer) - %2 = bitcast i8 %mask to <8 x i1> - %3 = select <8 x i1> %2, <8 x i16> %1, <8 x i16> %passthru - ret <8 x i16> %3 -} - -define <8 x i16> @shuffle_test_permvar_hi_128(<8 x i16> %a0) { -; -; CHECK-LABEL: @shuffle_test_permvar_hi_128( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i16> [[A0:%.*]], <8 x i16> poison, <8 x i32> -; CHECK-NEXT: ret <8 x i16> [[TMP1]] -; - %1 = call <8 x i16> @llvm.x86.avx512.permvar.hi.128(<8 x i16> %a0, <8 x i16> ) - ret <8 x i16> %1 -} - -define <8 x i16> @shuffle_test_permvar_hi_128_mask(<8 x i16> %a0, <8 x i16> %passthru, i8 %mask) { -; -; CHECK-LABEL: @shuffle_test_permvar_hi_128_mask( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i16> [[A0:%.*]], <8 x i16> poison, <8 x i32> -; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1> -; CHECK-NEXT: [[TMP3:%.*]] = select <8 x i1> [[TMP2]], <8 x i16> [[TMP1]], <8 x i16> [[PASSTHRU:%.*]] -; CHECK-NEXT: ret <8 x i16> [[TMP3]] -; - %1 = call <8 x i16> @llvm.x86.avx512.permvar.hi.128(<8 x i16> %a0, <8 x i16> ) - %2 = bitcast i8 %mask to <8 x i1> - %3 = select <8 x i1> %2, <8 x i16> %1, <8 x i16> %passthru - ret <8 x i16> %3 -} - -define <8 x i16> @undef_test_permvar_hi_128(<8 x i16> %a0) { -; -; CHECK-LABEL: @undef_test_permvar_hi_128( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i16> [[A0:%.*]], <8 x i16> poison, <8 x i32> -; CHECK-NEXT: ret <8 x i16> [[TMP1]] -; - %1 = call <8 x i16> @llvm.x86.avx512.permvar.hi.128(<8 x i16> %a0, <8 x i16> ) - ret <8 x i16> %1 -} - -define <8 x i16> @undef_test_permvar_hi_128_mask(<8 x i16> %a0, <8 x i16> %passthru, i8 %mask) { -; -; CHECK-LABEL: @undef_test_permvar_hi_128_mask( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i16> [[A0:%.*]], <8 x i16> poison, <8 x i32> -; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1> -; CHECK-NEXT: [[TMP3:%.*]] = select <8 x i1> [[TMP2]], <8 x i16> [[TMP1]], <8 x i16> [[PASSTHRU:%.*]] -; CHECK-NEXT: ret <8 x i16> 
[[TMP3]] -; - %1 = call <8 x i16> @llvm.x86.avx512.permvar.hi.128(<8 x i16> %a0, <8 x i16> ) - %2 = bitcast i8 %mask to <8 x i1> - %3 = select <8 x i1> %2, <8 x i16> %1, <8 x i16> %passthru - ret <8 x i16> %3 -} - -declare <16 x i16> @llvm.x86.avx512.permvar.hi.256(<16 x i16>, <16 x i16>) - -define <16 x i16> @identity_test_permvar_hi_256(<16 x i16> %a0) { -; -; CHECK-LABEL: @identity_test_permvar_hi_256( -; CHECK-NEXT: ret <16 x i16> [[A0:%.*]] -; - %1 = call <16 x i16> @llvm.x86.avx512.permvar.hi.256(<16 x i16> %a0, <16 x i16> ) - ret <16 x i16> %1 -} - -define <16 x i16> @identity_test_permvar_hi_256_mask(<16 x i16> %a0, <16 x i16> %passthru, i16 %mask) { -; -; CHECK-LABEL: @identity_test_permvar_hi_256_mask( -; CHECK-NEXT: [[TMP1:%.*]] = bitcast i16 [[MASK:%.*]] to <16 x i1> -; CHECK-NEXT: [[TMP2:%.*]] = select <16 x i1> [[TMP1]], <16 x i16> [[A0:%.*]], <16 x i16> [[PASSTHRU:%.*]] -; CHECK-NEXT: ret <16 x i16> [[TMP2]] -; - %1 = call <16 x i16> @llvm.x86.avx512.permvar.hi.256(<16 x i16> %a0, <16 x i16> ) - %2 = bitcast i16 %mask to <16 x i1> - %3 = select <16 x i1> %2, <16 x i16> %1, <16 x i16> %passthru - ret <16 x i16> %3 -} - -define <16 x i16> @zero_test_permvar_hi_256(<16 x i16> %a0) { -; -; CHECK-LABEL: @zero_test_permvar_hi_256( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i16> [[A0:%.*]], <16 x i16> poison, <16 x i32> zeroinitializer -; CHECK-NEXT: ret <16 x i16> [[TMP1]] -; - %1 = call <16 x i16> @llvm.x86.avx512.permvar.hi.256(<16 x i16> %a0, <16 x i16> zeroinitializer) - ret <16 x i16> %1 -} - -define <16 x i16> @zero_test_permvar_hi_256_mask(<16 x i16> %a0, <16 x i16> %passthru, i16 %mask) { -; -; CHECK-LABEL: @zero_test_permvar_hi_256_mask( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i16> [[A0:%.*]], <16 x i16> poison, <16 x i32> zeroinitializer -; CHECK-NEXT: [[TMP2:%.*]] = bitcast i16 [[MASK:%.*]] to <16 x i1> -; CHECK-NEXT: [[TMP3:%.*]] = select <16 x i1> [[TMP2]], <16 x i16> [[TMP1]], <16 x i16> [[PASSTHRU:%.*]] -; CHECK-NEXT: ret 
<16 x i16> [[TMP3]] -; - %1 = call <16 x i16> @llvm.x86.avx512.permvar.hi.256(<16 x i16> %a0, <16 x i16> zeroinitializer) - %2 = bitcast i16 %mask to <16 x i1> - %3 = select <16 x i1> %2, <16 x i16> %1, <16 x i16> %passthru - ret <16 x i16> %3 -} - -define <16 x i16> @shuffle_test_permvar_hi_256(<16 x i16> %a0) { -; -; CHECK-LABEL: @shuffle_test_permvar_hi_256( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i16> [[A0:%.*]], <16 x i16> poison, <16 x i32> -; CHECK-NEXT: ret <16 x i16> [[TMP1]] -; - %1 = call <16 x i16> @llvm.x86.avx512.permvar.hi.256(<16 x i16> %a0, <16 x i16> ) - ret <16 x i16> %1 -} - -define <16 x i16> @shuffle_test_permvar_hi_256_mask(<16 x i16> %a0, <16 x i16> %passthru, i16 %mask) { -; -; CHECK-LABEL: @shuffle_test_permvar_hi_256_mask( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i16> [[A0:%.*]], <16 x i16> poison, <16 x i32> -; CHECK-NEXT: [[TMP2:%.*]] = bitcast i16 [[MASK:%.*]] to <16 x i1> -; CHECK-NEXT: [[TMP3:%.*]] = select <16 x i1> [[TMP2]], <16 x i16> [[TMP1]], <16 x i16> [[PASSTHRU:%.*]] -; CHECK-NEXT: ret <16 x i16> [[TMP3]] -; - %1 = call <16 x i16> @llvm.x86.avx512.permvar.hi.256(<16 x i16> %a0, <16 x i16> ) - %2 = bitcast i16 %mask to <16 x i1> - %3 = select <16 x i1> %2, <16 x i16> %1, <16 x i16> %passthru - ret <16 x i16> %3 -} - -define <16 x i16> @undef_test_permvar_hi_256(<16 x i16> %a0) { -; -; CHECK-LABEL: @undef_test_permvar_hi_256( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i16> [[A0:%.*]], <16 x i16> poison, <16 x i32> -; CHECK-NEXT: ret <16 x i16> [[TMP1]] -; - %1 = call <16 x i16> @llvm.x86.avx512.permvar.hi.256(<16 x i16> %a0, <16 x i16> ) - ret <16 x i16> %1 -} - -define <16 x i16> @undef_test_permvar_hi_256_mask(<16 x i16> %a0, <16 x i16> %passthru, i16 %mask) { -; -; CHECK-LABEL: @undef_test_permvar_hi_256_mask( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i16> [[A0:%.*]], <16 x i16> poison, <16 x i32> -; CHECK-NEXT: [[TMP2:%.*]] = bitcast i16 [[MASK:%.*]] to <16 x i1> -; CHECK-NEXT: [[TMP3:%.*]] 
= select <16 x i1> [[TMP2]], <16 x i16> [[TMP1]], <16 x i16> [[PASSTHRU:%.*]] -; CHECK-NEXT: ret <16 x i16> [[TMP3]] -; - %1 = call <16 x i16> @llvm.x86.avx512.permvar.hi.256(<16 x i16> %a0, <16 x i16> ) - %2 = bitcast i16 %mask to <16 x i1> - %3 = select <16 x i1> %2, <16 x i16> %1, <16 x i16> %passthru - ret <16 x i16> %3 -} - -declare <32 x i16> @llvm.x86.avx512.permvar.hi.512(<32 x i16>, <32 x i16>) - -define <32 x i16> @identity_test_permvar_hi_512(<32 x i16> %a0) { -; -; CHECK-LABEL: @identity_test_permvar_hi_512( -; CHECK-NEXT: ret <32 x i16> [[A0:%.*]] -; - %1 = call <32 x i16> @llvm.x86.avx512.permvar.hi.512(<32 x i16> %a0, <32 x i16> ) - ret <32 x i16> %1 -} - -define <32 x i16> @identity_test_permvar_hi_512_mask(<32 x i16> %a0, <32 x i16> %passthru, i32 %mask) { -; -; CHECK-LABEL: @identity_test_permvar_hi_512_mask( -; CHECK-NEXT: [[TMP1:%.*]] = bitcast i32 [[MASK:%.*]] to <32 x i1> -; CHECK-NEXT: [[TMP2:%.*]] = select <32 x i1> [[TMP1]], <32 x i16> [[A0:%.*]], <32 x i16> [[PASSTHRU:%.*]] -; CHECK-NEXT: ret <32 x i16> [[TMP2]] -; - %1 = call <32 x i16> @llvm.x86.avx512.permvar.hi.512(<32 x i16> %a0, <32 x i16> ) - %2 = bitcast i32 %mask to <32 x i1> - %3 = select <32 x i1> %2, <32 x i16> %1, <32 x i16> %passthru - ret <32 x i16> %3 -} - -define <32 x i16> @zero_test_permvar_hi_512(<32 x i16> %a0) { -; -; CHECK-LABEL: @zero_test_permvar_hi_512( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <32 x i16> [[A0:%.*]], <32 x i16> poison, <32 x i32> zeroinitializer -; CHECK-NEXT: ret <32 x i16> [[TMP1]] -; - %1 = call <32 x i16> @llvm.x86.avx512.permvar.hi.512(<32 x i16> %a0, <32 x i16> zeroinitializer) - ret <32 x i16> %1 -} - -define <32 x i16> @zero_test_permvar_hi_512_mask(<32 x i16> %a0, <32 x i16> %passthru, i32 %mask) { -; -; CHECK-LABEL: @zero_test_permvar_hi_512_mask( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <32 x i16> [[A0:%.*]], <32 x i16> poison, <32 x i32> zeroinitializer -; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32 [[MASK:%.*]] to <32 x i1> -; 
CHECK-NEXT: [[TMP3:%.*]] = select <32 x i1> [[TMP2]], <32 x i16> [[TMP1]], <32 x i16> [[PASSTHRU:%.*]] -; CHECK-NEXT: ret <32 x i16> [[TMP3]] -; - %1 = call <32 x i16> @llvm.x86.avx512.permvar.hi.512(<32 x i16> %a0, <32 x i16> zeroinitializer) - %2 = bitcast i32 %mask to <32 x i1> - %3 = select <32 x i1> %2, <32 x i16> %1, <32 x i16> %passthru - ret <32 x i16> %3 -} - -define <32 x i16> @shuffle_test_permvar_hi_512(<32 x i16> %a0) { -; -; CHECK-LABEL: @shuffle_test_permvar_hi_512( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <32 x i16> [[A0:%.*]], <32 x i16> poison, <32 x i32> -; CHECK-NEXT: ret <32 x i16> [[TMP1]] -; - %1 = call <32 x i16> @llvm.x86.avx512.permvar.hi.512(<32 x i16> %a0, <32 x i16> ) - ret <32 x i16> %1 -} - -define <32 x i16> @shuffle_test_permvar_hi_512_mask(<32 x i16> %a0, <32 x i16> %passthru, i32 %mask) { -; -; CHECK-LABEL: @shuffle_test_permvar_hi_512_mask( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <32 x i16> [[A0:%.*]], <32 x i16> poison, <32 x i32> -; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32 [[MASK:%.*]] to <32 x i1> -; CHECK-NEXT: [[TMP3:%.*]] = select <32 x i1> [[TMP2]], <32 x i16> [[TMP1]], <32 x i16> [[PASSTHRU:%.*]] -; CHECK-NEXT: ret <32 x i16> [[TMP3]] -; - %1 = call <32 x i16> @llvm.x86.avx512.permvar.hi.512(<32 x i16> %a0, <32 x i16> ) - %2 = bitcast i32 %mask to <32 x i1> - %3 = select <32 x i1> %2, <32 x i16> %1, <32 x i16> %passthru - ret <32 x i16> %3 -} - -define <32 x i16> @undef_test_permvar_hi_512(<32 x i16> %a0) { -; -; CHECK-LABEL: @undef_test_permvar_hi_512( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <32 x i16> [[A0:%.*]], <32 x i16> poison, <32 x i32> -; CHECK-NEXT: ret <32 x i16> [[TMP1]] -; - %1 = call <32 x i16> @llvm.x86.avx512.permvar.hi.512(<32 x i16> %a0, <32 x i16> ) - ret <32 x i16> %1 -} - -define <32 x i16> @undef_test_permvar_hi_512_mask(<32 x i16> %a0, <32 x i16> %passthru, i32 %mask) { -; -; CHECK-LABEL: @undef_test_permvar_hi_512_mask( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <32 x i16> [[A0:%.*]], <32 
x i16> poison, <32 x i32> -; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32 [[MASK:%.*]] to <32 x i1> -; CHECK-NEXT: [[TMP3:%.*]] = select <32 x i1> [[TMP2]], <32 x i16> [[TMP1]], <32 x i16> [[PASSTHRU:%.*]] -; CHECK-NEXT: ret <32 x i16> [[TMP3]] -; - %1 = call <32 x i16> @llvm.x86.avx512.permvar.hi.512(<32 x i16> %a0, <32 x i16> ) - %2 = bitcast i32 %mask to <32 x i1> - %3 = select <32 x i1> %2, <32 x i16> %1, <32 x i16> %passthru - ret <32 x i16> %3 -} - -declare <16 x i8> @llvm.x86.avx512.permvar.qi.128(<16 x i8>, <16 x i8>) - -define <16 x i8> @identity_test_permvar_qi_128(<16 x i8> %a0) { -; -; CHECK-LABEL: @identity_test_permvar_qi_128( -; CHECK-NEXT: ret <16 x i8> [[A0:%.*]] -; - %1 = call <16 x i8> @llvm.x86.avx512.permvar.qi.128(<16 x i8> %a0, <16 x i8> ) - ret <16 x i8> %1 -} - -define <16 x i8> @identity_test_permvar_qi_128_mask(<16 x i8> %a0, <16 x i8> %passthru, i16 %mask) { -; -; CHECK-LABEL: @identity_test_permvar_qi_128_mask( -; CHECK-NEXT: [[TMP1:%.*]] = bitcast i16 [[MASK:%.*]] to <16 x i1> -; CHECK-NEXT: [[TMP2:%.*]] = select <16 x i1> [[TMP1]], <16 x i8> [[A0:%.*]], <16 x i8> [[PASSTHRU:%.*]] -; CHECK-NEXT: ret <16 x i8> [[TMP2]] -; - %1 = call <16 x i8> @llvm.x86.avx512.permvar.qi.128(<16 x i8> %a0, <16 x i8> ) - %2 = bitcast i16 %mask to <16 x i1> - %3 = select <16 x i1> %2, <16 x i8> %1, <16 x i8> %passthru - ret <16 x i8> %3 -} - -define <16 x i8> @zero_test_permvar_qi_128(<16 x i8> %a0) { -; -; CHECK-LABEL: @zero_test_permvar_qi_128( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i8> [[A0:%.*]], <16 x i8> poison, <16 x i32> zeroinitializer -; CHECK-NEXT: ret <16 x i8> [[TMP1]] -; - %1 = call <16 x i8> @llvm.x86.avx512.permvar.qi.128(<16 x i8> %a0, <16 x i8> zeroinitializer) - ret <16 x i8> %1 -} - -define <16 x i8> @zero_test_permvar_qi_128_mask(<16 x i8> %a0, <16 x i8> %passthru, i16 %mask) { -; -; CHECK-LABEL: @zero_test_permvar_qi_128_mask( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i8> [[A0:%.*]], <16 x i8> poison, <16 x i32> 
zeroinitializer -; CHECK-NEXT: [[TMP2:%.*]] = bitcast i16 [[MASK:%.*]] to <16 x i1> -; CHECK-NEXT: [[TMP3:%.*]] = select <16 x i1> [[TMP2]], <16 x i8> [[TMP1]], <16 x i8> [[PASSTHRU:%.*]] -; CHECK-NEXT: ret <16 x i8> [[TMP3]] -; - %1 = call <16 x i8> @llvm.x86.avx512.permvar.qi.128(<16 x i8> %a0, <16 x i8> zeroinitializer) - %2 = bitcast i16 %mask to <16 x i1> - %3 = select <16 x i1> %2, <16 x i8> %1, <16 x i8> %passthru - ret <16 x i8> %3 -} - -define <16 x i8> @shuffle_test_permvar_qi_128(<16 x i8> %a0) { -; -; CHECK-LABEL: @shuffle_test_permvar_qi_128( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i8> [[A0:%.*]], <16 x i8> poison, <16 x i32> -; CHECK-NEXT: ret <16 x i8> [[TMP1]] -; - %1 = call <16 x i8> @llvm.x86.avx512.permvar.qi.128(<16 x i8> %a0, <16 x i8> ) - ret <16 x i8> %1 -} - -define <16 x i8> @shuffle_test_permvar_qi_128_mask(<16 x i8> %a0, <16 x i8> %passthru, i16 %mask) { -; -; CHECK-LABEL: @shuffle_test_permvar_qi_128_mask( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i8> [[A0:%.*]], <16 x i8> poison, <16 x i32> -; CHECK-NEXT: [[TMP2:%.*]] = bitcast i16 [[MASK:%.*]] to <16 x i1> -; CHECK-NEXT: [[TMP3:%.*]] = select <16 x i1> [[TMP2]], <16 x i8> [[TMP1]], <16 x i8> [[PASSTHRU:%.*]] -; CHECK-NEXT: ret <16 x i8> [[TMP3]] -; - %1 = call <16 x i8> @llvm.x86.avx512.permvar.qi.128(<16 x i8> %a0, <16 x i8> ) - %2 = bitcast i16 %mask to <16 x i1> - %3 = select <16 x i1> %2, <16 x i8> %1, <16 x i8> %passthru - ret <16 x i8> %3 -} - -define <16 x i8> @undef_test_permvar_qi_128(<16 x i8> %a0) { -; -; CHECK-LABEL: @undef_test_permvar_qi_128( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i8> [[A0:%.*]], <16 x i8> poison, <16 x i32> -; CHECK-NEXT: ret <16 x i8> [[TMP1]] -; - %1 = call <16 x i8> @llvm.x86.avx512.permvar.qi.128(<16 x i8> %a0, <16 x i8> ) - ret <16 x i8> %1 -} - -define <16 x i8> @undef_test_permvar_qi_128_mask(<16 x i8> %a0, <16 x i8> %passthru, i16 %mask) { -; -; CHECK-LABEL: @undef_test_permvar_qi_128_mask( -; CHECK-NEXT: [[TMP1:%.*]] 
= shufflevector <16 x i8> [[A0:%.*]], <16 x i8> poison, <16 x i32> -; CHECK-NEXT: [[TMP2:%.*]] = bitcast i16 [[MASK:%.*]] to <16 x i1> -; CHECK-NEXT: [[TMP3:%.*]] = select <16 x i1> [[TMP2]], <16 x i8> [[TMP1]], <16 x i8> [[PASSTHRU:%.*]] -; CHECK-NEXT: ret <16 x i8> [[TMP3]] -; - %1 = call <16 x i8> @llvm.x86.avx512.permvar.qi.128(<16 x i8> %a0, <16 x i8> ) - %2 = bitcast i16 %mask to <16 x i1> - %3 = select <16 x i1> %2, <16 x i8> %1, <16 x i8> %passthru - ret <16 x i8> %3 -} - -declare <32 x i8> @llvm.x86.avx512.permvar.qi.256(<32 x i8>, <32 x i8>) - -define <32 x i8> @identity_test_permvar_qi_256(<32 x i8> %a0) { -; -; CHECK-LABEL: @identity_test_permvar_qi_256( -; CHECK-NEXT: ret <32 x i8> [[A0:%.*]] -; - %1 = call <32 x i8> @llvm.x86.avx512.permvar.qi.256(<32 x i8> %a0, <32 x i8> ) - ret <32 x i8> %1 -} - -define <32 x i8> @identity_test_permvar_qi_256_mask(<32 x i8> %a0, <32 x i8> %passthru, i32 %mask) { -; -; CHECK-LABEL: @identity_test_permvar_qi_256_mask( -; CHECK-NEXT: [[TMP1:%.*]] = bitcast i32 [[MASK:%.*]] to <32 x i1> -; CHECK-NEXT: [[TMP2:%.*]] = select <32 x i1> [[TMP1]], <32 x i8> [[A0:%.*]], <32 x i8> [[PASSTHRU:%.*]] -; CHECK-NEXT: ret <32 x i8> [[TMP2]] -; - %1 = call <32 x i8> @llvm.x86.avx512.permvar.qi.256(<32 x i8> %a0, <32 x i8> ) - %2 = bitcast i32 %mask to <32 x i1> - %3 = select <32 x i1> %2, <32 x i8> %1, <32 x i8> %passthru - ret <32 x i8> %3 -} - -define <32 x i8> @zero_test_permvar_qi_256(<32 x i8> %a0) { -; -; CHECK-LABEL: @zero_test_permvar_qi_256( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <32 x i8> [[A0:%.*]], <32 x i8> poison, <32 x i32> zeroinitializer -; CHECK-NEXT: ret <32 x i8> [[TMP1]] -; - %1 = call <32 x i8> @llvm.x86.avx512.permvar.qi.256(<32 x i8> %a0, <32 x i8> zeroinitializer) - ret <32 x i8> %1 -} - -define <32 x i8> @zero_test_permvar_qi_256_mask(<32 x i8> %a0, <32 x i8> %passthru, i32 %mask) { -; -; CHECK-LABEL: @zero_test_permvar_qi_256_mask( -; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <32 x i8> [[A0:%.*]], <32 x 
i8> poison, <32 x i32> zeroinitializer
-; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32 [[MASK:%.*]] to <32 x i1>
-; CHECK-NEXT: [[TMP3:%.*]] = select <32 x i1> [[TMP2]], <32 x i8> [[TMP1]], <32 x i8> [[PASSTHRU:%.*]]
-; CHECK-NEXT: ret <32 x i8> [[TMP3]]
-;
- %1 = call <32 x i8> @llvm.x86.avx512.permvar.qi.256(<32 x i8> %a0, <32 x i8> zeroinitializer)
- %2 = bitcast i32 %mask to <32 x i1>
- %3 = select <32 x i1> %2, <32 x i8> %1, <32 x i8> %passthru
- ret <32 x i8> %3
-}
-
-define <32 x i8> @shuffle_test_permvar_qi_256(<32 x i8> %a0) {
-;
-; CHECK-LABEL: @shuffle_test_permvar_qi_256(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <32 x i8> [[A0:%.*]], <32 x i8> poison, <32 x i32>
-; CHECK-NEXT: ret <32 x i8> [[TMP1]]
-;
- %1 = call <32 x i8> @llvm.x86.avx512.permvar.qi.256(<32 x i8> %a0, <32 x i8> )
- ret <32 x i8> %1
-}
-
-define <32 x i8> @shuffle_test_permvar_qi_256_mask(<32 x i8> %a0, <32 x i8> %passthru, i32 %mask) {
-;
-; CHECK-LABEL: @shuffle_test_permvar_qi_256_mask(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <32 x i8> [[A0:%.*]], <32 x i8> poison, <32 x i32>
-; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32 [[MASK:%.*]] to <32 x i1>
-; CHECK-NEXT: [[TMP3:%.*]] = select <32 x i1> [[TMP2]], <32 x i8> [[TMP1]], <32 x i8> [[PASSTHRU:%.*]]
-; CHECK-NEXT: ret <32 x i8> [[TMP3]]
-;
- %1 = call <32 x i8> @llvm.x86.avx512.permvar.qi.256(<32 x i8> %a0, <32 x i8> )
- %2 = bitcast i32 %mask to <32 x i1>
- %3 = select <32 x i1> %2, <32 x i8> %1, <32 x i8> %passthru
- ret <32 x i8> %3
-}
-
-define <32 x i8> @undef_test_permvar_qi_256(<32 x i8> %a0) {
-;
-; CHECK-LABEL: @undef_test_permvar_qi_256(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <32 x i8> [[A0:%.*]], <32 x i8> poison, <32 x i32>
-; CHECK-NEXT: ret <32 x i8> [[TMP1]]
-;
- %1 = call <32 x i8> @llvm.x86.avx512.permvar.qi.256(<32 x i8> %a0, <32 x i8> )
- ret <32 x i8> %1
-}
-
-define <32 x i8> @undef_test_permvar_qi_256_mask(<32 x i8> %a0, <32 x i8> %passthru, i32 %mask) {
-;
-; CHECK-LABEL: @undef_test_permvar_qi_256_mask(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <32 x i8> [[A0:%.*]], <32 x i8> poison, <32 x i32>
-; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32 [[MASK:%.*]] to <32 x i1>
-; CHECK-NEXT: [[TMP3:%.*]] = select <32 x i1> [[TMP2]], <32 x i8> [[TMP1]], <32 x i8> [[PASSTHRU:%.*]]
-; CHECK-NEXT: ret <32 x i8> [[TMP3]]
-;
- %1 = call <32 x i8> @llvm.x86.avx512.permvar.qi.256(<32 x i8> %a0, <32 x i8> )
- %2 = bitcast i32 %mask to <32 x i1>
- %3 = select <32 x i1> %2, <32 x i8> %1, <32 x i8> %passthru
- ret <32 x i8> %3
-}
-
-declare <64 x i8> @llvm.x86.avx512.permvar.qi.512(<64 x i8>, <64 x i8>)
-
-define <64 x i8> @identity_test_permvar_qi_512(<64 x i8> %a0) {
-;
-; CHECK-LABEL: @identity_test_permvar_qi_512(
-; CHECK-NEXT: ret <64 x i8> [[A0:%.*]]
-;
- %1 = call <64 x i8> @llvm.x86.avx512.permvar.qi.512(<64 x i8> %a0, <64 x i8> )
- ret <64 x i8> %1
-}
-
-define <64 x i8> @identity_test_permvar_qi_512_mask(<64 x i8> %a0, <64 x i8> %passthru, i64 %mask) {
-;
-; CHECK-LABEL: @identity_test_permvar_qi_512_mask(
-; CHECK-NEXT: [[TMP1:%.*]] = bitcast i64 [[MASK:%.*]] to <64 x i1>
-; CHECK-NEXT: [[TMP2:%.*]] = select <64 x i1> [[TMP1]], <64 x i8> [[A0:%.*]], <64 x i8> [[PASSTHRU:%.*]]
-; CHECK-NEXT: ret <64 x i8> [[TMP2]]
-;
- %1 = call <64 x i8> @llvm.x86.avx512.permvar.qi.512(<64 x i8> %a0, <64 x i8> )
- %2 = bitcast i64 %mask to <64 x i1>
- %3 = select <64 x i1> %2, <64 x i8> %1, <64 x i8> %passthru
- ret <64 x i8> %3
-}
-
-define <64 x i8> @zero_test_permvar_qi_512(<64 x i8> %a0) {
-;
-; CHECK-LABEL: @zero_test_permvar_qi_512(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <64 x i8> [[A0:%.*]], <64 x i8> poison, <64 x i32> zeroinitializer
-; CHECK-NEXT: ret <64 x i8> [[TMP1]]
-;
- %1 = call <64 x i8> @llvm.x86.avx512.permvar.qi.512(<64 x i8> %a0, <64 x i8> zeroinitializer)
- ret <64 x i8> %1
-}
-
-define <64 x i8> @zero_test_permvar_qi_512_mask(<64 x i8> %a0, <64 x i8> %passthru, i64 %mask) {
-;
-; CHECK-LABEL: @zero_test_permvar_qi_512_mask(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <64 x i8> [[A0:%.*]], <64 x i8> poison, <64 x i32> zeroinitializer
-; CHECK-NEXT: [[TMP2:%.*]] = bitcast i64 [[MASK:%.*]] to <64 x i1>
-; CHECK-NEXT: [[TMP3:%.*]] = select <64 x i1> [[TMP2]], <64 x i8> [[TMP1]], <64 x i8> [[PASSTHRU:%.*]]
-; CHECK-NEXT: ret <64 x i8> [[TMP3]]
-;
- %1 = call <64 x i8> @llvm.x86.avx512.permvar.qi.512(<64 x i8> %a0, <64 x i8> zeroinitializer)
- %2 = bitcast i64 %mask to <64 x i1>
- %3 = select <64 x i1> %2, <64 x i8> %1, <64 x i8> %passthru
- ret <64 x i8> %3
-}
-
-define <64 x i8> @shuffle_test_permvar_qi_512(<64 x i8> %a0) {
-;
-; CHECK-LABEL: @shuffle_test_permvar_qi_512(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <64 x i8> [[A0:%.*]], <64 x i8> poison, <64 x i32>
-; CHECK-NEXT: ret <64 x i8> [[TMP1]]
-;
- %1 = call <64 x i8> @llvm.x86.avx512.permvar.qi.512(<64 x i8> %a0, <64 x i8> )
- ret <64 x i8> %1
-}
-
-define <64 x i8> @shuffle_test_permvar_qi_512_mask(<64 x i8> %a0, <64 x i8> %passthru, i64 %mask) {
-;
-; CHECK-LABEL: @shuffle_test_permvar_qi_512_mask(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <64 x i8> [[A0:%.*]], <64 x i8> poison, <64 x i32>
-; CHECK-NEXT: [[TMP2:%.*]] = bitcast i64 [[MASK:%.*]] to <64 x i1>
-; CHECK-NEXT: [[TMP3:%.*]] = select <64 x i1> [[TMP2]], <64 x i8> [[TMP1]], <64 x i8> [[PASSTHRU:%.*]]
-; CHECK-NEXT: ret <64 x i8> [[TMP3]]
-;
- %1 = call <64 x i8> @llvm.x86.avx512.permvar.qi.512(<64 x i8> %a0, <64 x i8> )
- %2 = bitcast i64 %mask to <64 x i1>
- %3 = select <64 x i1> %2, <64 x i8> %1, <64 x i8> %passthru
- ret <64 x i8> %3
-}
-
-define <64 x i8> @undef_test_permvar_qi_512(<64 x i8> %a0) {
-;
-; CHECK-LABEL: @undef_test_permvar_qi_512(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <64 x i8> [[A0:%.*]], <64 x i8> poison, <64 x i32>
-; CHECK-NEXT: ret <64 x i8> [[TMP1]]
-;
- %1 = call <64 x i8> @llvm.x86.avx512.permvar.qi.512(<64 x i8> %a0, <64 x i8> )
- ret <64 x i8> %1
-}
-
-define <64 x i8> @undef_test_permvar_qi_512_mask(<64 x i8> %a0, <64 x i8> %passthru, i64 %mask) {
-;
-; CHECK-LABEL: @undef_test_permvar_qi_512_mask(
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <64 x i8> [[A0:%.*]], <64 x i8> poison, <64 x i32>
-; CHECK-NEXT: [[TMP2:%.*]] = bitcast i64 [[MASK:%.*]] to <64 x i1>
-; CHECK-NEXT: [[TMP3:%.*]] = select <64 x i1> [[TMP2]], <64 x i8> [[TMP1]], <64 x i8> [[PASSTHRU:%.*]]
-; CHECK-NEXT: ret <64 x i8> [[TMP3]]
-;
- %1 = call <64 x i8> @llvm.x86.avx512.permvar.qi.512(<64 x i8> %a0, <64 x i8> )
- %2 = bitcast i64 %mask to <64 x i1>
- %3 = select <64 x i1> %2, <64 x i8> %1, <64 x i8> %passthru
- ret <64 x i8> %3
-}
-
 declare <16 x float> @llvm.x86.avx512.add.ps.512(<16 x float>, <16 x float>, i32)
 
 define <16 x float> @test_add_ps(<16 x float> %a, <16 x float> %b) {
diff --git a/llvm/test/Transforms/InstCombine/X86/x86-vperm.ll b/llvm/test/Transforms/InstCombine/X86/x86-vperm.ll
new file mode 100644
index 0000000000000..6519e4f534848
--- /dev/null
+++ b/llvm/test/Transforms/InstCombine/X86/x86-vperm.ll
@@ -0,0 +1,1404 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; RUN: opt < %s -passes=instcombine -mtriple=x86_64-unknown-unknown -S | FileCheck %s
+
+declare <8 x i32> @llvm.x86.avx2.permd(<8 x i32>, <8 x i32>)
+
+define <8 x i32> @identity_test_permvar_si_256(<8 x i32> %a0) {
+; CHECK-LABEL: @identity_test_permvar_si_256(
+; CHECK-NEXT: ret <8 x i32> [[A0:%.*]]
+;
+ %1 = call <8 x i32> @llvm.x86.avx2.permd(<8 x i32> %a0, <8 x i32> )
+ ret <8 x i32> %1
+}
+
+define <8 x i32> @identity_test_permvar_si_256_mask(<8 x i32> %a0, <8 x i32> %passthru, i8 %mask) {
+; CHECK-LABEL: @identity_test_permvar_si_256_mask(
+; CHECK-NEXT: [[TMP1:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1>
+; CHECK-NEXT: [[TMP2:%.*]] = select <8 x i1> [[TMP1]], <8 x i32> [[A0:%.*]], <8 x i32> [[PASSTHRU:%.*]]
+; CHECK-NEXT: ret <8 x i32> [[TMP2]]
+;
+ %1 = call <8 x i32> @llvm.x86.avx2.permd(<8 x i32> %a0, <8 x i32> )
+ %2 = bitcast i8 %mask to <8 x i1>
+ %3 = select <8 x i1> %2, <8 x i32> %1, <8 x i32> %passthru
+ ret <8 x i32> %3
+}
+
+define <8 x i32> @zero_test_permvar_si_256(<8 x i32> %a0) {
+; CHECK-LABEL: @zero_test_permvar_si_256(
+; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A0:%.*]], <8 x i32> poison, <8 x i32> zeroinitializer
+; CHECK-NEXT: ret <8 x i32> [[TMP1]]
+;
+ %1 = call <8 x i32> @llvm.x86.avx2.permd(<8 x i32> %a0, <8 x i32> zeroinitializer)
+ ret <8 x i32> %1
+}
+
+define <8 x i32> @zero_test_permvar_si_256_mask(<8 x i32> %a0, <8 x i32> %passthru, i8 %mask) {
+; CHECK-LABEL: @zero_test_permvar_si_256_mask(
+; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A0:%.*]], <8 x i32> poison, <8 x i32> zeroinitializer
+; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1>
+; CHECK-NEXT: [[TMP3:%.*]] = select <8 x i1> [[TMP2]], <8 x i32> [[TMP1]], <8 x i32> [[PASSTHRU:%.*]]
+; CHECK-NEXT: ret <8 x i32> [[TMP3]]
+;
+ %1 = call <8 x i32> @llvm.x86.avx2.permd(<8 x i32> %a0, <8 x i32> zeroinitializer)
+ %2 = bitcast i8 %mask to <8 x i1>
+ %3 = select <8 x i1> %2, <8 x i32> %1, <8 x i32> %passthru
+ ret <8 x i32> %3
+}
+
+define <8 x i32> @shuffle_test_permvar_si_256(<8 x i32> %a0) {
+; CHECK-LABEL: @shuffle_test_permvar_si_256(
+; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A0:%.*]], <8 x i32> poison, <8 x i32>
+; CHECK-NEXT: ret <8 x i32> [[TMP1]]
+;
+ %1 = call <8 x i32> @llvm.x86.avx2.permd(<8 x i32> %a0, <8 x i32> )
+ ret <8 x i32> %1
+}
+
+define <8 x i32> @shuffle_test_permvar_si_256_mask(<8 x i32> %a0, <8 x i32> %passthru, i8 %mask) {
+; CHECK-LABEL: @shuffle_test_permvar_si_256_mask(
+; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A0:%.*]], <8 x i32> poison, <8 x i32>
+; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1>
+; CHECK-NEXT: [[TMP3:%.*]] = select <8 x i1> [[TMP2]], <8 x i32> [[TMP1]], <8 x i32> [[PASSTHRU:%.*]]
+; CHECK-NEXT: ret <8 x i32> [[TMP3]]
+;
+ %1 = call <8 x i32> @llvm.x86.avx2.permd(<8 x i32> %a0, <8 x i32> )
+ %2 = bitcast i8 %mask to <8 x i1>
+ %3 = select <8 x i1> %2, <8 x i32> %1, <8 x i32> %passthru
+ ret <8 x i32> %3
+}
+
+define <8 x i32> @undef_test_permvar_si_256(<8 x i32> %a0) {
+; CHECK-LABEL: @undef_test_permvar_si_256(
+; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A0:%.*]], <8 x i32> poison, <8 x i32>
+; CHECK-NEXT: ret <8 x i32> [[TMP1]]
+;
+ %1 = call <8 x i32> @llvm.x86.avx2.permd(<8 x i32> %a0, <8 x i32> )
+ ret <8 x i32> %1
+}
+
+define <8 x i32> @undef_test_permvar_si_256_mask(<8 x i32> %a0, <8 x i32> %passthru, i8 %mask) {
+; CHECK-LABEL: @undef_test_permvar_si_256_mask(
+; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A0:%.*]], <8 x i32> poison, <8 x i32>
+; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1>
+; CHECK-NEXT: [[TMP3:%.*]] = select <8 x i1> [[TMP2]], <8 x i32> [[TMP1]], <8 x i32> [[PASSTHRU:%.*]]
+; CHECK-NEXT: ret <8 x i32> [[TMP3]]
+;
+ %1 = call <8 x i32> @llvm.x86.avx2.permd(<8 x i32> %a0, <8 x i32> )
+ %2 = bitcast i8 %mask to <8 x i1>
+ %3 = select <8 x i1> %2, <8 x i32> %1, <8 x i32> %passthru
+ ret <8 x i32> %3
+}
+
+define <8 x i32> @demandedbit_test_permvar_si_256_mask(<8 x i32> %a0, <8 x i32> %a1) {
+; CHECK-LABEL: @demandedbit_test_permvar_si_256_mask(
+; CHECK-NEXT: [[M:%.*]] = or <8 x i32> [[A1:%.*]],
+; CHECK-NEXT: [[S:%.*]] = call <8 x i32> @llvm.x86.avx2.permd(<8 x i32> [[A0:%.*]], <8 x i32> [[M]])
+; CHECK-NEXT: ret <8 x i32> [[S]]
+;
+ %m = or <8 x i32> %a1,
+ %s = call <8 x i32> @llvm.x86.avx2.permd(<8 x i32> %a0, <8 x i32> %m)
+ ret <8 x i32> %s
+}
+
+declare <8 x float> @llvm.x86.avx2.permps(<8 x float>, <8 x i32>)
+
+define <8 x float> @identity_test_permvar_sf_256(<8 x float> %a0) {
+; CHECK-LABEL: @identity_test_permvar_sf_256(
+; CHECK-NEXT: ret <8 x float> [[A0:%.*]]
+;
+ %1 = call <8 x float> @llvm.x86.avx2.permps(<8 x float> %a0, <8 x i32> )
+ ret <8 x float> %1
+}
+
+define <8 x float> @identity_test_permvar_sf_256_mask(<8 x float> %a0, <8 x float> %passthru, i8 %mask) {
+; CHECK-LABEL: @identity_test_permvar_sf_256_mask(
+; CHECK-NEXT: [[TMP1:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1>
+; CHECK-NEXT: [[TMP2:%.*]] = select <8 x i1> [[TMP1]], <8 x float> [[A0:%.*]], <8 x float> [[PASSTHRU:%.*]]
+; CHECK-NEXT: ret <8 x float> [[TMP2]]
+;
+ %1 = call <8 x float> @llvm.x86.avx2.permps(<8 x float> %a0, <8 x i32> )
+ %2 = bitcast i8 %mask to <8 x i1>
+ %3 = select <8 x i1> %2, <8 x float> %1, <8 x float> %passthru
+ ret <8 x float> %3
+}
+
+define <8 x float> @zero_test_permvar_sf_256(<8 x float> %a0) {
+; CHECK-LABEL: @zero_test_permvar_sf_256(
+; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A0:%.*]], <8 x float> poison, <8 x i32> zeroinitializer
+; CHECK-NEXT: ret <8 x float> [[TMP1]]
+;
+ %1 = call <8 x float> @llvm.x86.avx2.permps(<8 x float> %a0, <8 x i32> zeroinitializer)
+ ret <8 x float> %1
+}
+
+define <8 x float> @zero_test_permvar_sf_256_mask(<8 x float> %a0, <8 x float> %passthru, i8 %mask) {
+; CHECK-LABEL: @zero_test_permvar_sf_256_mask(
+; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A0:%.*]], <8 x float> poison, <8 x i32> zeroinitializer
+; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1>
+; CHECK-NEXT: [[TMP3:%.*]] = select <8 x i1> [[TMP2]], <8 x float> [[TMP1]], <8 x float> [[PASSTHRU:%.*]]
+; CHECK-NEXT: ret <8 x float> [[TMP3]]
+;
+ %1 = call <8 x float> @llvm.x86.avx2.permps(<8 x float> %a0, <8 x i32> zeroinitializer)
+ %2 = bitcast i8 %mask to <8 x i1>
+ %3 = select <8 x i1> %2, <8 x float> %1, <8 x float> %passthru
+ ret <8 x float> %3
+}
+
+define <8 x float> @shuffle_test_permvar_sf_256(<8 x float> %a0) {
+; CHECK-LABEL: @shuffle_test_permvar_sf_256(
+; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A0:%.*]], <8 x float> poison, <8 x i32>
+; CHECK-NEXT: ret <8 x float> [[TMP1]]
+;
+ %1 = call <8 x float> @llvm.x86.avx2.permps(<8 x float> %a0, <8 x i32> )
+ ret <8 x float> %1
+}
+
+define <8 x float> @shuffle_test_permvar_sf_256_mask(<8 x float> %a0, <8 x float> %passthru, i8 %mask) {
+; CHECK-LABEL: @shuffle_test_permvar_sf_256_mask(
+; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A0:%.*]], <8 x float> poison, <8 x i32>
+; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1>
+; CHECK-NEXT: [[TMP3:%.*]] = select <8 x i1> [[TMP2]], <8 x float> [[TMP1]], <8 x float> [[PASSTHRU:%.*]]
+; CHECK-NEXT: ret <8 x float> [[TMP3]]
+;
+ %1 = call <8 x float> @llvm.x86.avx2.permps(<8 x float> %a0, <8 x i32> )
+ %2 = bitcast i8 %mask to <8 x i1>
+ %3 = select <8 x i1> %2, <8 x float> %1, <8 x float> %passthru
+ ret <8 x float> %3
+}
+
+define <8 x float> @undef_test_permvar_sf_256(<8 x float> %a0) {
+; CHECK-LABEL: @undef_test_permvar_sf_256(
+; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A0:%.*]], <8 x float> poison, <8 x i32>
+; CHECK-NEXT: ret <8 x float> [[TMP1]]
+;
+ %1 = call <8 x float> @llvm.x86.avx2.permps(<8 x float> %a0, <8 x i32> )
+ ret <8 x float> %1
+}
+
+define <8 x float> @undef_test_permvar_sf_256_mask(<8 x float> %a0, <8 x float> %passthru, i8 %mask) {
+; CHECK-LABEL: @undef_test_permvar_sf_256_mask(
+; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A0:%.*]], <8 x float> poison, <8 x i32>
+; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1>
+; CHECK-NEXT: [[TMP3:%.*]] = select <8 x i1> [[TMP2]], <8 x float> [[TMP1]], <8 x float> [[PASSTHRU:%.*]]
+; CHECK-NEXT: ret <8 x float> [[TMP3]]
+;
+ %1 = call <8 x float> @llvm.x86.avx2.permps(<8 x float> %a0, <8 x i32> )
+ %2 = bitcast i8 %mask to <8 x i1>
+ %3 = select <8 x i1> %2, <8 x float> %1, <8 x float> %passthru
+ ret <8 x float> %3
+}
+
+define <8 x float> @demandedbit_test_permvar_sf_256_mask(<8 x float> %a0, <8 x i32> %a1) {
+; CHECK-LABEL: @demandedbit_test_permvar_sf_256_mask(
+; CHECK-NEXT: [[M:%.*]] = or <8 x i32> [[A1:%.*]],
+; CHECK-NEXT: [[S:%.*]] = call <8 x float> @llvm.x86.avx2.permps(<8 x float> [[A0:%.*]], <8 x i32> [[M]])
+; CHECK-NEXT: ret <8 x float> [[S]]
+;
+ %m = or <8 x i32> %a1,
+ %s = call <8 x float> @llvm.x86.avx2.permps(<8 x float> %a0, <8 x i32> %m)
+ ret <8 x float> %s
+}
+
+declare <4 x i64> @llvm.x86.avx512.permvar.di.256(<4 x i64>, <4 x i64>)
+
+define <4 x i64> @identity_test_permvar_di_256(<4 x i64> %a0) {
+; CHECK-LABEL: @identity_test_permvar_di_256(
+; CHECK-NEXT: ret <4 x i64> [[A0:%.*]]
+;
+ %1 = call <4 x i64> @llvm.x86.avx512.permvar.di.256(<4 x i64> %a0, <4 x i64> )
+ ret <4 x i64> %1
+}
+
+define <4 x i64> @identity_test_permvar_di_256_mask(<4 x i64> %a0, <4 x i64> %passthru, i8 %mask) {
+; CHECK-LABEL: @identity_test_permvar_di_256_mask(
+; CHECK-NEXT: [[TMP1:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1>
+; CHECK-NEXT: [[EXTRACT:%.*]] = shufflevector <8 x i1> [[TMP1]], <8 x i1> poison, <4 x i32>
+; CHECK-NEXT: [[TMP2:%.*]] = select <4 x i1> [[EXTRACT]], <4 x i64> [[A0:%.*]], <4 x i64> [[PASSTHRU:%.*]]
+; CHECK-NEXT: ret <4 x i64> [[TMP2]]
+;
+ %1 = call <4 x i64> @llvm.x86.avx512.permvar.di.256(<4 x i64> %a0, <4 x i64> )
+ %2 = bitcast i8 %mask to <8 x i1>
+ %extract = shufflevector <8 x i1> %2, <8 x i1> %2, <4 x i32>
+ %3 = select <4 x i1> %extract, <4 x i64> %1, <4 x i64> %passthru
+ ret <4 x i64> %3
+}
+
+define <4 x i64> @zero_test_permvar_di_256(<4 x i64> %a0) {
+; CHECK-LABEL: @zero_test_permvar_di_256(
+; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x i64> [[A0:%.*]], <4 x i64> poison, <4 x i32> zeroinitializer
+; CHECK-NEXT: ret <4 x i64> [[TMP1]]
+;
+ %1 = call <4 x i64> @llvm.x86.avx512.permvar.di.256(<4 x i64> %a0, <4 x i64> zeroinitializer)
+ ret <4 x i64> %1
+}
+
+define <4 x i64> @zero_test_permvar_di_256_mask(<4 x i64> %a0, <4 x i64> %passthru, i8 %mask) {
+; CHECK-LABEL: @zero_test_permvar_di_256_mask(
+; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x i64> [[A0:%.*]], <4 x i64> poison, <4 x i32> zeroinitializer
+; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1>
+; CHECK-NEXT: [[EXTRACT:%.*]] = shufflevector <8 x i1> [[TMP2]], <8 x i1> poison, <4 x i32>
+; CHECK-NEXT: [[TMP3:%.*]] = select <4 x i1> [[EXTRACT]], <4 x i64> [[TMP1]], <4 x i64> [[PASSTHRU:%.*]]
+; CHECK-NEXT: ret <4 x i64> [[TMP3]]
+;
+ %1 = call <4 x i64> @llvm.x86.avx512.permvar.di.256(<4 x i64> %a0, <4 x i64> zeroinitializer)
+ %2 = bitcast i8 %mask to <8 x i1>
+ %extract = shufflevector <8 x i1> %2, <8 x i1> %2, <4 x i32>
+ %3 = select <4 x i1> %extract, <4 x i64> %1, <4 x i64> %passthru
+ ret <4 x i64> %3
+}
+
+define <4 x i64> @shuffle_test_permvar_di_256(<4 x i64> %a0) {
+; CHECK-LABEL: @shuffle_test_permvar_di_256(
+; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x i64> [[A0:%.*]], <4 x i64> poison, <4 x i32>
+; CHECK-NEXT: ret <4 x i64> [[TMP1]]
+;
+ %1 = call <4 x i64> @llvm.x86.avx512.permvar.di.256(<4 x i64> %a0, <4 x i64> )
+ ret <4 x i64> %1
+}
+
+define <4 x i64> @shuffle_test_permvar_di_256_mask(<4 x i64> %a0, <4 x i64> %passthru, i8 %mask) {
+; CHECK-LABEL: @shuffle_test_permvar_di_256_mask(
+; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x i64> [[A0:%.*]], <4 x i64> poison, <4 x i32>
+; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1>
+; CHECK-NEXT: [[EXTRACT:%.*]] = shufflevector <8 x i1> [[TMP2]], <8 x i1> poison, <4 x i32>
+; CHECK-NEXT: [[TMP3:%.*]] = select <4 x i1> [[EXTRACT]], <4 x i64> [[TMP1]], <4 x i64> [[PASSTHRU:%.*]]
+; CHECK-NEXT: ret <4 x i64> [[TMP3]]
+;
+ %1 = call <4 x i64> @llvm.x86.avx512.permvar.di.256(<4 x i64> %a0, <4 x i64> )
+ %2 = bitcast i8 %mask to <8 x i1>
+ %extract = shufflevector <8 x i1> %2, <8 x i1> %2, <4 x i32>
+ %3 = select <4 x i1> %extract, <4 x i64> %1, <4 x i64> %passthru
+ ret <4 x i64> %3
+}
+
+define <4 x i64> @undef_test_permvar_di_256(<4 x i64> %a0) {
+; CHECK-LABEL: @undef_test_permvar_di_256(
+; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x i64> [[A0:%.*]], <4 x i64> poison, <4 x i32>
+; CHECK-NEXT: ret <4 x i64> [[TMP1]]
+;
+ %1 = call <4 x i64> @llvm.x86.avx512.permvar.di.256(<4 x i64> %a0, <4 x i64> )
+ ret <4 x i64> %1
+}
+
+define <4 x i64> @undef_test_permvar_di_256_mask(<4 x i64> %a0, <4 x i64> %passthru, i8 %mask) {
+; CHECK-LABEL: @undef_test_permvar_di_256_mask(
+; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x i64> [[A0:%.*]], <4 x i64> poison, <4 x i32>
+; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1>
+; CHECK-NEXT: [[EXTRACT:%.*]] = shufflevector <8 x i1> [[TMP2]], <8 x i1> poison, <4 x i32>
+; CHECK-NEXT: [[TMP3:%.*]] = select <4 x i1> [[EXTRACT]], <4 x i64> [[TMP1]], <4 x i64> [[PASSTHRU:%.*]]
+; CHECK-NEXT: ret <4 x i64> [[TMP3]]
+;
+ %1 = call <4 x i64> @llvm.x86.avx512.permvar.di.256(<4 x i64> %a0, <4 x i64> )
+ %2 = bitcast i8 %mask to <8 x i1>
+ %extract = shufflevector <8 x i1> %2, <8 x i1> %2, <4 x i32>
+ %3 = select <4 x i1> %extract, <4 x i64> %1, <4 x i64> %passthru
+ ret <4 x i64> %3
+}
+
+define <4 x i64> @demandedbits_test_permvar_di_256_mask(<4 x i64> %a0, <4 x i64> %a1) {
+; CHECK-LABEL: @demandedbits_test_permvar_di_256_mask(
+; CHECK-NEXT: [[M:%.*]] = or <4 x i64> [[A1:%.*]],
+; CHECK-NEXT: [[S:%.*]] = call <4 x i64> @llvm.x86.avx512.permvar.di.256(<4 x i64> [[A0:%.*]], <4 x i64> [[M]])
+; CHECK-NEXT: ret <4 x i64> [[S]]
+;
+ %m = or <4 x i64> %a1,
+ %s = call <4 x i64> @llvm.x86.avx512.permvar.di.256(<4 x i64> %a0, <4 x i64> %m)
+ ret <4 x i64> %s
+}
+
+declare <4 x double> @llvm.x86.avx512.permvar.df.256(<4 x double>, <4 x i64>)
+
+define <4 x double> @identity_test_permvar_df_256(<4 x double> %a0) {
+; CHECK-LABEL: @identity_test_permvar_df_256(
+; CHECK-NEXT: ret <4 x double> [[A0:%.*]]
+;
+ %1 = call <4 x double> @llvm.x86.avx512.permvar.df.256(<4 x double> %a0, <4 x i64> )
+ ret <4 x double> %1
+}
+
+define <4 x double> @identity_test_permvar_df_256_mask(<4 x double> %a0, <4 x double> %passthru, i8 %mask) {
+; CHECK-LABEL: @identity_test_permvar_df_256_mask(
+; CHECK-NEXT: [[TMP1:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1>
+; CHECK-NEXT: [[EXTRACT:%.*]] = shufflevector <8 x i1> [[TMP1]], <8 x i1> poison, <4 x i32>
+; CHECK-NEXT: [[TMP2:%.*]] = select <4 x i1> [[EXTRACT]], <4 x double> [[A0:%.*]], <4 x double> [[PASSTHRU:%.*]]
+; CHECK-NEXT: ret <4 x double> [[TMP2]]
+;
+ %1 = call <4 x double> @llvm.x86.avx512.permvar.df.256(<4 x double> %a0, <4 x i64> )
+ %2 = bitcast i8 %mask to <8 x i1>
+ %extract = shufflevector <8 x i1> %2, <8 x i1> %2, <4 x i32>
+ %3 = select <4 x i1> %extract, <4 x double> %1, <4 x double> %passthru
+ ret <4 x double> %3
+}
+
+define <4 x double> @zero_test_permvar_df_256(<4 x double> %a0) {
+; CHECK-LABEL: @zero_test_permvar_df_256(
+; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[A0:%.*]], <4 x double> poison, <4 x i32> zeroinitializer
+; CHECK-NEXT: ret <4 x double> [[TMP1]]
+;
+ %1 = call <4 x double> @llvm.x86.avx512.permvar.df.256(<4 x double> %a0, <4 x i64> zeroinitializer)
+ ret <4 x double> %1
+}
+
+define <4 x double> @zero_test_permvar_df_256_mask(<4 x double> %a0, <4 x double> %passthru, i8 %mask) {
+; CHECK-LABEL: @zero_test_permvar_df_256_mask(
+; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[A0:%.*]], <4 x double> poison, <4 x i32> zeroinitializer
+; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1>
+; CHECK-NEXT: [[EXTRACT:%.*]] = shufflevector <8 x i1> [[TMP2]], <8 x i1> poison, <4 x i32>
+; CHECK-NEXT: [[TMP3:%.*]] = select <4 x i1> [[EXTRACT]], <4 x double> [[TMP1]], <4 x double> [[PASSTHRU:%.*]]
+; CHECK-NEXT: ret <4 x double> [[TMP3]]
+;
+ %1 = call <4 x double> @llvm.x86.avx512.permvar.df.256(<4 x double> %a0, <4 x i64> zeroinitializer)
+ %2 = bitcast i8 %mask to <8 x i1>
+ %extract = shufflevector <8 x i1> %2, <8 x i1> %2, <4 x i32>
+ %3 = select <4 x i1> %extract, <4 x double> %1, <4 x double> %passthru
+ ret <4 x double> %3
+}
+
+define <4 x double> @shuffle_test_permvar_df_256(<4 x double> %a0) {
+; CHECK-LABEL: @shuffle_test_permvar_df_256(
+; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[A0:%.*]], <4 x double> poison, <4 x i32>
+; CHECK-NEXT: ret <4 x double> [[TMP1]]
+;
+ %1 = call <4 x double> @llvm.x86.avx512.permvar.df.256(<4 x double> %a0, <4 x i64> )
+ ret <4 x double> %1
+}
+
+define <4 x double> @shuffle_test_permvar_df_256_mask(<4 x double> %a0, <4 x double> %passthru, i8 %mask) {
+; CHECK-LABEL: @shuffle_test_permvar_df_256_mask(
+; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[A0:%.*]], <4 x double> poison, <4 x i32>
+; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1>
+; CHECK-NEXT: [[EXTRACT:%.*]] = shufflevector <8 x i1> [[TMP2]], <8 x i1> poison, <4 x i32>
+; CHECK-NEXT: [[TMP3:%.*]] = select <4 x i1> [[EXTRACT]], <4 x double> [[TMP1]], <4 x double> [[PASSTHRU:%.*]]
+; CHECK-NEXT: ret <4 x double> [[TMP3]]
+;
+ %1 = call <4 x double> @llvm.x86.avx512.permvar.df.256(<4 x double> %a0, <4 x i64> )
+ %2 = bitcast i8 %mask to <8 x i1>
+ %extract = shufflevector <8 x i1> %2, <8 x i1> %2, <4 x i32>
+ %3 = select <4 x i1> %extract, <4 x double> %1, <4 x double> %passthru
+ ret <4 x double> %3
+}
+
+define <4 x double> @undef_test_permvar_df_256(<4 x double> %a0) {
+; CHECK-LABEL: @undef_test_permvar_df_256(
+; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[A0:%.*]], <4 x double> poison, <4 x i32>
+; CHECK-NEXT: ret <4 x double> [[TMP1]]
+;
+ %1 = call <4 x double> @llvm.x86.avx512.permvar.df.256(<4 x double> %a0, <4 x i64> )
+ ret <4 x double> %1
+}
+
+define <4 x double> @undef_test_permvar_df_256_mask(<4 x double> %a0, <4 x double> %passthru, i8 %mask) {
+; CHECK-LABEL: @undef_test_permvar_df_256_mask(
+; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[A0:%.*]], <4 x double> poison, <4 x i32>
+; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1>
+; CHECK-NEXT: [[EXTRACT:%.*]] = shufflevector <8 x i1> [[TMP2]], <8 x i1> poison, <4 x i32>
+; CHECK-NEXT: [[TMP3:%.*]] = select <4 x i1> [[EXTRACT]], <4 x double> [[TMP1]], <4 x double> [[PASSTHRU:%.*]]
+; CHECK-NEXT: ret <4 x double> [[TMP3]]
+;
+ %1 = call <4 x double> @llvm.x86.avx512.permvar.df.256(<4 x double> %a0, <4 x i64> )
+ %2 = bitcast i8 %mask to <8 x i1>
+ %extract = shufflevector <8 x i1> %2, <8 x i1> %2, <4 x i32>
+ %3 = select <4 x i1> %extract, <4 x double> %1, <4 x double> %passthru
+ ret <4 x double> %3
+}
+
+define <4 x double> @demandedbits_test_permvar_df_256_mask(<4 x double> %a0, <4 x i64> %a1) {
+; CHECK-LABEL: @demandedbits_test_permvar_df_256_mask(
+; CHECK-NEXT: [[M:%.*]] = or <4 x i64> [[A1:%.*]],
+; CHECK-NEXT: [[S:%.*]] = call <4 x double> @llvm.x86.avx512.permvar.df.256(<4 x double> [[A0:%.*]], <4 x i64> [[M]])
+; CHECK-NEXT: ret <4 x double> [[S]]
+;
+ %m = or <4 x i64> %a1,
+ %s = call <4 x double> @llvm.x86.avx512.permvar.df.256(<4 x double> %a0, <4 x i64> %m)
+ ret <4 x double> %s
+}
+
+declare <16 x i32> @llvm.x86.avx512.permvar.si.512(<16 x i32>, <16 x i32>)
+
+define <16 x i32> @identity_test_permvar_si_512(<16 x i32> %a0) {
+; CHECK-LABEL: @identity_test_permvar_si_512(
+; CHECK-NEXT: ret <16 x i32> [[A0:%.*]]
+;
+ %1 = call <16 x i32> @llvm.x86.avx512.permvar.si.512(<16 x i32> %a0, <16 x i32> )
+ ret <16 x i32> %1
+}
+
+define <16 x i32> @identity_test_permvar_si_512_mask(<16 x i32> %a0, <16 x i32> %passthru, i16 %mask) {
+; CHECK-LABEL: @identity_test_permvar_si_512_mask(
+; CHECK-NEXT: [[TMP1:%.*]] = bitcast i16 [[MASK:%.*]] to <16 x i1>
+; CHECK-NEXT: [[TMP2:%.*]] = select <16 x i1> [[TMP1]], <16 x i32> [[A0:%.*]], <16 x i32> [[PASSTHRU:%.*]]
+; CHECK-NEXT: ret <16 x i32> [[TMP2]]
+;
+ %1 = call <16 x i32> @llvm.x86.avx512.permvar.si.512(<16 x i32> %a0, <16 x i32> )
+ %2 = bitcast i16 %mask to <16 x i1>
+ %3 = select <16 x i1> %2, <16 x i32> %1, <16 x i32> %passthru
+ ret <16 x i32> %3
+}
+
+define <16 x i32> @zero_test_permvar_si_512(<16 x i32> %a0) {
+; CHECK-LABEL: @zero_test_permvar_si_512(
+; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i32> [[A0:%.*]], <16 x i32> poison, <16 x i32> zeroinitializer
+; CHECK-NEXT: ret <16 x i32> [[TMP1]]
+;
+ %1 = call <16 x i32> @llvm.x86.avx512.permvar.si.512(<16 x i32> %a0, <16 x i32> zeroinitializer)
+ ret <16 x i32> %1
+}
+
+define <16 x i32> @zero_test_permvar_si_512_mask(<16 x i32> %a0, <16 x i32> %passthru, i16 %mask) {
+; CHECK-LABEL: @zero_test_permvar_si_512_mask(
+; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i32> [[A0:%.*]], <16 x i32> poison, <16 x i32> zeroinitializer
+; CHECK-NEXT: [[TMP2:%.*]] = bitcast i16 [[MASK:%.*]] to <16 x i1>
+; CHECK-NEXT: [[TMP3:%.*]] = select <16 x i1> [[TMP2]], <16 x i32> [[TMP1]], <16 x i32> [[PASSTHRU:%.*]]
+; CHECK-NEXT: ret <16 x i32> [[TMP3]]
+;
+ %1 = call <16 x i32> @llvm.x86.avx512.permvar.si.512(<16 x i32> %a0, <16 x i32> zeroinitializer)
+ %2 = bitcast i16 %mask to <16 x i1>
+ %3 = select <16 x i1> %2, <16 x i32> %1, <16 x i32> %passthru
+ ret <16 x i32> %3
+}
+
+define <16 x i32> @shuffle_test_permvar_si_512(<16 x i32> %a0) {
+; CHECK-LABEL: @shuffle_test_permvar_si_512(
+; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i32> [[A0:%.*]], <16 x i32> poison, <16 x i32>
+; CHECK-NEXT: ret <16 x i32> [[TMP1]]
+;
+ %1 = call <16 x i32> @llvm.x86.avx512.permvar.si.512(<16 x i32> %a0, <16 x i32> )
+ ret <16 x i32> %1
+}
+
+define <16 x i32> @shuffle_test_permvar_si_512_mask(<16 x i32> %a0, <16 x i32> %passthru, i16 %mask) {
+; CHECK-LABEL: @shuffle_test_permvar_si_512_mask(
+; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i32> [[A0:%.*]], <16 x i32> poison, <16 x i32>
+; CHECK-NEXT: [[TMP2:%.*]] = bitcast i16 [[MASK:%.*]] to <16 x i1>
+; CHECK-NEXT: [[TMP3:%.*]] = select <16 x i1> [[TMP2]], <16 x i32> [[TMP1]], <16 x i32> [[PASSTHRU:%.*]]
+; CHECK-NEXT: ret <16 x i32> [[TMP3]]
+;
+ %1 = call <16 x i32> @llvm.x86.avx512.permvar.si.512(<16 x i32> %a0, <16 x i32> )
+ %2 = bitcast i16 %mask to <16 x i1>
+ %3 = select <16 x i1> %2, <16 x i32> %1, <16 x i32> %passthru
+ ret <16 x i32> %3
+}
+
+define <16 x i32> @undef_test_permvar_si_512(<16 x i32> %a0) {
+; CHECK-LABEL: @undef_test_permvar_si_512(
+; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i32> [[A0:%.*]], <16 x i32> poison, <16 x i32>
+; CHECK-NEXT: ret <16 x i32> [[TMP1]]
+;
+ %1 = call <16 x i32> @llvm.x86.avx512.permvar.si.512(<16 x i32> %a0, <16 x i32> )
+ ret <16 x i32> %1
+}
+
+define <16 x i32> @undef_test_permvar_si_512_mask(<16 x i32> %a0, <16 x i32> %passthru, i16 %mask) {
+; CHECK-LABEL: @undef_test_permvar_si_512_mask(
+; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i32> [[A0:%.*]], <16 x i32> poison, <16 x i32>
+; CHECK-NEXT: [[TMP2:%.*]] = bitcast i16 [[MASK:%.*]] to <16 x i1>
+; CHECK-NEXT: [[TMP3:%.*]] = select <16 x i1> [[TMP2]], <16 x i32> [[TMP1]], <16 x i32> [[PASSTHRU:%.*]]
+; CHECK-NEXT: ret <16 x i32> [[TMP3]]
+;
+ %1 = call <16 x i32> @llvm.x86.avx512.permvar.si.512(<16 x i32> %a0, <16 x i32> )
+ %2 = bitcast i16 %mask to <16 x i1>
+ %3 = select <16 x i1> %2, <16 x i32> %1, <16 x i32> %passthru
+ ret <16 x i32> %3
+}
+
+define <16 x i32> @demandedbit_test_permvar_si_512_mask(<16 x i32> %a0, <16 x i32> %a1) {
+; CHECK-LABEL: @demandedbit_test_permvar_si_512_mask(
+; CHECK-NEXT: [[M:%.*]] = or <16 x i32> [[A1:%.*]],
+; CHECK-NEXT: [[S:%.*]] = call <16 x i32> @llvm.x86.avx512.permvar.si.512(<16 x i32> [[A0:%.*]], <16 x i32> [[M]])
+; CHECK-NEXT: ret <16 x i32> [[S]]
+;
+ %m = or <16 x i32> %a1,
+ %s = call <16 x i32> @llvm.x86.avx512.permvar.si.512(<16 x i32> %a0, <16 x i32> %m)
+ ret <16 x i32> %s
+}
+
+declare <16 x float> @llvm.x86.avx512.permvar.sf.512(<16 x float>, <16 x i32>)
+
+define <16 x float> @identity_test_permvar_sf_512(<16 x float> %a0) {
+; CHECK-LABEL: @identity_test_permvar_sf_512(
+; CHECK-NEXT: ret <16 x float> [[A0:%.*]]
+;
+ %1 = call <16 x float> @llvm.x86.avx512.permvar.sf.512(<16 x float> %a0, <16 x i32> )
+ ret <16 x float> %1
+}
+
+define <16 x float> @identity_test_permvar_sf_512_mask(<16 x float> %a0, <16 x float> %passthru, i16 %mask) {
+; CHECK-LABEL: @identity_test_permvar_sf_512_mask(
+; CHECK-NEXT: [[TMP1:%.*]] = bitcast i16 [[MASK:%.*]] to <16 x i1>
+; CHECK-NEXT: [[TMP2:%.*]] = select <16 x i1> [[TMP1]], <16 x float> [[A0:%.*]], <16 x float> [[PASSTHRU:%.*]]
+; CHECK-NEXT: ret <16 x float> [[TMP2]]
+;
+ %1 = call <16 x float> @llvm.x86.avx512.permvar.sf.512(<16 x float> %a0, <16 x i32> )
+ %2 = bitcast i16 %mask to <16 x i1>
+ %3 = select <16 x i1> %2, <16 x float> %1, <16 x float> %passthru
+ ret <16 x float> %3
+}
+
+define <16 x float> @zero_test_permvar_sf_512(<16 x float> %a0) {
+; CHECK-LABEL: @zero_test_permvar_sf_512(
+; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x float> [[A0:%.*]], <16 x float> poison, <16 x i32> zeroinitializer
+; CHECK-NEXT: ret <16 x float> [[TMP1]]
+;
+ %1 = call <16 x float> @llvm.x86.avx512.permvar.sf.512(<16 x float> %a0, <16 x i32> zeroinitializer)
+ ret <16 x float> %1
+}
+
+define <16 x float> @zero_test_permvar_sf_512_mask(<16 x float> %a0, <16 x float> %passthru, i16 %mask) {
+; CHECK-LABEL: @zero_test_permvar_sf_512_mask(
+; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x float> [[A0:%.*]], <16 x float> poison, <16 x i32> zeroinitializer
+; CHECK-NEXT: [[TMP2:%.*]] = bitcast i16 [[MASK:%.*]] to <16 x i1>
+; CHECK-NEXT: [[TMP3:%.*]] = select <16 x i1> [[TMP2]], <16 x float> [[TMP1]], <16 x float> [[PASSTHRU:%.*]]
+; CHECK-NEXT: ret <16 x float> [[TMP3]]
+;
+ %1 = call <16 x float> @llvm.x86.avx512.permvar.sf.512(<16 x float> %a0, <16 x i32> zeroinitializer)
+ %2 = bitcast i16 %mask to <16 x i1>
+ %3 = select <16 x i1> %2, <16 x float> %1, <16 x float> %passthru
+ ret <16 x float> %3
+}
+
+define <16 x float> @shuffle_test_permvar_sf_512(<16 x float> %a0) {
+; CHECK-LABEL: @shuffle_test_permvar_sf_512(
+; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x float> [[A0:%.*]], <16 x float> poison, <16 x i32>
+; CHECK-NEXT: ret <16 x float> [[TMP1]]
+;
+ %1 = call <16 x float> @llvm.x86.avx512.permvar.sf.512(<16 x float> %a0, <16 x i32> )
+ ret <16 x float> %1
+}
+
+define <16 x float> @shuffle_test_permvar_sf_512_mask(<16 x float> %a0, <16 x float> %passthru, i16 %mask) {
+; CHECK-LABEL: @shuffle_test_permvar_sf_512_mask(
+; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x float> [[A0:%.*]], <16 x float> poison, <16 x i32>
+; CHECK-NEXT: [[TMP2:%.*]] = bitcast i16 [[MASK:%.*]] to <16 x i1>
+; CHECK-NEXT: [[TMP3:%.*]] = select <16 x i1> [[TMP2]], <16 x float> [[TMP1]], <16 x float> [[PASSTHRU:%.*]]
+; CHECK-NEXT: ret <16 x float> [[TMP3]]
+;
+ %1 = call <16 x float> @llvm.x86.avx512.permvar.sf.512(<16 x float> %a0, <16 x i32> )
+ %2 = bitcast i16 %mask to <16 x i1>
+ %3 = select <16 x i1> %2, <16 x float> %1, <16 x float> %passthru
+ ret <16 x float> %3
+}
+
+define <16 x float> @undef_test_permvar_sf_512(<16 x float> %a0) {
+; CHECK-LABEL: @undef_test_permvar_sf_512(
+; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x float> [[A0:%.*]], <16 x float> poison, <16 x i32>
+; CHECK-NEXT: ret <16 x float> [[TMP1]]
+;
+ %1 = call <16 x float> @llvm.x86.avx512.permvar.sf.512(<16 x float> %a0, <16 x i32> )
+ ret <16 x float> %1
+}
+
+define <16 x float> @undef_test_permvar_sf_512_mask(<16 x float> %a0, <16 x float> %passthru, i16 %mask) {
+; CHECK-LABEL: @undef_test_permvar_sf_512_mask(
+; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x float> [[A0:%.*]], <16 x float> poison, <16 x i32>
+; CHECK-NEXT: [[TMP2:%.*]] = bitcast i16 [[MASK:%.*]] to <16 x i1>
+; CHECK-NEXT: [[TMP3:%.*]] = select <16 x i1> [[TMP2]], <16 x float> [[TMP1]], <16 x float> [[PASSTHRU:%.*]]
+; CHECK-NEXT: ret <16 x float> [[TMP3]]
+;
+ %1 = call <16 x float> @llvm.x86.avx512.permvar.sf.512(<16 x float> %a0, <16 x i32> )
+ %2 = bitcast i16 %mask to <16 x i1>
+ %3 = select <16 x i1> %2, <16 x float> %1, <16 x float> %passthru
+ ret <16 x float> %3
+}
+
+define <16 x float> @demandedbit_test_permvar_sf_512_mask(<16 x float> %a0, <16 x i32> %a1) {
+; CHECK-LABEL: @demandedbit_test_permvar_sf_512_mask(
+; CHECK-NEXT: [[M:%.*]] = or <16 x i32> [[A1:%.*]],
+; CHECK-NEXT: [[S:%.*]] = call <16 x float> @llvm.x86.avx512.permvar.sf.512(<16 x float> [[A0:%.*]], <16 x i32> [[M]])
+; CHECK-NEXT: ret <16 x float> [[S]]
+;
+ %m = or <16 x i32> %a1,
+ %s = call <16 x float> @llvm.x86.avx512.permvar.sf.512(<16 x float> %a0, <16 x i32> %m)
+ ret <16 x float> %s
+}
+
+declare <8 x i64> @llvm.x86.avx512.permvar.di.512(<8 x i64>, <8 x i64>)
+
+define <8 x i64> @identity_test_permvar_di_512(<8 x i64> %a0) {
+; CHECK-LABEL: @identity_test_permvar_di_512(
+; CHECK-NEXT: ret <8 x i64> [[A0:%.*]]
+;
+ %1 = call <8 x i64> @llvm.x86.avx512.permvar.di.512(<8 x i64> %a0, <8 x i64> )
+ ret <8 x i64> %1
+}
+
+define <8 x i64> @identity_test_permvar_di_512_mask(<8 x i64> %a0, <8 x i64> %passthru, i8 %mask) {
+; CHECK-LABEL: @identity_test_permvar_di_512_mask(
+; CHECK-NEXT: [[TMP1:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1>
+; CHECK-NEXT: [[TMP2:%.*]] = select <8 x i1> [[TMP1]], <8 x i64> [[A0:%.*]], <8 x i64> [[PASSTHRU:%.*]]
+; CHECK-NEXT: ret <8 x i64> [[TMP2]]
+;
+ %1 = call <8 x i64> @llvm.x86.avx512.permvar.di.512(<8 x i64> %a0, <8 x i64> )
+ %2 = bitcast i8 %mask to <8 x i1>
+ %3 = select <8 x i1> %2, <8 x i64> %1, <8 x i64> %passthru
+ ret <8 x i64> %3
+}
+
+define <8 x i64> @zero_test_permvar_di_512(<8 x i64> %a0) {
+; CHECK-LABEL: @zero_test_permvar_di_512(
+; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i64> [[A0:%.*]], <8 x i64> poison, <8 x i32> zeroinitializer
+; CHECK-NEXT: ret <8 x i64> [[TMP1]]
+;
+ %1 = call <8 x i64> @llvm.x86.avx512.permvar.di.512(<8 x i64> %a0, <8 x i64> zeroinitializer)
+ ret <8 x i64> %1
+}
+
+define <8 x i64> @zero_test_permvar_di_512_mask(<8 x i64> %a0, <8 x i64> %passthru, i8 %mask) {
+; CHECK-LABEL: @zero_test_permvar_di_512_mask(
+; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i64> [[A0:%.*]], <8 x i64> poison, <8 x i32> zeroinitializer
+; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1>
+; CHECK-NEXT: [[TMP3:%.*]] = select <8 x i1> [[TMP2]], <8 x i64> [[TMP1]], <8 x i64> [[PASSTHRU:%.*]]
+; CHECK-NEXT: ret <8 x i64> [[TMP3]]
+;
+ %1 = call <8 x i64> @llvm.x86.avx512.permvar.di.512(<8 x i64> %a0, <8 x i64> zeroinitializer)
+ %2 = bitcast i8 %mask to <8 x i1>
+ %3 = select <8 x i1> %2, <8 x i64> %1, <8 x i64> %passthru
+ ret <8 x i64> %3
+}
+
+define <8 x i64> @shuffle_test_permvar_di_512(<8 x i64> %a0) {
+; CHECK-LABEL: @shuffle_test_permvar_di_512(
+; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i64> [[A0:%.*]], <8 x i64> poison, <8 x i32>
+; CHECK-NEXT: ret <8 x i64> [[TMP1]]
+;
+ %1 = call <8 x i64> @llvm.x86.avx512.permvar.di.512(<8 x i64> %a0, <8 x i64> )
+ ret <8 x i64> %1
+}
+
+define <8 x i64> @shuffle_test_permvar_di_512_mask(<8 x i64> %a0, <8 x i64> %passthru, i8 %mask) {
+; CHECK-LABEL: @shuffle_test_permvar_di_512_mask(
+; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i64> [[A0:%.*]], <8 x i64> poison, <8 x i32>
+; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1>
+; CHECK-NEXT: [[TMP3:%.*]] = select <8 x i1> [[TMP2]], <8 x i64> [[TMP1]], <8 x i64> [[PASSTHRU:%.*]]
+; CHECK-NEXT: ret <8 x i64> [[TMP3]]
+;
+ %1 = call <8 x i64> @llvm.x86.avx512.permvar.di.512(<8 x i64> %a0, <8 x i64> )
+ %2 = bitcast i8 %mask to <8 x i1>
+ %3 = select <8 x i1> %2, <8 x i64> %1, <8 x i64> %passthru
+ ret <8 x i64> %3
+}
+
+define <8 x i64> @undef_test_permvar_di_512(<8 x i64> %a0) {
+; CHECK-LABEL: @undef_test_permvar_di_512(
+; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i64> [[A0:%.*]], <8 x i64> poison, <8 x i32>
+; CHECK-NEXT: ret <8 x i64> [[TMP1]]
+;
+ %1 = call <8 x i64> @llvm.x86.avx512.permvar.di.512(<8 x i64> %a0, <8 x i64> )
+ ret <8 x i64> %1
+}
+
+define <8 x i64> @undef_test_permvar_di_512_mask(<8 x i64> %a0, <8 x i64> %passthru, i8 %mask) {
+; CHECK-LABEL: @undef_test_permvar_di_512_mask(
+; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i64> [[A0:%.*]], <8 x i64> poison, <8 x i32>
+; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1>
+; CHECK-NEXT: [[TMP3:%.*]] = select <8 x i1> [[TMP2]], <8 x i64> [[TMP1]], <8 x i64> [[PASSTHRU:%.*]]
+; CHECK-NEXT: ret <8 x i64> [[TMP3]]
+;
+ %1 = call <8 x i64> @llvm.x86.avx512.permvar.di.512(<8 x i64> %a0, <8 x i64> )
+ %2 = bitcast i8 %mask to <8 x i1>
+ %3 = select <8 x i1> %2, <8 x i64> %1, <8 x i64> %passthru
+ ret <8 x i64> %3
+}
+
+define <8 x i64> @demandedbit_test_permvar_di_512_mask(<8 x i64> %a0, <8 x i64> %a1) {
+; CHECK-LABEL: @demandedbit_test_permvar_di_512_mask(
+; CHECK-NEXT: [[M:%.*]] = or <8 x i64> [[A1:%.*]],
+; CHECK-NEXT: [[S:%.*]] = call <8 x i64> @llvm.x86.avx512.permvar.di.512(<8 x i64> [[A0:%.*]], <8 x i64> [[M]])
+; CHECK-NEXT: ret <8 x i64> [[S]]
+;
+ %m = or <8 x i64> %a1,
+ %s = call <8 x i64> @llvm.x86.avx512.permvar.di.512(<8 x i64> %a0, <8 x i64> %m)
+ ret <8 x i64> %s
+}
+
+declare <8 x double> @llvm.x86.avx512.permvar.df.512(<8 x double>, <8 x i64>)
+
+define <8 x double> @identity_test_permvar_df_512(<8 x double> %a0) {
+; CHECK-LABEL: @identity_test_permvar_df_512(
+; CHECK-NEXT: ret <8 x double> [[A0:%.*]]
+;
+ %1 = call <8 x double> @llvm.x86.avx512.permvar.df.512(<8 x double> %a0, <8 x i64> )
+ ret <8 x double> %1
+}
+
+define <8 x double> @identity_test_permvar_df_512_mask(<8 x double> %a0, <8 x double> %passthru, i8 %mask) {
+; CHECK-LABEL: @identity_test_permvar_df_512_mask(
+; CHECK-NEXT: [[TMP1:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1>
+; CHECK-NEXT: [[TMP2:%.*]] = select <8 x i1> [[TMP1]], <8 x double> [[A0:%.*]], <8 x double> [[PASSTHRU:%.*]]
+; CHECK-NEXT: ret <8 x double> [[TMP2]]
+;
+ %1 = call <8 x double> @llvm.x86.avx512.permvar.df.512(<8 x double> %a0, <8 x i64> )
+ %2 = bitcast i8 %mask to <8 x i1>
+ %3 = select <8 x i1> %2, <8 x double> %1, <8 x double> %passthru
+ ret <8 x double> %3
+}
+
+define <8 x double> @zero_test_permvar_df_512(<8 x double> %a0) {
+; CHECK-LABEL: @zero_test_permvar_df_512(
+; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x double> [[A0:%.*]], <8 x double> poison, <8 x i32> zeroinitializer
+; CHECK-NEXT: ret <8 x double> [[TMP1]]
+;
+ %1 = call <8 x double> @llvm.x86.avx512.permvar.df.512(<8 x double> %a0, <8 x i64> zeroinitializer)
+ ret <8 x double> %1
+}
+
+define <8 x double> @zero_test_permvar_df_512_mask(<8 x double> %a0, <8 x double> %passthru, i8 %mask) {
+; CHECK-LABEL: @zero_test_permvar_df_512_mask(
+; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x double> [[A0:%.*]], <8 x double> poison, <8 x i32> zeroinitializer
+; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1>
+;
CHECK-NEXT: [[TMP3:%.*]] = select <8 x i1> [[TMP2]], <8 x double> [[TMP1]], <8 x double> [[PASSTHRU:%.*]] +; CHECK-NEXT: ret <8 x double> [[TMP3]] +; + %1 = call <8 x double> @llvm.x86.avx512.permvar.df.512(<8 x double> %a0, <8 x i64> zeroinitializer) + %2 = bitcast i8 %mask to <8 x i1> + %3 = select <8 x i1> %2, <8 x double> %1, <8 x double> %passthru + ret <8 x double> %3 +} + +define <8 x double> @shuffle_test_permvar_df_512(<8 x double> %a0) { +; CHECK-LABEL: @shuffle_test_permvar_df_512( +; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x double> [[A0:%.*]], <8 x double> poison, <8 x i32> +; CHECK-NEXT: ret <8 x double> [[TMP1]] +; + %1 = call <8 x double> @llvm.x86.avx512.permvar.df.512(<8 x double> %a0, <8 x i64> ) + ret <8 x double> %1 +} + +define <8 x double> @shuffle_test_permvar_df_512_mask(<8 x double> %a0, <8 x double> %passthru, i8 %mask) { +; CHECK-LABEL: @shuffle_test_permvar_df_512_mask( +; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x double> [[A0:%.*]], <8 x double> poison, <8 x i32> +; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1> +; CHECK-NEXT: [[TMP3:%.*]] = select <8 x i1> [[TMP2]], <8 x double> [[TMP1]], <8 x double> [[PASSTHRU:%.*]] +; CHECK-NEXT: ret <8 x double> [[TMP3]] +; + %1 = call <8 x double> @llvm.x86.avx512.permvar.df.512(<8 x double> %a0, <8 x i64> ) + %2 = bitcast i8 %mask to <8 x i1> + %3 = select <8 x i1> %2, <8 x double> %1, <8 x double> %passthru + ret <8 x double> %3 +} + +define <8 x double> @undef_test_permvar_df_512(<8 x double> %a0) { +; CHECK-LABEL: @undef_test_permvar_df_512( +; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x double> [[A0:%.*]], <8 x double> poison, <8 x i32> +; CHECK-NEXT: ret <8 x double> [[TMP1]] +; + %1 = call <8 x double> @llvm.x86.avx512.permvar.df.512(<8 x double> %a0, <8 x i64> ) + ret <8 x double> %1 +} + +define <8 x double> @undef_test_permvar_df_512_mask(<8 x double> %a0, <8 x double> %passthru, i8 %mask) { +; CHECK-LABEL: @undef_test_permvar_df_512_mask( +; CHECK-NEXT: 
[[TMP1:%.*]] = shufflevector <8 x double> [[A0:%.*]], <8 x double> poison, <8 x i32> +; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1> +; CHECK-NEXT: [[TMP3:%.*]] = select <8 x i1> [[TMP2]], <8 x double> [[TMP1]], <8 x double> [[PASSTHRU:%.*]] +; CHECK-NEXT: ret <8 x double> [[TMP3]] +; + %1 = call <8 x double> @llvm.x86.avx512.permvar.df.512(<8 x double> %a0, <8 x i64> ) + %2 = bitcast i8 %mask to <8 x i1> + %3 = select <8 x i1> %2, <8 x double> %1, <8 x double> %passthru + ret <8 x double> %3 +} + +define <8 x double> @demandedbit_test_permvar_df_512_mask(<8 x double> %a0, <8 x i64> %a1) { +; CHECK-LABEL: @demandedbit_test_permvar_df_512_mask( +; CHECK-NEXT: [[M:%.*]] = or <8 x i64> [[A1:%.*]], +; CHECK-NEXT: [[S:%.*]] = call <8 x double> @llvm.x86.avx512.permvar.df.512(<8 x double> [[A0:%.*]], <8 x i64> [[M]]) +; CHECK-NEXT: ret <8 x double> [[S]] +; + %m = or <8 x i64> %a1, + %s = call <8 x double> @llvm.x86.avx512.permvar.df.512(<8 x double> %a0, <8 x i64> %m) + ret <8 x double> %s +} + +declare <8 x i16> @llvm.x86.avx512.permvar.hi.128(<8 x i16>, <8 x i16>) + +define <8 x i16> @identity_test_permvar_hi_128(<8 x i16> %a0) { +; CHECK-LABEL: @identity_test_permvar_hi_128( +; CHECK-NEXT: ret <8 x i16> [[A0:%.*]] +; + %1 = call <8 x i16> @llvm.x86.avx512.permvar.hi.128(<8 x i16> %a0, <8 x i16> ) + ret <8 x i16> %1 +} + +define <8 x i16> @identity_test_permvar_hi_128_mask(<8 x i16> %a0, <8 x i16> %passthru, i8 %mask) { +; CHECK-LABEL: @identity_test_permvar_hi_128_mask( +; CHECK-NEXT: [[TMP1:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1> +; CHECK-NEXT: [[TMP2:%.*]] = select <8 x i1> [[TMP1]], <8 x i16> [[A0:%.*]], <8 x i16> [[PASSTHRU:%.*]] +; CHECK-NEXT: ret <8 x i16> [[TMP2]] +; + %1 = call <8 x i16> @llvm.x86.avx512.permvar.hi.128(<8 x i16> %a0, <8 x i16> ) + %2 = bitcast i8 %mask to <8 x i1> + %3 = select <8 x i1> %2, <8 x i16> %1, <8 x i16> %passthru + ret <8 x i16> %3 +} + +define <8 x i16> @zero_test_permvar_hi_128(<8 x i16> %a0) { +; 
CHECK-LABEL: @zero_test_permvar_hi_128( +; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i16> [[A0:%.*]], <8 x i16> poison, <8 x i32> zeroinitializer +; CHECK-NEXT: ret <8 x i16> [[TMP1]] +; + %1 = call <8 x i16> @llvm.x86.avx512.permvar.hi.128(<8 x i16> %a0, <8 x i16> zeroinitializer) + ret <8 x i16> %1 +} + +define <8 x i16> @zero_test_permvar_hi_128_mask(<8 x i16> %a0, <8 x i16> %passthru, i8 %mask) { +; CHECK-LABEL: @zero_test_permvar_hi_128_mask( +; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i16> [[A0:%.*]], <8 x i16> poison, <8 x i32> zeroinitializer +; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1> +; CHECK-NEXT: [[TMP3:%.*]] = select <8 x i1> [[TMP2]], <8 x i16> [[TMP1]], <8 x i16> [[PASSTHRU:%.*]] +; CHECK-NEXT: ret <8 x i16> [[TMP3]] +; + %1 = call <8 x i16> @llvm.x86.avx512.permvar.hi.128(<8 x i16> %a0, <8 x i16> zeroinitializer) + %2 = bitcast i8 %mask to <8 x i1> + %3 = select <8 x i1> %2, <8 x i16> %1, <8 x i16> %passthru + ret <8 x i16> %3 +} + +define <8 x i16> @shuffle_test_permvar_hi_128(<8 x i16> %a0) { +; CHECK-LABEL: @shuffle_test_permvar_hi_128( +; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i16> [[A0:%.*]], <8 x i16> poison, <8 x i32> +; CHECK-NEXT: ret <8 x i16> [[TMP1]] +; + %1 = call <8 x i16> @llvm.x86.avx512.permvar.hi.128(<8 x i16> %a0, <8 x i16> ) + ret <8 x i16> %1 +} + +define <8 x i16> @shuffle_test_permvar_hi_128_mask(<8 x i16> %a0, <8 x i16> %passthru, i8 %mask) { +; CHECK-LABEL: @shuffle_test_permvar_hi_128_mask( +; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i16> [[A0:%.*]], <8 x i16> poison, <8 x i32> +; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1> +; CHECK-NEXT: [[TMP3:%.*]] = select <8 x i1> [[TMP2]], <8 x i16> [[TMP1]], <8 x i16> [[PASSTHRU:%.*]] +; CHECK-NEXT: ret <8 x i16> [[TMP3]] +; + %1 = call <8 x i16> @llvm.x86.avx512.permvar.hi.128(<8 x i16> %a0, <8 x i16> ) + %2 = bitcast i8 %mask to <8 x i1> + %3 = select <8 x i1> %2, <8 x i16> %1, <8 x i16> %passthru + ret <8 x i16> %3 +} + 
+define <8 x i16> @undef_test_permvar_hi_128(<8 x i16> %a0) { +; CHECK-LABEL: @undef_test_permvar_hi_128( +; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i16> [[A0:%.*]], <8 x i16> poison, <8 x i32> +; CHECK-NEXT: ret <8 x i16> [[TMP1]] +; + %1 = call <8 x i16> @llvm.x86.avx512.permvar.hi.128(<8 x i16> %a0, <8 x i16> ) + ret <8 x i16> %1 +} + +define <8 x i16> @undef_test_permvar_hi_128_mask(<8 x i16> %a0, <8 x i16> %passthru, i8 %mask) { +; CHECK-LABEL: @undef_test_permvar_hi_128_mask( +; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i16> [[A0:%.*]], <8 x i16> poison, <8 x i32> +; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8 [[MASK:%.*]] to <8 x i1> +; CHECK-NEXT: [[TMP3:%.*]] = select <8 x i1> [[TMP2]], <8 x i16> [[TMP1]], <8 x i16> [[PASSTHRU:%.*]] +; CHECK-NEXT: ret <8 x i16> [[TMP3]] +; + %1 = call <8 x i16> @llvm.x86.avx512.permvar.hi.128(<8 x i16> %a0, <8 x i16> ) + %2 = bitcast i8 %mask to <8 x i1> + %3 = select <8 x i1> %2, <8 x i16> %1, <8 x i16> %passthru + ret <8 x i16> %3 +} + +define <8 x i16> @demandedbit_test_permvar_hi_128_mask(<8 x i16> %a0, <8 x i16> %a1) { +; CHECK-LABEL: @demandedbit_test_permvar_hi_128_mask( +; CHECK-NEXT: [[M:%.*]] = or <8 x i16> [[A1:%.*]], +; CHECK-NEXT: [[S:%.*]] = call <8 x i16> @llvm.x86.avx512.permvar.hi.128(<8 x i16> [[A0:%.*]], <8 x i16> [[M]]) +; CHECK-NEXT: ret <8 x i16> [[S]] +; + %m = or <8 x i16> %a1, + %s = call <8 x i16> @llvm.x86.avx512.permvar.hi.128(<8 x i16> %a0, <8 x i16> %m) + ret <8 x i16> %s +} + +declare <16 x i16> @llvm.x86.avx512.permvar.hi.256(<16 x i16>, <16 x i16>) + +define <16 x i16> @identity_test_permvar_hi_256(<16 x i16> %a0) { +; CHECK-LABEL: @identity_test_permvar_hi_256( +; CHECK-NEXT: ret <16 x i16> [[A0:%.*]] +; + %1 = call <16 x i16> @llvm.x86.avx512.permvar.hi.256(<16 x i16> %a0, <16 x i16> ) + ret <16 x i16> %1 +} + +define <16 x i16> @identity_test_permvar_hi_256_mask(<16 x i16> %a0, <16 x i16> %passthru, i16 %mask) { +; CHECK-LABEL: @identity_test_permvar_hi_256_mask( +; CHECK-NEXT: 
[[TMP1:%.*]] = bitcast i16 [[MASK:%.*]] to <16 x i1> +; CHECK-NEXT: [[TMP2:%.*]] = select <16 x i1> [[TMP1]], <16 x i16> [[A0:%.*]], <16 x i16> [[PASSTHRU:%.*]] +; CHECK-NEXT: ret <16 x i16> [[TMP2]] +; + %1 = call <16 x i16> @llvm.x86.avx512.permvar.hi.256(<16 x i16> %a0, <16 x i16> ) + %2 = bitcast i16 %mask to <16 x i1> + %3 = select <16 x i1> %2, <16 x i16> %1, <16 x i16> %passthru + ret <16 x i16> %3 +} + +define <16 x i16> @zero_test_permvar_hi_256(<16 x i16> %a0) { +; CHECK-LABEL: @zero_test_permvar_hi_256( +; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i16> [[A0:%.*]], <16 x i16> poison, <16 x i32> zeroinitializer +; CHECK-NEXT: ret <16 x i16> [[TMP1]] +; + %1 = call <16 x i16> @llvm.x86.avx512.permvar.hi.256(<16 x i16> %a0, <16 x i16> zeroinitializer) + ret <16 x i16> %1 +} + +define <16 x i16> @zero_test_permvar_hi_256_mask(<16 x i16> %a0, <16 x i16> %passthru, i16 %mask) { +; CHECK-LABEL: @zero_test_permvar_hi_256_mask( +; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i16> [[A0:%.*]], <16 x i16> poison, <16 x i32> zeroinitializer +; CHECK-NEXT: [[TMP2:%.*]] = bitcast i16 [[MASK:%.*]] to <16 x i1> +; CHECK-NEXT: [[TMP3:%.*]] = select <16 x i1> [[TMP2]], <16 x i16> [[TMP1]], <16 x i16> [[PASSTHRU:%.*]] +; CHECK-NEXT: ret <16 x i16> [[TMP3]] +; + %1 = call <16 x i16> @llvm.x86.avx512.permvar.hi.256(<16 x i16> %a0, <16 x i16> zeroinitializer) + %2 = bitcast i16 %mask to <16 x i1> + %3 = select <16 x i1> %2, <16 x i16> %1, <16 x i16> %passthru + ret <16 x i16> %3 +} + +define <16 x i16> @shuffle_test_permvar_hi_256(<16 x i16> %a0) { +; CHECK-LABEL: @shuffle_test_permvar_hi_256( +; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i16> [[A0:%.*]], <16 x i16> poison, <16 x i32> +; CHECK-NEXT: ret <16 x i16> [[TMP1]] +; + %1 = call <16 x i16> @llvm.x86.avx512.permvar.hi.256(<16 x i16> %a0, <16 x i16> ) + ret <16 x i16> %1 +} + +define <16 x i16> @shuffle_test_permvar_hi_256_mask(<16 x i16> %a0, <16 x i16> %passthru, i16 %mask) { +; CHECK-LABEL: 
@shuffle_test_permvar_hi_256_mask( +; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i16> [[A0:%.*]], <16 x i16> poison, <16 x i32> +; CHECK-NEXT: [[TMP2:%.*]] = bitcast i16 [[MASK:%.*]] to <16 x i1> +; CHECK-NEXT: [[TMP3:%.*]] = select <16 x i1> [[TMP2]], <16 x i16> [[TMP1]], <16 x i16> [[PASSTHRU:%.*]] +; CHECK-NEXT: ret <16 x i16> [[TMP3]] +; + %1 = call <16 x i16> @llvm.x86.avx512.permvar.hi.256(<16 x i16> %a0, <16 x i16> ) + %2 = bitcast i16 %mask to <16 x i1> + %3 = select <16 x i1> %2, <16 x i16> %1, <16 x i16> %passthru + ret <16 x i16> %3 +} + +define <16 x i16> @undef_test_permvar_hi_256(<16 x i16> %a0) { +; CHECK-LABEL: @undef_test_permvar_hi_256( +; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i16> [[A0:%.*]], <16 x i16> poison, <16 x i32> +; CHECK-NEXT: ret <16 x i16> [[TMP1]] +; + %1 = call <16 x i16> @llvm.x86.avx512.permvar.hi.256(<16 x i16> %a0, <16 x i16> ) + ret <16 x i16> %1 +} + +define <16 x i16> @undef_test_permvar_hi_256_mask(<16 x i16> %a0, <16 x i16> %passthru, i16 %mask) { +; CHECK-LABEL: @undef_test_permvar_hi_256_mask( +; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i16> [[A0:%.*]], <16 x i16> poison, <16 x i32> +; CHECK-NEXT: [[TMP2:%.*]] = bitcast i16 [[MASK:%.*]] to <16 x i1> +; CHECK-NEXT: [[TMP3:%.*]] = select <16 x i1> [[TMP2]], <16 x i16> [[TMP1]], <16 x i16> [[PASSTHRU:%.*]] +; CHECK-NEXT: ret <16 x i16> [[TMP3]] +; + %1 = call <16 x i16> @llvm.x86.avx512.permvar.hi.256(<16 x i16> %a0, <16 x i16> ) + %2 = bitcast i16 %mask to <16 x i1> + %3 = select <16 x i1> %2, <16 x i16> %1, <16 x i16> %passthru + ret <16 x i16> %3 +} + +define <16 x i16> @demandedbit_test_permvar_hi_256_mask(<16 x i16> %a0, <16 x i16> %a1) { +; CHECK-LABEL: @demandedbit_test_permvar_hi_256_mask( +; CHECK-NEXT: [[M:%.*]] = or <16 x i16> [[A1:%.*]], +; CHECK-NEXT: [[S:%.*]] = call <16 x i16> @llvm.x86.avx512.permvar.hi.256(<16 x i16> [[A0:%.*]], <16 x i16> [[M]]) +; CHECK-NEXT: ret <16 x i16> [[S]] +; + %m = or <16 x i16> %a1, + %s = call <16 x i16> 
@llvm.x86.avx512.permvar.hi.256(<16 x i16> %a0, <16 x i16> %m) + ret <16 x i16> %s +} + +declare <32 x i16> @llvm.x86.avx512.permvar.hi.512(<32 x i16>, <32 x i16>) + +define <32 x i16> @identity_test_permvar_hi_512(<32 x i16> %a0) { +; CHECK-LABEL: @identity_test_permvar_hi_512( +; CHECK-NEXT: ret <32 x i16> [[A0:%.*]] +; + %1 = call <32 x i16> @llvm.x86.avx512.permvar.hi.512(<32 x i16> %a0, <32 x i16> ) + ret <32 x i16> %1 +} + +define <32 x i16> @identity_test_permvar_hi_512_mask(<32 x i16> %a0, <32 x i16> %passthru, i32 %mask) { +; CHECK-LABEL: @identity_test_permvar_hi_512_mask( +; CHECK-NEXT: [[TMP1:%.*]] = bitcast i32 [[MASK:%.*]] to <32 x i1> +; CHECK-NEXT: [[TMP2:%.*]] = select <32 x i1> [[TMP1]], <32 x i16> [[A0:%.*]], <32 x i16> [[PASSTHRU:%.*]] +; CHECK-NEXT: ret <32 x i16> [[TMP2]] +; + %1 = call <32 x i16> @llvm.x86.avx512.permvar.hi.512(<32 x i16> %a0, <32 x i16> ) + %2 = bitcast i32 %mask to <32 x i1> + %3 = select <32 x i1> %2, <32 x i16> %1, <32 x i16> %passthru + ret <32 x i16> %3 +} + +define <32 x i16> @zero_test_permvar_hi_512(<32 x i16> %a0) { +; CHECK-LABEL: @zero_test_permvar_hi_512( +; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <32 x i16> [[A0:%.*]], <32 x i16> poison, <32 x i32> zeroinitializer +; CHECK-NEXT: ret <32 x i16> [[TMP1]] +; + %1 = call <32 x i16> @llvm.x86.avx512.permvar.hi.512(<32 x i16> %a0, <32 x i16> zeroinitializer) + ret <32 x i16> %1 +} + +define <32 x i16> @zero_test_permvar_hi_512_mask(<32 x i16> %a0, <32 x i16> %passthru, i32 %mask) { +; CHECK-LABEL: @zero_test_permvar_hi_512_mask( +; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <32 x i16> [[A0:%.*]], <32 x i16> poison, <32 x i32> zeroinitializer +; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32 [[MASK:%.*]] to <32 x i1> +; CHECK-NEXT: [[TMP3:%.*]] = select <32 x i1> [[TMP2]], <32 x i16> [[TMP1]], <32 x i16> [[PASSTHRU:%.*]] +; CHECK-NEXT: ret <32 x i16> [[TMP3]] +; + %1 = call <32 x i16> @llvm.x86.avx512.permvar.hi.512(<32 x i16> %a0, <32 x i16> zeroinitializer) + %2 = bitcast i32 
%mask to <32 x i1> + %3 = select <32 x i1> %2, <32 x i16> %1, <32 x i16> %passthru + ret <32 x i16> %3 +} + +define <32 x i16> @shuffle_test_permvar_hi_512(<32 x i16> %a0) { +; CHECK-LABEL: @shuffle_test_permvar_hi_512( +; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <32 x i16> [[A0:%.*]], <32 x i16> poison, <32 x i32> +; CHECK-NEXT: ret <32 x i16> [[TMP1]] +; + %1 = call <32 x i16> @llvm.x86.avx512.permvar.hi.512(<32 x i16> %a0, <32 x i16> ) + ret <32 x i16> %1 +} + +define <32 x i16> @shuffle_test_permvar_hi_512_mask(<32 x i16> %a0, <32 x i16> %passthru, i32 %mask) { +; CHECK-LABEL: @shuffle_test_permvar_hi_512_mask( +; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <32 x i16> [[A0:%.*]], <32 x i16> poison, <32 x i32> +; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32 [[MASK:%.*]] to <32 x i1> +; CHECK-NEXT: [[TMP3:%.*]] = select <32 x i1> [[TMP2]], <32 x i16> [[TMP1]], <32 x i16> [[PASSTHRU:%.*]] +; CHECK-NEXT: ret <32 x i16> [[TMP3]] +; + %1 = call <32 x i16> @llvm.x86.avx512.permvar.hi.512(<32 x i16> %a0, <32 x i16> ) + %2 = bitcast i32 %mask to <32 x i1> + %3 = select <32 x i1> %2, <32 x i16> %1, <32 x i16> %passthru + ret <32 x i16> %3 +} + +define <32 x i16> @undef_test_permvar_hi_512(<32 x i16> %a0) { +; CHECK-LABEL: @undef_test_permvar_hi_512( +; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <32 x i16> [[A0:%.*]], <32 x i16> poison, <32 x i32> +; CHECK-NEXT: ret <32 x i16> [[TMP1]] +; + %1 = call <32 x i16> @llvm.x86.avx512.permvar.hi.512(<32 x i16> %a0, <32 x i16> ) + ret <32 x i16> %1 +} + +define <32 x i16> @undef_test_permvar_hi_512_mask(<32 x i16> %a0, <32 x i16> %passthru, i32 %mask) { +; CHECK-LABEL: @undef_test_permvar_hi_512_mask( +; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <32 x i16> [[A0:%.*]], <32 x i16> poison, <32 x i32> +; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32 [[MASK:%.*]] to <32 x i1> +; CHECK-NEXT: [[TMP3:%.*]] = select <32 x i1> [[TMP2]], <32 x i16> [[TMP1]], <32 x i16> [[PASSTHRU:%.*]] +; CHECK-NEXT: ret <32 x i16> [[TMP3]] +; + %1 = call <32 x i16> 
@llvm.x86.avx512.permvar.hi.512(<32 x i16> %a0, <32 x i16> ) + %2 = bitcast i32 %mask to <32 x i1> + %3 = select <32 x i1> %2, <32 x i16> %1, <32 x i16> %passthru + ret <32 x i16> %3 +} + +define <32 x i16> @demandedbit_test_permvar_hi_512_mask(<32 x i16> %a0, <32 x i16> %a1) { +; CHECK-LABEL: @demandedbit_test_permvar_hi_512_mask( +; CHECK-NEXT: [[M:%.*]] = or <32 x i16> [[A1:%.*]], +; CHECK-NEXT: [[S:%.*]] = call <32 x i16> @llvm.x86.avx512.permvar.hi.512(<32 x i16> [[A0:%.*]], <32 x i16> [[M]]) +; CHECK-NEXT: ret <32 x i16> [[S]] +; + %m = or <32 x i16> %a1, + %s = call <32 x i16> @llvm.x86.avx512.permvar.hi.512(<32 x i16> %a0, <32 x i16> %m) + ret <32 x i16> %s +} + +declare <16 x i8> @llvm.x86.avx512.permvar.qi.128(<16 x i8>, <16 x i8>) + +define <16 x i8> @identity_test_permvar_qi_128(<16 x i8> %a0) { +; CHECK-LABEL: @identity_test_permvar_qi_128( +; CHECK-NEXT: ret <16 x i8> [[A0:%.*]] +; + %1 = call <16 x i8> @llvm.x86.avx512.permvar.qi.128(<16 x i8> %a0, <16 x i8> ) + ret <16 x i8> %1 +} + +define <16 x i8> @identity_test_permvar_qi_128_mask(<16 x i8> %a0, <16 x i8> %passthru, i16 %mask) { +; CHECK-LABEL: @identity_test_permvar_qi_128_mask( +; CHECK-NEXT: [[TMP1:%.*]] = bitcast i16 [[MASK:%.*]] to <16 x i1> +; CHECK-NEXT: [[TMP2:%.*]] = select <16 x i1> [[TMP1]], <16 x i8> [[A0:%.*]], <16 x i8> [[PASSTHRU:%.*]] +; CHECK-NEXT: ret <16 x i8> [[TMP2]] +; + %1 = call <16 x i8> @llvm.x86.avx512.permvar.qi.128(<16 x i8> %a0, <16 x i8> ) + %2 = bitcast i16 %mask to <16 x i1> + %3 = select <16 x i1> %2, <16 x i8> %1, <16 x i8> %passthru + ret <16 x i8> %3 +} + +define <16 x i8> @zero_test_permvar_qi_128(<16 x i8> %a0) { +; CHECK-LABEL: @zero_test_permvar_qi_128( +; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i8> [[A0:%.*]], <16 x i8> poison, <16 x i32> zeroinitializer +; CHECK-NEXT: ret <16 x i8> [[TMP1]] +; + %1 = call <16 x i8> @llvm.x86.avx512.permvar.qi.128(<16 x i8> %a0, <16 x i8> zeroinitializer) + ret <16 x i8> %1 +} + +define <16 x i8> 
@zero_test_permvar_qi_128_mask(<16 x i8> %a0, <16 x i8> %passthru, i16 %mask) { +; CHECK-LABEL: @zero_test_permvar_qi_128_mask( +; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i8> [[A0:%.*]], <16 x i8> poison, <16 x i32> zeroinitializer +; CHECK-NEXT: [[TMP2:%.*]] = bitcast i16 [[MASK:%.*]] to <16 x i1> +; CHECK-NEXT: [[TMP3:%.*]] = select <16 x i1> [[TMP2]], <16 x i8> [[TMP1]], <16 x i8> [[PASSTHRU:%.*]] +; CHECK-NEXT: ret <16 x i8> [[TMP3]] +; + %1 = call <16 x i8> @llvm.x86.avx512.permvar.qi.128(<16 x i8> %a0, <16 x i8> zeroinitializer) + %2 = bitcast i16 %mask to <16 x i1> + %3 = select <16 x i1> %2, <16 x i8> %1, <16 x i8> %passthru + ret <16 x i8> %3 +} + +define <16 x i8> @shuffle_test_permvar_qi_128(<16 x i8> %a0) { +; CHECK-LABEL: @shuffle_test_permvar_qi_128( +; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i8> [[A0:%.*]], <16 x i8> poison, <16 x i32> +; CHECK-NEXT: ret <16 x i8> [[TMP1]] +; + %1 = call <16 x i8> @llvm.x86.avx512.permvar.qi.128(<16 x i8> %a0, <16 x i8> ) + ret <16 x i8> %1 +} + +define <16 x i8> @shuffle_test_permvar_qi_128_mask(<16 x i8> %a0, <16 x i8> %passthru, i16 %mask) { +; CHECK-LABEL: @shuffle_test_permvar_qi_128_mask( +; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i8> [[A0:%.*]], <16 x i8> poison, <16 x i32> +; CHECK-NEXT: [[TMP2:%.*]] = bitcast i16 [[MASK:%.*]] to <16 x i1> +; CHECK-NEXT: [[TMP3:%.*]] = select <16 x i1> [[TMP2]], <16 x i8> [[TMP1]], <16 x i8> [[PASSTHRU:%.*]] +; CHECK-NEXT: ret <16 x i8> [[TMP3]] +; + %1 = call <16 x i8> @llvm.x86.avx512.permvar.qi.128(<16 x i8> %a0, <16 x i8> ) + %2 = bitcast i16 %mask to <16 x i1> + %3 = select <16 x i1> %2, <16 x i8> %1, <16 x i8> %passthru + ret <16 x i8> %3 +} + +define <16 x i8> @undef_test_permvar_qi_128(<16 x i8> %a0) { +; CHECK-LABEL: @undef_test_permvar_qi_128( +; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i8> [[A0:%.*]], <16 x i8> poison, <16 x i32> +; CHECK-NEXT: ret <16 x i8> [[TMP1]] +; + %1 = call <16 x i8> @llvm.x86.avx512.permvar.qi.128(<16 x i8> %a0, 
<16 x i8> ) + ret <16 x i8> %1 +} + +define <16 x i8> @undef_test_permvar_qi_128_mask(<16 x i8> %a0, <16 x i8> %passthru, i16 %mask) { +; CHECK-LABEL: @undef_test_permvar_qi_128_mask( +; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i8> [[A0:%.*]], <16 x i8> poison, <16 x i32> +; CHECK-NEXT: [[TMP2:%.*]] = bitcast i16 [[MASK:%.*]] to <16 x i1> +; CHECK-NEXT: [[TMP3:%.*]] = select <16 x i1> [[TMP2]], <16 x i8> [[TMP1]], <16 x i8> [[PASSTHRU:%.*]] +; CHECK-NEXT: ret <16 x i8> [[TMP3]] +; + %1 = call <16 x i8> @llvm.x86.avx512.permvar.qi.128(<16 x i8> %a0, <16 x i8> ) + %2 = bitcast i16 %mask to <16 x i1> + %3 = select <16 x i1> %2, <16 x i8> %1, <16 x i8> %passthru + ret <16 x i8> %3 +} + +define <16 x i8> @demandedbit_test_permvar_qi_129_mask(<16 x i8> %a0, <16 x i8> %a1) { +; CHECK-LABEL: @demandedbit_test_permvar_qi_129_mask( +; CHECK-NEXT: [[M:%.*]] = or <16 x i8> [[A1:%.*]], +; CHECK-NEXT: [[S:%.*]] = call <16 x i8> @llvm.x86.avx512.permvar.qi.128(<16 x i8> [[A0:%.*]], <16 x i8> [[M]]) +; CHECK-NEXT: ret <16 x i8> [[S]] +; + %m = or <16 x i8> %a1, + %s = call <16 x i8> @llvm.x86.avx512.permvar.qi.128(<16 x i8> %a0, <16 x i8> %m) + ret <16 x i8> %s +} + +declare <32 x i8> @llvm.x86.avx512.permvar.qi.256(<32 x i8>, <32 x i8>) + +define <32 x i8> @identity_test_permvar_qi_256(<32 x i8> %a0) { +; CHECK-LABEL: @identity_test_permvar_qi_256( +; CHECK-NEXT: ret <32 x i8> [[A0:%.*]] +; + %1 = call <32 x i8> @llvm.x86.avx512.permvar.qi.256(<32 x i8> %a0, <32 x i8> ) + ret <32 x i8> %1 +} + +define <32 x i8> @identity_test_permvar_qi_256_mask(<32 x i8> %a0, <32 x i8> %passthru, i32 %mask) { +; CHECK-LABEL: @identity_test_permvar_qi_256_mask( +; CHECK-NEXT: [[TMP1:%.*]] = bitcast i32 [[MASK:%.*]] to <32 x i1> +; CHECK-NEXT: [[TMP2:%.*]] = select <32 x i1> [[TMP1]], <32 x i8> [[A0:%.*]], <32 x i8> [[PASSTHRU:%.*]] +; CHECK-NEXT: ret <32 x i8> [[TMP2]] +; + %1 = call <32 x i8> @llvm.x86.avx512.permvar.qi.256(<32 x i8> %a0, <32 x i8> ) + %2 = bitcast i32 %mask to <32 x i1> + 
%3 = select <32 x i1> %2, <32 x i8> %1, <32 x i8> %passthru + ret <32 x i8> %3 +} + +define <32 x i8> @zero_test_permvar_qi_256(<32 x i8> %a0) { +; CHECK-LABEL: @zero_test_permvar_qi_256( +; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <32 x i8> [[A0:%.*]], <32 x i8> poison, <32 x i32> zeroinitializer +; CHECK-NEXT: ret <32 x i8> [[TMP1]] +; + %1 = call <32 x i8> @llvm.x86.avx512.permvar.qi.256(<32 x i8> %a0, <32 x i8> zeroinitializer) + ret <32 x i8> %1 +} + +define <32 x i8> @zero_test_permvar_qi_256_mask(<32 x i8> %a0, <32 x i8> %passthru, i32 %mask) { +; CHECK-LABEL: @zero_test_permvar_qi_256_mask( +; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <32 x i8> [[A0:%.*]], <32 x i8> poison, <32 x i32> zeroinitializer +; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32 [[MASK:%.*]] to <32 x i1> +; CHECK-NEXT: [[TMP3:%.*]] = select <32 x i1> [[TMP2]], <32 x i8> [[TMP1]], <32 x i8> [[PASSTHRU:%.*]] +; CHECK-NEXT: ret <32 x i8> [[TMP3]] +; + %1 = call <32 x i8> @llvm.x86.avx512.permvar.qi.256(<32 x i8> %a0, <32 x i8> zeroinitializer) + %2 = bitcast i32 %mask to <32 x i1> + %3 = select <32 x i1> %2, <32 x i8> %1, <32 x i8> %passthru + ret <32 x i8> %3 +} + +define <32 x i8> @shuffle_test_permvar_qi_256(<32 x i8> %a0) { +; CHECK-LABEL: @shuffle_test_permvar_qi_256( +; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <32 x i8> [[A0:%.*]], <32 x i8> poison, <32 x i32> +; CHECK-NEXT: ret <32 x i8> [[TMP1]] +; + %1 = call <32 x i8> @llvm.x86.avx512.permvar.qi.256(<32 x i8> %a0, <32 x i8> ) + ret <32 x i8> %1 +} + +define <32 x i8> @shuffle_test_permvar_qi_256_mask(<32 x i8> %a0, <32 x i8> %passthru, i32 %mask) { +; CHECK-LABEL: @shuffle_test_permvar_qi_256_mask( +; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <32 x i8> [[A0:%.*]], <32 x i8> poison, <32 x i32> +; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32 [[MASK:%.*]] to <32 x i1> +; CHECK-NEXT: [[TMP3:%.*]] = select <32 x i1> [[TMP2]], <32 x i8> [[TMP1]], <32 x i8> [[PASSTHRU:%.*]] +; CHECK-NEXT: ret <32 x i8> [[TMP3]] +; + %1 = call <32 x i8> 
@llvm.x86.avx512.permvar.qi.256(<32 x i8> %a0, <32 x i8> ) + %2 = bitcast i32 %mask to <32 x i1> + %3 = select <32 x i1> %2, <32 x i8> %1, <32 x i8> %passthru + ret <32 x i8> %3 +} + +define <32 x i8> @undef_test_permvar_qi_256(<32 x i8> %a0) { +; CHECK-LABEL: @undef_test_permvar_qi_256( +; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <32 x i8> [[A0:%.*]], <32 x i8> poison, <32 x i32> +; CHECK-NEXT: ret <32 x i8> [[TMP1]] +; + %1 = call <32 x i8> @llvm.x86.avx512.permvar.qi.256(<32 x i8> %a0, <32 x i8> ) + ret <32 x i8> %1 +} + +define <32 x i8> @undef_test_permvar_qi_256_mask(<32 x i8> %a0, <32 x i8> %passthru, i32 %mask) { +; CHECK-LABEL: @undef_test_permvar_qi_256_mask( +; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <32 x i8> [[A0:%.*]], <32 x i8> poison, <32 x i32> +; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32 [[MASK:%.*]] to <32 x i1> +; CHECK-NEXT: [[TMP3:%.*]] = select <32 x i1> [[TMP2]], <32 x i8> [[TMP1]], <32 x i8> [[PASSTHRU:%.*]] +; CHECK-NEXT: ret <32 x i8> [[TMP3]] +; + %1 = call <32 x i8> @llvm.x86.avx512.permvar.qi.256(<32 x i8> %a0, <32 x i8> ) + %2 = bitcast i32 %mask to <32 x i1> + %3 = select <32 x i1> %2, <32 x i8> %1, <32 x i8> %passthru + ret <32 x i8> %3 +} + +define <32 x i8> @demandedbit_test_permvar_qi_256_mask(<32 x i8> %a0, <32 x i8> %a1) { +; CHECK-LABEL: @demandedbit_test_permvar_qi_256_mask( +; CHECK-NEXT: [[M:%.*]] = or <32 x i8> [[A1:%.*]], +; CHECK-NEXT: [[S:%.*]] = call <32 x i8> @llvm.x86.avx512.permvar.qi.256(<32 x i8> [[A0:%.*]], <32 x i8> [[M]]) +; CHECK-NEXT: ret <32 x i8> [[S]] +; + %m = or <32 x i8> %a1, + %s = call <32 x i8> @llvm.x86.avx512.permvar.qi.256(<32 x i8> %a0, <32 x i8> %m) + ret <32 x i8> %s +} + +declare <64 x i8> @llvm.x86.avx512.permvar.qi.512(<64 x i8>, <64 x i8>) + +define <64 x i8> @identity_test_permvar_qi_512(<64 x i8> %a0) { +; CHECK-LABEL: @identity_test_permvar_qi_512( +; CHECK-NEXT: ret <64 x i8> [[A0:%.*]] +; + %1 = call <64 x i8> @llvm.x86.avx512.permvar.qi.512(<64 x i8> %a0, <64 x i8> ) + ret <64 x i8> %1 
+} + +define <64 x i8> @identity_test_permvar_qi_512_mask(<64 x i8> %a0, <64 x i8> %passthru, i64 %mask) { +; CHECK-LABEL: @identity_test_permvar_qi_512_mask( +; CHECK-NEXT: [[TMP1:%.*]] = bitcast i64 [[MASK:%.*]] to <64 x i1> +; CHECK-NEXT: [[TMP2:%.*]] = select <64 x i1> [[TMP1]], <64 x i8> [[A0:%.*]], <64 x i8> [[PASSTHRU:%.*]] +; CHECK-NEXT: ret <64 x i8> [[TMP2]] +; + %1 = call <64 x i8> @llvm.x86.avx512.permvar.qi.512(<64 x i8> %a0, <64 x i8> ) + %2 = bitcast i64 %mask to <64 x i1> + %3 = select <64 x i1> %2, <64 x i8> %1, <64 x i8> %passthru + ret <64 x i8> %3 +} + +define <64 x i8> @zero_test_permvar_qi_512(<64 x i8> %a0) { +; CHECK-LABEL: @zero_test_permvar_qi_512( +; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <64 x i8> [[A0:%.*]], <64 x i8> poison, <64 x i32> zeroinitializer +; CHECK-NEXT: ret <64 x i8> [[TMP1]] +; + %1 = call <64 x i8> @llvm.x86.avx512.permvar.qi.512(<64 x i8> %a0, <64 x i8> zeroinitializer) + ret <64 x i8> %1 +} + +define <64 x i8> @zero_test_permvar_qi_512_mask(<64 x i8> %a0, <64 x i8> %passthru, i64 %mask) { +; CHECK-LABEL: @zero_test_permvar_qi_512_mask( +; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <64 x i8> [[A0:%.*]], <64 x i8> poison, <64 x i32> zeroinitializer +; CHECK-NEXT: [[TMP2:%.*]] = bitcast i64 [[MASK:%.*]] to <64 x i1> +; CHECK-NEXT: [[TMP3:%.*]] = select <64 x i1> [[TMP2]], <64 x i8> [[TMP1]], <64 x i8> [[PASSTHRU:%.*]] +; CHECK-NEXT: ret <64 x i8> [[TMP3]] +; + %1 = call <64 x i8> @llvm.x86.avx512.permvar.qi.512(<64 x i8> %a0, <64 x i8> zeroinitializer) + %2 = bitcast i64 %mask to <64 x i1> + %3 = select <64 x i1> %2, <64 x i8> %1, <64 x i8> %passthru + ret <64 x i8> %3 +} + +define <64 x i8> @shuffle_test_permvar_qi_512(<64 x i8> %a0) { +; CHECK-LABEL: @shuffle_test_permvar_qi_512( +; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <64 x i8> [[A0:%.*]], <64 x i8> poison, <64 x i32> +; CHECK-NEXT: ret <64 x i8> [[TMP1]] +; + %1 = call <64 x i8> @llvm.x86.avx512.permvar.qi.512(<64 x i8> %a0, <64 x i8> ) + ret <64 x i8> %1 +} + 
+define <64 x i8> @shuffle_test_permvar_qi_512_mask(<64 x i8> %a0, <64 x i8> %passthru, i64 %mask) {
+; CHECK-LABEL: @shuffle_test_permvar_qi_512_mask(
+; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <64 x i8> [[A0:%.*]], <64 x i8> poison, <64 x i32>
+; CHECK-NEXT: [[TMP2:%.*]] = bitcast i64 [[MASK:%.*]] to <64 x i1>
+; CHECK-NEXT: [[TMP3:%.*]] = select <64 x i1> [[TMP2]], <64 x i8> [[TMP1]], <64 x i8> [[PASSTHRU:%.*]]
+; CHECK-NEXT: ret <64 x i8> [[TMP3]]
+;
+  %1 = call <64 x i8> @llvm.x86.avx512.permvar.qi.512(<64 x i8> %a0, <64 x i8> )
+  %2 = bitcast i64 %mask to <64 x i1>
+  %3 = select <64 x i1> %2, <64 x i8> %1, <64 x i8> %passthru
+  ret <64 x i8> %3
+}
+
+define <64 x i8> @undef_test_permvar_qi_512(<64 x i8> %a0) {
+; CHECK-LABEL: @undef_test_permvar_qi_512(
+; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <64 x i8> [[A0:%.*]], <64 x i8> poison, <64 x i32>
+; CHECK-NEXT: ret <64 x i8> [[TMP1]]
+;
+  %1 = call <64 x i8> @llvm.x86.avx512.permvar.qi.512(<64 x i8> %a0, <64 x i8> )
+  ret <64 x i8> %1
+}
+
+define <64 x i8> @undef_test_permvar_qi_512_mask(<64 x i8> %a0, <64 x i8> %passthru, i64 %mask) {
+; CHECK-LABEL: @undef_test_permvar_qi_512_mask(
+; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <64 x i8> [[A0:%.*]], <64 x i8> poison, <64 x i32>
+; CHECK-NEXT: [[TMP2:%.*]] = bitcast i64 [[MASK:%.*]] to <64 x i1>
+; CHECK-NEXT: [[TMP3:%.*]] = select <64 x i1> [[TMP2]], <64 x i8> [[TMP1]], <64 x i8> [[PASSTHRU:%.*]]
+; CHECK-NEXT: ret <64 x i8> [[TMP3]]
+;
+  %1 = call <64 x i8> @llvm.x86.avx512.permvar.qi.512(<64 x i8> %a0, <64 x i8> )
+  %2 = bitcast i64 %mask to <64 x i1>
+  %3 = select <64 x i1> %2, <64 x i8> %1, <64 x i8> %passthru
+  ret <64 x i8> %3
+}
+
+define <64 x i8> @demandedbit_test_permvar_qi_512_mask(<64 x i8> %a0, <64 x i8> %a1) {
+; CHECK-LABEL: @demandedbit_test_permvar_qi_512_mask(
+; CHECK-NEXT: [[M:%.*]] = or <64 x i8> [[A1:%.*]],
+; CHECK-NEXT: [[S:%.*]] = call <64 x i8> @llvm.x86.avx512.permvar.qi.512(<64 x i8> [[A0:%.*]], <64 x i8> [[M]])
+; CHECK-NEXT: ret <64 x i8> [[S]]
+;
+  %m = or <64 x i8> %a1,
+  %s = call <64 x i8> @llvm.x86.avx512.permvar.qi.512(<64 x i8> %a0, <64 x i8> %m)
+  ret <64 x i8> %s
+}
diff --git a/llvm/test/Transforms/InstCombine/X86/x86-vpermi2.ll b/llvm/test/Transforms/InstCombine/X86/x86-vpermi2.ll
index a65358e1033cc..eb6ad4458d932 100644
--- a/llvm/test/Transforms/InstCombine/X86/x86-vpermi2.ll
+++ b/llvm/test/Transforms/InstCombine/X86/x86-vpermi2.ll
@@ -25,6 +25,30 @@ define <2 x i64> @shuffle_vpermv3_v2i64_unary(<2 x i64> %x0) {
   ret <2 x i64> %r
 }
 
+define <2 x i64> @shuffle_vpermv3_v2i64_demandedbits(<2 x i64> %x0, <2 x i64> %x1, <2 x i64> %m) {
+; CHECK-LABEL: define <2 x i64> @shuffle_vpermv3_v2i64_demandedbits(
+; CHECK-SAME: <2 x i64> [[X0:%.*]], <2 x i64> [[X1:%.*]], <2 x i64> [[M:%.*]]) {
+; CHECK-NEXT: [[T:%.*]] = or <2 x i64> [[M]],
+; CHECK-NEXT: [[R:%.*]] = call <2 x i64> @llvm.x86.avx512.vpermi2var.q.128(<2 x i64> [[X0]], <2 x i64> [[T]], <2 x i64> [[X1]])
+; CHECK-NEXT: ret <2 x i64> [[R]]
+;
+  %t = or <2 x i64> %m,
+  %r = call <2 x i64> @llvm.x86.avx512.vpermi2var.q.128(<2 x i64> %x0, <2 x i64> %t, <2 x i64> %x1)
+  ret <2 x i64> %r
+}
+
+define <2 x i64> @shuffle_vpermv3_v2i64_demandedbits_negative(<2 x i64> %x0, <2 x i64> %x1, <2 x i64> %m) {
+; CHECK-LABEL: define <2 x i64> @shuffle_vpermv3_v2i64_demandedbits_negative(
+; CHECK-SAME: <2 x i64> [[X0:%.*]], <2 x i64> [[X1:%.*]], <2 x i64> [[M:%.*]]) {
+; CHECK-NEXT: [[T:%.*]] = or <2 x i64> [[M]],
+; CHECK-NEXT: [[R:%.*]] = call <2 x i64> @llvm.x86.avx512.vpermi2var.q.128(<2 x i64> [[X0]], <2 x i64> [[T]], <2 x i64> [[X1]])
+; CHECK-NEXT: ret <2 x i64> [[R]]
+;
+  %t = or <2 x i64> %m,
+  %r = call <2 x i64> @llvm.x86.avx512.vpermi2var.q.128(<2 x i64> %x0, <2 x i64> %t, <2 x i64> %x1)
+  ret <2 x i64> %r
+}
+
 define <4 x i64> @shuffle_vpermv3_v4i64(<4 x i64> %x0, <4 x i64> %x1) {
 ; CHECK-LABEL: define <4 x i64> @shuffle_vpermv3_v4i64(
 ; CHECK-SAME: <4 x i64> [[X0:%.*]], <4 x i64> [[X1:%.*]]) {
@@ -45,6 +69,18 @@ define <4 x i64> @shuffle_vpermv3_v4i64_unary(<4 x i64> %x0) {
   ret <4 x i64> %r
 }
 
+define <4 x i64> @shuffle_vpermv3_v4i64_demandedbits(<4 x i64> %x0, <4 x i64> %x1, <4 x i64> %m) {
+; CHECK-LABEL: define <4 x i64> @shuffle_vpermv3_v4i64_demandedbits(
+; CHECK-SAME: <4 x i64> [[X0:%.*]], <4 x i64> [[X1:%.*]], <4 x i64> [[M:%.*]]) {
+; CHECK-NEXT: [[T:%.*]] = or <4 x i64> [[M]],
+; CHECK-NEXT: [[R:%.*]] = call <4 x i64> @llvm.x86.avx512.vpermi2var.q.256(<4 x i64> [[X0]], <4 x i64> [[T]], <4 x i64> [[X1]])
+; CHECK-NEXT: ret <4 x i64> [[R]]
+;
+  %t = or <4 x i64> %m,
+  %r = call <4 x i64> @llvm.x86.avx512.vpermi2var.q.256(<4 x i64> %x0, <4 x i64> %t, <4 x i64> %x1)
+  ret <4 x i64> %r
+}
+
 define <8 x i64> @shuffle_vpermv3_v8i64(<8 x i64> %x0, <8 x i64> %x1) {
 ; CHECK-LABEL: define <8 x i64> @shuffle_vpermv3_v8i64(
 ; CHECK-SAME: <8 x i64> [[X0:%.*]], <8 x i64> [[X1:%.*]]) {
@@ -65,6 +101,18 @@ define <8 x i64> @shuffle_vpermv3_v8i64_unary(<8 x i64> %x0) {
   ret <8 x i64> %r
 }
 
+define <8 x i64> @shuffle_vpermv3_v8i64_demandedbits(<8 x i64> %x0, <8 x i64> %x1, <8 x i64> %m) {
+; CHECK-LABEL: define <8 x i64> @shuffle_vpermv3_v8i64_demandedbits(
+; CHECK-SAME: <8 x i64> [[X0:%.*]], <8 x i64> [[X1:%.*]], <8 x i64> [[M:%.*]]) {
+; CHECK-NEXT: [[T:%.*]] = or <8 x i64> [[M]],
+; CHECK-NEXT: [[R:%.*]] = call <8 x i64> @llvm.x86.avx512.vpermi2var.q.512(<8 x i64> [[X0]], <8 x i64> [[T]], <8 x i64> [[X1]])
+; CHECK-NEXT: ret <8 x i64> [[R]]
+;
+  %t = or <8 x i64> %m,
+  %r = call <8 x i64> @llvm.x86.avx512.vpermi2var.q.512(<8 x i64> %x0, <8 x i64> %t, <8 x i64> %x1)
+  ret <8 x i64> %r
+}
+
 ;
 ; vXi32
 ;
@@ -89,6 +137,18 @@ define <4 x i32> @shuffle_vpermv3_v4i32_unary(<4 x i32> %x0) {
   ret <4 x i32> %r
 }
 
+define <4 x i32> @shuffle_vpermv3_v4i32_demandedbits(<4 x i32> %x0, <4 x i32> %x1, <4 x i32> %m) {
+; CHECK-LABEL: define <4 x i32> @shuffle_vpermv3_v4i32_demandedbits(
+; CHECK-SAME: <4 x i32> [[X0:%.*]], <4 x i32> [[X1:%.*]], <4 x i32> [[M:%.*]]) {
+; CHECK-NEXT: [[T:%.*]] = or <4 x i32> [[M]],
+; CHECK-NEXT: [[R:%.*]] = call <4 x i32> @llvm.x86.avx512.vpermi2var.d.128(<4 x i32> [[X0]], <4 x i32> [[T]], <4 x i32> [[X1]])
+; CHECK-NEXT: ret <4 x i32> [[R]]
+;
+  %t = or <4 x i32> %m,
+  %r = call <4 x i32> @llvm.x86.avx512.vpermi2var.d.128(<4 x i32> %x0, <4 x i32> %t, <4 x i32> %x1)
+  ret <4 x i32> %r
+}
+
 define <8 x i32> @shuffle_vpermv3_v8i32(<8 x i32> %x0, <8 x i32> %x1) {
 ; CHECK-LABEL: define <8 x i32> @shuffle_vpermv3_v8i32(
 ; CHECK-SAME: <8 x i32> [[X0:%.*]], <8 x i32> [[X1:%.*]]) {
@@ -109,6 +169,18 @@ define <8 x i32> @shuffle_vpermv3_v8i32_unary(<8 x i32> %x0) {
   ret <8 x i32> %r
 }
 
+define <8 x i32> @shuffle_vpermv3_v8i32_demandedbits(<8 x i32> %x0, <8 x i32> %x1, <8 x i32> %m) {
+; CHECK-LABEL: define <8 x i32> @shuffle_vpermv3_v8i32_demandedbits(
+; CHECK-SAME: <8 x i32> [[X0:%.*]], <8 x i32> [[X1:%.*]], <8 x i32> [[M:%.*]]) {
+; CHECK-NEXT: [[T:%.*]] = or <8 x i32> [[M]],
+; CHECK-NEXT: [[R:%.*]] = call <8 x i32> @llvm.x86.avx512.vpermi2var.d.256(<8 x i32> [[X0]], <8 x i32> [[T]], <8 x i32> [[X1]])
+; CHECK-NEXT: ret <8 x i32> [[R]]
+;
+  %t = or <8 x i32> %m,
+  %r = call <8 x i32> @llvm.x86.avx512.vpermi2var.d.256(<8 x i32> %x0, <8 x i32> %t, <8 x i32> %x1)
+  ret <8 x i32> %r
+}
+
 define <16 x i32> @shuffle_vpermv3_v16i32(<16 x i32> %x0, <16 x i32> %x1) {
 ; CHECK-LABEL: define <16 x i32> @shuffle_vpermv3_v16i32(
 ; CHECK-SAME: <16 x i32> [[X0:%.*]], <16 x i32> [[X1:%.*]]) {
@@ -129,6 +201,18 @@ define <16 x i32> @shuffle_vpermv3_v16i32_unary(<16 x i32> %x0) {
   ret <16 x i32> %r
 }
 
+define <16 x i32> @shuffle_vpermv3_v16i32_demandedbits(<16 x i32> %x0, <16 x i32> %x1, <16 x i32> %m) {
+; CHECK-LABEL: define <16 x i32> @shuffle_vpermv3_v16i32_demandedbits(
+; CHECK-SAME: <16 x i32> [[X0:%.*]], <16 x i32> [[X1:%.*]], <16 x i32> [[M:%.*]]) {
+; CHECK-NEXT: [[T:%.*]] = or <16 x i32> [[M]],
+; CHECK-NEXT: [[R:%.*]] = call <16 x i32> @llvm.x86.avx512.vpermi2var.d.512(<16 x i32> [[X0]], <16 x i32> [[T]], <16 x i32> [[X1]])
+; CHECK-NEXT: ret <16 x i32> [[R]]
+;
+  %t = or <16 x i32> %m,
+  %r = call <16 x i32> @llvm.x86.avx512.vpermi2var.d.512(<16 x i32> %x0, <16 x i32> %t, <16 x i32> %x1)
+  ret <16 x i32> %r
+}
+
 ;
 ; vXi16
 ;
@@ -153,6 +237,18 @@ define <8 x i16> @shuffle_vpermv3_v8i16_unary(<8 x i16> %x0) {
   ret <8 x i16> %r
 }
 
+define <8 x i16> @shuffle_vpermv3_v8i16_demandedbits(<8 x i16> %x0, <8 x i16> %x1, <8 x i16> %m) {
+; CHECK-LABEL: define <8 x i16> @shuffle_vpermv3_v8i16_demandedbits(
+; CHECK-SAME: <8 x i16> [[X0:%.*]], <8 x i16> [[X1:%.*]], <8 x i16> [[M:%.*]]) {
+; CHECK-NEXT: [[T:%.*]] = or <8 x i16> [[M]],
+; CHECK-NEXT: [[R:%.*]] = call <8 x i16> @llvm.x86.avx512.vpermi2var.hi.128(<8 x i16> [[X0]], <8 x i16> [[T]], <8 x i16> [[X1]])
+; CHECK-NEXT: ret <8 x i16> [[R]]
+;
+  %t = or <8 x i16> %m,
+  %r = call <8 x i16> @llvm.x86.avx512.vpermi2var.hi.128(<8 x i16> %x0, <8 x i16> %t, <8 x i16> %x1)
+  ret <8 x i16> %r
+}
+
 define <16 x i16> @shuffle_vpermv3_v16i16(<16 x i16> %x0, <16 x i16> %x1) {
 ; CHECK-LABEL: define <16 x i16> @shuffle_vpermv3_v16i16(
 ; CHECK-SAME: <16 x i16> [[X0:%.*]], <16 x i16> [[X1:%.*]]) {
@@ -173,6 +269,18 @@ define <16 x i16> @shuffle_vpermv3_v16i16_unary(<16 x i16> %x0) {
   ret <16 x i16> %r
 }
 
+define <16 x i16> @shuffle_vpermv3_v16i16_demandedbits(<16 x i16> %x0, <16 x i16> %x1, <16 x i16> %m) {
+; CHECK-LABEL: define <16 x i16> @shuffle_vpermv3_v16i16_demandedbits(
+; CHECK-SAME: <16 x i16> [[X0:%.*]], <16 x i16> [[X1:%.*]], <16 x i16> [[M:%.*]]) {
+; CHECK-NEXT: [[T:%.*]] = or <16 x i16> [[M]],
+; CHECK-NEXT: [[R:%.*]] = call <16 x i16> @llvm.x86.avx512.vpermi2var.hi.256(<16 x i16> [[X0]], <16 x i16> [[T]], <16 x i16> [[X1]])
+; CHECK-NEXT: ret <16 x i16> [[R]]
+;
+  %t = or <16 x i16> %m,
+  %r = call <16 x i16> @llvm.x86.avx512.vpermi2var.hi.256(<16 x i16> %x0, <16 x i16> %t, <16 x i16> %x1)
+  ret <16 x i16> %r
+}
+
 define <32 x i16> @shuffle_vpermv3_v32i16(<32 x i16> %x0, <32 x i16> %x1) {
 ; CHECK-LABEL: define <32 x i16> @shuffle_vpermv3_v32i16(
 ; CHECK-SAME: <32 x i16> [[X0:%.*]], <32 x i16> [[X1:%.*]]) {
@@ -193,6 +301,18 @@ define <32 x i16> @shuffle_vpermv3_v32i16_unary(<32 x i16> %x0) {
   ret <32 x i16> %r
 }
 
+define <32 x i16> @shuffle_vpermv3_v32i16_demandedbits(<32 x i16> %x0, <32 x i16> %x1, <32 x i16> %m) {
+; CHECK-LABEL: define <32 x i16> @shuffle_vpermv3_v32i16_demandedbits(
+; CHECK-SAME: <32 x i16> [[X0:%.*]], <32 x i16> [[X1:%.*]], <32 x i16> [[M:%.*]]) {
+; CHECK-NEXT: [[T:%.*]] = or <32 x i16> [[M]],
+; CHECK-NEXT: [[R:%.*]] = call <32 x i16> @llvm.x86.avx512.vpermi2var.hi.512(<32 x i16> [[X0]], <32 x i16> [[T]], <32 x i16> [[X1]])
+; CHECK-NEXT: ret <32 x i16> [[R]]
+;
+  %t = or <32 x i16> %m,
+  %r = call <32 x i16> @llvm.x86.avx512.vpermi2var.hi.512(<32 x i16> %x0, <32 x i16> %t, <32 x i16> %x1)
+  ret <32 x i16> %r
+}
+
 ;
 ; vXi8
 ;
@@ -217,6 +337,18 @@ define <16 x i8> @shuffle_vpermv3_v16i8_unary(<16 x i8> %x0) {
   ret <16 x i8> %r
 }
 
+define <16 x i8> @shuffle_vpermv3_v16i8_demandedbits(<16 x i8> %x0, <16 x i8> %x1, <16 x i8> %m) {
+; CHECK-LABEL: define <16 x i8> @shuffle_vpermv3_v16i8_demandedbits(
+; CHECK-SAME: <16 x i8> [[X0:%.*]], <16 x i8> [[X1:%.*]], <16 x i8> [[M:%.*]]) {
+; CHECK-NEXT: [[T:%.*]] = or <16 x i8> [[M]],
+; CHECK-NEXT: [[R:%.*]] = call <16 x i8> @llvm.x86.avx512.vpermi2var.qi.128(<16 x i8> [[X0]], <16 x i8> [[T]], <16 x i8> [[X1]])
+; CHECK-NEXT: ret <16 x i8> [[R]]
+;
+  %t = or <16 x i8> %m,
+  %r = call <16 x i8> @llvm.x86.avx512.vpermi2var.qi.128(<16 x i8> %x0, <16 x i8> %t, <16 x i8> %x1)
+  ret <16 x i8> %r
+}
+
 define <32 x i8> @shuffle_vpermv3_v32i8(<32 x i8> %x0, <32 x i8> %x1) {
 ; CHECK-LABEL: define <32 x i8> @shuffle_vpermv3_v32i8(
 ; CHECK-SAME: <32 x i8> [[X0:%.*]], <32 x i8> [[X1:%.*]]) {
@@ -237,6 +369,18 @@ define <32 x i8> @shuffle_vpermv3_v32i8_unary(<32 x i8> %x0) {
   ret <32 x i8> %r
 }
 
+define <32 x i8> @shuffle_vpermv3_v32i8_demandedbits(<32 x i8> %x0, <32 x i8> %x1, <32 x i8> %m) {
+; CHECK-LABEL: define <32 x i8> @shuffle_vpermv3_v32i8_demandedbits(
+; CHECK-SAME: <32 x i8> [[X0:%.*]], <32 x i8> [[X1:%.*]], <32 x i8> [[M:%.*]]) {
+; CHECK-NEXT: [[T:%.*]] = or <32 x i8> [[M]],
+; CHECK-NEXT: [[R:%.*]] = call <32 x i8> @llvm.x86.avx512.vpermi2var.qi.256(<32 x i8> [[X0]], <32 x i8> [[T]], <32 x i8> [[X1]])
+; CHECK-NEXT: ret <32 x i8> [[R]]
+;
+  %t = or <32 x i8> %m,
+  %r = call <32 x i8> @llvm.x86.avx512.vpermi2var.qi.256(<32 x i8> %x0, <32 x i8> %t, <32 x i8> %x1)
+  ret <32 x i8> %r
+}
+
 define <64 x i8> @shuffle_vpermv3_v64i8(<64 x i8> %x0, <64 x i8> %x1) {
 ; CHECK-LABEL: define <64 x i8> @shuffle_vpermv3_v64i8(
 ; CHECK-SAME: <64 x i8> [[X0:%.*]], <64 x i8> [[X1:%.*]]) {
@@ -256,3 +400,15 @@ define <64 x i8> @shuffle_vpermv3_v64i8_unary(<64 x i8> %x0) {
   %r = call <64 x i8> @llvm.x86.avx512.vpermi2var.qi.512(<64 x i8> %x0, <64 x i8> , <64 x i8> %x0)
   ret <64 x i8> %r
 }
+
+define <64 x i8> @shuffle_vpermv3_v64i8_demandedbits(<64 x i8> %x0, <64 x i8> %x1, <64 x i8> %m) {
+; CHECK-LABEL: define <64 x i8> @shuffle_vpermv3_v64i8_demandedbits(
+; CHECK-SAME: <64 x i8> [[X0:%.*]], <64 x i8> [[X1:%.*]], <64 x i8> [[M:%.*]]) {
+; CHECK-NEXT: [[T:%.*]] = or <64 x i8> [[M]],
+; CHECK-NEXT: [[R:%.*]] = call <64 x i8> @llvm.x86.avx512.vpermi2var.qi.512(<64 x i8> [[X0]], <64 x i8> [[T]], <64 x i8> [[X1]])
+; CHECK-NEXT: ret <64 x i8> [[R]]
+;
+  %t = or <64 x i8> %m,
+  %r = call <64 x i8> @llvm.x86.avx512.vpermi2var.qi.512(<64 x i8> %x0, <64 x i8> %t, <64 x i8> %x1)
+  ret <64 x i8> %r
+}
diff --git a/llvm/test/Transforms/LICM/hoist-add-sub.ll b/llvm/test/Transforms/LICM/hoist-add-sub.ll
index 5393cdb1d29c4..d9b868eda579f 100644
--- a/llvm/test/Transforms/LICM/hoist-add-sub.ll
+++ b/llvm/test/Transforms/LICM/hoist-add-sub.ll
@@ -51,6 +51,55 @@ out_of_bounds:
   ret i32 -1
 }
 
+define i32 @test_01_unsigned(ptr %p, ptr %x_p, ptr %length_p) {
+; CHECK-LABEL: define i32 @test_01_unsigned
+; CHECK-SAME: (ptr [[P:%.*]], ptr [[X_P:%.*]], ptr [[LENGTH_P:%.*]]) {
+; CHECK-NEXT: entry:
+; CHECK-NEXT: [[X:%.*]] = load i32, ptr [[X_P]], align 4, !range [[RNG1:![0-9]+]]
+; CHECK-NEXT: [[LENGTH:%.*]] = load i32, ptr [[LENGTH_P]], align 4, !range [[RNG2:![0-9]+]]
+; CHECK-NEXT: [[INVARIANT_OP:%.*]] = sub nuw i32 [[X]], 4
+; CHECK-NEXT: br label [[LOOP:%.*]]
+; CHECK: loop:
+; CHECK-NEXT: [[IV:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[BACKEDGE:%.*]] ]
+; CHECK-NEXT: [[X_CHECK:%.*]] = icmp ugt i32 [[IV]], [[INVARIANT_OP]]
+; CHECK-NEXT: br i1 [[X_CHECK]], label [[OUT_OF_BOUNDS:%.*]], label [[BACKEDGE]]
+; CHECK: backedge:
+; CHECK-NEXT: [[EL_PTR:%.*]] = getelementptr i32, ptr [[P]], i32 [[IV]]
+; CHECK-NEXT: store i32 1, ptr [[EL_PTR]], align 4
+; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i32 [[IV]], 4
+; CHECK-NEXT: [[LOOP_COND:%.*]] = icmp ult i32 [[IV_NEXT]], [[LENGTH]]
+; CHECK-NEXT: br i1 [[LOOP_COND]], label [[LOOP]], label [[EXIT:%.*]]
+; CHECK: exit:
+; CHECK-NEXT: [[IV_NEXT_LCSSA:%.*]] = phi i32 [ [[IV_NEXT]], [[BACKEDGE]] ]
+; CHECK-NEXT: ret i32 [[IV_NEXT_LCSSA]]
+; CHECK: out_of_bounds:
+; CHECK-NEXT: ret i32 -1
+;
+entry:
+  %x = load i32, ptr %x_p, !range !2
+  %length = load i32, ptr %length_p, !range !1
+  br label %loop
+
+loop:
+  %iv = phi i32 [0, %entry], [%iv.next, %backedge]
+  %arith = sub nuw i32 %x, %iv
+  %x_check = icmp ult i32 %arith, 4
+  br i1 %x_check, label %out_of_bounds, label %backedge
+
+backedge:
+  %el.ptr = getelementptr i32, ptr %p, i32 %iv
+  store i32 1, ptr %el.ptr
+  %iv.next = add nuw nsw i32 %iv, 4
+  %loop_cond = icmp ult i32 %iv.next, %length
+  br i1 %loop_cond, label %loop, label %exit
+
+exit:
+  ret i32 %iv.next
+
+out_of_bounds:
+  ret i32 -1
+}
+
 ; TODO: x - iv < 4 ==> iv > x - 4
 define i32 @test_01a(ptr %p, ptr %x_p, ptr %length_p) {
 ; CHECK-LABEL: define i32 @test_01a
@@ -114,6 +163,68 @@ failed:
   ret i32 -2
 }
 
+define i32 @test_01a_unsigned(ptr %p, ptr %x_p, ptr %length_p) {
+; CHECK-LABEL: define i32 @test_01a_unsigned
+; CHECK-SAME: (ptr [[P:%.*]], ptr [[X_P:%.*]], ptr [[LENGTH_P:%.*]]) {
+; CHECK-NEXT: entry:
+; CHECK-NEXT: [[X:%.*]] = load i32, ptr [[X_P]], align 4
+; CHECK-NEXT: [[LENGTH:%.*]] = load i32, ptr [[LENGTH_P]], align 4
+; CHECK-NEXT: [[PRECOND_1:%.*]] = icmp uge i32 [[X]], 0
+; CHECK-NEXT: [[PRECOND_2:%.*]] = icmp uge i32 [[LENGTH]], 0
+; CHECK-NEXT: [[PRECOND:%.*]] = and i1 [[PRECOND_1]], [[PRECOND_2]]
+; CHECK-NEXT: br i1 [[PRECOND]], label [[LOOP_PREHEADER:%.*]], label [[FAILED:%.*]]
+; CHECK: loop.preheader:
+; CHECK-NEXT: br label [[LOOP:%.*]]
+; CHECK: loop:
+; CHECK-NEXT: [[IV:%.*]] = phi i32 [ [[IV_NEXT:%.*]], [[BACKEDGE:%.*]] ], [ 0, [[LOOP_PREHEADER]] ]
+; CHECK-NEXT: [[ARITH:%.*]] = sub nuw i32 [[X]], [[IV]]
+; CHECK-NEXT: [[X_CHECK:%.*]] = icmp ult i32 [[ARITH]], 4
+; CHECK-NEXT: br i1 [[X_CHECK]], label [[OUT_OF_BOUNDS:%.*]], label [[BACKEDGE]]
+; CHECK: backedge:
+; CHECK-NEXT: [[EL_PTR:%.*]] = getelementptr i32, ptr [[P]], i32 [[IV]]
+; CHECK-NEXT: store i32 1, ptr [[EL_PTR]], align 4
+; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i32 [[IV]], 4
+; CHECK-NEXT: [[LOOP_COND:%.*]] = icmp ult i32 [[IV_NEXT]], [[LENGTH]]
+; CHECK-NEXT: br i1 [[LOOP_COND]], label [[LOOP]], label [[EXIT:%.*]]
+; CHECK: exit:
+; CHECK-NEXT: [[IV_NEXT_LCSSA:%.*]] = phi i32 [ [[IV_NEXT]], [[BACKEDGE]] ]
+; CHECK-NEXT: ret i32 [[IV_NEXT_LCSSA]]
+; CHECK: out_of_bounds:
+; CHECK-NEXT: ret i32 -1
+; CHECK: failed:
+; CHECK-NEXT: ret i32 -2
+;
+entry:
+  %x = load i32, ptr %x_p
+  %length = load i32, ptr %length_p
+  %precond_1 = icmp uge i32 %x, 0
+  %precond_2 = icmp uge i32 %length, 0
+  %precond = and i1 %precond_1, %precond_2
+  br i1 %precond, label %loop, label %failed
+
+loop:
+  %iv = phi i32 [0, %entry], [%iv.next, %backedge]
+  %arith = sub nuw i32 %x, %iv
+  %x_check = icmp ult i32 %arith, 4
+  br i1 %x_check, label %out_of_bounds, label %backedge
+
+backedge:
+  %el.ptr = getelementptr i32, ptr %p, i32 %iv
+  store i32 1, ptr %el.ptr
+  %iv.next = add nuw nsw i32 %iv, 4
+  %loop_cond = icmp ult i32 %iv.next, %length
+  br i1 %loop_cond, label %loop, label %exit
+
+exit:
+  ret i32 %iv.next
+
+out_of_bounds:
+  ret i32 -1
+
+failed:
+  ret i32 -2
+}
+
 ; Range info is missing for x, cannot prove no-overflow. Should not hoist.
 define i32 @test_01_neg(ptr %p, ptr %x_p, ptr %length_p) {
 ; CHECK-LABEL: define i32 @test_01_neg
@@ -164,6 +275,54 @@ out_of_bounds:
   ret i32 -1
 }
 
+define i32 @test_01_neg_unsigned(ptr %p, ptr %x_p, ptr %length_p) {
+; CHECK-LABEL: define i32 @test_01_neg_unsigned
+; CHECK-SAME: (ptr [[P:%.*]], ptr [[X_P:%.*]], ptr [[LENGTH_P:%.*]]) {
+; CHECK-NEXT: entry:
+; CHECK-NEXT: [[X:%.*]] = load i32, ptr [[X_P]], align 4
+; CHECK-NEXT: [[LENGTH:%.*]] = load i32, ptr [[LENGTH_P]], align 4, !range [[RNG0]]
+; CHECK-NEXT: br label [[LOOP:%.*]]
+; CHECK: loop:
+; CHECK-NEXT: [[IV:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[BACKEDGE:%.*]] ]
+; CHECK-NEXT: [[ARITH:%.*]] = sub nuw i32 [[X]], [[IV]]
+; CHECK-NEXT: [[X_CHECK:%.*]] = icmp ult i32 [[ARITH]], 4
+; CHECK-NEXT: br i1 [[X_CHECK]], label [[OUT_OF_BOUNDS:%.*]], label [[BACKEDGE]]
+; CHECK: backedge:
+; CHECK-NEXT: [[EL_PTR:%.*]] = getelementptr i32, ptr [[P]], i32 [[IV]]
+; CHECK-NEXT: store i32 1, ptr [[EL_PTR]], align 4
+; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i32 [[IV]], 4
+; CHECK-NEXT: [[LOOP_COND:%.*]] = icmp ult i32 [[IV_NEXT]], [[LENGTH]]
+; CHECK-NEXT: br i1 [[LOOP_COND]], label [[LOOP]], label [[EXIT:%.*]]
+; CHECK: exit:
+; CHECK-NEXT: [[IV_NEXT_LCSSA:%.*]] = phi i32 [ [[IV_NEXT]], [[BACKEDGE]] ]
+; CHECK-NEXT: ret i32 [[IV_NEXT_LCSSA]]
+; CHECK: out_of_bounds:
+; CHECK-NEXT: ret i32 -1
+;
+entry:
+  %x = load i32, ptr %x_p
+  %length = load i32, ptr %length_p, !range !0
+  br label %loop
+
+loop:
+  %iv = phi i32 [0, %entry], [%iv.next, %backedge]
+  %arith = sub nuw i32 %x, %iv
+  %x_check = icmp ult i32 %arith, 4
+  br i1 %x_check, label %out_of_bounds, label %backedge
+
+backedge:
+  %el.ptr = getelementptr i32, ptr %p, i32 %iv
+  store i32 1, ptr %el.ptr
+  %iv.next = add nuw nsw i32 %iv, 4
+  %loop_cond = icmp ult i32 %iv.next, %length
+  br i1 %loop_cond, label %loop, label %exit
+
+exit:
+  ret i32 %iv.next
+
+out_of_bounds:
+  ret i32 -1
+}
 
 ; x + iv < 4 ==> iv < 4 - x
 define i32 @test_02(ptr %p, ptr %x_p, ptr %length_p) {
 ; CHECK-LABEL: define i32 @test_02
@@ -215,6 +374,55 @@ out_of_bounds:
   ret i32 -1
 }
 
+define i32 @test_02_unsigned(ptr %p, ptr %x_p, ptr %length_p) {
+; CHECK-LABEL: define i32 @test_02_unsigned
+; CHECK-SAME: (ptr [[P:%.*]], ptr [[X_P:%.*]], ptr [[LENGTH_P:%.*]]) {
+; CHECK-NEXT: entry:
+; CHECK-NEXT: [[X:%.*]] = load i32, ptr [[X_P]], align 4, !range [[RNG3:![0-9]+]]
+; CHECK-NEXT: [[LENGTH:%.*]] = load i32, ptr [[LENGTH_P]], align 4, !range [[RNG2]]
+; CHECK-NEXT: [[INVARIANT_OP:%.*]] = sub nuw i32 4, [[X]]
+; CHECK-NEXT: br label [[LOOP:%.*]]
+; CHECK: loop:
+; CHECK-NEXT: [[IV:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[BACKEDGE:%.*]] ]
+; CHECK-NEXT: [[X_CHECK:%.*]] = icmp ult i32 [[IV]], [[INVARIANT_OP]]
+; CHECK-NEXT: br i1 [[X_CHECK]], label [[OUT_OF_BOUNDS:%.*]], label [[BACKEDGE]]
+; CHECK: backedge:
+; CHECK-NEXT: [[EL_PTR:%.*]] = getelementptr i32, ptr [[P]], i32 [[IV]]
+; CHECK-NEXT: store i32 1, ptr [[EL_PTR]], align 4
+; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i32 [[IV]], 4
+; CHECK-NEXT: [[LOOP_COND:%.*]] = icmp ult i32 [[IV_NEXT]], [[LENGTH]]
+; CHECK-NEXT: br i1 [[LOOP_COND]], label [[LOOP]], label [[EXIT:%.*]]
+; CHECK: exit:
+; CHECK-NEXT: [[IV_NEXT_LCSSA:%.*]] = phi i32 [ [[IV_NEXT]], [[BACKEDGE]] ]
+; CHECK-NEXT: ret i32 [[IV_NEXT_LCSSA]]
+; CHECK: out_of_bounds:
+; CHECK-NEXT: ret i32 -1
+;
+entry:
+  %x = load i32, ptr %x_p, !range !3
+  %length = load i32, ptr %length_p, !range !1
+  br label %loop
+
+loop:
+  %iv = phi i32 [0, %entry], [%iv.next, %backedge]
+  %arith = add nuw i32 %x, %iv
+  %x_check = icmp ult i32 %arith, 4
+  br i1 %x_check, label %out_of_bounds, label %backedge
+
+backedge:
+  %el.ptr = getelementptr i32, ptr %p, i32 %iv
+  store i32 1, ptr %el.ptr
+  %iv.next = add nuw nsw i32 %iv, 4
+  %loop_cond = icmp ult i32 %iv.next, %length
+  br i1 %loop_cond, label %loop, label %exit
+
+exit:
+  ret i32 %iv.next
+
+out_of_bounds:
+  ret i32 -1
+}
+
 ; TODO: x + iv < 4 ==> iv < 4 - x
 define i32 @test_02a(ptr %p, ptr %x_p, ptr %length_p) {
 ; CHECK-LABEL: define i32 @test_02a
@@ -278,12 +486,74 @@ failed:
   ret i32 -2
 }
 
+define i32 @test_02a_unsigned(ptr %p, ptr %x_p, ptr %length_p) {
+; CHECK-LABEL: define i32 @test_02a_unsigned
+; CHECK-SAME: (ptr [[P:%.*]], ptr [[X_P:%.*]], ptr [[LENGTH_P:%.*]]) {
+; CHECK-NEXT: entry:
+; CHECK-NEXT: [[X:%.*]] = load i32, ptr [[X_P]], align 4
+; CHECK-NEXT: [[LENGTH:%.*]] = load i32, ptr [[LENGTH_P]], align 4
+; CHECK-NEXT: [[PRECOND_1:%.*]] = icmp uge i32 [[X]], 0
+; CHECK-NEXT: [[PRECOND_2:%.*]] = icmp uge i32 [[LENGTH]], 0
+; CHECK-NEXT: [[PRECOND:%.*]] = and i1 [[PRECOND_1]], [[PRECOND_2]]
+; CHECK-NEXT: br i1 [[PRECOND]], label [[LOOP_PREHEADER:%.*]], label [[FAILED:%.*]]
+; CHECK: loop.preheader:
+; CHECK-NEXT: br label [[LOOP:%.*]]
+; CHECK: loop:
+; CHECK-NEXT: [[IV:%.*]] = phi i32 [ [[IV_NEXT:%.*]], [[BACKEDGE:%.*]] ], [ 0, [[LOOP_PREHEADER]] ]
+; CHECK-NEXT: [[ARITH:%.*]] = add nuw i32 [[X]], [[IV]]
+; CHECK-NEXT: [[X_CHECK:%.*]] = icmp ult i32 [[ARITH]], 4
+; CHECK-NEXT: br i1 [[X_CHECK]], label [[OUT_OF_BOUNDS:%.*]], label [[BACKEDGE]]
+; CHECK: backedge:
+; CHECK-NEXT: [[EL_PTR:%.*]] = getelementptr i32, ptr [[P]], i32 [[IV]]
+; CHECK-NEXT: store i32 1, ptr [[EL_PTR]], align 4
+; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i32 [[IV]], 4
+; CHECK-NEXT: [[LOOP_COND:%.*]] = icmp ult i32 [[IV_NEXT]], [[LENGTH]]
+; CHECK-NEXT: br i1 [[LOOP_COND]], label [[LOOP]], label [[EXIT:%.*]]
+; CHECK: exit:
+; CHECK-NEXT: [[IV_NEXT_LCSSA:%.*]] = phi i32 [ [[IV_NEXT]], [[BACKEDGE]] ]
+; CHECK-NEXT: ret i32 [[IV_NEXT_LCSSA]]
+; CHECK: out_of_bounds:
+; CHECK-NEXT: ret i32 -1
+; CHECK: failed:
+; CHECK-NEXT: ret i32 -2
+;
+entry:
+  %x = load i32, ptr %x_p
+  %length = load i32, ptr %length_p
+  %precond_1 = icmp uge i32 %x, 0
+  %precond_2 = icmp uge i32 %length, 0
+  %precond = and i1 %precond_1, %precond_2
+  br i1 %precond, label %loop, label %failed
+
+loop:
+  %iv = phi i32 [0, %entry], [%iv.next, %backedge]
+  %arith = add nuw i32 %x, %iv
+  %x_check = icmp ult i32 %arith, 4
+  br i1 %x_check, label %out_of_bounds, label %backedge
+
+backedge:
+  %el.ptr = getelementptr i32, ptr %p, i32 %iv
+  store i32 1, ptr %el.ptr
+  %iv.next = add nuw nsw i32 %iv, 4
+  %loop_cond = icmp ult i32 %iv.next, %length
+  br i1 %loop_cond, label %loop, label %exit
+
+exit:
+  ret i32 %iv.next
+
+out_of_bounds:
+  ret i32 -1
+
+failed:
+  ret i32 -2
+}
+
 ; iv - x < 4 ==> iv < 4 + x
 define i32 @test_03(ptr %p, ptr %x_p, ptr %length_p) {
 ; CHECK-LABEL: define i32 @test_03
 ; CHECK-SAME: (ptr [[P:%.*]], ptr [[X_P:%.*]], ptr [[LENGTH_P:%.*]]) {
 ; CHECK-NEXT: entry:
-; CHECK-NEXT: [[X:%.*]] = load i32, ptr [[X_P]], align 4, !range [[RNG1:![0-9]+]]
+; CHECK-NEXT: [[X:%.*]] = load i32, ptr [[X_P]], align 4, !range [[RNG2]]
 ; CHECK-NEXT: [[LENGTH:%.*]] = load i32, ptr [[LENGTH_P]], align 4, !range [[RNG0]]
 ; CHECK-NEXT: [[INVARIANT_OP:%.*]] = add nsw i32 [[X]], 4
 ; CHECK-NEXT: br label [[LOOP:%.*]]
@@ -328,6 +598,55 @@ out_of_bounds:
   ret i32 -1
 }
 
+define i32 @test_03_unsigned(ptr %p, ptr %x_p, ptr %length_p) {
+; CHECK-LABEL: define i32 @test_03_unsigned
+; CHECK-SAME: (ptr [[P:%.*]], ptr [[X_P:%.*]], ptr [[LENGTH_P:%.*]]) {
+; CHECK-NEXT: entry:
+; CHECK-NEXT: [[X:%.*]] = load i32, ptr [[X_P]], align 4, !range [[RNG2]]
+; CHECK-NEXT: [[LENGTH:%.*]] = load i32, ptr [[LENGTH_P]], align 4, !range [[RNG2]]
+; CHECK-NEXT: [[INVARIANT_OP:%.*]] = add nuw i32 [[X]], 4
+; CHECK-NEXT: br label [[LOOP:%.*]]
+; CHECK: loop:
+; CHECK-NEXT: [[IV:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[BACKEDGE:%.*]] ]
+; CHECK-NEXT: [[X_CHECK:%.*]] = icmp ult i32 [[IV]], [[INVARIANT_OP]]
+; CHECK-NEXT: br i1 [[X_CHECK]], label [[OUT_OF_BOUNDS:%.*]], label [[BACKEDGE]]
+; CHECK: backedge:
+; CHECK-NEXT: [[EL_PTR:%.*]] = getelementptr i32, ptr [[P]], i32 [[IV]]
+; CHECK-NEXT: store i32 1, ptr [[EL_PTR]], align 4
+; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i32 [[IV]], 4
+; CHECK-NEXT: [[LOOP_COND:%.*]] = icmp ult i32 [[IV_NEXT]], [[LENGTH]]
+; CHECK-NEXT: br i1 [[LOOP_COND]], label [[LOOP]], label [[EXIT:%.*]]
+; CHECK: exit:
+; CHECK-NEXT: [[IV_NEXT_LCSSA:%.*]] = phi i32 [ [[IV_NEXT]], [[BACKEDGE]] ]
+; CHECK-NEXT: ret i32 [[IV_NEXT_LCSSA]]
+; CHECK: out_of_bounds:
+; CHECK-NEXT: ret i32 -1
+;
+entry:
+  %x = load i32, ptr %x_p, !range !1
+  %length = load i32, ptr %length_p, !range !1
+  br label %loop
+
+loop:
+  %iv = phi i32 [0, %entry], [%iv.next, %backedge]
+  %arith = sub nuw i32 %iv, %x
+  %x_check = icmp ult i32 %arith, 4
+  br i1 %x_check, label %out_of_bounds, label %backedge
+
+backedge:
+  %el.ptr = getelementptr i32, ptr %p, i32 %iv
+  store i32 1, ptr %el.ptr
+  %iv.next = add nuw nsw i32 %iv, 4
+  %loop_cond = icmp ult i32 %iv.next, %length
+  br i1 %loop_cond, label %loop, label %exit
+
+exit:
+  ret i32 %iv.next
+
+out_of_bounds:
+  ret i32 -1
+}
+
 ; TODO: iv - x < 4 ==> iv < 4 + x
 define i32 @test_03a(ptr %p, ptr %x_p, ptr %length_p) {
 ; CHECK-LABEL: define i32 @test_03a
@@ -391,6 +710,68 @@ failed:
   ret i32 -2
 }
 
+define i32 @test_03a_unsigned(ptr %p, ptr %x_p, ptr %length_p) {
+; CHECK-LABEL: define i32 @test_03a_unsigned
+; CHECK-SAME: (ptr [[P:%.*]], ptr [[X_P:%.*]], ptr [[LENGTH_P:%.*]]) {
+; CHECK-NEXT: entry:
+; CHECK-NEXT: [[X:%.*]] = load i32, ptr [[X_P]], align 4
+; CHECK-NEXT: [[LENGTH:%.*]] = load i32, ptr [[LENGTH_P]], align 4
+; CHECK-NEXT: [[PRECOND_1:%.*]] = icmp ult i32 [[X]], 2147483640
+; CHECK-NEXT: [[PRECOND_2:%.*]] = icmp uge i32 [[LENGTH]], 0
+; CHECK-NEXT: [[PRECOND:%.*]] = and i1 [[PRECOND_1]], [[PRECOND_2]]
+; CHECK-NEXT: br i1 [[PRECOND]], label [[LOOP_PREHEADER:%.*]], label [[FAILED:%.*]]
+; CHECK: loop.preheader:
+; CHECK-NEXT: br label [[LOOP:%.*]]
+; CHECK: loop:
+; CHECK-NEXT: [[IV:%.*]] = phi i32 [ [[IV_NEXT:%.*]], [[BACKEDGE:%.*]] ], [ 0, [[LOOP_PREHEADER]] ]
+; CHECK-NEXT: [[ARITH:%.*]] = sub nuw i32 [[IV]], [[X]]
+; CHECK-NEXT: [[X_CHECK:%.*]] = icmp ult i32 [[ARITH]], 4
+; CHECK-NEXT: br i1 [[X_CHECK]], label [[OUT_OF_BOUNDS:%.*]], label [[BACKEDGE]]
+; CHECK: backedge:
+; CHECK-NEXT: [[EL_PTR:%.*]] = getelementptr i32, ptr [[P]], i32 [[IV]]
+; CHECK-NEXT: store i32 1, ptr [[EL_PTR]], align 4
+; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i32 [[IV]], 4
+; CHECK-NEXT: [[LOOP_COND:%.*]] = icmp ult i32 [[IV_NEXT]], [[LENGTH]]
+; CHECK-NEXT: br i1 [[LOOP_COND]], label [[LOOP]], label [[EXIT:%.*]]
+; CHECK: exit:
+; CHECK-NEXT: [[IV_NEXT_LCSSA:%.*]] = phi i32 [ [[IV_NEXT]], [[BACKEDGE]] ]
+; CHECK-NEXT: ret i32 [[IV_NEXT_LCSSA]]
+; CHECK: out_of_bounds:
+; CHECK-NEXT: ret i32 -1
+; CHECK: failed:
+; CHECK-NEXT: ret i32 -2
+;
+entry:
+  %x = load i32, ptr %x_p
+  %length = load i32, ptr %length_p
+  %precond_1 = icmp ult i32 %x, 2147483640
+  %precond_2 = icmp uge i32 %length, 0
+  %precond = and i1 %precond_1, %precond_2
+  br i1 %precond, label %loop, label %failed
+
+loop:
+  %iv = phi i32 [0, %entry], [%iv.next, %backedge]
+  %arith = sub nuw i32 %iv, %x
+  %x_check = icmp ult i32 %arith, 4
+  br i1 %x_check, label %out_of_bounds, label %backedge
+
+backedge:
+  %el.ptr = getelementptr i32, ptr %p, i32 %iv
+  store i32 1, ptr %el.ptr
+  %iv.next = add nuw nsw i32 %iv, 4
+  %loop_cond = icmp ult i32 %iv.next, %length
+  br i1 %loop_cond, label %loop, label %exit
+
+exit:
+  ret i32 %iv.next
+
+out_of_bounds:
+  ret i32 -1
+
+failed:
+  ret i32 -2
+}
+
 ; iv + x < 4 ==> iv < 4 - x
 define i32 @test_04(ptr %p, ptr %x_p, ptr %length_p) {
 ; CHECK-LABEL: define i32 @test_04
@@ -441,6 +822,55 @@ out_of_bounds:
   ret i32 -1
 }
 
+define i32 @test_04_unsigned(ptr %p, ptr %x_p, ptr %length_p) {
+; CHECK-LABEL: define i32 @test_04_unsigned
+; CHECK-SAME: (ptr [[P:%.*]], ptr [[X_P:%.*]], ptr [[LENGTH_P:%.*]]) {
+; CHECK-NEXT: entry:
+; CHECK-NEXT: [[X:%.*]] = load i32, ptr [[X_P]], align 4, !range [[RNG3]]
+; CHECK-NEXT: [[LENGTH:%.*]] = load i32, ptr [[LENGTH_P]], align 4, !range [[RNG2]]
+; CHECK-NEXT: [[INVARIANT_OP:%.*]] = sub nuw i32 4, [[X]]
+; CHECK-NEXT: br label [[LOOP:%.*]]
+; CHECK: loop:
+; CHECK-NEXT: [[IV:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[BACKEDGE:%.*]] ]
+; CHECK-NEXT: [[X_CHECK:%.*]] = icmp ult i32 [[IV]], [[INVARIANT_OP]]
+; CHECK-NEXT: br i1 [[X_CHECK]], label [[OUT_OF_BOUNDS:%.*]], label [[BACKEDGE]]
+; CHECK: backedge:
+; CHECK-NEXT: [[EL_PTR:%.*]] = getelementptr i32, ptr [[P]], i32 [[IV]]
+; CHECK-NEXT: store i32 1, ptr [[EL_PTR]], align 4
+; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i32 [[IV]], 4
+; CHECK-NEXT: [[LOOP_COND:%.*]] = icmp ult i32 [[IV_NEXT]], [[LENGTH]]
+; CHECK-NEXT: br i1 [[LOOP_COND]], label [[LOOP]], label [[EXIT:%.*]]
+; CHECK: exit:
+; CHECK-NEXT: [[IV_NEXT_LCSSA:%.*]] = phi i32 [ [[IV_NEXT]], [[BACKEDGE]] ]
+; CHECK-NEXT: ret i32 [[IV_NEXT_LCSSA]]
+; CHECK: out_of_bounds:
+; CHECK-NEXT: ret i32 -1
+;
+entry:
+  %x = load i32, ptr %x_p, !range !3
+  %length = load i32, ptr %length_p, !range !1
+  br label %loop
+
+loop:
+  %iv = phi i32 [0, %entry], [%iv.next, %backedge]
+  %arith = add nuw i32 %iv, %x
+  %x_check = icmp ult i32 %arith, 4
+  br i1 %x_check, label %out_of_bounds, label %backedge
+
+backedge:
+  %el.ptr = getelementptr i32, ptr %p, i32 %iv
+  store i32 1, ptr %el.ptr
+  %iv.next = add nuw nsw i32 %iv, 4
+  %loop_cond = icmp ult i32 %iv.next, %length
+  br i1 %loop_cond, label %loop, label %exit
+
+exit:
+  ret i32 %iv.next
+
+out_of_bounds:
+  ret i32 -1
+}
+
 ; TODO: iv + x < 4 ==> iv < 4 - x
 define i32 @test_04a(ptr %p, ptr %x_p, ptr %length_p) {
 ; CHECK-LABEL: define i32 @test_04a
@@ -504,5 +934,69 @@ failed:
   ret i32 -2
 }
 
+define i32 @test_04a_unsigned(ptr %p, ptr %x_p, ptr %length_p) {
+; CHECK-LABEL: define i32 @test_04a_unsigned
+; CHECK-SAME: (ptr [[P:%.*]], ptr [[X_P:%.*]], ptr [[LENGTH_P:%.*]]) {
+; CHECK-NEXT: entry:
+; CHECK-NEXT: [[X:%.*]] = load i32, ptr [[X_P]], align 4
+; CHECK-NEXT: [[LENGTH:%.*]] = load i32, ptr [[LENGTH_P]], align 4
+; CHECK-NEXT: [[PRECOND_1:%.*]] = icmp sge i32 [[X]], 0
+; CHECK-NEXT: [[PRECOND_2:%.*]] = icmp sge i32 [[LENGTH]], 0
+; CHECK-NEXT: [[PRECOND:%.*]] = and i1 [[PRECOND_1]], [[PRECOND_2]]
+; CHECK-NEXT: br i1 [[PRECOND]], label [[LOOP_PREHEADER:%.*]], label [[FAILED:%.*]]
+; CHECK: loop.preheader:
+; CHECK-NEXT: br label [[LOOP:%.*]]
+; CHECK: loop:
+; CHECK-NEXT: [[IV:%.*]] = phi i32 [ [[IV_NEXT:%.*]], [[BACKEDGE:%.*]] ], [ 0, [[LOOP_PREHEADER]] ]
+; CHECK-NEXT: [[ARITH:%.*]] = add nuw i32 [[IV]], [[X]]
+; CHECK-NEXT: [[X_CHECK:%.*]] = icmp ult i32 [[ARITH]], 4
+; CHECK-NEXT: br i1 [[X_CHECK]], label [[OUT_OF_BOUNDS:%.*]], label [[BACKEDGE]]
+; CHECK: backedge:
+; CHECK-NEXT: [[EL_PTR:%.*]] = getelementptr i32, ptr [[P]], i32 [[IV]]
+; CHECK-NEXT: store i32 1, ptr [[EL_PTR]], align 4
+; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i32 [[IV]], 4
+; CHECK-NEXT: [[LOOP_COND:%.*]] = icmp ult i32 [[IV_NEXT]], [[LENGTH]]
+; CHECK-NEXT: br i1 [[LOOP_COND]], label [[LOOP]], label [[EXIT:%.*]]
+; CHECK: exit:
+; CHECK-NEXT: [[IV_NEXT_LCSSA:%.*]] = phi i32 [ [[IV_NEXT]], [[BACKEDGE]] ]
+; CHECK-NEXT: ret i32 [[IV_NEXT_LCSSA]]
+; CHECK: out_of_bounds:
+; CHECK-NEXT: ret i32 -1
+; CHECK: failed:
+; CHECK-NEXT: ret i32 -2
+;
+entry:
+  %x = load i32, ptr %x_p
+  %length = load i32, ptr %length_p
+  %precond_1 = icmp sge i32 %x, 0
+  %precond_2 = icmp sge i32 %length, 0
+  %precond = and i1 %precond_1, %precond_2
+  br i1 %precond, label %loop, label %failed
+
+loop:
+  %iv = phi i32 [0, %entry], [%iv.next, %backedge]
+  %arith = add nuw i32 %iv, %x
+  %x_check = icmp ult i32 %arith, 4
+  br i1 %x_check, label %out_of_bounds, label %backedge
+
+backedge:
+  %el.ptr = getelementptr i32, ptr %p, i32 %iv
+  store i32 1, ptr %el.ptr
+  %iv.next = add nuw nsw i32 %iv, 4
+  %loop_cond = icmp ult i32 %iv.next, %length
+  br i1 %loop_cond, label %loop, label %exit
+
+exit:
+  ret i32 %iv.next
+
+out_of_bounds:
+  ret i32 -1
+
+failed:
+  ret i32 -2
+}
+
 !0 = !{i32 0, i32 2147483648}
 !1 = !{i32 0, i32 2147483640}
+!2 = !{i32 256, i32 32768}
+!3 = !{i32 0, i32 2}
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/masked-call.ll b/llvm/test/Transforms/LoopVectorize/AArch64/masked-call.ll
index f467f3cf262d2..93034f4dbe56e 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/masked-call.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/masked-call.ll
@@ -215,16 +215,14 @@ define void @test_if_then(ptr noalias %a, ptr readnone %b) #4 {
 ; TFCOMMON-NEXT: [[TMP6:%.*]] = icmp ugt [[WIDE_MASKED_LOAD]], shufflevector ( insertelement ( poison, i64 50, i64 0), poison, zeroinitializer)
 ; TFCOMMON-NEXT: [[TMP7:%.*]] = select [[ACTIVE_LANE_MASK]], [[TMP6]], zeroinitializer
 ; TFCOMMON-NEXT: [[TMP8:%.*]] = call @foo_vector( [[WIDE_MASKED_LOAD]], [[TMP7]])
-; TFCOMMON-NEXT: [[TMP9:%.*]] = xor [[TMP6]], shufflevector ( insertelement ( poison, i1 true, i64 0), poison, zeroinitializer)
-; TFCOMMON-NEXT: [[TMP10:%.*]] = select [[ACTIVE_LANE_MASK]], [[TMP9]], zeroinitializer
-; TFCOMMON-NEXT: [[PREDPHI:%.*]] = select [[TMP10]], zeroinitializer, [[TMP8]]
-; TFCOMMON-NEXT: [[TMP11:%.*]] = getelementptr inbounds i64, ptr [[B:%.*]], i64 [[INDEX]]
-; TFCOMMON-NEXT: call void @llvm.masked.store.nxv2i64.p0( [[PREDPHI]], ptr [[TMP11]], i32 8, [[ACTIVE_LANE_MASK]])
+; TFCOMMON-NEXT: [[PREDPHI:%.*]] = select [[TMP7]], [[TMP8]], zeroinitializer
+; TFCOMMON-NEXT: [[TMP9:%.*]] = getelementptr inbounds i64, ptr [[B:%.*]], i64 [[INDEX]]
+; TFCOMMON-NEXT: call void @llvm.masked.store.nxv2i64.p0( [[PREDPHI]], ptr [[TMP9]], i32 8, [[ACTIVE_LANE_MASK]])
 ; TFCOMMON-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], [[TMP4]]
 ; TFCOMMON-NEXT: [[ACTIVE_LANE_MASK_NEXT]] = call @llvm.get.active.lane.mask.nxv2i1.i64(i64 [[INDEX_NEXT]], i64 1025)
-; TFCOMMON-NEXT: [[TMP12:%.*]] = xor [[ACTIVE_LANE_MASK_NEXT]], shufflevector ( insertelement ( poison, i1 true, i64 0), poison, zeroinitializer)
-; TFCOMMON-NEXT: [[TMP13:%.*]] = extractelement [[TMP12]], i32 0
-; TFCOMMON-NEXT: br i1 [[TMP13]], label [[FOR_COND_CLEANUP:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]
+; TFCOMMON-NEXT: [[TMP10:%.*]] = xor [[ACTIVE_LANE_MASK_NEXT]], shufflevector ( insertelement ( poison, i1 true, i64 0), poison, zeroinitializer)
+; TFCOMMON-NEXT: [[TMP11:%.*]] = extractelement [[TMP10]], i32 0
+; TFCOMMON-NEXT: br i1 [[TMP11]], label [[FOR_COND_CLEANUP:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]
 ; TFCOMMON: for.cond.cleanup:
 ; TFCOMMON-NEXT: ret void
 ;
@@ -259,27 +257,23 @@ define void @test_if_then(ptr noalias %a, ptr readnone %b) #4 {
 ; TFA_INTERLEAVE-NEXT: [[TMP14:%.*]] = select [[ACTIVE_LANE_MASK2]], [[TMP12]], zeroinitializer
 ; TFA_INTERLEAVE-NEXT: [[TMP15:%.*]] = call @foo_vector( [[WIDE_MASKED_LOAD]], [[TMP13]])
 ; TFA_INTERLEAVE-NEXT: [[TMP16:%.*]] = call @foo_vector( [[WIDE_MASKED_LOAD3]], [[TMP14]])
-; TFA_INTERLEAVE-NEXT: [[TMP17:%.*]] = xor [[TMP11]], shufflevector ( insertelement ( poison, i1 true, i64 0), poison, zeroinitializer)
-; TFA_INTERLEAVE-NEXT: [[TMP18:%.*]] = xor [[TMP12]], shufflevector ( insertelement ( poison, i1 true, i64 0), poison, zeroinitializer)
-; TFA_INTERLEAVE-NEXT: [[TMP19:%.*]] = select [[ACTIVE_LANE_MASK]], [[TMP17]], zeroinitializer
-; TFA_INTERLEAVE-NEXT: [[TMP20:%.*]] = select [[ACTIVE_LANE_MASK2]], [[TMP18]], zeroinitializer
-; TFA_INTERLEAVE-NEXT: [[PREDPHI:%.*]] = select [[TMP19]], zeroinitializer, [[TMP15]]
-; TFA_INTERLEAVE-NEXT: [[PREDPHI4:%.*]] = select [[TMP20]], zeroinitializer, [[TMP16]]
-; TFA_INTERLEAVE-NEXT: [[TMP21:%.*]] = getelementptr inbounds i64, ptr [[B:%.*]], i64 [[INDEX]]
-; TFA_INTERLEAVE-NEXT: [[TMP22:%.*]] = call i64 @llvm.vscale.i64()
-; TFA_INTERLEAVE-NEXT: [[TMP23:%.*]] = mul i64 [[TMP22]], 2
-; TFA_INTERLEAVE-NEXT: [[TMP24:%.*]] = getelementptr inbounds i64, ptr [[TMP21]], i64 [[TMP23]]
-; TFA_INTERLEAVE-NEXT: call void @llvm.masked.store.nxv2i64.p0( [[PREDPHI]], ptr [[TMP21]], i32 8, [[ACTIVE_LANE_MASK]])
-; TFA_INTERLEAVE-NEXT: call void @llvm.masked.store.nxv2i64.p0( [[PREDPHI4]], ptr [[TMP24]], i32 8, [[ACTIVE_LANE_MASK2]])
+; TFA_INTERLEAVE-NEXT: [[PREDPHI:%.*]] = select [[TMP13]], [[TMP15]], zeroinitializer
+; TFA_INTERLEAVE-NEXT: [[PREDPHI4:%.*]] = select [[TMP14]], [[TMP16]], zeroinitializer
+; TFA_INTERLEAVE-NEXT: [[TMP17:%.*]] = getelementptr inbounds i64, ptr [[B:%.*]], i64 [[INDEX]]
+; TFA_INTERLEAVE-NEXT: [[TMP18:%.*]] = call i64 @llvm.vscale.i64()
+; TFA_INTERLEAVE-NEXT: [[TMP19:%.*]] = mul i64 [[TMP18]], 2
+; TFA_INTERLEAVE-NEXT: [[TMP20:%.*]] = getelementptr inbounds i64, ptr [[TMP17]], i64 [[TMP19]]
+; TFA_INTERLEAVE-NEXT: call void @llvm.masked.store.nxv2i64.p0( [[PREDPHI]], ptr [[TMP17]], i32 8, [[ACTIVE_LANE_MASK]])
+; TFA_INTERLEAVE-NEXT: call void @llvm.masked.store.nxv2i64.p0( [[PREDPHI4]], ptr [[TMP20]], i32 8, [[ACTIVE_LANE_MASK2]])
 ; TFA_INTERLEAVE-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], [[TMP4]]
-; TFA_INTERLEAVE-NEXT: [[TMP25:%.*]] = call i64 @llvm.vscale.i64()
-; TFA_INTERLEAVE-NEXT: [[TMP26:%.*]] = mul i64 [[TMP25]], 2
-; TFA_INTERLEAVE-NEXT: [[TMP27:%.*]] = add i64 [[INDEX_NEXT]], [[TMP26]]
+; TFA_INTERLEAVE-NEXT: [[TMP21:%.*]] = call i64 @llvm.vscale.i64()
+; TFA_INTERLEAVE-NEXT: [[TMP22:%.*]] = mul i64 [[TMP21]], 2
+; TFA_INTERLEAVE-NEXT: [[TMP23:%.*]] = add i64 [[INDEX_NEXT]], [[TMP22]]
 ; TFA_INTERLEAVE-NEXT: [[ACTIVE_LANE_MASK_NEXT]] = call @llvm.get.active.lane.mask.nxv2i1.i64(i64 [[INDEX_NEXT]], i64 1025)
-; TFA_INTERLEAVE-NEXT: [[ACTIVE_LANE_MASK_NEXT5]] = call @llvm.get.active.lane.mask.nxv2i1.i64(i64 [[TMP27]], i64 1025)
-; TFA_INTERLEAVE-NEXT: [[TMP28:%.*]] = xor [[ACTIVE_LANE_MASK_NEXT]], shufflevector ( insertelement ( poison, i1 true, i64 0), poison, zeroinitializer)
-; TFA_INTERLEAVE-NEXT: [[TMP29:%.*]] = extractelement [[TMP28]], i32 0
-; TFA_INTERLEAVE-NEXT: br i1 [[TMP29]], label [[FOR_COND_CLEANUP:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]
+; TFA_INTERLEAVE-NEXT: [[ACTIVE_LANE_MASK_NEXT5]] = call @llvm.get.active.lane.mask.nxv2i1.i64(i64 [[TMP23]], i64 1025)
+; TFA_INTERLEAVE-NEXT: [[TMP24:%.*]] = xor [[ACTIVE_LANE_MASK_NEXT]], shufflevector ( insertelement ( poison, i1 true, i64 0), poison, zeroinitializer)
+; TFA_INTERLEAVE-NEXT: [[TMP25:%.*]] = extractelement [[TMP24]], i32 0
+; TFA_INTERLEAVE-NEXT: br i1 [[TMP25]], label [[FOR_COND_CLEANUP:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]] ; TFA_INTERLEAVE: for.cond.cleanup: ; TFA_INTERLEAVE-NEXT: ret void ; diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll b/llvm/test/Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll index f922873210b05..66d001498e457 100644 --- a/llvm/test/Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll +++ b/llvm/test/Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll @@ -1216,7 +1216,7 @@ define float @fadd_conditional(ptr noalias nocapture readonly %a, ptr noalias no ; CHECK-ORDERED-TF: vector.body: ; CHECK-ORDERED-TF-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ] ; CHECK-ORDERED-TF-NEXT: [[ACTIVE_LANE_MASK:%.*]] = phi [ [[ACTIVE_LANE_MASK_ENTRY]], [[VECTOR_PH]] ], [ [[ACTIVE_LANE_MASK_NEXT:%.*]], [[VECTOR_BODY]] ] -; CHECK-ORDERED-TF-NEXT: [[VEC_PHI:%.*]] = phi float [ 1.000000e+00, [[VECTOR_PH]] ], [ [[TMP20:%.*]], [[VECTOR_BODY]] ] +; CHECK-ORDERED-TF-NEXT: [[VEC_PHI:%.*]] = phi float [ 1.000000e+00, [[VECTOR_PH]] ], [ [[TMP18:%.*]], [[VECTOR_BODY]] ] ; CHECK-ORDERED-TF-NEXT: [[TMP10:%.*]] = add i64 [[INDEX]], 0 ; CHECK-ORDERED-TF-NEXT: [[TMP11:%.*]] = getelementptr inbounds float, ptr [[B]], i64 [[TMP10]] ; CHECK-ORDERED-TF-NEXT: [[TMP12:%.*]] = getelementptr inbounds float, ptr [[TMP11]], i32 0 @@ -1226,41 +1226,39 @@ define float @fadd_conditional(ptr noalias nocapture readonly %a, ptr noalias no ; CHECK-ORDERED-TF-NEXT: [[TMP15:%.*]] = getelementptr float, ptr [[A]], i64 [[TMP10]] ; CHECK-ORDERED-TF-NEXT: [[TMP16:%.*]] = getelementptr float, ptr [[TMP15]], i32 0 ; CHECK-ORDERED-TF-NEXT: [[WIDE_MASKED_LOAD1:%.*]] = call @llvm.masked.load.nxv4f32.p0(ptr [[TMP16]], i32 4, [[TMP14]], poison) -; CHECK-ORDERED-TF-NEXT: [[TMP17:%.*]] = xor [[TMP13]], shufflevector ( insertelement ( poison, i1 true, i64 0), poison, zeroinitializer) -; CHECK-ORDERED-TF-NEXT: 
[[TMP18:%.*]] = select [[ACTIVE_LANE_MASK]], [[TMP17]], zeroinitializer -; CHECK-ORDERED-TF-NEXT: [[PREDPHI:%.*]] = select [[TMP18]], shufflevector ( insertelement ( poison, float 3.000000e+00, i64 0), poison, zeroinitializer), [[WIDE_MASKED_LOAD1]] -; CHECK-ORDERED-TF-NEXT: [[TMP19:%.*]] = select [[ACTIVE_LANE_MASK]], [[PREDPHI]], shufflevector ( insertelement ( poison, float -0.000000e+00, i64 0), poison, zeroinitializer) -; CHECK-ORDERED-TF-NEXT: [[TMP20]] = call float @llvm.vector.reduce.fadd.nxv4f32(float [[VEC_PHI]], [[TMP19]]) +; CHECK-ORDERED-TF-NEXT: [[PREDPHI:%.*]] = select [[TMP14]], [[WIDE_MASKED_LOAD1]], shufflevector ( insertelement ( poison, float 3.000000e+00, i64 0), poison, zeroinitializer) +; CHECK-ORDERED-TF-NEXT: [[TMP17:%.*]] = select [[ACTIVE_LANE_MASK]], [[PREDPHI]], shufflevector ( insertelement ( poison, float -0.000000e+00, i64 0), poison, zeroinitializer) +; CHECK-ORDERED-TF-NEXT: [[TMP18]] = call float @llvm.vector.reduce.fadd.nxv4f32(float [[VEC_PHI]], [[TMP17]]) ; CHECK-ORDERED-TF-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], [[TMP4]] ; CHECK-ORDERED-TF-NEXT: [[ACTIVE_LANE_MASK_NEXT]] = call @llvm.get.active.lane.mask.nxv4i1.i64(i64 [[INDEX]], i64 [[TMP9]]) -; CHECK-ORDERED-TF-NEXT: [[TMP21:%.*]] = xor [[ACTIVE_LANE_MASK_NEXT]], shufflevector ( insertelement ( poison, i1 true, i64 0), poison, zeroinitializer) -; CHECK-ORDERED-TF-NEXT: [[TMP22:%.*]] = extractelement [[TMP21]], i32 0 -; CHECK-ORDERED-TF-NEXT: br i1 [[TMP22]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP10:![0-9]+]] +; CHECK-ORDERED-TF-NEXT: [[TMP19:%.*]] = xor [[ACTIVE_LANE_MASK_NEXT]], shufflevector ( insertelement ( poison, i1 true, i64 0), poison, zeroinitializer) +; CHECK-ORDERED-TF-NEXT: [[TMP20:%.*]] = extractelement [[TMP19]], i32 0 +; CHECK-ORDERED-TF-NEXT: br i1 [[TMP20]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP10:![0-9]+]] ; CHECK-ORDERED-TF: middle.block: ; CHECK-ORDERED-TF-NEXT: br i1 true, label 
[[FOR_END:%.*]], label [[SCALAR_PH]] ; CHECK-ORDERED-TF: scalar.ph: ; CHECK-ORDERED-TF-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ] -; CHECK-ORDERED-TF-NEXT: [[BC_MERGE_RDX:%.*]] = phi float [ [[TMP20]], [[MIDDLE_BLOCK]] ], [ 1.000000e+00, [[ENTRY]] ] +; CHECK-ORDERED-TF-NEXT: [[BC_MERGE_RDX:%.*]] = phi float [ [[TMP18]], [[MIDDLE_BLOCK]] ], [ 1.000000e+00, [[ENTRY]] ] ; CHECK-ORDERED-TF-NEXT: br label [[FOR_BODY:%.*]] ; CHECK-ORDERED-TF: for.body: ; CHECK-ORDERED-TF-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[FOR_INC:%.*]] ] ; CHECK-ORDERED-TF-NEXT: [[RES:%.*]] = phi float [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ], [ [[FADD:%.*]], [[FOR_INC]] ] ; CHECK-ORDERED-TF-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, ptr [[B]], i64 [[IV]] -; CHECK-ORDERED-TF-NEXT: [[TMP23:%.*]] = load float, ptr [[ARRAYIDX]], align 4 -; CHECK-ORDERED-TF-NEXT: [[TOBOOL:%.*]] = fcmp une float [[TMP23]], 0.000000e+00 +; CHECK-ORDERED-TF-NEXT: [[TMP21:%.*]] = load float, ptr [[ARRAYIDX]], align 4 +; CHECK-ORDERED-TF-NEXT: [[TOBOOL:%.*]] = fcmp une float [[TMP21]], 0.000000e+00 ; CHECK-ORDERED-TF-NEXT: br i1 [[TOBOOL]], label [[IF_THEN:%.*]], label [[FOR_INC]] ; CHECK-ORDERED-TF: if.then: ; CHECK-ORDERED-TF-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds float, ptr [[A]], i64 [[IV]] -; CHECK-ORDERED-TF-NEXT: [[TMP24:%.*]] = load float, ptr [[ARRAYIDX2]], align 4 +; CHECK-ORDERED-TF-NEXT: [[TMP22:%.*]] = load float, ptr [[ARRAYIDX2]], align 4 ; CHECK-ORDERED-TF-NEXT: br label [[FOR_INC]] ; CHECK-ORDERED-TF: for.inc: -; CHECK-ORDERED-TF-NEXT: [[PHI:%.*]] = phi float [ [[TMP24]], [[IF_THEN]] ], [ 3.000000e+00, [[FOR_BODY]] ] +; CHECK-ORDERED-TF-NEXT: [[PHI:%.*]] = phi float [ [[TMP22]], [[IF_THEN]] ], [ 3.000000e+00, [[FOR_BODY]] ] ; CHECK-ORDERED-TF-NEXT: [[FADD]] = fadd float [[RES]], [[PHI]] ; CHECK-ORDERED-TF-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1 ; CHECK-ORDERED-TF-NEXT: [[EXITCOND_NOT:%.*]] = 
icmp eq i64 [[IV_NEXT]], [[N]] ; CHECK-ORDERED-TF-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP11:![0-9]+]] ; CHECK-ORDERED-TF: for.end: -; CHECK-ORDERED-TF-NEXT: [[RDX:%.*]] = phi float [ [[FADD]], [[FOR_INC]] ], [ [[TMP20]], [[MIDDLE_BLOCK]] ] +; CHECK-ORDERED-TF-NEXT: [[RDX:%.*]] = phi float [ [[FADD]], [[FOR_INC]] ], [ [[TMP18]], [[MIDDLE_BLOCK]] ] ; CHECK-ORDERED-TF-NEXT: ret float [[RDX]] ; diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding.ll b/llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding.ll index f6a6d021f03c9..6fa1e7fbbac60 100644 --- a/llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding.ll +++ b/llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding.ll @@ -467,16 +467,15 @@ define void @cond_uniform_load(ptr noalias %dst, ptr noalias readonly %src, ptr ; CHECK-NEXT: [[TMP14:%.*]] = xor [[TMP13]], shufflevector ( insertelement ( poison, i1 true, i64 0), poison, zeroinitializer) ; CHECK-NEXT: [[TMP15:%.*]] = select [[ACTIVE_LANE_MASK]], [[TMP14]], zeroinitializer ; CHECK-NEXT: [[WIDE_MASKED_GATHER:%.*]] = call @llvm.masked.gather.nxv4i32.nxv4p0( [[BROADCAST_SPLAT]], i32 4, [[TMP15]], poison) -; CHECK-NEXT: [[TMP16:%.*]] = select [[ACTIVE_LANE_MASK]], [[TMP13]], zeroinitializer -; CHECK-NEXT: [[PREDPHI:%.*]] = select [[TMP16]], zeroinitializer, [[WIDE_MASKED_GATHER]] -; CHECK-NEXT: [[TMP17:%.*]] = getelementptr inbounds i32, ptr [[DST:%.*]], i64 [[TMP10]] -; CHECK-NEXT: [[TMP18:%.*]] = getelementptr inbounds i32, ptr [[TMP17]], i32 0 -; CHECK-NEXT: call void @llvm.masked.store.nxv4i32.p0( [[PREDPHI]], ptr [[TMP18]], i32 4, [[ACTIVE_LANE_MASK]]) +; CHECK-NEXT: [[PREDPHI:%.*]] = select [[TMP15]], [[WIDE_MASKED_GATHER]], zeroinitializer +; CHECK-NEXT: [[TMP16:%.*]] = getelementptr inbounds i32, ptr [[DST:%.*]], i64 [[TMP10]] +; CHECK-NEXT: [[TMP17:%.*]] = getelementptr inbounds i32, ptr [[TMP16]], i32 0 +; CHECK-NEXT: call void @llvm.masked.store.nxv4i32.p0( [[PREDPHI]], 
ptr [[TMP17]], i32 4, [[ACTIVE_LANE_MASK]]) ; CHECK-NEXT: [[INDEX_NEXT2]] = add i64 [[INDEX1]], [[TMP4]] ; CHECK-NEXT: [[ACTIVE_LANE_MASK_NEXT]] = call @llvm.get.active.lane.mask.nxv4i1.i64(i64 [[INDEX1]], i64 [[TMP9]]) -; CHECK-NEXT: [[TMP19:%.*]] = xor [[ACTIVE_LANE_MASK_NEXT]], shufflevector ( insertelement ( poison, i1 true, i64 0), poison, zeroinitializer) -; CHECK-NEXT: [[TMP20:%.*]] = extractelement [[TMP19]], i32 0 -; CHECK-NEXT: br i1 [[TMP20]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP14:![0-9]+]] +; CHECK-NEXT: [[TMP18:%.*]] = xor [[ACTIVE_LANE_MASK_NEXT]], shufflevector ( insertelement ( poison, i1 true, i64 0), poison, zeroinitializer) +; CHECK-NEXT: [[TMP19:%.*]] = extractelement [[TMP18]], i32 0 +; CHECK-NEXT: br i1 [[TMP19]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP14:![0-9]+]] ; CHECK: middle.block: ; CHECK-NEXT: br i1 true, label [[FOR_END:%.*]], label [[SCALAR_PH]] ; CHECK: scalar.ph: @@ -485,14 +484,14 @@ define void @cond_uniform_load(ptr noalias %dst, ptr noalias readonly %src, ptr ; CHECK: for.body: ; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ [[INDEX_NEXT:%.*]], [[IF_END:%.*]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ] ; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[COND]], i64 [[INDEX]] -; CHECK-NEXT: [[TMP21:%.*]] = load i32, ptr [[ARRAYIDX]], align 4 -; CHECK-NEXT: [[TOBOOL_NOT:%.*]] = icmp eq i32 [[TMP21]], 0 +; CHECK-NEXT: [[TMP20:%.*]] = load i32, ptr [[ARRAYIDX]], align 4 +; CHECK-NEXT: [[TOBOOL_NOT:%.*]] = icmp eq i32 [[TMP20]], 0 ; CHECK-NEXT: br i1 [[TOBOOL_NOT]], label [[IF_END]], label [[IF_THEN:%.*]] ; CHECK: if.then: -; CHECK-NEXT: [[TMP22:%.*]] = load i32, ptr [[SRC]], align 4 +; CHECK-NEXT: [[TMP21:%.*]] = load i32, ptr [[SRC]], align 4 ; CHECK-NEXT: br label [[IF_END]] ; CHECK: if.end: -; CHECK-NEXT: [[VAL_0:%.*]] = phi i32 [ [[TMP22]], [[IF_THEN]] ], [ 0, [[FOR_BODY]] ] +; CHECK-NEXT: [[VAL_0:%.*]] = phi i32 [ [[TMP21]], [[IF_THEN]] ], [ 0, [[FOR_BODY]] ] ; 
CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i32, ptr [[DST]], i64 [[INDEX]] ; CHECK-NEXT: store i32 [[VAL_0]], ptr [[ARRAYIDX1]], align 4 ; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 1 diff --git a/llvm/test/Transforms/LoopVectorize/if-conversion-nest.ll b/llvm/test/Transforms/LoopVectorize/if-conversion-nest.ll index d19ca172a8c0a..8b0c99b353c8b 100644 --- a/llvm/test/Transforms/LoopVectorize/if-conversion-nest.ll +++ b/llvm/test/Transforms/LoopVectorize/if-conversion-nest.ll @@ -33,18 +33,15 @@ define i32 @foo(ptr nocapture %A, ptr nocapture %B, i32 %n) { ; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, ptr [[B]], i64 [[INDEX]] ; CHECK-NEXT: [[WIDE_LOAD2:%.*]] = load <4 x i32>, ptr [[TMP6]], align 4, !alias.scope [[META3]] ; CHECK-NEXT: [[TMP7:%.*]] = icmp sgt <4 x i32> [[WIDE_LOAD]], [[WIDE_LOAD2]] -; CHECK-NEXT: [[TMP8:%.*]] = icmp sgt <4 x i32> [[WIDE_LOAD]], -; CHECK-NEXT: [[TMP9:%.*]] = xor <4 x i1> [[TMP8]], -; CHECK-NEXT: [[TMP10:%.*]] = and <4 x i1> [[TMP7]], [[TMP9]] -; CHECK-NEXT: [[TMP11:%.*]] = icmp slt <4 x i32> [[WIDE_LOAD2]], -; CHECK-NEXT: [[TMP12:%.*]] = select <4 x i1> [[TMP11]], <4 x i32> , <4 x i32> -; CHECK-NEXT: [[TMP13:%.*]] = and <4 x i1> [[TMP7]], [[TMP8]] -; CHECK-NEXT: [[PREDPHI:%.*]] = select <4 x i1> [[TMP13]], <4 x i32> , <4 x i32> -; CHECK-NEXT: [[PREDPHI3:%.*]] = select <4 x i1> [[TMP10]], <4 x i32> [[TMP12]], <4 x i32> [[PREDPHI]] +; CHECK-NEXT: [[TMP8:%.*]] = icmp slt <4 x i32> [[WIDE_LOAD]], +; CHECK-NEXT: [[TMP9:%.*]] = icmp slt <4 x i32> [[WIDE_LOAD2]], +; CHECK-NEXT: [[TMP10:%.*]] = select <4 x i1> [[TMP9]], <4 x i32> , <4 x i32> +; CHECK-NEXT: [[PREDPHI:%.*]] = select <4 x i1> [[TMP8]], <4 x i32> [[TMP10]], <4 x i32> +; CHECK-NEXT: [[PREDPHI3:%.*]] = select <4 x i1> [[TMP7]], <4 x i32> [[PREDPHI]], <4 x i32> ; CHECK-NEXT: store <4 x i32> [[PREDPHI3]], ptr [[TMP5]], align 4, !alias.scope [[META0]], !noalias [[META3]] ; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4 -; CHECK-NEXT: 
[[TMP14:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]] -; CHECK-NEXT: br i1 [[TMP14]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]] +; CHECK-NEXT: [[TMP11:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]] +; CHECK-NEXT: br i1 [[TMP11]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]] ; CHECK: middle.block: ; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_VEC]], [[TMP0]] ; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END_LOOPEXIT:%.*]], label [[SCALAR_PH]] @@ -54,16 +51,16 @@ define i32 @foo(ptr nocapture %A, ptr nocapture %B, i32 %n) { ; CHECK: for.body: ; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[INDVARS_IV_NEXT:%.*]], [[IF_END14:%.*]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ] ; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[A]], i64 [[INDVARS_IV]] -; CHECK-NEXT: [[TMP15:%.*]] = load i32, ptr [[ARRAYIDX]], align 4 +; CHECK-NEXT: [[TMP12:%.*]] = load i32, ptr [[ARRAYIDX]], align 4 ; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i32, ptr [[B]], i64 [[INDVARS_IV]] -; CHECK-NEXT: [[TMP16:%.*]] = load i32, ptr [[ARRAYIDX2]], align 4 -; CHECK-NEXT: [[CMP3:%.*]] = icmp sgt i32 [[TMP15]], [[TMP16]] +; CHECK-NEXT: [[TMP13:%.*]] = load i32, ptr [[ARRAYIDX2]], align 4 +; CHECK-NEXT: [[CMP3:%.*]] = icmp sgt i32 [[TMP12]], [[TMP13]] ; CHECK-NEXT: br i1 [[CMP3]], label [[IF_THEN:%.*]], label [[IF_END14]] ; CHECK: if.then: -; CHECK-NEXT: [[CMP6:%.*]] = icmp sgt i32 [[TMP15]], 19 +; CHECK-NEXT: [[CMP6:%.*]] = icmp sgt i32 [[TMP12]], 19 ; CHECK-NEXT: br i1 [[CMP6]], label [[IF_END14]], label [[IF_ELSE:%.*]] ; CHECK: if.else: -; CHECK-NEXT: [[CMP10:%.*]] = icmp slt i32 [[TMP16]], 4 +; CHECK-NEXT: [[CMP10:%.*]] = icmp slt i32 [[TMP13]], 4 ; CHECK-NEXT: [[DOT:%.*]] = select i1 [[CMP10]], i32 4, i32 5 ; CHECK-NEXT: br label [[IF_END14]] ; CHECK: if.end14: @@ -112,3 +109,122 @@ for.end: ret i32 undef } +; As above but with multiple variables set per block. 
+define i32 @multi_variable_if_nest(ptr nocapture %A, ptr nocapture %B, i32 %n) { +; CHECK-LABEL: @multi_variable_if_nest( +; CHECK-NEXT: entry: +; CHECK-NEXT: [[CMP26:%.*]] = icmp sgt i32 [[N:%.*]], 0 +; CHECK-NEXT: br i1 [[CMP26]], label [[FOR_BODY_PREHEADER:%.*]], label [[FOR_END:%.*]] +; CHECK: for.body.preheader: +; CHECK-NEXT: [[TMP0:%.*]] = zext nneg i32 [[N]] to i64 +; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i32 [[N]], 4 +; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]] +; CHECK: vector.memcheck: +; CHECK-NEXT: [[TMP1:%.*]] = add nsw i32 [[N]], -1 +; CHECK-NEXT: [[TMP2:%.*]] = zext i32 [[TMP1]] to i64 +; CHECK-NEXT: [[TMP3:%.*]] = shl nuw nsw i64 [[TMP2]], 2 +; CHECK-NEXT: [[TMP4:%.*]] = add nuw nsw i64 [[TMP3]], 4 +; CHECK-NEXT: [[SCEVGEP:%.*]] = getelementptr i8, ptr [[A:%.*]], i64 [[TMP4]] +; CHECK-NEXT: [[SCEVGEP1:%.*]] = getelementptr i8, ptr [[B:%.*]], i64 [[TMP4]] +; CHECK-NEXT: [[BOUND0:%.*]] = icmp ult ptr [[A]], [[SCEVGEP1]] +; CHECK-NEXT: [[BOUND1:%.*]] = icmp ult ptr [[B]], [[SCEVGEP]] +; CHECK-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]] +; CHECK-NEXT: br i1 [[FOUND_CONFLICT]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]] +; CHECK: vector.ph: +; CHECK-NEXT: [[N_VEC:%.*]] = and i64 [[TMP0]], 2147483644 +; CHECK-NEXT: br label [[VECTOR_BODY:%.*]] +; CHECK: vector.body: +; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ] +; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, ptr [[A]], i64 [[INDEX]] +; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i32>, ptr [[TMP5]], align 4, !alias.scope [[META9:![0-9]+]], !noalias [[META12:![0-9]+]] +; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, ptr [[B]], i64 [[INDEX]] +; CHECK-NEXT: [[WIDE_LOAD2:%.*]] = load <4 x i32>, ptr [[TMP6]], align 4, !alias.scope [[META12]] +; CHECK-NEXT: [[TMP7:%.*]] = icmp sgt <4 x i32> [[WIDE_LOAD]], [[WIDE_LOAD2]] +; CHECK-NEXT: [[TMP8:%.*]] = icmp sgt <4 
x i32> [[WIDE_LOAD]], +; CHECK-NEXT: [[TMP9:%.*]] = xor <4 x i1> [[TMP8]], +; CHECK-NEXT: [[TMP10:%.*]] = and <4 x i1> [[TMP7]], [[TMP9]] +; CHECK-NEXT: [[TMP11:%.*]] = icmp slt <4 x i32> [[WIDE_LOAD2]], +; CHECK-NEXT: [[TMP12:%.*]] = select <4 x i1> [[TMP11]], <4 x i32> , <4 x i32> +; CHECK-NEXT: [[TMP13:%.*]] = select <4 x i1> [[TMP11]], <4 x i32> , <4 x i32> +; CHECK-NEXT: [[TMP14:%.*]] = and <4 x i1> [[TMP7]], [[TMP8]] +; CHECK-NEXT: [[PREDPHI:%.*]] = select <4 x i1> [[TMP14]], <4 x i32> , <4 x i32> +; CHECK-NEXT: [[PREDPHI3:%.*]] = select <4 x i1> [[TMP10]], <4 x i32> [[TMP12]], <4 x i32> [[PREDPHI]] +; CHECK-NEXT: [[PREDPHI4:%.*]] = select <4 x i1> [[TMP14]], <4 x i32> , <4 x i32> +; CHECK-NEXT: [[PREDPHI5:%.*]] = select <4 x i1> [[TMP10]], <4 x i32> [[TMP13]], <4 x i32> [[PREDPHI4]] +; CHECK-NEXT: store <4 x i32> [[PREDPHI3]], ptr [[TMP5]], align 4, !alias.scope [[META9]], !noalias [[META12]] +; CHECK-NEXT: store <4 x i32> [[PREDPHI5]], ptr [[TMP6]], align 4, !alias.scope [[META12]] +; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4 +; CHECK-NEXT: [[TMP15:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]] +; CHECK-NEXT: br i1 [[TMP15]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP14:![0-9]+]] +; CHECK: middle.block: +; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_VEC]], [[TMP0]] +; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END_LOOPEXIT:%.*]], label [[SCALAR_PH]] +; CHECK: scalar.ph: +; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[FOR_BODY_PREHEADER]] ], [ 0, [[VECTOR_MEMCHECK]] ] +; CHECK-NEXT: br label [[FOR_BODY:%.*]] +; CHECK: for.body: +; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[INDVARS_IV_NEXT:%.*]], [[IF_END14:%.*]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ] +; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[A]], i64 [[INDVARS_IV]] +; CHECK-NEXT: [[TMP16:%.*]] = load i32, ptr [[ARRAYIDX]], align 4 +; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i32, ptr [[B]], i64 
[[INDVARS_IV]] +; CHECK-NEXT: [[TMP17:%.*]] = load i32, ptr [[ARRAYIDX2]], align 4 +; CHECK-NEXT: [[CMP3:%.*]] = icmp sgt i32 [[TMP16]], [[TMP17]] +; CHECK-NEXT: br i1 [[CMP3]], label [[IF_THEN:%.*]], label [[IF_END14]] +; CHECK: if.then: +; CHECK-NEXT: [[CMP6:%.*]] = icmp sgt i32 [[TMP16]], 19 +; CHECK-NEXT: br i1 [[CMP6]], label [[IF_END14]], label [[IF_ELSE:%.*]] +; CHECK: if.else: +; CHECK-NEXT: [[CMP10:%.*]] = icmp slt i32 [[TMP17]], 4 +; CHECK-NEXT: [[X_ELSE:%.*]] = select i1 [[CMP10]], i32 4, i32 5 +; CHECK-NEXT: [[Y_ELSE:%.*]] = select i1 [[CMP10]], i32 6, i32 11 +; CHECK-NEXT: br label [[IF_END14]] +; CHECK: if.end14: +; CHECK-NEXT: [[X_0:%.*]] = phi i32 [ 9, [[FOR_BODY]] ], [ 3, [[IF_THEN]] ], [ [[X_ELSE]], [[IF_ELSE]] ] +; CHECK-NEXT: [[Y_0:%.*]] = phi i32 [ 18, [[FOR_BODY]] ], [ 7, [[IF_THEN]] ], [ [[Y_ELSE]], [[IF_ELSE]] ] +; CHECK-NEXT: store i32 [[X_0]], ptr [[ARRAYIDX]], align 4 +; CHECK-NEXT: store i32 [[Y_0]], ptr [[ARRAYIDX2]], align 4 +; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add i64 [[INDVARS_IV]], 1 +; CHECK-NEXT: [[LFTR_WIDEIV:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32 +; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[N]], [[LFTR_WIDEIV]] +; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END_LOOPEXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP15:![0-9]+]] +; CHECK: for.end.loopexit: +; CHECK-NEXT: br label [[FOR_END]] +; CHECK: for.end: +; CHECK-NEXT: ret i32 undef +; +entry: + %cmp26 = icmp sgt i32 %n, 0 + br i1 %cmp26, label %for.body, label %for.end + +for.body: + %indvars.iv = phi i64 [ %indvars.iv.next, %if.end14 ], [ 0, %entry ] + %arrayidx = getelementptr inbounds i32, ptr %A, i64 %indvars.iv + %0 = load i32, ptr %arrayidx, align 4 + %arrayidx2 = getelementptr inbounds i32, ptr %B, i64 %indvars.iv + %1 = load i32, ptr %arrayidx2, align 4 + %cmp3 = icmp sgt i32 %0, %1 + br i1 %cmp3, label %if.then, label %if.end14 + +if.then: + %cmp6 = icmp sgt i32 %0, 19 + br i1 %cmp6, label %if.end14, label %if.else + +if.else: + %cmp10 = icmp slt i32 %1, 4 + 
%x.else = select i1 %cmp10, i32 4, i32 5 + %y.else = select i1 %cmp10, i32 6, i32 11 + br label %if.end14 + +if.end14: + %x.0 = phi i32 [ 9, %for.body ], [ 3, %if.then ], [ %x.else, %if.else ] ; <------------- A PHI with 3 entries that we can still vectorize. + %y.0 = phi i32 [ 18, %for.body ], [ 7, %if.then ], [ %y.else, %if.else ] ; <------------- A PHI with 3 entries that we can still vectorize. + store i32 %x.0, ptr %arrayidx, align 4 + store i32 %y.0, ptr %arrayidx2, align 4 + %indvars.iv.next = add i64 %indvars.iv, 1 + %lftr.wideiv = trunc i64 %indvars.iv.next to i32 + %exitcond = icmp eq i32 %lftr.wideiv, %n + br i1 %exitcond, label %for.end, label %for.body + +for.end: + ret i32 undef +} diff --git a/llvm/test/Transforms/LoopVectorize/if-reduction.ll b/llvm/test/Transforms/LoopVectorize/if-reduction.ll index e9761a60fd6eb..0d5871e24c524 100644 --- a/llvm/test/Transforms/LoopVectorize/if-reduction.ll +++ b/llvm/test/Transforms/LoopVectorize/if-reduction.ll @@ -678,9 +678,8 @@ for.end: ; preds = %for.inc, %entry ; CHECK-DAG: %[[C21:.*]] = xor <4 x i1> %[[C2]], ; CHECK-DAG: %[[ADD:.*]] = fadd fast <4 x float> -; CHECK-DAG: %[[C12:.*]] = select <4 x i1> %[[C11]], <4 x i1> %[[C2]], <4 x i1> zeroinitializer -; CHECK: %[[C22:.*]] = select <4 x i1> %[[C11]], <4 x i1> %[[C21]], <4 x i1> zeroinitializer -; CHECK: %[[S1:.*]] = select <4 x i1> %[[C12]], <4 x float> %[[SUB]], <4 x float> %[[ADD]] +; CHECK-DAG: %[[C22:.*]] = select <4 x i1> %[[C11]], <4 x i1> %[[C21]], <4 x i1> zeroinitializer +; CHECK: %[[S1:.*]] = select <4 x i1> %[[C1]], <4 x float> %[[ADD]], <4 x float> %[[SUB]] ; CHECK: %[[S2:.*]] = select <4 x i1> %[[C22]], {{.*}} <4 x float> %[[S1]] define float @fcmp_fadd_fsub(ptr nocapture readonly %a, i32 %n) nounwind readonly { entry: diff --git a/llvm/test/Transforms/LoopVectorize/phi-cost.ll b/llvm/test/Transforms/LoopVectorize/phi-cost.ll index e571b624ed194..8d407c969b527 100644 --- a/llvm/test/Transforms/LoopVectorize/phi-cost.ll +++ 
b/llvm/test/Transforms/LoopVectorize/phi-cost.ll @@ -49,8 +49,8 @@ for.end: ; CHECK: define void @phi_three_incoming_values( ; CHECK: vector.body: ; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %vector.ph ], [ [[INDEX_NEXT:%.*]], %vector.body ] -; CHECK: [[PREDPHI:%.*]] = select <2 x i1> {{.*}}, <2 x i32> , <2 x i32> -; CHECK: [[PREDPHI7:%.*]] = select <2 x i1> {{.*}}, <2 x i32> {{.*}}, <2 x i32> [[PREDPHI]] +; CHECK: [[PREDPHI:%.*]] = select <2 x i1> {{.*}}, <2 x i32> {{.*}}, <2 x i32> +; CHECK: [[PREDPHI7:%.*]] = select <2 x i1> {{.*}}, <2 x i32> [[PREDPHI]], <2 x i32> ; CHECK: store <2 x i32> [[PREDPHI7]], ptr {{.*}} ; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2 ; diff --git a/llvm/test/Transforms/LoopVectorize/reduction-inloop-cond.ll b/llvm/test/Transforms/LoopVectorize/reduction-inloop-cond.ll index c50bcf8ae88f5..2e111332ef6c4 100644 --- a/llvm/test/Transforms/LoopVectorize/reduction-inloop-cond.ll +++ b/llvm/test/Transforms/LoopVectorize/reduction-inloop-cond.ll @@ -587,9 +587,7 @@ define i64 @nested_cond_and(ptr noalias nocapture readonly %a, ptr noalias nocap ; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE14]] ; CHECK: pred.load.continue14: ; CHECK-NEXT: [[TMP46:%.*]] = phi <4 x i64> [ [[TMP41]], [[PRED_LOAD_CONTINUE12]] ], [ [[TMP45]], [[PRED_LOAD_IF13]] ] -; CHECK-NEXT: [[TMP47:%.*]] = xor <4 x i1> [[TMP25]], -; CHECK-NEXT: [[TMP48:%.*]] = select <4 x i1> [[TMP4]], <4 x i1> [[TMP47]], <4 x i1> zeroinitializer -; CHECK-NEXT: [[PREDPHI_V:%.*]] = select <4 x i1> [[TMP48]], <4 x i64> [[TMP24]], <4 x i64> [[TMP46]] +; CHECK-NEXT: [[PREDPHI_V:%.*]] = select <4 x i1> [[TMP26]], <4 x i64> [[TMP46]], <4 x i64> [[TMP24]] ; CHECK-NEXT: [[PREDPHI:%.*]] = select <4 x i1> [[TMP4]], <4 x i64> [[PREDPHI_V]], <4 x i64> ; CHECK-NEXT: [[PREDPHI15]] = and <4 x i64> [[VEC_PHI]], [[PREDPHI]] ; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4 diff --git a/llvm/test/Transforms/LoopVectorize/single-value-blend-phis.ll 
b/llvm/test/Transforms/LoopVectorize/single-value-blend-phis.ll index 8ee12cc2241c3..6407583061e60 100644 --- a/llvm/test/Transforms/LoopVectorize/single-value-blend-phis.ll +++ b/llvm/test/Transforms/LoopVectorize/single-value-blend-phis.ll @@ -111,17 +111,16 @@ define void @single_incoming_phi_with_blend_mask(i64 %a, i64 %b) { ; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <2 x i16>, ptr [[TMP5]], align 1 ; CHECK-NEXT: [[TMP6:%.*]] = icmp sgt <2 x i64> [[VEC_IND]], [[BROADCAST_SPLAT]] ; CHECK-NEXT: [[TMP7:%.*]] = select <2 x i1> [[TMP3]], <2 x i1> [[TMP6]], <2 x i1> zeroinitializer -; CHECK-NEXT: [[TMP8:%.*]] = xor <2 x i1> [[TMP6]], -; CHECK-NEXT: [[TMP9:%.*]] = select <2 x i1> [[TMP3]], <2 x i1> [[TMP8]], <2 x i1> zeroinitializer -; CHECK-NEXT: [[PREDPHI:%.*]] = select <2 x i1> [[TMP9]], <2 x i16> [[WIDE_LOAD]], <2 x i16> zeroinitializer +; CHECK-NEXT: [[TMP8:%.*]] = xor <2 x i1> [[TMP3]], +; CHECK-NEXT: [[PREDPHI:%.*]] = select <2 x i1> [[TMP8]], <2 x i16> zeroinitializer, <2 x i16> [[WIDE_LOAD]] ; CHECK-NEXT: [[PREDPHI1:%.*]] = select <2 x i1> [[TMP7]], <2 x i16> , <2 x i16> [[PREDPHI]] -; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds [32 x i16], ptr @dst, i16 0, i64 [[TMP0]] -; CHECK-NEXT: [[TMP11:%.*]] = getelementptr inbounds i16, ptr [[TMP10]], i32 0 -; CHECK-NEXT: store <2 x i16> [[PREDPHI1]], ptr [[TMP11]], align 2 +; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds [32 x i16], ptr @dst, i16 0, i64 [[TMP0]] +; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds i16, ptr [[TMP9]], i32 0 +; CHECK-NEXT: store <2 x i16> [[PREDPHI1]], ptr [[TMP10]], align 2 ; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2 ; CHECK-NEXT: [[VEC_IND_NEXT]] = add <2 x i64> [[VEC_IND]], -; CHECK-NEXT: [[TMP12:%.*]] = icmp eq i64 [[INDEX_NEXT]], 32 -; CHECK-NEXT: br i1 [[TMP12]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]] +; CHECK-NEXT: [[TMP11:%.*]] = icmp eq i64 [[INDEX_NEXT]], 32 +; CHECK-NEXT: br i1 [[TMP11]], label [[MIDDLE_BLOCK:%.*]], 
label [[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]] ; CHECK: middle.block: ; CHECK-NEXT: br i1 true, label [[EXIT:%.*]], label [[SCALAR_PH]] ; CHECK: scalar.ph: @@ -304,17 +303,16 @@ define void @single_incoming_needs_predication(i64 %a, i64 %b) { ; CHECK-NEXT: [[TMP14:%.*]] = phi <2 x i16> [ [[TMP8]], [[PRED_LOAD_CONTINUE]] ], [ [[TMP13]], [[PRED_LOAD_IF1]] ] ; CHECK-NEXT: [[TMP15:%.*]] = icmp sgt <2 x i64> [[VEC_IND]], [[BROADCAST_SPLAT]] ; CHECK-NEXT: [[TMP16:%.*]] = select <2 x i1> [[TMP2]], <2 x i1> [[TMP15]], <2 x i1> zeroinitializer -; CHECK-NEXT: [[TMP17:%.*]] = xor <2 x i1> [[TMP15]], -; CHECK-NEXT: [[TMP18:%.*]] = select <2 x i1> [[TMP2]], <2 x i1> [[TMP17]], <2 x i1> zeroinitializer -; CHECK-NEXT: [[PREDPHI:%.*]] = select <2 x i1> [[TMP18]], <2 x i16> [[TMP14]], <2 x i16> zeroinitializer +; CHECK-NEXT: [[TMP17:%.*]] = xor <2 x i1> [[TMP2]], +; CHECK-NEXT: [[PREDPHI:%.*]] = select <2 x i1> [[TMP17]], <2 x i16> zeroinitializer, <2 x i16> [[TMP14]] ; CHECK-NEXT: [[PREDPHI3:%.*]] = select <2 x i1> [[TMP16]], <2 x i16> , <2 x i16> [[PREDPHI]] -; CHECK-NEXT: [[TMP19:%.*]] = getelementptr inbounds [32 x i16], ptr @dst, i16 0, i64 [[TMP0]] -; CHECK-NEXT: [[TMP20:%.*]] = getelementptr inbounds i16, ptr [[TMP19]], i32 0 -; CHECK-NEXT: store <2 x i16> [[PREDPHI3]], ptr [[TMP20]], align 2 +; CHECK-NEXT: [[TMP18:%.*]] = getelementptr inbounds [32 x i16], ptr @dst, i16 0, i64 [[TMP0]] +; CHECK-NEXT: [[TMP19:%.*]] = getelementptr inbounds i16, ptr [[TMP18]], i32 0 +; CHECK-NEXT: store <2 x i16> [[PREDPHI3]], ptr [[TMP19]], align 2 ; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2 ; CHECK-NEXT: [[VEC_IND_NEXT]] = add <2 x i64> [[VEC_IND]], -; CHECK-NEXT: [[TMP21:%.*]] = icmp eq i64 [[INDEX_NEXT]], 64 -; CHECK-NEXT: br i1 [[TMP21]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP8:![0-9]+]] +; CHECK-NEXT: [[TMP20:%.*]] = icmp eq i64 [[INDEX_NEXT]], 64 +; CHECK-NEXT: br i1 [[TMP20]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop 
[[LOOP8:![0-9]+]] ; CHECK: middle.block: ; CHECK-NEXT: br i1 true, label [[EXIT:%.*]], label [[SCALAR_PH]] ; CHECK: scalar.ph: diff --git a/llvm/test/Transforms/SLPVectorizer/RISCV/math-function.ll b/llvm/test/Transforms/SLPVectorizer/RISCV/math-function.ll index 059e4c38b519b..6fbd05aaedfe5 100644 --- a/llvm/test/Transforms/SLPVectorizer/RISCV/math-function.ll +++ b/llvm/test/Transforms/SLPVectorizer/RISCV/math-function.ll @@ -597,6 +597,690 @@ entry: ret <4 x float> %vecins.3 } +declare float @cosf(float) readonly nounwind willreturn + +; We cannot vectorize cos since RISCV has no such instruction. +define <4 x float> @cos_4x(ptr %a) { +; CHECK-LABEL: define <4 x float> @cos_4x +; CHECK-SAME: (ptr [[A:%.*]]) #[[ATTR1]] { +; CHECK-NEXT: entry: +; CHECK-NEXT: [[TMP0:%.*]] = load <4 x float>, ptr [[A]], align 16 +; CHECK-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0 +; CHECK-NEXT: [[TMP1:%.*]] = tail call fast float @cosf(float [[VECEXT]]) +; CHECK-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0 +; CHECK-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1 +; CHECK-NEXT: [[TMP2:%.*]] = tail call fast float @cosf(float [[VECEXT_1]]) +; CHECK-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1 +; CHECK-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2 +; CHECK-NEXT: [[TMP3:%.*]] = tail call fast float @cosf(float [[VECEXT_2]]) +; CHECK-NEXT: [[VECINS_2:%.*]] = insertelement <4 x float> [[VECINS_1]], float [[TMP3]], i32 2 +; CHECK-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3 +; CHECK-NEXT: [[TMP4:%.*]] = tail call fast float @cosf(float [[VECEXT_3]]) +; CHECK-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_2]], float [[TMP4]], i32 3 +; CHECK-NEXT: ret <4 x float> [[VECINS_3]] +; +; DEFAULT-LABEL: define <4 x float> @cos_4x +; DEFAULT-SAME: (ptr [[A:%.*]]) #[[ATTR1]] { +; DEFAULT-NEXT: entry: +; DEFAULT-NEXT:
[[TMP0:%.*]] = load <4 x float>, ptr [[A]], align 16 +; DEFAULT-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0 +; DEFAULT-NEXT: [[TMP1:%.*]] = tail call fast float @cosf(float [[VECEXT]]) +; DEFAULT-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0 +; DEFAULT-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1 +; DEFAULT-NEXT: [[TMP2:%.*]] = tail call fast float @cosf(float [[VECEXT_1]]) +; DEFAULT-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1 +; DEFAULT-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2 +; DEFAULT-NEXT: [[TMP3:%.*]] = tail call fast float @cosf(float [[VECEXT_2]]) +; DEFAULT-NEXT: [[VECINS_2:%.*]] = insertelement <4 x float> [[VECINS_1]], float [[TMP3]], i32 2 +; DEFAULT-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3 +; DEFAULT-NEXT: [[TMP4:%.*]] = tail call fast float @cosf(float [[VECEXT_3]]) +; DEFAULT-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_2]], float [[TMP4]], i32 3 +; DEFAULT-NEXT: ret <4 x float> [[VECINS_3]] +; +entry: + %0 = load <4 x float>, ptr %a, align 16 + %vecext = extractelement <4 x float> %0, i32 0 + %1 = tail call fast float @cosf(float %vecext) + %vecins = insertelement <4 x float> undef, float %1, i32 0 + %vecext.1 = extractelement <4 x float> %0, i32 1 + %2 = tail call fast float @cosf(float %vecext.1) + %vecins.1 = insertelement <4 x float> %vecins, float %2, i32 1 + %vecext.2 = extractelement <4 x float> %0, i32 2 + %3 = tail call fast float @cosf(float %vecext.2) + %vecins.2 = insertelement <4 x float> %vecins.1, float %3, i32 2 + %vecext.3 = extractelement <4 x float> %0, i32 3 + %4 = tail call fast float @cosf(float %vecext.3) + %vecins.3 = insertelement <4 x float> %vecins.2, float %4, i32 3 + ret <4 x float> %vecins.3 +} + +declare float @llvm.cos.f32(float) + +; We cannot vectorize cos since RISCV has no such instruction. 
+define <4 x float> @int_cos_4x(ptr %a) { +; CHECK-LABEL: define <4 x float> @int_cos_4x +; CHECK-SAME: (ptr [[A:%.*]]) #[[ATTR1]] { +; CHECK-NEXT: entry: +; CHECK-NEXT: [[TMP0:%.*]] = load <4 x float>, ptr [[A]], align 16 +; CHECK-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0 +; CHECK-NEXT: [[TMP1:%.*]] = tail call fast float @llvm.cos.f32(float [[VECEXT]]) +; CHECK-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0 +; CHECK-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1 +; CHECK-NEXT: [[TMP2:%.*]] = tail call fast float @llvm.cos.f32(float [[VECEXT_1]]) +; CHECK-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1 +; CHECK-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2 +; CHECK-NEXT: [[TMP3:%.*]] = tail call fast float @llvm.cos.f32(float [[VECEXT_2]]) +; CHECK-NEXT: [[VECINS_2:%.*]] = insertelement <4 x float> [[VECINS_1]], float [[TMP3]], i32 2 +; CHECK-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3 +; CHECK-NEXT: [[TMP4:%.*]] = tail call fast float @llvm.cos.f32(float [[VECEXT_3]]) +; CHECK-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_2]], float [[TMP4]], i32 3 +; CHECK-NEXT: ret <4 x float> [[VECINS_3]] +; +; DEFAULT-LABEL: define <4 x float> @int_cos_4x +; DEFAULT-SAME: (ptr [[A:%.*]]) #[[ATTR1]] { +; DEFAULT-NEXT: entry: +; DEFAULT-NEXT: [[TMP0:%.*]] = load <4 x float>, ptr [[A]], align 16 +; DEFAULT-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0 +; DEFAULT-NEXT: [[TMP1:%.*]] = tail call fast float @llvm.cos.f32(float [[VECEXT]]) +; DEFAULT-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0 +; DEFAULT-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1 +; DEFAULT-NEXT: [[TMP2:%.*]] = tail call fast float @llvm.cos.f32(float [[VECEXT_1]]) +; DEFAULT-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1 +; 
DEFAULT-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2 +; DEFAULT-NEXT: [[TMP3:%.*]] = tail call fast float @llvm.cos.f32(float [[VECEXT_2]]) +; DEFAULT-NEXT: [[VECINS_2:%.*]] = insertelement <4 x float> [[VECINS_1]], float [[TMP3]], i32 2 +; DEFAULT-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3 +; DEFAULT-NEXT: [[TMP4:%.*]] = tail call fast float @llvm.cos.f32(float [[VECEXT_3]]) +; DEFAULT-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_2]], float [[TMP4]], i32 3 +; DEFAULT-NEXT: ret <4 x float> [[VECINS_3]] +; +entry: + %0 = load <4 x float>, ptr %a, align 16 + %vecext = extractelement <4 x float> %0, i32 0 + %1 = tail call fast float @llvm.cos.f32(float %vecext) + %vecins = insertelement <4 x float> undef, float %1, i32 0 + %vecext.1 = extractelement <4 x float> %0, i32 1 + %2 = tail call fast float @llvm.cos.f32(float %vecext.1) + %vecins.1 = insertelement <4 x float> %vecins, float %2, i32 1 + %vecext.2 = extractelement <4 x float> %0, i32 2 + %3 = tail call fast float @llvm.cos.f32(float %vecext.2) + %vecins.2 = insertelement <4 x float> %vecins.1, float %3, i32 2 + %vecext.3 = extractelement <4 x float> %0, i32 3 + %4 = tail call fast float @llvm.cos.f32(float %vecext.3) + %vecins.3 = insertelement <4 x float> %vecins.2, float %4, i32 3 + ret <4 x float> %vecins.3 +} + +declare float @acosf(float) readonly nounwind willreturn + +; We cannot vectorize acos since RISCV has no such instruction. 
+define <4 x float> @acos_4x(ptr %a) { +; CHECK-LABEL: define <4 x float> @acos_4x +; CHECK-SAME: (ptr [[A:%.*]]) #[[ATTR1]] { +; CHECK-NEXT: entry: +; CHECK-NEXT: [[TMP0:%.*]] = load <4 x float>, ptr [[A]], align 16 +; CHECK-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0 +; CHECK-NEXT: [[TMP1:%.*]] = tail call fast float @acosf(float [[VECEXT]]) +; CHECK-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0 +; CHECK-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1 +; CHECK-NEXT: [[TMP2:%.*]] = tail call fast float @acosf(float [[VECEXT_1]]) +; CHECK-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1 +; CHECK-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2 +; CHECK-NEXT: [[TMP3:%.*]] = tail call fast float @acosf(float [[VECEXT_2]]) +; CHECK-NEXT: [[VECINS_2:%.*]] = insertelement <4 x float> [[VECINS_1]], float [[TMP3]], i32 2 +; CHECK-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3 +; CHECK-NEXT: [[TMP4:%.*]] = tail call fast float @acosf(float [[VECEXT_3]]) +; CHECK-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_2]], float [[TMP4]], i32 3 +; CHECK-NEXT: ret <4 x float> [[VECINS_3]] +; +; DEFAULT-LABEL: define <4 x float> @acos_4x +; DEFAULT-SAME: (ptr [[A:%.*]]) #[[ATTR1]] { +; DEFAULT-NEXT: entry: +; DEFAULT-NEXT: [[TMP0:%.*]] = load <4 x float>, ptr [[A]], align 16 +; DEFAULT-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0 +; DEFAULT-NEXT: [[TMP1:%.*]] = tail call fast float @acosf(float [[VECEXT]]) +; DEFAULT-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0 +; DEFAULT-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1 +; DEFAULT-NEXT: [[TMP2:%.*]] = tail call fast float @acosf(float [[VECEXT_1]]) +; DEFAULT-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1 +; DEFAULT-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x 
float> [[TMP0]], i32 2 +; DEFAULT-NEXT: [[TMP3:%.*]] = tail call fast float @acosf(float [[VECEXT_2]]) +; DEFAULT-NEXT: [[VECINS_2:%.*]] = insertelement <4 x float> [[VECINS_1]], float [[TMP3]], i32 2 +; DEFAULT-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3 +; DEFAULT-NEXT: [[TMP4:%.*]] = tail call fast float @acosf(float [[VECEXT_3]]) +; DEFAULT-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_2]], float [[TMP4]], i32 3 +; DEFAULT-NEXT: ret <4 x float> [[VECINS_3]] +; +entry: + %0 = load <4 x float>, ptr %a, align 16 + %vecext = extractelement <4 x float> %0, i32 0 + %1 = tail call fast float @acosf(float %vecext) + %vecins = insertelement <4 x float> undef, float %1, i32 0 + %vecext.1 = extractelement <4 x float> %0, i32 1 + %2 = tail call fast float @acosf(float %vecext.1) + %vecins.1 = insertelement <4 x float> %vecins, float %2, i32 1 + %vecext.2 = extractelement <4 x float> %0, i32 2 + %3 = tail call fast float @acosf(float %vecext.2) + %vecins.2 = insertelement <4 x float> %vecins.1, float %3, i32 2 + %vecext.3 = extractelement <4 x float> %0, i32 3 + %4 = tail call fast float @acosf(float %vecext.3) + %vecins.3 = insertelement <4 x float> %vecins.2, float %4, i32 3 + ret <4 x float> %vecins.3 +} + +declare float @llvm.acos.f32(float) + +; We cannot vectorize acos since RISCV has no such instruction. 
+define <4 x float> @int_acos_4x(ptr %a) { +; CHECK-LABEL: define <4 x float> @int_acos_4x +; CHECK-SAME: (ptr [[A:%.*]]) #[[ATTR1]] { +; CHECK-NEXT: entry: +; CHECK-NEXT: [[TMP0:%.*]] = load <4 x float>, ptr [[A]], align 16 +; CHECK-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0 +; CHECK-NEXT: [[TMP1:%.*]] = tail call fast float @llvm.acos.f32(float [[VECEXT]]) +; CHECK-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0 +; CHECK-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1 +; CHECK-NEXT: [[TMP2:%.*]] = tail call fast float @llvm.acos.f32(float [[VECEXT_1]]) +; CHECK-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1 +; CHECK-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2 +; CHECK-NEXT: [[TMP3:%.*]] = tail call fast float @llvm.acos.f32(float [[VECEXT_2]]) +; CHECK-NEXT: [[VECINS_2:%.*]] = insertelement <4 x float> [[VECINS_1]], float [[TMP3]], i32 2 +; CHECK-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3 +; CHECK-NEXT: [[TMP4:%.*]] = tail call fast float @llvm.acos.f32(float [[VECEXT_3]]) +; CHECK-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_2]], float [[TMP4]], i32 3 +; CHECK-NEXT: ret <4 x float> [[VECINS_3]] +; +; DEFAULT-LABEL: define <4 x float> @int_acos_4x +; DEFAULT-SAME: (ptr [[A:%.*]]) #[[ATTR1]] { +; DEFAULT-NEXT: entry: +; DEFAULT-NEXT: [[TMP0:%.*]] = load <4 x float>, ptr [[A]], align 16 +; DEFAULT-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0 +; DEFAULT-NEXT: [[TMP1:%.*]] = tail call fast float @llvm.acos.f32(float [[VECEXT]]) +; DEFAULT-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0 +; DEFAULT-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1 +; DEFAULT-NEXT: [[TMP2:%.*]] = tail call fast float @llvm.acos.f32(float [[VECEXT_1]]) +; DEFAULT-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1 
+; DEFAULT-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2 +; DEFAULT-NEXT: [[TMP3:%.*]] = tail call fast float @llvm.acos.f32(float [[VECEXT_2]]) +; DEFAULT-NEXT: [[VECINS_2:%.*]] = insertelement <4 x float> [[VECINS_1]], float [[TMP3]], i32 2 +; DEFAULT-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3 +; DEFAULT-NEXT: [[TMP4:%.*]] = tail call fast float @llvm.acos.f32(float [[VECEXT_3]]) +; DEFAULT-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_2]], float [[TMP4]], i32 3 +; DEFAULT-NEXT: ret <4 x float> [[VECINS_3]] +; +entry: + %0 = load <4 x float>, ptr %a, align 16 + %vecext = extractelement <4 x float> %0, i32 0 + %1 = tail call fast float @llvm.acos.f32(float %vecext) + %vecins = insertelement <4 x float> undef, float %1, i32 0 + %vecext.1 = extractelement <4 x float> %0, i32 1 + %2 = tail call fast float @llvm.acos.f32(float %vecext.1) + %vecins.1 = insertelement <4 x float> %vecins, float %2, i32 1 + %vecext.2 = extractelement <4 x float> %0, i32 2 + %3 = tail call fast float @llvm.acos.f32(float %vecext.2) + %vecins.2 = insertelement <4 x float> %vecins.1, float %3, i32 2 + %vecext.3 = extractelement <4 x float> %0, i32 3 + %4 = tail call fast float @llvm.acos.f32(float %vecext.3) + %vecins.3 = insertelement <4 x float> %vecins.2, float %4, i32 3 + ret <4 x float> %vecins.3 +} + +declare float @tanf(float) readonly nounwind willreturn + +; We cannot vectorize tan since RISCV has no such instruction. 
+define <4 x float> @tan_4x(ptr %a) { +; CHECK-LABEL: define <4 x float> @tan_4x +; CHECK-SAME: (ptr [[A:%.*]]) #[[ATTR1]] { +; CHECK-NEXT: entry: +; CHECK-NEXT: [[TMP0:%.*]] = load <4 x float>, ptr [[A]], align 16 +; CHECK-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0 +; CHECK-NEXT: [[TMP1:%.*]] = tail call fast float @tanf(float [[VECEXT]]) +; CHECK-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0 +; CHECK-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1 +; CHECK-NEXT: [[TMP2:%.*]] = tail call fast float @tanf(float [[VECEXT_1]]) +; CHECK-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1 +; CHECK-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2 +; CHECK-NEXT: [[TMP3:%.*]] = tail call fast float @tanf(float [[VECEXT_2]]) +; CHECK-NEXT: [[VECINS_2:%.*]] = insertelement <4 x float> [[VECINS_1]], float [[TMP3]], i32 2 +; CHECK-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3 +; CHECK-NEXT: [[TMP4:%.*]] = tail call fast float @tanf(float [[VECEXT_3]]) +; CHECK-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_2]], float [[TMP4]], i32 3 +; CHECK-NEXT: ret <4 x float> [[VECINS_3]] +; +; DEFAULT-LABEL: define <4 x float> @tan_4x +; DEFAULT-SAME: (ptr [[A:%.*]]) #[[ATTR1]] { +; DEFAULT-NEXT: entry: +; DEFAULT-NEXT: [[TMP0:%.*]] = load <4 x float>, ptr [[A]], align 16 +; DEFAULT-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0 +; DEFAULT-NEXT: [[TMP1:%.*]] = tail call fast float @tanf(float [[VECEXT]]) +; DEFAULT-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0 +; DEFAULT-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1 +; DEFAULT-NEXT: [[TMP2:%.*]] = tail call fast float @tanf(float [[VECEXT_1]]) +; DEFAULT-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1 +; DEFAULT-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> 
[[TMP0]], i32 2 +; DEFAULT-NEXT: [[TMP3:%.*]] = tail call fast float @tanf(float [[VECEXT_2]]) +; DEFAULT-NEXT: [[VECINS_2:%.*]] = insertelement <4 x float> [[VECINS_1]], float [[TMP3]], i32 2 +; DEFAULT-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3 +; DEFAULT-NEXT: [[TMP4:%.*]] = tail call fast float @tanf(float [[VECEXT_3]]) +; DEFAULT-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_2]], float [[TMP4]], i32 3 +; DEFAULT-NEXT: ret <4 x float> [[VECINS_3]] +; +entry: + %0 = load <4 x float>, ptr %a, align 16 + %vecext = extractelement <4 x float> %0, i32 0 + %1 = tail call fast float @tanf(float %vecext) + %vecins = insertelement <4 x float> undef, float %1, i32 0 + %vecext.1 = extractelement <4 x float> %0, i32 1 + %2 = tail call fast float @tanf(float %vecext.1) + %vecins.1 = insertelement <4 x float> %vecins, float %2, i32 1 + %vecext.2 = extractelement <4 x float> %0, i32 2 + %3 = tail call fast float @tanf(float %vecext.2) + %vecins.2 = insertelement <4 x float> %vecins.1, float %3, i32 2 + %vecext.3 = extractelement <4 x float> %0, i32 3 + %4 = tail call fast float @tanf(float %vecext.3) + %vecins.3 = insertelement <4 x float> %vecins.2, float %4, i32 3 + ret <4 x float> %vecins.3 +} + +declare float @llvm.tan.f32(float) + +; We cannot vectorize tan since RISCV has no such instruction. 
+define <4 x float> @int_tan_4x(ptr %a) { +; CHECK-LABEL: define <4 x float> @int_tan_4x +; CHECK-SAME: (ptr [[A:%.*]]) #[[ATTR1]] { +; CHECK-NEXT: entry: +; CHECK-NEXT: [[TMP0:%.*]] = load <4 x float>, ptr [[A]], align 16 +; CHECK-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0 +; CHECK-NEXT: [[TMP1:%.*]] = tail call fast float @llvm.tan.f32(float [[VECEXT]]) +; CHECK-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0 +; CHECK-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1 +; CHECK-NEXT: [[TMP2:%.*]] = tail call fast float @llvm.tan.f32(float [[VECEXT_1]]) +; CHECK-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1 +; CHECK-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2 +; CHECK-NEXT: [[TMP3:%.*]] = tail call fast float @llvm.tan.f32(float [[VECEXT_2]]) +; CHECK-NEXT: [[VECINS_2:%.*]] = insertelement <4 x float> [[VECINS_1]], float [[TMP3]], i32 2 +; CHECK-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3 +; CHECK-NEXT: [[TMP4:%.*]] = tail call fast float @llvm.tan.f32(float [[VECEXT_3]]) +; CHECK-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_2]], float [[TMP4]], i32 3 +; CHECK-NEXT: ret <4 x float> [[VECINS_3]] +; +; DEFAULT-LABEL: define <4 x float> @int_tan_4x +; DEFAULT-SAME: (ptr [[A:%.*]]) #[[ATTR1]] { +; DEFAULT-NEXT: entry: +; DEFAULT-NEXT: [[TMP0:%.*]] = load <4 x float>, ptr [[A]], align 16 +; DEFAULT-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0 +; DEFAULT-NEXT: [[TMP1:%.*]] = tail call fast float @llvm.tan.f32(float [[VECEXT]]) +; DEFAULT-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0 +; DEFAULT-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1 +; DEFAULT-NEXT: [[TMP2:%.*]] = tail call fast float @llvm.tan.f32(float [[VECEXT_1]]) +; DEFAULT-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1 +; 
DEFAULT-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2 +; DEFAULT-NEXT: [[TMP3:%.*]] = tail call fast float @llvm.tan.f32(float [[VECEXT_2]]) +; DEFAULT-NEXT: [[VECINS_2:%.*]] = insertelement <4 x float> [[VECINS_1]], float [[TMP3]], i32 2 +; DEFAULT-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3 +; DEFAULT-NEXT: [[TMP4:%.*]] = tail call fast float @llvm.tan.f32(float [[VECEXT_3]]) +; DEFAULT-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_2]], float [[TMP4]], i32 3 +; DEFAULT-NEXT: ret <4 x float> [[VECINS_3]] +; +entry: + %0 = load <4 x float>, ptr %a, align 16 + %vecext = extractelement <4 x float> %0, i32 0 + %1 = tail call fast float @llvm.tan.f32(float %vecext) + %vecins = insertelement <4 x float> undef, float %1, i32 0 + %vecext.1 = extractelement <4 x float> %0, i32 1 + %2 = tail call fast float @llvm.tan.f32(float %vecext.1) + %vecins.1 = insertelement <4 x float> %vecins, float %2, i32 1 + %vecext.2 = extractelement <4 x float> %0, i32 2 + %3 = tail call fast float @llvm.tan.f32(float %vecext.2) + %vecins.2 = insertelement <4 x float> %vecins.1, float %3, i32 2 + %vecext.3 = extractelement <4 x float> %0, i32 3 + %4 = tail call fast float @llvm.tan.f32(float %vecext.3) + %vecins.3 = insertelement <4 x float> %vecins.2, float %4, i32 3 + ret <4 x float> %vecins.3 +} + +declare float @atanf(float) readonly nounwind willreturn + +; We cannot vectorize atan since RISCV has no such instruction. 
+define <4 x float> @atan_4x(ptr %a) { +; CHECK-LABEL: define <4 x float> @atan_4x +; CHECK-SAME: (ptr [[A:%.*]]) #[[ATTR1]] { +; CHECK-NEXT: entry: +; CHECK-NEXT: [[TMP0:%.*]] = load <4 x float>, ptr [[A]], align 16 +; CHECK-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0 +; CHECK-NEXT: [[TMP1:%.*]] = tail call fast float @atanf(float [[VECEXT]]) +; CHECK-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0 +; CHECK-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1 +; CHECK-NEXT: [[TMP2:%.*]] = tail call fast float @atanf(float [[VECEXT_1]]) +; CHECK-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1 +; CHECK-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2 +; CHECK-NEXT: [[TMP3:%.*]] = tail call fast float @atanf(float [[VECEXT_2]]) +; CHECK-NEXT: [[VECINS_2:%.*]] = insertelement <4 x float> [[VECINS_1]], float [[TMP3]], i32 2 +; CHECK-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3 +; CHECK-NEXT: [[TMP4:%.*]] = tail call fast float @atanf(float [[VECEXT_3]]) +; CHECK-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_2]], float [[TMP4]], i32 3 +; CHECK-NEXT: ret <4 x float> [[VECINS_3]] +; +; DEFAULT-LABEL: define <4 x float> @atan_4x +; DEFAULT-SAME: (ptr [[A:%.*]]) #[[ATTR1]] { +; DEFAULT-NEXT: entry: +; DEFAULT-NEXT: [[TMP0:%.*]] = load <4 x float>, ptr [[A]], align 16 +; DEFAULT-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0 +; DEFAULT-NEXT: [[TMP1:%.*]] = tail call fast float @atanf(float [[VECEXT]]) +; DEFAULT-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0 +; DEFAULT-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1 +; DEFAULT-NEXT: [[TMP2:%.*]] = tail call fast float @atanf(float [[VECEXT_1]]) +; DEFAULT-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1 +; DEFAULT-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x 
float> [[TMP0]], i32 2 +; DEFAULT-NEXT: [[TMP3:%.*]] = tail call fast float @atanf(float [[VECEXT_2]]) +; DEFAULT-NEXT: [[VECINS_2:%.*]] = insertelement <4 x float> [[VECINS_1]], float [[TMP3]], i32 2 +; DEFAULT-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3 +; DEFAULT-NEXT: [[TMP4:%.*]] = tail call fast float @atanf(float [[VECEXT_3]]) +; DEFAULT-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_2]], float [[TMP4]], i32 3 +; DEFAULT-NEXT: ret <4 x float> [[VECINS_3]] +; +entry: + %0 = load <4 x float>, ptr %a, align 16 + %vecext = extractelement <4 x float> %0, i32 0 + %1 = tail call fast float @atanf(float %vecext) + %vecins = insertelement <4 x float> undef, float %1, i32 0 + %vecext.1 = extractelement <4 x float> %0, i32 1 + %2 = tail call fast float @atanf(float %vecext.1) + %vecins.1 = insertelement <4 x float> %vecins, float %2, i32 1 + %vecext.2 = extractelement <4 x float> %0, i32 2 + %3 = tail call fast float @atanf(float %vecext.2) + %vecins.2 = insertelement <4 x float> %vecins.1, float %3, i32 2 + %vecext.3 = extractelement <4 x float> %0, i32 3 + %4 = tail call fast float @atanf(float %vecext.3) + %vecins.3 = insertelement <4 x float> %vecins.2, float %4, i32 3 + ret <4 x float> %vecins.3 +} + +declare float @llvm.atan.f32(float) + +; We cannot vectorize atan since RISCV has no such instruction. 
+define <4 x float> @int_atan_4x(ptr %a) { +; CHECK-LABEL: define <4 x float> @int_atan_4x +; CHECK-SAME: (ptr [[A:%.*]]) #[[ATTR1]] { +; CHECK-NEXT: entry: +; CHECK-NEXT: [[TMP0:%.*]] = load <4 x float>, ptr [[A]], align 16 +; CHECK-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0 +; CHECK-NEXT: [[TMP1:%.*]] = tail call fast float @llvm.atan.f32(float [[VECEXT]]) +; CHECK-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0 +; CHECK-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1 +; CHECK-NEXT: [[TMP2:%.*]] = tail call fast float @llvm.atan.f32(float [[VECEXT_1]]) +; CHECK-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1 +; CHECK-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2 +; CHECK-NEXT: [[TMP3:%.*]] = tail call fast float @llvm.atan.f32(float [[VECEXT_2]]) +; CHECK-NEXT: [[VECINS_2:%.*]] = insertelement <4 x float> [[VECINS_1]], float [[TMP3]], i32 2 +; CHECK-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3 +; CHECK-NEXT: [[TMP4:%.*]] = tail call fast float @llvm.atan.f32(float [[VECEXT_3]]) +; CHECK-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_2]], float [[TMP4]], i32 3 +; CHECK-NEXT: ret <4 x float> [[VECINS_3]] +; +; DEFAULT-LABEL: define <4 x float> @int_atan_4x +; DEFAULT-SAME: (ptr [[A:%.*]]) #[[ATTR1]] { +; DEFAULT-NEXT: entry: +; DEFAULT-NEXT: [[TMP0:%.*]] = load <4 x float>, ptr [[A]], align 16 +; DEFAULT-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0 +; DEFAULT-NEXT: [[TMP1:%.*]] = tail call fast float @llvm.atan.f32(float [[VECEXT]]) +; DEFAULT-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0 +; DEFAULT-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1 +; DEFAULT-NEXT: [[TMP2:%.*]] = tail call fast float @llvm.atan.f32(float [[VECEXT_1]]) +; DEFAULT-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1 
+; DEFAULT-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2 +; DEFAULT-NEXT: [[TMP3:%.*]] = tail call fast float @llvm.atan.f32(float [[VECEXT_2]]) +; DEFAULT-NEXT: [[VECINS_2:%.*]] = insertelement <4 x float> [[VECINS_1]], float [[TMP3]], i32 2 +; DEFAULT-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3 +; DEFAULT-NEXT: [[TMP4:%.*]] = tail call fast float @llvm.atan.f32(float [[VECEXT_3]]) +; DEFAULT-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_2]], float [[TMP4]], i32 3 +; DEFAULT-NEXT: ret <4 x float> [[VECINS_3]] +; +entry: + %0 = load <4 x float>, ptr %a, align 16 + %vecext = extractelement <4 x float> %0, i32 0 + %1 = tail call fast float @llvm.atan.f32(float %vecext) + %vecins = insertelement <4 x float> undef, float %1, i32 0 + %vecext.1 = extractelement <4 x float> %0, i32 1 + %2 = tail call fast float @llvm.atan.f32(float %vecext.1) + %vecins.1 = insertelement <4 x float> %vecins, float %2, i32 1 + %vecext.2 = extractelement <4 x float> %0, i32 2 + %3 = tail call fast float @llvm.atan.f32(float %vecext.2) + %vecins.2 = insertelement <4 x float> %vecins.1, float %3, i32 2 + %vecext.3 = extractelement <4 x float> %0, i32 3 + %4 = tail call fast float @llvm.atan.f32(float %vecext.3) + %vecins.3 = insertelement <4 x float> %vecins.2, float %4, i32 3 + ret <4 x float> %vecins.3 +} + +declare float @sinhf(float) readonly nounwind willreturn + +; We cannot vectorize sinh since RISCV has no such instruction. 
+define <4 x float> @sinh_4x(ptr %a) { +; CHECK-LABEL: define <4 x float> @sinh_4x +; CHECK-SAME: (ptr [[A:%.*]]) #[[ATTR1]] { +; CHECK-NEXT: entry: +; CHECK-NEXT: [[TMP0:%.*]] = load <4 x float>, ptr [[A]], align 16 +; CHECK-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0 +; CHECK-NEXT: [[TMP1:%.*]] = tail call fast float @sinhf(float [[VECEXT]]) +; CHECK-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0 +; CHECK-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1 +; CHECK-NEXT: [[TMP2:%.*]] = tail call fast float @sinhf(float [[VECEXT_1]]) +; CHECK-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1 +; CHECK-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2 +; CHECK-NEXT: [[TMP3:%.*]] = tail call fast float @sinhf(float [[VECEXT_2]]) +; CHECK-NEXT: [[VECINS_2:%.*]] = insertelement <4 x float> [[VECINS_1]], float [[TMP3]], i32 2 +; CHECK-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3 +; CHECK-NEXT: [[TMP4:%.*]] = tail call fast float @sinhf(float [[VECEXT_3]]) +; CHECK-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_2]], float [[TMP4]], i32 3 +; CHECK-NEXT: ret <4 x float> [[VECINS_3]] +; +; DEFAULT-LABEL: define <4 x float> @sinh_4x +; DEFAULT-SAME: (ptr [[A:%.*]]) #[[ATTR1]] { +; DEFAULT-NEXT: entry: +; DEFAULT-NEXT: [[TMP0:%.*]] = load <4 x float>, ptr [[A]], align 16 +; DEFAULT-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0 +; DEFAULT-NEXT: [[TMP1:%.*]] = tail call fast float @sinhf(float [[VECEXT]]) +; DEFAULT-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0 +; DEFAULT-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1 +; DEFAULT-NEXT: [[TMP2:%.*]] = tail call fast float @sinhf(float [[VECEXT_1]]) +; DEFAULT-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1 +; DEFAULT-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x 
float> [[TMP0]], i32 2 +; DEFAULT-NEXT: [[TMP3:%.*]] = tail call fast float @sinhf(float [[VECEXT_2]]) +; DEFAULT-NEXT: [[VECINS_2:%.*]] = insertelement <4 x float> [[VECINS_1]], float [[TMP3]], i32 2 +; DEFAULT-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3 +; DEFAULT-NEXT: [[TMP4:%.*]] = tail call fast float @sinhf(float [[VECEXT_3]]) +; DEFAULT-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_2]], float [[TMP4]], i32 3 +; DEFAULT-NEXT: ret <4 x float> [[VECINS_3]] +; +entry: + %0 = load <4 x float>, ptr %a, align 16 + %vecext = extractelement <4 x float> %0, i32 0 + %1 = tail call fast float @sinhf(float %vecext) + %vecins = insertelement <4 x float> undef, float %1, i32 0 + %vecext.1 = extractelement <4 x float> %0, i32 1 + %2 = tail call fast float @sinhf(float %vecext.1) + %vecins.1 = insertelement <4 x float> %vecins, float %2, i32 1 + %vecext.2 = extractelement <4 x float> %0, i32 2 + %3 = tail call fast float @sinhf(float %vecext.2) + %vecins.2 = insertelement <4 x float> %vecins.1, float %3, i32 2 + %vecext.3 = extractelement <4 x float> %0, i32 3 + %4 = tail call fast float @sinhf(float %vecext.3) + %vecins.3 = insertelement <4 x float> %vecins.2, float %4, i32 3 + ret <4 x float> %vecins.3 +} + +declare float @llvm.sinh.f32(float) + +; We cannot vectorize sinh since RISCV has no such instruction. 
+define <4 x float> @int_sinh_4x(ptr %a) { +; CHECK-LABEL: define <4 x float> @int_sinh_4x +; CHECK-SAME: (ptr [[A:%.*]]) #[[ATTR1]] { +; CHECK-NEXT: entry: +; CHECK-NEXT: [[TMP0:%.*]] = load <4 x float>, ptr [[A]], align 16 +; CHECK-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0 +; CHECK-NEXT: [[TMP1:%.*]] = tail call fast float @llvm.sinh.f32(float [[VECEXT]]) +; CHECK-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0 +; CHECK-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1 +; CHECK-NEXT: [[TMP2:%.*]] = tail call fast float @llvm.sinh.f32(float [[VECEXT_1]]) +; CHECK-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1 +; CHECK-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2 +; CHECK-NEXT: [[TMP3:%.*]] = tail call fast float @llvm.sinh.f32(float [[VECEXT_2]]) +; CHECK-NEXT: [[VECINS_2:%.*]] = insertelement <4 x float> [[VECINS_1]], float [[TMP3]], i32 2 +; CHECK-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3 +; CHECK-NEXT: [[TMP4:%.*]] = tail call fast float @llvm.sinh.f32(float [[VECEXT_3]]) +; CHECK-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_2]], float [[TMP4]], i32 3 +; CHECK-NEXT: ret <4 x float> [[VECINS_3]] +; +; DEFAULT-LABEL: define <4 x float> @int_sinh_4x +; DEFAULT-SAME: (ptr [[A:%.*]]) #[[ATTR1]] { +; DEFAULT-NEXT: entry: +; DEFAULT-NEXT: [[TMP0:%.*]] = load <4 x float>, ptr [[A]], align 16 +; DEFAULT-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0 +; DEFAULT-NEXT: [[TMP1:%.*]] = tail call fast float @llvm.sinh.f32(float [[VECEXT]]) +; DEFAULT-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0 +; DEFAULT-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1 +; DEFAULT-NEXT: [[TMP2:%.*]] = tail call fast float @llvm.sinh.f32(float [[VECEXT_1]]) +; DEFAULT-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1 
+; DEFAULT-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2 +; DEFAULT-NEXT: [[TMP3:%.*]] = tail call fast float @llvm.sinh.f32(float [[VECEXT_2]]) +; DEFAULT-NEXT: [[VECINS_2:%.*]] = insertelement <4 x float> [[VECINS_1]], float [[TMP3]], i32 2 +; DEFAULT-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3 +; DEFAULT-NEXT: [[TMP4:%.*]] = tail call fast float @llvm.sinh.f32(float [[VECEXT_3]]) +; DEFAULT-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_2]], float [[TMP4]], i32 3 +; DEFAULT-NEXT: ret <4 x float> [[VECINS_3]] +; +entry: + %0 = load <4 x float>, ptr %a, align 16 + %vecext = extractelement <4 x float> %0, i32 0 + %1 = tail call fast float @llvm.sinh.f32(float %vecext) + %vecins = insertelement <4 x float> undef, float %1, i32 0 + %vecext.1 = extractelement <4 x float> %0, i32 1 + %2 = tail call fast float @llvm.sinh.f32(float %vecext.1) + %vecins.1 = insertelement <4 x float> %vecins, float %2, i32 1 + %vecext.2 = extractelement <4 x float> %0, i32 2 + %3 = tail call fast float @llvm.sinh.f32(float %vecext.2) + %vecins.2 = insertelement <4 x float> %vecins.1, float %3, i32 2 + %vecext.3 = extractelement <4 x float> %0, i32 3 + %4 = tail call fast float @llvm.sinh.f32(float %vecext.3) + %vecins.3 = insertelement <4 x float> %vecins.2, float %4, i32 3 + ret <4 x float> %vecins.3 +} + +declare float @asinhf(float) readonly nounwind willreturn + +; We cannot vectorize asinh since RISCV has no such instruction. 
+define <4 x float> @asinh_4x(ptr %a) { +; CHECK-LABEL: define <4 x float> @asinh_4x +; CHECK-SAME: (ptr [[A:%.*]]) #[[ATTR1]] { +; CHECK-NEXT: entry: +; CHECK-NEXT: [[TMP0:%.*]] = load <4 x float>, ptr [[A]], align 16 +; CHECK-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0 +; CHECK-NEXT: [[TMP1:%.*]] = tail call fast float @asinhf(float [[VECEXT]]) +; CHECK-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0 +; CHECK-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1 +; CHECK-NEXT: [[TMP2:%.*]] = tail call fast float @asinhf(float [[VECEXT_1]]) +; CHECK-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1 +; CHECK-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2 +; CHECK-NEXT: [[TMP3:%.*]] = tail call fast float @asinhf(float [[VECEXT_2]]) +; CHECK-NEXT: [[VECINS_2:%.*]] = insertelement <4 x float> [[VECINS_1]], float [[TMP3]], i32 2 +; CHECK-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3 +; CHECK-NEXT: [[TMP4:%.*]] = tail call fast float @asinhf(float [[VECEXT_3]]) +; CHECK-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_2]], float [[TMP4]], i32 3 +; CHECK-NEXT: ret <4 x float> [[VECINS_3]] +; +; DEFAULT-LABEL: define <4 x float> @asinh_4x +; DEFAULT-SAME: (ptr [[A:%.*]]) #[[ATTR1]] { +; DEFAULT-NEXT: entry: +; DEFAULT-NEXT: [[TMP0:%.*]] = load <4 x float>, ptr [[A]], align 16 +; DEFAULT-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0 +; DEFAULT-NEXT: [[TMP1:%.*]] = tail call fast float @asinhf(float [[VECEXT]]) +; DEFAULT-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0 +; DEFAULT-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1 +; DEFAULT-NEXT: [[TMP2:%.*]] = tail call fast float @asinhf(float [[VECEXT_1]]) +; DEFAULT-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1 +; DEFAULT-NEXT: [[VECEXT_2:%.*]] = extractelement 
<4 x float> [[TMP0]], i32 2
+; DEFAULT-NEXT: [[TMP3:%.*]] = tail call fast float @asinhf(float [[VECEXT_2]])
+; DEFAULT-NEXT: [[VECINS_2:%.*]] = insertelement <4 x float> [[VECINS_1]], float [[TMP3]], i32 2
+; DEFAULT-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3
+; DEFAULT-NEXT: [[TMP4:%.*]] = tail call fast float @asinhf(float [[VECEXT_3]])
+; DEFAULT-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_2]], float [[TMP4]], i32 3
+; DEFAULT-NEXT: ret <4 x float> [[VECINS_3]]
+;
+entry:
+ %0 = load <4 x float>, ptr %a, align 16
+ %vecext = extractelement <4 x float> %0, i32 0
+ %1 = tail call fast float @asinhf(float %vecext)
+ %vecins = insertelement <4 x float> undef, float %1, i32 0
+ %vecext.1 = extractelement <4 x float> %0, i32 1
+ %2 = tail call fast float @asinhf(float %vecext.1)
+ %vecins.1 = insertelement <4 x float> %vecins, float %2, i32 1
+ %vecext.2 = extractelement <4 x float> %0, i32 2
+ %3 = tail call fast float @asinhf(float %vecext.2)
+ %vecins.2 = insertelement <4 x float> %vecins.1, float %3, i32 2
+ %vecext.3 = extractelement <4 x float> %0, i32 3
+ %4 = tail call fast float @asinhf(float %vecext.3)
+ %vecins.3 = insertelement <4 x float> %vecins.2, float %4, i32 3
+ ret <4 x float> %vecins.3
+}
+
+declare float @llvm.asinh.f32(float)
+
+; We cannot vectorize asinh since RISCV has no such instruction.
+define <4 x float> @int_asinh_4x(ptr %a) { +; CHECK-LABEL: define <4 x float> @int_asinh_4x +; CHECK-SAME: (ptr [[A:%.*]]) #[[ATTR1]] { +; CHECK-NEXT: entry: +; CHECK-NEXT: [[TMP0:%.*]] = load <4 x float>, ptr [[A]], align 16 +; CHECK-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0 +; CHECK-NEXT: [[TMP1:%.*]] = tail call fast float @llvm.asinh.f32(float [[VECEXT]]) +; CHECK-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0 +; CHECK-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1 +; CHECK-NEXT: [[TMP2:%.*]] = tail call fast float @llvm.asinh.f32(float [[VECEXT_1]]) +; CHECK-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1 +; CHECK-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2 +; CHECK-NEXT: [[TMP3:%.*]] = tail call fast float @llvm.asinh.f32(float [[VECEXT_2]]) +; CHECK-NEXT: [[VECINS_2:%.*]] = insertelement <4 x float> [[VECINS_1]], float [[TMP3]], i32 2 +; CHECK-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3 +; CHECK-NEXT: [[TMP4:%.*]] = tail call fast float @llvm.asinh.f32(float [[VECEXT_3]]) +; CHECK-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_2]], float [[TMP4]], i32 3 +; CHECK-NEXT: ret <4 x float> [[VECINS_3]] +; +; DEFAULT-LABEL: define <4 x float> @int_asinh_4x +; DEFAULT-SAME: (ptr [[A:%.*]]) #[[ATTR1]] { +; DEFAULT-NEXT: entry: +; DEFAULT-NEXT: [[TMP0:%.*]] = load <4 x float>, ptr [[A]], align 16 +; DEFAULT-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0 +; DEFAULT-NEXT: [[TMP1:%.*]] = tail call fast float @llvm.asinh.f32(float [[VECEXT]]) +; DEFAULT-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0 +; DEFAULT-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1 +; DEFAULT-NEXT: [[TMP2:%.*]] = tail call fast float @llvm.asinh.f32(float [[VECEXT_1]]) +; DEFAULT-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float 
[[TMP2]], i32 1
+; DEFAULT-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2
+; DEFAULT-NEXT: [[TMP3:%.*]] = tail call fast float @llvm.asinh.f32(float [[VECEXT_2]])
+; DEFAULT-NEXT: [[VECINS_2:%.*]] = insertelement <4 x float> [[VECINS_1]], float [[TMP3]], i32 2
+; DEFAULT-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3
+; DEFAULT-NEXT: [[TMP4:%.*]] = tail call fast float @llvm.asinh.f32(float [[VECEXT_3]])
+; DEFAULT-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_2]], float [[TMP4]], i32 3
+; DEFAULT-NEXT: ret <4 x float> [[VECINS_3]]
+;
+entry:
+ %0 = load <4 x float>, ptr %a, align 16
+ %vecext = extractelement <4 x float> %0, i32 0
+ %1 = tail call fast float @llvm.asinh.f32(float %vecext)
+ %vecins = insertelement <4 x float> undef, float %1, i32 0
+ %vecext.1 = extractelement <4 x float> %0, i32 1
+ %2 = tail call fast float @llvm.asinh.f32(float %vecext.1)
+ %vecins.1 = insertelement <4 x float> %vecins, float %2, i32 1
+ %vecext.2 = extractelement <4 x float> %0, i32 2
+ %3 = tail call fast float @llvm.asinh.f32(float %vecext.2)
+ %vecins.2 = insertelement <4 x float> %vecins.1, float %3, i32 2
+ %vecext.3 = extractelement <4 x float> %0, i32 3
+ %4 = tail call fast float @llvm.asinh.f32(float %vecext.3)
+ %vecins.3 = insertelement <4 x float> %vecins.2, float %4, i32 3
+ ret <4 x float> %vecins.3
+}
+
 declare float @coshf(float) readonly nounwind willreturn

 ; We can not vectorized cosh since RISCV has no such instruction.
@@ -711,6 +1395,234 @@ entry:
 ret <4 x float> %vecins.3
 }
+declare float @acoshf(float) readonly nounwind willreturn
+
+; We cannot vectorize acosh since RISCV has no such instruction.
+define <4 x float> @acosh_4x(ptr %a) { +; CHECK-LABEL: define <4 x float> @acosh_4x +; CHECK-SAME: (ptr [[A:%.*]]) #[[ATTR1]] { +; CHECK-NEXT: entry: +; CHECK-NEXT: [[TMP0:%.*]] = load <4 x float>, ptr [[A]], align 16 +; CHECK-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0 +; CHECK-NEXT: [[TMP1:%.*]] = tail call fast float @acoshf(float [[VECEXT]]) +; CHECK-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0 +; CHECK-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1 +; CHECK-NEXT: [[TMP2:%.*]] = tail call fast float @acoshf(float [[VECEXT_1]]) +; CHECK-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1 +; CHECK-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2 +; CHECK-NEXT: [[TMP3:%.*]] = tail call fast float @acoshf(float [[VECEXT_2]]) +; CHECK-NEXT: [[VECINS_2:%.*]] = insertelement <4 x float> [[VECINS_1]], float [[TMP3]], i32 2 +; CHECK-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3 +; CHECK-NEXT: [[TMP4:%.*]] = tail call fast float @acoshf(float [[VECEXT_3]]) +; CHECK-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_2]], float [[TMP4]], i32 3 +; CHECK-NEXT: ret <4 x float> [[VECINS_3]] +; +; DEFAULT-LABEL: define <4 x float> @acosh_4x +; DEFAULT-SAME: (ptr [[A:%.*]]) #[[ATTR1]] { +; DEFAULT-NEXT: entry: +; DEFAULT-NEXT: [[TMP0:%.*]] = load <4 x float>, ptr [[A]], align 16 +; DEFAULT-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0 +; DEFAULT-NEXT: [[TMP1:%.*]] = tail call fast float @acoshf(float [[VECEXT]]) +; DEFAULT-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0 +; DEFAULT-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1 +; DEFAULT-NEXT: [[TMP2:%.*]] = tail call fast float @acoshf(float [[VECEXT_1]]) +; DEFAULT-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1 +; DEFAULT-NEXT: [[VECEXT_2:%.*]] = extractelement 
<4 x float> [[TMP0]], i32 2
+; DEFAULT-NEXT: [[TMP3:%.*]] = tail call fast float @acoshf(float [[VECEXT_2]])
+; DEFAULT-NEXT: [[VECINS_2:%.*]] = insertelement <4 x float> [[VECINS_1]], float [[TMP3]], i32 2
+; DEFAULT-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3
+; DEFAULT-NEXT: [[TMP4:%.*]] = tail call fast float @acoshf(float [[VECEXT_3]])
+; DEFAULT-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_2]], float [[TMP4]], i32 3
+; DEFAULT-NEXT: ret <4 x float> [[VECINS_3]]
+;
+entry:
+ %0 = load <4 x float>, ptr %a, align 16
+ %vecext = extractelement <4 x float> %0, i32 0
+ %1 = tail call fast float @acoshf(float %vecext)
+ %vecins = insertelement <4 x float> undef, float %1, i32 0
+ %vecext.1 = extractelement <4 x float> %0, i32 1
+ %2 = tail call fast float @acoshf(float %vecext.1)
+ %vecins.1 = insertelement <4 x float> %vecins, float %2, i32 1
+ %vecext.2 = extractelement <4 x float> %0, i32 2
+ %3 = tail call fast float @acoshf(float %vecext.2)
+ %vecins.2 = insertelement <4 x float> %vecins.1, float %3, i32 2
+ %vecext.3 = extractelement <4 x float> %0, i32 3
+ %4 = tail call fast float @acoshf(float %vecext.3)
+ %vecins.3 = insertelement <4 x float> %vecins.2, float %4, i32 3
+ ret <4 x float> %vecins.3
+}
+
+declare float @llvm.acosh.f32(float)
+
+; We cannot vectorize acosh since RISCV has no such instruction.
+define <4 x float> @int_acosh_4x(ptr %a) { +; CHECK-LABEL: define <4 x float> @int_acosh_4x +; CHECK-SAME: (ptr [[A:%.*]]) #[[ATTR1]] { +; CHECK-NEXT: entry: +; CHECK-NEXT: [[TMP0:%.*]] = load <4 x float>, ptr [[A]], align 16 +; CHECK-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0 +; CHECK-NEXT: [[TMP1:%.*]] = tail call fast float @llvm.acosh.f32(float [[VECEXT]]) +; CHECK-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0 +; CHECK-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1 +; CHECK-NEXT: [[TMP2:%.*]] = tail call fast float @llvm.acosh.f32(float [[VECEXT_1]]) +; CHECK-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1 +; CHECK-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2 +; CHECK-NEXT: [[TMP3:%.*]] = tail call fast float @llvm.acosh.f32(float [[VECEXT_2]]) +; CHECK-NEXT: [[VECINS_2:%.*]] = insertelement <4 x float> [[VECINS_1]], float [[TMP3]], i32 2 +; CHECK-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3 +; CHECK-NEXT: [[TMP4:%.*]] = tail call fast float @llvm.acosh.f32(float [[VECEXT_3]]) +; CHECK-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_2]], float [[TMP4]], i32 3 +; CHECK-NEXT: ret <4 x float> [[VECINS_3]] +; +; DEFAULT-LABEL: define <4 x float> @int_acosh_4x +; DEFAULT-SAME: (ptr [[A:%.*]]) #[[ATTR1]] { +; DEFAULT-NEXT: entry: +; DEFAULT-NEXT: [[TMP0:%.*]] = load <4 x float>, ptr [[A]], align 16 +; DEFAULT-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0 +; DEFAULT-NEXT: [[TMP1:%.*]] = tail call fast float @llvm.acosh.f32(float [[VECEXT]]) +; DEFAULT-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0 +; DEFAULT-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1 +; DEFAULT-NEXT: [[TMP2:%.*]] = tail call fast float @llvm.acosh.f32(float [[VECEXT_1]]) +; DEFAULT-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float 
[[TMP2]], i32 1
+; DEFAULT-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2
+; DEFAULT-NEXT: [[TMP3:%.*]] = tail call fast float @llvm.acosh.f32(float [[VECEXT_2]])
+; DEFAULT-NEXT: [[VECINS_2:%.*]] = insertelement <4 x float> [[VECINS_1]], float [[TMP3]], i32 2
+; DEFAULT-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3
+; DEFAULT-NEXT: [[TMP4:%.*]] = tail call fast float @llvm.acosh.f32(float [[VECEXT_3]])
+; DEFAULT-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_2]], float [[TMP4]], i32 3
+; DEFAULT-NEXT: ret <4 x float> [[VECINS_3]]
+;
+entry:
+ %0 = load <4 x float>, ptr %a, align 16
+ %vecext = extractelement <4 x float> %0, i32 0
+ %1 = tail call fast float @llvm.acosh.f32(float %vecext)
+ %vecins = insertelement <4 x float> undef, float %1, i32 0
+ %vecext.1 = extractelement <4 x float> %0, i32 1
+ %2 = tail call fast float @llvm.acosh.f32(float %vecext.1)
+ %vecins.1 = insertelement <4 x float> %vecins, float %2, i32 1
+ %vecext.2 = extractelement <4 x float> %0, i32 2
+ %3 = tail call fast float @llvm.acosh.f32(float %vecext.2)
+ %vecins.2 = insertelement <4 x float> %vecins.1, float %3, i32 2
+ %vecext.3 = extractelement <4 x float> %0, i32 3
+ %4 = tail call fast float @llvm.acosh.f32(float %vecext.3)
+ %vecins.3 = insertelement <4 x float> %vecins.2, float %4, i32 3
+ ret <4 x float> %vecins.3
+}
+
+declare float @tanhf(float) readonly nounwind willreturn
+
+; We cannot vectorize tanh since RISCV has no such instruction.
+define <4 x float> @tanh_4x(ptr %a) { +; CHECK-LABEL: define <4 x float> @tanh_4x +; CHECK-SAME: (ptr [[A:%.*]]) #[[ATTR1]] { +; CHECK-NEXT: entry: +; CHECK-NEXT: [[TMP0:%.*]] = load <4 x float>, ptr [[A]], align 16 +; CHECK-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0 +; CHECK-NEXT: [[TMP1:%.*]] = tail call fast float @tanhf(float [[VECEXT]]) +; CHECK-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0 +; CHECK-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1 +; CHECK-NEXT: [[TMP2:%.*]] = tail call fast float @tanhf(float [[VECEXT_1]]) +; CHECK-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1 +; CHECK-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2 +; CHECK-NEXT: [[TMP3:%.*]] = tail call fast float @tanhf(float [[VECEXT_2]]) +; CHECK-NEXT: [[VECINS_2:%.*]] = insertelement <4 x float> [[VECINS_1]], float [[TMP3]], i32 2 +; CHECK-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3 +; CHECK-NEXT: [[TMP4:%.*]] = tail call fast float @tanhf(float [[VECEXT_3]]) +; CHECK-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_2]], float [[TMP4]], i32 3 +; CHECK-NEXT: ret <4 x float> [[VECINS_3]] +; +; DEFAULT-LABEL: define <4 x float> @tanh_4x +; DEFAULT-SAME: (ptr [[A:%.*]]) #[[ATTR1]] { +; DEFAULT-NEXT: entry: +; DEFAULT-NEXT: [[TMP0:%.*]] = load <4 x float>, ptr [[A]], align 16 +; DEFAULT-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0 +; DEFAULT-NEXT: [[TMP1:%.*]] = tail call fast float @tanhf(float [[VECEXT]]) +; DEFAULT-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0 +; DEFAULT-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1 +; DEFAULT-NEXT: [[TMP2:%.*]] = tail call fast float @tanhf(float [[VECEXT_1]]) +; DEFAULT-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1 +; DEFAULT-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x 
float> [[TMP0]], i32 2
+; DEFAULT-NEXT: [[TMP3:%.*]] = tail call fast float @tanhf(float [[VECEXT_2]])
+; DEFAULT-NEXT: [[VECINS_2:%.*]] = insertelement <4 x float> [[VECINS_1]], float [[TMP3]], i32 2
+; DEFAULT-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3
+; DEFAULT-NEXT: [[TMP4:%.*]] = tail call fast float @tanhf(float [[VECEXT_3]])
+; DEFAULT-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_2]], float [[TMP4]], i32 3
+; DEFAULT-NEXT: ret <4 x float> [[VECINS_3]]
+;
+entry:
+ %0 = load <4 x float>, ptr %a, align 16
+ %vecext = extractelement <4 x float> %0, i32 0
+ %1 = tail call fast float @tanhf(float %vecext)
+ %vecins = insertelement <4 x float> undef, float %1, i32 0
+ %vecext.1 = extractelement <4 x float> %0, i32 1
+ %2 = tail call fast float @tanhf(float %vecext.1)
+ %vecins.1 = insertelement <4 x float> %vecins, float %2, i32 1
+ %vecext.2 = extractelement <4 x float> %0, i32 2
+ %3 = tail call fast float @tanhf(float %vecext.2)
+ %vecins.2 = insertelement <4 x float> %vecins.1, float %3, i32 2
+ %vecext.3 = extractelement <4 x float> %0, i32 3
+ %4 = tail call fast float @tanhf(float %vecext.3)
+ %vecins.3 = insertelement <4 x float> %vecins.2, float %4, i32 3
+ ret <4 x float> %vecins.3
+}
+
+declare float @llvm.tanh.f32(float)
+
+; We cannot vectorize tanh since RISCV has no such instruction.
+define <4 x float> @int_tanh_4x(ptr %a) { +; CHECK-LABEL: define <4 x float> @int_tanh_4x +; CHECK-SAME: (ptr [[A:%.*]]) #[[ATTR1]] { +; CHECK-NEXT: entry: +; CHECK-NEXT: [[TMP0:%.*]] = load <4 x float>, ptr [[A]], align 16 +; CHECK-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0 +; CHECK-NEXT: [[TMP1:%.*]] = tail call fast float @llvm.tanh.f32(float [[VECEXT]]) +; CHECK-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0 +; CHECK-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1 +; CHECK-NEXT: [[TMP2:%.*]] = tail call fast float @llvm.tanh.f32(float [[VECEXT_1]]) +; CHECK-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1 +; CHECK-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2 +; CHECK-NEXT: [[TMP3:%.*]] = tail call fast float @llvm.tanh.f32(float [[VECEXT_2]]) +; CHECK-NEXT: [[VECINS_2:%.*]] = insertelement <4 x float> [[VECINS_1]], float [[TMP3]], i32 2 +; CHECK-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3 +; CHECK-NEXT: [[TMP4:%.*]] = tail call fast float @llvm.tanh.f32(float [[VECEXT_3]]) +; CHECK-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_2]], float [[TMP4]], i32 3 +; CHECK-NEXT: ret <4 x float> [[VECINS_3]] +; +; DEFAULT-LABEL: define <4 x float> @int_tanh_4x +; DEFAULT-SAME: (ptr [[A:%.*]]) #[[ATTR1]] { +; DEFAULT-NEXT: entry: +; DEFAULT-NEXT: [[TMP0:%.*]] = load <4 x float>, ptr [[A]], align 16 +; DEFAULT-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0 +; DEFAULT-NEXT: [[TMP1:%.*]] = tail call fast float @llvm.tanh.f32(float [[VECEXT]]) +; DEFAULT-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0 +; DEFAULT-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1 +; DEFAULT-NEXT: [[TMP2:%.*]] = tail call fast float @llvm.tanh.f32(float [[VECEXT_1]]) +; DEFAULT-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1 
+; DEFAULT-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2 +; DEFAULT-NEXT: [[TMP3:%.*]] = tail call fast float @llvm.tanh.f32(float [[VECEXT_2]]) +; DEFAULT-NEXT: [[VECINS_2:%.*]] = insertelement <4 x float> [[VECINS_1]], float [[TMP3]], i32 2 +; DEFAULT-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3 +; DEFAULT-NEXT: [[TMP4:%.*]] = tail call fast float @llvm.tanh.f32(float [[VECEXT_3]]) +; DEFAULT-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_2]], float [[TMP4]], i32 3 +; DEFAULT-NEXT: ret <4 x float> [[VECINS_3]] +; +entry: + %0 = load <4 x float>, ptr %a, align 16 + %vecext = extractelement <4 x float> %0, i32 0 + %1 = tail call fast float @llvm.tanh.f32(float %vecext) + %vecins = insertelement <4 x float> undef, float %1, i32 0 + %vecext.1 = extractelement <4 x float> %0, i32 1 + %2 = tail call fast float @llvm.tanh.f32(float %vecext.1) + %vecins.1 = insertelement <4 x float> %vecins, float %2, i32 1 + %vecext.2 = extractelement <4 x float> %0, i32 2 + %3 = tail call fast float @llvm.tanh.f32(float %vecext.2) + %vecins.2 = insertelement <4 x float> %vecins.1, float %3, i32 2 + %vecext.3 = extractelement <4 x float> %0, i32 3 + %4 = tail call fast float @llvm.tanh.f32(float %vecext.3) + %vecins.3 = insertelement <4 x float> %vecins.2, float %4, i32 3 + ret <4 x float> %vecins.3 +} + declare float @atanhf(float) readonly nounwind willreturn ; We can not vectorized atanh since RISCV has no such instruction. 
diff --git a/llvm/test/Transforms/SLPVectorizer/resized-alt-shuffle-after-minbw.ll b/llvm/test/Transforms/SLPVectorizer/resized-alt-shuffle-after-minbw.ll new file mode 100644 index 0000000000000..56281424c7114 --- /dev/null +++ b/llvm/test/Transforms/SLPVectorizer/resized-alt-shuffle-after-minbw.ll @@ -0,0 +1,208 @@ +; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5 +; RUN: opt -S --passes=slp-vectorizer -slp-vectorize-hor=false < %s | FileCheck %s + +define void @func(i32 %0) { +; CHECK-LABEL: define void @func( +; CHECK-SAME: i32 [[TMP0:%.*]]) { +; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i32> , i32 [[TMP0]], i32 1 +; CHECK-NEXT: [[TMP3:%.*]] = shl <4 x i32> [[TMP2]], zeroinitializer +; CHECK-NEXT: [[TMP4:%.*]] = or <4 x i32> [[TMP2]], zeroinitializer +; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> +; CHECK-NEXT: [[TMP6:%.*]] = shl i32 [[TMP0]], 0 +; CHECK-NEXT: [[TMP7:%.*]] = icmp eq i32 [[TMP6]], 0 +; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> poison, <32 x i32> +; CHECK-NEXT: [[TMP9:%.*]] = sext i32 [[TMP6]] to i64 +; CHECK-NEXT: [[TMP10:%.*]] = or i64 [[TMP9]], 0 +; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> poison, <32 x i32> +; CHECK-NEXT: [[TMP12:%.*]] = shufflevector <32 x i32> [[TMP11]], <32 x i32> , <32 x i32> +; CHECK-NEXT: [[TMP13:%.*]] = insertelement <32 x i32> [[TMP12]], i32 0, i32 0 +; CHECK-NEXT: [[TMP14:%.*]] = call <32 x i32> @llvm.vector.insert.v32i32.v8i32(<32 x i32> [[TMP13]], <8 x i32> zeroinitializer, i64 16) +; CHECK-NEXT: [[TMP15:%.*]] = call <32 x i32> @llvm.vector.insert.v32i32.v4i32(<32 x i32> [[TMP14]], <4 x i32> zeroinitializer, i64 24) +; CHECK-NEXT: [[TMP16:%.*]] = call <32 x i32> @llvm.vector.insert.v32i32.v2i32(<32 x i32> [[TMP15]], <2 x i32> zeroinitializer, i64 14) +; CHECK-NEXT: [[TMP17:%.*]] = call <32 x i32> @llvm.vector.insert.v32i32.v2i32(<32 x i32> [[TMP16]], <2 x i32> 
zeroinitializer, i64 28) +; CHECK-NEXT: [[TMP18:%.*]] = or <32 x i32> [[TMP8]], [[TMP17]] +; CHECK-NEXT: [[TMP19:%.*]] = sext <32 x i32> [[TMP18]] to <32 x i64> +; CHECK-NEXT: [[TMP20:%.*]] = icmp slt <32 x i64> [[TMP19]], zeroinitializer +; CHECK-NEXT: [[TMP21:%.*]] = extractelement <32 x i1> [[TMP20]], i32 31 +; CHECK-NEXT: [[TMP22:%.*]] = and i1 false, [[TMP21]] +; CHECK-NEXT: [[TMP23:%.*]] = extractelement <32 x i1> [[TMP20]], i32 30 +; CHECK-NEXT: [[TMP24:%.*]] = and i1 false, [[TMP23]] +; CHECK-NEXT: [[TMP25:%.*]] = extractelement <32 x i1> [[TMP20]], i32 29 +; CHECK-NEXT: [[TMP26:%.*]] = and i1 false, [[TMP25]] +; CHECK-NEXT: [[TMP27:%.*]] = extractelement <32 x i1> [[TMP20]], i32 28 +; CHECK-NEXT: [[TMP28:%.*]] = and i1 false, [[TMP27]] +; CHECK-NEXT: [[TMP29:%.*]] = extractelement <32 x i1> [[TMP20]], i32 27 +; CHECK-NEXT: [[TMP30:%.*]] = and i1 false, [[TMP29]] +; CHECK-NEXT: [[TMP31:%.*]] = extractelement <32 x i1> [[TMP20]], i32 26 +; CHECK-NEXT: [[TMP32:%.*]] = and i1 false, [[TMP31]] +; CHECK-NEXT: [[TMP33:%.*]] = extractelement <32 x i1> [[TMP20]], i32 25 +; CHECK-NEXT: [[TMP34:%.*]] = and i1 false, [[TMP33]] +; CHECK-NEXT: [[TMP35:%.*]] = extractelement <32 x i1> [[TMP20]], i32 24 +; CHECK-NEXT: [[TMP36:%.*]] = and i1 false, [[TMP35]] +; CHECK-NEXT: [[TMP37:%.*]] = extractelement <32 x i1> [[TMP20]], i32 23 +; CHECK-NEXT: [[TMP38:%.*]] = and i1 false, [[TMP37]] +; CHECK-NEXT: [[TMP39:%.*]] = extractelement <32 x i1> [[TMP20]], i32 22 +; CHECK-NEXT: [[TMP40:%.*]] = and i1 false, [[TMP39]] +; CHECK-NEXT: [[TMP41:%.*]] = extractelement <32 x i1> [[TMP20]], i32 21 +; CHECK-NEXT: [[TMP42:%.*]] = and i1 false, [[TMP41]] +; CHECK-NEXT: [[TMP43:%.*]] = extractelement <32 x i1> [[TMP20]], i32 20 +; CHECK-NEXT: [[TMP44:%.*]] = and i1 false, [[TMP43]] +; CHECK-NEXT: [[TMP45:%.*]] = extractelement <32 x i1> [[TMP20]], i32 19 +; CHECK-NEXT: [[TMP46:%.*]] = and i1 false, [[TMP45]] +; CHECK-NEXT: [[TMP47:%.*]] = extractelement <32 x i1> [[TMP20]], i32 18 +; 
CHECK-NEXT: [[TMP48:%.*]] = and i1 false, [[TMP47]] +; CHECK-NEXT: [[TMP49:%.*]] = extractelement <32 x i1> [[TMP20]], i32 17 +; CHECK-NEXT: [[TMP50:%.*]] = and i1 false, [[TMP49]] +; CHECK-NEXT: [[TMP51:%.*]] = extractelement <32 x i1> [[TMP20]], i32 16 +; CHECK-NEXT: [[TMP52:%.*]] = and i1 false, [[TMP51]] +; CHECK-NEXT: [[TMP53:%.*]] = extractelement <32 x i1> [[TMP20]], i32 15 +; CHECK-NEXT: [[TMP54:%.*]] = and i1 false, [[TMP53]] +; CHECK-NEXT: [[TMP55:%.*]] = extractelement <32 x i1> [[TMP20]], i32 14 +; CHECK-NEXT: [[TMP56:%.*]] = and i1 false, [[TMP55]] +; CHECK-NEXT: [[TMP57:%.*]] = extractelement <32 x i1> [[TMP20]], i32 13 +; CHECK-NEXT: [[TMP58:%.*]] = and i1 false, [[TMP57]] +; CHECK-NEXT: [[TMP59:%.*]] = extractelement <32 x i1> [[TMP20]], i32 12 +; CHECK-NEXT: [[TMP60:%.*]] = and i1 false, [[TMP59]] +; CHECK-NEXT: [[TMP61:%.*]] = extractelement <32 x i1> [[TMP20]], i32 11 +; CHECK-NEXT: [[TMP62:%.*]] = and i1 false, [[TMP61]] +; CHECK-NEXT: [[TMP63:%.*]] = extractelement <32 x i1> [[TMP20]], i32 10 +; CHECK-NEXT: [[TMP64:%.*]] = and i1 false, [[TMP63]] +; CHECK-NEXT: [[TMP65:%.*]] = extractelement <32 x i1> [[TMP20]], i32 9 +; CHECK-NEXT: [[TMP66:%.*]] = and i1 false, [[TMP65]] +; CHECK-NEXT: [[TMP67:%.*]] = extractelement <32 x i1> [[TMP20]], i32 8 +; CHECK-NEXT: [[TMP68:%.*]] = and i1 false, [[TMP67]] +; CHECK-NEXT: [[TMP69:%.*]] = extractelement <32 x i1> [[TMP20]], i32 7 +; CHECK-NEXT: [[TMP70:%.*]] = and i1 false, [[TMP69]] +; CHECK-NEXT: [[TMP71:%.*]] = extractelement <32 x i1> [[TMP20]], i32 6 +; CHECK-NEXT: [[TMP72:%.*]] = and i1 false, [[TMP71]] +; CHECK-NEXT: [[TMP73:%.*]] = extractelement <32 x i1> [[TMP20]], i32 5 +; CHECK-NEXT: [[TMP74:%.*]] = and i1 false, [[TMP73]] +; CHECK-NEXT: [[TMP75:%.*]] = extractelement <32 x i1> [[TMP20]], i32 4 +; CHECK-NEXT: [[TMP76:%.*]] = and i1 false, [[TMP75]] +; CHECK-NEXT: [[TMP77:%.*]] = extractelement <32 x i32> [[TMP18]], i32 0 +; CHECK-NEXT: [[TMP78:%.*]] = sext i32 [[TMP77]] to i64 +; CHECK-NEXT: 
[[TMP79:%.*]] = getelementptr float, ptr addrspace(1) null, i64 [[TMP78]] +; CHECK-NEXT: ret void +; + %2 = shl i32 %0, 0 + %3 = sext i32 %2 to i64 + %4 = shl i32 0, 0 + %5 = sext i32 %4 to i64 + %6 = or i32 0, 0 + %7 = or i32 0, 0 + %8 = zext i32 %6 to i64 + %9 = zext i32 %7 to i64 + %10 = zext i32 0 to i64 + %11 = zext i32 0 to i64 + %12 = zext i32 0 to i64 + %13 = zext i32 0 to i64 + %14 = zext i32 0 to i64 + %15 = zext i32 0 to i64 + %16 = zext i32 0 to i64 + %17 = zext i32 0 to i64 + %18 = zext i32 0 to i64 + %19 = zext i32 0 to i64 + %20 = zext i32 0 to i64 + %21 = zext i32 0 to i64 + %22 = zext i32 0 to i64 + %23 = zext i32 0 to i64 + %24 = zext i32 0 to i64 + %25 = zext i32 0 to i64 + %26 = zext i32 0 to i64 + %27 = or i64 %3, 0 + %28 = or i64 %3, %8 + %29 = or i64 %3, %9 + %30 = or i64 %3, %10 + %31 = or i64 %3, %11 + %32 = or i64 %3, %12 + %33 = or i64 %3, %13 + %34 = or i64 %3, %14 + %35 = or i64 %3, %15 + %36 = or i64 %3, %16 + %37 = or i64 %3, %17 + %38 = or i64 %3, %18 + %39 = or i64 %3, %19 + %40 = or i64 %3, %20 + %41 = or i64 %3, %21 + %42 = or i64 %3, %22 + %43 = or i64 %3, %23 + %44 = or i64 %3, %24 + %45 = or i64 %3, %25 + %46 = or i64 %3, 0 + %47 = or i64 %3, 0 + %48 = or i64 %3, 0 + %49 = or i64 %3, 0 + %50 = or i64 %3, 0 + %51 = or i64 %3, 0 + %52 = or i64 %3, 0 + %53 = or i64 %3, 0 + %54 = or i64 %3, 0 + %55 = or i64 %3, 0 + %56 = or i64 %3, 0 + %57 = or i64 %3, 0 + %58 = or i64 %3, 0 + %59 = icmp slt i64 %28, 0 + %60 = icmp slt i64 %29, 0 + %61 = icmp slt i64 %30, 0 + %62 = icmp slt i64 %31, 0 + %63 = icmp slt i64 %32, 0 + %64 = icmp slt i64 %33, 0 + %65 = icmp slt i64 %34, 0 + %66 = icmp slt i64 %35, 0 + %67 = icmp slt i64 %36, 0 + %68 = icmp slt i64 %37, 0 + %69 = icmp slt i64 %38, 0 + %70 = icmp slt i64 %39, 0 + %71 = icmp slt i64 %40, 0 + %72 = icmp slt i64 %41, 0 + %73 = icmp slt i64 %42, 0 + %74 = icmp slt i64 %43, 0 + %75 = icmp slt i64 %44, 0 + %76 = icmp slt i64 %45, 0 + %77 = icmp slt i64 %46, 0 + %78 = icmp slt i64 %47, 0 + %79 = 
icmp slt i64 %48, 0 + %80 = icmp slt i64 %49, 0 + %81 = icmp slt i64 %50, 0 + %82 = icmp slt i64 %51, 0 + %83 = icmp slt i64 %52, 0 + %84 = icmp slt i64 %53, 0 + %85 = icmp slt i64 %54, 0 + %86 = icmp slt i64 %55, 0 + %87 = icmp slt i64 %56, 0 + %88 = icmp slt i64 %57, 0 + %89 = icmp slt i64 %58, 0 + %90 = and i1 false, %59 + %91 = and i1 false, %60 + %92 = and i1 false, %61 + %93 = and i1 false, %62 + %94 = and i1 false, %63 + %95 = and i1 false, %64 + %96 = and i1 false, %65 + %97 = and i1 false, %66 + %98 = and i1 false, %67 + %99 = and i1 false, %68 + %100 = and i1 false, %69 + %101 = and i1 false, %70 + %102 = and i1 false, %71 + %103 = and i1 false, %72 + %104 = and i1 false, %73 + %105 = and i1 false, %74 + %106 = and i1 false, %75 + %107 = and i1 false, %76 + %108 = icmp eq i32 %2, 0 + %109 = and i1 false, %77 + %110 = and i1 false, %78 + %111 = and i1 false, %79 + %112 = and i1 false, %80 + %113 = and i1 false, %81 + %114 = and i1 false, %82 + %115 = and i1 false, %83 + %116 = and i1 false, %84 + %117 = and i1 false, %85 + %118 = and i1 false, %86 + %119 = or i64 %5, %26 + %120 = getelementptr float, ptr addrspace(1) null, i64 %119 + %121 = icmp slt i64 %119, 0 + ret void +} diff --git a/llvm/test/Verifier/rtsan-attrs.ll b/llvm/test/Verifier/rtsan-attrs.ll index 42ab85163642b..fcc44d8d63c1d 100644 --- a/llvm/test/Verifier/rtsan-attrs.ll +++ b/llvm/test/Verifier/rtsan-attrs.ll @@ -1,9 +1,9 @@ ; RUN: not llvm-as -disable-output %s 2>&1 | FileCheck %s -; CHECK: Attributes 'sanitize_realtime and nosanitize_realtime' are incompatible! -; CHECK-NEXT: ptr @sanitize_nosanitize -define void @sanitize_nosanitize() #0 { +; CHECK: Attributes 'sanitize_realtime and sanitize_realtime_unsafe' are incompatible! 
+; CHECK-NEXT: ptr @sanitize_unsafe +define void @sanitize_unsafe() #0 { ret void } -attributes #0 = { sanitize_realtime nosanitize_realtime } +attributes #0 = { sanitize_realtime sanitize_realtime_unsafe } diff --git a/llvm/test/tools/llvm-split/AMDGPU/address-taken-externalize-with-call.ll b/llvm/test/tools/llvm-split/AMDGPU/address-taken-externalize-with-call.ll index 708b5a006be60..d269f92763853 100644 --- a/llvm/test/tools/llvm-split/AMDGPU/address-taken-externalize-with-call.ll +++ b/llvm/test/tools/llvm-split/AMDGPU/address-taken-externalize-with-call.ll @@ -1,24 +1,30 @@ -; RUN: llvm-split -o %t %s -j 3 -mtriple amdgcn-amd-amdhsa -amdgpu-module-splitting-large-threshold=0 -; RUN: llvm-dis -o - %t0 | FileCheck --check-prefix=CHECK0 --implicit-check-not=define %s -; RUN: llvm-dis -o - %t1 | FileCheck --check-prefix=CHECK1 --implicit-check-not=define %s -; RUN: llvm-dis -o - %t2 | FileCheck --check-prefix=CHECK2 --implicit-check-not=define %s +; RUN: llvm-split -o %t %s -j 3 -mtriple amdgcn-amd-amdhsa -amdgpu-module-splitting-large-function-threshold=0 +; RUN: llvm-dis -o - %t0 | FileCheck --check-prefix=CHECK0 %s +; RUN: llvm-dis -o - %t1 | FileCheck --check-prefix=CHECK1 %s +; RUN: llvm-dis -o - %t2 | FileCheck --check-prefix=CHECK2 %s ; 3 kernels: ; - A does a direct call to HelperA ; - B is storing @HelperA ; - C does a direct call to HelperA ; -; The helper functions will get externalized, so C/A will end up -; in the same partition. - -; P0 is empty. -; CHECK0: declare - -; CHECK1: define amdgpu_kernel void @B(ptr %dst) - -; CHECK2: define hidden void @HelperA() -; CHECK2: define amdgpu_kernel void @A() -; CHECK2: define amdgpu_kernel void @C() +; The helper functions will get externalized, which will force A and C into P0 as +; external functions cannot be duplicated. 
+ +; CHECK0: define hidden void @HelperA() +; CHECK0: define amdgpu_kernel void @A() +; CHECK0: declare amdgpu_kernel void @B(ptr) +; CHECK0: define amdgpu_kernel void @C() + +; CHECK1: declare hidden void @HelperA() +; CHECK1: declare amdgpu_kernel void @A() +; CHECK1: declare amdgpu_kernel void @B(ptr) +; CHECK1: declare amdgpu_kernel void @C() + +; CHECK2: declare hidden void @HelperA() +; CHECK2: declare amdgpu_kernel void @A() +; CHECK2: define amdgpu_kernel void @B(ptr %dst) +; CHECK2: declare amdgpu_kernel void @C() define internal void @HelperA() { ret void diff --git a/llvm/test/tools/llvm-split/AMDGPU/address-taken-externalize.ll b/llvm/test/tools/llvm-split/AMDGPU/address-taken-externalize.ll index 81f6c8f0fbb3a..731cf4b374c95 100644 --- a/llvm/test/tools/llvm-split/AMDGPU/address-taken-externalize.ll +++ b/llvm/test/tools/llvm-split/AMDGPU/address-taken-externalize.ll @@ -1,4 +1,4 @@ -; RUN: llvm-split -o %t %s -j 2 -mtriple amdgcn-amd-amdhsa -amdgpu-module-splitting-large-threshold=0 +; RUN: llvm-split -o %t %s -j 2 -mtriple amdgcn-amd-amdhsa -amdgpu-module-splitting-large-function-threshold=0 ; RUN: llvm-dis -o - %t0 | FileCheck --check-prefix=CHECK0 %s ; RUN: llvm-dis -o - %t1 | FileCheck --check-prefix=CHECK1 %s diff --git a/llvm/test/tools/llvm-split/AMDGPU/debug-name-hiding.ll b/llvm/test/tools/llvm-split/AMDGPU/debug-name-hiding.ll new file mode 100644 index 0000000000000..6a07ed51ba1be --- /dev/null +++ b/llvm/test/tools/llvm-split/AMDGPU/debug-name-hiding.ll @@ -0,0 +1,20 @@ +; RUN: llvm-split -o %t %s -j 3 -mtriple amdgcn-amd-amdhsa -debug -amdgpu-module-splitting-log-private 2>&1 | FileCheck %s --implicit-check-not=MyCustomKernel +; REQUIRES: asserts + +; SHA256 of the kernel names. 
+ +; CHECK: a097723d21cf9f35d90e6fb7881995ac8c398b3366a6c97efc657404f9fe301c +; CHECK: 626bc23242de8fcfda7f0e66318d29455c081df6b5380e64d14703c95fcbcd59 +; CHECK: c38d90a7ca71dc5d694bb9e093dadcdedfc4cb4adf7ed7e46d42fe95a0b4ef55 + +define amdgpu_kernel void @MyCustomKernel0() { + ret void +} + +define amdgpu_kernel void @MyCustomKernel1() { + ret void +} + +define amdgpu_kernel void @MyCustomKernel2() { + ret void +} diff --git a/llvm/test/tools/llvm-split/AMDGPU/debug-non-kernel-root.ll b/llvm/test/tools/llvm-split/AMDGPU/debug-non-kernel-root.ll new file mode 100644 index 0000000000000..836b5c05d0653 --- /dev/null +++ b/llvm/test/tools/llvm-split/AMDGPU/debug-non-kernel-root.ll @@ -0,0 +1,36 @@ +; RUN: llvm-split -o %t %s -j 2 -mtriple amdgcn-amd-amdhsa -debug 2>&1 | FileCheck %s --implicit-check-not="[root]" +; REQUIRES: asserts + +; func_3 is never directly called, it needs to be considered +; as a root to handle this module correctly. + +; CHECK: [root] kernel_1 +; CHECK-NEXT: [dependency] func_1 +; CHECK-NEXT: [dependency] func_2 +; CHECK-NEXT: [root] func_3 +; CHECK-NEXT: [dependency] func_2 + +define amdgpu_kernel void @kernel_1() { +entry: + call void @func_1() + ret void +} + +define linkonce_odr hidden void @func_1() { +entry: + %call = call i32 @func_2() + ret void +} + +define linkonce_odr hidden i32 @func_2() #0 { +entry: + ret i32 0 +} + +define void @func_3() { +entry: + %call = call i32 @func_2() + ret void +} + +attributes #0 = { noinline optnone } diff --git a/llvm/test/tools/llvm-split/AMDGPU/declarations.ll b/llvm/test/tools/llvm-split/AMDGPU/declarations.ll index 755676061b255..10b6cdfef4055 100644 --- a/llvm/test/tools/llvm-split/AMDGPU/declarations.ll +++ b/llvm/test/tools/llvm-split/AMDGPU/declarations.ll @@ -1,13 +1,16 @@ ; RUN: rm -rf %t0 %t1 ; RUN: llvm-split -o %t %s -j 2 -mtriple amdgcn-amd-amdhsa ; RUN: llvm-dis -o - %t0 | FileCheck --check-prefix=CHECK0 %s -; RUN: not llvm-dis -o - %t1 +; RUN: llvm-dis -o - %t1 | FileCheck 
--check-prefix=CHECK1 %s -; Empty module without any defs should result in a single output module that is -; an exact copy of the input. +; Check that all declarations are put into each partition. ; CHECK0: declare void @A ; CHECK0: declare void @B +; CHECK1: declare void @A +; CHECK1: declare void @B + declare void @A() + declare void @B() diff --git a/llvm/test/tools/llvm-split/AMDGPU/kernels-alias-dependencies.ll b/llvm/test/tools/llvm-split/AMDGPU/kernels-alias-dependencies.ll index d7e84abd5f968..c2746d1398924 100644 --- a/llvm/test/tools/llvm-split/AMDGPU/kernels-alias-dependencies.ll +++ b/llvm/test/tools/llvm-split/AMDGPU/kernels-alias-dependencies.ll @@ -1,6 +1,6 @@ ; RUN: llvm-split -o %t %s -j 2 -mtriple amdgcn-amd-amdhsa -; RUN: llvm-dis -o - %t0 | FileCheck --check-prefix=CHECK0 --implicit-check-not=define %s -; RUN: llvm-dis -o - %t1 | FileCheck --check-prefix=CHECK1 --implicit-check-not=define %s +; RUN: llvm-dis -o - %t0 | FileCheck --check-prefix=CHECK0 %s +; RUN: llvm-dis -o - %t1 | FileCheck --check-prefix=CHECK1 %s ; 3 kernels: ; - A calls nothing @@ -13,12 +13,16 @@ ; Additionally, @PerryThePlatypus gets externalized as ; the alias counts as taking its address. 
-; CHECK0: define amdgpu_kernel void @A +; CHECK0-NOT: define +; CHECK0: @Perry = internal alias ptr (), ptr @PerryThePlatypus +; CHECK0: define hidden void @PerryThePlatypus() +; CHECK0: define amdgpu_kernel void @B +; CHECK0: define amdgpu_kernel void @C +; CHECK0-NOT: define -; CHECK1: @Perry = internal alias ptr (), ptr @PerryThePlatypus -; CHECK1: define hidden void @PerryThePlatypus() -; CHECK1: define amdgpu_kernel void @B -; CHECK1: define amdgpu_kernel void @C +; CHECK1-NOT: define +; CHECK1: define amdgpu_kernel void @A +; CHECK1-NOT: define @Perry = internal alias ptr(), ptr @PerryThePlatypus diff --git a/llvm/test/tools/llvm-split/AMDGPU/kernels-cost-ranking.ll b/llvm/test/tools/llvm-split/AMDGPU/kernels-cost-ranking.ll index c7e13304dc6de..4635264aefb39 100644 --- a/llvm/test/tools/llvm-split/AMDGPU/kernels-cost-ranking.ll +++ b/llvm/test/tools/llvm-split/AMDGPU/kernels-cost-ranking.ll @@ -1,21 +1,27 @@ ; RUN: llvm-split -o %t %s -j 3 -mtriple amdgcn-amd-amdhsa -; RUN: llvm-dis -o - %t0 | FileCheck --check-prefix=CHECK0 --implicit-check-not=define %s -; RUN: llvm-dis -o - %t1 | FileCheck --check-prefix=CHECK1 --implicit-check-not=define %s -; RUN: llvm-dis -o - %t2 | FileCheck --check-prefix=CHECK2 --implicit-check-not=define %s +; RUN: llvm-dis -o - %t0 | FileCheck --check-prefix=CHECK0 %s +; RUN: llvm-dis -o - %t1 | FileCheck --check-prefix=CHECK1 %s +; RUN: llvm-dis -o - %t2 | FileCheck --check-prefix=CHECK2 %s ; 3 kernels with each their own dependencies should go into 3 ; distinct partitions. The most expensive kernel should be ; seen first and go into the last partition. 
+; CHECK0-NOT: define ; CHECK0: define amdgpu_kernel void @C ; CHECK0: define internal void @HelperC ; CHECK0-NOT: define +; CHECK1-NOT: define ; CHECK1: define amdgpu_kernel void @A ; CHECK1: define internal void @HelperA +; CHECK1-NOT: define +; CHECK2-NOT: define ; CHECK2: define amdgpu_kernel void @B ; CHECK2: define internal void @HelperB +; CHECK2-NOT: define + define amdgpu_kernel void @A() { call void @HelperA() diff --git a/llvm/test/tools/llvm-split/AMDGPU/kernels-dependency-external.ll b/llvm/test/tools/llvm-split/AMDGPU/kernels-dependency-external.ll index 332344a776e82..435e97a581340 100644 --- a/llvm/test/tools/llvm-split/AMDGPU/kernels-dependency-external.ll +++ b/llvm/test/tools/llvm-split/AMDGPU/kernels-dependency-external.ll @@ -1,20 +1,29 @@ ; RUN: llvm-split -o %t %s -j 4 -mtriple amdgcn-amd-amdhsa -; RUN: llvm-dis -o - %t0 | FileCheck --check-prefix=CHECK0 --implicit-check-not=define %s -; RUN: llvm-dis -o - %t1 | FileCheck --check-prefix=CHECK1 --implicit-check-not=define %s -; RUN: llvm-dis -o - %t2 | FileCheck --check-prefix=CHECK2 --implicit-check-not=define %s -; RUN: llvm-dis -o - %t3 | FileCheck --check-prefix=CHECK3 --implicit-check-not=define %s +; RUN: llvm-dis -o - %t0 | FileCheck --check-prefix=CHECK0 %s +; RUN: llvm-dis -o - %t1 | FileCheck --check-prefix=CHECK1 %s +; RUN: llvm-dis -o - %t2 | FileCheck --check-prefix=CHECK2 %s +; RUN: llvm-dis -o - %t3 | FileCheck --check-prefix=CHECK3 %s -; CHECK0: define internal void @PrivateHelper1() -; CHECK0: define amdgpu_kernel void @D +; Both overridable helpers should go in P0.
-; CHECK1: define internal void @PrivateHelper0() -; CHECK1: define amdgpu_kernel void @C +; CHECK0-NOT: define +; CHECK0: define available_externally void @OverridableHelper0() +; CHECK0: define internal void @OverridableHelper1() +; CHECK0: define amdgpu_kernel void @A +; CHECK0: define amdgpu_kernel void @B +; CHECK0-NOT: define -; CHECK2: define internal void @OverridableHelper1() -; CHECK2: define amdgpu_kernel void @B +; CHECK1-NOT: define -; CHECK3: define available_externally void @OverridableHelper0() -; CHECK3: define amdgpu_kernel void @A +; CHECK2-NOT: define +; CHECK2: define internal void @PrivateHelper1() +; CHECK2: define amdgpu_kernel void @D +; CHECK2-NOT: define + +; CHECK3-NOT: define +; CHECK3: define internal void @PrivateHelper0() +; CHECK3: define amdgpu_kernel void @C +; CHECK3-NOT: define define available_externally void @OverridableHelper0() { ret void diff --git a/llvm/test/tools/llvm-split/AMDGPU/kernels-dependency-indirect.ll b/llvm/test/tools/llvm-split/AMDGPU/kernels-dependency-indirect.ll index 5be945bda48bf..2d870039112cb 100644 --- a/llvm/test/tools/llvm-split/AMDGPU/kernels-dependency-indirect.ll +++ b/llvm/test/tools/llvm-split/AMDGPU/kernels-dependency-indirect.ll @@ -1,7 +1,7 @@ ; RUN: llvm-split -o %t %s -j 3 -mtriple amdgcn-amd-amdhsa -; RUN: llvm-dis -o - %t0 | FileCheck --check-prefix=CHECK0 --implicit-check-not=define %s -; RUN: llvm-dis -o - %t1 | FileCheck --check-prefix=CHECK1 --implicit-check-not=define %s -; RUN: llvm-dis -o - %t2 | FileCheck --check-prefix=CHECK2 --implicit-check-not=define %s +; RUN: llvm-dis -o - %t0 | FileCheck --check-prefix=CHECK0 %s +; RUN: llvm-dis -o - %t1 | FileCheck --check-prefix=CHECK1 %s +; RUN: llvm-dis -o - %t2 | FileCheck --check-prefix=CHECK2 %s ; We have 4 kernels: ; - Each kernel has an internal helper @@ -15,19 +15,25 @@ ; indirect call. HelperC/D should also end up in P0 as they ; are dependencies of HelperB. 
+; CHECK0-NOT: define +; CHECK0: define hidden void @HelperA +; CHECK0: define hidden void @HelperB +; CHECK0: define hidden void @CallCandidate +; CHECK0: define internal void @HelperC ; CHECK0: define internal void @HelperD -; CHECK0: define amdgpu_kernel void @D +; CHECK0: define amdgpu_kernel void @A +; CHECK0: define amdgpu_kernel void @B +; CHECK0-NOT: define -; CHECK1: define internal void @HelperC -; CHECK1: define amdgpu_kernel void @C +; CHECK1-NOT: define +; CHECK1: define internal void @HelperD +; CHECK1: define amdgpu_kernel void @D +; CHECK1-NOT: define -; CHECK2: define hidden void @HelperA -; CHECK2: define hidden void @HelperB -; CHECK2: define hidden void @CallCandidate +; CHECK2-NOT: define ; CHECK2: define internal void @HelperC -; CHECK2: define internal void @HelperD -; CHECK2: define amdgpu_kernel void @A -; CHECK2: define amdgpu_kernel void @B +; CHECK2: define amdgpu_kernel void @C +; CHECK2-NOT: define @addrthief = global [3 x ptr] [ptr @HelperA, ptr @HelperB, ptr @CallCandidate] diff --git a/llvm/test/tools/llvm-split/AMDGPU/kernels-dependency-overridable.ll b/llvm/test/tools/llvm-split/AMDGPU/kernels-dependency-overridable.ll index 9205a5d1930e5..dc2c5c3c07bee 100644 --- a/llvm/test/tools/llvm-split/AMDGPU/kernels-dependency-overridable.ll +++ b/llvm/test/tools/llvm-split/AMDGPU/kernels-dependency-overridable.ll @@ -1,15 +1,21 @@ ; RUN: llvm-split -o %t %s -j 3 -mtriple amdgcn-amd-amdhsa -; RUN: llvm-dis -o - %t0 | FileCheck --check-prefix=CHECK0 --implicit-check-not=define %s -; RUN: llvm-dis -o - %t1 | FileCheck --check-prefix=CHECK1 --implicit-check-not=define %s -; RUN: llvm-dis -o - %t2 | FileCheck --check-prefix=CHECK2 --implicit-check-not=define %s - -; CHECK0: define amdgpu_kernel void @D - -; CHECK1: define amdgpu_kernel void @C - -; CHECK2: define void @ExternalHelper -; CHECK2: define amdgpu_kernel void @A -; CHECK2: define amdgpu_kernel void @B +; RUN: llvm-dis -o - %t0 | FileCheck --check-prefix=CHECK0 %s +; RUN: llvm-dis -o 
- %t1 | FileCheck --check-prefix=CHECK1 %s +; RUN: llvm-dis -o - %t2 | FileCheck --check-prefix=CHECK2 %s + +; CHECK0-NOT: define +; CHECK0: define void @ExternalHelper +; CHECK0: define amdgpu_kernel void @A +; CHECK0: define amdgpu_kernel void @B +; CHECK0-NOT: define + +; CHECK1-NOT: define +; CHECK1: define amdgpu_kernel void @D +; CHECK1-NOT: define + +; CHECK2-NOT: define +; CHECK2: define amdgpu_kernel void @C +; CHECK2-NOT: define define void @ExternalHelper() { ret void diff --git a/llvm/test/tools/llvm-split/AMDGPU/kernels-global-variables-noexternal.ll b/llvm/test/tools/llvm-split/AMDGPU/kernels-global-variables-noexternal.ll index a184d92aea9b9..0fc76934afc54 100644 --- a/llvm/test/tools/llvm-split/AMDGPU/kernels-global-variables-noexternal.ll +++ b/llvm/test/tools/llvm-split/AMDGPU/kernels-global-variables-noexternal.ll @@ -1,20 +1,26 @@ ; RUN: llvm-split -o %t %s -j 3 -mtriple amdgcn-amd-amdhsa -amdgpu-module-splitting-no-externalize-globals -; RUN: llvm-dis -o - %t0 | FileCheck --check-prefix=CHECK0 --implicit-check-not=define %s -; RUN: llvm-dis -o - %t1 | FileCheck --check-prefix=CHECK1 --implicit-check-not=define %s -; RUN: llvm-dis -o - %t2 | FileCheck --check-prefix=CHECK2 --implicit-check-not=define %s +; RUN: llvm-dis -o - %t0 | FileCheck --check-prefix=CHECK0 %s +; RUN: llvm-dis -o - %t1 | FileCheck --check-prefix=CHECK1 %s +; RUN: llvm-dis -o - %t2 | FileCheck --check-prefix=CHECK2 %s ; 3 kernels use private/internal global variables. ; The GVs should be copied in each partition as needed. 
+; CHECK0-NOT: define ; CHECK0: @bar = internal constant ptr ; CHECK0: define amdgpu_kernel void @C +; CHECK0-NOT: define +; CHECK1-NOT: define ; CHECK1: @foo = private constant ptr ; CHECK1: define amdgpu_kernel void @A +; CHECK1-NOT: define +; CHECK2-NOT: define ; CHECK2: @foo = private constant ptr ; CHECK2: @bar = internal constant ptr ; CHECK2: define amdgpu_kernel void @B +; CHECK2-NOT: define @foo = private constant ptr poison @bar = internal constant ptr poison diff --git a/llvm/test/tools/llvm-split/AMDGPU/kernels-global-variables.ll b/llvm/test/tools/llvm-split/AMDGPU/kernels-global-variables.ll index be84a0b5916f0..7564662e7c7c0 100644 --- a/llvm/test/tools/llvm-split/AMDGPU/kernels-global-variables.ll +++ b/llvm/test/tools/llvm-split/AMDGPU/kernels-global-variables.ll @@ -1,22 +1,28 @@ ; RUN: llvm-split -o %t %s -j 3 -mtriple amdgcn-amd-amdhsa -; RUN: llvm-dis -o - %t0 | FileCheck --check-prefix=CHECK0 --implicit-check-not=define %s -; RUN: llvm-dis -o - %t1 | FileCheck --check-prefix=CHECK1 --implicit-check-not=define %s -; RUN: llvm-dis -o - %t2 | FileCheck --check-prefix=CHECK2 --implicit-check-not=define %s +; RUN: llvm-dis -o - %t0 | FileCheck --check-prefix=CHECK0 %s +; RUN: llvm-dis -o - %t1 | FileCheck --check-prefix=CHECK1 %s +; RUN: llvm-dis -o - %t2 | FileCheck --check-prefix=CHECK2 %s ; 3 kernels use private/internal global variables. ; The GVs should be copied in each partition as needed. 
+; CHECK0-NOT: define ; CHECK0: @foo = hidden constant ptr poison ; CHECK0: @bar = hidden constant ptr poison ; CHECK0: define amdgpu_kernel void @C +; CHECK0-NOT: define +; CHECK1-NOT: define ; CHECK1: @foo = external hidden constant ptr{{$}} ; CHECK1: @bar = external hidden constant ptr{{$}} ; CHECK1: define amdgpu_kernel void @A +; CHECK1-NOT: define +; CHECK2-NOT: define ; CHECK2: @foo = external hidden constant ptr{{$}} ; CHECK2: @bar = external hidden constant ptr{{$}} ; CHECK2: define amdgpu_kernel void @B +; CHECK2-NOT: define @foo = private constant ptr poison @bar = internal constant ptr poison diff --git a/llvm/test/tools/llvm-split/AMDGPU/large-kernels-merging.ll b/llvm/test/tools/llvm-split/AMDGPU/large-kernels-merging.ll index 807fb2e5f33ce..459c5a7f1a2db 100644 --- a/llvm/test/tools/llvm-split/AMDGPU/large-kernels-merging.ll +++ b/llvm/test/tools/llvm-split/AMDGPU/large-kernels-merging.ll @@ -1,12 +1,12 @@ -; RUN: llvm-split -o %t %s -j 3 -mtriple amdgcn-amd-amdhsa -amdgpu-module-splitting-max-depth=0 -amdgpu-module-splitting-large-threshold=1.2 -amdgpu-module-splitting-merge-threshold=0.5 -; RUN: llvm-dis -o - %t0 | FileCheck --check-prefix=CHECK0 --implicit-check-not=define %s -; RUN: llvm-dis -o - %t1 | FileCheck --check-prefix=CHECK1 --implicit-check-not=define %s -; RUN: llvm-dis -o - %t2 | FileCheck --check-prefix=CHECK2 --implicit-check-not=define %s +; RUN: llvm-split -o %t %s -j 3 -mtriple amdgcn-amd-amdhsa -amdgpu-module-splitting-large-function-threshold=1.2 -amdgpu-module-splitting-large-function-merge-overlap=0.5 +; RUN: llvm-dis -o - %t0 | FileCheck --check-prefix=CHECK0 %s +; RUN: llvm-dis -o - %t1 | FileCheck --check-prefix=CHECK1 %s +; RUN: llvm-dis -o - %t2 | FileCheck --check-prefix=CHECK2 %s -; RUN: llvm-split -o %t.nolarge %s -j 3 -mtriple amdgcn-amd-amdhsa -amdgpu-module-splitting-large-threshold=0 -amdgpu-module-splitting-max-depth=0 -; RUN: llvm-dis -o - %t.nolarge0 | FileCheck --check-prefix=NOLARGEKERNELS-CHECK0 
--implicit-check-not=define %s -; RUN: llvm-dis -o - %t.nolarge1 | FileCheck --check-prefix=NOLARGEKERNELS-CHECK1 --implicit-check-not=define %s -; RUN: llvm-dis -o - %t.nolarge2 | FileCheck --check-prefix=NOLARGEKERNELS-CHECK2 --implicit-check-not=define %s +; RUN: llvm-split -o %t.nolarge %s -j 3 -mtriple amdgcn-amd-amdhsa -amdgpu-module-splitting-large-function-threshold=0 +; RUN: llvm-dis -o - %t.nolarge0 | FileCheck --check-prefix=NOLARGEKERNELS-CHECK0 %s +; RUN: llvm-dis -o - %t.nolarge1 | FileCheck --check-prefix=NOLARGEKERNELS-CHECK1 %s +; RUN: llvm-dis -o - %t.nolarge2 | FileCheck --check-prefix=NOLARGEKERNELS-CHECK2 %s ; 2 kernels (A/B) are large and share all their dependencies. ; They should go in the same partition, the remaining kernel should @@ -15,12 +15,14 @@ ; Also check w/o large kernels processing to verify they are indeed handled ; differently. -; P0 is empty -; CHECK0: declare +; CHECK0-NOT: define +; CHECK1-NOT: define ; CHECK1: define internal void @HelperC() ; CHECK1: define amdgpu_kernel void @C +; CHECK1-NOT: define +; CHECK2-NOT: define ; CHECK2: define internal void @large2() ; CHECK2: define internal void @large1() ; CHECK2: define internal void @large0() @@ -28,9 +30,12 @@ ; CHECK2: define internal void @HelperB() ; CHECK2: define amdgpu_kernel void @A ; CHECK2: define amdgpu_kernel void @B +; CHECK2-NOT: define +; NOLARGEKERNELS-CHECK0-NOT: define ; NOLARGEKERNELS-CHECK0: define internal void @HelperC() ; NOLARGEKERNELS-CHECK0: define amdgpu_kernel void @C +; NOLARGEKERNELS-CHECK0-NOT: define ; NOLARGEKERNELS-CHECK1: define internal void @large2() ; NOLARGEKERNELS-CHECK1: define internal void @large1() @@ -44,7 +49,6 @@ ; NOLARGEKERNELS-CHECK2: define internal void @HelperA() ; NOLARGEKERNELS-CHECK2: define amdgpu_kernel void @A - define internal void @large2() { store volatile i32 42, ptr null call void @large2() diff --git a/llvm/test/tools/llvm-split/AMDGPU/non-kernels-dependency-indirect.ll 
b/llvm/test/tools/llvm-split/AMDGPU/non-kernels-dependency-indirect.ll index 1314a78b42f3b..167930ce0e806 100644 --- a/llvm/test/tools/llvm-split/AMDGPU/non-kernels-dependency-indirect.ll +++ b/llvm/test/tools/llvm-split/AMDGPU/non-kernels-dependency-indirect.ll @@ -1,7 +1,7 @@ ; RUN: llvm-split -o %t %s -j 3 -mtriple amdgcn-amd-amdhsa -; RUN: llvm-dis -o - %t0 | FileCheck --check-prefix=CHECK0 --implicit-check-not=define %s -; RUN: llvm-dis -o - %t1 | FileCheck --check-prefix=CHECK1 --implicit-check-not=define %s -; RUN: llvm-dis -o - %t2 | FileCheck --check-prefix=CHECK2 --implicit-check-not=define %s +; RUN: llvm-dis -o - %t0 | FileCheck --check-prefix=CHECK0 --implicit-check-not=DEFINE %s +; RUN: llvm-dis -o - %t1 | FileCheck --check-prefix=CHECK1 --implicit-check-not=DEFINE %s +; RUN: llvm-dis -o - %t2 | FileCheck --check-prefix=CHECK2 --implicit-check-not=DEFINE %s ; We have 4 function: ; - Each function has an internal helper @@ -11,19 +11,19 @@ ; @CallCandidate doesn't have to be in A/B's partition, unlike ; in the corresponding tests for kernels where it has to. 
+; CHECK0: define hidden void @HelperA +; CHECK0: define hidden void @HelperB ; CHECK0: define internal void @HelperC ; CHECK0: define internal void @HelperD -; CHECK0: define internal void @C -; CHECK0: define internal void @D +; CHECK0: define void @A +; CHECK0: define void @B -; CHECK1: define hidden void @HelperA -; CHECK1: define hidden void @CallCandidate() -; CHECK1: define internal void @A +; CHECK1: define internal void @HelperD +; CHECK1: define void @D -; CHECK2: define hidden void @HelperB +; CHECK2: define hidden void @CallCandidate ; CHECK2: define internal void @HelperC -; CHECK2: define internal void @HelperD -; CHECK2: define internal void @B +; CHECK2: define void @C @addrthief = global [3 x ptr] [ptr @HelperA, ptr @HelperB, ptr @CallCandidate] @@ -51,22 +51,22 @@ define internal void @HelperD() { ret void } -define internal void @A(ptr %call) { +define void @A(ptr %call) { call void @HelperA(ptr %call) ret void } -define internal void @B(ptr %call) { +define void @B(ptr %call) { call void @HelperB(ptr %call) ret void } -define internal void @C() { +define void @C() { call void @HelperC() ret void } -define internal void @D() { +define void @D() { call void @HelperD() ret void } diff --git a/llvm/test/tools/llvm-split/AMDGPU/recursive-search-2.ll b/llvm/test/tools/llvm-split/AMDGPU/recursive-search-2.ll deleted file mode 100644 index 01f2f3627f990..0000000000000 --- a/llvm/test/tools/llvm-split/AMDGPU/recursive-search-2.ll +++ /dev/null @@ -1,128 +0,0 @@ -; RUN: llvm-split -o %t_s3_ %s -j 3 -mtriple amdgcn-amd-amdhsa -amdgpu-module-splitting-max-depth=2 -; RUN: llvm-dis -o - %t_s3_0 | FileCheck --check-prefix=SPLIT3-CHECK0 --implicit-check-not=define %s -; RUN: llvm-dis -o - %t_s3_1 | FileCheck --check-prefix=SPLIT3-CHECK1 --implicit-check-not=define %s -; RUN: llvm-dis -o - %t_s3_2 | FileCheck --check-prefix=SPLIT3-CHECK2 --implicit-check-not=define %s - -; RUN: llvm-split -o %t_s5_ %s -j 5 -mtriple amdgcn-amd-amdhsa 
-amdgpu-module-splitting-max-depth=2 -; RUN: llvm-dis -o - %t_s5_0 | FileCheck --check-prefix=SPLIT5-CHECK0 --implicit-check-not=define %s -; RUN: llvm-dis -o - %t_s5_1 | FileCheck --check-prefix=SPLIT5-CHECK1 --implicit-check-not=define %s -; RUN: llvm-dis -o - %t_s5_2 | FileCheck --check-prefix=SPLIT5-CHECK2 --implicit-check-not=define %s -; RUN: llvm-dis -o - %t_s5_3 | FileCheck --check-prefix=SPLIT5-CHECK3 --implicit-check-not=define %s -; RUN: llvm-dis -o - %t_s5_4 | FileCheck --check-prefix=SPLIT5-CHECK4 --implicit-check-not=define %s - -; Test the specifics of the search algorithm. -; This test will change depending on new heuristics we add or remove. - -; -------------------------------------------- - -; SPLIT3-CHECK0: define internal void @HelperA() -; SPLIT3-CHECK0: define internal void @HelperB() -; SPLIT3-CHECK0: define internal void @HelperC() -; SPLIT3-CHECK0: define amdgpu_kernel void @AB() -; SPLIT3-CHECK0: define amdgpu_kernel void @BC() - -; SPLIT3-CHECK1: define amdgpu_kernel void @A() -; SPLIT3-CHECK1: define internal void @HelperA() -; SPLIT3-CHECK1: define amdgpu_kernel void @C() -; SPLIT3-CHECK1: define internal void @HelperC() - -; SPLIT3-CHECK2: define internal void @HelperA() -; SPLIT3-CHECK2: define amdgpu_kernel void @B() -; SPLIT3-CHECK2: define internal void @HelperB() -; SPLIT3-CHECK2: define internal void @HelperC() -; SPLIT3-CHECK2: define amdgpu_kernel void @ABC() - -; -------------------------------------------- - -; SPLIT5-CHECK0: define amdgpu_kernel void @A() -; SPLIT5-CHECK0: define internal void @HelperA() -; SPLIT5-CHECK0: define amdgpu_kernel void @B() -; SPLIT5-CHECK0: define internal void @HelperB() - -; SPLIT5-CHECK1: define internal void @HelperB() -; SPLIT5-CHECK1: define internal void @HelperC() -; SPLIT5-CHECK1: define amdgpu_kernel void @BC - -; SPLIT5-CHECK2: define internal void @HelperA() -; SPLIT5-CHECK2: define internal void @HelperB() -; SPLIT5-CHECK2: define amdgpu_kernel void @AB() - -; SPLIT5-CHECK3: define 
amdgpu_kernel void @C() -; SPLIT5-CHECK3: define internal void @HelperC() - -; SPLIT5-CHECK4: define internal void @HelperA() -; SPLIT5-CHECK4: define internal void @HelperB() -; SPLIT5-CHECK4: define internal void @HelperC() -; SPLIT5-CHECK4: define amdgpu_kernel void @ABC() - -define amdgpu_kernel void @A() { - store volatile i64 42, ptr null - store volatile i64 42, ptr null - store volatile i64 42, ptr null - store volatile i64 42, ptr null - store volatile i64 42, ptr null - call void @HelperA() - ret void -} - -define internal void @HelperA() { - store volatile i32 42, ptr null - store volatile i32 42, ptr null - ret void -} - -define amdgpu_kernel void @B() { - store volatile i64 42, ptr null - store volatile i64 42, ptr null - store volatile i64 42, ptr null - store volatile i64 42, ptr null - call void @HelperB() - ret void -} - -define internal void @HelperB() { - store volatile i32 42, ptr null - store volatile i32 42, ptr null - store volatile i32 42, ptr null - ret void -} - -define amdgpu_kernel void @C() { - store volatile i64 42, ptr null - store volatile i64 42, ptr null - store volatile i64 42, ptr null - store volatile i64 42, ptr null - store volatile i64 42, ptr null - store volatile i64 42, ptr null - store volatile i64 42, ptr null - store volatile i64 42, ptr null - call void @HelperC() - ret void -} - -define internal void @HelperC() { - store volatile i32 42, ptr null - ret void -} - -define amdgpu_kernel void @AB() { - store volatile i32 42, ptr null - call void @HelperA() - call void @HelperB() - ret void -} - -define amdgpu_kernel void @BC() { - store volatile i32 42, ptr null - store volatile i32 42, ptr null - call void @HelperB() - call void @HelperC() - ret void -} - -define amdgpu_kernel void @ABC() { - call void @HelperA() - call void @HelperB() - call void @HelperC() - ret void -} diff --git a/llvm/test/tools/llvm-split/AMDGPU/recursive-search-8.ll b/llvm/test/tools/llvm-split/AMDGPU/recursive-search-8.ll deleted file mode 100644 
index eae57a1988310..0000000000000 --- a/llvm/test/tools/llvm-split/AMDGPU/recursive-search-8.ll +++ /dev/null @@ -1,128 +0,0 @@ -; RUN: llvm-split -o %t_s3_ %s -j 3 -mtriple amdgcn-amd-amdhsa -amdgpu-module-splitting-max-depth=8 -; RUN: llvm-dis -o - %t_s3_0 | FileCheck --check-prefix=SPLIT3-CHECK0 --implicit-check-not=define %s -; RUN: llvm-dis -o - %t_s3_1 | FileCheck --check-prefix=SPLIT3-CHECK1 --implicit-check-not=define %s -; RUN: llvm-dis -o - %t_s3_2 | FileCheck --check-prefix=SPLIT3-CHECK2 --implicit-check-not=define %s - -; RUN: llvm-split -o %t_s5_ %s -j 5 -mtriple amdgcn-amd-amdhsa -amdgpu-module-splitting-max-depth=8 -; RUN: llvm-dis -o - %t_s5_0 | FileCheck --check-prefix=SPLIT5-CHECK0 --implicit-check-not=define %s -; RUN: llvm-dis -o - %t_s5_1 | FileCheck --check-prefix=SPLIT5-CHECK1 --implicit-check-not=define %s -; RUN: llvm-dis -o - %t_s5_2 | FileCheck --check-prefix=SPLIT5-CHECK2 --implicit-check-not=define %s -; RUN: llvm-dis -o - %t_s5_3 | FileCheck --check-prefix=SPLIT5-CHECK3 --implicit-check-not=define %s -; RUN: llvm-dis -o - %t_s5_4 | FileCheck --check-prefix=SPLIT5-CHECK4 --implicit-check-not=define %s - -; Test the specifics of the search algorithm. -; This test will change depending on new heuristics we add or remove. 
- -; -------------------------------------------- - -; SPLIT3-CHECK0: define internal void @HelperA() -; SPLIT3-CHECK0: define internal void @HelperB() -; SPLIT3-CHECK0: define internal void @HelperC() -; SPLIT3-CHECK0: define amdgpu_kernel void @AB() -; SPLIT3-CHECK0: define amdgpu_kernel void @BC() - -; SPLIT3-CHECK1: define amdgpu_kernel void @A() -; SPLIT3-CHECK1: define internal void @HelperA() -; SPLIT3-CHECK1: define amdgpu_kernel void @C() -; SPLIT3-CHECK1: define internal void @HelperC() - -; SPLIT3-CHECK2: define internal void @HelperA() -; SPLIT3-CHECK2: define amdgpu_kernel void @B() -; SPLIT3-CHECK2: define internal void @HelperB() -; SPLIT3-CHECK2: define internal void @HelperC() -; SPLIT3-CHECK2: define amdgpu_kernel void @ABC() - -; -------------------------------------------- - -; SPLIT5-CHECK0: define amdgpu_kernel void @A() -; SPLIT5-CHECK0: define internal void @HelperA() -; SPLIT5-CHECK0: define amdgpu_kernel void @B() -; SPLIT5-CHECK0: define internal void @HelperB() - -; SPLIT5-CHECK1: define internal void @HelperB() -; SPLIT5-CHECK1: define internal void @HelperC() -; SPLIT5-CHECK1: define amdgpu_kernel void @BC - -; SPLIT5-CHECK2: define internal void @HelperA() -; SPLIT5-CHECK2: define internal void @HelperB() -; SPLIT5-CHECK2: define amdgpu_kernel void @AB() - -; SPLIT5-CHECK3: define amdgpu_kernel void @C() -; SPLIT5-CHECK3: define internal void @HelperC() - -; SPLIT5-CHECK4: define internal void @HelperA() -; SPLIT5-CHECK4: define internal void @HelperB() -; SPLIT5-CHECK4: define internal void @HelperC() -; SPLIT5-CHECK4: define amdgpu_kernel void @ABC() - -define amdgpu_kernel void @A() { - store volatile i64 42, ptr null - store volatile i64 42, ptr null - store volatile i64 42, ptr null - store volatile i64 42, ptr null - store volatile i64 42, ptr null - call void @HelperA() - ret void -} - -define internal void @HelperA() { - store volatile i32 42, ptr null - store volatile i32 42, ptr null - ret void -} - -define amdgpu_kernel 
void @B() { - store volatile i64 42, ptr null - store volatile i64 42, ptr null - store volatile i64 42, ptr null - store volatile i64 42, ptr null - call void @HelperB() - ret void -} - -define internal void @HelperB() { - store volatile i32 42, ptr null - store volatile i32 42, ptr null - store volatile i32 42, ptr null - ret void -} - -define amdgpu_kernel void @C() { - store volatile i64 42, ptr null - store volatile i64 42, ptr null - store volatile i64 42, ptr null - store volatile i64 42, ptr null - store volatile i64 42, ptr null - store volatile i64 42, ptr null - store volatile i64 42, ptr null - store volatile i64 42, ptr null - call void @HelperC() - ret void -} - -define internal void @HelperC() { - store volatile i32 42, ptr null - ret void -} - -define amdgpu_kernel void @AB() { - store volatile i32 42, ptr null - call void @HelperA() - call void @HelperB() - ret void -} - -define amdgpu_kernel void @BC() { - store volatile i32 42, ptr null - store volatile i32 42, ptr null - call void @HelperB() - call void @HelperC() - ret void -} - -define amdgpu_kernel void @ABC() { - call void @HelperA() - call void @HelperB() - call void @HelperC() - ret void -} diff --git a/llvm/tools/dsymutil/dsymutil.cpp b/llvm/tools/dsymutil/dsymutil.cpp index 728f2ed3e62ac..364a7d63d486e 100644 --- a/llvm/tools/dsymutil/dsymutil.cpp +++ b/llvm/tools/dsymutil/dsymutil.cpp @@ -835,7 +835,7 @@ int dsymutil_main(int argc, char **argv, const llvm::ToolContext &) { if (Crashed) (*Repro)->generate(); - if (!AllOK) + if (!AllOK || Crashed) return EXIT_FAILURE; if (NeedsTempFiles) { diff --git a/llvm/unittests/IR/BasicBlockDbgInfoTest.cpp b/llvm/unittests/IR/BasicBlockDbgInfoTest.cpp index 5615a4493d20a..5ce14d3f6b9ce 100644 --- a/llvm/unittests/IR/BasicBlockDbgInfoTest.cpp +++ b/llvm/unittests/IR/BasicBlockDbgInfoTest.cpp @@ -1569,14 +1569,12 @@ TEST(BasicBlockDbgInfoTest, CloneTrailingRecordsToEmptyBlock) { // The trailing records should've been absorbed into NewBB. 
   EXPECT_FALSE(BB.getTrailingDbgRecords());
   EXPECT_TRUE(NewBB->getTrailingDbgRecords());
-  if (NewBB->getTrailingDbgRecords()) {
-    EXPECT_EQ(
-        llvm::range_size(NewBB->getTrailingDbgRecords()->getDbgRecordRange()),
-        1u);
+  if (DbgMarker *Trailing = NewBB->getTrailingDbgRecords()) {
+    EXPECT_EQ(llvm::range_size(Trailing->getDbgRecordRange()), 1u);
+    // Drop the trailing records now, to prevent a cleanup assertion.
+    Trailing->eraseFromParent();
+    NewBB->deleteTrailingDbgRecords();
   }
-
-  // Drop the trailing records now, to prevent a cleanup assertion.
-  NewBB->deleteTrailingDbgRecords();
 }
 
 } // End anonymous namespace.
diff --git a/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td b/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
index 4d48b3de7a57e..709dd922b8fa2 100644
--- a/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
+++ b/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
@@ -19,6 +19,7 @@ include "mlir/Dialect/LLVMIR/LLVMOpBase.td"
 include "mlir/Interfaces/SideEffectInterfaces.td"
 include "mlir/Dialect/LLVMIR/BasicPtxBuilderInterface.td"
 
+def LLVM_PointerGeneric : LLVM_PointerInAddressSpace<0>;
 def LLVM_PointerGlobal : LLVM_PointerInAddressSpace<1>;
 def LLVM_PointerShared : LLVM_PointerInAddressSpace<3>;
 
@@ -531,8 +532,10 @@ def ProxyAlias : I32EnumAttrCase<"alias", 0, "alias">;
 def ProxyAsync : I32EnumAttrCase<"async", 1, "async">;
 def ProxyAsyncGlobal : I32EnumAttrCase<"async_global", 2, "async.global">;
 def ProxyAsyncShared : I32EnumAttrCase<"async_shared", 3, "async.shared">;
+def ProxyTensorMap : I32EnumAttrCase<"TENSORMAP", 4, "tensormap">;
+def ProxyGeneric : I32EnumAttrCase<"GENERIC", 5, "generic">;
 def ProxyKind : I32EnumAttr<"ProxyKind", "Proxy kind",
-  [ProxyAlias, ProxyAsync, ProxyAsyncGlobal, ProxyAsyncShared]> {
+  [ProxyAlias, ProxyAsync, ProxyAsyncGlobal, ProxyAsyncShared, ProxyTensorMap, ProxyGeneric]> {
   let genSpecializedAttr = 0;
   let cppNamespace = "::mlir::NVVM";
 }
@@ -565,6 +568,80 @@ def NVVM_FenceProxyOp : NVVM_PTXBuilder_Op<"fence.proxy">,
   let hasVerifier = 1;
 }
 
+// Attrs describing the scope of the Memory Operation
+def MemScopeKindCTA : I32EnumAttrCase<"CTA", 0, "cta">;
+def MemScopeKindCluster : I32EnumAttrCase<"CLUSTER", 1, "cluster">;
+def MemScopeKindGPU : I32EnumAttrCase<"GPU", 2, "gpu">;
+def MemScopeKindSYS : I32EnumAttrCase<"SYS", 3, "sys">;
+
+def MemScopeKind : I32EnumAttr<"MemScopeKind", "NVVM Memory Scope kind",
+  [MemScopeKindCTA, MemScopeKindCluster, MemScopeKindGPU, MemScopeKindSYS]> {
+  let genSpecializedAttr = 0;
+  let cppNamespace = "::mlir::NVVM";
+}
+def MemScopeKindAttr : EnumAttr<NVVM_Dialect, MemScopeKind, "mem_scope"> {
+  let assemblyFormat = "`<` $value `>`";
+}
+
+def NVVM_FenceProxyAcquireOp : NVVM_Op<"fence.proxy.acquire">,
+  Arguments<(ins MemScopeKindAttr:$scope, LLVM_PointerGeneric:$addr, I32:$size,
+             DefaultValuedAttr<ProxyKindAttr,
+                               "ProxyKind::GENERIC">:$fromProxy,
+             DefaultValuedAttr<ProxyKindAttr,
+                               "ProxyKind::TENSORMAP">:$toProxy)> {
+  let summary = "Uni-directional proxy fence operation with acquire semantics";
+  let description = [{
+    `fence.proxy.acquire` is a uni-directional fence used to establish ordering
+    between a prior memory access performed via the generic proxy and a
+    subsequent memory access performed via the tensormap proxy.
+
+    The address operand `addr` and the operand `size` together specify the
+    memory range `[addr, addr+size)` on which the ordering guarantees on the
+    memory accesses across the proxies are to be provided. The only supported
+    value for the `size` operand is 128, and it must be an immediate. Generic
+    addressing is used unconditionally, and the address specified by the
+    operand `addr` must fall within the `.global` state space. Otherwise, the
+    behavior is undefined.
+    [For more information, see PTX ISA]
+    (https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-membar)
+  }];
+
+  let assemblyFormat = "$scope $addr `,` $size (`from_proxy` `=` $fromProxy^)? (`to_proxy` `=` $toProxy^)? attr-dict";
+  let llvmBuilder = [{
+    createIntrinsicCall(
+        builder,
+        getUnidirectionalFenceProxyID($fromProxy, $toProxy, $scope, false),
+        {$addr, $size});
+  }];
+
+  let hasVerifier = 1;
+}
+
+def NVVM_FenceProxyReleaseOp : NVVM_Op<"fence.proxy.release">,
+  Arguments<(ins MemScopeKindAttr:$scope,
+             DefaultValuedAttr<ProxyKindAttr,
+                               "ProxyKind::GENERIC">:$fromProxy,
+             DefaultValuedAttr<ProxyKindAttr,
+                               "ProxyKind::TENSORMAP">:$toProxy)> {
+  let summary = "Uni-directional proxy fence operation with release semantics";
+  let description = [{
+    `fence.proxy.release` is a uni-directional fence used to establish ordering
+    between a prior memory access performed via the generic proxy and a
+    subsequent memory access performed via the tensormap proxy. The
+    `fence.proxy.release` operation can form a release sequence that
+    synchronizes with an acquire sequence that contains the
+    `fence.proxy.acquire` proxy fence operation.
+    [For more information, see PTX ISA]
+    (https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-membar)
+  }];
+
+  let assemblyFormat = "$scope (`from_proxy` `=` $fromProxy^)? (`to_proxy` `=` $toProxy^)? attr-dict";
+  let llvmBuilder = [{
+    createIntrinsicCall(builder, getUnidirectionalFenceProxyID(
+                                     $fromProxy, $toProxy, $scope, true));
+  }];
+
+  let hasVerifier = 1;
+}
+
 def SetMaxRegisterActionIncrease : I32EnumAttrCase<"increase", 0>;
 def SetMaxRegisterActionDecrease : I32EnumAttrCase<"decrease", 1>;
 def SetMaxRegisterAction : I32EnumAttr<"SetMaxRegisterAction", "NVVM set max register action",
diff --git a/mlir/lib/Dialect/LLVMIR/IR/NVVMDialect.cpp b/mlir/lib/Dialect/LLVMIR/IR/NVVMDialect.cpp
index 4d1896551101e..2c7c3e9d535f7 100644
--- a/mlir/lib/Dialect/LLVMIR/IR/NVVMDialect.cpp
+++ b/mlir/lib/Dialect/LLVMIR/IR/NVVMDialect.cpp
@@ -1004,6 +1004,10 @@ void NVVM::WgmmaMmaAsyncOp::getAsmValues(
   }
 }
 LogicalResult NVVM::FenceProxyOp::verify() {
+  if (getKind() == NVVM::ProxyKind::TENSORMAP)
+    return emitOpError() << "tensormap proxy is not a supported proxy kind";
+  if (getKind() == NVVM::ProxyKind::GENERIC)
+    return emitOpError() << "generic proxy is not a supported proxy kind";
   if (getKind() == NVVM::ProxyKind::async_shared && !getSpace().has_value()) {
     return emitOpError() << "async_shared fence requires space attribute";
   }
@@ -1013,6 +1017,30 @@ LogicalResult NVVM::FenceProxyOp::verify() {
   return success();
 }
 
+LogicalResult NVVM::FenceProxyAcquireOp::verify() {
+  if (getFromProxy() != NVVM::ProxyKind::GENERIC)
+    return emitOpError("uni-directional proxies only support generic for "
+                       "from_proxy attribute");
+
+  if (getToProxy() != NVVM::ProxyKind::TENSORMAP)
+    return emitOpError("uni-directional proxies only support tensormap "
+                       "for to_proxy attribute");
+
+  return success();
+}
+
+LogicalResult NVVM::FenceProxyReleaseOp::verify() {
+  if (getFromProxy() != NVVM::ProxyKind::GENERIC)
+    return emitOpError("uni-directional proxies only support generic for "
+                       "from_proxy attribute");
+
+  if (getToProxy() != NVVM::ProxyKind::TENSORMAP)
+    return emitOpError("uni-directional proxies only support tensormap "
+                       "for to_proxy attribute");
+
+  return success();
+}
+
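Review note: the two `verify()` methods above share one predicate. A minimal standalone C++ sketch of that rule, for illustration only — `ProxyKind` here is a hand-written stand-in for the TableGen-generated enum, and a returned string replaces MLIR's `InFlightDiagnostic`:

```cpp
#include <string>

// Stand-in for the TableGen-generated NVVM::ProxyKind enum.
enum class ProxyKind { alias, async, async_global, async_shared, TENSORMAP, GENERIC };

// Mirrors the shared check in FenceProxyAcquireOp/FenceProxyReleaseOp::verify():
// the uni-directional fences only model the generic -> tensormap direction.
// Returns an empty string on success, the diagnostic text otherwise.
std::string verifyUnidirectionalProxies(ProxyKind fromProxy, ProxyKind toProxy) {
  if (fromProxy != ProxyKind::GENERIC)
    return "uni-directional proxies only support generic for from_proxy attribute";
  if (toProxy != ProxyKind::TENSORMAP)
    return "uni-directional proxies only support tensormap for to_proxy attribute";
  return "";
}
```

Keeping the predicate symmetric across both ops means the diagnostics stay identical, which the `nvvmir-invalid.mlir` tests rely on.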
 LogicalResult NVVM::SetMaxRegisterOp::verify() {
   if (getRegCount() % 8)
     return emitOpError("new register size must be multiple of 8");
diff --git a/mlir/lib/Target/LLVMIR/Dialect/NVVM/NVVMToLLVMIRTranslation.cpp b/mlir/lib/Target/LLVMIR/Dialect/NVVM/NVVMToLLVMIRTranslation.cpp
index a09c24dda82af..f93e1cc8780c7 100644
--- a/mlir/lib/Target/LLVMIR/Dialect/NVVM/NVVMToLLVMIRTranslation.cpp
+++ b/mlir/lib/Target/LLVMIR/Dialect/NVVM/NVVMToLLVMIRTranslation.cpp
@@ -120,6 +120,40 @@ static llvm::Intrinsic::ID getLdMatrixIntrinsicId(NVVM::MMALayout layout,
   }
 }
 
+static unsigned getUnidirectionalFenceProxyID(NVVM::ProxyKind fromProxy,
+                                              NVVM::ProxyKind toProxy,
+                                              NVVM::MemScopeKind scope,
+                                              bool isRelease) {
+  if (fromProxy == NVVM::ProxyKind::GENERIC &&
+      toProxy == NVVM::ProxyKind::TENSORMAP) {
+    switch (scope) {
+    case NVVM::MemScopeKind::CTA: {
+      if (isRelease)
+        return llvm::Intrinsic::nvvm_fence_proxy_tensormap_generic_release_cta;
+      return llvm::Intrinsic::nvvm_fence_proxy_tensormap_generic_acquire_cta;
+    }
+    case NVVM::MemScopeKind::CLUSTER: {
+      if (isRelease)
+        return llvm::Intrinsic::
+            nvvm_fence_proxy_tensormap_generic_release_cluster;
+      return llvm::Intrinsic::
+          nvvm_fence_proxy_tensormap_generic_acquire_cluster;
+    }
+    case NVVM::MemScopeKind::GPU: {
+      if (isRelease)
+        return llvm::Intrinsic::nvvm_fence_proxy_tensormap_generic_release_gpu;
+      return llvm::Intrinsic::nvvm_fence_proxy_tensormap_generic_acquire_gpu;
+    }
+    case NVVM::MemScopeKind::SYS: {
+      if (isRelease)
+        return llvm::Intrinsic::nvvm_fence_proxy_tensormap_generic_release_sys;
+      return llvm::Intrinsic::nvvm_fence_proxy_tensormap_generic_acquire_sys;
+    }
+    }
+    llvm_unreachable("Unknown scope for uni-directional fence.proxy operation");
+  }
+  llvm_unreachable("Unsupported proxy kinds for uni-directional fence.proxy");
+}
+
 namespace {
/// Implementation of the dialect interface that converts operations belonging
/// to the NVVM dialect to LLVM IR.
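Review note: the (scope, acquire/release) dispatch above amounts to picking one of eight intrinsics. A self-contained C++ sketch of the same mapping, with a stand-in `MemScopeKind` enum and intrinsic *names* instead of `llvm::Intrinsic::ID` values (the real function returns IDs):

```cpp
#include <string>

// Stand-in for the TableGen-generated NVVM::MemScopeKind enum.
enum class MemScopeKind { CTA, CLUSTER, GPU, SYS };

// Mirrors the dispatch in getUnidirectionalFenceProxyID(): one intrinsic per
// (scope, direction) pair for the generic -> tensormap proxy fence.
std::string fenceProxyIntrinsicName(MemScopeKind scope, bool isRelease) {
  std::string name = "llvm.nvvm.fence.proxy.tensormap_generic.";
  name += isRelease ? "release." : "acquire.";
  switch (scope) {
  case MemScopeKind::CTA:
    return name + "cta";
  case MemScopeKind::CLUSTER:
    return name + "cluster";
  case MemScopeKind::GPU:
    return name + "gpu";
  case MemScopeKind::SYS:
    return name + "sys";
  }
  return ""; // unreachable for valid enum values
}
```

The names produced match the `call void @llvm.nvvm.fence.proxy.tensormap_generic.*` lines checked in `nvvmir.mlir` below.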
diff --git a/mlir/test/Target/LLVMIR/nvvmir-invalid.mlir b/mlir/test/Target/LLVMIR/nvvmir-invalid.mlir
new file mode 100644
index 0000000000000..0e563808da970
--- /dev/null
+++ b/mlir/test/Target/LLVMIR/nvvmir-invalid.mlir
@@ -0,0 +1,33 @@
+// RUN: mlir-translate -verify-diagnostics -split-input-file -mlir-to-llvmir %s
+
+// -----
+
+llvm.func @nvvm_fence_proxy_acquire(%addr : !llvm.ptr, %size : i32) {
+  // expected-error @below {{'nvvm.fence.proxy.acquire' op uni-directional proxies only support generic for from_proxy attribute}}
+  nvvm.fence.proxy.acquire #nvvm.mem_scope<cta> %addr, %size from_proxy=#nvvm.proxy_kind<tensormap> to_proxy=#nvvm.proxy_kind<tensormap>
+  llvm.return
+}
+
+// -----
+
+llvm.func @nvvm_fence_proxy_release() {
+  // expected-error @below {{'nvvm.fence.proxy.release' op uni-directional proxies only support generic for from_proxy attribute}}
+  nvvm.fence.proxy.release #nvvm.mem_scope<cta> from_proxy=#nvvm.proxy_kind<tensormap> to_proxy=#nvvm.proxy_kind<tensormap>
+  llvm.return
+}
+
+// -----
+
+llvm.func @nvvm_fence_proxy_acquire(%addr : !llvm.ptr, %size : i32) {
+  // expected-error @below {{'nvvm.fence.proxy.acquire' op uni-directional proxies only support tensormap for to_proxy attribute}}
+  nvvm.fence.proxy.acquire #nvvm.mem_scope<cta> %addr, %size from_proxy=#nvvm.proxy_kind<generic> to_proxy=#nvvm.proxy_kind<generic>
+  llvm.return
+}
+
+// -----
+
+llvm.func @nvvm_fence_proxy_release() {
+  // expected-error @below {{'nvvm.fence.proxy.release' op uni-directional proxies only support tensormap for to_proxy attribute}}
+  nvvm.fence.proxy.release #nvvm.mem_scope<cta> from_proxy=#nvvm.proxy_kind<generic> to_proxy=#nvvm.proxy_kind<generic>
+  llvm.return
+}
\ No newline at end of file
diff --git a/mlir/test/Target/LLVMIR/nvvmir.mlir b/mlir/test/Target/LLVMIR/nvvmir.mlir
index a8ae4d97888c9..6e2787d121ae6 100644
--- a/mlir/test/Target/LLVMIR/nvvmir.mlir
+++ b/mlir/test/Target/LLVMIR/nvvmir.mlir
@@ -574,3 +574,40 @@ llvm.func @kernel_func(%arg0: !llvm.ptr {llvm.byval = i32, nvvm.grid_constant})
 llvm.func @kernel_func(%arg0: !llvm.ptr {llvm.byval = i32, nvvm.grid_constant}, %arg1: f32, %arg2: !llvm.ptr {llvm.byval = f32, nvvm.grid_constant}) attributes {nvvm.kernel} {
   llvm.return
 }
+
+
+// -----
+// CHECK-LABEL: @nvvm_fence_proxy_tensormap_generic_release
+llvm.func @nvvm_fence_proxy_tensormap_generic_release() {
+  %c128 = llvm.mlir.constant(128) : i32
+  // CHECK: call void @llvm.nvvm.fence.proxy.tensormap_generic.release.cta()
+  nvvm.fence.proxy.release #nvvm.mem_scope<cta>
+
+  // CHECK: call void @llvm.nvvm.fence.proxy.tensormap_generic.release.cluster()
+  nvvm.fence.proxy.release #nvvm.mem_scope<cluster>
+
+  // CHECK: call void @llvm.nvvm.fence.proxy.tensormap_generic.release.gpu()
+  nvvm.fence.proxy.release #nvvm.mem_scope<gpu>
+
+  // CHECK: call void @llvm.nvvm.fence.proxy.tensormap_generic.release.sys()
+  nvvm.fence.proxy.release #nvvm.mem_scope<sys>
+  llvm.return
+}
+
+// -----
+// CHECK-LABEL: @nvvm_fence_proxy_tensormap_generic_acquire
+llvm.func @nvvm_fence_proxy_tensormap_generic_acquire(%addr : !llvm.ptr) {
+  %c128 = llvm.mlir.constant(128) : i32
+  // CHECK: call void @llvm.nvvm.fence.proxy.tensormap_generic.acquire.cta(ptr {{%[0-9]+}}, i32 128)
+  nvvm.fence.proxy.acquire #nvvm.mem_scope<cta> %addr, %c128
+
+  // CHECK: call void @llvm.nvvm.fence.proxy.tensormap_generic.acquire.cluster(ptr {{%[0-9]+}}, i32 128)
+  nvvm.fence.proxy.acquire #nvvm.mem_scope<cluster> %addr, %c128
+
+  // CHECK: call void @llvm.nvvm.fence.proxy.tensormap_generic.acquire.gpu(ptr {{%[0-9]+}}, i32 128)
+  nvvm.fence.proxy.acquire #nvvm.mem_scope<gpu> %addr, %c128
+
+  // CHECK: call void @llvm.nvvm.fence.proxy.tensormap_generic.acquire.sys(ptr {{%[0-9]+}}, i32 128)
+  nvvm.fence.proxy.acquire #nvvm.mem_scope<sys> %addr, %c128
+  llvm.return
+}
\ No newline at end of file