Commit d761875

kerbowa authored and bcahoon committed
[AMDGPU] Add doc updates for kernarg preloading (llvm#67516)
Change-Id: If31308f6e729922236cbd97e7ce000da7e2e6fab
1 parent a6f50e0 commit d761875

1 file changed: +55 -11 lines changed

llvm/docs/AMDGPUUsage.rst

Lines changed: 55 additions & 11 deletions
@@ -361,7 +361,7 @@ Every processor supports every OS ABI (see :ref:`amdgpu-os`) with the following
361361
``gfx90a`` ``amdgcn`` dGPU - sramecc - Absolute - *rocm-amdhsa* *TBA*
362362
- tgsplit flat
363363
- xnack scratch .. TODO::
364-
- Packed
364+
- kernarg preload - Packed
365365
work-item Add product
366366
IDs names.
367367

@@ -382,21 +382,21 @@ Every processor supports every OS ABI (see :ref:`amdgpu-os`) with the following
382382
``gfx940`` ``amdgcn`` dGPU - sramecc - Architected *TBA*
383383
- tgsplit flat
384384
- xnack scratch .. TODO::
385-
- Packed
385+
- kernarg preload - Packed
386386
work-item Add product
387387
IDs names.
388388

389389
``gfx941`` ``amdgcn`` dGPU - sramecc - Architected *TBA*
390390
- tgsplit flat
391391
- xnack scratch .. TODO::
392-
- Packed
392+
- kernarg preload - Packed
393393
work-item Add product
394394
IDs names.
395395

396396
``gfx942`` ``amdgcn`` dGPU - sramecc - Architected *TBA*
397397
- tgsplit flat
398398
- xnack scratch .. TODO::
399-
- Packed
399+
- kernarg preload - Packed
400400
work-item Add product
401401
IDs names.
402402

@@ -4423,12 +4423,24 @@ The fields used by CP for code objects before V3 also match those specified in
44234423
dynamically sized stack.
44244424
This is only set in code
44254425
object v5 and later.
4426-
463:460 1 bit Reserved, must be 0.
4427-
464 1 bit RESERVED_464 Deprecated, must be 0.
4428-
467:465 3 bits Reserved, must be 0.
4429-
468 1 bit RESERVED_468 Deprecated, must be 0.
4430-
469:471 3 bits Reserved, must be 0.
4431-
511:472 5 bytes Reserved, must be 0.
4426+
463:460 4 bits Reserved, must be 0.
4427+
470:464 7 bits KERNARG_PRELOAD_SPEC_LENGTH GFX6-GFX9
4428+
- Reserved, must be 0.
4429+
GFX90A, GFX940
4430+
- The number of dwords from
4431+
the kernarg segment to preload
4432+
into User SGPRs before kernel
4433+
execution. (see
4434+
:ref:`amdgpu-amdhsa-kernarg-preload`).
4435+
479:471 9 bits KERNARG_PRELOAD_SPEC_OFFSET GFX6-GFX9
4436+
- Reserved, must be 0.
4437+
GFX90A, GFX940
4438+
- An offset in dwords into the
4439+
kernarg segment to begin
4440+
preloading data into User
4441+
SGPRs. (see
4442+
:ref:`amdgpu-amdhsa-kernarg-preload`).
4443+
511:480 4 bytes Reserved, must be 0.
44324444
512 **Total size 64 bytes.**
44334445
======= ====================================================================
44344446

@@ -5034,7 +5046,7 @@ for enabled registers are dense starting at SGPR0: the first enabled register is
50345046
SGPR0, the next enabled register is SGPR1 etc.; disabled registers do not have
50355047
an SGPR number.
50365048

5037-
The initial SGPRs comprise up to 16 User SRGPs that are set by CP and apply to
5049+
The initial SGPRs comprise up to 16 User SGPRs that are set by CP and apply to
50385050
all wavefronts of the grid. It is possible to specify more than 16 User SGPRs
50395051
using the ``enable_sgpr_*`` bit fields, in which case only the first 16 are
50405052
actually initialized. These are then immediately followed by the System SGPRs
@@ -5077,6 +5089,9 @@ SGPR register initial state is defined in
50775089
then Flat Scratch Init 2 See
50785090
(enable_sgpr_flat_scratch :ref:`amdgpu-amdhsa-kernel-prolog-flat-scratch`.
50795091
_init)
5092+
then Preloaded Kernargs N/A See
5093+
(kernarg_preload_spec :ref:`amdgpu-amdhsa-kernarg-preload`.
5094+
_length)
50805095
then Private Segment Size 1 The 32-bit byte size of a
50815096
(enable_sgpr_private single work-item's memory
50825097
_segment_size) allocation. This is the
@@ -5209,6 +5224,31 @@ following properties:
52095224
* MTYPE set to support memory coherence that matches the runtime (such as CC for
52105225
APU and NC for dGPU).
52115226

5227+
.. _amdgpu-amdhsa-kernarg-preload:
5228+
5229+
Preloaded Kernel Arguments
5230+
++++++++++++++++++++++++++
5231+
5232+
On hardware that supports this feature, kernel arguments can be preloaded into
5233+
User SGPRs, up to the maximum number of User SGPRs available. The allocation of
5234+
Preload SGPRs occurs directly after the last enabled non-kernarg preload User
5235+
SGPR. (See :ref:`amdgpu-amdhsa-initial-kernel-execution-state`)
5236+
5237+
The preloaded data is copied from the kernarg segment; the amount of data is
5238+
determined by the value specified in the kernarg_preload_spec_length field of
5239+
the kernel descriptor. This data is then loaded into consecutive User SGPRs. The
5240+
number of SGPRs receiving preloaded kernarg data corresponds with the value
5241+
given by kernarg_preload_spec_length. The preloading starts at the dword offset
5242+
within the kernarg segment, which is specified by the
5243+
kernarg_preload_spec_offset field.
5244+
5245+
If the kernarg_preload_spec_length is non-zero, the CP firmware will append an
5246+
additional 256 bytes to the kernel_code_entry_byte_offset. This addition
5247+
facilitates the incorporation of a prologue to the kernel entry to handle cases
5248+
where code designed for kernarg preloading is executed on hardware equipped with
5249+
incompatible firmware. If hardware has compatible firmware the 256 bytes at the
5250+
start of the kernel entry will be skipped.
5251+
52125252
.. _amdgpu-amdhsa-kernel-prolog:
52135253

52145254
Kernel Prolog
@@ -15383,6 +15423,10 @@ terminated by an ``.end_amdhsa_kernel`` directive.
1538315423
:ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx11-table`.
1538415424
``.amdhsa_exception_int_div_zero`` 0 GFX6-GFX11 Controls ENABLE_EXCEPTION_INT_DIVIDE_BY_ZERO in
1538515425
:ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx11-table`.
15426+
``.amdhsa_user_sgpr_kernarg_preload_length`` 0 GFX90A, Controls KERNARG_PRELOAD_SPEC_LENGTH in
15427+
GFX940 :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
15428+
``.amdhsa_user_sgpr_kernarg_preload_offset`` 0 GFX90A, Controls KERNARG_PRELOAD_SPEC_OFFSET in
15429+
GFX940 :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
1538615430
======================================================== =================== ============ ===================
1538715431

1538815432
.amdgpu_metadata
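
The field layout and preload rules added by this commit can be illustrated with a small sketch. It is not taken from LLVM or the CP firmware. It assumes the 64-byte kernel descriptor is little-endian, so bits 479:464 are read as one 16-bit word at byte 58. The function names, the ``first_preload_sgpr`` and ``firmware_supports_preload`` parameters, and the clamp to 16 User SGPRs are hypothetical choices made for illustration.

```python
import struct
from typing import Dict, Tuple

KD_SIZE_BYTES = 64            # "Total size 64 bytes" from the descriptor table
PRELOAD_WORD_BYTE = 464 // 8  # bits 479:464 start at byte 58
PROLOGUE_SKIP_BYTES = 256     # size of the compatibility prologue


def decode_kernarg_preload(kd: bytes) -> Tuple[int, int]:
    """Return (length, offset) in dwords from a raw kernel descriptor.

    Assumption: the descriptor is little-endian, so bits 470:464 are the low
    7 bits of the 16-bit word at byte 58 and bits 479:471 are the next 9 bits.
    """
    if len(kd) != KD_SIZE_BYTES:
        raise ValueError("kernel descriptor must be exactly 64 bytes")
    (word,) = struct.unpack_from("<H", kd, PRELOAD_WORD_BYTE)
    length = word & 0x7F            # KERNARG_PRELOAD_SPEC_LENGTH, 7 bits
    offset = (word >> 7) & 0x1FF    # KERNARG_PRELOAD_SPEC_OFFSET, 9 bits
    return length, offset


def preloaded_dwords(length: int, offset: int, first_preload_sgpr: int,
                     max_user_sgprs: int = 16) -> Dict[int, int]:
    """Map each preload SGPR index to the kernarg byte offset it receives.

    first_preload_sgpr is the SGPR directly after the last enabled
    non-preload User SGPR. Clamping to max_user_sgprs is an assumption
    based on "only the first 16 are actually initialized".
    """
    available = max(0, max_user_sgprs - first_preload_sgpr)
    count = min(length, available)
    return {first_preload_sgpr + i: 4 * (offset + i) for i in range(count)}


def effective_entry_offset(kernel_code_entry_byte_offset: int, length: int,
                           firmware_supports_preload: bool) -> int:
    """Entry offset after the firmware's optional 256-byte prologue skip."""
    if length != 0 and firmware_supports_preload:
        return kernel_code_entry_byte_offset + PROLOGUE_SKIP_BYTES
    return kernel_code_entry_byte_offset
```

For example, a descriptor with length 3 and offset 2 whose preload SGPRs start at SGPR6 would place kernarg bytes 8, 12 and 16 into SGPR6, SGPR7 and SGPR8 under these assumptions. On firmware without preload support, execution starts at the unmodified entry offset, which is where the compatibility prologue sits.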
