plotfi smem evolution july #4

Open · plotfi wants to merge 11 commits into plotfi-smem-evolution-july-base from plotfi-smem-evolution-july
Conversation

@plotfi (Owner) commented Jul 18, 2024

The following is a redo of Triton shared-memory (SMEM) support, based on discussions around ttg.local_gather.

The RFC for local_gather is at https://docs.google.com/document/d/1rmYVe8tTRrPcVHkS5GcmcVl_tVjgMbcfiFWc_ihkTuU/edit?usp=sharing

With these patches, the target usage pattern for shared memory in Triton is trending toward something like this:

@triton.jit
def triton_local_gather(data_ptr, index_ptr, XBLOCK: tl.constexpr, RBLOCK: tl.constexpr):
    xoffset = tl.program_id(0) * XBLOCK
    xbase = tl.arange(0, XBLOCK)
    rbase = tl.arange(0, RBLOCK)

    x = xoffset + xbase
    t = tl.load(data_ptr + x)
    s = tl.local_copy(t)  # stage the tile in shared memory

    for roffset in range(0, RBLOCK):
        r = roffset + rbase
        indices = tl.load(index_ptr + r)
        g = tl.gather(s, indices)  # indexed load served from shared memory
        # accumulate based on g

    # tl.store based on g accumulation

Local gathers take a pointer to shared memory and a tensor of indices to
load from. This patch implements the Op.
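
For reference, the op's gather semantics can be modeled in plain NumPy (an illustrative model only, not the implementation):

import numpy as np

def local_gather_reference(smem: np.ndarray, indices: np.ndarray) -> np.ndarray:
    # Each output element i is the shared-memory element at indices[i],
    # i.e. g[i] = smem[indices[i]].
    return smem[indices]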

(cherry picked from commit 3dac0e3)
@plotfi force-pushed the plotfi-smem-evolution-july branch from 08112f0 to 6420b05 on July 19, 2024 at 17:25
Removed cast, set default order based on tensor rank.
local_copy -> local_alloc

gather -> local_gather

Eventually I plan to make tl.gather generate a tt::LoadOp when the pointer
type refers to global memory. Ideally, tt::GatherOp should either fold back
into tt::LoadOp or lower automatically to either a tt::LoadOp or a
ttg::LocalGatherOp.
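
A rough pseudocode sketch of that planned dispatch (my reading of the paragraph above; the builder helpers here are hypothetical, not actual Triton frontend code):

def lower_gather(src, indices, builder):
    if is_global_pointer(src.type):
        # Source is a pointer to global memory: fold the gather back into
        # a plain tt::LoadOp on the computed addresses.
        return builder.create_load(builder.create_addptr(src, indices))
    # Source is a shared-memory MemDesc: emit a ttg::LocalGatherOp.
    return builder.create_local_gather(src, indices)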
This is required for passing MemDesc values via CallOps.
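
As an illustration of what this enables, here is a hedged sketch of a @triton.jit helper that receives the shared-memory handle (a MemDesc value) as a call argument. The helper name is hypothetical, and the tl.local_gather spelling follows the rename above; treat the exact signature as an assumption.

import triton
import triton.language as tl

@triton.jit
def gather_tile(smem, index_ptr, roffset, RBLOCK: tl.constexpr):
    # smem is a shared-memory MemDesc value; passing it into this helper
    # is exactly the CallOp case this commit enables.
    indices = tl.load(index_ptr + roffset + tl.arange(0, RBLOCK))
    return tl.local_gather(smem, indices)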

Review comment on python/src/ir.cc (outdated, resolved): TODO: Replace this with a generic opaque Python type wrapper for MLIR Types
"triton_gpu.threads-per-warp" = 32 : i32
}
{
tt.func public @triton_global_gather(%arg0: !tt.ptr<bf16> {tt.divisibility = 16 : i32},
Review comment: change func name to triton_local_gather

plotfi added a commit to plotfi/generative-recommenders that referenced this pull request Aug 22, 2024
…totuning

The following change requires a private patchset that is not yet
available outside of plotfi/triton#4

This patch adds shared-memory usage via the tl.local_copy and tl.gather
operations for the TW (time-bias) and PW (position-bias) tensors in the
forward-pass kernel.

Autotuning is also hooked up to the usage of these shared-memory operators.
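
As a hedged illustration of that hookup (the kernel and parameter names are hypothetical; only the tl.local_copy and tl.gather calls come from this patchset), an autotuner flag could select between the global-memory and shared-memory paths:

import triton
import triton.language as tl

@triton.autotune(
    configs=[
        triton.Config({'XBLOCK': 64, 'USE_SMEM': 0}),
        triton.Config({'XBLOCK': 64, 'USE_SMEM': 1}),
    ],
    key=['N'],
)
@triton.jit
def fwd_bias_kernel(tw_ptr, index_ptr, out_ptr, N,
                    XBLOCK: tl.constexpr, USE_SMEM: tl.constexpr):
    x = tl.program_id(0) * XBLOCK + tl.arange(0, XBLOCK)
    idx = tl.load(index_ptr + x, mask=x < N)
    if USE_SMEM:
        tw = tl.load(tw_ptr + x, mask=x < N)
        s = tl.local_copy(tw)      # stage the time-bias tile in shared memory
        g = tl.gather(s, idx)      # gather served from shared memory
    else:
        g = tl.load(tw_ptr + idx)  # baseline: gather directly from global memory
    tl.store(out_ptr + x, g, mask=x < N)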