forked from triton-lang/triton
plotfi smem evolution july #4
Open
plotfi wants to merge 11 commits into plotfi-smem-evolution-july-base from plotfi-smem-evolution-july
Conversation
Local gathers take a pointer to shared memory and a tensor of indices to load from. This patch implements the Op. (cherry picked from commit 3dac0e3)
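For intuition, the Op described above behaves like an indexed read out of an already-staged buffer. A minimal NumPy sketch of those semantics (the function and all names are illustrative, not the actual Op interface):

```python
import numpy as np

# Reference semantics only: `smem` stands in for a buffer already staged in
# shared memory, and `indices` is the tensor of indices to load from. The
# real Op operates on a MemDesc, not a NumPy array.
def local_gather_reference(smem, indices):
    return smem[indices]

smem = np.arange(16, dtype=np.float32)        # staged shared-memory contents
indices = np.array([3, 0, 7, 7])              # tensor of indices to gather
print(local_gather_reference(smem, indices))  # [3. 0. 7. 7.]
```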
plotfi force-pushed the plotfi-smem-evolution-july branch from 08112f0 to 6420b05 on July 19, 2024 17:25
Removed cast, set default order based on tensor rank.
Rename local_copy -> local_alloc and gather -> local_gather. Eventually I plan to make tl.gather generate a tt::LoadOp if the pointer type points to global memory. tt::GatherOp should ideally either fold back into tt::LoadOp or lower automatically to either a tt::LoadOp or a ttg::LocalGatherOp.
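A rough sketch of the dispatch that plan describes, with hypothetical helper names standing in for real Triton compiler internals:

```python
# Hypothetical lowering dispatch for tl.gather, per the plan above: a pointer
# into global memory becomes a tt::LoadOp, while a shared-memory descriptor
# becomes a ttg::LocalGatherOp. Every name here is illustrative.
def lower_gather(ptr, indices, builder):
    if ptr.type.is_global_memory():
        # Global pointer: fold the gather into an ordinary load of ptr+indices.
        return builder.create_load(ptr, indices)
    # Shared memory (MemDesc): emit the dedicated local gather op.
    return builder.create_local_gather(ptr, indices)
```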
This is required for passing MemDesc values via CallOps. TODO: Replace this with a generic opaque Python type wrapper for MLIR Types.
plotfi commented Jul 24, 2024
"triton_gpu.threads-per-warp" = 32 : i32 | ||
} | ||
{ | ||
tt.func public @triton_global_gather(%arg0: !tt.ptr<bf16> {tt.divisibility = 16 : i32}, |
change func name to triton_local_gather
plotfi added a commit to plotfi/generative-recommenders that referenced this pull request on Aug 22, 2024
…totuning The following change requires a private patchset that is not yet available outside of plotfi/triton#4. This patch adds usage of shared memory via the tl.local_copy and tl.gather operations for the TW (time bias) and PW (position bias) tensors in the forward pass kernel. Autotuning is also hooked up to the usage of these shared memory operators.
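A hedged sketch of how that autotuning hookup might look on the kernel side, assuming the tl.local_copy/tl.gather interface from this PR (the config set, the USE_SMEM flag, and the kernel body are illustrative, not the referenced commit's code):

```python
import triton
import triton.language as tl

@triton.autotune(
    configs=[
        triton.Config({"USE_SMEM": 0}, num_warps=4),
        triton.Config({"USE_SMEM": 1}, num_warps=4),  # stage biases in smem
    ],
    key=["N"],
)
@triton.jit
def fwd_bias_kernel(TW, Out, N, USE_SMEM: tl.constexpr, BLOCK: tl.constexpr):
    offs = tl.arange(0, BLOCK)
    if USE_SMEM:
        # Copy the time-bias table into shared memory once, then gather
        # from it (tl.local_copy / tl.gather are the PR-private ops).
        tw_smem = tl.local_copy(TW + tl.arange(0, BLOCK))
        tw = tl.gather(tw_smem, offs % N)
    else:
        tw = tl.load(TW + (offs % N))
    tl.store(Out + offs, tw)
```

Letting the autotuner flip USE_SMEM means the shared-memory path only wins when the staged table is actually reused enough to pay for the copy.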
The following commits are a redo of Triton SMEM support, based on discussions around ttg.local_gather.
The RFC for local_gather is at https://docs.google.com/document/d/1rmYVe8tTRrPcVHkS5GcmcVl_tVjgMbcfiFWc_ihkTuU/edit?usp=sharing
The target usage of shared memory in Triton with these patches is trending towards something like this:
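A minimal sketch of that target pattern, assuming the tl.local_copy and tl.gather operations described in the commits above (the kernel and every tensor/parameter name are illustrative, not the PR's actual example):

```python
import triton
import triton.language as tl

@triton.jit
def gather_kernel(src, idx, out, N: tl.constexpr, BLOCK: tl.constexpr):
    offs = tl.arange(0, BLOCK)
    # Stage a tile of `src` into shared memory (PR-private API).
    smem = tl.local_copy(src + tl.arange(0, N))
    # Gather from the shared-memory tile using a tensor of indices.
    indices = tl.load(idx + offs)
    vals = tl.gather(smem, indices)
    tl.store(out + offs, vals)
```

The point of the pattern is that runtime-computed indices hit the shared-memory copy rather than issuing a fresh global-memory read per gathered element.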