Add fused GEMM + SiLU kernel #180
Open: harsh-nod wants to merge 12 commits into iree-org:main from harsh-nod:gemm_silu
…g#161) This PR modifies the insertion point for iter args to ensure that the iter args are in the same order as the init args and outputs. This simplifies the mapping between init args, iter args and outputs. Signed-off-by: Harsh Menon <harsh@nod-labs.com>
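To illustrate why matching the order of iter args to init args and outputs simplifies the mapping, here is a small pure-Python emulation of a loop with several carried values. It is a hypothetical sketch, not the tkw implementation: `run_reduction` and its `body` callback are illustrative names only. Because position `i` means the same thing everywhere, output `i` is just the final value of iter arg `i`, and no index remapping is needed.

```python
from typing import Callable, Sequence, Tuple


def run_reduction(
    init_args: Sequence[float],
    body: Callable[[int, Tuple[float, ...]], Tuple[float, ...]],
    num_iters: int,
) -> Tuple[float, ...]:
    """Hypothetical emulation of a loop whose iter args keep init-arg order.

    iter_args[i] is seeded from init_args[i], the body yields its new values
    in the same positions, and outputs[i] is simply the final iter_args[i].
    """
    iter_args = tuple(init_args)  # position i <-> init_args[i]
    for iv in range(num_iters):
        new_args = body(iv, iter_args)
        # The body must yield exactly one value per iter arg, in the same order.
        assert len(new_args) == len(iter_args)
        iter_args = tuple(new_args)
    return iter_args  # outputs[i] corresponds to init_args[i]


# Two carried values: a running sum and a running max of the induction variable.
outputs = run_reduction(
    init_args=[0.0, float("-inf")],
    body=lambda iv, args: (args[0] + iv, max(args[1], float(iv))),
    num_iters=4,
)
```

With this convention, a consumer that wants the result seeded by `init_args[1]` reads `outputs[1]` directly.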
Fixes iree-org#85 PR based on the work of @maxbartel Requires changes in torch-mlir: [llvm/torch-mlir/#3688](llvm/torch-mlir#3688) Adds the mutable modifier to a global buffer and lifts said buffer to a global if there is a store-producer node associated with it. Signed-off-by: Christopher McGirr <mcgirr@roofline.ai> Co-authored-by: Maximilian Bartel <bartel@roofline.ai>
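The lifting rule described above (a global buffer becomes a mutable global only when some store node writes to it) can be sketched over a toy node list. Everything here is hypothetical: the `Node` class and the op names `"global_buffer"`, `"load"`, and `"store"` are assumptions made for illustration and are not the actual iree-turbine or torch-mlir data structures.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Node:
    op: str           # hypothetical op kind: "global_buffer", "load", or "store"
    name: str
    target: str = ""  # for load/store: the buffer that is read or written


def mutable_globals(nodes: List[Node]) -> Set[str]:
    """Return the global buffers that have at least one store producer.

    Only these would be lifted to mutable globals; buffers that are only
    read stay immutable.
    """
    buffers = {n.name for n in nodes if n.op == "global_buffer"}
    stored = {n.target for n in nodes if n.op == "store"}
    return buffers & stored


nodes = [
    Node("global_buffer", "cache"),
    Node("global_buffer", "weights"),
    Node("load", "l0", target="weights"),
    Node("store", "s0", target="cache"),
]
```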
…iree-org#162) This PR introduces changes to handle elementwise or general arithmetic operations that follow a tiled-loop-reduction ("Reduction") operation. The main problem with the current stack is that the indexing_dims information for Reduction relies on its user. This works if its user/consumer is tkw.write, but in other cases, such as BinaryPyOp or UnaryPyOp, that information is missing. To make matters worse, BinaryPyOp/UnaryPyOp depends on its src/producer for indexing dims, while the Reduction op depends on its dst/consumer for its indexing dim information. This would end up causing an infinite loop between UnaryPyOp/BinaryPyOp <-> Reduction. This PR fixes the indexing dimension logic of Reduction and GetResult (required for expanded Reduction) so that it is based on the reduction axis (for Reduction) and on source/consumer information. --------- Signed-off-by: Stanley Winata <stanley.winata@amd.com>
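A minimal sketch of how deriving a reduction's indexing dims from its source and reduction axis breaks the producer/consumer cycle. The function names and the symbolic dim names `"M"` and `"K"` are hypothetical; the point is only that each op now asks its producer, never its consumer, so the recursion always terminates.

```python
from typing import List, Sequence


def reduction_indexing_dims(source_dims: Sequence[str], axis: str) -> List[str]:
    """Hypothetical rule: a reduction's result dims are its source dims minus the reduced axis.

    This depends only on the producer and the axis, so it never has to query
    a consumer such as an elementwise op.
    """
    if axis not in source_dims:
        raise ValueError(f"reduction axis {axis!r} not in {list(source_dims)}")
    return [d for d in source_dims if d != axis]


def elementwise_indexing_dims(src_dims: Sequence[str]) -> List[str]:
    # Unary/binary elementwise ops inherit their dims from their source.
    return list(src_dims)


src = ["M", "K"]
red = reduction_indexing_dims(src, axis="K")  # reduce over K
post = elementwise_indexing_dims(red)         # e.g. an exp() applied after the reduction
```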
This PR removes the need for propagating indices using post expansion. The new approach propagates the MMA indices to the MMA dimensions of all tensors (rather than just MMA nodes) and then specializes them depending on whether they lie within the backward slices of the LHS and RHS or forward slices of the ACC. --------- Signed-off-by: Harsh Menon <harsh@nod-labs.com>
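The specialization step relies on backward and forward slices of the dataflow graph. The sketch below computes both on a toy graph stored as a `node -> operands` dict, which is an assumption made for illustration and not the tkw graph API. The graph names (`read_a`, `cast_a`, `mma`, and so on) are hypothetical. Nodes in the backward slice of the LHS or RHS operand would get LHS/RHS-specific indices, and nodes in the forward slice of the MMA result would get ACC indices.

```python
from collections import deque
from typing import Dict, List, Set

Graph = Dict[str, List[str]]  # node -> list of operand (producer) nodes


def backward_slice(graph: Graph, root: str) -> Set[str]:
    """All nodes that `root` transitively depends on, including `root`."""
    seen, work = {root}, deque([root])
    while work:
        for operand in graph.get(work.popleft(), []):
            if operand not in seen:
                seen.add(operand)
                work.append(operand)
    return seen


def forward_slice(graph: Graph, root: str) -> Set[str]:
    """All nodes that transitively use `root`, including `root`."""
    users: Dict[str, List[str]] = {}
    for node, operands in graph.items():
        for operand in operands:
            users.setdefault(operand, []).append(node)
    seen, work = {root}, deque([root])
    while work:
        for user in users.get(work.popleft(), []):
            if user not in seen:
                seen.add(user)
                work.append(user)
    return seen


graph: Graph = {
    "read_a": [], "cast_a": ["read_a"],  # LHS chain
    "read_b": [],                        # RHS chain
    "acc_init": [],
    "mma": ["cast_a", "read_b", "acc_init"],
    "exp": ["mma"], "write": ["exp"],    # consumers of the accumulator
}
lhs = backward_slice(graph, "cast_a")
rhs = backward_slice(graph, "read_b")
acc = forward_slice(graph, "mma")
```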
This PR adds more documentation about tkw. Specifically, it provides a first draft of the introduction and adds a section on memory access patterns. Signed-off-by: Harsh Menon <harsh@nod-labs.com>
Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>
…g#166) The main motivation behind this PR is to enable multiple induction variables/iterArgs on the same tiled "Reduction" loop. To enable this, we did a couple of things:
1. Enable lowering/expansion of `operator.getitem` (the op that extracts multiple results in Python, i.e. `res0, res1 = fn`) by templating it on `GetResult(CustomOp)`, since they have the same args and interface and can reuse most of the indexing/expansion helpers.
2. Introduce `res_idx`, a variable that represents which result index of an op we are referring to during expansion and in the context map. This is useful for ops that have more than one result/variable as outputs.
3. Bug fix in expand_reduction: we hoist the iteration over and expansion of `reduction.init_args` out of the loop that iterates over and expands the `yield`/`return_val` of the reduction loop. The size of `init_args` is expected to equal the size of `yield`/`return_val`, so with N iter_args/yields we ended up expanding the `init_args` N x N times instead of N times. We had not seen this so far because we had only been using 1 init_arg/iterArg, and 1x1 == 1.
4. Introduce a canonicalization pattern to fold chains of GetResult. By design, GetResult is expected to extract and have only one result, so a chain of GetResults should collapse into a single GetResult. This helps clean up the IR.
Item 4 also helps circumvent an issue where Reduction and GetResult are expanded completely on their own, without following the per-dimension DFS structure used by the rest of the expansion code. This becomes especially problematic for multiple IterArgs, since GetItem does not expect its source value to be expanded without it.
---------
Signed-off-by: Stanley Winata <stanley.winata@amd.com>
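A toy version of the GetResult-chain canonicalization from item 4. The `Reduction` and `GetResult` dataclasses below are hypothetical stand-ins, not the tkw `CustomOp` classes. Since a GetResult produces exactly one value, an outer GetResult applied to another GetResult can only select index 0, so the chain collapses to the innermost GetResult, which keeps the meaningful `res_idx`.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Reduction:
    name: str
    num_results: int


@dataclass
class GetResult:
    source: "Value"
    res_idx: int


Value = Union[Reduction, GetResult]


def fold_get_result_chain(node: GetResult) -> GetResult:
    """Collapse GetResult(GetResult(x, i), 0) into GetResult(x, i).

    A GetResult has a single result, so only index 0 can be taken from it
    and the outer GetResult is redundant.
    """
    while isinstance(node.source, GetResult):
        assert node.res_idx == 0, "a GetResult has a single result"
        node = node.source
    return node


loop = Reduction("loop", num_results=2)
inner = GetResult(loop, res_idx=1)  # second iter arg of the loop
chain = GetResult(GetResult(inner, 0), 0)
folded = fold_get_result_chain(chain)
```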
Signed-off-by: Boian Petkantchin <boian.petkantchin@amd.com>
Instead of generating individual element comparisons and doing `vector.insertelement`, generate the whole mask using vector ops. Add support for vector codegen when generating MLIR IR from sympy expressions. Add method `IndexingContext.iota` to generate special symbols which map to `(1,2 ... n-1)` vec expressions. `gen_sympy_index` will start to generate vector ops when encountering such symbols, inserting proper `splat`s between scalar vals when necessary. --------- Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>
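The difference between the two mask-building strategies can be sketched with plain Python lists standing in for vectors. The helpers `iota`, `splat`, `vadd`, and `vlt` are hypothetical stand-ins for the corresponding vector ops, not the `gen_sympy_index` implementation, and this sketch numbers lanes from 0. The old style does one scalar comparison and one insert per lane; the new style builds an index vector once, splatting the scalar offset and bound so they can be combined lane-wise.

```python
from typing import List

Vec = List[int]


def iota(n: int) -> Vec:
    """Lane-index vector; this sketch numbers lanes 0, 1, ..., n-1."""
    return list(range(n))


def splat(x: int, n: int) -> Vec:
    """Broadcast a scalar to an n-wide vector."""
    return [x] * n


def vadd(a: Vec, b: Vec) -> Vec:
    return [x + y for x, y in zip(a, b)]


def vlt(a: Vec, b: Vec) -> List[bool]:
    return [x < y for x, y in zip(a, b)]


def mask_per_element(start: int, bound: int, n: int) -> List[bool]:
    # Old style: one scalar comparison per lane, inserted element by element.
    mask = [False] * n
    for i in range(n):
        mask[i] = start + i < bound
    return mask


def mask_vectorized(start: int, bound: int, n: int) -> List[bool]:
    # New style: whole-vector ops, with scalars splatted where they meet vectors.
    idx = vadd(splat(start, n), iota(n))
    return vlt(idx, splat(bound, n))
```

For example, `mask_vectorized(6, 8, 4)` covers indices 6 through 9 against a bound of 8 and enables only the first two lanes.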
…#179)
* Adds an option to `aot.export(import_symbolic_shape_expressions=True)` to enable emission of torch-mlir symbolic shape constraints. This is currently set to False until IREE is ready to ingest these by default.

Rough sequence of work in IREE proper:
* Custom lowering of `torch.symbolic_int` and `torch.bind_symbolic_shape` ops to IREE util "assume" ops. Note that we are only planning to lower "terminal" bindings (basically function arguments and a couple of other such categories).
* Canonicalizations to ensure that assume equalities are == 0 (versus the native form from torch where they assume a non zero equality).
* Fusion will clone corresponding bindings on dependent dims into dispatch regions.
* Existing linalg shape analysis extended and queryable by codegen.
---------
Signed-off-by: Stella Laurenzo <stellaraccident@gmail.com>
Signed-off-by: Harsh Menon <harsh@nod-labs.com>
No description provided.