[TRITON-MLIR] Merge `master` #995

ptillet · 2022-12-20T00:21:08Z

No description provided.


For stupid reasons, ops on int8 are 3 times slower than on int, and for another set of stupid reasons we are not using cudaMemset for `zero_`, so using `int8` buffer in `do_bench` makes it slow.

Co-authored-by: Philippe Tillet <phil@openai.com>


ptillet and others added 19 commits December 19, 2022 12:26

[BACKEND] Fixed missing barrier in basic reduction codegen

2eeee0c

.

8803c84

[TESTING] allclose fixup (#724)

a65cb5a

[TUTORIALS] Attention tutorial fixup

ee481ad

.

7bb72cd

[RUNTIME] Fixed JIT bug that led some constexpr values to be overridden by specialization parameters (#742)

0373f0a

[RUNTIME] Add callback functions for external tools (#738)

873d708

[DOCS] Fixed typos in 01-vector-add.py (#751)

5fac643

[RUNTIME] support multiple devices in the same process (#757)

8f38047

[RUNTIME] Make entry point cache key depend on triton version hash (#765)

f3a1976

Add bf16/fp16/fp64 support for ty_to_cpp (#800)

16a0286

In ```torch._inductor```, we [convert 0d CPU tensors to scalars during Triton codegen](pytorch/pytorch#87329), so we need to add the missing Triton support for bf16/fp16/fp64.

[BUILD] Now using cibuildwheel default

e0d25a2

Revert "[BUILD] Now using cibuildwheel default"

715c960

This reverts commit 584086f.

[DOCS] Add install from source instructions to README (#821)

454d906

Better NVIDIA Pascal GPU Support (#827)

d43dfea

[FRONTEND] Fix ExternLibrary(format=) bug; type annotate build_extern.py (#883)

fc8e5ca

Ran mypy over `build_extern.py` and cleaned up type annotations. Found and fixed a bug where `ExternLibrary(format=)` was being ignored.

Fix format double substitution bug: {i} => {{i}} (#886)

443c91e

The previous `{i}` was silently expanding to the `i` from the enumeration loop on `regular_args` (when it wasn't empty).
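A minimal sketch of this class of bug, assuming the template was built with a Python f-string and filled in later with `str.format`. The template text and the contents of `regular_args` are hypothetical and are not taken from the Triton source.

```python
# Hypothetical argument list; the real contents of regular_args are not shown in this PR.
regular_args = ["x_ptr", "n_elements"]

for i, arg in enumerate(regular_args):
    pass  # the loop variable i stays bound after the loop ends (here i == 1)

# Buggy: the f-string substitutes {i} right away, using the leaked loop variable.
buggy_template = f"arg_{i} = args[{i}]"

# Fixed: doubled braces survive the f-string as a literal {i} placeholder,
# which a later .format(i=...) call can fill in.
fixed_template = f"arg_{{i}} = args[{{i}}]"

rendered = [fixed_template.format(i=k) for k in range(3)]

print(buggy_template)  # the placeholder is already gone
print(fixed_template)  # the placeholder is preserved
print(rendered)
```

With an empty `regular_args`, the loop never binds `i`, so the buggy f-string would raise a `NameError` rather than expanding silently (unless another `i` happens to be in scope). That may be why the problem only showed up as silent expansion when the list was non-empty.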

Merge branch 'triton-mlir' into phil/triton-mlir-merge-master

4283266

ptillet merged commit 8d496ed into triton-mlir Dec 20, 2022

ptillet deleted the phil/triton-mlir-merge-master branch December 20, 2022 01:03

ZzEeKkAa pushed a commit to ZzEeKkAa/triton that referenced this pull request Aug 5, 2024

[intel] Remove ClusterOpstoLLVM pass (triton-lang#995)

9d6d945

The pass lowers nvidia_gpu operations, which is not needed for the Intel backend.

Signed-off-by: Whitney Tsang <whitney.tsang@intel.com>
