Matmul benchmarking: case without tile quantization #1980
Conversation
Force-pushed from 97c6ee9 to d0c3fc9
Force-pushed from da4b1b7 to 4d8daea
@@ -20,6 +20,7 @@ if(USE_CUDA)
softmax_backward.cpp
Note to myself: I have split this file out and merged it separately in #2007.
This file is no longer needed here.
benchmarks/cpp/nvfuser/matmul.cpp
Outdated
@@ -0,0 +1,356 @@
#include <torch/csrc/jit/codegen/cuda/arith.h>
Note to myself: I have split this file out and merged it separately in #2007.
This file is no longer needed here.
After rebasing, this PR is just a trivial PR adding a test; I will merge it now at the bottom of the stack.
This is the benchmarking PR in this series; it tracks the performance achieved by this stack of PRs.
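For context on the PR title: tile quantization happens when a matmul's problem dimensions are not exact multiples of the kernel's tile size, so the tiles along the edges of the output are only partially filled and waste compute; the "case without tile quantization" benchmarks sizes where every tile is full. The sketch below is illustrative only, not the PR's benchmark code: the 128x128x32 CTA tile shape and the `noTileQuantization` helper are assumptions.

```cpp
// Minimal sketch, NOT the PR's actual benchmark code. The CTA tile shape
// (128 x 128 x 32) is assumed for illustration only.
#include <cstdio>

struct TileShape {
  int m, n, k;
};

// A problem size suffers no tile quantization when every dimension is an
// exact multiple of the tile, so no tile along an edge is partially full.
bool noTileQuantization(int M, int N, int K, const TileShape& cta) {
  return M % cta.m == 0 && N % cta.n == 0 && K % cta.k == 0;
}

int main() {
  const TileShape cta{128, 128, 32};  // assumed tile shape

  // 2048 x 3456 x 2048 divides evenly: every 128x128 output tile is full.
  std::printf("2048x3456x2048: %s\n",
              noTileQuantization(2048, 3456, 2048, cta) ? "no quantization"
                                                        : "quantized");

  // 2050 x 3456 x 2048 leaves a partially filled row of tiles along M.
  std::printf("2050x3456x2048: %s\n",
              noTileQuantization(2050, 3456, 2048, cta) ? "no quantization"
                                                        : "quantized");
  return 0;
}
```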
Most recent run on A100: