[Operator] Adding CPU support for matrix multiplication #251

Merged May 28, 2023

Commits (114)
8d0aae4
now remember to backup...
BolinSNLHM Apr 25, 2023
8f8bcca
...
BolinSNLHM Apr 25, 2023
3b8f9c1
change 4x4 kernel to avx intrinsics
BolinSNLHM Apr 26, 2023
7536d30
added some type info
BolinSNLHM Apr 26, 2023
a4ef3e9
commit before changing the compilation command
BolinSNLHM Apr 26, 2023
088970c
now can compile with avx intrinsics
BolinSNLHM Apr 26, 2023
0d406a2
added 32x8 primitives for CPU
BolinSNLHM Apr 29, 2023
9916989
added O3 compiler option
BolinSNLHM Apr 29, 2023
edc5e67
...
BolinSNLHM Apr 29, 2023
9a0aa6a
added more primitives
BolinSNLHM Apr 29, 2023
db2c683
...
BolinSNLHM Apr 29, 2023
33b4451
slight modification of opt88 file
BolinSNLHM Apr 29, 2023
1ad9e7d
added 32x8 imports where necessary
BolinSNLHM Apr 29, 2023
46f1d63
modified two scratch files
BolinSNLHM Apr 29, 2023
3d69a5f
five2: quite some speedup compared to how little has been down in add…
BolinSNLHM Apr 29, 2023
1302698
..... fixed dumb error
BolinSNLHM Apr 29, 2023
f799531
..
BolinSNLHM Apr 29, 2023
b93b408
8x8 kernel: efficiency improved again
BolinSNLHM Apr 29, 2023
3bbc4bd
reordering: some improvements
BolinSNLHM Apr 30, 2023
b053dc5
reordering loop gets a slight boost
BolinSNLHM Apr 30, 2023
9ffa73f
working on packing: back up midway
BolinSNLHM Apr 30, 2023
b29d61a
commented out redundant codes
BolinSNLHM Apr 30, 2023
613e3e2
a version of packing that does not yield much benefit...
BolinSNLHM Apr 30, 2023
a1a6c5e
...
BolinSNLHM Apr 30, 2023
035dca8
fix conflicts
BolinSNLHM Apr 30, 2023
3be1845
resolved conflict
BolinSNLHM Apr 30, 2023
0372647
......
BolinSNLHM Apr 30, 2023
6c67af0
working on packing B: some bugs for now:
BolinSNLHM Apr 30, 2023
fb3ca73
still hasn't figured out packing of B... move to using pointer?
BolinSNLHM May 1, 2023
4982ddf
first version of packing works?
BolinSNLHM May 1, 2023
894ee8a
really strange behavior regarding those definitions...
BolinSNLHM May 1, 2023
47980ce
seems like there's benefit in setting MC large
BolinSNLHM May 2, 2023
578b925
seems like aligning didn't do much
BolinSNLHM May 2, 2023
d0ba954
performance still not satisfactory yet; try to handle general case fo…
BolinSNLHM May 2, 2023
01a33e3
working on general: now at least in the work-in-progress the nice siz…
BolinSNLHM May 2, 2023
9d441a6
finally support for arbitrary size...
BolinSNLHM May 3, 2023
79c1c09
...
BolinSNLHM May 3, 2023
4da2612
working on refactoring; backup
BolinSNLHM May 3, 2023
e9132ff
first version of refactoring macrokernel
BolinSNLHM May 3, 2023
4d487be
what... segfault for only one case after refactoring
BolinSNLHM May 3, 2023
d7ba0f1
finished refactoring macro-kernel
BolinSNLHM May 3, 2023
c8faf37
refactored macro-kernel
BolinSNLHM May 3, 2023
273c0fd
why is it slower after refactoring??
BolinSNLHM May 3, 2023
5fe11a3
finished refactoring out the micro-kernel
BolinSNLHM May 3, 2023
a28d300
little details
BolinSNLHM May 3, 2023
12faa70
change MC to 2048
BolinSNLHM May 3, 2023
aec8b5f
...
BolinSNLHM May 3, 2023
c950603
10x8 does not work so well
BolinSNLHM May 3, 2023
a461637
6x16 really makes a difference
BolinSNLHM May 3, 2023
4596028
Merge branch 'hidet-org:main' into main
BolinSNLHM May 4, 2023
5a5f8f9
Merge branch 'main' of github.com:BolinSNLHM/hidet into main
BolinSNLHM May 4, 2023
d25195a
start working on parallel
BolinSNLHM May 4, 2023
e416720
start workng on parallel
BolinSNLHM May 4, 2023
b9fc9d4
so far the best got...
BolinSNLHM May 4, 2023
e2b34c7
first try... need to experiment more
BolinSNLHM May 4, 2023
b74cbc6
... play with nthreads, go to paper
BolinSNLHM May 4, 2023
66cb61b
nthreads=24 currently promising
BolinSNLHM May 4, 2023
35821ff
stop playing with block sizes for now...
BolinSNLHM May 4, 2023
db3fb2a
exploring parallelizing the third loop
BolinSNLHM May 4, 2023
61a3a44
Merge branch 'hidet-org:main' into main
BolinSNLHM May 6, 2023
8243937
...
BolinSNLHM May 6, 2023
bd872ff
Merge branch 'main' of github.com:BolinSNLHM/hidet into main
BolinSNLHM May 6, 2023
58f6edc
Merge branch 'main' into bolin
BolinSNLHM May 6, 2023
99828d9
eliminate for loops
BolinSNLHM May 6, 2023
fc926f5
removed that parallelizing 3rd loop: seems like a bad idea for some r…
BolinSNLHM May 6, 2023
58b23ce
strange error; push for backup
BolinSNLHM May 8, 2023
afda10a
Merge branch 'hidet-org:main' into main
BolinSNLHM May 8, 2023
a4f2ca8
Merge branch 'hidet-org:main' into bolin
BolinSNLHM May 8, 2023
cc47d1b
finished debugging; seems like they ran slower than before?
BolinSNLHM May 9, 2023
64ac8a3
worked out the first version of the schedule template; the issue w/ o…
BolinSNLHM May 10, 2023
618b0c1
first benchmark...
BolinSNLHM May 10, 2023
4b33400
trying tvm
BolinSNLHM May 15, 2023
5b3e1e3
moving to the server
BolinSNLHM May 15, 2023
e141cf2
...
BolinSNLHM May 15, 2023
03c5ea2
some more trying files...
BolinSNLHM May 17, 2023
9340268
Merge branch 'hidet-org:main' into main
BolinSNLHM May 17, 2023
8e95190
commit before checking out to main...
BolinSNLHM May 21, 2023
bb03b68
Merge branch 'main' of github.com:BolinSNLHM/hidet into main
BolinSNLHM May 21, 2023
5b0f01f
Merge branch 'main' into bolin
BolinSNLHM May 21, 2023
61cd1c7
...
BolinSNLHM May 21, 2023
3e4a16c
working on replicating the oneDNN ref impl in hidet script
BolinSNLHM May 21, 2023
7912abd
Merge branch 'hidet-org:main' into main
BolinSNLHM May 22, 2023
18278cd
Merge branch 'hidet-org:main' into main
BolinSNLHM May 23, 2023
aa2cc45
commit b4 pulling for pointer arithmetic
BolinSNLHM May 23, 2023
ef115e5
solving merge conflict
BolinSNLHM May 23, 2023
7d73e8e
.
BolinSNLHM May 23, 2023
529c07a
..
BolinSNLHM May 23, 2023
1cb9bd6
Merge branch 'hidet-org:main' into main
BolinSNLHM May 23, 2023
6fd08a0
Merge branch 'main' of github.com:BolinSNLHM/hidet into main
BolinSNLHM May 23, 2023
435a401
.
BolinSNLHM May 23, 2023
771c15b
Merge branch 'hidet-org:main' into main
BolinSNLHM May 23, 2023
2d8f8bd
..
BolinSNLHM May 23, 2023
1be5c12
Merge branch 'main' of github.com:BolinSNLHM/hidet into main
BolinSNLHM May 23, 2023
888a285
Merge branch 'main' into bolin
BolinSNLHM May 23, 2023
0b3e45a
.
BolinSNLHM May 24, 2023
54cd1b6
changed codegen to use dynamic
BolinSNLHM May 25, 2023
4424c7d
I should try smaller blocks?
BolinSNLHM May 25, 2023
087eae1
still something wrong with packing with pointer arithmetics...
BolinSNLHM May 26, 2023
ef36d60
.
BolinSNLHM May 26, 2023
322a082
.
BolinSNLHM May 26, 2023
9b46a2d
.
BolinSNLHM May 26, 2023
7a94c2d
deleting
BolinSNLHM May 26, 2023
e3210ab
deleting
BolinSNLHM May 26, 2023
2af5bbf
cleanup
BolinSNLHM May 27, 2023
df0158f
lint
BolinSNLHM May 27, 2023
3904a12
lint
BolinSNLHM May 26, 2023
fcbb094
..
BolinSNLHM May 27, 2023
7b63a91
Merge branch 'cpu-matmul' of github.com:BolinSNLHM/hidet into cpu-matmul
BolinSNLHM May 27, 2023
8bb26e6
.
BolinSNLHM May 27, 2023
a86907e
Update .gitignore
BolinSNLHM May 27, 2023
d6150a6
Update python/hidet/backend/build.py
BolinSNLHM May 27, 2023
ca6e382
addressed changes + test case
BolinSNLHM May 27, 2023
6ebb52f
re-arranged test order so all tests passed....
BolinSNLHM May 27, 2023
1813af8
forgot to run format/lint
BolinSNLHM May 27, 2023
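
The commit messages above mention a micro-kernel, a macro-kernel, packing, and tile shapes such as 6x16. The kernel source itself (`matmul_f32_x86`) is not shown on this page, so the following is only a plain-Python sketch of the blocked loop structure those terms usually refer to. The name `blocked_matmul`, the parameters `mr`, `nr`, `kc`, and their defaults are illustrative assumptions, not the PR's implementation; packing, SIMD and threading are left out.

```python
def blocked_matmul(a, b, mr=6, nr=16, kc=4):
    """Multiply nested lists a (m x k) and b (k x n) one tile at a time.

    mr x nr is the (assumed) micro-kernel tile of C; kc blocks the reduction.
    """
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for p0 in range(0, k, kc):            # block over the reduction dimension
        p1 = min(p0 + kc, k)
        for i0 in range(0, m, mr):        # macro-kernel: walk tile rows
            i1 = min(i0 + mr, m)          # clamp so arbitrary sizes work
            for j0 in range(0, n, nr):    # macro-kernel: walk tile columns
                j1 = min(j0 + nr, n)
                # micro-kernel: update one tile of C (at most mr x nr)
                for i in range(i0, i1):
                    for j in range(j0, j1):
                        acc = c[i][j]
                        for p in range(p0, p1):
                            acc += a[i][p] * b[p][j]
                        c[i][j] = acc
    return c
```

The `min(...)` clamps handle shapes that are not multiples of the tile size, which is the situation the "support for arbitrary size" commit refers to; how the PR actually handles edge tiles is not visible here.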
10 changes: 7 additions & 3 deletions python/hidet/backend/build.py
@@ -120,14 +120,14 @@ def compile(self, src_path: str, out_lib_path: str, options: Optional[Dict[str,
  *['-L{}'.format(library_dir) for library_dir in self.library_dirs],
  # optimize host side code via -O3
  '-O3',
- # enable openmp support for cpu kernels
- '-Xcompiler -fopenmp',
+ # host compiler options: enable openmp, avx2, unroll loops and fast math
+ '-Xcompiler -fopenmp,-fPIC,-m64,-mavx2,-march=native,-O3,-funroll-loops,-ffast-math',
  # the target PTX and SASS version.
  '-gencode arch=compute_{cc},code=sm_{cc}'.format(cc=cc_code),
  # allow ptxas (PTX assembler) to output information like register/smem usage.
  '--ptxas-options=-v',
  # compile into position independent code.
- '--compiler-options -fPIC',
+ # '--compiler-options -fPIC,-m64,-mavx2,-march=native, -O3',
  # embed the line information into the binary, allow Nsight Compute to get the source code for profiling.
  '-lineinfo',
  # ftz=true and prec-div=false for fast math
@@ -184,6 +184,10 @@ def compile(self, src_path: str, out_lib_path: str, options: Optional[Dict[str,
  *['-L{}'.format(library_dir) for library_dir in self.library_dirs],
  # apply -O3 optimization.
  '-O3',
+ # support avx intrinsics
+ '-mavx2',
+ '-m64',
+ '-march=native',
  # compile into position independent code.
  '-fPIC',
  # enable OpenMP.
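
To make the effect of the second hunk concrete, here is a minimal sketch of how such a flag list turns into a host compiler command. The function name `cpu_compile_command`, the `g++` executable, and the `-fopenmp` and `-shared` entries are assumptions (the hunk is cut off right after the `# enable OpenMP.` comment); only `-O3`, `-mavx2`, `-m64`, `-march=native` and `-fPIC` are taken from the diff.

```python
import shlex


def cpu_compile_command(src_path, out_lib_path, include_dirs=(), library_dirs=()):
    """Assemble a gcc-style command line (hypothetical helper, not hidet's API)."""
    return [
        'g++',                                   # assumption: host compiler name
        *['-I{}'.format(d) for d in include_dirs],
        *['-L{}'.format(d) for d in library_dirs],
        '-O3',                                   # from the diff
        '-mavx2', '-m64', '-march=native',       # from the diff: AVX intrinsics support
        '-fPIC',                                 # from the diff
        '-fopenmp',                              # assumption: flag behind "# enable OpenMP."
        '-shared',                               # assumption: build a shared library
        src_path, '-o', out_lib_path,
    ]


print(shlex.join(cpu_compile_command('kernel.cc', 'lib.so')))
```

One consequence worth keeping in mind: `-march=native` targets the instruction set of the build machine, so a library compiled this way may fail on a CPU that lacks those instructions.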
9 changes: 8 additions & 1 deletion python/hidet/backend/codegen.py
@@ -441,7 +441,9 @@ def visit_ForStmt(self, stmt: ForStmt):
  doc += NewLine() + '#pragma unroll'
  elif stmt.attr.parallel:
  if stmt.attr.parallel_threads:
- doc += NewLine() + '#pragma omp parallel for num_threads({})'.format(stmt.attr.parallel_threads)
+ doc += NewLine() + '#pragma omp parallel for schedule(dynamic) num_threads({})'.format(
+ stmt.attr.parallel_threads
+ )
  else:
  doc += NewLine() + '#pragma omp parallel for'
  doc += NewLine() + Text('for (') + init_doc + '; ' + cond_doc + '; ' + update_doc + ') '
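
The change in this hunk can be read in isolation as a small string-building rule. The sketch below mirrors the two branches with a standalone function; `omp_pragma` is a hypothetical name, and the real code appends to a `Doc` object rather than returning a string.

```python
def omp_pragma(parallel_threads=None):
    """Return the OpenMP pragma the hunk above would emit for a parallel loop."""
    if parallel_threads:
        # After this PR: dynamic scheduling is requested whenever a thread count is given.
        return '#pragma omp parallel for schedule(dynamic) num_threads({})'.format(parallel_threads)
    # Unchanged branch: no thread count, no explicit schedule.
    return '#pragma omp parallel for'


print(omp_pragma(24))
print(omp_pragma())
```

With `schedule(dynamic)`, iterations are handed to threads as they become free instead of being split up front, which can help when iterations take unequal time, at the cost of some scheduling overhead. Note that the branch without `parallel_threads` still uses the default schedule.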
@@ -555,6 +557,8 @@ def visit_DataType(self, t: DataType):
  'tfloat32': 'tfloat32_t',
  'complex64': 'complex64_t',
  'complex128': 'complex128_t',
+ 'float32x4': '__m128',
+ 'float32x8': '__m256',
  }
  return Text(scalar_type_map[t.name])

@@ -613,6 +617,8 @@ def require_headers(self) -> Doc:
  doc += Text('#include <hidet/runtime/cuda/complex.h>') + NewLine()
  doc += Text('#include <hidet/runtime/cuda/context.h>') + NewLine()

+ doc += Text('#include <immintrin.h>') + NewLine()
+
  # nvcc use float to 'store' tfloat32 data
  doc += Text('typedef float tfloat32_t;') + NewLine()
  doc += Text('typedef __nv_bfloat16 bfloat16_t;') + NewLine()
@@ -684,6 +690,7 @@ def require_headers(self) -> Doc:
  doc += Text('#include <hidet/runtime/cpu/float16.h>') + NewLine()
  doc += Text('#include <hidet/runtime/cpu/bfloat16.h>') + NewLine()
  doc += Text('#include <hidet/runtime/cpu/complex.h>') + NewLine()
+ doc += Text('#include <immintrin.h>')
  doc += NewLine()
  return doc
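
The `visit_DataType` and `require_headers` hunks belong together: `__m128` and `__m256` are the x86 SIMD register types that `<immintrin.h>` declares, so the generated source needs that header once the new type names appear. As a quick sanity check of the mapping, the sketch below derives the register width from the hidet-style type name; `VECTOR_TYPE_MAP` copies the two entries from the diff, while `vector_bits` is a hypothetical helper that assumes names of the form `float<bits>x<lanes>`.

```python
# The two entries added to the type map in the diff above.
VECTOR_TYPE_MAP = {'float32x4': '__m128', 'float32x8': '__m256'}


def vector_bits(type_name):
    """Total width in bits of a type named like 'float32x8' (hypothetical helper)."""
    base, lanes = type_name.rsplit('x', 1)        # e.g. 'float32', '8'
    bits_per_lane = int(base[len('float'):])      # e.g. 32
    return bits_per_lane * int(lanes)


for name, c_type in VECTOR_TYPE_MAP.items():
    # '__m128' and '__m256' carry their register width in the type name.
    assert vector_bits(name) == int(c_type[len('__m'):])
    print(name, '->', c_type, '({} bits)'.format(vector_bits(name)))
```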

2 changes: 1 addition & 1 deletion python/hidet/graph/ops/__init__.py
@@ -18,7 +18,7 @@
  from .definitions.conv2d_transpose import conv2d_transpose, conv2d_transpose_gemm
  from .definitions.conv3d import conv3d, conv3d_gemm
  from .definitions.conv3d_transpose import conv3d_transpose
- from .definitions.matmul import batch_matmul, matmul
+ from .definitions.matmul import batch_matmul, matmul, matmul_x86
  from .definitions.pool import avg_pool2d, avg_pool3d, adaptive_avg_pool1d, adaptive_avg_pool2d, adaptive_avg_pool3d
  from .definitions.pool import max_pool2d, max_pool3d, adaptive_max_pool1d, adaptive_max_pool2d, adaptive_max_pool3d
  from .definitions.activation import relu, leaky_relu, sigmoid, hardsigmoid, clip, relu6, prelu, gelu, silu, hardswish
2 changes: 1 addition & 1 deletion python/hidet/graph/ops/definitions/__init__.py
@@ -34,7 +34,7 @@
  from .conv3d_transpose import conv3d_transpose
  from .matmul import batch_matmul, matmul

- from .matmul import BatchMatmulOp, MatmulOp
+ from .matmul import BatchMatmulOp, MatmulOp, Matmulx86Op
  from .conv2d import Conv2dOp
  from .arithmetic import ErfOp, PowOp, AddOp, SubtractOp, MultiplyOp, DivideOp, WhereOp
  from .compare import EqualOp
4 changes: 4 additions & 0 deletions python/hidet/graph/ops/definitions/matmul/__init__.py
@@ -12,3 +12,7 @@
  from .matmul import matmul, MatmulOp, MatmulTask
  from .batch_matmul import batch_matmul, BatchMatmulOp, BatchMatmulTask
  from . import resolve
+
+ from .matmul_f32_x86 import matmul_x86
+
+ from .matmul_f32_x86 import MatmulF32Taskx86, Matmulx86Op