Take multi_headed_attention as an example.
- Implement your layer in ./layers/multi_headed_attention.cpp.
- Register multi_headed_attention.cpp in ./layers/CMakeLists.txt.
- Add the Python API in ./turbo_transformers/python/turbo_transformers/layers/modeling_decoder.py.
- Register it in ./turbo_transformers/python/turbo_transformers/layers/__init__.py.
- Add a `py::class_` binding in ./turbo_transformers/python/pybind.cpp.
- Add a unit test in ./turbo_transformers/python/tests/multi_headed_attention_test.py.
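The Python-side steps (wrapping the compiled op, then unit-testing the wrapper) can be sketched roughly as follows. Everything here is illustrative, not the repository's actual API: `cxx_multi_headed_attention` stands in for whatever function the `py::class_` binding in pybind.cpp would export, and is stubbed in pure Python so the sketch runs on its own.

```python
import unittest

# Hypothetical stand-in for the compiled op that pybind.cpp would export.
# Real code would dispatch to the C++ kernel in
# ./layers/multi_headed_attention.cpp; here we only validate shapes and
# echo the value tensor so the sketch is self-contained.
def cxx_multi_headed_attention(query, key, value, num_heads):
    assert len(query) == len(key) == len(value)
    assert len(query[0]) % num_heads == 0  # hidden size divisible by heads
    return [row[:] for row in value]

# Wrapper in the style of modeling_decoder.py: a thin Python class that
# owns the compiled op and exposes a callable API.
class MultiHeadedAttention:
    def __init__(self, num_heads):
        self.num_heads = num_heads

    def __call__(self, query, key, value):
        return cxx_multi_headed_attention(query, key, value, self.num_heads)

# Unit test in the style of multi_headed_attention_test.py: feed a tiny
# input through the wrapper and check the output shape.
class MultiHeadedAttentionTest(unittest.TestCase):
    def test_output_shape(self):
        attn = MultiHeadedAttention(num_heads=2)
        q = k = v = [[0.0] * 4 for _ in range(3)]  # seq_len=3, hidden=4
        out = attn(q, k, v)
        self.assertEqual(len(out), 3)
        self.assertEqual(len(out[0]), 4)

if __name__ == "__main__":
    # exit=False keeps the script usable when embedded in other runners.
    unittest.main(argv=["mha_test"], exit=False)
```

In the real layer, the test would also compare the TurboTransformers output numerically against a reference implementation (e.g. the corresponding PyTorch module) rather than only checking shapes.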