Commit

improve styling and link to benchmark script in Performance
4imothy committed Nov 7, 2024
1 parent b5a593e commit 8ef3539
Showing 2 changed files with 10 additions and 3 deletions.
7 changes: 7 additions & 0 deletions docs/conf.py
@@ -52,6 +52,8 @@ def run(self):
rst_prolog = f'''
.. _repo: {repo}
.. |repo| replace:: **Source Code**
+.. _script: {repo_main + '/bench/gather.py'}
+.. |script| replace:: *script*
.. _custom: {repo_src + '/custom'}
.. |custom| replace:: *custom*
.. _custom_cmake: {repo_main + '/cmake/custom.cmake'}
@@ -62,6 +64,11 @@ def run(self):
.. |doc| replace:: **Documentation**
.. |name| replace:: *{ai3.__name__}*
.. |pkg_name| replace:: *{pkg_name}*
+.. _cuDNN: https://developer.nvidia.com/cudnn
+.. |cuDNN| replace:: *cuDNN*
+.. _SYCL: https://www.khronos.org/sycl
+.. |SYCL| replace:: *SYCL*
'''

project = ai3.__name__
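
Note on the hunks above: Sphinx prepends `rst_prolog` to every source file, so pairing a `.. _name: URL` target with a `.. |name| replace::` substitution lets any page write `|name|_` to get a styled link. A minimal standalone sketch of how the f-string expands, assuming a hypothetical `repo_main` value (the real one is defined earlier in `docs/conf.py` and is not shown in this diff) and a hypothetical helper `build_prolog`:

```python
# Hypothetical value for illustration; the real repo_main is defined
# earlier in docs/conf.py and does not appear in this diff.
repo_main = "https://example.com/ai3/tree/main"


def build_prolog(repo_main: str) -> str:
    """Return reST definitions of the kind Sphinx prepends to every page."""
    return f'''
.. _script: {repo_main + "/bench/gather.py"}
.. |script| replace:: *script*
.. _cuDNN: https://developer.nvidia.com/cudnn
.. |cuDNN| replace:: *cuDNN*
'''


rst_prolog = build_prolog(repo_main)
print(rst_prolog)
```

With these definitions in the prolog, a page can write `|script|_` and the rendered text is an italic "script" linking to the target URL.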
6 changes: 3 additions & 3 deletions docs/performance.rst
@@ -20,9 +20,9 @@ Performance

.. _details:

-The `cuDNN <https://developer.nvidia.com/cudnn>`_ and `SYCL
-<https://www.khronos.org/sycl/>`_ benchmarks for both *ai3* and *PyTorch* were
+The |cudnn|_ and |sycl|_ benchmarks for both *ai3* and *PyTorch* were
 gathered using an *NVIDIA GeForce L40S GPU* with *16* gigabytes of memory. The
 final latencies used are the average over *10* runs after *10* warm up runs.
 The implementations for the algorithms include select ones provided by *cuDNN*
-and implementations from *ai3* which leverage *SYCL*.
+and implementations from *ai3* which leverage *SYCL*. Benchmarks are
+gathered using this |script|_.
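
The measurement procedure described in this paragraph (10 untimed warm-up runs, then the mean of 10 timed runs) can be sketched as follows. This is not the contents of `bench/gather.py`, which is not shown in this diff; `average_latency` and the CPU-only workload are hypothetical stand-ins chosen so the example runs anywhere.

```python
import time
from statistics import mean


def average_latency(fn, warmup=10, runs=10):
    """Run fn `warmup` times untimed, then return the mean latency in
    seconds over `runs` timed calls."""
    for _ in range(warmup):
        fn()  # warm-up: caches, lazy initialization, and similar one-time costs
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        # For GPU work, the device would need to be synchronized here
        # (for example with torch.cuda.synchronize()) before reading the
        # clock, because kernel launches are asynchronous.
        latencies.append(time.perf_counter() - start)
    return mean(latencies)


# Hypothetical CPU workload standing in for a model forward pass.
calls = []
latency = average_latency(lambda: calls.append(sum(range(10_000))))
print(f"mean latency: {latency:.6f} s")
```

Discarding the warm-up runs keeps one-time setup costs out of the reported average.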
