Commit be239c0

guyang3532 authored and facebook-github-bot committed
Merge Plugin/0.2 (#284)
Summary: Pull Request resolved: #284

Reviewed By: leitian, chaekit

Differential Revision: D29052618

Pulled By: gdankel

fbshipit-source-id: 9f38cdfc7c7e73f5f62844ef857ebe6fed46f30a
1 parent 897a49c commit be239c0

101 files changed, +16146 -2407 lines changed


.github/workflows/tb_plugin_ci.yml

Lines changed: 3 additions & 3 deletions
@@ -5,13 +5,13 @@ on:
     branches:
       - master
       - release/**
-      - tb_plugin
+      - plugin/**

   pull_request:
     branches:
       - master
       - release/**
-      - tb_plugin
+      - plugin/**

 jobs:
   build:
@@ -37,6 +37,6 @@ jobs:
           set -e
           cd tb_plugin
           sh ./ci_scripts/install_env.sh
-          pip install .
+          pip install .[gs]
           cd test
           pytest

tb_plugin/README.md

Lines changed: 170 additions & 49 deletions
Large diffs are not rendered by default.

tb_plugin/docs/gpu_utilization.md

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+* GPU Utilization: GPU busy time / all steps time. The higher, the better. All steps time is the total time of all profiler steps (also called iterations).
+GPU busy time is the time during “all steps time” when at least one GPU kernel is running on this GPU.
+However, this high-level utilization metric is coarse. It can’t tell how many SMs (Streaming Multiprocessors) are in use.
+For example, a kernel with a single thread running continuously will get 100% GPU utilization.
+
+* Est. SM Efficiency: Estimated Streaming Multiprocessor Efficiency. The higher, the better. For a single kernel, this metric is SM_Eff_K = min(blocks of this kernel / SM number of this GPU, 100%).
+The overall number is the sum of all kernels' SM_Eff_K weighted by each kernel's execution duration, divided by “all steps time”.
+It shows the utilization of the GPU's Streaming Multiprocessors.
+Although it is finer grained than “GPU Utilization” above, it still can’t tell the whole story.
+For example, a kernel with only one thread per block can’t fully utilize each SM.
+
+* Est. Achieved Occupancy: The higher, the better. The definition of occupancy is [here](https://docs.nvidia.com/gameworks/content/developertools/desktop/analysis/report/cudaexperiments/kernellevel/achievedoccupancy.htm).
+Occupancy is the ratio of active warps on an SM to the maximum number of
+active warps supported by the SM. The theoretical occupancy of a kernel is the upper limit of that kernel's occupancy, limited by multiple
+factors such as the kernel shape, the resources the kernel uses, and the GPU compute capability.
+Est. Achieved Occupancy of a kernel is OCC_K = min(threads of the kernel / SM number / max threads per SM, theoretical occupancy of the kernel).
+The overall number is the weighted sum of all kernels' OCC_K, using each kernel's execution duration as the weight. It shows fine-grained, low-level GPU utilization.
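
The three metrics defined in gpu_utilization.md can be illustrated with a minimal Python sketch. This is not the plugin's implementation: the `Kernel` record, the function names, and the sample numbers are hypothetical and exist only for illustration. Two assumptions are made where the text is not explicit: GPU busy time is computed as the length of the union of kernel intervals, and the overall occupancy is normalized by the total kernel duration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Kernel:
    """Hypothetical record of one kernel launch (not a plugin type)."""
    start: float                  # start timestamp, any consistent time unit
    end: float                    # end timestamp, same unit
    blocks: int                   # number of thread blocks launched
    threads_per_block: int
    theoretical_occupancy: float  # upper limit, in [0, 1]


def busy_time(intervals: List[Tuple[float, float]]) -> float:
    """Length of the union of intervals, so overlapping kernels count once."""
    total = 0.0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Close the previous merged interval and open a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def gpu_utilization(kernels: List[Kernel], all_steps_time: float) -> float:
    """GPU busy time / all steps time."""
    return busy_time([(k.start, k.end) for k in kernels]) / all_steps_time


def est_sm_efficiency(kernels: List[Kernel], all_steps_time: float,
                      sm_count: int) -> float:
    """Duration-weighted sum of SM_Eff_K, divided by all steps time."""
    weighted = sum(min(k.blocks / sm_count, 1.0) * (k.end - k.start)
                   for k in kernels)
    return weighted / all_steps_time


def est_achieved_occupancy(kernels: List[Kernel], sm_count: int,
                           max_threads_per_sm: int) -> float:
    """Duration-weighted OCC_K.

    Assumption: weights are normalized by the total kernel duration.
    """
    total_duration = sum(k.end - k.start for k in kernels)
    if total_duration == 0:
        return 0.0
    weighted = 0.0
    for k in kernels:
        threads = k.blocks * k.threads_per_block
        occ_k = min(threads / sm_count / max_threads_per_sm,
                    k.theoretical_occupancy)
        weighted += occ_k * (k.end - k.start)
    return weighted / total_duration


# Made-up sample: two overlapping kernels on a GPU with 80 SMs.
sample = [
    Kernel(start=0, end=10, blocks=40, threads_per_block=256,
           theoretical_occupancy=1.0),
    Kernel(start=5, end=20, blocks=160, threads_per_block=128,
           theoretical_occupancy=0.5),
]
print(gpu_utilization(sample, all_steps_time=40))
print(est_sm_efficiency(sample, all_steps_time=40, sm_count=80))
print(est_achieved_occupancy(sample, sm_count=80, max_threads_per_sm=2048))
```

In this sample the GPU is busy half of the time, yet the estimated occupancy is far lower, which matches the point in the text that the coarse utilization number can hide poorly filled SMs.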