pytorch
diff --git a/.github/workflows/tb_plugin_ci.yml b/.github/workflows/tb_plugin_ci.yml
Lines changed: 3 additions & 3 deletions
diff --git a/tb_plugin/README.md b/tb_plugin/README.md
Lines changed: 170 additions & 49 deletions
diff --git a/tb_plugin/docs/gpu_utilization.md b/tb_plugin/docs/gpu_utilization.md
Lines changed: 17 additions & 0 deletions
diff --git a/tb_plugin/docs/images/control_panel.PNG b/tb_plugin/docs/images/control_panel.PNG
-2.28 KB
diff --git a/tb_plugin/docs/images/distributed_view.PNG b/tb_plugin/docs/images/distributed_view.PNG
85.2 KB
diff --git a/tb_plugin/docs/images/kernel_view.PNG b/tb_plugin/docs/images/kernel_view.PNG
-27.5 KB
diff --git a/tb_plugin/docs/images/memory_view.PNG b/tb_plugin/docs/images/memory_view.PNG
136 KB
diff --git a/tb_plugin/docs/images/operator_view.PNG b/tb_plugin/docs/images/operator_view.PNG
-10 KB
diff --git a/tb_plugin/docs/images/overall_view.PNG b/tb_plugin/docs/images/overall_view.PNG
33 KB
diff --git a/tb_plugin/docs/images/trace_view.PNG b/tb_plugin/docs/images/trace_view.PNG
99.4 KB
@@ -5,13 +5,13 @@ on:
     branches:
       - master
       - release/**
-      - tb_plugin
+      - plugin/**
 
   pull_request:
     branches:
       - master
       - release/**
-      - tb_plugin
+      - plugin/**
 
 jobs:
   build:
@@ -37,6 +37,6 @@ jobs:
           set -e
           cd tb_plugin
           sh ./ci_scripts/install_env.sh
-          pip install .
+          pip install .[gs]
           cd test
           pytest
@@ -0,0 +1,17 @@
+* GPU Utilization: GPU busy time / all steps time. Higher is better. All steps time is the total time of all profiler steps (also called iterations).
+                   GPU busy time is the time during “all steps time” when at least one GPU kernel is running on this GPU.
+                   However, this high-level utilization metric is coarse. It can’t tell how many SMs (Streaming Multiprocessors) are in use.
+                   For example, a kernel with a single thread running continuously will get 100% GPU utilization.
+
+* Est. SM Efficiency: Estimated Streaming Multiprocessor Efficiency. Higher is better. For a single kernel, SM_Eff_K = min(blocks of this kernel / SM count of this GPU, 100%).
+                      The overall number is the sum of all kernels' SM_Eff_K, each weighted by that kernel's execution duration, divided by “all steps time”.
+                      It shows how well the GPU's Streaming Multiprocessors are utilized.
+                      Although it is finer-grained than “GPU Utilization” above, it still can’t tell the whole story.
+                      For example, a kernel with only one thread per block can’t fully utilize each SM.
+
+* Est. Achieved Occupancy: Higher is better. Occupancy is defined [here](https://docs.nvidia.com/gameworks/content/developertools/desktop/analysis/report/cudaexperiments/kernellevel/achievedoccupancy.htm).
+                           Occupancy is the ratio of active warps on an SM to the maximum number of
+                           active warps supported by the SM. The theoretical occupancy of a kernel is the upper limit on that kernel's occupancy, determined by multiple
+                           factors such as the kernel's shape, the resources the kernel uses, and the GPU's compute capability.
+                           For a single kernel, OCC_K = min(threads of the kernel / SM count / max threads per SM, theoretical occupancy of the kernel).
+                           The overall number is the weighted sum of all kernels' OCC_K, using each kernel's execution duration as the weight. It shows fine-grained, low-level GPU utilization.
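
The three metrics defined in `gpu_utilization.md` can be illustrated with a small Python sketch. This is not the plugin's implementation: the `Kernel` record, its field names, and the function names are hypothetical, chosen only to mirror the formulas in the text. Two assumptions go beyond the text: a kernel's thread count is taken as blocks times threads per block, and the occupancy weights are normalized by total kernel duration so that the result stays a ratio.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Kernel:
    """Hypothetical record of one GPU kernel launch (times in microseconds)."""
    start_us: float
    duration_us: float
    blocks: int
    threads_per_block: int
    theoretical_occupancy: float  # in [0, 1]


def gpu_busy_time(kernels: List[Kernel]) -> float:
    """Total time during which at least one kernel is running (interval union)."""
    busy = 0.0
    current_end = float("-inf")
    for k in sorted(kernels, key=lambda k: k.start_us):
        end = k.start_us + k.duration_us
        if k.start_us > current_end:
            busy += k.duration_us          # disjoint from everything seen so far
            current_end = end
        elif end > current_end:
            busy += end - current_end      # only the non-overlapping tail counts
            current_end = end
    return busy


def gpu_utilization(kernels: List[Kernel], all_steps_time_us: float) -> float:
    return gpu_busy_time(kernels) / all_steps_time_us


def sm_eff_k(k: Kernel, sm_count: int) -> float:
    # SM_Eff_K = min(blocks of this kernel / SM count, 100%)
    return min(k.blocks / sm_count, 1.0)


def est_sm_efficiency(kernels: List[Kernel], sm_count: int,
                      all_steps_time_us: float) -> float:
    weighted = sum(sm_eff_k(k, sm_count) * k.duration_us for k in kernels)
    return weighted / all_steps_time_us


def occ_k(k: Kernel, sm_count: int, max_threads_per_sm: int) -> float:
    # Assumption: "threads of the kernel" = blocks * threads per block.
    threads = k.blocks * k.threads_per_block
    return min(threads / sm_count / max_threads_per_sm, k.theoretical_occupancy)


def est_achieved_occupancy(kernels: List[Kernel], sm_count: int,
                           max_threads_per_sm: int) -> float:
    # Assumption: weights are normalized by the total kernel duration.
    total = sum(k.duration_us for k in kernels)
    if total == 0:
        return 0.0
    weighted = sum(occ_k(k, sm_count, max_threads_per_sm) * k.duration_us
                   for k in kernels)
    return weighted / total


# Two overlapping kernels on a hypothetical GPU with 80 SMs and
# 2048 threads per SM, profiled over 200 us of step time.
kernels = [
    Kernel(start_us=0, duration_us=100, blocks=40,
           threads_per_block=256, theoretical_occupancy=1.0),
    Kernel(start_us=50, duration_us=100, blocks=160,
           threads_per_block=1, theoretical_occupancy=0.5),
]
util = gpu_utilization(kernels, all_steps_time_us=200)
sm_eff = est_sm_efficiency(kernels, sm_count=80, all_steps_time_us=200)
occ = est_achieved_occupancy(kernels, sm_count=80, max_threads_per_sm=2048)
```

The second kernel shows why the metrics diverge: with 160 blocks it saturates the per-kernel SM efficiency at 100%, yet with one thread per block its estimated occupancy is close to zero.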