merge plugin/vnext to main (pytorch#546)
Summary:
1. Add documentation
2. Fix an incorrect merge in the tensor core allowlist
3. PyTorch Lightning improvements

Pull Request resolved: pytorch#546

Reviewed By: chaekit

Differential Revision: D34569540

Pulled By: robieta

fbshipit-source-id: deab4c5f6f7d82cf49d018facc5c3b7682d29489
guotuofeng authored and facebook-github-bot committed Mar 4, 2022
1 parent 897a3e1 commit fc0d7bf
Showing 11 changed files with 58 additions and 11 deletions.
42 changes: 42 additions & 0 deletions tb_plugin/README.md
@@ -429,6 +429,48 @@ one worker is much larger than others, there may be a problem of load balance
* Data Transfer Time (us): Total time actually used for data transfer in operators of this type.
* Ave Data Transfer Time (us): Average time actually used for data transfer in each operator of this type.

* Module View

If torch.nn.Module information is dumped into the resulting Chrome tracing file by the PyTorch profiler, the plugin can display the nn.Module hierarchy and a summary. A sketch of producing such a trace is shown after the list below.

![Alt text](./docs/images/module_view.png)

* The top table shows statistics for each torch.nn.Module, including:
    * Occurrences: how many times the module is called during training.
    * Operators: how many operators the module invokes.
    * Host Total Time: the accumulated time spent on the host, including child submodules.
    * Host Self Time: the accumulated time spent on the host, excluding child submodules.
    * Device Total Time: the accumulated GPU time of the operators contained in the module, including child submodules.
    * Device Self Time: the accumulated GPU time of the operators contained in the module, excluding child submodules.

* The middle flame graph shows the torch.nn.Module hierarchy.
* The bottom graph shows the main thread's operator tree.
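
As a hedged illustration (not part of this commit), a trace that feeds Module View might be produced as follows. `with_modules` is the relevant switch in recent PyTorch versions, and the model and log directory here are placeholders:

```python
# A minimal sketch, assuming a recent PyTorch: record torch.nn.Module call
# information in the trace so the plugin can build Module View.
import torch
import torch.nn as nn
from torch.profiler import ProfilerActivity, profile, tensorboard_trace_handler

model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 10))
inputs = torch.randn(32, 64)

with profile(
    activities=[ProfilerActivity.CPU],
    on_trace_ready=tensorboard_trace_handler("./logs/module_demo"),  # placeholder log dir
    with_modules=True,  # dump the nn.Module hierarchy into the Chrome trace
):
    model(inputs)
```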

* Lightning View

If the Chrome tracing file comes from a PyTorch Lightning job, the plugin shows a Lightning View customized for PyTorch Lightning.
All data in this view comes from the PyTorch Lightning framework. A sketch of producing such a trace is shown after the list below.

![Alt text](./docs/images/lightning_view.png)

* The top table shows the model structure; the metrics have the same meaning as in Module View.
* The middle flame graph shows the model hierarchy.
* The bottom graph shows the call tree of all PyTorch Lightning hooks.
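
As a hedged illustration (not part of this commit), one way to get such a trace from a Lightning job is the `PyTorchProfiler` integration; the API below matches pytorch_lightning around 1.5, and the paths are placeholders:

```python
# A minimal sketch, assuming pytorch_lightning >= 1.5: attach the PyTorch
# profiler so the exported Chrome trace carries Lightning hook information.
import pytorch_lightning as pl
from pytorch_lightning.profiler import PyTorchProfiler

profiler = PyTorchProfiler(dirpath="./logs/lightning_demo",  # placeholder output dir
                           filename="trace")
trainer = pl.Trainer(max_epochs=1, profiler=profiler)
# trainer.fit(model, datamodule)  # your own LightningModule / DataModule
```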

* Diff Run View

The diff run feature compares two runs on a logical timeline. The key comparison operators include backward, dataloader, torch.nn.Module, and optimizer. If an operator contains sub-operators, the diff can be zoomed in by clicking its bar. The self-versus-total distinction used in the table is sketched after the list below.

![Alt text](./docs/images/diff_view.png)

* The top bar chart shows each operator type and the trend comparison result.
* The middle line chart shows the delta and the accumulated execution-time difference for each operator type.
* The bottom table shows the operator differences for the following categories:
    * Host Total Duration: the accumulated time spent on the host, including this operator's child operators.
    * Host Self Duration: the accumulated time spent on the host, excluding this operator's child operators.
    * Device Total Duration: the accumulated time spent on the GPU, including this operator's child operators.
    * Device Self Duration: the accumulated time spent on the GPU, excluding this operator's child operators.
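
To make "self" versus "total" concrete, here is an illustrative sketch (not the plugin's actual implementation) of how exclusive time falls out of an operator tree:

```python
# Illustrative only: self (exclusive) time is total (inclusive) time minus
# the total time of the operator's direct children.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Op:
    name: str
    total_us: float                      # inclusive: this op plus all children
    children: List["Op"] = field(default_factory=list)

    @property
    def self_us(self) -> float:
        return self.total_us - sum(c.total_us for c in self.children)

conv = Op("aten::conv2d", total_us=120.0,
          children=[Op("aten::convolution", total_us=110.0)])
assert conv.self_us == 10.0              # 120 us total, 110 us in the child
```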

### PyTorch Profiler TensorBoard Plugin 0.2 Release Notes

Known Issues: This software does not support Python 3.9.0, 3.9.1, 3.9.2.
3 changes: 1 addition & 2 deletions tb_plugin/ci_scripts/install_env.sh
@@ -26,8 +26,7 @@ if [ "$PYTORCH_VERSION" = "nightly" ]; then
    pip install --pre torchvision --no-deps -f "https://download.pytorch.org/whl/nightly/$CUDA_VERSION/torch_nightly.html"
  elif [ "$PYTORCH_VERSION" = "1.11rc" ]; then
    pip install --pre torch -f "https://download.pytorch.org/whl/test/$CUDA_VERSION/torch_test.html"
-   #pip install --pre torchvision --no-deps -f "https://download.pytorch.org/whl/test/$CUDA_VERSION/torch_test.html"
-   pip install --pre torchvision --no-deps -f "https://download.pytorch.org/whl/nightly/$CUDA_VERSION/torch_nightly.html"
+   pip install --pre torchvision --no-deps -f "https://download.pytorch.org/whl/test/$CUDA_VERSION/torch_test.html"
  elif [ "$PYTORCH_VERSION" = "stable" ]; then
    pip install torch torchvision
  fi
Binary file added tb_plugin/docs/images/diff_view.png
Binary file added tb_plugin/docs/images/lightning_view.png
Binary file added tb_plugin/docs/images/module_view.png
7 changes: 5 additions & 2 deletions tb_plugin/fe/src/app.tsx
@@ -48,7 +48,8 @@ export enum Views {
    Trace = 'Trace',
    Distributed = 'Distributed',
    Memory = 'Memory',
-   Module = 'Module'
+   Module = 'Module',
+   Lightning = 'Lightning'
  }

const ViewNames = {
@@ -58,7 +59,8 @@ const ViewNames = {
    [Views.Trace]: Views.Trace,
    [Views.Distributed]: Views.Distributed,
    [Views.Memory]: Views.Memory,
-   [Views.Module]: Views.Module
+   [Views.Module]: Views.Module,
+   [Views.Lightning]: Views.Lightning
  }

const drawerWidth = 340
@@ -407,6 +409,7 @@ export const App = () => {
    case Views.Memory:
      return <MemoryView run={run} worker={worker} span={span} />
    case Views.Module:
+   case Views.Lightning:
      return <ModuleView run={run} worker={worker} span={span} />
  }
} else {
1 change: 1 addition & 0 deletions tb_plugin/torch_tb_profiler/consts.py
@@ -23,6 +23,7 @@
  DISTRIBUTED_VIEW = View(5, 'distributed', 'Distributed')
  MEMORY_VIEW = View(6, 'memory', 'Memory')
  MODULE_VIEW = View(7, 'module', 'Module')
+ LIGHTNING_VIEW = View(8, 'lightning', 'Lightning')

TOOLTIP_GPU_UTIL = \
'GPU Utilization:\n' \
7 changes: 4 additions & 3 deletions tb_plugin/torch_tb_profiler/profiler/module_op.py
@@ -4,8 +4,8 @@
  from collections import namedtuple
  from typing import Dict, Generator, Iterable, List, Optional, Set, Tuple, Union

- from .node import (DataLoaderNode, ModuleNode, OperatorNode, PLModuleNode,
-                    ProfilerStepNode, is_operator_node)
+ from .node import (DataLoaderNode, ModuleNode, OperatorNode, OptimizerNode,
+                    PLModuleNode, ProfilerStepNode, is_operator_node)
  from .trace import BaseEvent, EventTypes, PLModuleEvent, PythonFunctionEvent


@@ -186,7 +186,8 @@ def _aggregate_modules(modules: Iterable[Union[ModuleNode, PLModuleNode]]) -> Di
  def _get_node_list(tid2tree: Dict[int, OperatorNode], node_class) -> Generator[OperatorNode, None, None]:
      """Get all nodes with node_class from the operator tree"""
      def traverse_node(node):
-         if type(node) not in (ProfilerStepNode, ModuleNode, OperatorNode, PLModuleNode, DataLoaderNode):
+         # Check OptimizerNode here because in PyTorch Lightning, PLModuleNode is under OptimizerNode.
+         if type(node) not in (ProfilerStepNode, ModuleNode, OperatorNode, OptimizerNode, PLModuleNode, DataLoaderNode):
              return

          if isinstance(node, node_class):
4 changes: 3 additions & 1 deletion tb_plugin/torch_tb_profiler/profiler/run_generator.py
@@ -69,7 +69,9 @@ def generate_run_profile(self):

  profile_run.module_stats = aggegate_module_view(self.profile_data.tid2tree, self.profile_data.events)
  profile_run.pl_module_stats = aggegate_pl_module_view(self.profile_data.tid2tree, self.profile_data.events)
- if profile_run.module_stats or (profile_run.is_pytorch_lightning and profile_run.pl_module_stats):
+ if profile_run.is_pytorch_lightning and profile_run.pl_module_stats:
+     profile_run.views.append(consts.LIGHTNING_VIEW)
+ elif profile_run.module_stats:
      profile_run.views.append(consts.MODULE_VIEW)

  return profile_run
3 changes: 1 addition & 2 deletions tb_plugin/torch_tb_profiler/profiler/tensor_core.py
@@ -26,8 +26,7 @@ def __contains__(cls, item):

  class TC_OP_Allowlist(metaclass=TC_Allowlist_Meta):
      # Refer to https://github.com/pytorch/pytorch/blob/69b2bf70f9c0e591ce5e566afa59e19618031ead/aten/src/ATen/autocast_mode.cpp#L290-L351 # noqa: E501
-     allowlist = ['aten::_convolution', 'aten::_convolution_nogroup',
-                  'aten::conv1d', 'aten::conv2d', 'aten::conv3d', 'aten::conv_tbc',
+     allowlist = ['aten::_convolution', 'aten::conv1d', 'aten::conv2d', 'aten::conv3d', 'aten::conv_tbc',
                   'aten::conv_transpose1d', 'aten::conv_transpose2d', 'aten::conv_transpose3d',
                   'aten::convolution', 'aten::cudnn_convolution', 'aten::cudnn_convolution_transpose',
                   'aten::prelu', 'aten::addmm', 'aten::addmv', 'aten::addr',
2 changes: 1 addition & 1 deletion tb_plugin/torch_tb_profiler/static/index.html

Large diffs are not rendered by default.
