Example for combining DDP + RPC #800
Summary: The example includes a simple model consisting of a sparse part and a dense part. The sparse part is an nn.EmbeddingBag stored on a parameter server, and the dense part is an nn.Linear module residing on the trainers. The dense part on the trainers is replicated via DistributedDataParallel. A master creates the nn.EmbeddingBag and drives the training loop on the trainers. The training loop performs an embedding lookup via the Distributed RPC Framework and then executes the local dense component.
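For readers who want a feel for the sparse/dense split described in the summary, here is a minimal single-process sketch. It is not the code from this PR: the class name `HybridModelSketch`, the sizes, and the sample batch are assumptions made for illustration. The embedding lookup is a plain local call standing in for the RPC to the parameter server, and the nn.Linear layer is left unwrapped where the real example would use DistributedDataParallel.

```python
import torch
import torch.nn as nn

# Sizes are assumptions chosen for this sketch only.
NUM_EMBEDDINGS = 100
EMBEDDING_DIM = 16
NUM_CLASSES = 8


class HybridModelSketch(nn.Module):
    """Sparse embedding lookup followed by a local dense layer."""

    def __init__(self, emb_lookup):
        super().__init__()
        # Stand-in for the remote lookup: in a distributed run this callable
        # would send the indices to the parameter server over RPC.
        self.emb_lookup = emb_lookup
        # Dense part owned by the trainer. A distributed run would wrap this
        # layer in DistributedDataParallel; the sketch keeps it plain.
        self.fc = nn.Linear(EMBEDDING_DIM, NUM_CLASSES)

    def forward(self, indices, offsets):
        emb = self.emb_lookup(indices, offsets)
        return self.fc(emb)


torch.manual_seed(0)

# The "parameter server" side: an EmbeddingBag kept outside the trainer model.
embedding_bag = nn.EmbeddingBag(NUM_EMBEDDINGS, EMBEDDING_DIM, mode="sum")

# A lambda keeps the EmbeddingBag from being registered as a submodule, which
# mirrors the idea that the trainer does not own the sparse parameters.
model = HybridModelSketch(lambda idx, off: embedding_bag(idx, off))

# Two bags: indices[0:4] and indices[4:8].
indices = torch.tensor([1, 2, 4, 5, 4, 3, 2, 9])
offsets = torch.tensor([0, 4])
target = torch.tensor([3, 5])

output = model(indices, offsets)
loss = nn.CrossEntropyLoss()(output, target)
# Gradients flow back through the dense layer into the embedding table.
loss.backward()
```

In this sketch the trainer's own parameters are only `fc.weight` and `fc.bias`, yet the backward pass still produces a gradient for `embedding_bag.weight`, which is the part a distributed setup would have to route back to the parameter server.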
rpc.shutdown()

if __name__=="__main__":
can we add this to the run_python_examples.sh script?
It looks like distributed is commented out in that script: https://github.com/pytorch/examples/blob/master/run_python_examples.sh#L178. I don't see any other distributed/rpc examples in that script either. I'm wondering if there was a reason to disable them.
hmm... was just commented out a week ago, could have been by mistake:
oh, actually they were added as commented out. I think we can just add a distributed function and uncomment
Looking at #794 it seems like the goal was to have the entire script run within 5 minutes. If we add distributed to it, I don't think we can satisfy the goal without updating other examples in distributed. I'd prefer to make this change in a separate PR.
LGTM! Can we add a requirements.txt file to mention that this feature needs v1.6.0+?
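One possible way to express this requirement, offered as a sketch and not as what the PR ended up doing: a requirements.txt line such as `torch>=1.6.0`, optionally paired with a runtime guard. The helper names `parse_version` and `meets_minimum` below are hypothetical, and the parser only reads the leading numeric components, so a pre-release string like `1.6.0a0` is treated as 1.6.0.

```python
import re

# Minimum release named in the review comment above.
MIN_VERSION = (1, 6, 0)


def parse_version(version):
    """Extract (major, minor, patch) from strings such as '1.6.0+cu101'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = match.groups()
    # A missing patch component (e.g. '1.6') is treated as 0.
    return int(major), int(minor), int(patch or 0)


def meets_minimum(version, minimum=MIN_VERSION):
    """Return True when the version is at least the required minimum."""
    return parse_version(version) >= minimum
```

At the top of the example script, a guard could call `meets_minimum(torch.__version__)` and exit with a clear message when it returns False.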
cc @jlin27
* Update feature classification labels
* Update NVidia -> Nvidia
* Bring back default filename_pattern so that by default we run all galleries. Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Tutorial for DDP + RPC. Summary: Based on example from pytorch/examples#800
* Add to main section
* Added separate code file and used literalinclude

Co-authored-by: Jessica Lin <jplin@fb.com>
Co-authored-by: Edward Z. Yang <ezyang@fb.com>
Co-authored-by: pritam <pritam.damania@fb.com>
* Add TorchScript fork/join tutorial * Add note about zipfile format in serialization tutorial * Profiler recipe (#1019) * Profiler recipe Summary: Adding a recipe for profiler Test Plan: make html-noplot * [mobile] Mobile Perf Recipe * Minor syntax edits to mobile perf recipe * Remove built files * [android] android native app recipe * [mobile_perf][recipe] Add ChannelsLast recommendation * Adding distributed pipeline parallel tutorial * Add async execution tutorials * Fix code block in pipeline tutorial * Adding an Overview Page for PyTorch Distributed (#1056) * Adding an Overview Page for PyTorch Distributed * Let existing PT Distributed tutorials link to the overview page * Add a link to AMP * Address Comments * Remove unnecessary dist.barrier() * [Mobile Perf Recipe] Add the benchmarking part for iOS (#1055) * [Mobile Perf Recipe] Add the benchmarking part for iOS * [Mobile Perf Recipe] Add the benchmarking part for iOS Co-authored-by: Jessica Lin <jplin@fb.com> * RPC profiling recipe (#1068) * Initial commit * Update * Complete most of recipe * Add image * Link image * Remove extra file * update * Update * update * Push latest changes from master into release/1.6 (#1074) * Update feature classification labels * Update NVidia -> Nvidia * Bring back default filename_pattern so that by default we run all galleries. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Add prototype_source directory * Add prototype directory * Add prototype * Remove extra "done" * Add REAME.txt * Update for prototype instructions * Update for prototype feature * refine torchvision_tutorial doc for windows * Update neural_style_tutorial.py (#1059) Updated the mistake in the Loading Images Section. * torch_script_custom_ops restructure (#1057) Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Port custom ops tutorial to new registration API, increase testability. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Kill some other occurrences of RegisterOperators Signed-off-by: Edward Z. 
Yang <ezyang@fb.com> * Update README.md * Make torch_script_custom_classes tutorial runnable I also fixed some warnings in the tutorial, and fixed some minor bitrot (e.g., torch::script::Module to torch::jit::Module) I also added some missing quotes around some bash expansions. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Update torch_script_custom_classes to use TORCH_LIBRARY (#1062) Signed-off-by: Edward Z. Yang <ezyang@fb.com> Co-authored-by: Edward Z. Yang <ezyang@fb.com> Co-authored-by: Yang Gu <yangu@microsoft.com> Co-authored-by: Hritik Bhandari <bhandari.hritik@gmail.com> * Tutorial for DDP + RPC (#1071) * Update feature classification labels * Update NVidia -> Nvidia * Bring back default filename_pattern so that by default we run all galleries. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Tutorial for DDP + RPC. Summary: Based on example from pytorch/examples#800 * Add to main section Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Added separate code file and used literalinclude Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: Co-authored-by: Jessica Lin <jplin@fb.com> Co-authored-by: Edward Z. Yang <ezyang@fb.com> Co-authored-by: pritam <pritam.damania@fb.com> * Make RPC profiling recipe into prototype tutorial (#1078) * Add RPC tutorial * Update to include recipes * Add Graph Mode Dynamic Quant tutorial (#1065) * Update feature classification labels * Update NVidia -> Nvidia * Bring back default filename_pattern so that by default we run all galleries. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Add prototype_source directory * Add prototype directory * Add prototype * Remove extra "done" * Add REAME.txt * Update for prototype instructions * Update for prototype feature * refine torchvision_tutorial doc for windows * Update neural_style_tutorial.py (#1059) Updated the mistake in the Loading Images Section. * torch_script_custom_ops restructure (#1057) Signed-off-by: Edward Z. 
Yang <ezyang@fb.com> * Port custom ops tutorial to new registration API, increase testability. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Kill some other occurrences of RegisterOperators Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Update README.md * Make torch_script_custom_classes tutorial runnable I also fixed some warnings in the tutorial, and fixed some minor bitrot (e.g., torch::script::Module to torch::jit::Module) I also added some missing quotes around some bash expansions. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Update torch_script_custom_classes to use TORCH_LIBRARY (#1062) Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Add Graph Mode Dynamic Quant tutorial Summary: Tutorial to demonstrate graph mode dynamic quant on BERT model. Currently not directly runnable as it requires to download glue dataset and fine-tuned model Co-authored-by: Jessica Lin <jplin@fb.com> Co-authored-by: Edward Z. Yang <ezyang@fb.com> Co-authored-by: Yang Gu <yangu@microsoft.com> Co-authored-by: Hritik Bhandari <bhandari.hritik@gmail.com> * Add mobile recipes images * Update mobile recipe index * Remove RPC Profiling recipe from index * 1.6 model freezing tutorial (#1077) * Update feature classification labels * Update NVidia -> Nvidia * Bring back default filename_pattern so that by default we run all galleries. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Add prototype_source directory * Add prototype directory * Add prototype * Remove extra "done" * Add REAME.txt * Update for prototype instructions * Update for prototype feature * refine torchvision_tutorial doc for windows * Update neural_style_tutorial.py (#1059) Updated the mistake in the Loading Images Section. * torch_script_custom_ops restructure (#1057) Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Port custom ops tutorial to new registration API, increase testability. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Kill some other occurrences of RegisterOperators Signed-off-by: Edward Z. 
Yang <ezyang@fb.com> * Update README.md * Make torch_script_custom_classes tutorial runnable I also fixed some warnings in the tutorial, and fixed some minor bitrot (e.g., torch::script::Module to torch::jit::Module) I also added some missing quotes around some bash expansions. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Update torch_script_custom_classes to use TORCH_LIBRARY (#1062) Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Add Model Freezing in TorchScript Co-authored-by: Edward Z. Yang <ezyang@fb.com> Co-authored-by: Yang Gu <yangu@microsoft.com> Co-authored-by: Hritik Bhandari <bhandari.hritik@gmail.com> * Update title * Update recipes_index.rst Touch for rebuild. * Update dcgan_faces_tutorial.py Update labels to be floats to work around torch.full inference change. Co-authored-by: James Reed <jamesreed@fb.com> Co-authored-by: ilia-cher <30845429+ilia-cher@users.noreply.github.com> Co-authored-by: Ivan Kobzarev <ivankobzarev@fb.com> Co-authored-by: Shen Li <shenli@devfair017.maas> Co-authored-by: Shen Li <cs.shenli@gmail.com> Co-authored-by: Tao Xu <taox@fb.com> Co-authored-by: Rohan Varma <rvarm1@fb.com> Co-authored-by: Edward Z. Yang <ezyang@fb.com> Co-authored-by: Yang Gu <yangu@microsoft.com> Co-authored-by: Hritik Bhandari <bhandari.hritik@gmail.com> Co-authored-by: Pritam Damania <9958665+pritamdamania87@users.noreply.github.com> Co-authored-by: pritam <pritam.damania@fb.com> Co-authored-by: supriyar <supriyar@fb.com> Co-authored-by: Brian Johnson <brianjo@fb.com> Co-authored-by: gchanan <gchanan@fb.com>
* Add TorchScript fork/join tutorial * Add note about zipfile format in serialization tutorial * Profiler recipe (#1019) * Profiler recipe Summary: Adding a recipe for profiler Test Plan: make html-noplot * [mobile] Mobile Perf Recipe * Minor syntax edits to mobile perf recipe * Remove built files * [android] android native app recipe * [mobile_perf][recipe] Add ChannelsLast recommendation * Adding distributed pipeline parallel tutorial * Add async execution tutorials * Fix code block in pipeline tutorial * Adding an Overview Page for PyTorch Distributed (#1056) * Adding an Overview Page for PyTorch Distributed * Let existing PT Distributed tutorials link to the overview page * Add a link to AMP * Address Comments * Remove unnecessary dist.barrier() * [Mobile Perf Recipe] Add the benchmarking part for iOS (#1055) * [Mobile Perf Recipe] Add the benchmarking part for iOS * [Mobile Perf Recipe] Add the benchmarking part for iOS Co-authored-by: Jessica Lin <jplin@fb.com> * Add files via upload * Create numeric_suite_tutorial.py * jlin27_numeric_suite_tutorial Made some syntax edits because original headings were not rendering properly and breaking the build: - Removed the lines of pound sign (#) delimiters under text because when placed under text, it renders them all as headers - Add lines of pound delimiters above certain blocks of text to force them to show up as plain text between the code rather than comments with the code - Added code syntax (e.g.``compare_weights``) Suggestions: - Link to code or documentation (for example in the beginning when referencing new code or new concepts) - Add a conclusion section with links to references or learn more at the end - Examples: https://pytorch.org/tutorials/intermediate/dynamic_quantization_bert_tutorial.html#conclusion Fixes: - Currently the tutorial references images in `/_static/img/` but they are placed in `/_static/`. Make sure these match up. 
* Delete compare_output.png * Delete compare_stub.png * Delete shadow.png * Add files via upload * RPC profiling recipe (#1068) * Initial commit * Update * Complete most of recipe * Add image * Link image * Remove extra file * update * Update * update * Update numeric_suite_tutorial.py * Update numeric_suite_tutorial.py * Push latest changes from master into release/1.6 (#1074) * Update feature classification labels * Update NVidia -> Nvidia * Bring back default filename_pattern so that by default we run all galleries. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Add prototype_source directory * Add prototype directory * Add prototype * Remove extra "done" * Add REAME.txt * Update for prototype instructions * Update for prototype feature * refine torchvision_tutorial doc for windows * Update neural_style_tutorial.py (#1059) Updated the mistake in the Loading Images Section. * torch_script_custom_ops restructure (#1057) Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Port custom ops tutorial to new registration API, increase testability. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Kill some other occurrences of RegisterOperators Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Update README.md * Make torch_script_custom_classes tutorial runnable I also fixed some warnings in the tutorial, and fixed some minor bitrot (e.g., torch::script::Module to torch::jit::Module) I also added some missing quotes around some bash expansions. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Update torch_script_custom_classes to use TORCH_LIBRARY (#1062) Signed-off-by: Edward Z. Yang <ezyang@fb.com> Co-authored-by: Edward Z. Yang <ezyang@fb.com> Co-authored-by: Yang Gu <yangu@microsoft.com> Co-authored-by: Hritik Bhandari <bhandari.hritik@gmail.com> * Tutorial for DDP + RPC (#1071) * Update feature classification labels * Update NVidia -> Nvidia * Bring back default filename_pattern so that by default we run all galleries. Signed-off-by: Edward Z. 
Yang <ezyang@fb.com> * Tutorial for DDP + RPC. Summary: Based on example from pytorch/examples#800 * Add to main section Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Added separate code file and used literalinclude Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: Co-authored-by: Jessica Lin <jplin@fb.com> Co-authored-by: Edward Z. Yang <ezyang@fb.com> Co-authored-by: pritam <pritam.damania@fb.com> * Make RPC profiling recipe into prototype tutorial (#1078) * Add RPC tutorial * Update to include recipes * Add Graph Mode Dynamic Quant tutorial (#1065) * Update feature classification labels * Update NVidia -> Nvidia * Bring back default filename_pattern so that by default we run all galleries. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Add prototype_source directory * Add prototype directory * Add prototype * Remove extra "done" * Add REAME.txt * Update for prototype instructions * Update for prototype feature * refine torchvision_tutorial doc for windows * Update neural_style_tutorial.py (#1059) Updated the mistake in the Loading Images Section. * torch_script_custom_ops restructure (#1057) Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Port custom ops tutorial to new registration API, increase testability. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Kill some other occurrences of RegisterOperators Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Update README.md * Make torch_script_custom_classes tutorial runnable I also fixed some warnings in the tutorial, and fixed some minor bitrot (e.g., torch::script::Module to torch::jit::Module) I also added some missing quotes around some bash expansions. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Update torch_script_custom_classes to use TORCH_LIBRARY (#1062) Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Add Graph Mode Dynamic Quant tutorial Summary: Tutorial to demonstrate graph mode dynamic quant on BERT model. 
Currently not directly runnable as it requires to download glue dataset and fine-tuned model Co-authored-by: Jessica Lin <jplin@fb.com> Co-authored-by: Edward Z. Yang <ezyang@fb.com> Co-authored-by: Yang Gu <yangu@microsoft.com> Co-authored-by: Hritik Bhandari <bhandari.hritik@gmail.com> * Add mobile recipes images * Update mobile recipe index * Remove RPC Profiling recipe from index * 1.6 model freezing tutorial (#1077) * Update feature classification labels * Update NVidia -> Nvidia * Bring back default filename_pattern so that by default we run all galleries. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Add prototype_source directory * Add prototype directory * Add prototype * Remove extra "done" * Add REAME.txt * Update for prototype instructions * Update for prototype feature * refine torchvision_tutorial doc for windows * Update neural_style_tutorial.py (#1059) Updated the mistake in the Loading Images Section. * torch_script_custom_ops restructure (#1057) Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Port custom ops tutorial to new registration API, increase testability. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Kill some other occurrences of RegisterOperators Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Update README.md * Make torch_script_custom_classes tutorial runnable I also fixed some warnings in the tutorial, and fixed some minor bitrot (e.g., torch::script::Module to torch::jit::Module) I also added some missing quotes around some bash expansions. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Update torch_script_custom_classes to use TORCH_LIBRARY (#1062) Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Add Model Freezing in TorchScript Co-authored-by: Edward Z. 
Yang <ezyang@fb.com> Co-authored-by: Yang Gu <yangu@microsoft.com> Co-authored-by: Hritik Bhandari <bhandari.hritik@gmail.com> Co-authored-by: James Reed <jamesreed@fb.com> Co-authored-by: Jessica Lin <jplin@fb.com> Co-authored-by: ilia-cher <30845429+ilia-cher@users.noreply.github.com> Co-authored-by: Ivan Kobzarev <ivankobzarev@fb.com> Co-authored-by: Shen Li <shenli@devfair017.maas> Co-authored-by: Shen Li <cs.shenli@gmail.com> Co-authored-by: Tao Xu <taox@fb.com> Co-authored-by: Rohan Varma <rvarm1@fb.com> Co-authored-by: Edward Z. Yang <ezyang@fb.com> Co-authored-by: Yang Gu <yangu@microsoft.com> Co-authored-by: Hritik Bhandari <bhandari.hritik@gmail.com> Co-authored-by: Pritam Damania <9958665+pritamdamania87@users.noreply.github.com> Co-authored-by: pritam <pritam.damania@fb.com> Co-authored-by: supriyar <supriyar@fb.com> Co-authored-by: Jessica Lin <jlin2700@gmail.com>
* Add TorchScript fork/join tutorial * Add note about zipfile format in serialization tutorial * Profiler recipe (#1019) * Profiler recipe Summary: Adding a recipe for profiler Test Plan: make html-noplot * [mobile] Mobile Perf Recipe * Minor syntax edits to mobile perf recipe * Remove built files * [android] android native app recipe * [mobile_perf][recipe] Add ChannelsLast recommendation * Adding distributed pipeline parallel tutorial * Add async execution tutorials * Fix code block in pipeline tutorial * Adding an Overview Page for PyTorch Distributed (#1056) * Adding an Overview Page for PyTorch Distributed * Let existing PT Distributed tutorials link to the overview page * Add a link to AMP * Address Comments * Remove unnecessary dist.barrier() * [Mobile Perf Recipe] Add the benchmarking part for iOS (#1055) * [Mobile Perf Recipe] Add the benchmarking part for iOS * [Mobile Perf Recipe] Add the benchmarking part for iOS Co-authored-by: Jessica Lin <jplin@fb.com> * Graph mode static quantization tutorial * RPC profiling recipe (#1068) * Initial commit * Update * Complete most of recipe * Add image * Link image * Remove extra file * update * Update * update * Push latest changes from master into release/1.6 (#1074) * Update feature classification labels * Update NVidia -> Nvidia * Bring back default filename_pattern so that by default we run all galleries. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Add prototype_source directory * Add prototype directory * Add prototype * Remove extra "done" * Add REAME.txt * Update for prototype instructions * Update for prototype feature * refine torchvision_tutorial doc for windows * Update neural_style_tutorial.py (#1059) Updated the mistake in the Loading Images Section. * torch_script_custom_ops restructure (#1057) Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Port custom ops tutorial to new registration API, increase testability. Signed-off-by: Edward Z. 
Yang <ezyang@fb.com> * Kill some other occurrences of RegisterOperators Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Update README.md * Make torch_script_custom_classes tutorial runnable I also fixed some warnings in the tutorial, and fixed some minor bitrot (e.g., torch::script::Module to torch::jit::Module) I also added some missing quotes around some bash expansions. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Update torch_script_custom_classes to use TORCH_LIBRARY (#1062) Signed-off-by: Edward Z. Yang <ezyang@fb.com> Co-authored-by: Edward Z. Yang <ezyang@fb.com> Co-authored-by: Yang Gu <yangu@microsoft.com> Co-authored-by: Hritik Bhandari <bhandari.hritik@gmail.com> * Tutorial for DDP + RPC (#1071) * Update feature classification labels * Update NVidia -> Nvidia * Bring back default filename_pattern so that by default we run all galleries. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Tutorial for DDP + RPC. Summary: Based on example from pytorch/examples#800 * Add to main section Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: * Added separate code file and used literalinclude Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: Co-authored-by: Jessica Lin <jplin@fb.com> Co-authored-by: Edward Z. Yang <ezyang@fb.com> Co-authored-by: pritam <pritam.damania@fb.com> * Make RPC profiling recipe into prototype tutorial (#1078) * Add RPC tutorial * Update to include recipes * Graph mode static quantization tutorial Co-authored-by: James Reed <jamesreed@fb.com> Co-authored-by: Jessica Lin <jplin@fb.com> Co-authored-by: ilia-cher <30845429+ilia-cher@users.noreply.github.com> Co-authored-by: Ivan Kobzarev <ivankobzarev@fb.com> Co-authored-by: Shen Li <shenli@devfair017.maas> Co-authored-by: Shen Li <cs.shenli@gmail.com> Co-authored-by: Tao Xu <taox@fb.com> Co-authored-by: Rohan Varma <rvarm1@fb.com> Co-authored-by: Edward Z. 
Yang <ezyang@fb.com> Co-authored-by: Yang Gu <yangu@microsoft.com> Co-authored-by: Hritik Bhandari <bhandari.hritik@gmail.com> Co-authored-by: Pritam Damania <9958665+pritamdamania87@users.noreply.github.com> Co-authored-by: pritam <pritam.damania@fb.com>
Currently not directly runnable as it requires to download glue dataset and fine-tuned model Co-authored-by: Jessica Lin <jplin@fb.com> Co-authored-by: Edward Z. Yang <ezyang@fb.com> Co-authored-by: Yang Gu <yangu@microsoft.com> Co-authored-by: Hritik Bhandari <bhandari.hritik@gmail.com> * Add mobile recipes images * Update mobile recipe index * Remove RPC Profiling recipe from index * 1.6 model freezing tutorial (pytorch#1077) * Update feature classification labels * Update NVidia -> Nvidia * Bring back default filename_pattern so that by default we run all galleries. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Add prototype_source directory * Add prototype directory * Add prototype * Remove extra "done" * Add REAME.txt * Update for prototype instructions * Update for prototype feature * refine torchvision_tutorial doc for windows * Update neural_style_tutorial.py (pytorch#1059) Updated the mistake in the Loading Images Section. * torch_script_custom_ops restructure (pytorch#1057) Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Port custom ops tutorial to new registration API, increase testability. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Kill some other occurrences of RegisterOperators Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Update README.md * Make torch_script_custom_classes tutorial runnable I also fixed some warnings in the tutorial, and fixed some minor bitrot (e.g., torch::script::Module to torch::jit::Module) I also added some missing quotes around some bash expansions. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Update torch_script_custom_classes to use TORCH_LIBRARY (pytorch#1062) Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Add Model Freezing in TorchScript Co-authored-by: Edward Z. 
Yang <ezyang@fb.com> Co-authored-by: Yang Gu <yangu@microsoft.com> Co-authored-by: Hritik Bhandari <bhandari.hritik@gmail.com> Co-authored-by: James Reed <jamesreed@fb.com> Co-authored-by: Jessica Lin <jplin@fb.com> Co-authored-by: ilia-cher <30845429+ilia-cher@users.noreply.github.com> Co-authored-by: Ivan Kobzarev <ivankobzarev@fb.com> Co-authored-by: Shen Li <shenli@devfair017.maas> Co-authored-by: Shen Li <cs.shenli@gmail.com> Co-authored-by: Tao Xu <taox@fb.com> Co-authored-by: Rohan Varma <rvarm1@fb.com> Co-authored-by: Edward Z. Yang <ezyang@fb.com> Co-authored-by: Yang Gu <yangu@microsoft.com> Co-authored-by: Hritik Bhandari <bhandari.hritik@gmail.com> Co-authored-by: Pritam Damania <9958665+pritamdamania87@users.noreply.github.com> Co-authored-by: pritam <pritam.damania@fb.com> Co-authored-by: supriyar <supriyar@fb.com> Co-authored-by: Jessica Lin <jlin2700@gmail.com>
Summary: The example includes a simple model consisting of a sparse part
and a dense part. The sparse part is an nn.EmbeddingBag stored on a
parameter server and the dense part is an nn.Linear module residing on
the trainers. The dense part on the trainers is replicated via
DistributedDataParallel.
A master creates the nn.EmbeddingBag and drives the training loop on the
trainers. The training loop performs an embedding lookup via the
Distributed RPC Framework and then executes the local dense component.
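The sparse-then-dense forward pass described in the summary can be sketched in a single process. This is only a local stand-in, not the code from this PR: the class name HybridModelSketch, the sizes, and the sample batch are assumptions made for illustration. In the distributed version, the embedding lookup would go through the Distributed RPC Framework to the parameter server, and the nn.Linear module would be wrapped in DistributedDataParallel on each trainer.

```python
import torch
import torch.nn as nn

# Hypothetical sizes chosen for illustration only.
NUM_EMBEDDINGS = 100
EMBEDDING_DIM = 16
NUM_OUTPUTS = 8


class HybridModelSketch(nn.Module):
    """Single-process stand-in for the sparse + dense model.

    In the distributed setup the sparse part would live on a parameter
    server and be reached over RPC, and the dense part would be wrapped
    in DistributedDataParallel on each trainer. Here both are local.
    """

    def __init__(self, sparse: nn.EmbeddingBag, dense: nn.Linear):
        super().__init__()
        self.sparse = sparse
        self.dense = dense

    def forward(self, indices: torch.Tensor, offsets: torch.Tensor) -> torch.Tensor:
        # Step 1: embedding lookup (a remote call in the distributed version).
        embedded = self.sparse(indices, offsets)
        # Step 2: local dense component.
        return self.dense(embedded)


torch.manual_seed(0)
sparse = nn.EmbeddingBag(NUM_EMBEDDINGS, EMBEDDING_DIM, mode="sum")
dense = nn.Linear(EMBEDDING_DIM, NUM_OUTPUTS)
model = HybridModelSketch(sparse, dense)

# Two bags of four indices each; offsets mark where each bag starts.
indices = torch.tensor([1, 2, 4, 5, 4, 3, 2, 9], dtype=torch.long)
offsets = torch.tensor([0, 4], dtype=torch.long)

output = model(indices, offsets)
loss = output.sum()
loss.backward()  # gradients reach both the sparse and the dense part
```

Keeping the two parts as separate submodules mirrors the split in the summary: only the dense part needs gradient synchronization across trainers, while the sparse part has a single copy that every trainer queries.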