From 6f1f81db751d28e675728ab36007be05db0614af Mon Sep 17 00:00:00 2001
From: Erjia Guan
Date: Thu, 2 Jun 2022 08:40:27 -0700
Subject: [PATCH] Update tutorial about placing sharding_filter (#487)

Summary:
See the feedback from a user: https://github.com/pytorch/data/issues/454#issuecomment-1143256345

We should explicitly ask users to place `sharding_filter` as early as possible.

Pull Request resolved: https://github.com/pytorch/data/pull/487

Reviewed By: wenleix

Differential Revision: D36812259

Pulled By: ejguan

fbshipit-source-id: 4c983f3216a80be398f85b20871e65b0e41627e0
---
 docs/source/tutorial.rst | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/docs/source/tutorial.rst b/docs/source/tutorial.rst
index 910b7ca83..c5c241ded 100644
--- a/docs/source/tutorial.rst
+++ b/docs/source/tutorial.rst
@@ -176,6 +176,11 @@ When we re-run, we will get:
     ...
     n_sample = 6

+Note:
+
+- Place ``ShardingFilter`` (``datapipe.sharding_filter``) as early as possible in the pipeline, especially before expensive
+  operations such as decoding, in order to avoid repeating these expensive operations across worker/distributed processes.
+
 You can find more DataPipe implementation examples for various research domains `on this page `_.
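The rationale behind the note in this patch can be illustrated with a minimal pure-Python sketch (not the torchdata API itself; `decode` and `shard` here are hypothetical stand-ins): if each worker drops the items it does not own *before* the expensive step, that step runs once per item overall, whereas sharding after it makes every worker pay the full cost.

```python
# Sketch of why a sharding filter should precede expensive operations.
# `decode` stands in for something costly (e.g. image decoding);
# `shard` mimics round-robin sharding like sharding_filter.

decode_calls = 0

def decode(item):
    global decode_calls
    decode_calls += 1          # count how often the expensive step runs
    return item * 10

def shard(items, worker_id, num_workers):
    # Keep only the items this worker owns (round-robin by index).
    return (x for i, x in enumerate(items) if i % num_workers == worker_id)

items = list(range(6))
num_workers = 2

# Shard first, then decode: each worker decodes only its own 3 items.
good = []
for worker_id in range(num_workers):
    good.extend(decode(x) for x in shard(items, worker_id, num_workers))
calls_shard_first = decode_calls   # 6 decode calls total

# Decode first, then shard: every worker decodes all 6 items.
decode_calls = 0
bad = []
for worker_id in range(num_workers):
    bad.extend(shard((decode(x) for x in items), worker_id, num_workers))
calls_shard_last = decode_calls    # 12 decode calls total

print(calls_shard_first, calls_shard_last)  # 6 12
print(sorted(good) == sorted(bad))          # True: same data either way
```

Both orderings yield the same samples, which is exactly why the misplacement is easy to miss: only the wasted work differs, and it grows with the number of worker/distributed processes.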