From fd3d6a1ca8410c51b27ec389ab6e0b8912b753b7 Mon Sep 17 00:00:00 2001
From: erjia
Date: Wed, 1 Jun 2022 15:47:56 +0000
Subject: [PATCH] Update tutorial about placing sharding_filter

---
 docs/source/tutorial.rst | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/docs/source/tutorial.rst b/docs/source/tutorial.rst
index 910b7ca83..c5c241ded 100644
--- a/docs/source/tutorial.rst
+++ b/docs/source/tutorial.rst
@@ -176,6 +176,11 @@ When we re-run, we will get:
 ...
 n_sample = 6
 
+Note:
+
+- Place ``ShardingFilter`` (``datapipe.sharding_filter``) as early as possible in the pipeline, especially before expensive
+  operations such as decoding, in order to avoid repeating these expensive operations across worker/distributed processes.
+
 You can find more DataPipe implementation examples for various research domains `on this page `_.
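
The effect described in the added note can be sketched without any library at all. The snippet below is a minimal pure-Python simulation, not torchdata's implementation: ``shard``, ``CountingDecoder``, ``shard_then_decode``, ``decode_then_shard`` and ``total_decode_calls`` are hypothetical names introduced here for illustration, and round-robin assignment by index is an assumption about how items are split. It counts how often an "expensive" decode step runs when sharding comes before versus after it.

```python
from typing import Callable, Iterable, Iterator, List


def shard(items: Iterable, num_shards: int, shard_id: int) -> Iterator:
    """Keep every num_shards-th item, starting at shard_id (round-robin)."""
    for index, item in enumerate(items):
        if index % num_shards == shard_id:
            yield item


class CountingDecoder:
    """Stand-in for an expensive operation; records how often it is called."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, raw: str) -> str:
        self.calls += 1
        return raw.upper()  # placeholder for real decoding work


def shard_then_decode(raw_items: List[str], decode: Callable, num_shards: int, shard_id: int) -> List[str]:
    # Sharding first: this process only decodes the items it keeps.
    return [decode(x) for x in shard(raw_items, num_shards, shard_id)]


def decode_then_shard(raw_items: List[str], decode: Callable, num_shards: int, shard_id: int) -> List[str]:
    # Sharding last: this process decodes every item, then discards most of them.
    decoded = (decode(x) for x in raw_items)
    return list(shard(decoded, num_shards, shard_id))


def total_decode_calls(pipeline: Callable, raw_items: List[str], num_shards: int) -> int:
    """Run the pipeline once per simulated process and sum the decode calls."""
    total = 0
    for shard_id in range(num_shards):
        decoder = CountingDecoder()
        pipeline(raw_items, decoder, num_shards, shard_id)
        total += decoder.calls
    return total


raw = [f"sample_{i}" for i in range(6)]
early = total_decode_calls(shard_then_decode, raw, num_shards=2)
late = total_decode_calls(decode_then_shard, raw, num_shards=2)
print(early, late)  # 6 12
```

Both orderings hand each simulated process the same items, but sharding last makes every process decode the full input, so the total work grows with the number of processes. In a DataPipe graph the corresponding choice would be calling ``datapipe.sharding_filter()`` before, rather than after, an expensive ``.map(...)`` step.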