Make output buffers for argument inputs to GPU operators pinned. #3728

Merged: 2 commits into NVIDIA:main on Mar 10, 2022

Conversation

@mzient (Contributor) commented on Mar 9, 2022

Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>

Category: Other (Performance optimization)

Description:

Until now, argument inputs were not treated as lying on a stage boundary, so their buffers were not pinned. This change widens the scope of input-buffer pinning to avoid H2D copies from non-pinned (pageable) memory.
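For context, a minimal standalone CUDA sketch (not DALI code) of the effect this change targets: cudaMemcpyAsync performs a truly asynchronous DMA transfer only when the host buffer is page-locked (pinned); a copy from pageable memory is staged internally by the driver and loses the overlap with other work.

  #include <cuda_runtime.h>
  #include <cstdlib>

  int main() {
    const size_t nbytes = (1 << 20) * sizeof(float);
    float *pageable = static_cast<float *>(malloc(nbytes));
    float *pinned = nullptr;
    cudaMallocHost(&pinned, nbytes);  // page-locked (pinned) host allocation

    float *device = nullptr;
    cudaMalloc(&device, nbytes);
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Pageable source: the driver stages the copy; it is not fully async.
    cudaMemcpyAsync(device, pageable, nbytes, cudaMemcpyHostToDevice, stream);
    // Pinned source: a true asynchronous DMA transfer on the stream.
    cudaMemcpyAsync(device, pinned, nbytes, cudaMemcpyHostToDevice, stream);

    cudaStreamSynchronize(stream);
    cudaFree(device);
    cudaFreeHost(pinned);
    free(pageable);
    cudaStreamDestroy(stream);
    return 0;
  }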

Additional information:

Affected modules and functionalities:

Key points relevant for the review:

Checklist

Tests

  • Existing tests apply
  • New tests added
    • Python tests
    • GTests
    • Benchmark
    • Other
  • N/A
    Pinnedness is not observable through the public API; verified locally in a debugger that the buffers are now pinned. (A CUDA-level check is sketched after this checklist.)
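As a hedged aside (not part of this PR's tests): at the CUDA level, whether a host pointer is page-locked can be queried with cudaPointerGetAttributes, roughly as follows.

  #include <cuda_runtime.h>

  // Returns true if `ptr` points to page-locked (pinned) host memory.
  bool IsPinned(const void *ptr) {
    cudaPointerAttributes attr{};
    if (cudaPointerGetAttributes(&attr, ptr) != cudaSuccess) {
      cudaGetLastError();  // older CUDA versions error out on pageable memory
      return false;
    }
    return attr.type == cudaMemoryTypeHost;  // field is `type` in CUDA 10+
  }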

Documentation

  • Existing documentation applies
  • Documentation updated
    • Docstring
    • Doxygen
    • RST
    • Jupyter
    • Other
  • N/A

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: N/A

JIRA TASK: DALI-2649

Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
@dali-automaton (Collaborator): CI MESSAGE: [4109799]: BUILD STARTED

@JanuszL self-assigned this on Mar 9, 2022
@@ -640,13 +640,27 @@ std::vector<int> Executor<WorkspacePolicy, QueuePolicy>::GetTensorQueueSizes(con
 template <typename WorkspacePolicy, typename QueuePolicy>
 void Executor<WorkspacePolicy, QueuePolicy>::PrepinData(
     std::vector<tensor_data_store_queue_t> &tensor_to_store_queue, const OpGraph &graph) {
-  // We only pin what we need
+  // We only pin what we need:
+  // The inputs of mixed ops are potentially used for H2D copies...
   for (int i = 0; i < graph.NumOp(OpType::MIXED); i++) {
Contributor:

Now this will also apply to decoders. I'm not sure if we want this (in some cases they don't need the input to be pinned; in others we copy to a staging buffer - like https://github.com/NVIDIA/DALI/blob/main/dali/operators/decoder/nvjpeg/nvjpeg_decoder_decoupled_api.h#L867).

Contributor (Author):
Do you think it's a big problem? I could specifically exclude decoders by name, although it seems a bit ugly.

Contributor:

I was rather curious whether it was a conscious decision. Maybe you can remove this https://github.com/NVIDIA/DALI/blob/main/dali/operators/decoder/nvjpeg/nvjpeg_decoder_decoupled_api.h#L867 if you are going to pin the input anyway.
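For readers following the thread: the staging-buffer pattern being referenced looks roughly like the sketch below (illustrative names, not DALI's actual decoder code). The operator first copies a possibly-pageable input into its own pinned staging buffer so that the H2D transfer is still asynchronous; if the input is already pinned, that extra host-to-host copy becomes redundant, which is the reviewer's point.

  #include <cuda_runtime.h>
  #include <cstring>

  // Hypothetical helper: copy host data to device via a pinned staging buffer.
  // `pinned_staging` must be page-locked and must remain valid until the copy
  // enqueued on `stream` has completed.
  void CopyToDeviceViaStaging(void *dst_device, const void *src_host,
                              size_t nbytes, void *pinned_staging,
                              cudaStream_t stream) {
    // Host-to-host copy into page-locked memory (wasted work if src_host
    // is already pinned).
    std::memcpy(pinned_staging, src_host, nbytes);
    // The H2D copy now runs as a true async DMA transfer on the stream.
    cudaMemcpyAsync(dst_device, pinned_staging, nbytes,
                    cudaMemcpyHostToDevice, stream);
  }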

@dali-automaton (Collaborator): CI MESSAGE: [4109799]: BUILD PASSED

Comment on lines +647 to +656
for (int j = 0; j < node.spec.NumInput(); ++j) {
  auto tid = node.parent_tensors[j];
  // Use pinned memory only when it is useful
  if (node.spec.name() == "MakeContiguous" && node.spec.NumOutput() == 1) {
    auto &parent_tensor_queue =
        get_queue<OpType::CPU, StorageDevice::CPU>(tensor_to_store_queue_[tid]);
    for (auto &tensor : parent_tensor_queue) {
      tensor->set_pinned(node.spec.OutputDevice(0) == "gpu" && !RestrictPinnedMemUsage());
    }
  }
}
Contributor:

bool pinned = node.spec.OutputDevice(0) == "gpu" && !RestrictPinnedMemUsage();
for (int j = 0; j < node.spec.NumInput(); ++j) {
  ...
  for (auto &tensor : parent_tensor_queue) {
    tensor->set_pinned(pinned);
  }

You can extract the condition outside of both loops.

@jantonguirao self-assigned this on Mar 10, 2022
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
@dali-automaton (Collaborator): CI MESSAGE: [4117733]: BUILD STARTED

@dali-automaton (Collaborator): CI MESSAGE: [4117733]: BUILD PASSED

@mzient merged commit 925ea0c into NVIDIA:main on Mar 10, 2022
@JanuszL mentioned this pull request on Mar 30, 2022
cyyever pushed a commit to cyyever/DALI that referenced this pull request on May 13, 2022:
Make output buffers for argument inputs to GPU operators pinned. (NVIDIA#3728)

* Make output buffers for argument inputs to GPU operators pinned.
* Pin GPU operators' CPU inputs and all mixed operators' inputs (except decoders)

Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
cyyever pushed a commit to cyyever/DALI that referenced this pull request on Jun 7, 2022:
Make output buffers for argument inputs to GPU operators pinned. (NVIDIA#3728)

* Make output buffers for argument inputs to GPU operators pinned.
* Pin GPU operators' CPU inputs and all mixed operators' inputs (except decoders)

Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>