Fix an issue where the CUDA EP falls back too many nodes to CPU in some cases, which causes huge data copies. If a node's inputs are all initializers, we shouldn't fall back the node to CPU. (#1727)

Fix an issue where the CUDA EP falls back too many nodes to CPU in some cases, which causes huge data copies.
#1675

Currently, if a node's inputs are all initializers, the CUDA EP falls the node back to CPU, along with some of the nodes below it. This can cause huge data copies. In the case reported by a user, the model has several Slice nodes whose inputs come from initializers, and a Concat op that concatenates the Slice outputs. The concatenated data is large (16MB), which makes the copy from CPU to GPU quite costly because it is a synchronous copy.

Fix
If a node's inputs are all initializers, we shouldn't fall back the node to CPU.
HectorSVC authored Aug 29, 2019
1 parent 25d02a3 commit 810ee00
Showing 1 changed file with 8 additions and 2 deletions.
10 changes: 8 additions & 2 deletions onnxruntime/core/providers/cuda/cuda_execution_provider.cc
@@ -1018,19 +1018,25 @@ CUDAExecutionProvider::GetCapability(const onnxruntime::GraphViewer& graph,
       // Note that nodes with only inputs from initializer would not be place on CUDA
       // Ideally, those nodes should be eliminated in constant folding
       bool should_force_outside = true;
+      bool all_input_are_initializer = true;
       node.ForEachWithIndex(
           node.InputDefs(),
           [&](const NodeArg& def, size_t index) {
             const ONNX_NAMESPACE::TensorProto* initializer = nullptr;
             // The input is not a initializer and the input is from CPU
             // or the input declared as CPU memory and is from CPU
             // in that case we should still keep the node on CUDA
-            if ((!graph.GetInitializedTensor(def.Name(), initializer) && !defs_outside_cuda.count(&def)) ||
+            bool initializer_input = graph.GetInitializedTensor(def.Name(), initializer);
+            if ((!initializer_input && !defs_outside_cuda.count(&def)) ||
                 (defs_outside_cuda.count(&def) && cuda_kernel_def->kernel_def->IsInputOnCpu(index)))
               should_force_outside = false;
+            if (!initializer_input) {
+              all_input_are_initializer = false;
+            }
             return Status::OK();
           });
-      if (should_force_outside) {
+      // If all the inputs are initialier, we shouldn't force it to CPU
+      if (should_force_outside && !all_input_are_initializer) {
         force_outside = true;
       }
     }
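
The decision changed by this diff can be sketched as a standalone function, which makes the before/after behavior easier to compare. This is a minimal sketch and not the onnxruntime code: `InputInfo`, `ShouldForceOutside`, `AllInputsAreInitializers`, `ForceOutsideBefore` and `ForceOutsideAfter` are hypothetical names introduced for illustration, and the three booleans stand in for `graph.GetInitializedTensor(...)`, `defs_outside_cuda.count(&def)` and `IsInputOnCpu(index)` from the diff.

```cpp
#include <vector>

// Hypothetical, simplified view of one node input (not an onnxruntime type).
struct InputInfo {
  bool is_initializer;    // stands in for graph.GetInitializedTensor(...)
  bool produced_on_cpu;   // stands in for defs_outside_cuda.count(&def)
  bool kernel_wants_cpu;  // stands in for kernel_def->IsInputOnCpu(index)
};

// Mirrors the lambda in the diff: the node stays a candidate for CPU
// unless at least one input gives a reason to keep it on CUDA.
bool ShouldForceOutside(const std::vector<InputInfo>& inputs) {
  for (const InputInfo& in : inputs) {
    if ((!in.is_initializer && !in.produced_on_cpu) ||
        (in.produced_on_cpu && in.kernel_wants_cpu)) {
      return false;
    }
  }
  return true;
}

// The new flag added by the commit: true only if every input is an initializer.
bool AllInputsAreInitializers(const std::vector<InputInfo>& inputs) {
  for (const InputInfo& in : inputs) {
    if (!in.is_initializer) {
      return false;
    }
  }
  return true;
}

// Behavior before the commit: force to CPU whenever no input keeps the node on CUDA.
bool ForceOutsideBefore(const std::vector<InputInfo>& inputs) {
  return ShouldForceOutside(inputs);
}

// Behavior after the commit: same check, but nodes fed only by initializers stay on CUDA.
bool ForceOutsideAfter(const std::vector<InputInfo>& inputs) {
  return ShouldForceOutside(inputs) && !AllInputsAreInitializers(inputs);
}
```

Under this model, a Slice whose inputs are all initializers is forced to CPU by `ForceOutsideBefore` but kept on CUDA by `ForceOutsideAfter`, while a node with one initializer input and one input produced by a CPU node is still forced to CPU in both versions.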