
complete data layout transform #7440

Merged 17 commits on Jan 19, 2018
7 changes: 5 additions & 2 deletions paddle/framework/CMakeLists.txt
@@ -33,8 +33,13 @@ cc_library(scope SRCS scope.cc DEPS glog threadpool)
 cc_test(scope_test SRCS scope_test.cc DEPS scope)

 cc_library(data_device_transform SRCS data_device_transform.cc DEPS tensor)
+nv_test(data_device_transform_test SRCS data_device_transform_test.cu
+        DEPS operator op_registry init math_function)

 cc_library(data_type_transform SRCS data_type_transform.cc DEPS tensor)
+
+cc_library(data_layout_transform SRCS data_layout_transform.cc DEPS tensor math_function)
+cc_test(data_layout_transform_test SRCS data_layout_transform_test.cc DEPS data_layout_transform)

 cc_library(data_transform SRCS data_transform.cc DEPS math_function tensor
            framework_proto selected_rows data_device_transform data_type_transform data_layout_transform)
@@ -82,5 +87,3 @@ cc_test(init_test SRCS init_test.cc DEPS init)

 cc_test(op_kernel_type_test SRCS op_kernel_type_test.cc DEPS place device_context framework_proto)
 cc_test(cow_ptr_tests SRCS details/cow_ptr_test.cc)
-nv_test(data_device_transform_test SRCS data_device_transform_test.cu
-        DEPS operator op_registry init math_function)
1 change: 1 addition & 0 deletions paddle/framework/data_device_transform_test.cu
@@ -150,6 +150,7 @@ TEST(Operator, CPUtoGPU) {
   // get output
   auto* output2 = scope.Var("OUT2");
   gpu_op->Run(scope, cuda_place);
+  VLOG(3) << "after gpu_op run";

   // auto* output2_ptr = output2->Get<LoDTensor>().data<float>();
   DeviceContextPool& pool = DeviceContextPool::Instance();
51 changes: 30 additions & 21 deletions paddle/framework/data_layout_transform.cc
@@ -1,4 +1,4 @@
-/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
+/* Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserve.

 Licensed under the Apache License, Version 2.0 (the "License");
 you may not use this file except in compliance with the License.
@@ -14,12 +14,23 @@ limitations under the License. */

 #include "paddle/framework/data_layout_transform.h"

 #include "paddle/framework/tensor.h"
 #include "paddle/operators/math/math_function.h"

 namespace paddle {
 namespace framework {

+std::vector<int> GetAxis(const DataLayout& from, const DataLayout& to) {
+  PADDLE_ENFORCE_NE(from, to,
+                    "layout transform should transform different layout");
+  if (from == DataLayout::kNCHW && to == DataLayout::kNHWC) {
+    return {0, 2, 3, 1};
+  } else if (from == DataLayout::kNHWC && to == DataLayout::kNCHW) {
+    return {0, 3, 1, 2};
+  } else {
+    PADDLE_THROW("unsupported transform");
+  }
+}
+
 struct CastDataLayout {
   CastDataLayout(const platform::DeviceContext* ctx,
                  const std::vector<int>& axis, const framework::Tensor& in,
@@ -44,38 +55,36 @@ }
   }
 };

-void TransDataLayout(const std::vector<int>& axis,
-                     const platform::DeviceContext* ctx,
-                     const KernelTypePair& kernel_pair, const Variable& in,
-                     Variable* out) {
-  PADDLE_ENFORCE(in.IsType<Tensor>(), "Only support Tensor transform!.");
+void TransDataLayout(const OpKernelType& kernel_type_for_var,
+                     const OpKernelType& expected_kernel_type, const Tensor& in,
+                     Tensor* out) {
   PADDLE_ENFORCE(
-      platform::places_are_same_class(kernel_pair.first.place_,
-                                      kernel_pair.second.place_),
+      platform::places_are_same_class(kernel_type_for_var.place_,
+                                      expected_kernel_type.place_),
       "TransDataLayout only support DataLayout transform on same place!");
-  PADDLE_ENFORCE(kernel_pair.first.data_type_ == kernel_pair.second.data_type_,
-                 "TransDataLayout only support Datatype are same!");

-  auto src = in.Get<Tensor>();
-  auto* dst = out->GetMutable<Tensor>();
-  PADDLE_ENFORCE(arity(src.dims()) == 4, "Input Arity Only Suppport 4!");
+  PADDLE_ENFORCE(arity(in.dims()) == 4, "Input Arity only support 4!");
+
+  auto& pool = platform::DeviceContextPool::Instance();

-  auto src_dim = src.dims();
+  auto src_dim = in.dims();
   std::vector<int64_t> dst_dim;

+  auto axis = GetAxis(kernel_type_for_var.data_layout_,
+                      expected_kernel_type.data_layout_);
   dst_dim.resize(axis.size());
   for (size_t i = 0; i < axis.size(); i++) {
     dst_dim[i] = src_dim[axis[i]];
   }

-  dst->Resize(make_ddim(dst_dim));
-  auto place = kernel_pair.second.place_;
-  dst->mutable_data(place, src.type());
+  out->Resize(make_ddim(dst_dim));
+  out->mutable_data(expected_kernel_type.place_, in.type());

-  auto src_type = kernel_pair.first.data_type_;
-  framework::VisitDataType(src_type, CastDataLayout(ctx, axis, src, dst));
+  framework::VisitDataType(
+      framework::ToDataType(in.type()),
+      CastDataLayout(pool.Get(expected_kernel_type.place_), axis, in, out));

-  dst->set_layout(kernel_pair.second.data_layout_);
+  out->set_layout(expected_kernel_type.data_layout_);
 }

 }  // namespace framework
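For intuition about the axis vectors GetAxis returns: they are plain permutations of dimension indices, applied as dst_dim[i] = src_dim[axis[i]]. A minimal standalone sketch of that rule (plain C++, independent of Paddle; PermuteShape is a made-up helper for illustration):

#include <cassert>
#include <cstdint>
#include <vector>

// dst[i] = src[axis[i]] -- the same rule TransDataLayout uses for dst_dim.
std::vector<int64_t> PermuteShape(const std::vector<int64_t>& src,
                                  const std::vector<int>& axis) {
  std::vector<int64_t> dst(axis.size());
  for (size_t i = 0; i < axis.size(); ++i) {
    dst[i] = src[axis[i]];
  }
  return dst;
}

int main() {
  // An NHWC tensor of shape {N=2, H=3, W=1, C=2} (the shape used in the test
  // below) becomes {2, 2, 3, 1} in NCHW via the axis vector {0, 3, 1, 2}.
  assert((PermuteShape({2, 3, 1, 2}, {0, 3, 1, 2}) ==
          std::vector<int64_t>{2, 2, 3, 1}));
  // The opposite axis vector {0, 2, 3, 1} converts NCHW back to NHWC.
  assert((PermuteShape({2, 2, 3, 1}, {0, 2, 3, 1}) ==
          std::vector<int64_t>{2, 3, 1, 2}));
  return 0;
}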
12 changes: 6 additions & 6 deletions paddle/framework/data_layout_transform.h
@@ -1,4 +1,4 @@
-/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
+/* Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserve.

 Licensed under the Apache License, Version 2.0 (the "License");
 you may not use this file except in compliance with the License.
@@ -15,17 +15,17 @@ limitations under the License. */
 #pragma once

 #include "paddle/framework/op_kernel_type.h"
+#include "paddle/framework/tensor.h"
-#include "paddle/framework/variable.h"

 namespace paddle {
 namespace framework {

-using KernelTypePair = std::pair<OpKernelType, OpKernelType>;
+std::vector<int> GetAxis(const DataLayout& from, const DataLayout& to);

-void TransDataLayout(const std::vector<int>& axis,
-                     const platform::DeviceContext* ctx,
-                     const KernelTypePair& kernel_pair, const Variable& in,
-                     Variable* out);
+void TransDataLayout(const OpKernelType& kernel_type_for_var,
+                     const OpKernelType& expected_kernel_type, const Tensor& in,
+                     Tensor* out);

 }  // namespace framework
 }  // namespace paddle
44 changes: 44 additions & 0 deletions paddle/framework/data_layout_transform_test.cc
@@ -0,0 +1,44 @@
/* Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserve.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */

#include "paddle/framework/data_layout_transform.h"

#include "gtest/gtest.h"
#include "paddle/platform/device_context.h"

TEST(DataTransform, DataLayoutFunction) {
using namespace paddle::framework;
using namespace paddle::platform;

auto place = CPUPlace();
Tensor in = Tensor();
Tensor out = Tensor();
in.mutable_data<double>(make_ddim({2, 3, 1, 2}), place);
in.set_layout(DataLayout::kNHWC);

auto kernel_nhwc = OpKernelType(proto::DataType::FP32, place,
DataLayout::kNHWC, LibraryType::kPlain);
auto kernel_ncwh = OpKernelType(proto::DataType::FP32, place,
DataLayout::kNCHW, LibraryType::kPlain);

TransDataLayout(kernel_nhwc, kernel_ncwh, in, &out);

EXPECT_TRUE(out.layout() == DataLayout::kNCHW);
EXPECT_TRUE(out.dims() == make_ddim({2, 2, 3, 1}));

TransDataLayout(kernel_ncwh, kernel_nhwc, in, &out);

EXPECT_TRUE(in.layout() == DataLayout::kNHWC);
EXPECT_TRUE(in.dims() == make_ddim({2, 3, 1, 2}));
}
31 changes: 28 additions & 3 deletions paddle/framework/data_transform.cc
@@ -15,18 +15,43 @@ limitations under the License. */
 #include "paddle/framework/data_transform.h"

 #include "paddle/framework/data_device_transform.h"
+#include "paddle/framework/data_layout_transform.h"

 namespace paddle {
 namespace framework {

+static void PassTensorData(Tensor* from, Tensor* to) {
[Contributor] Remove the static keyword.
+  to->ShareDataWith(*from);
+  *from = Tensor();
+}
+
 void DataTransform(const OpKernelType& expected_kernel_type,
                    const OpKernelType& kernel_type_for_var,
-                   const Tensor& input_tensor, Tensor* out) {
+                   const Tensor& input_tensor, Tensor* output_tensor) {
+  bool transformed = false;
+  Tensor in;
+  in.ShareDataWith(input_tensor);
+  Tensor out;
+
[Contributor] Where is the data-type transform?

[Member, Author] In the next PR.
+  // do layout transform
+  if (NeedTransformLayout(expected_kernel_type.data_layout_,
+                          kernel_type_for_var.data_layout_)) {
+    TransDataLayout(kernel_type_for_var, expected_kernel_type, in, &out);
+    transformed = true;
+    PassTensorData(&out, &in);
+  }
+
+  // do device transform
   if (!platform::is_same_place(kernel_type_for_var.place_,
                                expected_kernel_type.place_)) {
-    DeviceTransform(input_tensor, expected_kernel_type.place_, out);
+    DeviceTransform(in, expected_kernel_type.place_, &out);
+    transformed = true;
+    PassTensorData(&out, &in);
   }
-  PADDLE_ENFORCE_NOT_NULL(out, "out should not be null");
+
+  PADDLE_ENFORCE(transformed, "no transform is done, please check!");
[Contributor] Comments should start with an uppercase letter and be written in proper English: "No transform is applied".
+  // get output data
+  output_tensor->ShareDataWith(in);
 }

 void CopyVariableWithTensor(const Variable& in_var, const Tensor& tensor,
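As the author notes in the thread above, the data-type step is deferred to a follow-up PR. A rough sketch of how it could slot into the same chain — hypothetical, assuming a TransDataType helper with a signature parallel to TransDataLayout (not part of this diff):

// Hypothetical sketch only: a data-type step in the same ping-pong pattern as
// the layout and device steps above; TransDataType's signature is assumed.
if (expected_kernel_type.data_type_ != kernel_type_for_var.data_type_) {
  TransDataType(kernel_type_for_var, expected_kernel_type, in, &out);
  transformed = true;
  PassTensorData(&out, &in);  // the result becomes the input of the next step
}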
7 changes: 6 additions & 1 deletion paddle/framework/op_kernel_type.h
@@ -85,9 +85,14 @@ inline std::string KernelTypeToString(const OpKernelType& kernel_key) {
   return stream.str();
 }

+inline bool NeedTransformLayout(const DataLayout& l, const DataLayout& r) {
+  return l != DataLayout::kAnyLayout && r != DataLayout::kAnyLayout && l != r;
+}
+
 inline bool TransFromNeeded(const OpKernelType& l, const OpKernelType& r) {
   return (!platform::places_are_same_class(l.place_, r.place_)) ||
-         (l.data_type_ != r.data_type_) || (l.data_layout_ != r.data_layout_);
+         (l.data_type_ != r.data_type_) ||
[dzhwinter (Contributor), Jan 18, 2018]
I believe we are on the wrong track.

Let's start from a simple convolution case:

data -> conv2d -> batch_norm -> fc -> softmax

We may have quite different layout pipeline choices:

NHWC -> NCHW -> NCHW -> NCHW
NHWC -> NHWC -> NCHW -> NCHW
NHWC -> NHWC -> NHWC -> NCHW
...
NCHW -> NCHW -> NCHW -> NCHW
NCHW -> NHWC -> NCHW -> NCHW
...

All of them are legal, but only one of them is the best. In cuDNN, NHWC is 20% faster than the other formats. So the answer is to select the NHWC pipeline: no matter what the input format is, transform the data into NHWC once, then go through the pipeline.

In the case above, I want to point out that we do not run on mixed devices, though we can run on different device types. This means the layout of the data pipeline can be determined once we have the device information; we only need to transform data before or after the pipeline. Here "data pipeline" refers to conv2d -> batch_norm -> fc -> softmax, without the data input and output.

[dzhwinter (Contributor)]
To be clear, we support multiple devices, but not mixed devices.

[jacquesqiao (Member, Author), Jan 18, 2018]
Yes, I agree: we have a data processing chain, and there should be a best practice for it. On the other hand, the current data transform only transforms according to expected_kernel_type and the actual kernel_type_for_var; it does not decide which expected_kernel_type should be used. Finding the best expected_kernel_type should be implemented in GetExpectedKernelType, by the user or by the framework (a hypothetical sketch follows at the end of this section), so this does not conflict with the current implementation.

+         NeedTransformLayout(l.data_layout_, r.data_layout_);
 }

 }  // namespace framework
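A hypothetical sketch of the direction discussed in the thread above: an operator pins its preferred layout by overriding GetExpectedKernelType, and the framework inserts TransDataLayout wherever NeedTransformLayout(...) holds. The op and its "Input" name are made up; the OpKernelType argument order follows the test above:

// Illustrative only: an op that prefers to compute in NHWC, whatever layout
// its input arrives in. TransFromNeeded(...) then reports a mismatch and the
// framework applies TransDataLayout before running the kernel.
framework::OpKernelType GetExpectedKernelType(
    const framework::ExecutionContext& ctx) const override {
  auto* input = ctx.Input<framework::Tensor>("Input");  // "Input" is assumed
  return framework::OpKernelType(framework::ToDataType(input->type()),
                                 ctx.GetPlace(), framework::DataLayout::kNHWC,
                                 framework::LibraryType::kPlain);
}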