[Fix] in Xnnpack EP, the conversion for fused activation param isn't correct #23115

Open · wants to merge 17 commits into main (showing changes from 16 commits)
3 changes: 3 additions & 0 deletions onnxruntime/core/providers/cpu/fp16/fp16_activations.h
@@ -75,6 +75,9 @@ struct LeakyRelu<MLFloat16> : public ElementWiseRangedTransform<MLFloat16> {
 // MlasTanhActivation,
 // MlasLogisticActivation,
 // MlasClipActivation,
+// Once it's added, please update TestNhwcConvReluClipFusion_FP16
+// in xnnpack_basic_test.cc
+// to enable output verification for the Clip activation.
 // MlasHardSigmoidActivation,

 } // namespace functors
17 changes: 13 additions & 4 deletions onnxruntime/core/providers/xnnpack/detail/utils.cc
@@ -9,6 +9,7 @@

#include "core/common/common.h"
#include "core/common/safeint.h"
#include "core/framework/float16.h"
#include "core/framework/node_unit.h"
#include "core/framework/tensorprotoutils.h"
#include "core/graph/graph.h"
@@ -245,7 +246,7 @@ std::unique_ptr<IndexedSubGraph::MetaDef> FuseActivation(const NodeUnit& node_un

   const auto& activation_type = activation.OpType();
   if (activation_type == "Clip") {
-    min = std::numeric_limits<float>::min();
+    min = std::numeric_limits<float>::lowest();
     max = std::numeric_limits<float>::max();
     bool min_max_are_attributes = activation.SinceVersion() == 1 || activation.SinceVersion() == 6;
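The one-line change above is the core of the fix: for floating-point types, std::numeric_limits<float>::min() is the smallest positive normal value, not the most negative one, so using it as the default Clip lower bound silently clamps every negative input. A minimal standalone sketch (not ORT code) of the difference:

```cpp
#include <algorithm>
#include <iostream>
#include <limits>

int main() {
  // For float, min() is the smallest positive normal value (~1.18e-38),
  // while lowest() is the most negative finite value (~-3.40e+38).
  const float bad_lower_bound = std::numeric_limits<float>::min();
  const float good_lower_bound = std::numeric_limits<float>::lowest();

  const float x = -2.0f;
  // With min() as the "no-op" default bound, negative values are destroyed:
  std::cout << std::max(x, bad_lower_bound) << "\n";   // prints 1.17549e-38
  // With lowest(), the default bound really is a no-op:
  std::cout << std::max(x, good_lower_bound) << "\n";  // prints -2
  return 0;
}
```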

@@ -267,9 +268,17 @@ std::unique_ptr<IndexedSubGraph::MetaDef> FuseActivation(const NodeUnit& node_un
           ORT_ENFORCE(utils::HasExternalData(value) == false,
                       "External data is not supported for the scalar min/max Clip values");

-          value_to_set = utils::HasRawData(value)
-                             ? *reinterpret_cast<const float*>(value.raw_data().data())
-                             : value.float_data()[0];
+          int32_t arg_type;
+          if (GetType(arg, arg_type) && arg_type == ONNX_NAMESPACE::TensorProto_DataType_FLOAT16) {
Review thread on the new type check (resolved):

Contributor: What if GetType(arg, arg_type) failed here?

Contributor: Generally type info is always available, so I think this is ok. Shape info may be missing depending on the model.

The Conv op looks to be set up to allow fp32, u8, s8 and optionally fp16. Should this also handle u8 and s8, or should ClipReluChecker limit fusion to fp32 and fp16?

Author (@mszhanyi), Dec 23, 2024: So far, the core runtime Clip fusion only supports float too:

    if (initializer) {
      Initializer i(*initializer, graph.ModelPath());
      switch (initializer->data_type()) {
        case ONNX_NAMESPACE::TensorProto_DataType_FLOAT:
          value = *i.data<float>();
          break;
        // double isn't currently supported
        // case ONNX_NAMESPACE::TensorProto_DataType_DOUBLE:
        //   value = static_cast<float>(*i.data<double>());
        //   break;
        case ONNX_NAMESPACE::TensorProto_DataType_FLOAT16:
          value = math::halfToFloat(i.data<MLFloat16>()->val);
          break;
        default:
          ORT_THROW("Unexpected data type for Clip input of ", initializer->data_type());
      }
    }
Shall we update them together?

Author (@mszhanyi): cc @snnn

Contributor: I'd leave the core Clip fusion as-is for now. It can be a separate PR if we think there's a use-case that would benefit.

Are you planning on updating ClipReluChecker to limit the types?

Author (@mszhanyi): I may need more time to understand ClipQuantFusion
(https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/optimizer/qdq_transformer/clip_quantizelinear.cc),
so I don't have a concrete plan for the next step yet.

Contributor: I think ClipQuantFusion is a separate topic, as that's about ignoring a Clip or Relu when the Q zp and scale make it redundant.

I was asking if the XNNPACK EP ClipReluChecker needs to be updated to either limit the types it allows, or whether FuseActivation needs to handle u8 or s8 input for the Clip min/max.

This has no checks on types:

const NodeUnit* ClipReluChecker(const NodeUnit& node_unit,
                                const GraphViewer& graph,
                                const std::unordered_map<const Node*, const NodeUnit*>& supported_node_unit_map) {

But FuseActivation always uses a float in the activation params, and with this PR it explicitly checks only for fp32 and fp16.

e.g. if there's a Conv node with u8 or s8 input, it looks like ClipReluChecker will allow the activation, but FuseActivation won't do the right thing, as the Clip min/max would be u8 or s8.

Author (@mszhanyi): I checked https://onnx.ai/onnx/operators/onnx__Conv.html#type-constraints; an ONNX Conv node shouldn't have u8 or s8 inputs.
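For illustration only, a minimal sketch of the element-type guard being discussed; the helper name is hypothetical (this is not the actual ClipReluChecker code), but the integer codes match the ONNX TensorProto_DataType values:

```cpp
#include <cstdint>

// ONNX TensorProto_DataType codes for the types in question:
// FLOAT = 1, UINT8 = 2, INT8 = 3, FLOAT16 = 10.
constexpr int32_t kFloat = 1;
constexpr int32_t kUInt8 = 2;
constexpr int32_t kInt8 = 3;
constexpr int32_t kFloat16 = 10;

// Hypothetical guard: only allow Clip/Relu fusion when the Clip min/max can
// be converted to the float activation params that FuseActivation produces.
// u8/s8 would need separate handling, so a checker could reject them up front.
bool IsClipFusionSupportedType(int32_t elem_type) {
  return elem_type == kFloat || elem_type == kFloat16;
}
```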

+            // arg is of type FP16
+            value_to_set = utils::HasRawData(value)
+                               ? (*reinterpret_cast<const MLFloat16*>(value.raw_data().data())).ToFloat()
+                               : value.float_data()[0];
+          } else if (GetType(arg, arg_type) && arg_type == ONNX_NAMESPACE::TensorProto_DataType_FLOAT) {
+            value_to_set = utils::HasRawData(value)
+                               ? *reinterpret_cast<const float*>(value.raw_data().data())
+                               : value.float_data()[0];
+          }
         }
       }
     };
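A note on why the new FP16 branch above must reinterpret the raw bytes as MLFloat16 rather than float: for an FP16 initializer, raw_data() stores two bytes per element, so casting the buffer to const float* reads four bytes of the wrong encoding. A self-contained sketch of the decode, using a hand-rolled half-to-float helper in place of ORT's MLFloat16::ToFloat():

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>
#include <iostream>
#include <limits>

// Hand-rolled stand-in for MLFloat16::ToFloat(): decode IEEE-754 binary16
// bits into a float.
float HalfBitsToFloat(uint16_t h) {
  const uint32_t sign = (h >> 15) & 0x1;
  const uint32_t exponent = (h >> 10) & 0x1F;
  const uint32_t mantissa = h & 0x3FF;
  float value;
  if (exponent == 0) {
    // Zero or subnormal: mantissa * 2^-24.
    value = std::ldexp(static_cast<float>(mantissa), -24);
  } else if (exponent == 0x1F) {
    // Inf or NaN.
    value = mantissa ? std::nanf("") : std::numeric_limits<float>::infinity();
  } else {
    // Normal: (1 + mantissa/1024) * 2^(exponent - 15).
    value = std::ldexp(1.0f + static_cast<float>(mantissa) / 1024.0f,
                       static_cast<int>(exponent) - 15);
  }
  return sign ? -value : value;
}

int main() {
  // An FP16 initializer stores 2 bytes per element in raw_data().
  // 0x4248 is 3.140625 in binary16, stored little-endian.
  const unsigned char raw_data[2] = {0x48, 0x42};
  uint16_t bits;
  std::memcpy(&bits, raw_data, sizeof(bits));  // safe, alignment-free read
  std::cout << HalfBitsToFloat(bits) << "\n";  // prints 3.14062
  // Reinterpreting raw_data as const float* here would read 4 bytes from a
  // 2-byte buffer -- exactly the misread the PR fixes for FP16 Clip min/max.
  return 0;
}
```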
2 changes: 1 addition & 1 deletion onnxruntime/test/providers/cpu/nn/conv_fp16_test.cc
@@ -1323,4 +1323,4 @@ TEST(ConvFp16Test, SharedPrepackedWeights) {
 } // namespace test
 } // namespace onnxruntime

-#endif // MLAS_F16VEC_INTRINSICS_SUPPORTED
\ No newline at end of file
+#endif // MLAS_F16VEC_INTRINSICS_SUPPORTED
87 changes: 87 additions & 0 deletions onnxruntime/test/providers/xnnpack/xnnpack_basic_test.cc
@@ -6,9 +6,11 @@

#include "core/common/logging/logging.h"
#include "core/common/span_utils.h"
#include "core/framework/float16.h"
#include "core/framework/utils.h"
#include "core/graph/graph.h"
#include "core/providers/xnnpack/xnnpack_execution_provider.h"
#include "core/providers/xnnpack/xnnpack_init.h"
#include "core/session/inference_session.h"
#include "core/session/onnxruntime_cxx_api.h"
#include "core/session/onnxruntime_session_options_config_keys.h"
@@ -89,6 +91,91 @@ TEST(XnnpackEP, TestNhwcConvReluClipFusion) {
   RunAndVerifyOutputsWithEP(ort_model_path, "TestNhwcConvReluClipFusion", std::move(ep), feeds, params);
 }

+#ifdef XNNPACK_FP16_SUPPORTED
+// This test can be removed once Mlas implements FP16 Clip fusion.
+// For now, TestNhwcConvReluClipFusion_FP16 skips output verification.
+TEST(XnnpackEP, TestNhwcConvReluFusion_FP16) {
+  const ORTCHAR_T* ort_model_path = ORT_MODEL_FOLDER "nhwc_conv_relu_model_fp16.onnx";
+
+  RandomValueGenerator generator;
+  TensorShape input_shape_x{1, 16, 16, 192};
+  std::vector<MLFloat16> input_x = generator.Uniform<MLFloat16>(input_shape_x.GetDims(), -128, 128);
+
+  OrtValue ml_value_x;
+  CreateMLValue<MLFloat16>(input_shape_x.GetDims(), input_x.data(), OrtMemoryInfo(), &ml_value_x);
+
+  NameMLValMap feeds;
+  feeds.insert(std::make_pair("model_input", ml_value_x));
+
+  std::function<void(const Graph&)> verify = [](const Graph& graph) -> void {
+    ASSERT_EQ(graph.NumberOfNodes(), 2) << "Transpose nodes should have been removed, and "
+                                           "Conv+Relu should have been fused, leaving 2 nodes.";
+    auto node_iter = graph.Nodes().begin();
+    auto check_node = [](const Node& node, const std::string& fusion_type) {
+      const auto& attr = node.GetAttributes();
+      auto activation = attr.find("activation");
+      ASSERT_NE(activation, attr.cend()) << "Fused node should have activation attribute";
+      ASSERT_EQ(activation->second.s(), fusion_type);
+    };
+
+    ++node_iter;
+    check_node(*node_iter, "Relu");
+  };
+
+  EPVerificationParams params;
+  params.ep_node_assignment = ExpectedEPNodeAssignment::Some;
+  params.fp32_abs_err = 0.5f;
+  params.graph_verifier = &verify;
+
+  auto ep = DefaultXnnpackExecutionProvider();
+  RunAndVerifyOutputsWithEP(ort_model_path, "TestNhwcConvReluFusion_FP16", std::move(ep), feeds, params);
+};

+// For now, this test mainly checks whether the Xnnpack Clip fusion works.
+TEST(XnnpackEP, TestNhwcConvReluClipFusion_FP16) {
+  const ORTCHAR_T* ort_model_path = ORT_MODEL_FOLDER "nhwc_conv_clip_relu_fp16.onnx";
+
+  RandomValueGenerator generator;
+  TensorShape input_shape_x{1, 16, 16, 192};
+  std::vector<MLFloat16> input_x = generator.Uniform<MLFloat16>(input_shape_x.GetDims(), -128, 128);
+
+  OrtValue ml_value_x;
+  CreateMLValue<MLFloat16>(input_shape_x.GetDims(), input_x.data(), OrtMemoryInfo(), &ml_value_x);
+
+  NameMLValMap feeds;
+  feeds.insert(std::make_pair("model_input", ml_value_x));
+
+  std::function<void(const Graph&)> verify = [](const Graph& graph) -> void {
+    ASSERT_EQ(graph.NumberOfNodes(), 3) << "Transpose nodes should have been removed, and "
+                                           "Conv+Relu and Conv+Clip should have been fused, leaving 3 nodes.";
+    auto node_iter = graph.Nodes().begin();
+    auto check_node = [](const Node& node, const std::string& fusion_type) {
+      const auto& attr = node.GetAttributes();
+      auto activation = attr.find("activation");
+      ASSERT_NE(activation, attr.cend()) << "Fused node should have activation attribute";
+      ASSERT_EQ(activation->second.s(), fusion_type);
+    };
+
+    // check the 2nd and 3rd nodes:
+    // the first node is the Conv that does not get fused (created after the first call to GetCapability);
+    // the 2nd and 3rd nodes are the fused nodes (created after the second call to GetCapability).
+    ++node_iter;
+    check_node(*node_iter, "Clip");
+    ++node_iter;
+    check_node(*node_iter, "Relu");
+  };
+
+  EPVerificationParams params;
+  params.ep_node_assignment = ExpectedEPNodeAssignment::Some;
+  params.fp32_abs_err = 0.5f;
+  params.graph_verifier = &verify;
+
+  auto ep = DefaultXnnpackExecutionProvider();
+  // So far, the CPU EP doesn't support FP16 Conv fusion, so verify_outputs is skipped.
+  RunAndVerifyOutputsWithEP(ort_model_path, "TestNhwcConvReluClipFusion_FP16", std::move(ep), feeds, params, {}, false);
+}
+#endif  // XNNPACK_FP16_SUPPORTED
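A side note on the loose tolerance these tests use: params.fp32_abs_err = 0.5f plausibly reflects binary16 precision, since one FP16 ULP already equals 0.5 for magnitudes in [512, 1024). A standalone sketch (my own helper, not ORT code) of that ULP arithmetic:

```cpp
#include <cmath>
#include <iostream>

// ULP (spacing between adjacent representable values) of IEEE binary16
// at magnitude m: 2^(floor(log2(m)) - 10) for normal values.
float Fp16UlpAt(float m) {
  int exp;              // frexp gives m = f * 2^exp with f in [0.5, 1),
  std::frexp(m, &exp);  // so floor(log2(m)) == exp - 1.
  return std::ldexp(1.0f, (exp - 1) - 10);
}

int main() {
  std::cout << Fp16UlpAt(512.0f) << "\n";  // prints 0.5
  std::cout << Fp16UlpAt(100.0f) << "\n";  // prints 0.0625
  return 0;
}
```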

 // test we can share the cpu ep allocator with the xnnpack EP
 TEST(XnnpackEP, TestAllocatorSharing) {
   auto init_session = [](std::vector<std::shared_ptr<IExecutionProvider>>& eps,
Binary file not shown.
Binary file not shown.