
[CPU]whisper readvalue optimize #26130

Open
wants to merge 96 commits into base: master

Commits (96)
2916414
Add profiler for CPU plugin.
xipingyan Aug 10, 2023
451c76d
Mark ReadValue's inputs and corresponding Assign.
xipingyan Aug 20, 2024
137beee
Only mark: ReadValue->Assign pairs.
xipingyan Aug 20, 2024
737fe5c
Optimize pattern match.
xipingyan Aug 21, 2024
6b05005
transformation test pass.
xipingyan Sep 3, 2024
58d9f6f
Test pass.
xipingyan Sep 6, 2024
d54dc25
Fix error: one param link to mulitple ReadValueWithSubgraphNode
xipingyan Sep 6, 2024
a533d73
Add submodel infer to MemoryInput::runDynamic
xipingyan Sep 6, 2024
f7339e3
Debug code
xipingyan Sep 6, 2024
e142a06
Merge remote-tracking branch 'origin/master' into xp/whisper_readvalu…
xipingyan Sep 6, 2024
4a2dba0
fix merge error
xipingyan Sep 6, 2024
bf7e493
Dynamic shape test pass
xipingyan Sep 6, 2024
d90144e
test whisper pass
xipingyan Sep 6, 2024
5a98e7b
Disable debug log to test performance. Got expected result:
xipingyan Sep 9, 2024
577721d
Add test.
xipingyan Sep 9, 2024
c23062e
Remove stateName in ov::Node
xipingyan Sep 10, 2024
592919b
Add env: ENABLE_RV for comprison test.
xipingyan Sep 10, 2024
f134307
Merge branch 'master' into xp/whisper_readvalue_optimize
xipingyan Sep 10, 2024
258c3c8
Merge branch 'xp/whisper_readvalue_optimize' of https://github.com/xi…
xipingyan Sep 10, 2024
5c771b0
rm debug log
xipingyan Sep 10, 2024
17b8ce3
[CPU] Introduce SubModel op and Composite node
EgorDuplensky Jul 4, 2024
3a6b83a
Apply review comments
EgorDuplensky Sep 6, 2024
f57851d
Merge branch 'egor/introduce_subgraph_node' into xp/whisper_readvalue…
xipingyan Sep 10, 2024
ca7dde8
Integrate SubModel, WIP.
xipingyan Sep 10, 2024
df7bb70
Remove not used variable.
xipingyan Sep 11, 2024
aa33812
Merge remote-tracking branch 'origin/master' into xp/whisper_readvalu…
xipingyan Sep 12, 2024
62772a9
Remove USE_SUBMODEL
xipingyan Sep 12, 2024
cebed6f
Fix search gap.
xipingyan Sep 13, 2024
d0f7986
Tmp version, test integrate new interface of Graph, Init and Activate.
xipingyan Sep 13, 2024
47f436c
After calling new interface of graph, fix memory inPlace issue.
xipingyan Sep 16, 2024
dbb9a3e
remove tmp code
xipingyan Sep 16, 2024
2ed69c0
"remove debug log"
xipingyan Sep 16, 2024
44c8fb2
Remove profiling code.
xipingyan Sep 18, 2024
87664d4
Merge branch 'master' into xp/whisper_readvalue_optimize
xipingyan Sep 18, 2024
153b4b8
Revert unchanged code.
xipingyan Sep 18, 2024
e188276
Simplify codes.
xipingyan Sep 18, 2024
2833602
Add judge whether subGraph can be called.
xipingyan Sep 18, 2024
0a6f13f
Fix test fail issue: readvalue have no any input.
xipingyan Sep 18, 2024
7ae318f
Remove get_body, m_subgraph, and update haveSubgraph.
xipingyan Sep 19, 2024
79b8272
Replace set_body with base class set_function
xipingyan Sep 19, 2024
9c0989d
remove debug env
xipingyan Sep 20, 2024
3eaea83
1: Add check memoryNode null
xipingyan Sep 25, 2024
d3bf35c
1: Remove reset_prime_mem setter
xipingyan Sep 25, 2024
266cf38
Remove getSupportedDescriptors in cpp.
xipingyan Sep 25, 2024
e94e67f
1. Removed const MemoryPtr& prime_mem() const
xipingyan Sep 26, 2024
e5402f8
1: Remove redefine supportedPrimitiveDescriptors
xipingyan Sep 26, 2024
74147b7
Merge branch 'master' into xp/whisper_readvalue_optimize
xipingyan Sep 27, 2024
8702a13
Simply codes.
xipingyan Oct 8, 2024
fe6f9b4
1: Remove semicolon after {}
xipingyan Oct 8, 2024
5f14cc2
1: visit_attributes add variable id info
xipingyan Oct 9, 2024
a4bc6cd
Remove is_in_subgraph, use hashset to check if exist.
xipingyan Oct 11, 2024
2da8a3a
Shorten final_successor_is_only_root to successor_is_root
xipingyan Oct 12, 2024
942a10c
Add hashset variable visited to reduce time complexity.
xipingyan Oct 14, 2024
6ce3e63
Take ReadValue Assign pair as stand feature.
xipingyan Oct 16, 2024
ef00d66
Move to MemoryInputSingle
xipingyan Oct 17, 2024
140862a
Merge remote-tracking branch 'origin/master' into xp/whisper_readvalu…
xipingyan Oct 17, 2024
f4a9dce
dynamic test pass.
xipingyan Oct 19, 2024
afaf42f
Remove resolveInPlaceEdges call in the Graph::Allocate.
xipingyan Oct 21, 2024
00da6b7
Add shape inference.
xipingyan Oct 22, 2024
bb006dc
Merge remote-tracking branch 'origin/master' into xp/whisper_readvalu…
xipingyan Oct 22, 2024
37bc18f
Remove search depth limit.
xipingyan Oct 22, 2024
ad2a87a
Update node search function name, make it easy to understand.
xipingyan Oct 22, 2024
6ccd217
Initial State doesn't work
maxnick Oct 18, 2024
f0142fc
Fix transformation
maxnick Oct 21, 2024
c3db59d
Fix isSuitable node check
maxnick Oct 21, 2024
0d01e93
Introduce a Single mode in the MemoryInputBase class
maxnick Oct 21, 2024
8403952
Make MemoryInput strictly non const
maxnick Oct 22, 2024
a34ab0c
Add single buffer state implementation
maxnick Oct 22, 2024
7319791
Merge remote-tracking branch 'origin/master' into Drop_Assign
maxnick Oct 22, 2024
dcb3603
Add commit call to the MemoryInputSingle
maxnick Oct 22, 2024
2c61fb7
Enable LoRA test with random tensors
maxnick Oct 22, 2024
ced56d6
Clean up LoRA test
maxnick Oct 23, 2024
1074f19
1: Merge branch 'maxsim/drop_assgin' into xp/whisper_readvalue_optimize
xipingyan Oct 25, 2024
dd5dd8a
remove memoryoutputsinglestub
xipingyan Oct 25, 2024
5e91806
Removed Assign node check when finding init_graph of ReadValue.
xipingyan Oct 25, 2024
cf8a18f
Let test support Indirect ReadValue Assgin Pair.
xipingyan Oct 25, 2024
893888e
Merge remote-tracking branch 'origin/master' into xp/whisper_readvalu…
xipingyan Oct 30, 2024
6fe7302
Update OutputConfig desc
xipingyan Oct 30, 2024
2f14480
Merge branch 'master' into xp/whisper_readvalue_optimize
xipingyan Nov 1, 2024
786c59e
Merge branch 'master' into xp/whisper_readvalue_optimize
xipingyan Nov 4, 2024
a998273
1: Remove InitGraphStatefulModelInplace, ReadValueAssignTest can cove…
xipingyan Nov 5, 2024
4abc1b0
Merge branch 'master' into xp/whisper_readvalue_optimize
xipingyan Nov 5, 2024
5d5ae39
Merge branch 'master' into xp/whisper_readvalue_optimize
xipingyan Nov 6, 2024
388c72a
create a separate input memory objects instead of share them in subgr…
xipingyan Nov 11, 2024
d0fb105
Move subGraph init into MemoryInput::initOptimalPrimitiveDescriptor()
xipingyan Nov 13, 2024
de904bb
Merge branch 'master' into xp/whisper_readvalue_optimize
xipingyan Nov 14, 2024
b621364
Temporarily skip this pattern. If MemoryInputSDPA supports Subgraph i…
xipingyan Nov 14, 2024
63bb470
1: Remove "using MemoryInputBase::MemoryInputBase;inheritance "
xipingyan Nov 15, 2024
5d6c9de
Remove assignedMem->redefineDesc(...
xipingyan Nov 15, 2024
e2371d5
update: std::shared_ptr<ov::Model> func = nullptr,
xipingyan Nov 15, 2024
800bca3
Simplify code about ov::optional init.
xipingyan Nov 15, 2024
84cddcd
Correct grammar for test description.
xipingyan Nov 15, 2024
128e3ab
Recover remove parent edges code, because it will trigger fail for mu…
xipingyan Nov 15, 2024
5bbf247
Just compare node with pointer,
xipingyan Nov 16, 2024
fa3ed85
All visited nodes should be flagged.
xipingyan Nov 16, 2024
1e9a327
Use shared_ptr to wrapper subGraph.
xipingyan Nov 16, 2024
1 change: 1 addition & 0 deletions src/plugins/intel_cpu/src/cpu_types.cpp
Original file line number Diff line number Diff line change
@@ -140,6 +140,7 @@ static const TypeToNameMap& get_type_to_name_tbl() {
{"Loop", Type::TensorIterator},
{"ReadValue", Type::MemoryInput}, // for construction from name ctor, arbitrary name is used
{"Assign", Type::MemoryOutput}, // for construction from layer ctor
{"ReadValueWithSubgraph", Type::MemoryInput},
{"Convert", Type::Convert},
{"NV12toRGB", Type::ColorConvert},
{"NV12toBGR", Type::ColorConvert},
2 changes: 2 additions & 0 deletions src/plugins/intel_cpu/src/extension.cpp
@@ -22,6 +22,7 @@
#include "transformations/cpu_opset/common/op/power_static.hpp"
#include "transformations/cpu_opset/common/op/sdpa.hpp"
#include "transformations/cpu_opset/common/op/swish_cpu.hpp"
#include "transformations/cpu_opset/common/op/read_value_with_subgraph.hpp"
#include "transformations/cpu_opset/x64/op/interaction.hpp"
#include "transformations/cpu_opset/x64/op/mha.hpp"
#include "transformations/cpu_opset/x64/op/llm_mlp.hpp"
@@ -77,6 +78,7 @@ class TypeRelaxedExtension : public ov::OpExtension<ov::op::TypeRelaxed<Op>> {
OP_EXTENSION(ov::intel_cpu::SwishNode) \
OP_EXTENSION(ov::intel_cpu::SDPAWithTransposeReshape) \
OP_EXTENSION(ov::intel_cpu::NgramNode) \
OP_EXTENSION(ov::intel_cpu::ReadValueWithSubgraph) \
OP_EXTENSION(ov::op::internal::GatherCompressed) \
OP_EXTENSION(ov::op::internal::NonMaxSuppressionIEInternal) \
OP_EXTENSION(ov::op::internal::MulticlassNmsIEInternal) \
59 changes: 42 additions & 17 deletions src/plugins/intel_cpu/src/graph_optimizer.cpp
@@ -3120,12 +3120,20 @@ void GraphOptimizer::MatchSdpaKvCache(Graph &graph) {
auto memInputNode = std::dynamic_pointer_cast<node::MemoryInputBase>(node);
OPENVINO_ASSERT(memInputNode, "MemoryInput node ", node->getName(), " has unexpected dynamic type");

ov::optional<Shape> input_shape;
ov::optional<ov::element::Type> input_prc;
ov::optional<std::vector<Shape>> input_shape;
ov::optional<std::vector<ov::element::Type>> input_prc;

std::vector<Shape> input_shape_vec;
std::vector<ov::element::Type> input_prc_vec;
if (!node->getParentEdges().empty()) {
input_shape = ov::optional<Shape>(node->getInputShapeAtPort(0));
input_prc = ov::optional<ov::element::Type>(node->getOriginalInputPrecisionAtPort(0));
for (size_t i = 0; i < node->getParentEdges().size(); i++) {
input_shape_vec.push_back(node->getInputShapeAtPort(i));
input_prc_vec.push_back(node->getOriginalInputPrecisionAtPort(i));
}
}
if (input_shape_vec.size() > 0) {
input_shape = ov::optional<std::vector<Shape>>(input_shape_vec);
input_prc = ov::optional<std::vector<ov::element::Type>>(input_prc_vec);
}

//search for SDPA
@@ -3251,12 +3259,18 @@ void GraphOptimizer::DropRedundantMemoryOutput(Graph &graph) {
auto memInputNode = std::dynamic_pointer_cast<node::MemoryInputBase>(node);
OPENVINO_ASSERT(memInputNode, "MemoryInput node ", node->getName(), " has unexpected dynamic type");

ov::optional<Shape> inputShape;
ov::optional<ov::element::Type> inputPrc;

ov::optional<std::vector<Shape>> inputShapes;
ov::optional<std::vector<ov::element::Type>> inputPrcs;
if (!node->getParentEdges().empty()) {
inputShape = ov::optional<Shape>(node->getInputShapeAtPort(0));
inputPrc = ov::optional<ov::element::Type>(node->getOriginalInputPrecisionAtPort(0));
inputShapes = ov::optional<std::vector<Shape>>(std::vector<Shape>{});
inputPrcs = ov::optional<std::vector<ov::element::Type>>(std::vector<ov::element::Type>{});

auto& input_shape_vec = *inputShapes;
auto& input_prc_vec = *inputPrcs;
for (size_t i = 0; i < node->getParentEdges().size(); i++) {
input_shape_vec.push_back(node->getInputShapeAtPort(i));
input_prc_vec.push_back(node->getOriginalInputPrecisionAtPort(i));
}
}

//search for the MemoryOutputNode
@@ -3273,24 +3287,35 @@ void GraphOptimizer::DropRedundantMemoryOutput(Graph &graph) {
graph.RemoveEdge(memoryOutputNode->getParentEdgeAt(0));
// there are no output edges from MemoryOutput nodes

CPU_GRAPH_OPTIMIZER_SCOPE(DropRedundantMemoryOutput_SubGraph);
auto memInpNd = std::dynamic_pointer_cast<node::MemoryInput>(node);
Contributor: Please insert nullptr check.

Contributor Author: The next line covers the nullptr check:
auto subGraph = memInpNd ? memInpNd->getSubGraph() : nullptr;
auto subGraph = memInpNd ? memInpNd->getSubGraph() : nullptr;

// now replace the existing MemoryInput with a special type that works without the corresponding MemoryOutput
auto memInputSingle = std::make_shared<MemoryInputSingle>(memInputNode->getId(),
memInputNode->getName(),
memInputNode->getTypeStr(),
memInputNode->getOutputShapeAtPort(0),
memInputNode->getOriginalOutputPrecisionAtPort(0),
graph.getGraphContext(),
inputShape,
inputPrc);

inputShapes,
inputPrcs,
subGraph);
graph.AddNode(memInputSingle);

if (!memInputNode->getParentEdges().empty()) {
auto parentEdge = memInputNode->getParentEdgeAt(0);
auto parent = parentEdge->getParent();
const auto inputNum = parentEdge->getInputNum();
graph.RemoveEdge(parentEdge);
graph.CreateEdge(parent, memInputSingle, inputNum, 0);
auto parentEdgeNum = memInputNode->getParentEdges().size();
std::vector<ov::intel_cpu::EdgePtr> parentEdges;
for (size_t i = 0; i < parentEdgeNum; i++) {
auto parentEdge = memInputNode->getParentEdgeAt(i);
auto parent = parentEdge->getParent();
const auto inputNum = parentEdge->getInputNum();
parentEdges.push_back(parentEdge);
graph.CreateEdge(parent, memInputSingle, inputNum, parentEdge->getOutputNum());
}
for (auto parentEdge : parentEdges) {
graph.RemoveEdge(parentEdge);
}
}

for (auto&& edge : memInputNode->getChildEdgesAtPort(0)) {
12 changes: 9 additions & 3 deletions src/plugins/intel_cpu/src/nodes/input.cpp
@@ -8,6 +8,7 @@
#include "nodes/node_config.h"
#include "openvino/core/parallel.hpp"
#include "shape_inference/shape_inference_pass_through.hpp"
#include "transformations/cpu_opset/common/op/read_value_with_subgraph.hpp"

using namespace dnnl;
using namespace dnnl::impl::cpu::x64;
@@ -217,13 +218,14 @@ jit_has_subnormals_base::fn_t jit_has_subnormals_function() {
#endif

Input::Input(const std::shared_ptr<ov::Node>& op, const GraphContext::CPtr context)
: Node(op, context, PassThroughShapeInferFactory()) {
: Node(op, context, PassThroughShapeInferFactory(op)) {
if (!one_of(op->get_type_info(),
op::v0::Parameter::get_type_info_static(),
op::v0::Constant::get_type_info_static(),
op::v0::Result::get_type_info_static(),
op::v3::ReadValue::get_type_info_static(),
op::v6::ReadValue::get_type_info_static()))
op::v6::ReadValue::get_type_info_static(),
ov::intel_cpu::ReadValueWithSubgraph::get_type_info_static()))
OPENVINO_THROW_NOT_IMPLEMENTED("CPU Input node doesn't support ngraph operation ",
op->get_type_name(),
" with name ",
@@ -479,7 +481,11 @@ void Input::selectOptimalPrimitiveDescriptor() {
supportedPrimitiveDescriptors.clear();

// and just use parent memory descriptor for Output node to avoid reorders insertion
NodeConfig config({PortConfig(getParentOutputMemDesc(getParentEdgeAt(0)), BlockedMemoryDesc::FULL_MASK, 0)}, {});
std::vector<PortConfig> inConfs;
for (size_t i = 0; i < getParentEdges().size(); i++) {
inConfs.push_back({PortConfig(getParentOutputMemDesc(getParentEdgeAt(i)), BlockedMemoryDesc::FULL_MASK, 0)});
}
NodeConfig config(inConfs, {});

supportedPrimitiveDescriptors.emplace_back(config, impl_desc_type::unknown);
selectPrimitiveDescriptorByIndex(0);