Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CPU]whisper readvalue optimize #26130

Open
wants to merge 96 commits into
base: master
Choose a base branch
from

Conversation

xipingyan
Copy link
Contributor

@xipingyan xipingyan commented Aug 20, 2024

Details:

Tickets:

  • 128743

Profile each node execute time.
Support Static and Dynamic infer.

Signed-off-by: xipingya <xiping.yan@intel.com>
If reset is not called, these marked nodes also desn't need to be executed.

Signed-off-by: xipingya <xiping.yan@intel.com>
Signed-off-by: xipingya <xiping.yan@intel.com>
@xipingyan xipingyan requested a review from maxnick August 20, 2024 08:33
@github-actions github-actions bot added category: Core OpenVINO Core (aka ngraph) category: CPU OpenVINO CPU plugin category: transformations OpenVINO Runtime library - Transformations category: CPP API OpenVINO CPP API bindings labels Aug 20, 2024
Signed-off-by: xipingya <xiping.yan@intel.com>
Signed-off-by: xipingya <xiping.yan@intel.com>
@github-actions github-actions bot removed the category: transformations OpenVINO Runtime library - Transformations label Sep 3, 2024
Signed-off-by: xipingya <xiping.yan@intel.com>
Signed-off-by: xipingya <xiping.yan@intel.com>
Signed-off-by: xipingya <xiping.yan@intel.com>
Signed-off-by: xipingya <xiping.yan@intel.com>
decoder network: 20ms -> 5 ms.

Signed-off-by: xipingya <xiping.yan@intel.com>
Signed-off-by: xipingya <xiping.yan@intel.com>
@github-actions github-actions bot removed category: Core OpenVINO Core (aka ngraph) category: CPP API OpenVINO CPP API bindings labels Sep 10, 2024
Signed-off-by: xipingya <xiping.yan@intel.com>
@xipingyan xipingyan force-pushed the xp/whisper_readvalue_optimize branch from 569f53f to dd5dd8a Compare October 25, 2024 06:11
Copy link
Contributor Author

@xipingyan xipingyan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi @maxnick Please help review again, I have merged your PR.

src/plugins/intel_cpu/src/nodes/memory.hpp Show resolved Hide resolved
src/plugins/intel_cpu/src/nodes/memory.cpp Outdated Show resolved Hide resolved
@github-actions github-actions bot removed category: IE Tests OpenVINO Test: plugins and common category: TEMPLATE OpenVINO Template plugin labels Oct 30, 2024
CuriousPanCake pushed a commit to CuriousPanCake/openvino that referenced this pull request Nov 6, 2024
…toolkit#26819)

### Details:
- *Pattern: QKV_Reshape -> QKV_Transpose ->
SDPA->OUT_Transpse->OUT_Reshape*
 - *Fuse this pattern to: SDPA*
- *This hotspot can be observed after
openvinotoolkit#26130, this PR's
implementation doesn't depend on it.*

### Tickets:
 - *153616*

---------

Signed-off-by: xipingya <xiping.yan@intel.com>
…aph input memory.

avoid data corruption.

Signed-off-by: xipingya <xiping.yan@intel.com>
Signed-off-by: xipingya <xiping.yan@intel.com>
Comment on lines 190 to 200
MemoryInput(const std::shared_ptr<ov::Node>& op, const GraphContext::CPtr ctx);
MemoryInput(const std::string id,
const std::string& name,
const std::string& type,
const Shape& output_shape,
const ov::element::Type& output_prc,
const GraphContext::CPtr context,
const ov::optional<std::vector<Shape>>& input_shape,
const ov::optional<std::vector<ov::element::Type>>& input_prc,
std::shared_ptr<ov::Model> func,
mode mode = mode::read_value_assign);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you define MemoryInput constructors. Please remove the constructor inheritance (the line above).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed
using MemoryInputBase::MemoryInputBase;

src/plugins/intel_cpu/src/nodes/memory.hpp Show resolved Hide resolved
private:
std::shared_ptr<ov::Model> body = nullptr;
ov::intel_cpu::Graph subGraph;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest using unique_ptr, to reduce the size of the MemoryInput object, as we don't always need to allocate memory for subGraph.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so, because body is also used in other places, and getSubGraph() also return it to outside.
If we use unique_ptr, it cannot be copied or assigned.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm asking not about the body but about the subGraph

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I found cpu plugin only support to CPP 11, std::make_unique is enabled since CPP 14.
If I upgrade CPP 11 to 14, it will have a big impact.
So I use shared_ptr first.


if (haveSubgraph() && isDynamic) {
// Update to MemInpSingleShapeInfer
shapeInference = PassThroughShapeInferFactory(body).makeShapeInfer();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure that this is correct? The input subgraph being fused into MemoryInput, may produce shapes different from the Parameters attached to MemoryInput. In this case we should use InternalDynShapeInferFactory

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you said: "The input subgraph being fused into MemoryInput, may produce shapes different from the Parameters attached to MemoryInput".
This is my reason of adding this shape infer.
https://github.com/xipingyan/openvino/blob/b621364e611bf7304a2d286fb914d296f06a4ae5/src/plugins/intel_cpu/src/shape_inference/shape_inference_pass_through.hpp#L36-L49

we need to run m_body->validate_nodes_and_infer_types(); in order to get final real shape.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We actually don't. We can postpone this step till execute. This is how the internal dynamic nodes work. Please take a look at the composite node also. This additional shape inference, which you introduced, adds additional unnecessary overhead as it's bein run every execute call while we really need the output shapes from the subgraph only when its processing is required (only when state is reset), in all the other cases we retrieve the output shape from the state itself.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know it introduces unnecessary shape infer if there is no reset state. I will try to move them to runtime.

Comment on lines 673 to 674
subGraph.Init(body, context, graphInputConfig, graphOutputConfig);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a sanity check that the subgraph's actual input/output primitive descriptors are compatible with the ones from the MemoryInput node config.

Copy link
Contributor Author

@xipingyan xipingyan Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also did other test, I found I seem not should add sanity check for input and output primitive.
In order to avoid insert reorder in outside of ReadValueWithSubgraph, inner subgraph input use ReadValueWithSubgraph parent primitives, and inner subgraph output uses ReadValueWithSubgraph child primitive. They maybe should be not compatible.

INTERNAL_OP_SCOPE(intel_cpu_ReadValueWithSubgraphNode_clone_with_new_inputs);

check_new_args_count(this, new_args);
auto op = std::make_shared<ov::intel_cpu::ReadValueWithSubgraph>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we pass the variable here?

Comment on lines +49 to +52
if (node->get_output_target_inputs(0).size() == 0u) {
found_output = true;
return;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does make sense to store the whole DFS subtree, since this is a path to the output node, so checking if a child node is a part of such a subtree may speed up subsequent dfs calls.
In the current implementation we store only nodes visited in reverse_dfs, in the worst case, when we have 1/2N nodes of the ReadValue path all of them parents of the same subgraph leading to an output, so we will have 1/2N * 1/2N iterations, which is still O(N^2). But, if we store the output path as well, all the nodes will be visited once, giving O(N) complexity.

Copy link
Contributor Author

@xipingyan xipingyan Nov 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. all visited nodes also should be flagged. Updated.

…n the future, it may be deleted.

Signed-off-by: xipingya <xiping.yan@intel.com>
2: Adopt parent configuration, avoid to insert reorder before the MemoryInput.
3: move prepare param to runDynamic, because it is not called each time.
It is responsibility of MemoryInputSingle or MemoryOutput
Add: CPU_GRAPH_OPTIMIZER_SCOPE(DropRedundantMemoryOutput_SubGraph);
before create edge, call graph.RemoveEdge(parentEdge);

Signed-off-by: xipingya <xiping.yan@intel.com>
…litply parents edges.

Signed-off-by: xipingya <xiping.yan@intel.com>
Update comments: // Flag: find Output node
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category: CPU OpenVINO CPU plugin
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants