CVS-175736-[OVEP] Enable stateful mode for Phi-silica models #821
Conversation
```cpp
if (gpu_or_npu) {
  prefill_use_full_chat_history = true;
}
// bool gpu_or_npu = ((device.find("NPU") != std::string::npos) || (device.find("GPU") != std::string::npos));
```
Need to discuss with the ORT-GenAI team how to handle this logic.
Co-author: Beheshti, Nazanin
Force-pushed 0fe0302 to 1e132f3
```cpp
// check if there is input_ids tensors and if the tensor type is int64,
// because logic prefill_use_full_chat_history is only for specific inputs and data type
auto input_ids_opt = FindTensor("input_ids");
if (gpu_or_npu && input_ids_opt.has_value() && input_ids_opt->get_element_type() != ov::element::i64) {
```
The comment contradicts the code: the comment says the tensor type should be int64, while the condition checks that it is not. Please check whether this is a bug.
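If the intent matches the code comment (apply the full-chat-history prefill only when `input_ids` exists and is int64), the condition would test equality rather than inequality. A minimal stand-alone sketch of that reading, using a hypothetical `ElementType` enum in place of `ov::element::Type` so it compiles without OpenVINO:

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for ov::element::Type, just to illustrate the branch.
enum class ElementType { i64, i32, f32 };

// Sketch: enable the full-chat-history prefill path only when the device is
// GPU/NPU, an input_ids tensor exists, and its element type is int64.
bool should_use_full_chat_history(bool gpu_or_npu,
                                  std::optional<ElementType> input_ids_type) {
  return gpu_or_npu && input_ids_type.has_value() &&
         *input_ids_type == ElementType::i64;  // '==', not '!='
}
```

Whether `==` or `!=` is correct depends on the author's intent; this sketch only shows the variant the comment describes.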
```cpp
}

if (ModelHasInputOutputNames(ov_model, "/model/embed_tokens/Gather_output_0")) {
  main_input_name = "/model/embed_tokens/Gather_output_0";
```
I'm not a fan of hardcoding specific input names, which can change over time and break things. Could you please see whether there's a possibility to avoid that?
```cpp
  main_input_name = "input_ids";
}

if (ModelHasInputOutputNames(ov_model, "input_hidden_states")) {
  main_input_name = "input_hidden_states";
}

if (ModelHasInputOutputNames(ov_model, "/model/embed_tokens/Gather_output_0")) {
  main_input_name = "/model/embed_tokens/Gather_output_0";
}
```
We have a lot of code duplication here. Please make a helper function that takes an array of strings as input and checks whether they're present in the model.
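A hedged sketch of the helper the reviewer suggests: take the candidate names in priority order and return the first one the model actually exposes. The names `PickMainInputName` and `model_names` are illustrative; in the real code the lookup would go through `ModelHasInputOutputNames`.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: returns the first candidate input name present in the
// model, or the fallback if none match. 'model_names' stands in for whatever
// set of names ModelHasInputOutputNames consults.
std::string PickMainInputName(const std::unordered_set<std::string>& model_names,
                              const std::vector<std::string>& candidates,
                              const std::string& fallback) {
  for (const auto& name : candidates) {
    if (model_names.count(name)) return name;
  }
  return fallback;
}
```

Callers would then pass `{"input_hidden_states", "/model/embed_tokens/Gather_output_0"}` once, replacing the three duplicated `if` blocks with a single call.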
```cpp
key_value_input_names.push_back(name);
const auto& params = model->get_parameters();
bool found = false;
for (auto i = 0; i < params.size(); i++) {
```
Reminder: watch for signed/unsigned mismatches. `auto i = 0` deduces `int`, which does not match the container's `.size()` type (`size_t`).
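A minimal illustration of the reviewer's point, with a hypothetical `count_matches` function: declaring the index as `size_t` (or using a range-based for loop) keeps the comparison against `.size()` within one type and silences `-Wsign-compare`.

```cpp
#include <string>
#include <vector>

// Counts parameter names containing 'needle'. The loop index is size_t,
// matching std::vector::size(), so there is no signed/unsigned comparison.
int count_matches(const std::vector<std::string>& params,
                  const std::string& needle) {
  int found = 0;
  for (size_t i = 0; i < params.size(); ++i) {  // size_t, not int
    if (params[i].find(needle) != std::string::npos) ++found;
  }
  return found;
}
```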
Force-pushed 1e132f3 to e50603d
Please attach a JIRA for this feature request in the PR description.
Force-pushed e50603d to 25c6976
Done, let me know if you have further questions.

Hi @ankitm3k, any feedback on this PR?
Pull Request Overview
This PR enables stateful mode for Phi-silica models by enhancing the OpenVINO provider's ability to recognize different LLM model types and properly handle their key-value cache inputs. This optimization reduces memory usage from 16 GB to 3.7 GB and improves performance from 1 fps to 16 fps for Phi-Silica workloads on the OVEP GPU backend.
- Enhanced input name detection with prioritized candidate names for different model types
- Improved key-value cache input recognition to handle "keys" and "values" patterns beyond just "key_values"
- Added conditional logic for prefill optimization based on input tensor characteristics
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| ov_stateful_patch_utils.cc | Added flexible input name detection and expanded key-value cache pattern matching |
| ov_interface.cc | Enhanced stateful request initialization with input type validation |
```cpp
const auto& params = model->get_parameters();
bool found = false;
for (size_t i = 0; i < params.size(); i++) {
  auto param_name = params.at(i)->output(0).get_any_name();
```
Copilot (AI), Oct 28, 2025:
Using params.at(i) is less efficient than direct indexing with params[i]. Consider using range-based for loop or direct indexing for better performance.
```diff
- auto param_name = params.at(i)->output(0).get_any_name();
+ auto param_name = params[i]->output(0).get_any_name();
```
```cpp
if (!found) {
  not_kv_inputs.push_back(input.get_any_name());
  not_kv_inputs.push_back(param_name);
}
```
Copilot (AI), Oct 28, 2025:
The `found` variable is never reset to `false` between iterations, causing incorrect classification of subsequent parameters. Reset `found = false` at the beginning of each loop iteration.
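A stand-alone sketch of the corrected control flow, with `found` reset once per outer parameter; the function and variable names here are illustrative, not the PR's actual helpers:

```cpp
#include <string>
#include <vector>

// Splits parameter names into key-value cache inputs and everything else.
// 'found' is scoped inside the outer loop, so one match cannot poison the
// classification of later parameters -- the fix the review suggests.
void classify_params(const std::vector<std::string>& params,
                     std::vector<std::string>& kv_inputs,
                     std::vector<std::string>& not_kv_inputs) {
  static const std::vector<std::string> kv_patterns = {"key_values", "keys",
                                                       "values"};
  for (const auto& name : params) {
    bool found = false;  // reset per parameter
    for (const auto& pat : kv_patterns) {
      if (name.find(pat) != std::string::npos) {
        kv_inputs.push_back(name);
        found = true;
        break;
      }
    }
    if (!found) not_kv_inputs.push_back(name);
  }
}
```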
@Kotomi-Du -- Can you resolve the existing feedback on this PR? Thanks!
Pull Request Overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
```cpp
} else if (name.find("keys") != std::string::npos) {
  key_value_input_names.push_back(name);
  found = true;
  break;
} else if (name.find("values") != std::string::npos) {
  key_value_input_names.push_back(name);
  found = true;
  break;
```
Copilot (AI), Oct 30, 2025:
The logic for detecting 'keys' and 'values' patterns treats them separately with identical code blocks. This could lead to only finding one type of cache input when both should be collected. Consider restructuring to collect all matching key-value inputs rather than breaking after the first match.
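Since the "keys" and "values" branches perform the same action, one way to remove the duplication is to fold the patterns into a single predicate. A sketch (illustrative, not the PR's actual code):

```cpp
#include <string>

// True if the parameter name looks like a key-value cache input. Collapsing
// the three substring checks into one predicate keeps the shared action
// (pushing onto key_value_input_names) in a single branch at the call site.
bool is_kv_cache_name(const std::string& name) {
  return name.find("key_values") != std::string::npos ||
         name.find("keys") != std::string::npos ||
         name.find("values") != std::string::npos;
}
```

Whether all matches should be collected (as Copilot suggests) or the loop should still break on the first hit depends on the surrounding logic, which this sketch leaves to the caller.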
@RyanMetcalfeInt8 addressed; also requested a new Copilot review, which was not helpful.
LGTM
Description
Recognize additional LLM model variants, specifically Phi-Silica models, so they trigger the path that converts the model to stateful mode.
Motivation and Context
Combined with #830 and #831, these changes improve the memory footprint and performance of the Phi-Silica workload. Without them, the workload consumed 16 GB of memory and ran at 1 fps on the OVEP GPU backend; with them, memory usage drops to 3.7 GB and performance reaches 16 fps.
Does this feature go to the new ABI?
Yes
Jira Ticket:
https://jira.devtools.intel.com/browse/CVS-175736