[OpenVINO backend] supporting inference for gemma and mistral with ov backend #2310
Conversation
Left some initial comments! But probably the first question is around the changes to `causal_lm` and `gemma_causal_lm`. Why is this so backend specific? This is much more involved than the changes for jax/torch/tensorflow.
```diff
@@ -211,6 +237,10 @@ def test_all_presets(self):
             input_data=self.input_data,
         )

+    @pytest.mark.skipif(
+        keras.config.backend() == "openvino",
+        reason="OpenVINO is for inference only",
```
`score` is inference only; is there a reason this needs to be disabled?
It needs the `roll` operation, which is not supported yet, but I thought that if I can make `generate` work for now, I'll enable this test afterwards.
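For illustration, `roll` can often be emulated with ops OpenVINO does support; this is a hypothetical fallback sketch, not code from this PR:

```python
from keras import ops


def roll_fallback(x, shift, axis):
    """Emulate ops.roll(x, shift, axis) via take() over shifted indices.

    Hypothetical helper, not part of this PR; assumes the rolled
    dimension has a static size.
    """
    n = x.shape[axis]
    # np.roll semantics: output[i] = x[(i - shift) % n].
    indices = ops.mod(ops.arange(n) - shift, n)
    return ops.take(x, indices, axis=axis)
```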
```diff
@@ -158,7 +158,8 @@ def convert_preprocessing_inputs(x):
     # If we have a string input, use tf.tensor.
     if isinstance(x, np.ndarray) and x.dtype.type is np.str_:
         return tf.convert_to_tensor(x)
-    x = ops.convert_to_tensor(x)
+    if keras.config.backend() != "openvino":
+        x = ops.convert_to_tensor(x)
```
What does `convert_to_tensor` do on the OpenVINO backend? Should it just convert to numpy?
At the end of the generate wrapper for OpenVINO, the output is already NumPy. To avoid unnecessary reconversion in post_process, I disabled that step for the OpenVINO backend.
```diff
@@ -89,6 +97,9 @@ def test_generate(self):
         causal_lm.preprocessor = None
         outputs = causal_lm.generate(prompt_ids, stop_token_ids=None)
         # Assert prompt is in output in token id space.
+        if keras.config.backend() == "openvino":
```
Would extending this for our asserts mean we could ditch some of these switch cases?
keras-hub/keras_hub/src/tests/test_case.py, lines 21 to 37 in 25c9062:
```python
def convert_to_comparible_type(x):
    """Convert tensors to comparable types.

    Any string are converted to plain python types. Any jax or torch tensors
    are converted to numpy.
    """
    if getattr(x, "dtype", None) == tf.string:
        if isinstance(x, tf.RaggedTensor):
            x = x.to_list()
        if isinstance(x, tf.Tensor):
            x = x.numpy() if x.shape.rank == 0 else x.numpy().tolist()
        return tree.map_structure(lambda x: x.decode("utf-8"), x)
    if isinstance(x, (tf.Tensor, tf.RaggedTensor)):
        return x
    if hasattr(x, "__array__"):
        return ops.convert_to_numpy(x)
    return x
```
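One way to extend the helper along those lines, as a minimal sketch (it assumes the OpenVINO `generate` wrapper returns plain NumPy arrays, per the discussion above; this is not code from the PR):

```python
import numpy as np
from keras import ops


def convert_to_comparable_type(x):
    # Hypothetical variant: NumPy arrays (what the OpenVINO generate
    # wrapper returns) are already comparable; any other backend tensor
    # is funneled through convert_to_numpy, so per-backend switch cases
    # in the asserts could collapse.
    if isinstance(x, np.ndarray):
        return x
    if hasattr(x, "__array__"):
        return ops.convert_to_numpy(x)
    return x
```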
if keras.config.backend() == "openvino": | ||
"""Set all logits to a large negative number | ||
to avoid NaNs produced by ov.einsum""" | ||
logits = ops.ones_like(logits) * ops.convert_to_tensor( |
This seems suspicious; should we be fixing the code, not the test? Maybe we need an OpenVINO fix before calling `einsum` in some places? What is failing?
You are right. When I removed the splitting approach, it worked!
```diff
@@ -60,6 +64,10 @@ def test_causal_lm_basics(self):
             expected_output_shape=(2, 8, 11),
         )

+    @pytest.mark.skipif(
+        keras.config.backend() == "openvino",
+        reason="OpenVINO is for inference only",
```
This is also inference only, no gradient descent. Should this be enabled?
```python
    cache=current_cache,
    cache_update_index=cache_update_index,

use_openvino = keras.config.backend() == "openvino"
```
This seems quite hard to maintain and like something we should probably avoid. Why do we need a pass that is so different from the other backends? Shouldn't the point of Keras' multi-backend setup be that we can just call layers normally?
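For contrast, the backend-agnostic pattern being pointed at usually looks like this minimal sketch (the shapes and helper name are assumptions, not code from this PR):

```python
from keras import ops


def update_kv_cache(cache, update, cache_update_index):
    # cache:  (batch, max_len, num_heads, head_dim)
    # update: (batch, new_len, num_heads, head_dim)
    # ops.slice_update dispatches to whichever backend is active, so no
    # use_openvino switch is needed at the layer level.
    start = [0, cache_update_index, 0, 0]
    return ops.slice_update(cache, start, update)
```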
```python
import numpy as np
import openvino as ov
import openvino.runtime.opset14 as ov_opset
from keras.src.backend.openvino.core import OPENVINO_DTYPES
```
We'd like to avoid private imports from Keras if we can; they are hard to maintain if Keras changes its code structure.
But I need them to make an OpenVINO model graph.
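To make the dependency concrete, building a graph by hand goes through the opset factory functions; a toy sketch follows (`OPENVINO_DTYPES` is the Keras-internal map from Keras dtype strings to `ov.Type`, which is what makes the private import tempting):

```python
import openvino as ov
import openvino.runtime.opset14 as ov_opset

# Toy graph: token_ids -> token_ids + 1, compiled for CPU.
token_ids = ov_opset.parameter([-1, -1], dtype=ov.Type.i32, name="token_ids")
next_ids = ov_opset.add(token_ids, ov_opset.constant(1, ov.Type.i32))
model = ov.Model([next_ids], [token_ids], "toy_graph")
compiled = ov.compile_model(model, "CPU")
```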
```diff
@@ -132,6 +137,113 @@ def make_generate_function(self):
         return self.generate_function

         self.generate_function = self.generate_step
+        if keras.config.backend() == "openvino":
```
Let's try to think whether there's a way we can keep this resembling the other backends a little more; this is fairly heavyweight.
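For reference, a paraphrased and abridged sketch of how the existing method shapes the per-backend dispatch (only the torch branch is spelled out here; the tensorflow and jax branches are elided):

```python
def make_generate_function(self):
    if self.generate_function is not None:
        return self.generate_function

    self.generate_function = self.generate_step
    if keras.config.backend() == "torch":
        import torch

        def wrapped_generate_function(inputs, stop_token_ids=None):
            # Inference only: run generate_step without autograd overhead.
            with torch.no_grad():
                return self.generate_step(inputs, stop_token_ids)

        self.generate_function = wrapped_generate_function
    # ... tensorflow wraps generate_step in tf.function, jax in jax.jit ...
    return self.generate_function
```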
I’ve done my best to minimize backend-specific bias in the code.
I'd appreciate it if you could take another look.
Hi @mattdangerw,
Description of the change
As part of my GSoC 2025 project to support inference with the OpenVINO backend for Gemma and Mistral, this PR adds support for the Gemma and Mistral pipelines.
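If it helps review, inference with this PR should look roughly like the following (a hedged usage sketch; the preset name is illustrative):

```python
import os

# Select the OpenVINO backend before importing keras/keras_hub.
os.environ["KERAS_BACKEND"] = "openvino"

import keras_hub

gemma_lm = keras_hub.models.GemmaCausalLM.from_preset("gemma_2b_en")
print(gemma_lm.generate("The capital of France is", max_length=30))
```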
References
https://docs.openvino.ai/2025/index.html
https://keras.io/api/
https://keras.io/keras_hub/
Colab Notebook
Checklist