[OpenVINO backend] supporting inference for gemma and mistral with ov backend #2310


Draft: Mohamed-Ashraf273 wants to merge 3 commits into master from supporting_gemma_inference_with_ov_backend

Conversation

@Mohamed-Ashraf273 commented Jun 22, 2025

Description of the change

As part of my GSoC 2025 project to support inference with the OpenVINO backend for Gemma and Mistral, this PR adds support for the Gemma and Mistral generation pipelines.

Reference

https://docs.openvino.ai/2025/index.html
https://keras.io/api/
https://keras.io/keras_hub/

Colab Notebook

Checklist

  • I have added all the necessary unit tests for my change.
  • I have verified that my change does not break existing code and works with all backends (TensorFlow, JAX, and PyTorch).
  • My PR is based on the latest changes of the main branch (if unsure, rebase the code).
  • I have followed the Keras Hub Model contribution guidelines in making these changes.
  • I have followed the Keras Hub API design guidelines in making these changes.
  • I have signed the Contributor License Agreement.

@github-actions bot added the Gemma (Gemma model specific issues) label on Jun 22, 2025
@Mohamed-Ashraf273 force-pushed the supporting_gemma_inference_with_ov_backend branch 26 times, most recently from 6576b03 to 074f0c2, on June 23, 2025 at 17:26
@Mohamed-Ashraf273 force-pushed the supporting_gemma_inference_with_ov_backend branch 2 times, most recently from d748dd5 to f5470cd, on June 24, 2025 at 13:36
@Mohamed-Ashraf273 force-pushed the supporting_gemma_inference_with_ov_backend branch 4 times, most recently from 2053a10 to b4f8905, on June 24, 2025 at 14:44
@Mohamed-Ashraf273 force-pushed the supporting_gemma_inference_with_ov_backend branch from b4f8905 to 2044408 on June 24, 2025 at 19:40
@mattdangerw (Member) left a comment

Left some initial comments! But probably the first question is around the changes to causal_lm and gemma_causal_lm. Why is this so backend-specific? This is much more involved than the changes for jax/torch/tensorflow.

@@ -211,6 +237,10 @@ def test_all_presets(self):
             input_data=self.input_data,
         )

+    @pytest.mark.skipif(
+        keras.config.backend() == "openvino",
+        reason="OpenVINO is for inference only",
Member:
score is inference only, is there a reason this needs to be disabled?

Author:
It needs the roll operation, which is not supported yet. But if I can get generate working for now, I'll enable this test afterwards.
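For context, roll can be emulated with slicing and concatenation, which is one way such a gap is usually bridged. A minimal NumPy sketch of that idea (illustrative only; the helper name is made up and this is not code from this PR):

import numpy as np

def roll_last_axis(x, shift):
    # Emulate np.roll along the last axis with slice + concatenate,
    # a pattern that backends without a native roll kernel can express.
    shift = shift % x.shape[-1]
    if shift == 0:
        return x
    return np.concatenate([x[..., -shift:], x[..., :-shift]], axis=-1)

assert (roll_last_axis(np.arange(5), 2) == np.roll(np.arange(5), 2)).all()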

@@ -158,7 +158,8 @@ def convert_preprocessing_inputs(x):
     # If we have a string input, use tf.tensor.
     if isinstance(x, np.ndarray) and x.dtype.type is np.str_:
         return tf.convert_to_tensor(x)
-    x = ops.convert_to_tensor(x)
+    if keras.config.backend() != "openvino":
+        x = ops.convert_to_tensor(x)
Member:
what does convert_to_tensor do on the openvino backend, should it just convert to numpy?

@Mohamed-Ashraf273 (Author) commented Jun 24, 2025:

At the end of the generate wrapper for OpenVINO, the output is already NumPy. To avoid unnecessary reconversion in post_process, I disabled that step for the OpenVINO backend.

@@ -89,6 +97,9 @@ def test_generate(self):
         causal_lm.preprocessor = None
         outputs = causal_lm.generate(prompt_ids, stop_token_ids=None)
         # Assert prompt is in output in token id space.
+        if keras.config.backend() == "openvino":
Member:
Would extending this for our asserts mean we could ditch some of these switch cases?

def convert_to_comparible_type(x):
    """Convert tensors to comparable types.

    Any string are converted to plain python types. Any jax or torch tensors
    are converted to numpy.
    """
    if getattr(x, "dtype", None) == tf.string:
        if isinstance(x, tf.RaggedTensor):
            x = x.to_list()
        if isinstance(x, tf.Tensor):
            x = x.numpy() if x.shape.rank == 0 else x.numpy().tolist()
        return tree.map_structure(lambda x: x.decode("utf-8"), x)
    if isinstance(x, (tf.Tensor, tf.RaggedTensor)):
        return x
    if hasattr(x, "__array__"):
        return ops.convert_to_numpy(x)
    return x
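One way to read that suggestion: normalize both sides of an assertion to NumPy up front, so the per-backend branches in the tests become unnecessary. A hedged sketch (the helper names below are illustrative, not the actual test_case.py API):

import numpy as np
from keras import ops

def to_comparable_numpy(x):
    # Anything array-like, including the NumPy arrays the OpenVINO backend
    # already returns, is normalized to a plain NumPy array.
    if hasattr(x, "__array__"):
        return ops.convert_to_numpy(x)
    return np.asarray(x)

def assert_prompt_in_output(prompt_ids, outputs):
    # Example assertion that behaves the same under every backend.
    prompt_ids = to_comparable_numpy(prompt_ids)
    outputs = to_comparable_numpy(outputs)
    np.testing.assert_array_equal(outputs[:, : prompt_ids.shape[1]], prompt_ids)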

        if keras.config.backend() == "openvino":
            """Set all logits to a large negative number
            to avoid NaNs produced by ov.einsum"""
            logits = ops.ones_like(logits) * ops.convert_to_tensor(
Member:
this seems suspicious, should we be fixing the code and not the test? maybe we need an openvino fix before calling einsum in some places? what is failing?

@Mohamed-Ashraf273 (Author) commented Jun 24, 2025:

You are right. Once I dropped the splitting approach, it worked!

@@ -60,6 +64,10 @@ def test_causal_lm_basics(self):
             expected_output_shape=(2, 8, 11),
         )

+    @pytest.mark.skipif(
+        keras.config.backend() == "openvino",
+        reason="OpenVINO is for inference only",
Member:
this is also inference only, no gradient descent. should this be enabled?

cache=current_cache,
cache_update_index=cache_update_index,

use_openvino = keras.config.backend() == "openvino"
Member:
this seems quite hard to maintain and like something we should probably avoid. why do we need a pass that is so different from the other backends? shouldn't the point of keras' multi-backend setup be that we can just call layers normally?
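For reference, the backend-agnostic shape this comment points at looks roughly like the following (a paraphrase of GemmaCausalLM.call_with_cache from memory, so details may differ): each layer is simply called, and keras.ops dispatches to whichever backend is active.

from keras import ops

def call_with_cache(self, token_ids, cache, cache_update_index):
    x = self.backbone.token_embedding(token_ids)
    x = x * ops.cast(ops.sqrt(self.backbone.hidden_dim), x.dtype)
    caches = []
    for i, transformer_layer in enumerate(self.backbone.transformer_layers):
        current_cache = cache[:, i, ...]
        x, next_cache = transformer_layer(
            x,
            cache=current_cache,
            cache_update_index=cache_update_index,
        )
        caches.append(next_cache)
    cache = ops.stack(caches, axis=1)
    hidden_states = x = self.backbone.layer_norm(x)
    logits = self.backbone.token_embedding(x, reverse=True)
    return logits, hidden_states, cache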

import numpy as np
import openvino as ov
import openvino.runtime.opset14 as ov_opset
from keras.src.backend.openvino.core import OPENVINO_DTYPES
Member:
we'd like to avoid private imports from keras if we can, they are hard to maintain if keras changes its code structure.

Author:
But I need them to make an OpenVINO model graph.
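For what it is worth, a graph can also be built and compiled with only public openvino APIs, using ov.Type directly instead of the private OPENVINO_DTYPES map. A minimal sketch (illustrative, not the PR's code):

import numpy as np
import openvino as ov
import openvino.runtime.opset14 as ov_opset

# Tiny graph: y = x * 2, built from public APIs only.
x = ov_opset.parameter([1, 4], dtype=ov.Type.f32, name="x")
two = ov_opset.constant(np.full((1, 4), 2.0, dtype=np.float32))
y = ov_opset.multiply(x, two)
model = ov.Model([y], [x], "tiny_graph")

compiled = ov.Core().compile_model(model, "CPU")
result = compiled(np.ones((1, 4), dtype=np.float32))
print(result[0])  # [[2. 2. 2. 2.]]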

@@ -132,6 +137,113 @@ def make_generate_function(self):
             return self.generate_function

         self.generate_function = self.generate_step
+        if keras.config.backend() == "openvino":
Member:
let's try to think if there's a way we can keep this resembling other backends a little more, this is fairly heavyweight.

Author:
I’ve done my best to minimize the backend-specific code paths.
I'd appreciate it if you could take another look.
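A rough sketch of the dispatch shape under discussion, mirroring how generate_step is already wrapped per backend, with OpenVINO as one more branch (the wrapper body and the _run_compiled_ov_generate helper are assumptions about the approach, not the PR's exact code):

def make_generate_function(self):
    if self.generate_function is not None:
        return self.generate_function

    self.generate_function = self.generate_step
    if keras.config.backend() == "torch":
        import torch

        def wrapped_generate_function(inputs, stop_token_ids=None):
            with torch.no_grad():
                return self.generate_step(inputs, stop_token_ids)

        self.generate_function = wrapped_generate_function
    elif keras.config.backend() == "openvino":

        def wrapped_generate_function(inputs, stop_token_ids=None):
            # Hypothetical: reuse compiled ov.Model objects cached on `self`
            # instead of rebuilding and recompiling the graph on every call.
            return self._run_compiled_ov_generate(inputs, stop_token_ids)

        self.generate_function = wrapped_generate_function
    # ... the tensorflow and jax branches wrap with tf.function / jax.jit ...
    return self.generate_function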

@Mohamed-Ashraf273 (Author) commented Jun 24, 2025

> Left some initial comments! But probably the first question is around the changes to causal_lm and gemma_causal_lm. Why is this so backend-specific? This is much more involved than the changes for jax/torch/tensorflow.

Hi @mattdangerw,
I’ve been working on enabling inference for Gemma with OpenVINO by implementing the missing operations. The main challenge I ran into is that building the entire graph as a single model makes execution difficult (it takes too long and RAM may overflow); I’m still investigating why.
To address this, I introduced a subgraph-based approach that splits the full graph at key points. I also added logic to store the compiled subgraphs on CausalLM so they can be reused across generations instead of being rebuilt each time.
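A minimal sketch of that caching idea (class and attribute names here are illustrative, not the PR's actual structure): each subgraph is compiled once and stored under a key describing where the full graph was split, so later generate() calls reuse the compiled model.

import openvino as ov

class OVSubgraphCache:
    # Compile each OpenVINO subgraph once and reuse it across generate() calls.

    def __init__(self, device="CPU"):
        self.core = ov.Core()
        self.device = device
        self._compiled = {}

    def get(self, key, build_model_fn):
        # build_model_fn() -> ov.Model; only invoked on the first cache miss.
        if key not in self._compiled:
            self._compiled[key] = self.core.compile_model(
                build_model_fn(), self.device
            )
        return self._compiled[key]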

@Mohamed-Ashraf273 force-pushed the supporting_gemma_inference_with_ov_backend branch from 164e2fd to 8f58d0b on June 24, 2025 at 23:33
@Mohamed-Ashraf273 force-pushed the supporting_gemma_inference_with_ov_backend branch from 8f58d0b to 2e7bece on June 25, 2025 at 00:59
@Mohamed-Ashraf273 changed the title from "[OpenVINO backend] supporting inference for gemma with ov backend" to "[OpenVINO backend] supporting inference for gemma and mistral with ov backend" on Jun 25, 2025
Labels: Gemma (Gemma model specific issues)
2 participants