[API] Adding support for serving of DiffUsers models. #489

Merged
slin1237 merged 1 commit into sgl-project:main from shenoyvvarun:vasheno/diffusion-support
Jan 8, 2026

@shenoyvvarun shenoyvvarun commented Jan 6, 2026

What this PR does

  • add diffusion pipeline support to BaseModelSpec and ServingRuntime supported formats (incl. deep-copy + frontend types)
  • extend runtime selector matcher to check diffusion pipelines and add coverage; doc the new compatibility dimension
  • add Qwen-Image ClusterBaseModel, SRT runtime (diffusers pipeline metadata), and sample InferenceService wired to both.

Key Design choice

  • There is no standard convention for model_index.json (unlike config.json for transformers-based models). All models seem to have the following fields:

{
  "_class_name": "StableDiffusionPipeline",
  "_diffusers_version": "0.27.0"
}

Other fields depend on the model, but these are the most common. (I can remove this based on feedback.)

{
  "scheduler": [],
  "text_encoder": [],
  "tokenizer": [],
  "transformer": [],
  "vae": []
}
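
To make the split between the two groups of fields concrete, here is a minimal Python sketch that reads a model_index.json document and separates the pipeline-level metadata from the component entries. The function name `summarize_model_index` is hypothetical, and treating every key without a leading underscore as a component is an assumption of this sketch, not something the PR specifies. The sample simply merges the two JSON fragments shown above.

```python
import json


def summarize_model_index(text):
    """Split a model_index.json document into pipeline metadata and components."""
    index = json.loads(text)
    # Keys starting with "_" are pipeline-level metadata. Every other key is
    # treated as a pipeline component (an assumption of this sketch).
    components = {k: v for k, v in index.items() if not k.startswith("_")}
    return {
        "class_name": index.get("_class_name"),
        "diffusers_version": index.get("_diffusers_version"),
        "components": components,
    }


# The two JSON fragments from the description, merged into one document.
SAMPLE = """
{
  "_class_name": "StableDiffusionPipeline",
  "_diffusers_version": "0.27.0",
  "scheduler": [],
  "text_encoder": [],
  "tokenizer": [],
  "transformer": [],
  "vae": []
}
"""

summary = summarize_model_index(SAMPLE)
print(summary["class_name"], summary["diffusers_version"])
print(sorted(summary["components"]))
```

Using `.get` for the two underscore fields keeps the sketch tolerant of an index file that omits one of them, which seems prudent given that the file has no fixed schema.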

Why we need it

  • To support serving of Diffusion models in OME.

Fixes #

How to test

  • Created the runtime, basemodel and inferenceservice (tested with QwenImage, QwenImageEdit)
e.g.

kubectl get clusterbasemodel
NAME         DISABLED   VERSION   VENDOR   FRAMEWORK   FRAMEWORKVERSION   MODELFORMAT   ARCHITECTURE   CAPABILITIES    SIZE   COMPARTMENTID   READY   AGE
qwen-image   false      1.0.0     Qwen     diffusers   0.34.0.dev0        diffusers                    TEXT_TO_IMAGE                          Ready   5h15m
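
If this check needs to be scripted, note that the ARCHITECTURE, SIZE, and COMPARTMENTID cells are empty in the output above, so splitting each row on whitespace would shift values into the wrong columns. The following Python sketch slices rows by the header's column offsets instead. The helper name `parse_kubectl_table` is hypothetical, and the sample reproduces only a few of the columns, padded the way kubectl pads its tables.

```python
import re


def parse_kubectl_table(output):
    """Parse space-padded tabular kubectl output into a list of dicts."""
    lines = [line for line in output.splitlines() if line.strip()]
    header = lines[0]
    # Each column starts where its header word starts.
    columns = [(m.group(), m.start()) for m in re.finditer(r"\S+", header)]
    rows = []
    for line in lines[1:]:
        row = {}
        for i, (name, start) in enumerate(columns):
            # A column ends where the next one begins; the last runs to end of line.
            end = columns[i + 1][1] if i + 1 < len(columns) else len(line)
            row[name] = line[start:end].strip()
        rows.append(row)
    return rows


# A trimmed copy of the table above, built with fixed-width padding.
ROW_FORMAT = "{:<13}{:<14}{:<15}{:<16}{}"
SAMPLE = "\n".join([
    ROW_FORMAT.format("NAME", "MODELFORMAT", "ARCHITECTURE", "CAPABILITIES", "READY"),
    ROW_FORMAT.format("qwen-image", "diffusers", "", "TEXT_TO_IMAGE", "Ready"),
])

rows = parse_kubectl_table(SAMPLE)
ready_models = [row["NAME"] for row in rows if row["READY"] == "Ready"]
print(ready_models)
```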

Checklist

  • Tests added/updated (if applicable)
  • Docs updated (if applicable)
  • make test passes locally

Followups

  • UI changes (if needed)

@github-actions github-actions bot added the documentation, api, helm, crd, runtime, models, ci, tests, config, and dependencies labels Jan 6, 2026
@gemini-code-assist

Summary of Changes

Hello @shenoyvvarun, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly expands the platform's capabilities by integrating native support for DiffUsers models. It introduces new API fields to precisely define diffusion pipeline components within model and runtime specifications, and enhances the runtime selection mechanism to intelligently match models with compatible serving environments. This foundational work enables seamless deployment and management of advanced generative AI models, exemplified by the inclusion of a Qwen-Image model configuration.

Highlights

  • Diffusers Model Support: Introduced comprehensive API support for serving DiffUsers models, enabling the platform to manage and deploy diffusion-based generative AI models.
  • API Extension for Diffusion Pipelines: Extended BaseModelSpec and ServingRuntime Custom Resource Definitions (CRDs) to include a new diffusionPipeline field, allowing detailed metadata about diffusion model components (e.g., scheduler, text encoder, tokenizer, transformer, VAE) to be specified.
  • Enhanced Runtime Selector Matching: Updated the runtime selector matcher logic to incorporate diffusionPipeline compatibility checks, ensuring that models are matched with serving runtimes that support their specific diffusion pipeline configurations.
  • Example Deployment for Qwen-Image: Added sample ClusterBaseModel, ClusterServingRuntime, and InferenceService configurations for the Qwen-Image model, demonstrating how to deploy a DiffUsers model using the new API extensions.
  • Makefile Refactoring: Refactored the Makefile to streamline Go test execution by introducing a reusable macro and centralizing test environment configuration, improving maintainability and readability.
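
The summary does not spell out the matching rule, so the following Python sketch shows only one plausible shape for a diffusion pipeline compatibility check. Everything in it is an assumption made for illustration: the `DiffusionPipeline` dataclass, the `pipeline_matches` function, the rule that a runtime without pipeline metadata accepts any model, and the component class names. The merged matcher in the repository may behave differently.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class DiffusionPipeline:
    """Hypothetical stand-in for the diffusionPipeline metadata."""
    class_name: str = ""
    # Component name (for example "vae") mapped to its class name.
    components: Dict[str, str] = field(default_factory=dict)


def pipeline_matches(model: Optional[DiffusionPipeline],
                     runtime: Optional[DiffusionPipeline]) -> bool:
    """Return True if the runtime's pipeline constraints accept the model."""
    if runtime is None:
        # Assumed rule: a runtime with no pipeline metadata imposes no constraint.
        return True
    if model is None:
        # The runtime constrains the pipeline, but the model declares none.
        return False
    if runtime.class_name and runtime.class_name != model.class_name:
        return False
    # Every component the runtime pins must be declared identically by the model.
    return all(model.components.get(name) == cls
               for name, cls in runtime.components.items())


# Illustrative values only.
model = DiffusionPipeline("StableDiffusionPipeline",
                          {"vae": "AutoencoderKL", "scheduler": "PNDMScheduler"})
compatible = DiffusionPipeline("StableDiffusionPipeline", {"vae": "AutoencoderKL"})
incompatible = DiffusionPipeline("StableDiffusionPipeline", {"vae": "SomeOtherVAE"})

print(pipeline_matches(model, compatible))
print(pipeline_matches(model, incompatible))
print(pipeline_matches(model, None))
```

In this sketch the runtime acts as the constraint and the model as the candidate, so components the model declares but the runtime does not mention are ignored. Whether the comparison should instead be symmetric is exactly the kind of question a review of the matching logic would need to settle.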

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces support for serving DiffUsers models, a significant enhancement. The changes span across API definitions, CRDs, runtime selection logic, and include new sample configurations. The overall implementation is solid, but I've identified a potential logic issue in the new runtime matching code that could lead to incorrect compatibility checks. I've also pointed out a couple of minor inconsistencies for improvement. My feedback aims to improve the correctness and maintainability of the new functionality.

@shenoyvvarun shenoyvvarun force-pushed the vasheno/diffusion-support branch from 6c85ee9 to 2bcde70 January 7, 2026 01:07
@slin1237 slin1237 merged commit 9f23ca5 into sgl-project:main Jan 8, 2026
25 checks passed

3 participants