Image Selection for ScaledJob based on Event Data #5100

Closed
benhid opened this issue Oct 19, 2023 · 11 comments
Labels
feature-request (All issues for new features that have not been committed to), needs-discussion, stale (All issues that are marked as stale due to inactivity)

Comments

@benhid

benhid commented Oct 19, 2023

Proposal

Currently, KEDA lets us scale and run jobs based on events, but only with a fixed job template. It would be beneficial to decide which image to run dynamically, instead of fixing the image in the ScaledJob.

e.g.,

Let's consider a scenario where a unified message queue contains events related to image processing and data validation. Each type of event demands its own processing logic and, hence, its own Docker image / Job.

Currently, we'd have to set up two distinct ScaledJobs, each listening to the same queue but filtering events and scaling based on their specific criteria.

With the proposed functionality, a single ScaledJob could listen to the unified queue. Upon receiving an event, it would inspect the event type or payload and dynamically decide whether to use the Docker image for image processing or the one for data validation.
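To make the idea concrete, here is a minimal sketch of the selection step, not a KEDA feature or API. It assumes each event is a JSON object with a `type` field; the `IMAGE_BY_EVENT_TYPE` mapping, the image names, and the `select_image` function are all hypothetical.

```python
import json

# Hypothetical mapping from event type to container image. The names are
# illustrative only and do not come from KEDA.
IMAGE_BY_EVENT_TYPE = {
    "image-processing": "registry.example.com/image-processor:1.0",
    "data-validation": "registry.example.com/data-validator:1.0",
}


def select_image(raw_event: bytes) -> str:
    """Return the image for an event, based on its (assumed) "type" field."""
    event = json.loads(raw_event)
    event_type = event.get("type")
    try:
        return IMAGE_BY_EVENT_TYPE[event_type]
    except KeyError:
        # Unknown types are rejected rather than mapped to a default image.
        raise ValueError(f"no image registered for event type {event_type!r}") from None
```

Note that a lookup like this requires reading the message payload, and the mapping itself still has to be maintained somewhere.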

Use-Case

In our organization, we handle diverse event-driven tasks that require different processing logic. While these tasks share a common trigger mechanism (via the same message source, e.g., RabbitMQ), the processing logic, and thus the Docker image, differs depending on the specifics of the event.

As our system evolves and introduces new event types with distinct processing logic, this feature would offer the flexibility to accommodate these changes without creating a separate ScaledJob for each one.

Is this a feature you are interested in implementing yourself?

No

Anything else?

I'm unsure whether KEDA's current architecture can implement the proposed feature. A more straightforward approach might involve enabling the scaler to choose the target reference job based on the topic name. For instance, Kafka can structure topics using a dot (.) separator. Taking a topic pattern such as "events.*" as an example, the Kafka scaler could determine which job to trigger based on whether the event arrives at "events.busybox" or "events.python3".
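As a rough illustration of the topic-based variant (again, not something the Kafka scaler does today), the sketch below matches a topic against the `events.*` pattern and treats the part after the first dot as the job name. The `job_for_topic` function and that naming convention are assumptions made for the example.

```python
from fnmatch import fnmatchcase
from typing import Optional

TOPIC_PATTERN = "events.*"  # the pattern used in the example above


def job_for_topic(topic: str, pattern: str = TOPIC_PATTERN) -> Optional[str]:
    """Return the job name encoded in the topic, or None if it does not match."""
    if not fnmatchcase(topic, pattern):
        return None
    # Assumption: everything after the first dot names the job to trigger.
    _, _, job_name = topic.partition(".")
    return job_name or None
```

With this convention, `events.busybox` and `events.python3` resolve to `busybox` and `python3`, and only the topic name is inspected, never the message body.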

benhid added the feature-request and needs-discussion labels on Oct 19, 2023
@JorTurFer
Member

Hi,
Nice proposal, but IMHO we shouldn't do this: it would mean extracting extra information from the messages, and I don't think we should cross the line of inspecting message contents. I'd say you can already do this with multiple ScaledJobs, each with a different spec and filtering on its own topics, can't you?

WDYT @kedacore/keda-contributors ?

@zroubalik
Member

This would be very hard to implement in a generic way. We would need to cover all the different technologies and transport protocols (I wish everyone used CloudEvents :) ). Also, inspecting the actual data raises security concerns.

I think that the current approach with multiple ScaledJobs is not such a big overhead and solves the problem. Or is there anything in particular that it doesn't cover?

@benhid
Author

benhid commented Oct 19, 2023

Thank you for your feedback! We are indeed using multiple ScaledJobs to achieve this (>100 at the moment). However, our users have the flexibility to execute jobs using any base image, which is not feasible with the current setup. We don't know all the potential images they might choose in advance, making it challenging to predefine ScaledJobs for each one.

@zroubalik
Member

@benhid gotcha. But even if we somehow implemented this, you would still need to define the relation between images and message source, wouldn't you?

@benhid
Author

benhid commented Oct 20, 2023

> @benhid gotcha. But even if we somehow implemented this, you would still need to define the relation between images and message source, wouldn't you?

In fact, having to define that relation is what I'm trying to avoid.

I keep thinking about this, and I can't come up with a proper solution. Perhaps KEDA isn't the right tool for this specific use case, and even if it is, implementing this feature might introduce too much complexity. 😟

Let me know what you think 👍

@SpiritZhou
Contributor

> Thank you for your feedback! We are indeed using multiple ScaledJobs to achieve this (>100 at the moment). However, our users have the flexibility to execute jobs using any base image, which is not feasible with the current setup. We don't know all the potential images they might choose in advance, making it challenging to predefine ScaledJobs for each one.

I have a question. Is it possible to create a ScaledJob on the fly, as soon as a user chooses an image?

@benhid
Author

benhid commented Oct 20, 2023

> I have a question. Is it possible to create a ScaledJob on the fly, as soon as a user chooses an image?

Yes, I think so. It would involve automating the creation of the ScaledJobs using the Kubernetes APIs. However, I'm concerned about users creating custom resources for one-off tasks and then not cleaning them up.
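For illustration, a minimal sketch of what that automation might generate: a function that builds a ScaledJob manifest for a user-chosen image. The `scaled_job_manifest` helper, the hash-based naming scheme, the `example.com/generated` label, the container name, and the trigger values (including the `RABBITMQ_HOST` variable) are assumptions for the example, not part of our setup.

```python
import hashlib


def scaled_job_manifest(image: str, queue_name: str, namespace: str = "default") -> dict:
    """Build a ScaledJob manifest (as a dict) that runs `image` for `queue_name`."""
    # Derive a short, deterministic, DNS-safe name from the image reference so
    # the same image always maps to the same ScaledJob (assumed naming scheme).
    suffix = hashlib.sha256(image.encode("utf-8")).hexdigest()[:10]
    return {
        "apiVersion": "keda.sh/v1alpha1",
        "kind": "ScaledJob",
        "metadata": {
            "name": f"job-{suffix}",
            "namespace": namespace,
            # Hypothetical label so generated resources can be found later.
            "labels": {"example.com/generated": "true"},
        },
        "spec": {
            "jobTargetRef": {
                "template": {
                    "spec": {
                        "containers": [{"name": "worker", "image": image}],
                        "restartPolicy": "Never",
                    }
                }
            },
            "triggers": [
                {
                    "type": "rabbitmq",
                    # Placeholder trigger settings; adjust to the real queue.
                    "metadata": {
                        "queueName": queue_name,
                        "mode": "QueueLength",
                        "value": "1",
                        "hostFromEnv": "RABBITMQ_HOST",
                    },
                }
            ],
        },
    }
```

The resulting dict could then be submitted with a Kubernetes client, for example through the Python client's `CustomObjectsApi.create_namespaced_custom_object` with group `keda.sh`, version `v1alpha1`, and plural `scaledjobs`.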

@zroubalik
Member

zroubalik commented Oct 20, 2023

> > I have a question. Is it possible to create a ScaledJob on the fly, as soon as a user chooses an image?
>
> Yes, I think so. It would involve automating the creation of the ScaledJobs using the Kubernetes APIs. However, I'm concerned about users creating custom resources for one-off tasks and then not cleaning them up.

That could be solved by a very simple operator imho or even a cron job.
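A rough sketch of what one cleanup pass might look like, whether run from a cron job or an operator loop. It assumes each generated ScaledJob carries a hypothetical `example.com/last-used` annotation holding an ISO 8601 timestamp with a UTC offset; the annotation name, the `stale_scaled_jobs` function, and the 7-day threshold are illustrative only.

```python
from datetime import datetime, timedelta

# Hypothetical annotation written by whatever creates the ScaledJobs.
LAST_USED_ANNOTATION = "example.com/last-used"


def stale_scaled_jobs(items, now, max_idle=timedelta(days=7)):
    """Return the names of ScaledJobs whose last-used stamp is older than max_idle."""
    stale = []
    for item in items:
        metadata = item.get("metadata", {})
        annotations = metadata.get("annotations") or {}
        stamp = annotations.get(LAST_USED_ANNOTATION)
        if stamp is None:
            continue  # skip resources that were not annotated by the generator
        last_used = datetime.fromisoformat(stamp)
        if now - last_used > max_idle:
            stale.append(metadata["name"])
    return stale
```

The caller would list the ScaledJobs in a namespace, pass them in as plain dicts together with a timezone-aware `now`, and delete whatever names come back.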

@benhid
Author

benhid commented Oct 22, 2023

> That could be solved by a very simple operator imho or even a cron job.

I'm not entirely convinced that a cron job would be optimal. It doesn't seem like a particularly robust solution, especially if we start scaling and have a high volume of these jobs. The overhead of constantly creating, checking, and cleaning up custom resources might be substantial. Could you elaborate on how you envision this operator working?


stale bot commented Dec 22, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

stale bot added the stale label on Dec 22, 2023

stale bot commented Dec 29, 2023

This issue has been automatically closed due to inactivity.

stale bot closed this as completed on Dec 29, 2023
4 participants