Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Knative Eventing Trigger sends CE Responder to a loop #7356

Closed
khrm opened this issue Oct 12, 2023 · 14 comments
Closed

Knative Eventing Trigger sends CE Responder to a loop #7356

khrm opened this issue Oct 12, 2023 · 14 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.

Comments

@khrm
Copy link

khrm commented Oct 12, 2023

Describe the bug
Our CE Responder goes on a loop.
It's somewhat similar to this:
https://github.com/cloudevents/sdk-go/blob/7f5ef3992769f96b40a54c4d59291be62acd36da/samples/http/responder/main.go

Expected behavior
Broker shouldn't resend events coming trigger to itself again.
To Reproduce
Follow the tutorial.

apiVersion: eventing.knative.dev/v1
kind: Trigger
metadata:
  annotations:
    eventing.knative.dev/creator: minikube-user
    eventing.knative.dev/lastModifier: minikube-user
  labels:
    eventing.knative.dev/broker: cloudevent
  name: cloudevents-trigger
  namespace: default
spec:
  broker: example-broker
  subscriber:
    ref:
      apiVersion: serving.knative.dev/v1
      kind: Service
      name: ce-test
      namespace: default

Knative release version

Version:      v1.11.0
Build Date:   2023-07-27 07:42:56
Git Revision: b7508e67
Supported APIs:
* Serving
  - serving.knative.dev/v1 (knative-serving v1.11.0)
* Eventing
  - sources.knative.dev/v1 (knative-eventing v1.11.0)
  - eventing.knative.dev/v1 (knative-eventing v1.11.0)

Additional context
Add any other context about the problem here such as proposed priority

@khrm khrm added the kind/bug Categorizes issue or PR as related to a bug. label Oct 12, 2023
@khrm
Copy link
Author

khrm commented Oct 12, 2023

It seems to be an expected behavior. We have suggested a workaround here in Tekton Triggers:
tektoncd/triggers#1618 (comment)

Can this be resolved without using filters? Can broker and Trigger not go on loop somehow?

@matzew
Copy link
Member

matzew commented Oct 24, 2023

It is generally a good idea to not subscribe to all events, and provide explicit filters (for types and other CE attributes).

@matzew
Copy link
Member

matzew commented Oct 24, 2023

Besides that, there is also a way to control behavior a bit, using the ce-knativebrokerttl extension attribute:
https://github.com/knative/eventing/blob/main/pkg/broker/ttl.go#L29-L32

Regardless, not sure if all broker implementations supporting this.
So, IMO good practices are:

  • Use filter(s) on Triggers for CloudEvent attributes

    • Do not subscribe to all events (especially type)
  • Return application specific CloudEvents in response to the delivery

    • Success: return a new event (avoid filter loops on type)
    • Error: return new event for application level errors
      • provide context of failing payload, if possible
    • Use different types for return, to avoid filter loops

@matzew
Copy link
Member

matzew commented Oct 24, 2023

Here is a post I wrote: https://developers.redhat.com/articles/2023/05/02/how-set-event-driven-microservices-using-knative-eventing

Using application-specific events is a section that also recommends:

It is very important to not return the same CloudEvent type that goes into a function because this would cause a filter loop on the executing Knative Broker.

@matzew
Copy link
Member

matzew commented Oct 24, 2023

The new triggers are currently an experimental feature (alpha), but we are working on moving that towards BETA

@pierDipi
Copy link
Member

I think that generally we could offer a way of disabling the reply as it might help with integrations with legacy apps as well which might not be necessarily change-able

Something like:

spec:
  reply:
    discard: true

@pierDipi
Copy link
Member

This is also related to #6392

@matzew
Copy link
Member

matzew commented Oct 24, 2023

You mean legacy apps submitting CloudEvents?
Not sure on that spec.reply thing. I see it as a feature that the current implementations behave like they do. There is still the TTL attr

@pierDipi
Copy link
Member

pierDipi commented Oct 24, 2023

Receiving Cloudevents, here the reply is sent by a subscriber

@pierDipi
Copy link
Member

See this comment on TTL which I agree with #6392 (comment)

@ggallen
Copy link

ggallen commented Oct 24, 2023

It is generally a good idea to not subscribe to all events, and provide
explicit filters (for types and other CE attributes).

@matzew, I guess I can't argue with this, but I think it should work.

My question is more general. In our use case, which was eventually boiled down to the test case provided in this ticket by @khrm, the Tekton EL sends an "accepted" CloudEvent response to the broker. The broker then turns around and sends that accepted message back the the Tekton EL. And then, of course, the EL sends back an "accepted" CloudEvent response. And we get in an infinite loop.

This fundamentally seems wrong, and that there is a bug somewhere.

So are you saying that this is what should be happening? My (naive) expectation was that an "accepted" CloudEvent response would be a special case, and not broadcast everywhere.

@matzew
Copy link
Member

matzew commented Oct 26, 2023

So, I have a few questions:

  • how many triggers has your broker? Do they all have no filter ?

The use-case: Do I understand it correct?

  1. application sends an event to the Broker via HTTP (e.g. com.redhat.agogos.event....).
  2. Broker has one(?) Trigger (no filter), and the "sink" is Tekton EL.
  3. Tekton EL receives that com.redhat and replys a new CloudEvent.
  4. The Event from the Tekton EL goes of course to the broker.
  5. Since no Trigger is having a filter the Trigger from 2) receives this message and and sends it again to the "Tekton EL"?

Just for understanding. Or is the use-case different?

@matzew
Copy link
Member

matzew commented Oct 27, 2023

Instead of going the way to define a filters.not.exact, using the "experimental" feature, I'd suggest to use a more streamlined solution. Just watch for explicit desired events, for distribution. Like:

apiVersion: eventing.knative.dev/v1
kind: Trigger
metadata:
  name: agogos-build-v1a1
  namespace: tests
spec:
  broker: agogos
  filter:
    attributes:
      type: com.redhat.agogos.event.component.build.v1alpha1
  subscriber:
    uri: http://el-agogos.tests.svc.cluster.local:8080
---
apiVersion: eventing.knative.dev/v1
kind: Trigger
metadata:
  name: agogos-some-other
  namespace: tests
spec:
  broker: agogos
  filter:
    attributes:
      type: com.redhat.agogos.event.someother.event.type
  subscriber:
    uri: http://el-agogos.tests.svc.cluster.local:8080 (or some other serivce for this type)
...

If you than want to process some other event type (e.g. like that "accept" from Tekton), I'd add a new trigger for that.

This also prevents you from accidentally sending "wrong" events to the Tekton EL.
Using the experimental feature is possible too.

However, I do see the point for the loop and the TTL CE extension is not optimal. The referenced discussion from @pierDipi shows this too. We might do something there in the future.

but for now, I think the above suggestions are the best options, and they follow the design and best practices (E.g. to not subscribe to all events on a trigger)

Copy link

This issue is stale because it has been open for 90 days with no
activity. It will automatically close after 30 more days of
inactivity. Reopen the issue with /reopen. Mark the issue as
fresh by adding the comment /remove-lifecycle stale.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.
Projects
None yet
Development

No branches or pull requests

4 participants