Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support configuration of Dispatcher CircuitBreaker #757

Closed
JackSilk opened this issue Mar 23, 2021 · 15 comments · Fixed by #1034
Closed

Support configuration of Dispatcher CircuitBreaker #757

JackSilk opened this issue Mar 23, 2021 · 15 comments · Fixed by #1034

Comments

@JackSilk
Copy link

Problem
The HTTP WebClient in the Dispatcher exposes configuration, including various timeouts, via a config map (config-kafka-broker-data-plane, config-kafka-broker-webclient.properties). However the CircuitBreaker which wraps the WebClient doesn't expose its configuration. So even with an infinite HTTP timeout, if the request takes longer than (I think) 10 seconds it'll be killed. I think it would be nice if this, too, were configurable.

Persona:
Which persona is this feature for?

  • Producer

Exit Criteria
The CircuitBreaker can be modified (i.e. timeout set) such that Events that take longer than 10 seconds to complete will be sent correctly

Time Estimate (optional):
How many developer-days do you think this may take to resolve?

Additional context (optional)
The dispatcher gives me an error like this when it times out

{
    "@timestamp": "2021-03-15T09:27:42.609Z",
    "@version": "1",
    "message": "Failed to send event to subscriber topic=knative-broker-default-default partition=86 offset=0",
    "logger_name": "dev.knative.eventing.kafka.broker.dispatcher.RecordDispatcher",
    "thread_name": "vert.x-eventloop-thread-1",
    "level": "ERROR",
    "level_value": 40000,
    "stack_trace": "io.vertx.circuitbreaker.TimeoutException: operation timeout\n",
    "topic": "knative-broker-default-default",
    "partition": 86,
    "offset": 0
}

I believe the configuration is set here:
data-plane/dispatcher/src/main/java/dev/knative/eventing/kafka/broker/dispatcher/http/HttpConsumerVerticleFactory.java:createCircuitBreakerOptions

Since some of the options are currently set through the egressConfig I don't know how it would be best to make it configurable.

@pierDipi
Copy link
Member

Hi 👋! Thanks for reporting!

Do you plan to configure it for each trigger or a global configuration might be enough to start with? Or both?

@slinkydeveloper
Copy link
Contributor

slinkydeveloper commented Mar 24, 2021

I think solving this problem for a global configuration is doable, while for each trigger, I think we need to reason about it: because CircuitBreakerOptions at the end of the day is the implementation of DeliverySpec, If we want to have some specific configuration knob for each trigger, I would rather prefer to see if we can push it in DeliverySpec first.

@JackSilk
Copy link
Author

Hi! Thanks for the swift replies

I was planning on configuring it globally, I don't think I would need it to be more granular.

But certainly, I could see that it might make sense longer-term for it to be an option on the Trigger, or perhaps even both. Such as if the global configuration could be overridden on a per-trigger basis.

@slinkydeveloper
Copy link
Contributor

@JackSilk there is also the Broker.DeliverySpec to configure for each trigger the CircuitBreakerOptions, I'm worried that we might end up with 10 ways of configuring the same thing... What in particular you need to configure about CircuitBreakerOptions?

@JackSilk
Copy link
Author

@slinkydeveloper The thing I'm most concerned with configuring is the DEFAULT_TIMEOUT (i.e. https://vertx.io/docs/apidocs/io/vertx/circuitbreaker/CircuitBreakerOptions.html#DEFAULT_TIMEOUT)

@slinkydeveloper
Copy link
Contributor

From a quick look (but @pierDipi knows better than me), perhaps that should be -1? Given we already have a bound on the retries, it doesn't seem to me that we need that additional bound too...

@pierDipi
Copy link
Member

pierDipi commented Mar 24, 2021

I think it's totally reasonable to ask for this since a subscriber can take longer than DEFAULT_TIMEOUT, which is 10000L ms, to process an event and this is not related to retries or back off delays.
For example:

Trigger -> event -> Subscriber does stuff and it takes longer than 10s -> timeout -> retry (useless)

Given the schema above, for some use cases not having this configuration might be a blocker to use this component since you can't make progress due to this (annoying) timeout.

With that being said since we're discussing a global configuration, I'm pretty supportive of this feature and it should be relatively simple to implement and maintain:

Just like client options taken from the mounted ConfigMap:
https://github.com/knative-sandbox/eventing-kafka-broker/blob/a0bffc897a461401cfced958744b8c378f71ea36/data-plane/dispatcher/src/main/java/dev/knative/eventing/kafka/broker/dispatcher/Main.java#L118-L119

https://github.com/knative-sandbox/eventing-kafka-broker/blob/a0bffc897a461401cfced958744b8c378f71ea36/data-plane/dispatcher/src/main/java/dev/knative/eventing/kafka/broker/dispatcher/http/HttpConsumerVerticleFactory.java#L76

we can use them here (making sure we copy them before changing config based on egressConfig like we do for consumerConfigs):

https://github.com/knative-sandbox/eventing-kafka-broker/blob/a0bffc897a461401cfced958744b8c378f71ea36/data-plane/dispatcher/src/main/java/dev/knative/eventing/kafka/broker/dispatcher/http/HttpConsumerVerticleFactory.java#L221-L233

@slinkydeveloper
Copy link
Contributor

slinkydeveloper commented Mar 25, 2021

Ahhh gotcha so this timeout is the timeout of a single request, right? If that's the case, I think this should live in delivery spec itself, because it seems an important configuration to me, and it's also very specific to each subscriber: a default one might not fit most use cases where people need to tweak this parameter. It's not specific to this implementation at all and it's a requirement general enough to be valid for every delivery component.

@pierDipi
Copy link
Member

So, given the response on the upstream issue, do you agree that we can expose this configuration globally with a ConfigMap for now?

@slinkydeveloper
Copy link
Contributor

@pierDipi let's wait for it. We won't be able to push this in next release anyway, which is in 1 week, and TBH I'm reluctant to expose this configuration because it might break, given we had several times the discussion on wheter we should continue to use vertx-circuit-breaker or another library... I would say: let's give it 2 weeks and then we get back here and take a decision, ok?

Also please comment in the delivery spec timeout issue if you like it 😄

@pierDipi
Copy link
Member

I'm ok with that!

@pierDipi
Copy link
Member

Ping.

I'm ok with exposing vertx circuit breaker config given that is a global configuration and the circuit breaker library is working well.

@slinkydeveloper
Copy link
Contributor

slinkydeveloper commented Apr 26, 2021

Ok, but let's make sure we clarify both in the spec and in the CM itself that this is a super unstable and might break any time config

@slinkydeveloper
Copy link
Contributor

@JackSilk does the new DeliverySpec.Timeout field satisfies your use case?

@JackSilk
Copy link
Author

JackSilk commented Jul 9, 2021

@slinkydeveloper Sorry for the delay in the reply, yes I think that will work nicely for my use case. Thanks!

@JackSilk JackSilk closed this as completed Jul 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants