Healthchecks aware of State Stores #139

Open
Fryuni opened this issue Apr 26, 2021 · 6 comments
Labels
question Further information is requested

Comments

@Fryuni

Fryuni commented Apr 26, 2021

Not related to any particular environment

Is there any way to have a healthcheck that reports as unhealthy unless the application is actually processing messages? That is, unhealthy during rebalances and while the state stores are being rebuilt from the changelog?

What we managed to do is a liveness check for when Azkarra is running and a readiness check for when all the applications have been instantiated, but ideally the readiness check should succeed only when the application is processing messages.

This causes a huge problem during scale-ups/downs:

  1. An instance is added/removed and starts redistributing the partitions without downtime due to the standby replicas
  2. While the rebalance and state-store rebuild are still in progress, another instance is added/removed

In this scenario, the entire consumer group performs a full rebalance and discards all its state stores, which causes up to 30 minutes of downtime.

Even if we make every instance a standby replica of every other instance (which is not scalable, but we tested it as a workaround), it only solves the scale-down problem. We need to expose to the orchestrator when the application is in a state where it can't be scaled.

I could not find anything in the docs related to this.

@Fryuni Fryuni added the question Further information is requested label Apr 26, 2021
@fhussonnois
Member

fhussonnois commented Apr 27, 2021

Hi @Fryuni, currently Azkarra exposes the /health endpoint that returns some information about the state of all KafkaStreams instances. In addition, it provides a basic HealthIndicator for Kafka Streams (StreamsHealthIndicator).

Basically, the service is considered to be:

  • UP if the KafkaStreams's state is RUNNING (/health returns HTTP code 200).
  • DOWN if the KafkaStreams's state is ERROR (/health returns HTTP code 503).
  • Otherwise, the status for the service is set to UNKNOWN (/health returns HTTP code 200).
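
To make the mapping above concrete, here is a minimal, self-contained sketch. It does not use Azkarra's real classes: `StreamsState`, `Status`, and `DefaultHealthMapping` are hypothetical stand-ins (the state names mirror `KafkaStreams.State`), and only the UP/DOWN/UNKNOWN rules and HTTP codes are taken from the description above.

```java
// Hypothetical stand-in types; NOT the Azkarra API.
final class DefaultHealthMapping {

    // Names mirror the KafkaStreams.State enum values.
    enum StreamsState { CREATED, REBALANCING, RUNNING, PENDING_SHUTDOWN, NOT_RUNNING, ERROR }

    enum Status { UP, DOWN, UNKNOWN }

    // RUNNING -> UP, ERROR -> DOWN, anything else -> UNKNOWN.
    static Status statusFor(StreamsState state) {
        switch (state) {
            case RUNNING:
                return Status.UP;
            case ERROR:
                return Status.DOWN;
            default:
                return Status.UNKNOWN;
        }
    }

    // Only DOWN maps to 503; UP and UNKNOWN both map to 200.
    static int httpCodeFor(Status status) {
        return status == Status.DOWN ? 503 : 200;
    }

    public static void main(String[] args) {
        for (StreamsState s : StreamsState.values()) {
            Status st = statusFor(s);
            System.out.println(s + " -> " + st + " (HTTP " + httpCodeFor(st) + ")");
        }
    }
}
```

Note that under these rules REBALANCING ends up as UNKNOWN with HTTP 200, so a probe that only looks at the HTTP status code would treat an instance as healthy in the middle of a rebalance.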

Internally, Azkarra wraps the KafkaStreams instance in the KafkaStreamsContainer class, whose state() method is aware of state store recovery. Since Azkarra 0.9, the container returns the following states during the state store recovery process: STATE_RESTORE_START, STATE_RESTORE_IN_PROGRESS, STATE_RESTORE_COMPLETED.

So, to solve your issue, I think you have two options:

  1. You can implement and register an additional HealthIndicator that sets the status to DOWN whenever the Kafka Streams state is not RUNNING.

  2. You can implement and register a custom JAX-RS resource to expose your own REST endpoints for liveness and readiness. See Azkarra REST Extensions. When implementing an extension, you can get access to the AzkarraStreamsService to retrieve all KafkaStreamsContainer instances.
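
Both options come down to the same rule: report ready only when every streams instance is RUNNING. A minimal sketch of that rule, independent of Azkarra, could look like the following. `ContainerState` and `StrictReadiness` are hypothetical names, the enum is only a stand-in for whatever the container's state() method actually returns, and treating an empty list as "not ready" is an assumption. Wiring this into a real HealthIndicator or JAX-RS resource is left out because it depends on the Azkarra API.

```java
import java.util.List;

// Hypothetical stand-in types; NOT the Azkarra API.
final class StrictReadiness {

    // Stand-in for the states a container may report, including the
    // restore states mentioned above.
    enum ContainerState {
        CREATED, REBALANCING, RUNNING,
        STATE_RESTORE_START, STATE_RESTORE_IN_PROGRESS, STATE_RESTORE_COMPLETED,
        ERROR
    }

    // Ready only if at least one instance exists and all of them are RUNNING.
    // Assumption: no instances at all means "not ready".
    static boolean isReady(List<ContainerState> states) {
        return !states.isEmpty()
                && states.stream().allMatch(s -> s == ContainerState.RUNNING);
    }

    // HTTP code a readiness endpoint could return for the orchestrator.
    static int httpCodeFor(boolean ready) {
        return ready ? 200 : 503;
    }

    public static void main(String[] args) {
        List<ContainerState> restoring = List.of(
                ContainerState.RUNNING, ContainerState.STATE_RESTORE_IN_PROGRESS);
        System.out.println("ready=" + isReady(restoring)
                + " http=" + httpCodeFor(isReady(restoring)));
    }
}
```

With a rule like this, an instance that is rebalancing or restoring a state store fails its readiness check, which gives the orchestrator a signal that scaling should wait.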

Hope this helps!

@marcospassos

I just checked it out, and it seems easy to implement a custom health check. Thank you!

One more question: how do we register a custom HealthIndicator?

@fhussonnois
Member

You can just annotate your class with @Component so that it is registered with the AzkarraContext. The ApiHealthRoutes will then look up all registered components that implement the HealthIndicator interface to build an aggregate HealthIndicator.

@marcospassos

Is there any way to register it programmatically (I mean, using the builder)?

@fhussonnois
Member

Yes, you can just use the AzkarraContext#registerComponent method. Here is an example: https://github.com/streamthoughts/azkarra-streams/blob/master/azkarra-examples/src/main/java/io/streamthoughts/examples/azkarra/noannotation/StreamsApplication.java

Why don't you use the component scan mechanism?

@marcospassos

Oh, I see. We're just not big fans of annotations :)

I'll try to implement the health check and bring some feedback.

Thanks!


3 participants