-
Notifications
You must be signed in to change notification settings - Fork 232
Open
Description
This is mostly a reality check if our custom bootstrap is using JOSDK as intended or not. Depending on the feedback we can close this issue and create more specific follow-up issues if needed. I just wanted to describe the context once.
// configure operator
final var customMetrics = new CustomResourceMetrics(config, meterRegistry);
final var eventSender = new EventSender(config, client);
final var reconciler = new HiveMQPlatformReconciler(config, customResourceMetrics, client, eventSender);
final var dependentResourceFactory = new HiveMQPlatformOperatorDependentResourceFactory<>(config, eventSender);
final var metrics = MicrometerMetrics.newPerResourceCollectingMicrometerMetricsBuilder(meterRegistry).build();
final var operator = new Operator(override -> override //
.withConcurrentReconciliationThreads(config.getConcurrentReconciliationThreads())
.withConcurrentWorkflowExecutorThreads(config.getConcurrentWorkflowThreads())
.withReconciliationTerminationTimeout(config.getReconciliationTerminationTimeout())
.withCacheSyncTimeout(config.getCacheSyncTimeout())
.withKubernetesClient(client)
.withCloseClientOnStop(closeClient)
.withDependentResourceFactory(dependentResourceFactory)
.withMetrics(metrics)
.withUseSSAToPatchPrimaryResource(useSSA)
.withSSABasedCreateUpdateMatchForDependentResources(useSSA));
final var reconcilerConfig = operator.getConfigurationService().getConfigurationFor(reconciler); // (1)
final var overrideConfig = ControllerConfigurationOverrider.override(reconcilerConfig);
if (operatorNamespaces.equals(Constants.WATCH_CURRENT_NAMESPACE)) {
overrideConfig.watchingOnlyCurrentNamespace();
} else if (operatorNamespaces.equals(Constants.WATCH_ALL_NAMESPACES)) {
overrideConfig.watchingAllNamespaces();
} else {
final var namespaces = Arrays.stream(operatorNamespaces.split(",")).collect(Collectors.toSet());
overrideConfig.settingNamespaces(namespaces);
}
if (!operatorSelector.isBlank()) {
overrideConfig.withLabelSelector(operatorSelector);
}
overrideConfig.withOnAddFilter(platform -> { // (3)
customResourceMetrics.register(platform);
return true;
});
final var controller = (Controller<HiveMQPlatform>) operator.register(reconciler, overrideConfig.build());
customResourceMetrics.setCache(controller.getEventSourceManager().getControllerEventSource()); // (2)HiveMQPlatformOperatorDependentResourceFactory implements DependentResourceFactory(it's the only place where we really had to replace Quarkus Arc, otherwise we don't need a CDI).closeClientistruein production and only set tofalsein tests to not close an injected K8s client (that is still used in the test).useSSAistruein production and only set tofalsein tests with the K8s mockserver.
- Are we creating the configuration and overrides as intended? I tried to reverse engineer how Quarkus and the
LocallyRunOperatorExtensionare configuring JOSDK, but we still get a warning on theoperator.getConfigurationService().getConfigurationFor(reconciler)invocation:It feels wrong to get that warning, but I found no other way to create a12:29:37.423 [main] WARN Default ConfigurationService implementation - Configuration for reconciler 'hivemq-controller' was not found. Known reconcilers: None.ControllerConfigurationOverrider. - How to properly implement custom metrics on all custom resources in an efficient way? For example, in
CustomResourceMetricswe have a globalGaugefor each state of our operator state machine. To calculate the values we count all custom resources that are in that state:Thatcache.list() .map(CustomResource::getStatus) .filter(Objects::nonNull) .filter(status -> status.getState().equals(state)) .count();
cacheis theControllerEventSourcethat we set viacustomResourceMetrics.setCache(controller.getEventSourceManager().getControllerEventSource()). This feels a bit illegal but works very fine for us.
In the previous implementation we kept a shadow copy of all custom resources in a CHM, but these were the cloned instances from the reconciliation loop. This impacted the GC and needed constant updates of the cached instances to retrieve the current state. Using the original instances from the cache solved this problem very elegantly for us, including the lifecycle management. - The same problem hits us now again with a new
Gaugefor an aggregated health status metric per custom resource. We need to register thisGaugeonce when the CR is added, de-register it when it's removed and in between have access to the currentCustomResource::getStatusto get the health state. So we're back to aMap<String, GaugeHolder>with<namespace>-<name>as key.
For theGaugecreation I'm usingoverrideConfig.withOnAddFilter(), which again feels very wrong, but seems to work fine. TheGaugeremoval is done in theCleaner::cleanupmethod of our reconciler.
Is there a better way for the lifecycle control and where to put thatGauge? It feels like we cannot put this into the custom resource itself, because of the cloning (theGaugeneeds to be a singleton) and to update the correct instance. This is how it looks like in theCustomResourceMetricsright now:private final @NotNull Map<String, GaugeHolder> healthMetricGauges = new ConcurrentHashMap<>(); // called from the bootstrap public void register(final @NotNull HiveMQPlatform platform) { healthMetricGauges.put(getKey(platform), new GaugeHolder(platform)); } // called from Cleaner::cleanup public void deregister(final @NotNull HiveMQPlatform platform) { if (healthMetricGauges.get(getKey(platform)) instanceof GaugeHolder gaugeHolder) { meterRegistry.remove(gaugeHolder.gauge); } } // called from the main reconciler when the health state has changed public void updateHealthMetric(final @NotNull HiveMQPlatform platform) { if (healthMetricGauges.get(getKey(platform)) instanceof GaugeHolder gaugeHolder) { gaugeHolder.value.set(platform.getStatus().getHealthStatus().getMetricsValue()); } } private static @NotNull String getKey(final @NotNull HiveMQPlatform platform) { return platform.getMetadata().getNamespace() + "-" + platform.getMetadata().getName(); } private class GaugeHolder { private final @NotNull AtomicInteger value = new AtomicInteger(HealthStatus.UNKNOWN.getMetricsValue()); private final @NotNull Gauge gauge; public GaugeHolder(final @NotNull HiveMQPlatform platform) { this.gauge = Gauge.builder("hivemq.platform.health.system.current", value::get) .tag("namespace", platform.getMetadata().getNamespace()) .tag("name", platform.getMetadata().getName()) .description("Custom resource health status") .register(meterRegistry); } }
Metadata
Metadata
Assignees
Labels
No labels