Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix origin reactivation logic for admin interface #561

Merged

Conversation

chrisgresty
Copy link
Contributor

(See #518 )
Reactivation of an origin should follow the existing logic as described here for the Origin Toggle: https://github.com/HotelsDotCom/styx/blob/master/docs/user-guide/admin-interface.md

i.e. if health checks are enabled then a reactivated origin goes into an unreachable state, and it is the health check service that moves it into an active state.

@@ -116,6 +116,11 @@ internal class HealthCheckMonitoringService(
futureRef.get().cancel(false)
}

fun isRunning() : Boolean {
val executor = futureRef.get() ?: return false
return !executor.isCancelled && !executor.isDone
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comment, but it is a ScheduledFuture that is stored in futureRef, not an executor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A similar comment to below about taking a more functional approach can be made here, too, which removes the name of that val altogether.

&& (provider.styxService as HealthCheckMonitoringService).isRunning()
} ?: return false
return true
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would use more "functional" approach:

  • This eliminates three return statements
  • Therefore might be bit easier to follow?
  • Bit more idiomatic in relation to our existing Kotlin codebase?
private fun hasActiveHealthCheck2(origin: RoutingObjectRecord) = lbGroupTag
        .find(origin.tags)
        ?.let { appId ->
            serviceDb.entrySet()
                    .firstOrNull { (_, provider) ->
                        provider.type == HEALTH_CHECK_MONITOR
                                && provider.tags.contains(targetTag(appId))
                                && (provider.styxService as HealthCheckMonitoringService).isRunning()
                    }
        } == null

But this is just my personal preference.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is some more info about Kotlin scope functions: https://kotlinlang.org/docs/reference/scope-functions.html

expectStateChange(STATE_INACTIVE, STATE_ACTIVE, STATE_UNREACHABLE, false, "app.origin.hc.running")
}

// TODO: Complete this test
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3xTODOs here. Are these still work in progress? Or just old comments?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, these are old. Will remove.

// TODO: Complete this test
scenario("Activating a healthchecked (but the healthchecker is stopped), inactive origin results in an unreachable state") {
expectStateChange(STATE_INACTIVE, STATE_ACTIVE, STATE_ACTIVE, false, "app.origin.hc.stopped")
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remind me, when do we end up stopping a health checker (if that is active)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In practice, only when the origins config is updated to remove healthchecking configuration for an origin altogether. Which does mean that this case is unlikely. But I was bothered by the fact that the service does support being stopped, and if it was, then there would be nothing to move the state of the origin from unreachable to active. So, straying into YAGNI territory a bit, I guess. Happy to excise this if you reckon it's not needed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, straying into YAGNI territory a bit, I guess. Happy to excise this if you reckon it's not needed.

No this is an important consideration. Just that this scenario didn't immediately occur to me.

Based on your response, I understand there are two possible scenarios:

  1. Health check configuration is removed from origins configuration.
  2. Origins configuration remains unchanged, but somebody stops the service over the admin interface?

Case 1: Removal of health check configuration only:

  • Looking at code, in this case the host proxy is re-tagged with STATE_ACTIVE. Which seems correct.
  • Looking at updateRoutingObjects in YamlFileConfigurationService, I do see a possible bug. The update function doesn't take the tags into account. Suppose the tags alone change, but object's YAML configuration remains unchanged. This code doesn't seem to update the underlying routing object record in this case: WDYT?
     objectDefs.forEach { objectDef ->
            routeDb.compute(objectDef.name()) { previous ->
                if (previous == null || changed(objectDef.config(), previous.config)) {
                    // Possible optimisation: check if the tags alone changed, or if the configuration changed too.
                    previous?.routingObject?.stop()
                    converter.routingObjectRecord(objectDef)
                } else {
                    previous
                }
            }
        }

Case 2: Looking at code, it removes the health check tag, and sets state=active.

Are there any other scenarios we'd have to take into account?

Copy link
Contributor Author

@chrisgresty chrisgresty Dec 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The update function doesn't take the tags into account.

One side-effect of doing so is that, even if you haven't actually changed the tags in the config file, but the tags have changed at runtime, then you will reset the tags to the configured set each time you reload the origins file. So if you've deactivated an origin via the admin interface, you will reactivate it by reloading the origins file, even if you didn't change anything about that origin in the origins file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current implementation also resets the tags, but only if there has been a config change for that origin. Which seems a bit more reasonable, even if not perfect.

@chrisgresty chrisgresty merged commit 619e6fc into ExpediaGroup:master Dec 11, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants