YamlFileConfigurationService fails to start health check monitors #624

Merged

@mikkokar (Contributor) commented Feb 12, 2020

Summary

This PR fixes a race condition in YamlFileConfigurationService that sometimes prevents it from starting health check monitoring services.

Also, YamlFileConfigurationService now includes the backend service application name in the health check monitoring service name. The application name will then appear in any related error messages, which should help diagnose other bugs if they surface in the future.

User impact

This bug mainly affects deployments with more than one YamlFileConfigurationService provider. When the health check monitoring service is not started, the relevant HostProxy objects for the origin are never added to the load balancing group, so they are always unreachable.

The issue does not affect, or is highly unlikely to affect, deployments with only one YamlFileConfigurationService provider.

Root cause

The Styx object database is a lockless, concurrent, in-memory database for storing routing objects, providers, and other Styx objects. Its compute method takes a lambda that produces the new Styx object to store in the database. The compute lambda must be idempotent, because the database calls it again if it detects a concurrent modification.
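To illustrate why a compute lambda can run more than once, here is a minimal sketch of a lockless compute loop built on compare-and-set. `LocklessStore` is a hypothetical, simplified stand-in and is not the Styx object database API; it only assumes the general "read, apply, CAS, retry on conflict" pattern described above.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

// Hypothetical, simplified lockless store. NOT the Styx object database API;
// it only shows why a compute lambda may be invoked more than once.
final class LocklessStore<T> {
    private final AtomicReference<T> current;

    LocklessStore(T initial) {
        this.current = new AtomicReference<>(initial);
    }

    T get() {
        return current.get();
    }

    // Applies fn to the current value and publishes the result with compare-and-set.
    // If another writer replaced the value in the meantime, the CAS fails and fn is
    // invoked again on the fresh value, so fn must be safe to call more than once.
    T compute(UnaryOperator<T> fn) {
        while (true) {
            T before = current.get();
            T after = fn.apply(before);
            if (current.compareAndSet(before, after)) {
                return after;
            }
            // Concurrent modification detected: loop and call fn again.
        }
    }
}
```

A single-threaded check can simulate a concurrent writer by calling `store.compute(x -> x + 100)` from inside the outer lambda on its first run: the outer CAS then fails and the outer lambda executes a second time against the updated value.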

However, YamlFileConfigurationService's compute lambda was not idempotent: it attempted to start the health check monitoring service inside the callback. A retry after a concurrent modification therefore started the health check monitoring service a second time, which resulted in an IllegalStateException.
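The failure mode can be reproduced in miniature. This sketch is hypothetical: `HealthCheckMonitor`, its start-once rule, and `computeWithOneRetry` (which simply re-runs the callback once, as a store would after detecting a conflict) are illustrative assumptions, not Styx classes.

```java
import java.util.function.Supplier;

// Hypothetical monitor that, like many service lifecycles, may only be started once.
final class HealthCheckMonitor {
    private final String name;
    private boolean started = false;

    HealthCheckMonitor(String name) {
        this.name = name;
    }

    synchronized void start() {
        if (started) {
            throw new IllegalStateException("Health check monitor already started: " + name);
        }
        started = true;
    }

    synchronized boolean isStarted() {
        return started;
    }
}

final class NonIdempotentDemo {
    // Emulates a store that detects one concurrent modification and re-runs the callback.
    static <T> T computeWithOneRetry(Supplier<T> callback) {
        callback.get();        // first attempt: its result is discarded after the conflict
        return callback.get(); // retry
    }

    // The problematic pattern: a side effect (start) inside the compute callback.
    static HealthCheckMonitor buggyCompute(HealthCheckMonitor monitor) {
        return computeWithOneRetry(() -> {
            monitor.start(); // runs again on retry and throws IllegalStateException
            return monitor;
        });
    }
}
```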

I fixed this by caching the return value of the lambda callback, so that a retry reuses the earlier result instead of starting the monitoring service again. Consider this a workaround until I find a better solution.
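A minimal sketch of the caching idea, assuming the callback can be wrapped so that its body runs at most once. `CachingCallback` is a hypothetical helper written for illustration; it is not the code merged in this PR.

```java
import java.util.function.Supplier;

// Hypothetical wrapper: runs the delegate at most once and returns the cached
// result on later invocations (for example, retries after a concurrent modification).
final class CachingCallback<T> implements Supplier<T> {
    private final Supplier<T> delegate;
    private boolean computed = false;
    private T cached;

    CachingCallback(Supplier<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public synchronized T get() {
        if (!computed) {
            cached = delegate.get(); // side effects (e.g. starting a monitor) happen here once
            computed = true;
        }
        return cached;
    }
}
```

One trade-off of this approach: a retry returns a value computed before the conflicting write, which is only safe when that value does not depend on the data that changed concurrently. That limitation may be one reason to treat caching as a stopgap rather than a final design.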

Add application name in the HealthCheckMonitoringService name.
@mikkokar changed the title "Fix race condition in YamlFileConfigurationService." to "YamlFileConfigurationService sometimes fails to start health check monitors" Feb 12, 2020
@mikkokar changed the title "YamlFileConfigurationService sometimes fails to start health check monitors" to "YamlFileConfigurationService fails to start health check monitors" Feb 12, 2020
@mikkokar merged commit d931f0a into ExpediaGroup:master Feb 12, 2020
@mikkokar deleted the styx-object-store-race-condition branch February 12, 2020 17:05