RFC: Resilient feature flags #74
1. The database is down/unreachable: `decide` evaluation fails and returns 500.
2. The servers are down/unreachable: requests from client libraries time out / error out.

(2) seems hard to defend against without a distributed app distribution (and then a resolver to go to the 'correct' app), but (1) is possible.
What is a distributed app distribution, and what is a 'correct' app?
Talk about terrible wording 😅. I basically mean multiple server deployments where one going down doesn't affect the others (like edge servers), plus a load balancer that can route each request to the closest healthy server.
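Here is a rough sketch of the routing decision such a load balancer would make. Among several independent deployments, the request goes to the healthy one with the lowest latency. The `Server` type, the URLs and the latency figures are made up for illustration. A real load balancer would get health and latency from its own probes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Server:
    url: str           # hypothetical deployment endpoint
    healthy: bool      # result of the most recent health check
    latency_ms: float  # measured round-trip time from the client's region


def pick_server(servers: List[Server]) -> Optional[Server]:
    """Return the lowest-latency healthy server, or None if every deployment is down."""
    healthy = [s for s in servers if s.healthy]
    if not healthy:
        return None
    return min(healthy, key=lambda s: s.latency_ms)
```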
I don’t think we should build an SDK that makes assumptions about the underlying infrastructure.
To build resilient software, use defensive programming: always hope for the best (100% uptime) but always prepare for the worst (everything is down). In other words, a multi-geo deployment might help you, but you should not rely on it.
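Here is a minimal sketch of what that defensive approach could look like on the SDK side. It assumes a hypothetical `fetch` callable that requests flags from the server and raises an `OSError` (for example `ConnectionError` or `TimeoutError`) when the server is unreachable. The client remembers the last flags fetched for each user. When a request fails, it falls back to those, and then to a caller-supplied default. `FlagClient` and `is_enabled` are illustrative names, not an existing SDK API.

```python
from typing import Callable, Dict

Flags = Dict[str, bool]


class FlagClient:
    """Hypothetical client that keeps answering when the flag server is down."""

    def __init__(self, fetch: Callable[[str], Flags]) -> None:
        # fetch(distinct_id) is assumed to call the server and raise OSError on timeout/network failure
        self._fetch = fetch
        self._last_known: Dict[str, Flags] = {}

    def is_enabled(self, key: str, distinct_id: str, default: bool = False) -> bool:
        try:
            flags = self._fetch(distinct_id)
            self._last_known[distinct_id] = dict(flags)  # remember the last good response
        except OSError:
            # Server unreachable: use the last known flags for this user, if any
            flags = self._last_known.get(distinct_id, {})
        # Flag missing (never fetched, or not returned): fall back to the caller's default
        return flags.get(key, default)
```

This does not solve the issue that you stop getting fresh flag information while the servers are down. A user who was never fetched before the outage only gets the default. What it does guarantee is that an outage degrades to stale or default values instead of errors.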
I'm not assuming that at all; I'm just listing the ways things can go wrong here.
Even with defensive programming, the issue still remains that when servers go down, you stop getting flag information.
(a point below addresses this)
Draft for now; I need to flesh this out further. Just jotting down the core points.