-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proposal to move Provider Status
field from provider to SDK, refine semantics around ContextChange events
#238
Comments
Provider Status
field from provider
I think, reading this and the spec as it currently is, that stale may have morphed as single-context was added. https://openfeature.dev/specification/types#provider-status
Which reads to me as different than a context transition for a single-context paradigm. In most LaunchDarkly SDKs we have a data source status, which is largely similar to the provider state currently. Independent of that we have So, in the case where we were not getting updates, the stream was interrupted, I was considering that stale. Example: https://github.com/launchdarkly/openfeature-java-server/blob/86eb1e2d1b47f80ac1016f06c6edd868d2fff49b/src/main/java/com/launchdarkly/openfeature/serverprovider/Provider.java#L154 My general feelings are this:
Some situations: Some other examples of stale versus fresh.
|
I see your 2nd point, and I was also a bit concerned about the unclear semantics of STALE. In the case of the react SDK (and likely any "front-end" SDKs) it's important to have some event that essentially means "context has changed, and my flags are being re-evaluated". During this time, it's desirable to have components "suspend", so that if application authors desire it, loading indicators can be shown while the new flag values are fetched (this is described in 5.3.4.x). Perhaps we're overloading STALE for this purpose. Might it be better to have a client-specific event type for this, in addition to |
It would simplify the status management if we could manage that in the SDK rather than in the provider. I think this would simplify when writing the providers. |
I like the proposal.
This sounds like a good thing to have (even though this time might most times be pretty short).
Totally think so too. This will be better than having complicated state handling in providers that is always "boilerplate". None of the points you mentioned is a blocker for me here @toddbaert and I see the value so I would go with this change. |
Something like flowchart
NOT_READY -->|initialize| READY
READY -->|shutdown| NOT_READY
READY -->|flags out of date| STALE
STALE -->|flags updated| READY
READY -->|context changed| CONTEXT_CHANGE
CONTEXT_CHANGE -->|flags updated| READY
READY -->|some error condition| ERROR
ERROR -->|condition clears| READY
STALE -->|context changed| CONTEXT_CHANGE
CONTEXT_CHANGE -->|fail to update flags| ERROR
I like it. I guess then, potentially out of scope of this specific conversation, remains the terminal error case. |
Ya, I think this is how things would work notionally, though I'd hope that provider-authors and especially application-authors would have to worry less about understanding these transitions if the SDK maintains the necessary state. One more thing probably worth mentioning is that ---
title: Provider context reconciliation
---
stateDiagram-v2
direction LR
NOT_READY --> READY:emit(PROVIDER_READY)
READY --> NOT_READY
READY --> STALE:emit(PROVIDER_STALE)
STALE --> READY:emit(PROVIDER_CONFIGURATION_CHANGED)
READY --> CONTEXT_CHANGE_PENDING:emit(PROVIDER_CONTEXT_CHANGE_PENDING)
CONTEXT_CHANGE_PENDING --> READY:emit(PROVIDER_CONTEXT_CHANGED)
READY --> ERROR:emit(PROVIDER_ERROR)
ERROR --> READY:emit(PROVIDER_READY)
STALE --> READY:emit(PROVIDER_CONTEXT_CHANGED)
I wonder if |
Would it go Or would it just specifically go to |
Thinking about it more, I think |
The simplest, and likely widely applicable, is that you rotate the credential. When that happens there isn't a reason for an SDK to continue to attempt requests with a bad credential. For LD we have a few more, but they may not be as applicable. |
I see. In that case, I'd expect an error is thrown during |
Outside of initialize this happens: 1.) Initialize your SDK and it works for some time. Or polling would be affected the same. It is outside initialize time. During initialization specifically it makes sense to handle it a little differently I agree.
|
@kinyoklion That makes sense, thanks. OK I'll wait for some more feedback, but I think adding something equivalent to |
In general looks good to me, thanks for this. Will have to see all the corner cases on providers implementation.
So to clarify, with this proposal, this check can move from the provider to the sdk.
|
Provider Status
field from providerProvider Status
field from provider to SDK
Provider Status
field from provider to SDKProvider Status
field from provider to SDK, refine semantics around ContextChange events
So it is the case for LaunchDarkly providers currently, our SDKs are init once, and when you shut one down it is done. What this is going to mean for us from a provider level though is that we have to re-structure such that the relationship between the vendor SDK and the provider isn't 1:1 as we initially developed. We will have to make new vendor SDK instances when we encounter that situation. If your vendor SDK was a singleton, and it also had this single direction behavior, then it could be somewhat harder to overcome that requirement. |
Yes. In general, this proposal makes things easier for providers. This is another good example.
I think we can encourage the idempotency in non-normative sections, and talk about that exception. I was mainly concerned about duplicated init calls. This is because since the state is no longer on the provider itself, it's possible the same provider instance might be set more than once, and therefore init could execute more than once. We could make sure the SDKs handle this situation though, by keeping track of which provider refs have already been initialized.
Done, good idea. |
As others had stated, this should benefit provider authors as this simplifies eventing implementation. From SDK view, we already maintain provider references, so storing the last known provider state should be straightforward. We may also keep 5.3.3 support.
Even now SDK implementations avoid re-initialization based on provider state. So once state management get transferred to SDK, I think we could enforce idempotency on init easily through SDK. However, I think we need to think more about shutdown behavior. Right now SDK shutdown propagates the shutdown to all registered providers. Along with that, I expect SDK internals to clear up making states and event handlers reset. So if a provider in shutdown state get registered, SDK won't be able to know its state. And I expect the provider to error when initialization is attempted 🤔 If we cannot differentiate terminal errors and errors (provider_error event), I don't think we need to introduce a new event. |
Ya, I think agree. I think we can keep the 5.3.3 behavior, and this will make the change smaller and less disruptive. We'll just have to make the idea of
I'm starting to think that it might be best to add some kind of additional field to provider errors that indicates if they are fatal or not. This has the benefit of compatibility with what @Kavindu-Dodan described above, and it also resolves a concern I have where application authors might attach |
Yes, I think OF API should expose provider state.
Agree. Unrecoverable errors could be built into the current error definition. We just need to figure out how |
We could do this, but honestly, I think it's more important to expose
👍 I have proposed a new error code for this. |
I've opened #241 |
I'm in support of these changes as well. One question I had thinking through this in the context of our web provider is recently, while implementing support for the new React SDK, we had to fix our implementation of However, our real-time update system is a two-step process, where the SSE event is received by our SDK, which then triggers an API call to fetch updated flag values. In this new status model, my understanding is that our SDK should emit a |
Yes I think that's in line with that's proposed. |
* moves provider state into SDK (SDK now maintains provider state/lifecycle; provider interface is now "stateless") * refine semantics around client state when context is pending/reconciled by using new events/states, instead of STALE * add PROVIDER_FATAL error code * add PROVIDER_RECONCILING event (client only) * add RECONCILING provider status status (client only) Resolves: #238 --------- Signed-off-by: Todd Baert <todd.baert@dynatrace.com> Co-authored-by: Michael Beemer <beeme1mr@users.noreply.github.com> Co-authored-by: Lukas Reining <lukas.reining@codecentric.de> Co-authored-by: Kavindu Dodanduwa <Kavindu-Dodan@users.noreply.github.com> Co-authored-by: Fabrizio Demaria <fabrizio.f.demaria@gmail.com>
Background
Currently, provider authors are required to maintain a field indicating their state: (
NOT_READY
/READY
/STALE
/ERROR
).This burdens provider authors in a few ways that are not particularly ergonomic, and creates potential foot-guns.
This is particularly concerning for SDKs which rely heavily on events (React, for example); it's quite possible for the "suspense" functionality to break and suspend indefinitely if a provider fails to correctly set their READY state after they are initialized. Additionally, we may need additional states/events to support emerging client use-cases.
Problem 1: State-management at Initialization
The SDK itself calls
initialize()
on any provider in stateNOT_READY
.The SDK then emits an
READY
orERROR
event based on the successful/unsuccessful termination ofinitialize()
.However, it's the provider's responsibility to then set the
state
correctly.This results in confusion, and frequently provider authors neglect to properly set the providers status (this is often caught in PRs).
Problem 2: State-management at context change (client paradigm only)
The static context paradigm (client-side paradigm) defines a
context change
method that can be implemented by the provider, which performs whatever operations are needed to reconcile the evaluated flags with the new context.Similar to problem above, the burden is put on the provider author to correctly update state (as well as emit the STALE event) when this function is called: https://openfeature.dev/specification/sections/events/#condition-534. Additionally, the semantics of PROVIDER_STALE event may not be ideal or sufficient for this use case; STALE was not originally intended to indicate the providers flags needed to be resolved due to a context change.
Solution:
To resolve the both the above, the provider state should be maintained by the SDK, and become only notional to the application author, and the SDK should emit all events that are part of the provider lifecycle.
This will make the provider lifecycle entirely the domain of the SDK; the provider would need only to implement the optional
initialize
andshutdown
, andcontext change
(client-only) methods, and the SDK should handle the rest.From a provider-author's perspective, providers would become stateless; the authors job is simply to implement the desired interfaces, while the SDK would manage the entire lifecycle as appropriate, based on the implemented methods.
SDKs would run initialize/onContextChange/shutdown if they are defined, and maintain any state if necessary.
Providers would only have to ensure initialize and shutdown are idempotent and re-entrant.
For the
onContextChange
, the SDK would also emit a new event (PROVIDER_CONTEXT_CHANGE_PENDING
?) when the context is updated as well.See the example below:
Current example of a compliant onContextChange:
Proposed simplification:
Considerations
ERROR
/READY
events when they disconnect/recover; these may still be "manually" emitted from providers since they originate with state changes specific to the provider implementation, and not lifecycle changes originating from the SDK.PROVIDER_STALE
, to indicate the specific circumstance that a provider is pending an update after a context change.Next steps
I've done a PoC in JS with this, which you can see here.
If we agree on this path forward, we can remove this requirement from the spec, which would entail the update and removal of a few points. Follow-up implementation in SDKs should be relatively simple and non-breaking.
@lukas-reining @kinyoklion @thomaspoignant @beeme1mr @liran2000 @luizgribeiro @jonathannorris @fabriziodemaria @vahidlazio if this sounds compelling I will open a spec PR.
The text was updated successfully, but these errors were encountered: