-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-75 dns record lifecycle notes #76
Conversation
If the record is received prematurely - the `ValidFor` + `QueuedAt` is more than the current time - we requeue it again for the `ValidFor` duration. | ||
|
||
|
||
When we encounter an error during the reconciliation we will not requeue the record and will put in an appropriate error message in the log and on the record. In order for it to reconcile again there must be a change to the DNS Record CR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels a bit final. Is that any error or an error from the dns provider?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any error: from provider failures to MZ not being in a ready state. We treat all the errors in the same way. To be honest, in reality, there is one case - if it tries to update the record and fails to write the status to the CR - when the error would make it reconcile immediately (and that is exactly what we want). In any other case, the error will be put in the Ready - false
condition and we would obey the noRequeueDuration
. The idea is that there is something wrong in your configuration and you should first fix that and it will trigger reconciliation again.
We have the power, however, to manipulate this behaviour through the returned duration right now: we could decide that for "managed zone not ready" we want to requeue the record in ~15 minutes but for an unknown policy type we might want to stop trying.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I am just not sure we should always assume misconfiguration. There could be a network problem for example that means you can't reach the provider. So in my mind setting ready to false and requeuing but for RequeueDuration
in these cases might not be a bad thing. IE allowing for some external state to have been fixed (perhaps there was something wrong with the zone or the credential for example) and then trying again. I realise we are documenting how it works right now which is fine, but I think we should capture whether we want it to behave that way permanently
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll create an issue for this, or rather append it to #80
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good. Just wanted to clarify on the error handling
A simple .md for people way better at docs to reference when documenting DNS Record lifecycle