Running the Choria 0.16.0 RPM on CentOS 7 with MCollective plugins, installed by the Puppet module.
We recently moved our Choria brokers: we added SRV records for the new hosts and removed the old ones. However, when the old brokers were shut down, forcing agents to reconnect to another available broker, the agents did not re-read the SRV records. They kept trying to connect only to the old hosts and failed, because those hosts were gone.
Restarting the agents caused the SRV records to be re-read, and the agents then connected to the new brokers as expected.
Choria agents should not cache the SRV records for the broker addresses, but should instead re-query DNS each time they attempt to reconnect, in case of changes to the list of brokers.
To duplicate
Create cluster of 2 choria brokers.
Set SRV records to identify only broker 1
Start choria agent; see it connects to broker 1
Restart choria broker 1; see that the agent is disconnected, and reconnects to broker 1 when it is available
Change SRV record to point to broker 2 (only)
Shut down broker 1 service
Observe that choria agent restarts and attempts to connect to broker 1 but not to broker 2, indicating that it is not rechecking the SRV records defined
Possibly connected to the DNS TTL for the domain?
Agreed, and this is a known issue. We use the NATS Go package, which does not let us update names like that :(
I will have a chat with the authors and see if there is something we can do, but as it stands it's a bit orthogonal: the NATS package doesn't support SRV at all, so Choria does the lookup and configures the package, but on reconnect we have no way to do so.
In your exact scenario I could perhaps improve the situation: NATS can learn about new hosts on its own. So when you expanded the cluster to 2 nodes, the connected agents could have learned that the new node was there for reconnect purposes, but I don't think we enable that behaviour.
Anyway, it's a long-standing pain. I will try again with the authors to see if we can do something.