SRV records do not appear to be reloaded by agent before reconnecting to broker #1167

sshipway · 2021-03-15T02:23:00Z

Running Choria 0.16.0 (RPM) on CentOS 7 with the MCollective plugins, installed by the Puppet module.

We recently moved our Choria brokers: we added SRV records for the new hosts and removed the old ones. When the old brokers were shut down, forcing agents to reconnect to another available broker, the agents did not pick up the new SRV records. Instead they kept trying to connect only to the old hosts, and failed because those hosts were gone.

Restarting the agents caused the SRV records to be re-read, and the agents then connected to the new brokers as expected.

Choria agents should not cache the SRV records for the broker addresses, but should instead re-query DNS each time they attempt to reconnect, in case of changes to the list of brokers.

To duplicate

1. Create a cluster of 2 Choria brokers.
2. Set the SRV records to identify only broker 1.
3. Start the Choria agent; see that it connects to broker 1.
4. Restart Choria broker 1; see that the agent is disconnected, and reconnects to broker 1 when it is available.
5. Change the SRV record to point to broker 2 (only).
6. Shut down the broker 1 service.
7. Observe that the Choria agent restarts and attempts to connect to broker 1 but not to broker 2, indicating that it is not rechecking the SRV records defined.

Possibly connected to the DNS TTL for the domain?

ripienaar · 2021-03-15T06:32:12Z

Agree and this is a known issue. We use the NATS go package that does not let us update names like that :(

I will have a chat with the authors and see if there is something we can do, but as it is it’s a bit orthogonal - the nats package doesn’t support SRV at all, so choria does the lookup and configures the package, but on reconnect we have no way to do so.

ripienaar · 2021-03-15T06:34:09Z

In your exact scenario I could perhaps improve the situation - nats can learn about new hosts on its own. So when you expanded the cluster to 2 nodes the connected ones could have known it’s there for reconnect purposes - but I don’t think we enable that behaviour.

Anyway it’s a long standing pain. Will try again with the authors if we can do something.

sshipway · 2021-03-15T07:45:41Z

Thanks for the info on this - I know it's a bit of an edge case, but it's a definite caveat that you need to be aware of when migrating.
