Catalog get operation randomness problematic in heterogenous ACL environments #276

Closed
chemicL opened this issue Oct 30, 2017 · 1 comment

chemicL commented Oct 30, 2017

Currently, the read operation for the list of services tries any cached agent in the cluster (the agents are obtained by querying the local agent for nodes) and retries with another one on failure.

This is a problem in non-homogeneous environments, where some agents are read-only, have no ACL configuration attached, and are handled by the anonymous token.

When marathon-consul is equipped with a token, it can send that token to a read-only agent, which fails the request because it does not know where to forward the query bound to the ACL token.

Even when a local agent is defined (consul-local-agent-host), all read requests still go to a random node first, which should be avoided as well.
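To make the requested behaviour concrete, here is a minimal Go sketch of the selection order: prefer the local agent when one is configured, and only otherwise fall back to a random agent from a restricted pool. The function name `pickReadAgent`, the `syncingAgents` slice, and the addresses are hypothetical and do not come from the marathon-consul code base.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
)

// pickReadAgent returns the address of the agent to use for catalog reads.
// localAgent stands for the consul-local-agent-host setting ("" when unset).
// syncingAgents is a hypothetical pool of agents known to accept our requests.
func pickReadAgent(localAgent string, syncingAgents []string) (string, error) {
	// A configured local agent always wins, so reads never hit a random node.
	if localAgent != "" {
		return localAgent, nil
	}
	if len(syncingAgents) == 0 {
		return "", errors.New("no consul agent available for reads")
	}
	// Fall back to a random agent, but only from the restricted pool.
	return syncingAgents[rand.Intn(len(syncingAgents))], nil
}

func main() {
	agent, err := pickReadAgent("127.0.0.1", []string{"10.0.0.5", "10.0.0.6"})
	fmt.Println(agent, err)
}
```

With this ordering, an agent without ACL configuration is contacted for reads only if it ends up in the fallback pool.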

Steps to reproduce:

  1. Set up a consul server cluster with ACL policies enabled (default policy allow),
  2. Generate a token for read operations on services,
  3. Set up one consul client without ACL-DC configured,
  4. Set up one consul client with ACL-DC pointing to the server cluster from step 1,
  5. Configure marathon-consul on another node with a local consul client agent (it doesn't matter whether this client has ACL-DC configured) and with consul-token set to the value from step 2.

Running marathon-consul in this setup should at some point yield logs:

... "error":"Unexpected response code: 403 (rpc error: rpc error: ACL not found)","level":"error","msg":"An error occurred getting services from Consul, retrying with another agent" ...
chemicL changed the title from "Catalog get operation should not touch any agent" to "Catalog get operation randomness problematic in heterogenous ACL environments" on Oct 30, 2017
chemicL pushed a commit that referenced this issue Oct 31, 2017
If defined, the local consul agent is used to read list of services.
Otherwise, a random agent is picked, but only from the cache of agents
already syncing registrations.
This change prevents situations, where contacting a random agent in the
cluster fails, because it has no ACL-DC configured.
janisz pushed a commit that referenced this issue Oct 31, 2017
If defined, the local consul agent is used to read list of services.
Otherwise, a random agent is picked, but only from the cache of agents
already syncing registrations.
This change prevents situations, where contacting a random agent in the
cluster fails, because it has no ACL-DC configured.
janisz commented Oct 31, 2017

Fixed by #277

janisz closed this as completed Oct 31, 2017