acl_enforce_version_8 denies for node deregister reconcile despite acl "management" permissions being set #2792

Closed
nathanwebsterdotme opened this issue Mar 9, 2017 · 14 comments
Labels
type/bug Feature does not function as expected

@nathanwebsterdotme

consul version for both Client and Server

Server: 0.7.5

consul info for both Client and Server

Server:

agent:
	check_monitors = 0
	check_ttls = 0
	checks = 0
	services = 1
build:
	prerelease =
	revision = '21f2d5a
	version = 0.7.5
consul:
	bootstrap = false
	known_datacenters = 1
	leader = true
	leader_addr = 10.2.5.96:8300
	server = true
raft:
	applied_index = 445
	commit_index = 445
	fsm_pending = 0
	last_contact = 0
	last_log_index = 445
	last_log_term = 9
	last_snapshot_index = 0
	last_snapshot_term = 0
	latest_configuration = [{Suffrage:Voter ID:10.2.5.159:8300 Address:10.2.5.159:8300} {Suffrage:Voter ID:10.2.5.96:8300 Address:10.2.5.96:8300} {Suffrage:Voter ID:10.2.5.5:8300 Address:10.2.5.5:8300}]
	latest_configuration_index = 346
	num_peers = 2
	protocol_version = 1
	protocol_version_max = 3
	protocol_version_min = 0
	snapshot_version_max = 1
	snapshot_version_min = 0
	state = Leader
	term = 9
runtime:
	arch = amd64
	cpu_count = 1
	goroutines = 76
	max_procs = 1
	os = linux
	version = go1.7.5
serf_lan:
	encrypted = true
	event_queue = 0
	event_time = 9
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 1
	member_time = 23
	members = 4
	query_queue = 0
	query_time = 1
serf_wan:
	encrypted = true
	event_queue = 0
	event_time = 1
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 0
	member_time = 5
	members = 1
	query_queue = 0
	query_time = 1

Operating system and Environment details

Ubuntu 16 running on AWS EC2
All required Security Group ports are open

Description of the Issue (and unexpected/desired result)

Consul Server config:

{
  "bind_addr": "10.2.5.96",
  "bootstrap": false,
  "bootstrap_expect": 2,
  "retry_join_ec2": {
    "region": "eu-west-1",
    "tag_key": "service",
    "tag_value": "consul"
  },
  "server": true,
  "datacenter": "dc1",
  "data_dir": "/opt/consul/data",
  "log_level": "TRACE",
  "enable_syslog": true,
  "client_addr": "0.0.0.0",
  "dns_config": {
    "allow_stale": true,
    "max_stale": "30s",
    "node_ttl": "30s",
    "service_ttl": {
      "*": "30s"
    },
    "enable_truncate": true,
    "only_passing": true
  },
  "domain": "int.discovery",
  "ui": true,
  "acl_datacenter": "dc1",
  "acl_master_token": "master-token-uuid",
  "acl_default_policy": "deny",
  "acl_down_policy": "deny",
  "acl_enforce_version_8": true,
  "acl_agent_token": "server-agent-token-uuid",  # has management permissions
  "leave_on_terminate" : true,
  "encrypt": "encrypt-key",
  "disable_remote_exec": true,
  "ca_file": "/opt/consul/consul.d/ssl/ca.cert",
  "cert_file": "/opt/consul/consul.d/ssl/consul.cert",
  "key_file": "/opt/consul/consul.d/ssl/consul.key",
  "verify_incoming": true,
  "verify_outgoing": true
}

We are having problems deregistering a member from the cluster when "acl_enforce_version_8": true is set.

Errors in consul monitor are:

2017/03/09 15:15:34 [INFO] consul: member 'consul-server-123' left, deregistering
2017/03/09 15:15:34 [ERR] consul: failed to reconcile member: {consul-server-123 10.2.5.8 8301 map[vsn:2 role:consul id:3f42bfce-918d-4bbe-a466-3fa9c7e3518c expect:2 dc:dc1 vsn_min:2 build:0.7.5:'21f2d5a port:8300 vsn_max:3] left 1 5 2 2 5 4}: Permission denied

Reproduction steps

  • Create a Consul cluster with 4 nodes, with acl_enforce_version_8 enabled and an acl_agent_token that has management ACL permissions.
  • Log on to one of the nodes and run consul leave.
  • Run consul monitor and see that the node fails to be fully deregistered.
  • The node is still returned as healthy by dig and in the UI, which is incorrect.
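The last step can be checked mechanically. The following is a minimal sketch, not part of the original report: it assumes `dig` is on the PATH and reuses the same flags as the dig command shown later in this issue. The helper names `dns_answers` and `stale_addresses` are hypothetical.

```python
import subprocess


def dns_answers(name, server, port=8600):
    """Return the addresses that `dig +short` prints for `name` (assumes dig is installed)."""
    out = subprocess.run(
        ["dig", name, f"@{server}", "-p", str(port), "+short"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line.strip() for line in out.splitlines() if line.strip()]


def stale_addresses(answers, departed):
    """Return the addresses DNS still serves even though their node has left."""
    departed = set(departed)
    return [addr for addr in answers if addr in departed]
```

For example, `stale_addresses(dns_answers("consul.service.int.discovery", "10.2.5.159"), ["10.2.5.8"])` would be non-empty while the departed node is still being served.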

Log Fragments or Link to gist

2017/03/09 14:56:04 [INFO] serf: EventMemberLeave: consul-server-123 10.2.5.8
2017/03/09 14:56:04 [INFO] consul: Removing LAN server consul-server-123 (Addr: tcp/10.2.5.8:8300) (DC: dc1)
2017/03/09 14:56:04 [INFO] raft: Updating configuration with RemoveServer (10.2.5.8:8300, ) to [{Suffrage:Voter ID:10.2.5.159:8300 Address:10.2.5.159:8300} {Suffrage:Voter ID:10.2.5.96:8300 Address:10.2.5.96:8300}]
2017/03/09 14:56:04 [INFO] raft: Removed peer 10.2.5.8:8300, stopping replication after 304
2017/03/09 14:56:04 [INFO] raft: aborting pipeline replication to peer {Voter 10.2.5.8:8300 10.2.5.8:8300}
2017/03/09 14:56:04 [INFO] consul: member 'consul-server-123' left, deregistering
2017/03/09 14:56:04 [ERR] consul: failed to reconcile member: {consul-server-123 10.2.5.8 8301 map[vsn_max:3 dc:dc1 vsn_min:2 build:0.7.5:'21f2d5a port:8300 vsn:2 role:consul id:3f42bfce-918d-4bbe-a466-3fa9c7e3518c expect:2] left 1 5 2 2 5 4}: Permission denied
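
To spot this failure in a larger log, something like the sketch below may help. It is an illustration only: the regular expression is derived from the two error lines quoted in this issue and may not match other Consul versions, and the function name `reconcile_failures` is hypothetical.

```python
import re

# Pattern inferred from the "[ERR] consul: failed to reconcile member" lines above.
RECONCILE_FAILURE = re.compile(
    r"\[ERR\] consul: failed to reconcile member: "
    r"\{(?P<name>\S+) (?P<addr>\S+) .*\}: (?P<reason>.+)$"
)


def reconcile_failures(lines):
    """Return (member name, address, reason) for each reconcile failure line."""
    failures = []
    for line in lines:
        match = RECONCILE_FAILURE.search(line)
        if match:
            failures.append(
                (match.group("name"), match.group("addr"), match.group("reason").strip())
            )
    return failures
```

Feeding it the output of consul monitor line by line would list each member that the leader failed to reconcile, together with the reported reason.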

dig

ubuntu@consul-server-122:~$ dig consul.service.int.discovery @10.2.5.159 -p 8600 +short
10.2.5.96
10.2.5.5
10.2.5.159
10.2.5
@nathanwebsterdotme nathanwebsterdotme changed the title acl_enforce_version_8 denies node deregister despite acl "management" permissions being set acl_enforce_version_8 denies for node deregister reconcile despite acl "management" permissions being set Mar 9, 2017
@slackpad slackpad added this to the 0.8.0 milestone Mar 9, 2017
@slackpad
Contributor

slackpad commented Mar 9, 2017

Thanks for the report @nathanwebsterdotme - the acl_agent_token for the Consul servers should have been the right thing here, so we will look into that bug. For 0.8 we are also going to set things up so the servers can always make catalog changes during a reconcile so you don't need the acl_agent_token set up at all for the servers, which will also fix this.

@slackpad slackpad added the type/bug Feature does not function as expected label Mar 9, 2017
@nathanwebsterdotme
Author

Thanks for the quick response. Will you look to fix this prior to v0.8, or will we need to wait for that? This effectively makes acl_enforce_version_8 unusable, right? Or is there a possible workaround here that you can think of?

@slackpad
Contributor

slackpad commented Mar 9, 2017

I'll take a quick look and see why the acl_agent_token isn't working - the code path is there to pick it up. Can you double check that's a management token?

@nathanwebsterdotme
Author

Yep 100%

@nathanwebsterdotme
Author

nathanwebsterdotme commented Mar 9, 2017

[
    {
        "CreateIndex": 5,
        "ID": "master-token-uuid",
        "ModifyIndex": 5,
        "Name": "Master Token",
        "Rules": "",
        "Type": "management"
    },
    {
        "CreateIndex": 6,
        "ID": "acl-agent-token-uuid",
        "ModifyIndex": 6,
        "Name": "Consul Server",
        "Rules": "",
        "Type": "management"
    },
    {
        "CreateIndex": 4,
        "ID": "anonymous",
        "ModifyIndex": 4,
        "Name": "Anonymous Token",
        "Rules": "",
        "Type": "client"
    }
]
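
For anyone who wants to run the same check, here is a minimal sketch that looks up a token's type in a listing of the shape shown above. It assumes the listing has been saved as JSON text. The function name `token_type` is hypothetical, and the token IDs are the placeholders used in this issue.

```python
import json


def token_type(acl_list_json, token_id):
    """Return the "Type" of the token with the given ID, or None if it is not listed."""
    for entry in json.loads(acl_list_json):
        if entry.get("ID") == token_id:
            return entry.get("Type")
    return None
```

A result of "management" for the ID configured as acl_agent_token confirms that the token itself is not the cause of the denial.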

@slackpad
Contributor

slackpad commented Mar 9, 2017

Thanks for checking that - I'll see what's going on. I don't think we'd cut a release to fix this before 0.8, which is our next milestone, but I'll see if I can get a fix in so you can try with a local build, if possible.

@slackpad
Contributor

slackpad commented Mar 9, 2017

Master has a fix - if you can please give that a try. Thanks!

@nathanwebsterdotme
Author

Thanks for the speedy resolution - will try this today and report back.

@nathanwebsterdotme
Author

@slackpad Just for curiosity's sake, when might this be released as a binary? We'll put workarounds in our playbooks for now, but would like to revert these changes ASAP. Thanks

@slackpad
Contributor

We are working on the 0.8 release which is a few weeks out.

@wclarke1

@slackpad thanks for the update. I work with Nathan and can confirm that the change worked as expected. Will there be a 0.7.x version with these changes in it, or is it best to wait for the 0.8 release? Thanks again for the quick resolution on this issue.

@slackpad
Contributor

We don't have any more 0.7.x releases planned, so 0.8 will be the next release with the fix unless we need to push a build for some other reason. I know this isn't great, but 0.7.5 was released from a branch (https://github.com/hashicorp/consul/tree/v0.7.5-rel), so it would be easy to cherry-pick just this one-line fix into that branch if you wanted to run with this sooner.

@wclarke1

That's fine, thanks @slackpad, we will wait for the 0.8 release.

@johndistasio

For anyone else on 0.7.5 running into this problem and finding this issue: in case it's not obvious, this isn't specific to the token set as acl_agent_token. It applies to whatever token the agent uses that carries the appropriate node policy and thus fills the role of the agent token.

In my case, the node policy is applied to my agent's acl_token, which has regular client permissions, and I still ran into this behavior.


4 participants