Better logging of ACL token misusage #4706

Closed
jakubdyszkiewicz opened this issue Sep 21, 2018 · 5 comments
Labels
theme/acls ACL and token generation type/enhancement Proposed improvement or new feature

Comments

@jakubdyszkiewicz

Hello,

We have a cluster of 1k+ Consul nodes in our production environment with ACLs enabled. Today, one of the nodes started sending requests with an invalid ACL token due to a misconfiguration.
All we could see in the master logs were messages like this:

Sep 21 13:20:28 consul1 consul[7811]:     2018/09/21 13:20:28 [ERR] consul: RPC failed to server XX.XX.XXX.X:XXXX in DC "dc1": rpc error: rpc error: ACL not found

(XX.XX.XXX.X:XXXX - IP of the master node)

How should we debug which node is sending requests with an invalid ACL token?
We don't keep logs from our agents centralised, because it's just too much information.

It would be nice to see in the logs which host is causing the problem.

@mkeeler mkeeler added the theme/acls ACL and token generation label Sep 21, 2018
@mkeeler
Member

mkeeler commented Sep 21, 2018

@jakubdyszkiewicz Currently there isn't a good way to debug which agent this is coming from. The current ACL system doesn't assign any identifier to ACL tokens other than the secret, and for security reasons we cannot log the secret or include it in telemetry.

That log line is emitted when a server forwards an RPC to another DC, so the IP addresses in it don't help much. The only place to figure this out is the logs of each Consul node: the error is logged on the server that forwards the RPC, but it is also propagated back to the node that originated the RPC in the first place. This is not a particularly good solution, but it's all that can be done now.

I have been working on a new version of Consul ACLs which should address many of the usability issues. I think the best fix here would be to simply log the IP address or node name of where the RPC came from.

@pearkes
Contributor

pearkes commented Feb 6, 2019

#5319 is also related and has a similar ask:

After updating from Consul version 0.8.4 to version 1.3.1, we have errors in the consul monitor logs:
[DEBUG] consul: dropping service "service" from result due to ACLs
How can I determine which Consul client is trying to get this service and why it isn't allowed?

@mkeeler
Member

mkeeler commented Feb 6, 2019

I have a little more context to add to this issue.

The new ACL system assigns accessor IDs to tokens, which we could use for logging. For permission denied errors, or in the case where we filter results, we could include the accessor ID of the token used in the request. However, in the ACL not found case the only data we have is the token's secret, which doesn't match any currently valid token. Secret data cannot go into the logs, so in these cases we really need to log some network information about where the request came from.

Currently our RPC infrastructure uses Go's built-in net/rpc server to handle RPC requests. Unfortunately, the net/rpc server does not let you retrieve any context about the request's originator (IP address or otherwise). Implementing better tracking of failed requests is going to require replacing that RPC server with an alternative.
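To illustrate the limitation, here is a minimal sketch using only the Go standard library. It is not Consul's code: the `Echo` service, `serveLogging`, and `roundTrip` are hypothetical names made up for this example. A net/rpc method can only have the shape `(args, *reply) error`, so the handler has no way to see who called it. The peer address is visible only in the accept loop, before the connection is handed to net/rpc, and there it is tied to the connection rather than to any individual request.

```go
package main

import (
	"fmt"
	"net"
	"net/rpc"
)

// Echo is a hypothetical RPC service used only for this illustration.
type Echo struct{}

// Say has the only shape net/rpc accepts for a method: (args, *reply) error.
// Nothing in this signature exposes the connection or the caller's address.
func (Echo) Say(msg string, reply *string) error {
	*reply = msg
	return nil
}

// serveLogging accepts connections itself so it can record the peer address
// before handing the connection to net/rpc. The address is known per
// connection here, not per request inside the handler.
func serveLogging(ln net.Listener, srv *rpc.Server, seen chan<- string) {
	for {
		conn, err := ln.Accept()
		if err != nil {
			return // listener closed
		}
		seen <- conn.RemoteAddr().String()
		go srv.ServeConn(conn)
	}
}

// roundTrip starts a server on a loopback port, makes one call, and returns
// the peer address observed by the accept loop along with the RPC reply.
func roundTrip() (peer, reply string, err error) {
	srv := rpc.NewServer()
	if err = srv.Register(Echo{}); err != nil {
		return
	}
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return
	}
	defer ln.Close()

	seen := make(chan string, 1)
	go serveLogging(ln, srv, seen)

	client, err := rpc.Dial("tcp", ln.Addr().String())
	if err != nil {
		return
	}
	defer client.Close()

	if err = client.Call("Echo.Say", "hello", &reply); err != nil {
		return
	}
	peer = <-seen
	return
}

func main() {
	peer, reply, err := roundTrip()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("accept loop saw peer %s; handler replied %q\n", peer, reply)
}
```

When connections are multiplexed or requests are forwarded between servers, the address seen at accept time is not necessarily the agent that originated the request, which is part of why a different RPC server is needed to carry that information through.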

So the final fix for this should contain two things:

  1. Log Accessor ID of the token used for the request when data is filtered or permission is denied completely.
  2. Replace our RPC server with one where we can retrieve request information and log it appropriately.
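The two points above could be sketched roughly as follows. This is an illustration under stated assumptions, not the implementation that eventually landed in Consul: the `aclFailure` type, its fields, the `describeACLFailure` function, and the log format are all hypothetical. The idea is that the accessor ID is logged whenever the token resolved, the request's network origin is logged in every case, and the token secret is never part of the record at all.

```go
package main

import "fmt"

// aclFailure is a hypothetical record of a request rejected or filtered by ACLs.
// It deliberately has no field for the token secret.
type aclFailure struct {
	Reason     string // e.g. "Permission denied" or "ACL not found"
	AccessorID string // empty when the secret matched no known token
	RemoteAddr string // network origin of the request, if the RPC layer exposes it
}

// describeACLFailure builds a log message. When the token resolved, the
// accessor ID identifies it; otherwise only the network origin is available.
func describeACLFailure(f aclFailure) string {
	if f.AccessorID != "" {
		return fmt.Sprintf("acl: %s: accessor=%s from=%s", f.Reason, f.AccessorID, f.RemoteAddr)
	}
	return fmt.Sprintf("acl: %s: from=%s", f.Reason, f.RemoteAddr)
}

func main() {
	// Token resolved but lacks permission: the accessor ID can be logged.
	fmt.Println(describeACLFailure(aclFailure{
		Reason:     "Permission denied",
		AccessorID: "example-accessor-id",
		RemoteAddr: "10.0.0.5:8300",
	}))
	// Secret matched no token: only the request origin can be logged.
	fmt.Println(describeACLFailure(aclFailure{
		Reason:     "ACL not found",
		RemoteAddr: "10.0.0.7:8300",
	}))
}
```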

@drawks
Contributor

drawks commented Nov 13, 2019

This ticket is now old enough to eat solid food. Can we perhaps get some traction on resolving this most fundamental ask? I'm currently staring at Consul logs and am still unable to come up with an actionable task based on ACL denies. It is beyond unreasonable that a service intended to be central to modern infrastructure would have an internal blind spot this bad for this long.

@mkcp
Contributor

mkcp commented Jan 27, 2020

I believe this ticket is resolved by #7117, which implements the support asks in #7010.

I hope that this indeed helps operators by giving you all a clear breadcrumb trail to follow in your clusters.

@drawks Please refrain from such negativity on Consul issues and pull requests. It is not an effective way to raise attention on the functionality you want. In the future, you can bring attention to issues that affect you by offering clear bug reports and support requests with your use case and pain points. I assure you we can and do take user asks into account when planning features.

Closing as implemented

@mkcp mkcp closed this as completed Jan 27, 2020

5 participants