Better logging of ACL token misusage #4706
Comments
@jakubdyszkiewicz Currently there isn't a good way to debug which agent this is coming from. The current ACL system doesn't assign any identifier to ACL tokens other than the secret, and for security reasons we cannot log that or include it in telemetry. That log is emitted when a server is forwarding an RPC to another DC, so the IP addresses in it don't help much. The only place to figure this out would be the logs of each Consul node. The error, while logged on the server that is forwarding the RPC, will also be propagated back to the node that originated the RPC in the first place. This is not a particularly good solution, but it's all that can be done now. I have been working on a new version of Consul ACLs which should address many of the usability issues. I think the best fix here would be to simply log the IP address/node name of where the RPC came from. |
#5319 is also related and has a similar ask:
|
I have a little more context to add to this issue. The new ACL system assigns accessor IDs to tokens, which we could use for logging. For permission denied errors, or in the case where we filter results, we could include the accessor ID of the token used in the request. However, in the ACL not found case the only data we have is the token's secret, which doesn't match any currently valid token. Secret data cannot go into the logs, so in these cases we really need to log some network information about where the request came from. Currently our RPC infrastructure uses the builtin golang So the final fix for this should contain two things:
|
This ticket is now old enough to eat solid food. Can we perhaps get some traction on resolving this most fundamental ask? I'm currently staring at Consul logs and am still unable to come up with an actionable task based on ACL denies. It is beyond unreasonable that a service intended to be central to modern infrastructure would have an internal blind spot this bad for this long. |
I believe this ticket is resolved by #7117, which implements the support asks in #7010. I hope that this indeed helps operators by giving you all a clear breadcrumb trail to follow in your clusters. @drawks Please refrain from such negativity on Consul issues and pull requests. It is not an effective way to raise attention on the functionality you want. In the future, you can bring attention to issues that affect you by offering clear bug reports and support requests with your use case and pain points. I assure you we can and do take user asks into account when planning features. Closing as implemented |
Hello,
We've got a cluster of 1k+ Consul nodes in our production environment with ACLs enabled. Today, one of the nodes started sending requests with an invalid ACL token due to misconfiguration.
All we could see in master logs were those messages
(XX.XX.XXX.X:XXXX - IP of the master node)
How should we debug which node is sending requests with an invalid ACL token?
We don't keep logs from our agents centralised, because it's just too much information.
It would be nice to see in logs which host is causing a problem.