Kafka consumers disconnect after a time #20
Comments
I decided to reduce the TTL to 10 minutes instead and see what happens. Here are our logs:
So in short, we see success messages roughly every 10 minutes, but at the one-hour mark something fails irrecoverably. I currently assume this is an AWS session time limit that is set to 1h? |
I dug a tad deeper; I see the following chain of events: This Kafka on the other hand has code in
In other words: if the principal changes, Kafka rejects the re-authentication. So I started digging into how to get the AWS SDK to stop giving me different principal names when requesting sessions.
I will try setting |
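To illustrate the principal mismatch described above, here is a minimal sketch. It assumes that the principal Kafka compares is derived from the assumed-role session ARN, and that the AWS SDK for JavaScript falls back to a timestamp-based session name when none is configured. The account ID, role name, and helper functions are hypothetical and exist only for illustration.

```typescript
// Hypothetical helper: the ARN format STS uses for an assumed-role session.
function assumedRoleArn(accountId: string, roleName: string, sessionName: string): string {
  return `arn:aws:sts::${accountId}:assumed-role/${roleName}/${sessionName}`;
}

// Assumption: with no explicit session name, the SDK derives one from the current time,
// so every credential refresh produces a different name.
function defaultSessionName(nowMs: number): string {
  return `aws-sdk-js-session-${nowMs}`;
}

// Use the configured name when present, otherwise the time-based fallback.
function sessionNameFor(env: Record<string, string | undefined>, nowMs: number): string {
  return env.AWS_ROLE_SESSION_NAME ?? defaultSessionName(nowMs);
}

// Principal at first authentication vs. at re-authentication after a refresh.
function principalsMatch(
  env: Record<string, string | undefined>,
  firstAuthMs: number,
  reauthMs: number
): boolean {
  const accountId = "123456789012"; // made-up account ID
  const roleName = "my-msk-role"; // made-up role name
  const first = assumedRoleArn(accountId, roleName, sessionNameFor(env, firstAuthMs));
  const later = assumedRoleArn(accountId, roleName, sessionNameFor(env, reauthMs));
  return first === later;
}
```

Under these assumptions, `principalsMatch({}, t1, t2)` is false for two different timestamps, while `principalsMatch({ AWS_ROLE_SESSION_NAME: "my-app" }, t1, t2)` is true, which is consistent with the re-authentication being rejected only when no static name is set.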
Setting I feel there might be other users like me running around that do not have this variable set. |
I'm also running into this. I get this @jmaver-plume I would greatly appreciate any guidance you might have here. As I understand @florian-besser's analysis, this happens because of logic inside the SASL server. Setting this environment variable makes the session name always the same, sidestepping this particular exception. It feels like this library should accept the session name as a parameter (possibly even generating a static unique default value?) or, at the very least, document that you should set it as an environment variable, as @florian-besser mentioned. |
I have this same issue, though |
AWS_ROLE_SESSION_NAME worked for me as well; it should be added to the mechanism. |
@florian-besser
|
@anilspydra did you check the security group? |
|
@anilspydra I mean, a timeout could be caused by the security group of the MSK cluster. Make sure to add port 9098 to it, and set your brokers env var or value to broker-1:9098, since 9096 is for SCRAM. |
@jfr992 |
@anilspydra (https://docs.aws.amazon.com/msk/latest/developerguide/port-info.html) In that case I don't know; something is not letting the connection go through. |
Yeah, that is for the private endpoint; if the cluster is made public, the port shown is 9198.
@jfr992 |
Yes, please create a PR. |
@florian-besser @jmaver-plume can you please tell where and how to set |
@cion-yatindra it should be an env var; you can set it on the container if you are running the code in one. In a local env it can be set with: export AWS_ROLE_SESSION_NAME='session-name' |
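Where the variable cannot be set in the shell or on the container, one option is to set a fallback in-process before any AWS credential provider is created. The sketch below is only an illustration: `ensureRoleSessionName` is a hypothetical helper, not part of this library or the AWS SDK, and the validation pattern reflects the STS limits on `RoleSessionName` (2 to 64 characters from `[\w+=,.@-]`).

```typescript
type Env = Record<string, string | undefined>;

// STS accepts session names of 2-64 characters drawn from this character set.
const SESSION_NAME_PATTERN = /^[\w+=,.@-]{2,64}$/;

// Hypothetical helper: keeps an existing AWS_ROLE_SESSION_NAME, otherwise
// writes a static fallback into the given environment object.
function ensureRoleSessionName(env: Env, fallback: string): string {
  const existing = env.AWS_ROLE_SESSION_NAME;
  if (existing !== undefined && existing.trim() !== "") {
    return existing; // respect a value set by the operator
  }
  if (!SESSION_NAME_PATTERN.test(fallback)) {
    throw new Error(`Invalid role session name: ${fallback}`);
  }
  env.AWS_ROLE_SESSION_NAME = fallback;
  return fallback;
}

// Intended usage in a Node.js app, before constructing the Kafka client:
//   ensureRoleSessionName(process.env, "my-consumer");
```

The fallback must be the same on every start and every credential refresh; a random or time-based value would bring back the changing principal.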
Hey, is there a detailed explanation of why setting AWS_ROLE_SESSION_NAME solves the issue? I had the same problem, and now it works fine, but I need an exact answer as to why this is the only way. Thank you for your help in advance! |
@akospaska I believe I provided more details in my first comment: #20 (comment) If there is anything unclear in there I can gladly help point it out; the comment dives into the internals of the Kafka codebase as well as AWS quirks. |
Hello @florian-besser, first of all I am super glad you have commented :-) We want to use this solution in a production environment, so I want to gather all the information on why the client has to set AWS_ROLE_SESSION_NAME to a static value. As far as I can see, fromNodeProviderChain calls fromWebToken in our case, so it is not clear why we receive the principal error message. If we don't set AWS_ROLE_SESSION_NAME, will fromWebToken set that env var? I am just looking for a clear explanation that I can give to the other teams for why we have to set the AWS_ROLE_SESSION_NAME environment variable. Thank you for your help in advance! |
@akospaska I believe my comment answered that:
In short To get static |
We are using the new MSK IAM mechanism, and when we start our app it connects correctly and consumes messages successfully.
After a time it prints the following log message and disconnects from the consumer group:
I figure this is due to some permissions issue. Our app runs in a K8s environment and uses a serviceaccount. This is the policy to assume roles:
The actual role then has the following permissions:
As the app can correctly consume messages on startup, I assume the permissions are OK at first. Are there additional permissions needed for this re-authentication flow? If so, do you have some documentation showing what is needed?
Lastly, this is our client code: