GroupCoordinator: *.*.*.*:9092: 1 request(s) timed out: disconnect with Azure Backed Service #197

Open
TsuyoshiUshio opened this issue Nov 20, 2020 · 3 comments
Labels
bug Something isn't working

Comments

@TsuyoshiUshio
Contributor

The problem here is that Azure network load-balancing components silently drop idle network connections after 4 minutes.

I upgraded to Confluent 1.5.2 to solve this issue; however, it still remains. It looks solved in 1.6.0-PRE3 and later.

confluentinc/librdkafka#3109
#193

I can reproduce the issue against EventHubs with KafkaTrigger after a 5-minute delay.
I also confirmed that the new version resolves it.
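As a rough illustration of the idle-connection problem described above, the sketch below builds a librdkafka-style configuration dictionary that keeps connections from sitting idle longer than the 4-minute load balancer timeout. The helper name `build_consumer_config`, the example host name, and the 3-minute default are assumptions made for illustration. `socket.keepalive.enable` and `connections.max.idle.ms` are librdkafka configuration properties; the latter is only understood by librdkafka 1.6.0 and later, and whether TCP keepalive alone helps depends on the operating system's keepalive interval. Authentication settings that EventHubs needs are omitted here.

```python
# Hypothetical helper: names and default values are illustrative only.
AZURE_LB_IDLE_TIMEOUT_MS = 4 * 60 * 1000  # 4 minutes, as described in this issue


def build_consumer_config(bootstrap_servers, group_id, max_idle_ms=180_000):
    """Return a librdkafka-style config dict that avoids long-idle connections."""
    if max_idle_ms >= AZURE_LB_IDLE_TIMEOUT_MS:
        raise ValueError("max_idle_ms should be below the load balancer idle timeout")
    return {
        "bootstrap.servers": bootstrap_servers,
        "group.id": group_id,
        # Ask the OS to send TCP keepalive probes on broker sockets.
        "socket.keepalive.enable": True,
        # Close idle broker connections on the client side
        # (assumes librdkafka >= 1.6.0; older versions reject this property).
        "connections.max.idle.ms": max_idle_ms,
    }


# Placeholder endpoint and group name, not taken from the issue.
config = build_consumer_config("example.servicebus.windows.net:9093", "my-group")
```

A dictionary like this would typically be passed to a Kafka client constructor; closing idle connections on the client side means the next request opens a fresh connection instead of timing out on one the load balancer has already dropped.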

Mitigation

We provide a pre-release that fixes this issue. It is not an official release; however, you can use it to test whether it helps resolve your issue.

https://www.nuget.org/packages/Microsoft.Azure.WebJobs.Extensions.Kafka/3.3.1-PRE1

@TsuyoshiUshio added the bug (Something isn't working) label and removed the Needs: triage (functions) label on Nov 20, 2020
@amotl

amotl commented Nov 20, 2020

Dear Tsuyoshi,

can you confirm this is really coming from the infamous idle network connection drops by Azure LBs? Have you been able to reproduce it with librdkafka 1.6.0-PRE3 or even 1.6.0-PRE4?

From reading the librdkafka issue tracker, you might want to run the client with debug=all in order to get more detailed insights.
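A minimal sketch of what enabling that setting could look like with a librdkafka-style configuration dictionary follows. The helper `with_debug` and the base configuration values are hypothetical; `debug` is a librdkafka configuration property that accepts `all` or a comma-separated list of contexts such as `broker,cgrp,protocol`.

```python
def with_debug(config, contexts="all"):
    """Return a copy of a librdkafka-style config with debug logging enabled."""
    updated = dict(config)  # leave the caller's dictionary untouched
    updated["debug"] = contexts
    return updated


# Placeholder values for illustration only.
base = {"bootstrap.servers": "localhost:9092", "group.id": "my-group"}
debug_config = with_debug(base)

# A narrower alternative that limits output to selected contexts.
narrow_config = with_debug(base, "broker,cgrp,protocol")
```

Since `debug=all` is very verbose, restricting it to a few contexts may make the disconnect and reconnect events easier to spot in the logs.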

While I can't say for sure this is related, I am also referencing confluentinc/librdkafka#2739 and confluentinc/librdkafka#2944 here. Please investigate both issues thoroughly and check if you can make any correlations with your observations.

With kind regards,
Andreas.

@TsuyoshiUshio
Contributor Author

Thank you for your comment, @amotl. I mean that I reproduced it with 1.5.2; 1.6.0-PRE4 looks good. How can we confirm whether the issues you mentioned are happening?

@amotl

amotl commented Nov 20, 2020

Dear Tsuyoshi,

ah, I see.

Some of [our] users [tripped into] this issue. However, I can't have a confidence.
How can we confirm the issue happens that you mentioned [in order to gain more confidence]?

I apologize that I can't contribute much to your question when it comes to pinpointing a specific aspect. However, I tried to share more details about our environment and the respective observations at #193 (comment).

As outlined there, we have been trying to mitigate this issue by trial and error and have simply shared our observations.

With kind regards,
Andreas.
