Increase consumer group test timeout #187
base: master
Conversation
In my opinion, it's slower than it used to be. If you re-run the tests they eventually pass, but it's still strange. I suspect something internal to GitHub's CI changed, but I need to review things further.
I started to observe this behavior after merging #184, but I can't imagine anything in there being causally related. Maybe I should rethink how brokers are spun up for each CI test.
The test failed again after increasing the timeout. This time I reverted …
Hmm, seems like the revert didn't fix it. Thanks for checking, though. I'm baffled that this is now an issue.
I have a suspicion that the test failure might be related to the CPU time available to the GitHub workflow runners. So I started an Ubuntu 24.04 VM on GCP using machine type n2d-standard-4 (4 vCPU, 2 cores, 16 GB RAM), and all tests pass (master branch). But if I limit the CPU to 0.1 core (using cgroups), the following tests fail.
I am still investigating and don't have enough evidence at the moment. I have also started testing on e2-micro (0.25–2 vCPU, 1 shared core, 1 GB RAM) and will update the results here later. (A rough sketch of the cgroup-based CPU cap follows the run list below.)
- n2d-standard-4 limited to 1 core
- n2d-standard-4 limited to 0.5 core: passed twice
- e2-micro, 1st run
- e2-micro, 2nd run
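A minimal sketch of this kind of CPU cap, assuming cgroup v2, root privileges, and the cpu controller enabled in the parent cgroup; the commenter's exact setup is not shown, so the cgroup name, quota values, and tox invocation below are illustrative:

```python
# Cap CPU for the test run by placing this process (and its children) in a
# cgroup with a ~0.1-core quota, then launching the suite.
import os
import subprocess

CGROUP = "/sys/fs/cgroup/kafka-ci"  # hypothetical cgroup name

os.makedirs(CGROUP, exist_ok=True)

# cpu.max takes "<quota_us> <period_us>": 10 ms of CPU per 100 ms period ≈ 0.1 core.
with open(os.path.join(CGROUP, "cpu.max"), "w") as f:
    f.write("10000 100000")

# Move the current process into the cgroup so the child inherits the limit.
with open(os.path.join(CGROUP, "cgroup.procs"), "w") as f:
    f.write(str(os.getpid()))

subprocess.run(["tox"], check=False)  # run the suite under the CPU cap
```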
Most of them raised Result: Inconclusive. I'm not familiar with tox, but is there a way to run a single test? I tried something like …
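(For reference, not from the thread: when a tox environment forwards `{posargs}` to pytest, a single test can usually be selected with something like `tox -e py312 -- test/test_consumer_group.py::test_group`; the environment name here is illustrative.)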
I am using this PR as a testing ground, but I have submitted #192, which I assume will fix part of the problem.
This PR still fails because …
I haven't figured out why Java dies in … I guess the GitHub runner may not have enough memory, and I can reproduce the slowness on resource-constrained VMs when there are too many Kafka instances running.
I agree. As a solution, my goal has been to run one Kafka instance per test in order to conserve memory. I was also planning to tinker with concurrency groups (https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/using-concurrency) to improve this.
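(For context, not from the thread: GitHub Actions lets a workflow or job declare a `concurrency` group, and only one run or job sharing the same `group` key executes at a time, with others queued; something like `group: kafka-integration-${{ github.ref }}` is one illustrative key. That would keep multiple broker-heavy jobs from running concurrently.)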
This reverts commit f76b6d4.
Force-pushed from 7d8bfac to 1608037.
^ I cannot reproduce the timeout in my environment (Kafka 0.8.2.2, Python 3.12, after test/test_partitioner.py, before test/test_producer.py). I have updated this branch to trigger the test to find out... |
Thank you for your meticulous investigation; I really do appreciate it. It's been troubling me why this has become an issue over the past month. Possibly Microsoft scaling down runner resources as a cost-cutting measure?
No worries. I am not sure. On paper, public repo runners have plenty of resources, but I am not sure what's behind the scenes, and I have never tried to benchmark it... Note: …
Test test/test_consumer_group.py::test_group failed. Increase the timeout to find out if it is just slow or a real failure.
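A minimal sketch of the kind of change described, assuming a polling-style wait in the test; the real code lives in test/test_consumer_group.py::test_group, and the function name and values below are hypothetical, not the repository's code:

```python
# Hypothetical helper illustrating the shape of the change: wait longer for the
# consumer group to stabilize before declaring the test a failure.
import time

def wait_for_stable_group(all_consumers_assigned, timeout=60):  # e.g. raised from 30
    """Poll until every consumer reports a partition assignment, or time out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if all_consumers_assigned():
            return
        time.sleep(1)
    raise AssertionError(f"consumer group did not stabilize within {timeout}s")
```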