Re-produce various issues #2240
I think it should be possible to run it with |
And how do you generate the workload - so we test the same. I will try to run it with Strimzi. |
Related to #2202 |
To send events to your Kafka instance topic: Clone this repo https://github.com/steven0711dong/KafkaProducer - the readme file contains instructions on how to send events. Were you able to utilize this? |
I have got this setup ready to test with a Strimzi topic with 30 partitions: https://github.com/aslom/repdroduce-kafka-source I am now working on getting the Kafka producer to run with the Strimzi setup |
@steven0711dong what parameters do you use for KafkaProducer when testing different scenarios? I got the basic setup to work with Strimzi:
|
It now works end-to-end with the setup and Kafka workload job by simply doing
and we can add more scenarios such as |
@aslom The KafkaProducer only sends events; if you want to change the number of events sent, you can modify the EVENTCT. It will then send events in an async manner. Does this answer your question? |
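Based on that description, a hypothetical invocation could look like the sketch below - the image name, the BOOTSTRAP_SERVERS/TOPIC variable names, and the addresses are assumptions for illustration, not taken from the KafkaProducer repo:

```sh
# Hypothetical producer run -- assumes the KafkaProducer image reads its
# bootstrap servers, target topic, and event count from the environment.
kubectl run kafka-producer \
  --image=<your-kafka-producer-image> \
  --env="BOOTSTRAP_SERVERS=my-cluster-kafka-bootstrap.kafka:9092" \
  --env="TOPIC=topic50" \
  --env="EVENTCT=5000"
```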
Test setup to re-produce the duplicate issue (a sketch of step 1 follows below):
1. use a topic with 50 partitions
2. set the sink delay to 10 seconds, or anything above 6 seconds
3. send 5000 events |
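A minimal sketch of step 1, assuming a Strimzi cluster named my-cluster in the kafka namespace and the stock Kafka CLI tools inside the broker pod (all names are illustrative):

```sh
# Create the 50-partition test topic from inside a broker pod.
kubectl exec -n kafka my-cluster-kafka-0 -- bin/kafka-topics.sh \
  --bootstrap-server localhost:9092 \
  --create --topic topic50 \
  --partitions 50 --replication-factor 3
```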
For the sink delay of 10 seconds, is the environment variable delay set to 10? And to send 5000 events, EVENTCT? I have a duplicates scenario with those parameters: https://github.com/aslom/repdroduce-kafka-source/tree/main/duplicates1 Run with
|
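The run command above was stripped; presumably it applies the manifests from that directory, roughly as in this sketch (the directory layout is assumed):

```sh
# Clone the reproducer and apply the duplicates1 manifests.
git clone https://github.com/aslom/repdroduce-kafka-source
kubectl apply -Rf repdroduce-kafka-source/duplicates1/
```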
So far I am not seeing duplicates; however, my Kafka source setup is a bit behind main and may be missing configmap parameters @steven0711dong? |
steven0711dong@d8e4773 <-- Steven's configmap changes |
Running tests with the latest GitHub main branch without rate-limiting enabled - the setup used and how to run it: https://github.com/aslom/repdroduce-kafka-source/tree/main/duplicates1
got
and in logs:
|
I did a second run and am now getting those errors and a lot of duplicates
It seems all events were consumed
|
Investigating a strange problem with Strimzi - I wonder if anybody has seen anything like this before? I delete a topic and then the topic is still there:
or
and its status looks ok:
|
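Roughly the sequence in question, assuming the topic is managed through Strimzi's KafkaTopic CRD (names are illustrative):

```sh
# Delete the Strimzi-managed topic, then check whether it is really gone.
kubectl delete kafkatopic topic50 -n kafka
kubectl get kafkatopic topic50 -n kafka -o yaml   # still listed, status looks ok
```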
Using a new topic name (topic50a) with replication factor 3 (which is the default), everything works fine and I get 50 partitions ....
The only problem is that the data plane seems to be stuck and delivers no events - nothing is read from any of those partitions
|
One strange thing is that the last message in the log is about not finding event-display, and then the log has not moved for the last 15 minutes - nothing more in the log
|
Thanks for the follow-up @aslom, can you please attach all the logs from start to finish, and remove serving from the reproducer [1] since it just adds noise and I cannot run it? |
Updated reproduce-kafka-source to run eventdisplay as a k8s svc and am now running tests with the latest code updated from the main branch and deployed with
Initially I was getting 5 concurrent connections to the sink. After running
Kafka source
I will upload logs when finished |
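The deploy command above was stripped; for eventing-kafka-broker built from a checkout of main it is typically something like the following (the registry and config paths are assumptions - applying a release manifest with kubectl would work too):

```sh
# Build and deploy from source with ko; paths may differ per checkout.
export KO_DOCKER_REPO=<your-registry>
ko apply -Rf config/
```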
Started to see
So far
|
And here is a snapshot of the consumer group from the Kafka broker:
|
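Such a snapshot can be taken with the stock Kafka CLI from inside a broker pod (cluster and group names are illustrative):

```sh
# Describe the consumer group: shows per-partition CURRENT-OFFSET,
# LOG-END-OFFSET, and LAG.
kubectl exec -n kafka my-cluster-kafka-0 -- bin/kafka-consumer-groups.sh \
  --bootstrap-server localhost:9092 \
  --describe --group <kafka-source-consumer-group>
```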
Then I scaled up to what I think is the maximum for 50 partitions
|
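The scale command was stripped; presumably it targets the dispatcher StatefulSet, roughly as below (the name and replica count are assumptions - with 50 partitions, anything beyond one consumer per partition adds no parallelism):

```sh
# Scale the Kafka source dispatcher; the replica count is illustrative.
kubectl scale statefulset kafka-source-dispatcher \
  -n knative-eventing --replicas=5
```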
For a short time I was seeing 15 concurrent connections, and then it went back to 10
It seems rebalancing can take some time
|
Interestingly, events are still being received (as seen in the eventdisplay log) even while rebalancing is happening. |
Also seeing a new exception
|
After rebalancing finished
|
Short summary: it took about 2 hours (started about 2022-06-09T15:57 UTC, finished about 18:00) to send and process 5000 events. If all 50 partitions were used with 50 outbound connections it would take about 20 minutes: with a 10 second sink delay, 50 partitions each deliver 1 event per 10 s, which is about 5 events/second throughput, so 5000 events take about 1000 seconds, roughly 16 min, plus some time for re-balancing and initial scaling. All events were received but almost 80% were duplicates:
I have uploaded the logs into this git repo: https://github.com/aslom/reproduce-kafka-source-results/tree/main/duplicates1-run1 @steven0711dong @aavarghese do you see similar results? All events were processed:
|
Can we monitor what the scheduler is doing? |
I scaled up twice, so there were two rebalances - started the test about 11:55am, finished about 2pm
I should be able to install and run the old Golang source so we have data to compare. |
@aslom @pierDipi Scheduler logs (plus autoscaler) are all in the controller logs. Grepping for these two statements (especially the second statement
|
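A generic way to pull those statements out of the controller logs - the deployment name and the grep patterns are placeholders; substitute the two statements mentioned above:

```sh
# Extract scheduler and autoscaler lines from the controller logs.
kubectl logs -n knative-eventing deploy/kafka-controller \
  | grep -E '<scheduler statement>|<autoscaler statement>'
```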
It looks reasonable and is what I expected - anything else to look at related to Kafka rebalancing?
|
Described how to reproduce the results I was getting: https://github.com/aslom/reproduce-kafka-source-results/blob/main/duplicates1-run1/README.md |
Doing a new run based on Ansu's setup: https://github.com/aslom/reproduce-kafka-source-results/blob/main/duplicates1-run2/README.md |
For the record, here are the lag offsets I see for the partitions during the test
|
Is anybody else seeing this? I see a lot of those all the time ...
|
When starting the test
Then I did the scaling
Towards the end of the test
|
Towards the end it seems the imbalance increased and I got a lot of duplicates:
|
Towards the very end
|
Finished duplicates1-run2. Interestingly, there is still a small LAG but all events were delivered:
Logs: https://github.com/aslom/reproduce-kafka-source-results/tree/main/duplicates1-run2 |
10 minutes later the LAGs were still there
|
I have re-run the tests with the updated config, setting the sink delay to 1 second (the sink sleeps for one second before replying). I ran the tests twice, each time sending 5000 messages, and there were no duplicates:
The test took about 2 minutes. There was no scaling, so no re-balancing. Logs and details of the configuration (rate-limiting, max.partition.fetch.bytes, auto.commit.interval.ms) used are in https://github.com/aslom/reproduce-kafka-source-results/tree/main/duplicates2-run1 Partitions were consumed fast:
|
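For reference, those consumer settings would normally be tuned through the source data-plane ConfigMap; the ConfigMap name below follows the eventing-kafka-broker layout, but the property values are illustrative, not the ones used in this run:

```sh
# Inspect/edit the consumer properties for the Kafka source data plane.
kubectl -n knative-eventing edit configmap config-kafka-source-data-plane
# Example consumer property values (illustrative only):
#   max.partition.fetch.bytes=65536
#   auto.commit.interval.ms=1000
```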
I re-ran the original test with a sink delay of 10 seconds and got:
It took 42 minutes to send 5000 messages with a sink delay of 10 seconds and 50 partitions. I scaled up twice. First about 1 minute into the test:
And then a few minutes later scaled again:
Logs are in https://github.com/aslom/reproduce-kafka-source-results/tree/main/duplicates1-run3 Here is what I saw as far as receiving messages after scaling:
And I observed that partitions were not consumed at the same speed - in the middle of the test:
toward the end of the test:
|
Looking into the logs I saw this exception:
|
Another interesting exception:
|
@aavarghese @pierDipi Short summary of results so far: when using the current code configured with rate-limiting, CooperativeStickyAssignor, and changed max.partition.fetch.bytes and auto.commit
Compare: one month ago #2240 (comment) with current results #2240 (comment) |
Here #1417 is an actual Go test. If you see the report in https://prow.knative.dev/view/gs/knative-prow/pr-logs/pull/knative-sandbox_eventing-kafka-broker/1417/reconciler-tests_eventing-kafka-broker_main/1549420232664158208, something interesting is that the metric |
This issue is stale because it has been open for 90 days with no activity.
Create a service that collects events from the sink and make sure you set the maximum and minimum pod count to 1. Use stevendongatibm/scraper:2 as the image, or build your own by modifying https://github.com/steven0711dong/KafkaScraper; you can build it using ko.
If you are using ibmcloud code engine
Here, I'm creating the service as a ksvc but the image should work with a regular service as well. Get the eventscollector service address from the above; it will be used when creating the sink.
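If you are on plain Knative rather than Code Engine, a sketch of the same step with kn (the service name is illustrative; the min/max scale annotations pin it to exactly one pod):

```sh
# Create the collector as a Knative Service pinned to a single pod.
kn service create eventscollector \
  --image=stevendongatibm/scraper:2 \
  --annotation autoscaling.knative.dev/minScale=1 \
  --annotation autoscaling.knative.dev/maxScale=1
```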
Create the sink by following the instructions from https://github.com/steven0711dong/customizeEventDisplay.git
I am using ibmcloud code engine and created the service as a ksvc, but it should work with a regular k8s service as well.
See the explanation for each env var in https://github.com/steven0711dong/customizeEventDisplay.git
After you have these 2 services, you can create a Kafka source that references the sink you created. I recommend using a non-Strimzi Kafka instance and a topic with more than 30 partitions.
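A minimal KafkaSource for that step might look like the following - the bootstrap servers, topic, and sink name are placeholders for your own setup:

```sh
# Create a KafkaSource that references the sink service.
kubectl apply -f - <<'EOF'
apiVersion: sources.knative.dev/v1beta1
kind: KafkaSource
metadata:
  name: kafka-source-repro
spec:
  bootstrapServers:
    - <your-kafka-bootstrap>:9092
  topics:
    - <your-topic>
  sink:
    ref:
      apiVersion: serving.knative.dev/v1
      kind: Service
      name: <your-sink-service>
EOF
```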
When you deploy the Kafka source dispatcher, you should tweak the following config variables.
To send events to your Kafka instance topic:
Clone this repo https://github.com/steven0711dong/KafkaProducer - the readme file contains instructions on how to send events.