[Bug]: beam_LoadTests_Python_Combine_Dataflow_Streaming failing #23904
Comments
This is a known issue. I believe this test has not run successfully since almost when it was added. |
Yeah, I found two similar bugs: #22692 and #22436. I checked the console logs for a few jobs from the last 15 days, and all of them show the same pattern: the SDK harness is initializing, and a cancel request is committed within 5 minutes of it. Unfortunately, there is not much information apart from this. Memory and CPU utilization look alright. |
maybe we need to increase the timeout? |
This is not due to a timeout. This is a streaming pipeline, so it does not end until SyntheticSource has emitted all results (200,000,000). However, the Dataflow UI shows that this counter is never reached. The pipeline then runs for 4 hours until the test task timeout is reached. The GBK streaming load test runs a similar pipeline, but that pipeline ends properly. Not sure where the difference is. |
Took a look and reached the same conclusion: it looks like the job gets stuck in the synthetic source. Test configuration: https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_LoadTests_Combine_Python.groovy#L26-L50 For reference, the GBK test configuration: beam/.test-infra/jenkins/job_LoadTests_GBK_Python.groovy, lines 82 to 102 in 7d69493
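For a rough sense of scale, the numbers above imply a minimum sustained throughput. This is a minimal sketch that uses only the 200,000,000 record count and the 4-hour test timeout quoted in this thread; the function name is made up for illustration and is not part of Beam.

```python
def required_records_per_second(total_records: int, timeout_hours: float) -> float:
    """Minimum average throughput needed to emit all records before the timeout."""
    return total_records / (timeout_hours * 3600)


# 200,000,000 records and a 4-hour timeout, both taken from the thread.
rate = required_records_per_second(200_000_000, 4)
print(f"{rate:,.0f} records/s")  # prints "13,889 records/s"
```

Any run whose source counter averages below this rate cannot finish before the test task times out, regardless of why it is slow. |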
Synthetic source: https://github.com/apache/beam/blob/master/sdks/python/apache_beam/testing/synthetic_pipeline.py |
The next step would be to investigate the performance of the synthetic source with different parameters. It seems like it may fail when running at a bigger scale. |
After many hours of running, we can see active bundles with threads in Synthetic source code:
|
OK. It's a resource starvation issue. The job struggles for hours to fit in memory, but doesn't OOM. Takes ~10 minutes on n1-highmem-2. |
Actually, that's not right. It turns out I ran a batch job, which finished fast, but running this test in streaming mode results in poor performance and memory pressure. |
After bumping to 50 workers, the test passed. It takes 2 h to run. (Figure: throughput of the input/output PCollection of GBK.) With 5 workers, however, the problem is not that the test fails to finish in time; the pipeline simply gets stuck after some time. (Figure: throughput of the input/output PCollection of GBK.) Worker crashes also happened throughout the pipeline run. (Figures: number of workers; memory usage.) In summary, what happens is
|
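As a back-of-envelope check on the 50-worker vs. 5-worker comparison, here is a small sketch. It assumes the 50-worker run processed the full 200,000,000 records in 2 hours and that throughput scales linearly with worker count; both are assumptions for illustration, and the helper names are hypothetical. The thread reports that the 5-worker run actually stalls, so this is only an optimistic lower bound on its runtime.

```python
TOTAL_RECORDS = 200_000_000  # record count quoted earlier in the thread


def per_worker_rate(total_records: int, workers: int, hours: float) -> float:
    """Average records per second handled by each worker (assumes an even split)."""
    return total_records / (workers * hours * 3600)


def projected_hours(total_records: int, workers: int, rate_per_worker: float) -> float:
    """Runtime in hours if each worker sustained rate_per_worker (linear scaling assumed)."""
    return total_records / (workers * rate_per_worker) / 3600


rate = per_worker_rate(TOTAL_RECORDS, workers=50, hours=2)
hours = projected_hours(TOTAL_RECORDS, workers=5, rate_per_worker=rate)
print(f"{rate:.1f} records/s per worker, about {hours:.0f} h with 5 workers")
```

Under these assumptions, 5 workers would need roughly 20 hours, well past the 4-hour test timeout, even before accounting for the stalls and worker crashes described above. |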
The memory profile (pipeline option
or
or
|
What happened?
The Jenkins test has been failing since October.
Issue Priority
Priority: 1
Issue Component
Component: test-failures