Currently, when running our python model tests with GHA, we run multiple tests at the same time. With Dataproc (Cluster or Serverless), tests fail because the underlying infra gets overloaded. See this as an example.

We should skip the python model tests in normal workflows and instead create a scheduled run that executes them every day, so that we can still catch regressions (see the sketch below for one way to gate them).
When turning the tests back on, we also need to include the later-added PySpark DataFrame test (dbt-labs/dbt-core#5906).
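A minimal sketch of one way to gate the tests, assuming the scheduled workflow sets an environment variable; the variable name `DBT_PYTHON_MODEL_TESTS` and the test/marker names are illustrative, not the actual dbt-bigquery test code:

```python
import os

import pytest

# Skip python model tests unless the (assumed) env var is set by the scheduled workflow,
# so normal PR runs don't pile concurrent jobs onto Dataproc.
python_model_tests = pytest.mark.skipif(
    os.getenv("DBT_PYTHON_MODEL_TESTS") != "true",
    reason="python model tests only run in the scheduled workflow to avoid overloading Dataproc",
)


@python_model_tests
def test_python_incremental_model():
    # ...the actual dbt adapter test body would go here...
    pass
```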
More color on why the tests fail: Dataproc is not scalable enough for this.

There are two ways to submit a Dataproc job: to a cluster or to Serverless. For Cluster we run an always-on cluster; for Serverless, GCP spins up a short-lived environment to run a single job.

- When we submit too many jobs at once to the always-on Dataproc cluster, we run the risk of jobs on that cluster getting stuck and never finishing.
- When we submit too many Serverless job requests, we hit the rate limit for Serverless jobs, so the python models fail (and no, we don't have retry logic built in here; a rough sketch of what that could look like is below).
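A rough sketch of the kind of client-side retry that is currently missing, assuming a jittered exponential backoff around the submission call; `submit_serverless_batch` is a hypothetical placeholder, not an existing dbt-bigquery function:

```python
import random
import time


def submit_with_retry(submit_fn, max_attempts=5, base_delay=2.0):
    """Retry a submission callable when Dataproc rejects it (e.g. rate limiting)."""
    for attempt in range(1, max_attempts + 1):
        try:
            return submit_fn()
        except Exception:  # in practice, catch the specific 429 / RESOURCE_EXHAUSTED error
            if attempt == max_attempts:
                raise
            # jittered exponential backoff before the next attempt
            delay = base_delay * (2 ** (attempt - 1)) + random.uniform(0, 1)
            time.sleep(delay)


# usage (hypothetical): submit_with_retry(lambda: submit_serverless_batch(batch_config))
```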
We've looked into a few ways of doing this (other than scheduling) and didn't really come up with a good workaround. Even if we scheduled these tests, we would still want to check them on PRs. I think some of the retry work addresses the hung processes, so I'm closing this for now.