ADAP-25 Run python model test based on schedule #306

Closed
ChenyuLInx opened this issue Sep 12, 2022 · 2 comments

Comments

@ChenyuLInx
Contributor

ChenyuLInx commented Sep 12, 2022

Currently, when we run our Python model tests with GitHub Actions (GHA), multiple tests run at the same time. With Dataproc (cluster or serverless), tests fail because the underlying infrastructure is overloaded. See this as an example.

We should skip Python model tests in normal workflows and instead create a scheduled run that executes them every day, so that we can still catch regressions.
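
A minimal sketch of how that gating might look on the test side, assuming the scheduled run is a GitHub Actions cron workflow. GitHub Actions sets `GITHUB_EVENT_NAME` to `schedule` for cron-triggered runs; the helper name and the `RUN_PYTHON_MODEL_TESTS` override are hypothetical and not part of this repo.

```python
import os

# GitHub Actions sets GITHUB_EVENT_NAME to "schedule" for cron-triggered runs.
SCHEDULED_EVENT = "schedule"


def should_run_python_model_tests(env=None):
    """Return True only for scheduled runs or an explicit opt-in.

    RUN_PYTHON_MODEL_TESTS is a hypothetical override for local or manual runs.
    """
    env = os.environ if env is None else env
    if env.get("RUN_PYTHON_MODEL_TESTS") == "1":
        return True
    return env.get("GITHUB_EVENT_NAME") == SCHEDULED_EVENT
```

With pytest, the result could feed a marker such as `pytest.mark.skipif(not should_run_python_model_tests(), reason="python model tests only run on schedule")` applied to the Python model test classes.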

When turning the tests back on, we also need to include the PySpark DataFrame test that was added later (dbt-labs/dbt-core#5906).

@github-actions github-actions bot changed the title Run python model test based on schedule [CT-1156] Run python model test based on schedule Sep 12, 2022
@mikealfare mikealfare changed the title [CT-1156] Run python model test based on schedule ADAP-25 Run python model test based on schedule Mar 6, 2023
@ChenyuLInx
Contributor Author

More color on the test failures caused by Dataproc not scaling well enough.
There are two ways to submit a Dataproc job: cluster and serverless. For cluster, we run an always-on cluster; for serverless, GCP spins up a short-lived server to run just one job.

  • When we submit too many jobs at once to the always-on Dataproc cluster, we risk jobs on that cluster getting stuck and never finishing.
  • When we submit too many serverless job requests, we hit the rate limit for serverless jobs, so the Python model fails (yes, we don't have retry logic built in here).
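
The missing retry logic for the serverless case could be sketched roughly as below. This is an illustration under assumptions, not the adapter's implementation: `RateLimitError` and `submit_with_retry` are hypothetical names, and the real code would need to catch whatever exception the Dataproc client actually raises when the rate limit is hit.

```python
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the error a rate-limited submission raises."""


def submit_with_retry(submit, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call submit(), retrying on RateLimitError with exponential backoff.

    The delay doubles after each failed attempt: base_delay, 2x, 4x, ...
    The final failure is re-raised so the test still fails visibly.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return submit()
        except RateLimitError:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

The `sleep` parameter is injectable so the backoff can be exercised in unit tests without waiting. For the always-on cluster, the analogous mitigation would be capping the number of jobs in flight rather than retrying.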

@mikealfare mikealfare added enhancement tech_debt support and removed tech_debt enhancement labels Feb 13, 2024
@mikealfare
Contributor

We've looked into a few ways of doing this (other than scheduling) and didn't really come up with a good workaround. Even if we scheduled this, we would still want to check it on PRs. I think some of the retry work addresses the hung processes, so I'm closing this for now.

@mikealfare mikealfare closed this as not planned Dec 13, 2024
3 participants