Currently, when running our python model tests with GHA, we run multiple tests at the same time. With Dataproc (Cluster or Serverless), tests fail because the underlying infra gets overloaded. See this as an example.

We should skip the python model tests in normal workflows and instead create a scheduled run that executes them every day, so that we can still catch regressions (see the sketch below for one way to gate them).
When turning the tests back on, we also need to include the later-added PySpark DataFrame test (dbt-labs/dbt-core#5906).
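A minimal sketch of one way to gate the tests, assuming the scheduled workflow sets an environment variable; the variable name `DBT_PYTHON_MODEL_TESTS` and the test/marker names are illustrative, not the actual dbt-bigquery test code:

```python
import os

import pytest

# Skip python model tests unless the (assumed) env var is set by the scheduled workflow,
# so normal PR runs don't pile concurrent jobs onto Dataproc.
python_model_tests = pytest.mark.skipif(
    os.getenv("DBT_PYTHON_MODEL_TESTS") != "true",
    reason="python model tests only run in the scheduled workflow to avoid overloading Dataproc",
)


@python_model_tests
def test_python_incremental_model():
    # ...the actual dbt adapter test body would go here...
    pass
```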
More color on why the tests fail: Dataproc is not scalable enough for this.

There are two ways to submit a Dataproc job: to a cluster or to Serverless. For Cluster we run an always-on cluster; for Serverless, GCP spins up a short-lived environment to run a single job.

- When we submit too many jobs at once to the always-on Dataproc cluster, we run the risk of jobs on that cluster getting stuck and never finishing.
- When we submit too many Serverless job requests, we hit the rate limit for Serverless jobs, so the python models fail (and no, we don't have retry logic built in here; a rough sketch of what that could look like is below).
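A rough sketch of the kind of client-side retry that is currently missing, assuming a jittered exponential backoff around the submission call; `submit_serverless_batch` is a hypothetical placeholder, not an existing dbt-bigquery function:

```python
import random
import time


def submit_with_retry(submit_fn, max_attempts=5, base_delay=2.0):
    """Retry a submission callable when Dataproc rejects it (e.g. rate limiting)."""
    for attempt in range(1, max_attempts + 1):
        try:
            return submit_fn()
        except Exception:  # in practice, catch the specific 429 / RESOURCE_EXHAUSTED error
            if attempt == max_attempts:
                raise
            # jittered exponential backoff before the next attempt
            delay = base_delay * (2 ** (attempt - 1)) + random.uniform(0, 1)
            time.sleep(delay)


# usage (hypothetical): submit_with_retry(lambda: submit_serverless_batch(batch_config))
```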
We've looked into a few ways of doing this (other than scheduling) and didn't really come up with a good workaround. Even if we scheduled these tests, we would still want to check them on PRs. I think some of the retry work addresses the hung processes, so I'm closing this for now.