-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Windows CI test is extremely long (~3 hours) #8696
Comments
From logs in https://github.com/apache/arrow-datafusion/actions/runs/7358744456/job/20032462383
Seems would be a good start point for investigating this, as accounts for almost half that runtime. Skimming through the logs, can't see any other part that significantly contributes to the runtime Edit: might be an issue with the windows runner itself, see actions/runner-images#9085 First started occurring here: https://github.com/apache/arrow-datafusion/actions/runs/7350344666 But don't think it's issue on datafusion side, considering contents of that PR Edit2: Some other references:
Seem to suggest switching to Edit3: for comparison, the CI used to take no more than 30min |
CI is still taking extremely long after switching to windows-2019 runner |
Some debugging thoughts: Due to image change Unlikely. 27min runtime: https://github.com/apache/arrow-datafusion/actions/runs/7347591698/job/20004278779#step:1:1
3hr 20min run: https://github.com/apache/arrow-datafusion/actions/runs/7351057731/job/20013780063#step:1:1
Runner image versions are the exact same. Some regression in Datafusion code Probably unlikely? I don't have a Windows machine to test to be certain that it's not only CI. Given example run: https://github.com/apache/arrow-datafusion/actions/runs/7390088229/job/20104324232
The tests that have extreme runtime are:
Rest of the run doesn't have any major anomalies, so these are the offending tests. Some other runner issue Not sure how to debug this, might need to raise another issue with runner repo? Edit: pretty certain its not due to code change on DataFusion side, as this PR run was the first to exhibit >3hr win runtime: https://github.com/apache/arrow-datafusion/actions/runs/7349660365/job/20009991842 And it happens independently of the next commit to main which also exhibits slowness |
Actually, another thing that changed was Rust version from 1.74.1 to 1.75.0 (which also aligned exactly with the slowdown) Testing switching back to 1.74.1 to see if this is the root cause: #8728 |
@Jefffrey thanks for creating this, do you have windows machine to run tests locally? |
I don't have access to a Windows machine easily unfortunately. Can try get access to one if necessary, will see how latest CI test pans out |
I'm also trying to find a solution in #8730 |
good finding, on 1.75.0 even the compilation is way slower |
After a bunch of debugging, I found this: Tests run fast on toolchain: nightly-2023-10-29 (rust e5cfc5547)
Tests run extremely slow on toolchain: nightly-2023-10-30 (rust 608e9682f)
I might raise an issue on rust lang repo, after checking if a similar issue hasn't already been raised
|
Raised issue on Rust repo: rust-lang/rust#119560 |
Describe the bug
Some examples:
To Reproduce
No response
Expected behavior
No response
Additional context
Not sure if it was like this in the past too? But a 3hr runtime seems odd
The text was updated successfully, but these errors were encountered: