-
Notifications
You must be signed in to change notification settings - Fork 36.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
test: Bump timeouts in feature_index_prune and wallet_importdescriptors #29791
Conversation
Timeout issues where encountered when running functional tests with `--jobs=16 --extended`. Line in `feature_index_prune.py` took 101.6s, 96.6s, 103.0s across 3 runs on my machine, default limit is 60. Line in the `wallet_importdescriptors.py --descriptors` took 5.4s, 5.7s, 6.0s across 3 runs.
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code CoverageFor detailed information about the code coverage, see the test coverage report. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. |
lgtm ACK 49c0b8b |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work @cbergqvist. The code changes lgtm.
It may be worth investigating why these tests are taking longer to run, and if that is expected or a potential performance regression that went unnoticed until now. I'm assuming that the tests were passing at some point with the existing timeout values (and that those values were chosen for a reason).
I don't have all the history on this, so I could be missing something. Perhaps it's dependent on how the tests are run (e.g. --jobs
chosen), and if that's the case, this PR addresses that.
Might be worth running git blame to find the previous PR that is associated with the previous timeouts to see if there is rationale (a stretch, but worth a quick check).
Thanks for having a look @tdb3! As mentioned in the summary, running without Your questions prompted me to perform the tests on older release branches to see if behavior changed recently. (No other tests failed but I didn't let 27.x - Only 26.x - Only 25.x - Only The fact that these two tests fail so consistently while no other tests fail helped increase my confidence in this change. The major reason
Agree that it might be fruitful to investigate the history of |
@itornaza subtly prompted me to do some tests to find the sweet spot for the number of jobs on my hardware, which luckily helps motivate why running with 16 jobs is not unreasonable. This is wall time running on a RAM disk without
(These happened to be run on |
Ok, this gives me a bit more confidence that this may just be a simple case of resource contention from high concurrency slowing things down. Omitting
Good work gathering data. Since these tests only seem to be failing with high levels of concurrency, I'm ok with an ACK. I don't have the time currently to further performance profile this, but maybe someone else does. ACK for 49c0b8b |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
crACK 49c0b8b
approach ACK 49c0b8b It is indeed frustrating to watch some functional tests failing by just using 12-16 jobs in parallel. This PR addresses the issue with minimal footprint in the test code. However, the timeout of some tests are even greater on my machine without me understanding the reason yet. Until this issue is resolved, running the extended tests with the default number of jobs seems the best approach to me. On Master branchReplicated the errant behavior with the
and another one not covered in this PR:
Tests on PR Branch
which is a test not in the scope of this PR anyway.
and again the same test that is not a subject of this PR:
Note: All test run on Apple M2 Max chip and macos 14.4.1 (23E224) |
Thanks for trying it out!
Searched closed issues and it seems like others running on Mac are also seeing Edit:
Wait.. the modified tests still fail with the commit in this PR. Hm.. good find! I could understand maybe bumping the first one from 60 s all the way to 200 s (233% increase). But bumping the second one from 10 s to 150 s is more than 10x. :\ |
@itornaza if you have the inclination, it might be helpful if you could provide combined logs for the |
@cbergqvist I re-run today the First run
wallet_importdescriptors log
wallet_transactiontime_rescan log
Second run
wallet_importdescriptors log
wallet_transactiontime_rescan log
feature_index_pruneThis test passed today on both runs on the PR branch, so I only include the log file from the failure I encountered in yesterday's tests (again on the PR branch).
feature_index_prune log file
|
Out of frustration I took slightly different approach and run the functional tests with the support of a big enough ramdisk of 12 GiB inspired by the PR #29335 that was meant for a completely different use case. Not only all the extended tests pass on both the master and PR branches with 16 jobs and even 64 jobs but a 2x speed up is achieved down to 572 seconds. My box has 96 GiB of unified memory so there is clearly some operating systems specifics that make the discussed tests to fail. Under the much smaller but more controlled environment of the ramdisk, everything runs smoothly. On that note I would consider suggesting that the tests are advised to be run with a big enough ramdisk support in the README.md and at least for the mac users. |
cc @m3dwards, as I think you had been seeing these timeouts as well. |
Happy for my first merged PR! :) Post-merge post-mortemWorking stupidly, not smart or hardI have spent too many evenings focusing on this:
In over >110 runs, I have observed how the chance of In hindsight, instead of hoping to get lucky in manually finding the offending commit, I should have been "lazy" and written an automated script to perform a larger number of runs to actually be able to get a large enough sample size to provide confidence. I think the threshold in Build configurationFor posterity, I was building with Clang 17.0.6, C & C++ compiler flags Future directions
|
Timeout issues where encountered when running functional tests with
--jobs=16 --extended
.Note that running
--extended
without--jobs=16
does not trigger the issues.Tested under NixOS on a Xeon CPU with 16 logical cores.
(A few tests are skipped locally as I haven't enabled BPF and a few other things).
Measurements
Line in
feature_index_prune.py
took 101.6s, 96.6s, 103.0s across 3 runs on my machine.Default limit is 60, suggested to increase limit to 150 seconds.
Line in the
wallet_importdescriptors.py --descriptors
took 5.4s, 5.7s, 6.0s across 3 runs.Suggested to increase from 5 to 10 seconds.
Logs
Output slightly modified by separate change that lets code run past given timeouts and the provides more information - "Took 101.6 seconds to complete, 69.4% over the given limit.".
Click to expand.
feature_index_prune.py
wallet_importdescriptors.py --descriptors
Related
Almost identical timeout in
feature_index_prune.py
in #27091 on MacOS, and forwallet_importdescriptors.py --descriptors
in #27282 on Alpine & CI.