Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

default thread timeout too short for CI #1339

Closed
sroet opened this issue Sep 10, 2021 · 6 comments · Fixed by #1340
Closed

default thread timeout too short for CI #1339

sroet opened this issue Sep 10, 2021 · 6 comments · Fixed by #1340

Comments

@sroet
Copy link
Contributor

sroet commented Sep 10, 2021

Since this morning our CI started failing with:

Run python devtools/autorelease_check.py
Traceback (most recent call last):
  File "devtools/autorelease_check.py", line 24, in <module>
    repo_path='.'
  File "/usr/share/miniconda/envs/test/lib/python3.7/site-packages/autorelease/check_runners.py", line 52, in __init__
    repo_path=repo_path
  File "/usr/share/miniconda/envs/test/lib/python3.7/site-packages/autorelease/git_repo_checks.py", line 28, in __init__
    self.repo.remotes.origin.fetch()
  File "/usr/share/miniconda/envs/test/lib/python3.7/site-packages/git/remote.py", line 856, in fetch
    res = self._get_fetch_info_from_stderr(proc, progress)
  File "/usr/share/miniconda/envs/test/lib/python3.7/site-packages/git/remote.py", line 727, in _get_fetch_info_from_stderr
    handle_process_output(proc, None, progress_handler, finalizer=None, decode_streams=False)
  File "/usr/share/miniconda/envs/test/lib/python3.7/site-packages/git/cmd.py", line 151, in handle_process_output
    raise RuntimeError(f"Thread join() timed out in cmd.handle_process_output(). Timeout={timeout} seconds")
RuntimeError: Thread join() timed out in cmd.handle_process_output(). Timeout=10.0 seconds
Fatal Python error: could not acquire lock for <_io.BufferedReader name=4> at interpreter shutdown, possibly due to daemon threads

Thread 0x00007fbb2f2d7700 (most recent call first):
  File "/usr/share/miniconda/envs/test/lib/python3.7/site-packages/git/cmd.py", line 103 in pump_stream
  File "/usr/share/miniconda/envs/test/lib/python3.7/threading.py", line 870 in run
  File "/usr/share/miniconda/envs/test/lib/python3.7/threading.py", line 926 in _bootstrap_inner
  File "/usr/share/miniconda/envs/test/lib/python3.7/threading.py", line 890 in _bootstrap

Current thread 0x00007fbb3332d180 (most recent call first):
  File "/usr/share/miniconda/envs/test/lib/python3.7/site-packages/git/cmd.py", line 406 in __del__
/home/runner/work/_temp/ff824c06-ff56-4ee4-9e2a-e65258688cf4.sh: line 1:  1922 Aborted                 (core dumped) python devtools/autorelease_check.py
Error: Process completed with exit code 134.

This seems a side effect of the timeout introduced by #1318

Could this time-out default to 60 s instead (or be taken and propagated by the fetch function)?

I could make a PR if needed, with either of the two solutions

@sroet
Copy link
Contributor Author

sroet commented Sep 10, 2021

cc: @dwhswenson

@Byron
Copy link
Member

Byron commented Sep 10, 2021

Thanks for reporting and sorry for the hassle. Does that timeout not also means that clones or fetches can't take longer than that before assuming a hang? CC @Yobmod

@Byron
Copy link
Member

Byron commented Sep 10, 2021

@sroet A PR would definitely be welcome, even though it would be good to understand it fully first.

@LasseMenzel
Copy link

LasseMenzel commented Sep 10, 2021

Hi, same on my CI-System, where fetches takes long time and 10 sec time-out are definitive to short. A default value of 60 sec for time-out are also fine for me.

@dwhswenson
Copy link

The failure @sroet linked to in OpenPathSampling is raised from within Autorelease, which includes a tool to test that the version listed in the code is reasonable based on the versions implied by tags in the repo. Autorelease uses GitPython (thanks for a great package for interacting with git from Python!), and the problem here is when it fetches the upstream to ensure it knows all tags.

I use Autorelease on all my projects, and OpenPathSampling is the only one where this is a problem. I suspect this is because OpenPathSampling is a relatively large repository (both in clone size and in number of commits) -- though there are plenty of repos out there that are larger in each.

In previous passing nightly builds, total time for the "Autorelease check" step in the OpenPathSampling GitHub actions runs varies widely: I've seen as long as 1m27s, but that seems like an outlier. ~15s is standard. For my smaller repos, the "Autorelease check" step takes more like 2-5 seconds.

So a few thoughts:

  • There's probably no default value of timeout that will satisfy all users. If you're working with a sufficiently massive repo, this will be a problem. OpenPathSampling is large, but not an extreme outlier.
  • I see the timeout added in #1318, but I don't see any way for client code like Autorelease to actually make use of this. Either surfacing the the timeout parameter in Remote.fetch (and everything else that uses handle_process_output) or using an approach similar to updating the cmd.execute_kwargs dict would make it easier for me to implement a workaround. (For now, the easy solution is going to be pinning to an old version of GitPython.)
  • Is there a reason that the timeout in handle_process_output would ever be less than execute_kwargs['kill_after_timeout']? Is there a reason not to reuse the value of execute_kwargs['kill_after_timeout'] in handle_process_output? This would default to no timeout, but would give client code a straightforward way to change this without adding confusion due to different kinds of timeouts.

@Byron
Copy link
Member

Byron commented Sep 10, 2021

Thanks for the elaborate explanation and for sharing your thoughts, I found them very insightful and helpful.

[…](thanks for a great package for interacting with git from Python!)[…]

I am not so sure about that anymore. Even though I don't know every possible usage of it, I think that it could have been at its best if it would have stuck to executing git and parsing its output, but do that very, very well. GitPython today is a deeply flawed hybrid which wants to be git, and probably beyond fixable.

For a start, I have yanked this release as the timeout, no matter how long, represents a breaking bug.

Screenshot 2021-09-10 at 22 16 59

It looks like there is a PR by now, maybe @dwhswenson could take a look at #1340 as well. Thank you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging a pull request may close this issue.

4 participants