-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
implement pygit2 backend for fetch_refspec #169
implement pygit2 backend for fetch_refspec #169
Conversation
Setting the default backend is controlled here: scmrepo/src/scmrepo/git/__init__.py Lines 39 to 41 in 339fdba
But what we probably need is a way to just force preferring a specific backend in our scmrepo/src/scmrepo/git/__init__.py Line 282 in 339fdba
so we could do something like
which would force rotating the backends list until |
Instead of a wrapper, we could also use it directly as following in dvc: scm.pygit2.fetch_refspecs(...) |
I had considered before, but the problem is that the backends will be determined the first time we call it. So if we want to make sure a function using a special backend, we need to add this prefer to all place we call it. So maybe it's better to add the argument in scmrepo/src/scmrepo/git/__init__.py Line 282 in 339fdba
|
Isn't using it directly also an issue with backend switching? We did have issues in the past related to that (iterative/dvc#5641, iterative/dvc#7458) |
Using the backends directly can cause issues due to caching of the git index, but fetch does not touch the git index at all so it shouldn't be a problem here |
Does this work on Windows? AFAIK there is no ssh support in pygit2 for Windows? |
0a484f0
to
ec6f5ec
Compare
Codecov ReportBase: 82.29% // Head: 81.36% // Decreases project coverage by
Additional details and impacted files@@ Coverage Diff @@
## main #169 +/- ##
==========================================
- Coverage 82.29% 81.36% -0.94%
==========================================
Files 25 25
Lines 3383 3466 +83
Branches 583 601 +18
==========================================
+ Hits 2784 2820 +36
- Misses 517 547 +30
- Partials 82 99 +17
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
d08d198
to
c5607b1
Compare
From the test result look like it works for Windows. |
src/scmrepo/git/backend/pygit2.py
Outdated
except KeyError: | ||
raise SCMError(f"'{url}' is not a valid Git remote or URL") | ||
|
||
if os.name == "nt" and remote.url.startswith("file://"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if should I put this code here or in the tests? Looks like pygit2
doesn't support this format on the Windows platform.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it should support it https://github.com/libgit2/libgit2/blob/1327dbcf2a4273a8ba6fd978db5f0882530af94d/src/libgit2/transport.c#L35
But the test result is that on Mac or Linux it supports it, but not on Windows. I guess it's a bug of pygit2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be that libgit2 or pygit2 expect windows slashes instead of posix ones after the file://
protocol, or it might be that the file://
prefix is supposed to be stripped as you are doing here. But in regular CLI git there is actually different behavior depending on whether you provide a file://
URL or a raw local path (without file://
), so we should double check what the problem is.
Please try to take a look into what pygit2 is doing in this situation and see if there is anything we need to fix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hadn't found any issues related to it. Let me create a issue for them libgit2/pygit2#1187
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could still be due to not creating a proper Remote()
object for the raw URL. As we discussed, we should be using git_remote_create_anonymous
to generate our remote here.
remote_name = "fetch"
self.repo.remotes.set_url(remote_name, url)
remote = self.repo.remotes[remote_name]
This is likely going to cause undefined behavior because we are trying to call set_url
on a remote that doesn't actually exist (unless the user has already defined their own remote named fetch
, in which case we are incorrectly modifying an existing git remote instead of creating our own temporary one)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like git_remote_create_anonymous is not implemented in pygit2
, I just made a temporary solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, as we discussed though we should really consider adding it to pygit
libgit2 supports ssh but only when it is compiled with libssh2. The libgit2 build bundled with the pygit2 wheels does not include libssh2, so We should be checking to see if we are passed an ssh URL and then raise NotImpelementedError in pygit2 fetch_refspecs for ssh. |
38db2c2
to
5a505bb
Compare
src/scmrepo/git/backend/pygit2.py
Outdated
remote_name = uuid() | ||
self.repo.remotes.set_url(remote_name, url) | ||
try: | ||
yield self.repo.remotes[remote_name] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This still has the same issue where you will call set_url
on a remote that doesn't actually exist which will likely end up causing errors and/or undefined behavior in the underlying libgit C library, especially when you try to delete it later.
If you are going to implement it this way and create a full named remote (instead of adding the anonymous remote function call to pygit), you need to be using self.repo.remotes.create()
here
src/scmrepo/git/__init__.py
Outdated
func = getattr(backend, name) | ||
result = func(*args, **kwargs) | ||
self._last_backend = key | ||
self.backends.move_to_end(key, last=False) | ||
return result | ||
except NotImplementedError: | ||
except (NotImplementedError, KeyError): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What KeyError
is being hit here? If someone tries to call this function with backends=[...]
that contains an invalid backend we should really be raising that exception and not catching/ignoring it here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's related to the tests when we initialize the backends
with selected
argument, I had put the logic into __getitem__
to raise different exceptions for the nonexistent backend names and the not initialized backend names.
08a1cf7
to
cd64c82
Compare
if os.name == "nt" and remote.url.startswith("file://"): | ||
url = remote.url[len("file://") :] | ||
self.repo.remotes.set_url(remote.name, url) | ||
remote = self.repo.remotes[remote.name] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this still needed even after the remotes.create()
change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like it still fails.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can make this change in the PR then, but we probably need to open another issue for it in scmrepo (this likely means that a regular git remote with file://
URL in it is also broken for us on windows). This may still be related to us not setting the path separators or formatting the URL in a way that libgit2 expects and probably needs some more investigation.
24d1bf0
to
cd64c82
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@karajan1001 please open scmrepo issues for the pygit follow ups, we don't have to do them right away but it would be good for us to handle them whenever we have the bandwidth
- expose
git_remote_create_anonymous
inpygit2.Remote.create_anonymous
- investigate windows
file://
behavior in pygit/libgit
fix iterative#168 1. Add order select for `_backend_func`. 2. Raise exception for fetch_refspec for ssh:// repo on Windows. 3. Add order select for _backend_func
cd64c82
to
702fabd
Compare
@karajan1001 is this waiting on anything else or can it be merged? |
I'll do some final check today together with DVC. |
implement pygit2 backend for fetch_refspec
fix #168
Compare to the
cprofile
in iterative/dvc#8477.The new greatly reduced the time cost in fetching.
Making the total
dvc exp show
query finished in 1s. And brings a much smoother performance in vs-code extension. And also solves the hanging problem in iterative/dvc#87622023-01-17.18-57-45.mp4
But we still had one problem remaining to solve, how to make
scmrepo
use thepygit2
as the default backend. In the test and the demo, I simply raiseNotImplementError
in Dulwich to disable it.