-
Notifications
You must be signed in to change notification settings - Fork 391
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use git ls-remote to resolve refs for git provider #895
Conversation
Shouldn't unresolved ref default to |
It already defaults to master in the front-end javascript if no ref is specified. |
Test failures seem relevant; also we need to test the new cases. |
If it works as advertised, this is awesome! In the future, it could also mean that we could greatly reduce our needed git provider API support, since all we'd need to do is turn provider-specific-url into git url, which should generally be doable purely in our code and without making any API requests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks wow this is a serious upgrade to the UX! Thanks @manics and @hugokerstens for your work on this!
1 - unresolved_ref
renamed for clarity?
I was thinking from the naming of self.unresolved_ref
that we assume this to be an unresolved reference (like a branch name: master, or a tag name: 0.1.0). I were about to suggest we should be able to handle both. But then I noted that you are already doing a check later within a try catch where it is also allowed for this to be a resolved reference:
try:
self.sha1_validate(unresolved_ref)
self.resolved_ref = unresolved_ref
So, perhaps we can name this variable to indicate it not as an unresolved reference, but simply a reference or unparsed reference, provided reference or input reference etc?
2 - a delayed parsing of the provided git reference?
NOTE: These considerations are not important if an additional webrequest isn't needed, and these considerations are also not essential to merge/not merge of this PR.
The __init__
call to the GitRepoProvider does not trigger the parsing of the reference that could be for example "master"
or "0.1.0"
to a resolved reference, being a 40 char string. Instead, it will be made when get_resolved_ref()
is called. Perhaps this is good, perhaps it isn't, I can't tell because I don't understand the usage pattern of RepoProvider objects yet - will this cause a lag that could be prevented, or does it actually prevent a lag? Could we return __init__
but start the fetch of a resolved reference ahead of time? Hmm...
I also figure that if we call get_resolved_ref
multiple times before its return, it may cause multiple invokations of the git ls-remote
command while we only want to do one, I think. Can we make it only call once? Is it important? I don't know fully, but that is stuff I considered when I looked through this code!
To summarize:
- should we async- or sync- invoke the resolution of the reference logic on init, or not do it at all but wait until
get_resolved_ref
explicitly is called later? - should we ensure that only one lookup is done with
git ls-remote
?
binderhub/repoproviders.py
Outdated
self.sha1_validate(unresolved_ref) | ||
self.resolved_ref = unresolved_ref | ||
except ValueError: | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a few things here which aren't obvious from reading it a few times so maybe we can add some comments.
Why are we now swallowing exceptions here?
On L196 calling self.sha1_validate(unresolved_ref)
should fail a lot because users can pass in a value like master
so why call it now and not after we resolve the reference?
It seems weird to set self.resolved_ref
to the value of unresolved_ref
. I think we should delay that to when we have a resolved ref (and it it to None
in the meantime).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We pass on an error since we postpone resolving the ref to the getter of resolved_ref
if the ref is not a commit SHA. It makes sense to move this sha1 check to the getter for the resolved_ref
and not set the resolved_ref
at all in the constructor. This is in line with the other repo providers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also appreciate managing all logic within the getter (get_resolved_ref
) as there is one. It was confusing to me the first time I read this through that some of it was found there but not all of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the current implementation satisfactory?
The only thing I'm not 100% sure about is setting the resolved_ref
in the else
block. It is a good practice to only have code in the try
block that produces the error you want to catch. However, it could be more readable if the resolved_ref
is set directly after validating it in the try
block. I tried to make it more readable by adding some comments, but still assign it in the else
block.
Co-Authored-By: Tim Head <betatim@gmail.com>
That's indeed a lot more readable. If we move the sha1 check to the getter (as discussed here), renaming it is not needed anymore. We can just speak of an unresolved reference in the constructor, and only bother with validating in the getter for the resolved ref. |
It is this change that led me to consider the variable name in the first place I guess. Previously, a resolved reference was required, but now it cannot be a resolved reference? Well the code is currently meant to support that, but I got confused due to this documentation that led me to think it was no longer supported. Hence, my thinking of changing this documentation snippet and than the variable name felt natural to change along with it. Users must provide a spec of the following form.
- <url-escaped-namespace>/<resolved_ref>
+ <url-escaped-namespace>/<unresolved_ref> |
raise ValueError("The specified branch, tag or commit SHA ('{}') was not found on the remote repository." | ||
.format(self.unresolved_ref)) | ||
resolved_ref = result.stdout.split(None, 1)[0] | ||
self.sha1_validate(resolved_ref) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is validating the SHA that is returned by git ls-remote
too much? I think this safe-guard is nice to have for unexpected output of git ls-remote
.
I added some extra documentation for the |
This fixes #675 if I'm not misunderstanding something. A great improvement. |
Thanks for bumping this @chicocvenancio. Deployed this locally and tested it with some GitHub and GitLab repositories (pretending they were just git repos). Seems to work 🎉. Merging! Thanks for this cool feature! |
Implements the approach specified in #843.