-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Iteration on parseHostSpec refactor #4954
Conversation
@KnVerey: This PR has multiple commits, and the default merge method is: merge. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: KnVerey The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
api/internal/git/repospec.go
Outdated
return scheme, n | ||
} | ||
|
||
username, n := parseUsername(n) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is not always immediately obvious how n
is changing throughout this code, so some communication that parts of n are being removed as we parse through it may be helpful. Wydt about renaming this to trimUsername
(and renaming the above parseScheme
to trimScheme
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like that a lot!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking through the existing occurrences of Trim[Thing]
in this file, I think we have three operations we can distinguish:
TrimThing(input string)(rest string)
(orTrimThing(input string)(rest string, trimmed bool)
) -- "trim" is used for this in stdlib functions too, likestrings.TrimPrefix
, which also removes a known quantity and returns the result (not the thing being trimmed)ParseThing(input string)(thing string)
-- returns the target without reference to the remainder of the input??Thing(input string)(thing string, rest string)
<-- this case. Since "trim" usually removes the thing in question without returning it, I'm leaning towards a third verb here. I'm thinking "extract" but I'm not set on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extract sgtm!
if strings.HasPrefix(scheme, "ssh://") || username == gitUsername { | ||
return normalizedSCPGithub | ||
} | ||
return normalizedHTTPGithub | ||
} | ||
|
||
func normalizeGitHostSpec(host string) string { | ||
s := strings.ToLower(host) | ||
if strings.Contains(s, "github.com") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One changed code path is that gh:
could previously end up here, so gh:github.com/org/repo/foo
would get normalized... but that's non-sensical based on Anna's comment that gh:
is assumed to be a gitconfig shorthand that gets swapped for the github URL, right? (same with file://
, which is definitely non-sensical)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wrote that comment to the best of my knowledge. I couldn't find any official documentation for gh:
on the internet, other than Github CLI. I also tried looking at code history to no avail. The only sites it popped up on denoted a .gitconfig
shorthand: https://gist.github.com/lauhakari/a48a94b60dc66f6c84e2#file-gitconfig-L143
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see. What a mystery! I tried looking at the original PR too, and it doesn't help much... but it definitely shows the "url" is expected to look like gh:org/repo
. So I think it is reasonable to consider gh:github.com
non-sensical.
Alternatively, maybe we can remove support for this. Your research indicates this is not a standard, and we do not have it documented anywhere. Furthermore, I don't see ANY usage of it in public Kustomizations on Github: https://github.com/search?q=%22gh%22+filename%3AKustomization.yaml&type=Code&ref=advsearch&l=&l=.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be down to remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code paths are so easy to read! Please feel free to ignore the nits.
api/internal/git/repospec.go
Outdated
} | ||
|
||
func findPathSeparator(hostPath string, isSCP bool) int { | ||
if strings.Contains(hostPath, "://") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a small chance that the url will contain ://
, but not to denote the scheme. According to RFC 3986, uri paths can contain :
. The //
could be delimit the path inside the repo or maybe it's an empty path segment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, that's a good point. This was previously handled by gating all the path parsing blocks on having already parsed a recognized scheme. I'll try validating the username+scheme combo ahead of this function so that it won't need this weird check.
api/internal/git/repospec.go
Outdated
return normalizedGithubHost(scheme, username), rest | ||
} | ||
|
||
sepIndex := findPathSeparator(n, username == gitUsername) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we change this to scheme == ""
&& username != ""
?
- If
scheme
isssh://
andusername
isgitUsername
, we'd be wrongly passingtrue
. - If we decide to allow other usernames, we'd be wrongly passing
false
.
nit: Maybe we could define an isSCP
variable in extractHost
? Then, we could pass it to functions like normalizedGithubHost
, findPathSeparator
, and validHostSpecParsed
. This would help limit references to gitUsername
in helper functions for when we generalize allowed usernames.
api/internal/git/repospec.go
Outdated
func validHostSpecParsed(scheme, username, host string) bool { | ||
isSCP := username == gitUsername | ||
return len(host) > 0 && | ||
(scheme != "" || isSCP) // scheme may be blank for URLs like git@github.com:org/repo |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Could we change this comment to a non-github example since github urls would have been already been returned by the normalizedGithubHost
code path?
if strings.HasPrefix(scheme, "ssh://") || username == gitUsername { | ||
return normalizedSCPGithub | ||
} | ||
return normalizedHTTPGithub | ||
} | ||
|
||
func normalizeGitHostSpec(host string) string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concern for future PR: I find normalizeGitHostSpec
hard to reason about because it checks for target strings out of context (with strings.Contains
). For example, the repo url ssh://git@ssh.github.com:443/YOUR-USERNAME/YOUR-REPOSITORY.git
with the following use case: https://docs.github.com/en/authentication/troubleshooting-ssh/using-ssh-over-the-https-port
would be normalized to git@github.com:YOUR-USERNAME/YOUR-REPOSITORY.git
, which might come as a surprise to the user.
Maybe we could reuse some of the logic in the existing extractHost
to replace the call to normalizeGitHostSpec
in parseGitURL
.
At the very least, I'm glad we no longer use this function in extractHost
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, my next PR is getting rid of this entirely in favour of handling all host parsing with extractHost
! That's a great example of changed behaviour in this PR though, and I added it as a test.
}, | ||
}, | ||
{ | ||
name: "complex github ssh url from docs", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Running this on master shows that the current result is even worse than you thought. Here's what passes there:
{
name: "complex github ssh url from docs",
input: "ssh://git@ssh.github.com:443/YOUR-USERNAME/YOUR-REPOSITORY.git",
cloneSpec: "ssh://git@ssh.git",
absPath: notCloned.String(),
repoSpec: RepoSpec{
Host: "ssh://git@",
OrgRepo: "ssh",
Path: "",
GitSuffix: ".git",
},
},
Totally mangled clonespec!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Future PR: Maybe we could check that delimiters like .git
are succeeded by a /
: https://github.com/kubernetes-sigs/kustomize/blob/master/api/internal/git/repospec.go#L136? I believe this test case reveals that we're looking for such delimiters out of context.
api/internal/git/repospec_test.go
Outdated
}, | ||
}, | ||
{ | ||
name: "supported protocol after username (invalid and will be rejected by git)", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two cases have slightly different parsing results on master. Basically a slash moves. But the clonespec stays the same, and git will reject it anyway, so I don't think this matters a ton.
{
name: "unsupported protocol after username",
input: "git@scp://github.com/org/repo.git//path",
cloneSpec: "git@scp://github.com/org/repo.git",
absPath: notCloned.Join("path"),
repoSpec: RepoSpec{
Host: "git@scp:/",
OrgRepo: "/github.com/org/repo",
Path: "/path",
GitSuffix: ".git",
},
},
{
name: "supported protocol after username",
input: "git@ssh://github.com/org/repo.git//path",
cloneSpec: "git@ssh://github.com/org/repo.git",
absPath: notCloned.Join("path"),
repoSpec: RepoSpec{
Host: "git@ssh:/",
OrgRepo: "/github.com/org/repo",
Path: "/path",
GitSuffix: ".git",
},
},
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not request for change: This test case made me ponder the prioritization of delimiters in parseGitURL
. Currently, we always prioritize .git
over //
. Given that we advertise //
as the repo path delimiter, we could alternatively look for .git
, only if it isn't preceded by //
. Otherwise, we could also document the .git
delimiter more clearly in https://github.com/kubernetes-sigs/kustomize/blob/master/examples/remoteBuild.md.
Regardless, this'd be work for a future PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, my next PR is tackling parseGitURL
, so I'll consider this then.
api/internal/git/repospec_test.go
Outdated
"gh:org/repo", | ||
"url lacks orgRepo", | ||
}, | ||
"username unsupported with http": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that all three of these would be accepted on master, so the errors here are a behaviour change. I'm not 100% sure about this yet; will ponder more before we merge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it helps, I believe file
git urls are technically allowed to have @
in their repo paths.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I updated the code and tests to allow it. As commented elsewhere, I also decided against this validation more generally.
Thank you for the excellent feedback @annasong20 . I've made some moderate changes in response. PTAL. |
api/internal/git/repospec.go
Outdated
// although the git protocol (which is insecure and unsupported on many platforms, including Github) | ||
// will not actually be used as intended. | ||
func ignoreForcedGitProtocol(n string) string { | ||
n, _ = trimPrefixIgnoreCase(n, "git::") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Idea: we could deprecate this behaviour by printing a warning here now. It's probably not harmful, but technically speaking, the user is requesting that the git protocol be forced, and we're not doing it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this idea!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
Fix ssh parsing in issue 4847
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really just 1 request for change. Sorry I didn't catch it the 1st round.
Everything else looks great!
if colonIndex := strings.Index(hostPath, ":"); colonIndex > 0 && colonIndex < sepIndex { | ||
sepIndex = colonIndex | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we return -1
as the index otherwise? SCP-like urls don't accept /
as a separator even if there isn't a :
(not preceded by a /
). git fetch
will fail for such a case.
The error message here: https://github.com/kubernetes-sigs/kustomize/pull/4954/files#diff-2a7284bc651dfb0465d49f2270ed590fce191d14bba769751e4d16d9184a198cR97 should be url lacks host
.
This change will require some accompanying changes in extractHost
for github.com/path/to/repo.git
to pass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When thinking about this and the question of whether to throw an error on http(s)+username, I came back to this question: what is the point of all this parsing code? In the end, we have a URL that we're going to pass to git, and it will throw an error. Why don't we always let it do its job?
I wasn't here for the original authorship, but my answer would be: because we support some invalid formats (e.g. the git::
prefix we strip, github URLs we "fix") and the URLs we are given ultimately point to a Kustomization root, whereas git needs the repo root. So the point is to identify the repo root, i.e the clonespec, and to fix any oddities we specifically allow for historical reasons.
That led me to conclude that we should only throw errors in this code when the anomaly prevents us from achieving those two goals. When we are able to parse the intended clone spec (valid or not), we should do so, and let git throw the error (if any).
With that in mind, I think it is clear that the http(s)+username case should be allowed to fall through to git, and I've restored that behaviour. I think that is also true for the case you're highlighting here: we've found a schemeless, non-Github url that starts with a username and is missing the colon that should delimit the host. The following test passes on master, and currently on this branch:
{
name: "non-github_scp incorrectly using slash (invalid but currently passed through to git)",
input: "git@bitbucket.org/company/project.git//path?ref=branch",
cloneSpec: "git@bitbucket.org/company/project.git",
absPath: notCloned.Join("path"),
repoSpec: RepoSpec{
Host: "git@bitbucket.org/",
RepoPath: "company/project",
KustRootPath: "/path",
Ref: "branch",
GitSuffix: ".git",
},
},
I think we should continue to let it pass though.
After reading the above, do you agree? Are there other cases to consider around SCP-like with missing colons?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this makes sense! Thank you for writing such a great explanation!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I initially hoped looking for :
before /
would decrease our chances of mistaking a relative local file path that contains an @
in the first directory for a repo url, but I realized that users can prefix such file paths with .
to look like ./path
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, and also the "a relative local file path that contains an @ in the first directory" scenario is likely vanishingly rare, so I'm not too worried.
@@ -226,10 +226,6 @@ func TestLocRootPath_URLComponents(t *testing.T) { | |||
urlf: "file:///var/run/repo//%s?ref=value", | |||
path: simpleJoin(t, FileSchemeDir, "var", "run", "repo", "value"), | |||
}, | |||
"gh_shorthand": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for removing this!
api/internal/git/repospec_test.go
Outdated
"gh:org/repo", | ||
"url lacks orgRepo", | ||
}, | ||
"username unsupported with http": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it helps, I believe file
git urls are technically allowed to have @
in their repo paths.
api/internal/git/repospec_test.go
Outdated
"bad_scp": { | ||
"git@local/path:file/system", | ||
"url lacks repoPath", | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we move this into TestNewRepoSpecFromUrl_Smoke
? This test originally checked that extractHost
failed on not being able to find a host delimiter because :
is behind a /
. Now that we use /
as the delimiter instead, this test no longer makes sense.
If we still want to document the behavior that we only detect :
s not preceded by /
, the TestNewRepoSpecFromUrl_Smoke
test would look something like this:
{
name: "colon behind slash not scp delimiter",
input: "git@gitlab.com/user:name/YOUR-REPOSITORY.git",
cloneSpec: "git@gitlab.com",
absPath: notCloned.String(),
repoSpec: RepoSpec{
Host: "git@gitlab.com/",
OrgRepo: "user:name/YOUR-REPOSITORY",
Path: "",
GitSuffix: ".git",
},
},
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, that's much better! Updated.
/lgtm This is awesome, @KnVerey! |
🎉 thank you for the diligent reviews @annasong20 ! |
v1.27.1 No CLI change. v1.27.0 API Change * Adds feature gate NodeLogQuery which provides cluster administrators with a streaming view of logs using kubectl without them having to implement a client side reader or logging into the node. Feature * Added "general", "baseline", and "restricted" debugging profiles for kubectl debug. * Added "netadmin" debugging profiles for kubectl debug. * Added --output plaintext-openapiv2 argument to kubectl explain to use old openapiv2 explain implementation. * Added e2e tests for kubectl --subresource for beta graduation * Changed kubectl --subresource flag to beta (#116595, @MadhavJivrajani) * Enable external plugins can be used as subcommands for kubectl create command if subcommand does not exist as builtin only when KUBECTL_ENABLE_CMD_SHADOW environment variable is exported. * Kubectl will now display SeccompProfile for pods, containers and ephemeral containers, if values were set. * Kubectl: added e2e test for default container annotation * Made kubectl-convert binary linking static (also affects the deb and rpm packages). * Promoted whoami kubectl command. * Since Kubernetes v1.5, kubectl apply has had an alpha-stage --prune flag to support deleting previously applied objects that have been removed from the input manifest. This feature has remained in alpha ever since due to performance and correctness issues inherent in its design. This PR exposes a second, independent pruning alpha powered by a new standard named ApplySets. An ApplySet is a server-side object (by default, a Secret; ConfigMaps are also allowed) that kubectl can use to accurately and efficiently track set membership across apply operations. The format used for ApplySet is set out in KEP 3659 as a low-level specification. Other tools in the ecosystem can also build on this specification for improved interoperability. To try the ApplySet-based pruning alpha, set KUBECTL_APPLYSET=true and use the flags --prune --applyset=secret-name with kubectl apply. * Switched kubectl explain to use OpenAPIV3 information published by the server. OpenAPIV2 backend can still be used with the --output plaintext-openapiv2 argument * Upgrades functionality of kubectl kustomize as described at https://github.com/kubernetes-sigs/kustomize/releases/tag/kustomize%2Fv5.0.0 and https://github.com/kubernetes-sigs/kustomize/releases/tag/kustomize%2Fv5.0.1. This is a new major release of kustomize, so there are a few backwards-incompatible changes, most of which are rare use cases, bug fixes with side effects, or things that have been deprecated for multiple releases already: - kubernetes-sigs/kustomize#4911: Drop support for a very old, legacy style of patches. patches used to be allowed to be used as an alias for patchesStrategicMerge in kustomize v3. You now have to use patchesStrategicMerge explicitly, or update to the new syntax supported by patches. See examples in the PR description of kubernetes-sigs/kustomize#4911. - kubernetes-sigs/kustomize#4731: Remove a potential build-time side-effect in ConfigMapGenerator and SecretGenerator, which loaded values from the local environment under some circumstances, breaking kustomize build's side-effect-free promise. While this behavior was never intended, we deprecated it and are announcing it as a breaking change since it existed for a long time. See also the Eschewed Features documentation. - kubernetes-sigs/kustomize#4985: If you previously included .git in an AWS or Azure URL, we will no longer automatically remove that suffix. You may need to add an extra / to replace the .git for the URL to properly resolve. - kubernetes-sigs/kustomize#4954: Drop support for using gh: as a host (e.g. gh:kubernetes-sigs/kustomize). We were unable to find any usage of or basis for this and believe it may have been targeting a custom gitconfig shorthand syntax. * [alpha: kubectl apply --prune --applyset] Enabled certain custom resources (CRs) to be used as ApplySet parent objects. To enable this for a given CR, apply the label applyset.kubernetes.io/is-parent-type: true to the CustomResourceDefinition (CRD) that defines it. * kubectl now uses HorizontalPodAutoscaler v2 by default. Documentation * kubectl create rolebinding -h Bug or Regression * Added (dry run) and (server dry run) suffixes to kubectl scale command when dry-run is passed * Changed the error message of kubectl rollout restart when subsequent kubectl rollout restart commands are executed within a second * Changed the error message to cannot exec into multiple objects at a time when file passed to kubectl exec contains multiple resources * Kubectl: enabled usage of label selector for filtering out resources when pruning for kubectl diff * kubectl port-forward now exits with exit code 1 when remote connection is lost
@annasong20 this iterates on your great work in #4932. I find this code difficult to wrap my head around without diving in and working with it (hopefully we're now fixing that step by step!). Feel free to review this PR or to cherry-pick my commit onto your branch (in which case we can continue on #4932), whatever is easier for you. I have more ideas for RepoSpec, but I'll make follow-up PRs for them. This attempts to keep the same scope as your PR (additional test cases aside).
Functional changes:
git::
prefix. This prefix originally signalled (to go-getter) that the git protocol should be used rather than whatever would be used by default based on the rest of the URL. However, Kustomize has been stripping it unconditionally since v2.0. So Kustomize has long since not being doing what the user intended.gh:
prefix. We were unable to find any basis for this (it is not part of the protocol, nor any other standard), nor any usage at all in public Kustomization files. We believe it may have been targeting someone's custom gitconfig shorthand syntax.Part of #4866
Fixes #4847
/cc @annasong20 @natasha41575