Unable to clone gitlab private GitRepository with libgit2 #433

Closed
Tracked by #2593
jjlakis opened this issue Aug 13, 2021 · 24 comments

@jjlakis

jjlakis commented Aug 13, 2021

Hello.
I have a GitRepository pointing to a private repo, with private keys (with write permissions) included:

apiVersion: source.toolkit.fluxcd.io/v1beta1
kind: GitRepository
metadata:
  name: cnm-dep
spec:
  interval: 24h
  url: ssh://git@my.gitlab/jj/cnm-dep.git
  secretRef:
    name: gitlab-key
  ref:
    branch: master

This works as expected: it reconciles itself, and the corresponding Kustomization resource updates without problems. However, the image-automation-controller is unable to clone this repository when the ImagePolicy is met. The ImageUpdateAutomation resource is the following:

apiVersion: image.toolkit.fluxcd.io/v1beta1
kind: ImageUpdateAutomation
metadata:
  name: cnm-dep
spec:
  interval: 30s
  sourceRef:
    kind: GitRepository
    name: cnm-dep
  git:
    checkout:
      ref:
        branch: master
    commit:
      author:
        email: ml3k@interia.pl
        name: fluxcd
      messageTemplate: |
        (FluxCD) Update images
        {{range .Updated.Images}}
        - {{ . }}
        {{end}}
    push:
      branch: master
  update:
    path: ./
    strategy: Setters

Logs from image-automation-controller:

{"level":"error","ts":"2021-08-13T12:49:40.933Z","logger":"controller-runtime.manager.controller.imageupdateautomation","msg":"Reconciler error","reconciler group":"image.toolkit.fluxcd.io","reconciler kind":"ImageUpdateAutomation","name":"cnm-dep","namespace":"cnm","error":"unable to clone 'ssh://git@my.gitlab/jj/cnm-dep.git', error: Certificate"}

An identical configuration works as expected for private GitHub repositories in the same cluster. The version I run is ghcr.io/fluxcd/image-automation-controller:v0.14.0.

Thank you in advance.

@jjlakis changed the title from "ImageUpdateAutomation is unable to clone private GitRepository" to "ImageUpdateAutomation is unable to clone gitlab private GitRepository" on Aug 13, 2021
@squaremo
Member

This is pretty baffling -- image-automation-controller uses substantially the same code as source-controller for cloning the repo. The only difference I can see is that source-controller assigns auth to a local:

	// determine auth method
	auth := &git.Auth{}
	if repository.Spec.SecretRef != nil {
		authStrategy, err := strategy.AuthSecretStrategyForURL(
			repository.Spec.URL,
			git.CheckoutOptions{
				GitImplementation: repository.Spec.GitImplementation,
				RecurseSubmodules: repository.Spec.RecurseSubmodules,
			})
		if err != nil {
			return sourcev1.GitRepositoryNotReady(repository, sourcev1.AuthenticationFailedReason, err.Error()), err
		}

		name := types.NamespacedName{
			Namespace: repository.GetNamespace(),
			Name:      repository.Spec.SecretRef.Name,
		}

		var secret corev1.Secret
		err = r.Client.Get(ctx, name, &secret)
		if err != nil {
			err = fmt.Errorf("auth secret error: %w", err)
			return sourcev1.GitRepositoryNotReady(repository, sourcev1.AuthenticationFailedReason, err.Error()), err
		}

		auth, err = authStrategy.Method(secret)
		if err != nil {
			err = fmt.Errorf("auth error: %w", err)
			return sourcev1.GitRepositoryNotReady(repository, sourcev1.AuthenticationFailedReason, err.Error()), err
		}
	}

	checkoutStrategy, err := strategy.CheckoutStrategyForRef(
		repository.Spec.Reference,
		git.CheckoutOptions{
			GitImplementation: repository.Spec.GitImplementation,
			RecurseSubmodules: repository.Spec.RecurseSubmodules,
		},
	)
	if err != nil {
		return sourcev1.GitRepositoryNotReady(repository, sourcev1.GitOperationFailedReason, err.Error()), err
	}

	gitCtx, cancel := context.WithTimeout(ctx, repository.Spec.Timeout.Duration)
	defer cancel()

	commit, revision, err := checkoutStrategy.Checkout(gitCtx, tmpGit, repository.Spec.URL, auth)
	if err != nil {
		return sourcev1.GitRepositoryNotReady(repository, sourcev1.GitOperationFailedReason, err.Error()), err
	}

while image-automation-controller assigns into a struct, then uses the value from the struct:

type repoAccess struct {
	auth *git.Auth
	url  string
}

func (r *ImageUpdateAutomationReconciler) getRepoAccess(ctx context.Context, repository *sourcev1.GitRepository) (repoAccess, error) {
	var access repoAccess
	access.auth = &git.Auth{}
	access.url = repository.Spec.URL

	authStrat, err := gitstrat.AuthSecretStrategyForURL(access.url, git.CheckoutOptions{GitImplementation: sourcev1.LibGit2Implementation})
	if err != nil {
		return access, err
	}

	if repository.Spec.SecretRef != nil && authStrat != nil {

		name := types.NamespacedName{
			Namespace: repository.GetNamespace(),
			Name:      repository.Spec.SecretRef.Name,
		}

		var secret corev1.Secret
		err = r.Client.Get(ctx, name, &secret)
		if err != nil {
			err = fmt.Errorf("auth secret error: %w", err)
			return access, err
		}

		access.auth, err = authStrat.Method(secret)
		if err != nil {
			err = fmt.Errorf("auth error: %w", err)
			return access, err
		}
	}
	return access, nil
}

func (r repoAccess) remoteCallbacks() libgit2.RemoteCallbacks {
	return libgit2.RemoteCallbacks{
		CertificateCheckCallback: r.auth.CertCallback,
		CredentialsCallback:      r.auth.CredCallback,
	}
}

// cloneInto clones the upstream repository at the `ref` given (which
// can be `nil`). It returns a `*gogit.Repository` since that is used
// for committing changes.
func cloneInto(ctx context.Context, access repoAccess, ref *sourcev1.GitRepositoryRef, path string) (*gogit.Repository, error) {
	checkoutStrat, err := gitstrat.CheckoutStrategyForRef(ref, git.CheckoutOptions{GitImplementation: sourcev1.LibGit2Implementation})
	if err == nil {
		_, _, err = checkoutStrat.Checkout(ctx, path, access.url, access.auth)
	}
	if err != nil {
		return nil, err
	}

	return gogit.PlainOpen(path)
}

@squaremo
Member

The only difference [between image-automation-controller and source-controller, when cloning]

Well that's barring the difference that image-automation-controller always uses libgit2, rather than the default go-git. @jjlakis If you set the GitRepository object's .spec.gitImplementation field to libgit2, does source-controller still succeed?

@jjlakis
Author

jjlakis commented Aug 21, 2021

@squaremo Good point, source-controller indeed fails with gitImplementation set to libgit2, same error.

{"level":"error","ts":"2021-08-21T16:27:37.925Z","logger":"controller.gitrepository","msg":"Reconciler error","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"GitRepository","name":"cnm-dep","namespace":"cnm","error":"unable to clone 'ssh://git@gl.lakis.io/jj/cnm-dep.git', error: Certificate"}

Should I consider the Git server to be a problem then maybe? Ideally image automation controller uses the git implementation from GitRepository resource. Thanks.

@squaremo
Member

Ideally image automation controller uses the git implementation from GitRepository resource.

I switched to using libgit2 regardless of the GitRepository object .spec.gitImplementation in fluxcd/image-automation-controller#177. This was because the go-git code in source-controller/pkg unconditionally does a shallow clone, and this caused problems for the branching (see fluxcd/image-automation-controller#164 (comment) and the comments following). Another way to fix it would be to alter the code in source-controller/pkg so that it can optionally not do a shallow clone when using go-git. But there would still be reasons to specify libgit2 as the implementation, so it's still useful to get to the bottom of this issue.

@squaremo
Member

I'm going to move this to source-controller, since it fails there too (and image-automation-controller uses that code).

@squaremo squaremo transferred this issue from fluxcd/image-automation-controller Aug 24, 2021
@squaremo
Member

A bit of info that might help: @jjlakis Can you tell us what data fields are in the secret you use for the GitRepository? (not the contents obviously! just which fields have values, in other words, the keys)

@squaremo squaremo self-assigned this Aug 24, 2021
@squaremo changed the title from "ImageUpdateAutomation is unable to clone gitlab private GitRepository" to "Unable to clone gitlab private GitRepository with libgit2" on Aug 24, 2021
@jjlakis
Author

jjlakis commented Aug 24, 2021

@squaremo Here's the gitlab-key secret I use with gitRepository:

data:
  identity: private_key
  identity.pub: public_key
  known_hosts: known_hosts

I just realized that I didn't try HTTP user/token access to GitLab. Not sure if this would make any difference, though.

@squaremo
Member

In https://github.com/fluxcd/source-controller/blob/main/pkg/git/libgit2/transport.go#L173, error: Certificate is returned when it can't find a match for the host in known_hosts. Is it possible that there's no entry for the private gitlab host in known_hosts, @jjlakis? (Or if there is, maybe it isn't being recognised by the matching code; if you post what you think is a match, I can confirm that ..)
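To check by hand whether a known_hosts file contains an entry for a given host, something along these lines may help. It is a minimal sketch, not the matching code in source-controller's transport.go: the function names `hostPattern` and `hasPlainEntry` are made up for illustration, and it only handles plain (unhashed) host names, including the `[host]:port` form used for non-default ports. Wildcards, negation, marker lines and hashed entries are ignored.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// hostPattern returns the form a host takes in the first field of a
// known_hosts line: "host" for port 22, "[host]:port" otherwise.
func hostPattern(host, port string) string {
	if port == "" || port == "22" {
		return host
	}
	return "[" + host + "]:" + port
}

// hasPlainEntry reports whether knownHosts has an unhashed entry that lists
// exactly this host and port. (Hypothetical helper, for illustration only.)
func hasPlainEntry(knownHosts, host, port string) bool {
	want := hostPattern(host, port)
	sc := bufio.NewScanner(strings.NewReader(knownHosts))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue // blank line or comment
		}
		fields := strings.Fields(line)
		if len(fields) < 3 || strings.HasPrefix(fields[0], "@") {
			continue // not a plain "hosts keytype key" line
		}
		// The host field may hold several comma-separated names.
		for _, h := range strings.Split(fields[0], ",") {
			if h == want {
				return true
			}
		}
	}
	return false
}

func main() {
	// Placeholder key material; real entries carry the full base64 key.
	kh := "# example\n" +
		"my.gitlab ssh-rsa AAAA...\n" +
		"[source.developers.google.com]:2022 ecdsa-sha2-nistp256 AAAA...\n"

	fmt.Println("my.gitlab:22 present:", hasPlainEntry(kh, "my.gitlab", "22"))
	fmt.Println("google:22 present:", hasPlainEntry(kh, "source.developers.google.com", "22"))
	fmt.Println("google:2022 present:", hasPlainEntry(kh, "source.developers.google.com", "2022"))
}
```

Note that a host reached on a non-default port only matches the bracketed form, so an entry scanned without the right port would not be found.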

@hiddeco
Member

hiddeco commented Oct 9, 2021

@jjlakis if your known_hosts has an ECDSA* entry, the solution is to replace this with an entry of another type, as the current version of libgit2 we make use of does not have support for ECDSA* types. Support for this will be added once we have a working libgit2 version >=1.2.0.

For other folks looking for authentication-related issues: the latest release of the image-automation-controller (v0.15.0) contains libgit2 linked against OpenSSL and LibSSH2, which, based on my research and extensive testing, should solve most issues around private key formats (not host keys).
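A quick way to see whether a known_hosts file is affected by the ECDSA limitation described above is to list the entries whose key type starts with `ecdsa-`. The following is only a sketch under that assumption; `ecdsaHosts` is a hypothetical helper, not part of any Flux controller.

```go
package main

import (
	"fmt"
	"strings"
)

// ecdsaHosts returns the host field of every known_hosts entry whose key
// type begins with "ecdsa-" (for example ecdsa-sha2-nistp256).
func ecdsaHosts(knownHosts string) []string {
	var hosts []string
	for _, line := range strings.Split(knownHosts, "\n") {
		line = strings.TrimSpace(line)
		if line == "" || strings.HasPrefix(line, "#") {
			continue // blank line or comment
		}
		fields := strings.Fields(line)
		// Drop an optional marker such as @cert-authority or @revoked.
		if strings.HasPrefix(fields[0], "@") {
			fields = fields[1:]
		}
		if len(fields) < 3 {
			continue // expected "hosts keytype key"
		}
		if strings.HasPrefix(fields[1], "ecdsa-") {
			hosts = append(hosts, fields[0])
		}
	}
	return hosts
}

func main() {
	// Placeholder key material; real entries carry the full base64 key.
	kh := "my.gitlab ssh-rsa AAAA...\n" +
		"my.gitlab ecdsa-sha2-nistp256 AAAA...\n" +
		"my.gitlab ssh-ed25519 AAAA...\n"

	for _, h := range ecdsaHosts(kh) {
		fmt.Println("ECDSA entry for host field:", h)
	}
}
```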

@ilya-git

ilya-git commented Oct 14, 2021

I have installed latest flux with v0.15.0 image-automation-controller and get this error: unable to clone 'ssh://bitbucket.org/<myrepo>', error: callback returned unsupported credentials type. This is my known_hosts entry:

bitbucket.org ssh-rsa AAAA...

Is there any workaround? I have upgraded to the latest flux and image automation stopped working after this...

Switching GitRepository to libgit2 makes it throw the same error

@hiddeco
Member

hiddeco commented Oct 14, 2021

@ilya-git can you provide a pseudo example of your GitRepository, and more information about the private key type you are making use of?

@ilya-git

ilya-git commented Oct 14, 2021

Hi, thanks for the fast reply. I have actually just figured out what was wrong: I was using a "wrong" format that was nonetheless accepted by the previous library:

ssh://bitbucket.org/my-repo

After I upgraded, the image automation controller uses the new libgit2 library, which apparently does not support this format, so I have fixed it to:

ssh://git@bitbucket.org/my-repo

I don't know if the first format is even correct, but since it was supported it is probably worth noting in the changelog as a breaking change (unless it was totally wrong)?

Is it a good idea to implement validation of the url maybe to throw an error if the format is wrong?

P.S. The documentation actually provides an example of the correct format quite explicitly. I don't know how the other one slipped in in the first place and, ironically enough, it was working "fine".

@squaremo
Member

Is it a good idea to implement validation of the url maybe to throw an error if the format is wrong?

The user part of the URL is technically optional -- so you were getting the username wrong, rather than the format wrong. Possibly newer versions of libgit2 consider leaving out the username as a probable mistake, if they warn about it.
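Since the user part is optional as far as URL syntax goes, a missing user has to be detected explicitly. Here is a minimal sketch using Go's standard `net/url` package; `sshUser` is a hypothetical helper, and this is not a validation that any Flux controller is known to perform.

```go
package main

import (
	"fmt"
	"net/url"
)

// sshUser parses an ssh:// URL and returns its user part.
// ok is false when the URL carries no user at all.
func sshUser(raw string) (user string, ok bool, err error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", false, err
	}
	if u.Scheme != "ssh" {
		return "", false, fmt.Errorf("not an ssh URL: %q", raw)
	}
	if u.User == nil {
		return "", false, nil
	}
	return u.User.Username(), true, nil
}

func main() {
	for _, raw := range []string{
		"ssh://bitbucket.org/my-repo",     // parses fine, but has no user
		"ssh://git@bitbucket.org/my-repo", // user is "git"
	} {
		user, ok, err := sshUser(raw)
		switch {
		case err != nil:
			fmt.Println("error:", err)
		case !ok:
			fmt.Printf("%s: no user in URL\n", raw)
		default:
			fmt.Printf("%s: user %q\n", raw, user)
		}
	}
}
```

Both URLs parse without error, which matches the point that the first form is a missing username rather than a malformed URL.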

@ilya-git

If the user part is technically optional, then I assume this is an unintended breaking change, so might be a good idea to update the changelog with it.

@ChrisJBurns

I get this too, but I'm using an HTTPS clone link instead of an SSH one.

@flelain

flelain commented Dec 31, 2021

Hello,

I came across an issue very similar to that described here. Some details about my setup and observations:

  • I'm using a private repository (GCP), that I access with SSH authentication
  • works fine with source / GitRepository, using a secret carved out of SSH public/private keys and known_hosts
  • fails for image update automation, which tells me it's unable to clone the repo because of a malformed URL.

As suggested above, I set .spec.gitImplementation to libgit2 on the source and observed that repo cloning was failing too, with the very same error.

I tried to work around this by changing the ECDSA key that Google provides for the host source.developers.google.com:2022, but it looks like ECDSA is the only public key type Google exposes (according to ssh-keyscan).

So, reaching out to you guys here to have your opinion or update about this issue :)

Thanks for the support you're offering us all here!

@nniehoff

I ran into this same issue today after generating a new secret. Originally I created the known_hosts with ssh-keyscan -H github.com, which caused this error; replacing the known_hosts with the output of ssh-keyscan github.com seems to work. So for me, it seems hashing the hostnames in known_hosts caused this issue.
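For context on why hashed entries behave differently: with -H, OpenSSH replaces the host name with `|1|<base64 salt>|<base64 HMAC-SHA1(salt, host)>`, so a matcher that compares the host field as a literal string will never find the host. The sketch below reproduces that format with the Go standard library; `hashHost` and `matchesHashed` are hypothetical helpers, and nothing here is taken from the source-controller code.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha1"
	"encoding/base64"
	"fmt"
	"strings"
)

// hashHost builds the hashed host field for host using the given salt,
// in the "|1|salt|hash" layout that ssh-keyscan -H writes.
func hashHost(host string, salt []byte) string {
	mac := hmac.New(sha1.New, salt)
	mac.Write([]byte(host))
	enc := base64.StdEncoding
	return "|1|" + enc.EncodeToString(salt) + "|" + enc.EncodeToString(mac.Sum(nil))
}

// matchesHashed reports whether a hashed host field refers to host.
func matchesHashed(field, host string) bool {
	// "|1|salt|hash" splits into ["", "1", salt, hash].
	parts := strings.Split(field, "|")
	if len(parts) != 4 || parts[0] != "" || parts[1] != "1" {
		return false // not a hashed entry
	}
	salt, err := base64.StdEncoding.DecodeString(parts[2])
	if err != nil {
		return false
	}
	want, err := base64.StdEncoding.DecodeString(parts[3])
	if err != nil {
		return false
	}
	mac := hmac.New(sha1.New, salt)
	mac.Write([]byte(host))
	return hmac.Equal(mac.Sum(nil), want)
}

func main() {
	// Fixed salt for a reproducible example; OpenSSH uses a random one.
	field := hashHost("github.com", []byte("0123456789abcdefghij"))

	fmt.Println("hashed field:", field)
	fmt.Println("literal comparison matches:", field == "github.com")
	fmt.Println("hash comparison matches:", matchesHashed(field, "github.com"))
}
```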

@flelain

flelain commented Jan 12, 2022

Hi there,

Thanks for your feedback @nniehoff . On my end, I'm not hashing the hostnames but still, it's not working.

See the ssh-keyscan and output below:

$ ssh-keyscan -p 2022 source.developers.google.com | tee known_hosts
# source.developers.google.com:2022 SSH-2.0-Go
# source.developers.google.com:2022 SSH-2.0-Go
# source.developers.google.com:2022 SSH-2.0-Go
[source.developers.google.com]:2022 ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBB5Iy4/cq/gt/fPqe3uyMy4jwv1Alc94yVPxmnwNhBzJqEV5gRPiRk5u4/JJMbbu9QUVAguBABxL7sBZa5PH/xY=

@pjbgf
Member

pjbgf commented Mar 22, 2022

We have recently upgraded our libgit2 implementation to 1.3.0, which should resolve this issue. Can you please try again with source-controller version v0.22.0 and let us know how you get on?

This version also includes an experimental libgit2 transport. If the vanilla installation does not work for you, would you mind enabling the experimental transport and trying again, please? Instructions on how to enable it are at the link above.

@pjbgf pjbgf added the bug Something isn't working label Mar 22, 2022
@pjbgf
Member

pjbgf commented May 27, 2022

@flelain can you please try again with the RC we have just released:
ghcr.io/fluxcd/source-controller:rc-4b3e0f9a

I have recently tested this version against GCP and it worked fine, so would be keen to understand whether that fixes your issue.

@pjbgf pjbgf added this to the GA milestone May 27, 2022
@pjbgf pjbgf self-assigned this May 27, 2022
@flelain

flelain commented Jun 2, 2022

Hello @pjbgf ,

Thanks for your feedback. I tested it out once again and it's still failing the same way (I used ghcr.io/fluxcd/source-controller:rc-4b3e0f9a for source-controller manager container).

But unless I'm totally mistaken, I doubt the issue is due to source-controller. Wouldn't it rather come from the image-automation-controller? Does it make sense? What I observe is that the source-controller handles my app repo very well with my ssh creds. On the contrary, image update automation still fails with this error (the exact same repo source-controller uses without any issue):

unable to clone 'ssh://florent.lelain@orange.com@source.developers.google.com:2022/p/cto-gcp-cloud-ops/r/gcp-cloud-ops-app': malformed URL 'ssh://florent.lelain@orange.com@source.developers.google.com:2022/p/cto-gcp-cloud-ops/r/gcp-cloud-ops-app

We're in source-controller issues here though :-/ Should I go create an issue under image-automation-controller repo?

Thank you for your support!
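One visible property of the URL in that error is that its user part is an email address, so the authority section contains two `@` characters. The thread does not say whether this is what triggered the "malformed URL" message; the sketch below only shows how Go's standard `net/url` package splits such a URL (it takes the last `@` as the separator). `splitSSHURL` is a hypothetical helper for illustration.

```go
package main

import (
	"fmt"
	"net/url"
)

// splitSSHURL returns the user, host name and port that net/url
// extracts from a URL.
func splitSSHURL(raw string) (user, host, port string, err error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", "", "", err
	}
	if u.User != nil {
		user = u.User.Username()
	}
	return user, u.Hostname(), u.Port(), nil
}

func main() {
	raw := "ssh://florent.lelain@orange.com@source.developers.google.com:2022/p/cto-gcp-cloud-ops/r/gcp-cloud-ops-app"

	user, host, port, err := splitSSHURL(raw)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("user:", user)
	fmt.Println("host:", host)
	fmt.Println("port:", port)
}
```

A parser that splits at the first `@` instead would see a different host, which is one reason URLs of this shape can be handled inconsistently across libraries.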

@pjbgf
Member

pjbgf commented Jun 8, 2022

@flelain I managed to reproduce your issue and I believe this is now fixed when using the latest versions of both source-controller and image-automation-controller. Can you please try again with the images below and let us know how you get on?

ghcr.io/fluxcd/source-controller:v0.25.5
ghcr.io/fluxcd/image-automation-controller:v0.23.2

@pjbgf pjbgf moved this to In Progress in Maintainers' Focus Jun 9, 2022
@flelain

flelain commented Jun 13, 2022

@pjbgf Just tested it out with the last two versions you pointed out above. It's working fine now: flux manages to amend the k8s manifests I have on my Google Cloud source private repo - access via SSH.

Thank you for your follow-up!

@pjbgf
Member

pjbgf commented Jun 13, 2022

@flelain thank you for confirming. 🙇

@pjbgf pjbgf closed this as completed Jun 13, 2022
Repository owner moved this from In Progress to Done in Maintainers' Focus Jun 13, 2022