Use ssh to clone repos does not work #16

Closed
H4dr1en opened this issue May 11, 2020 · 9 comments

@H4dr1en
Contributor

H4dr1en commented May 11, 2020

Context

If I don't specify git user and git pass in the config, it should automatically use SSH.

Problem

  • It is not clear which ssh key is used
  • When running the following commands during startup in trains-agent machine:
eval "$(ssh-agent -s)"
ssh-add /home/h4dr1en/.ssh/id_rsa

I can see in the logs that the key was added successfully (at startup the user is root):

May 11 10:01:00 GCEMetadataScripts[711]: 2020/05/11 10:01:00 GCEMetadataScripts: startup-script: Agent pid 1373
May 11 10:01:00 GCEMetadataScripts[711]: 2020/05/11 10:01:00 GCEMetadataScripts: startup-script: Identity added: /home/h4dr1en/.ssh/id_rsa (/home/h4dr1en/.ssh/id_rsa)

If I connect to the trains-agent, I can also test the connection to github:

$ ssh -T git@github.com
Warning: Permanently added the RSA host key for IP address '140.82.118.3' to the list of known hosts.
Hi H4dr1en! You've successfully authenticated, but GitHub does not provide shell access.

But for some reason trains-agent fails to clone the experiment's repo (from the logs):

cloning: https://github.com/h4dr1en/training-repo
fatal: could not read Username for 'https://github.com': terminal prompts disabled
Repository cloning failed: Command '['clone', 'https://github.com/h4dr1en/training-repo', '/root/.trains/vcs-cache/training-repo.pytorch.65c96545aef218c67580e8307b6d0267/training-repo', '--quiet', '--recursive']' returned non-zero exit status 128
trains_agent: ERROR: Failed cloning repository.
1) Make sure you pushed the requested commit:
(branch='master', tag='', repository='https://github.com/h4dr1en/training-repo', commit_id='', entry_point='src/cli.py', working_dir='.')
2) Check if remote-worker has valid credentials [see worker configuration file]

It looks like trains-agent still tries to clone over HTTPS and fails because I did not specify credentials in the trains.conf file.

Note: I could solve the problem by adding (from here):

git config --global --add url."git@github.com:".insteadOf "https://github.com/"

at startup, but I think this should be handled by trains-agent, right?

EDIT:

git config --system --add url."git@github.com:".insteadOf "https://github.com/"
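One way to confirm that such an insteadOf rule does what is intended is to ask git for the rewritten URL without contacting the remote. The following is only a sketch: `check_rewrite` is a hypothetical helper name, and it writes the rule into a throwaway HOME so the real git configuration is left untouched.

```shell
# Print the URL git would actually use once the insteadOf rule is in place.
# Runs in a subshell so the HOME override and the trap do not leak out.
check_rewrite() (
    tmp_home="$(mktemp -d)" || exit 1
    trap 'rm -rf "$tmp_home"' EXIT
    export HOME="$tmp_home"
    # Make sure --global resolves to the throwaway HOME and nothing else.
    unset XDG_CONFIG_HOME GIT_CONFIG_GLOBAL
    git config --global --add url."git@github.com:".insteadOf "https://github.com/"
    # --get-url expands the URL through insteadOf and exits without any network access.
    git ls-remote --get-url "$1"
)

# Usage:
#   check_rewrite "https://github.com/h4dr1en/training-repo"
```

If the rule is active, the printed URL starts with `git@github.com:` instead of `https://github.com/`. The same `git ls-remote --get-url` call can be run directly on the agent machine to check the real configuration.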
H4dr1en changed the title from "Use ssh to clone repos" to "Use ssh to clone repos does not work" on May 11, 2020
@bmartinn
Member

bmartinn commented May 11, 2020

Hi @H4dr1en ,
trains-agent will switch to https if it has git_user/git_pass defined in ~/trains.conf, example

It looks like you already made sure git_user/git_pass are empty, in which case, trains-agent counts on git to resolve the credentials.

What is the repository link you have in the experiment itself, is it starting with https:// or git@github.com ?

Note: I could solve the problem by adding (from here):
git config --global --add url."git@github.com:".insteadOf "https://github.com/"

Kudos for quickly sorting it out!

It's a good question if we want to have trains-agent run the command for us automatically.
I think that in general we should avoid this kind of thing, as it changes the entire system configuration without you knowing about it...
How about we add it to the error message ? So at least you know, you have to configure it.
Do you think that would have made your life easier ?

EDIT:
Notice that the git configuration needs to run once, not every startup. It will change the git configuration on the entire machine.

@H4dr1en
Contributor Author

H4dr1en commented May 11, 2020

What is the repository link you have in the experiment itself, is it starting with https:// or git@github.com ?

This is a link starting with https://

How about we add it to the error message ? So at least you know, you have to configure it.
Do you think that would have made your life easier ?

I think my confusion comes from the fact that the configuration file says:

# leave blank for GIT SSH credentials

So I assumed that if I left it blank, it would always use ssh with whatever ssh key was configured on the machine for any git action.

Maybe it would be clearer for users if there was a parameter named git_protocol that could be either HTTPS or SSH, coupled with a second parameter named git_ssh_key_path that would be the path to the local SSH key to use:

agent {

    # Either HTTPS or SSH
    git_protocol="SSH"

    # Only considered if git_protocol = "HTTPS"
    git_user=""
    git_pass=""

    # Only considered if git_protocol = "SSH"
    git_ssh_key_path=""

    ...

And then internally, if git_protocol="SSH", trains-agent would do for me:

eval "$(ssh-agent -s)"
ssh-add {git_ssh_key_path}
ssh -T git@github.com  # Make sure the key is registered in github
git config --add url."git@github.com:".insteadOf "https://github.com/"  # Local to experiment

Something like that would remove the confusion around this topic, WDYT?
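As a side note on the "which key is used" question, git can already be pinned to one specific key through the `GIT_SSH_COMMAND` environment variable, without going through ssh-agent. The sketch below only builds that value; `ssh_command_for_key` is a hypothetical helper, and `git_ssh_key_path` remains the proposed (not existing) setting from the config above.

```shell
# Build a GIT_SSH_COMMAND value that makes git use exactly one private key.
# Assumption: the key path contains no single quotes.
ssh_command_for_key() {
    # IdentitiesOnly=yes stops ssh from also offering keys held by an agent.
    printf "ssh -i '%s' -o IdentitiesOnly=yes\n" "$1"
}

# Usage (performs a real clone, so it is left commented out):
#   GIT_SSH_COMMAND="$(ssh_command_for_key /home/h4dr1en/.ssh/id_rsa)" \
#       git clone git@github.com:h4dr1en/training-repo
```

Because the variable is scoped to a single command, this approach does not change the machine-wide git configuration.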

EDIT:

EDIT:
Notice that the git configuration needs to run once, not every startup. It will change the git configuration on the entire machine.

In my case, I don't mind since trains-agent is running on an instance dedicated to trains-agent.

EDIT 2:
Typo in the solution I mentioned, it actually uses --system, not --global:

git config --system --add url."git@github.com:".insteadOf "https://github.com/"

@bmartinn
Member

bmartinn commented May 11, 2020

@H4dr1en I see now... the reason you had an issue in the first place is that trains detected the repository with https (which means that on the machine that ran the initial code, git was not configured with ssh), while the trains-agent machine was configured with an SSH key.

So what I would like is for trains-agent to automatically build a "git@" link if it has no git user/pass (or, as you suggested, if it is configured with git_protocol=SSH). My only fear is that if, for example, we also have public repositories where the links are https, changing them to "git@" will break git clone/pull.
I'll check it, if it passes, then I think adding git_protocol is a great idea :)

Regarding implementation, just replacing "https://github.com/" with "git@github.com:" should actually be enough. Adding the git domain to the SSH known hosts should be done by the user, as we do not want to add any arbitrary domain to a user's ssh certified machine list :)
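The prefix replacement described here can be sketched in a few lines of shell. This is an illustration of the idea only, not the actual trains-agent implementation, and `to_ssh_url` is a hypothetical name.

```shell
# Rewrite an https GitHub URL to its SSH form; leave every other URL untouched.
to_ssh_url() {
    url="$1"
    case "$url" in
        https://github.com/*)
            # Strip the https prefix and prepend the SSH one.
            printf 'git@github.com:%s\n' "${url#https://github.com/}"
            ;;
        *)
            printf '%s\n' "$url"
            ;;
    esac
}

# Usage:
#   to_ssh_url "https://github.com/h4dr1en/training-repo"
```

URLs that already use the `git@` form, or that point at another host, pass through unchanged, so the function is safe to apply to every repository link.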

@H4dr1en
Contributor Author

H4dr1en commented May 18, 2020

My only fear is that if for example we also have public repositories, where the links are https, changing them to "git@" will break git clone/pull.
I'll check it, if it passes, then I think adding git_protocol is a great idea :)

This is not a blocker: I did it on my machine and I can still pull any public repository.

clearml-bot pushed a commit that referenced this issue Jun 17, 2020
…:// (issue #16)

Add git user/pass credentials for pip git packages (git+http and  git+ssh) (issue #22)
@pedropalb

I am having trouble with a similar issue.
First, the base experiment had the remote address stored as https. As reported above, the agent tried to clone through the https address regardless of the empty git credentials.

Then I changed the remote to the ssh pattern. It works when I clone manually. But when I try to build for this base task through trains-agent, I get a message similar to the one reported above:

1) Make sure you pushed the requested commit:
(repository='git@<github_enterprise_repo>', branch='my_branch', commit_id='12345', tag='', entry_point='runner.py', working_dir='working_dir')
2) Check if remote-worker has valid credentials [see worker configuration file]

(I masked the real information but the important part is that the repo is git@...)
The commit is pushed for sure. The second warning does not apply since I'm using the ssh protocol.

How can I print a more detailed error message to check what is really going on? The --log-level DEBUG or ERROR options don't show any more information than the default.

Here is the command I am running:

trains-agent build --id 239fc698ec7643aa97d86e1d2bddebcc --log-level DEBUG

Thanks!

@bmartinn
Member

Hi @pedropalb
The agent.force_git_ssh_protocol option was added to the latest trains-agent RC, you can test it by following these two steps:

  • In your ~/trains.conf add/change "agent.force_git_ssh_protocol: true"
  • Upgrade trains-agent to the latest RC with pip install trains-agent==0.15.2rc0

It should solve the issue.

BTW:
If you want to manually execute a Task using the trains-agent, on any machine you can always do:
trains-agent execute --id my_task_id_here
Extra debug info can be added with:
trains-agent --debug execute --id my_task_id_here
And if you want to test inside a docker:
trains-agent execute --id my_task_id_here --docker default_docker_name

@212792736

Hi all,

I think I have a related problem as it is related to ssh and trains.
I can only use ssh to clone from the company git repository. However, that requires adding my public key to the ssh agent.

git@githubbuildcompany.com: Permission denied (publickey).

How can I add my credentials for the trains to be able to clone the repos? It might be written somewhere, but I was unable to find it myself

@bmartinn
Member

Hi @212792736
In the machine running the trains-agent make sure you have your ssh key (i.e. under ~/.ssh).
trains-agent will make sure the git clone uses that ssh key (even when running in docker mode).
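A quick way to see which private keys are actually present on the agent machine is sketched below. It assumes the default OpenSSH naming (`id_rsa`, `id_ed25519`, and so on), so keys stored under other names will not be listed; `list_private_keys` is a hypothetical helper name.

```shell
# List private-key candidates (id_* files without a .pub suffix) in a directory.
# Defaults to ~/.ssh when no directory is given.
list_private_keys() {
    dir="${1:-$HOME/.ssh}"
    for f in "$dir"/id_*; do
        [ -f "$f" ] || continue              # also skips an unmatched glob
        case "$f" in *.pub) continue ;; esac # public halves are not of interest
        printf '%s\n' "$f"
    done
}

# Usage:
#   list_private_keys            # inspects ~/.ssh
#   list_private_keys /root/.ssh # inspects another user's directory
```

If the list is empty for the user that runs trains-agent, a "Permission denied (publickey)" error is the expected result. Once a key is in place, `ssh -T git@<host>`, as used earlier in this thread, confirms that the server accepts it.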

@H4dr1en
Contributor Author

H4dr1en commented Oct 27, 2020

@212792736 I believe the issue you are facing is not related to the original issue I described. Could you please open a separate issue or get help on the trains Slack channel? I will close this issue because as of trains version 0.16 the original issue has been fixed 👍

@H4dr1en H4dr1en closed this as completed Oct 27, 2020