Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added GitAuthError #6493

Merged
merged 25 commits into from
Sep 13, 2021
Merged

Added GitAuthError #6493

merged 25 commits into from
Sep 13, 2021

Conversation

daavoo
Copy link
Contributor

@daavoo daavoo commented Aug 26, 2021

Thank you for the contribution - we'll try to review it as soon as possible. πŸ™

To match updates in docs. From iterative/dvc.org#2731 (review)

@daavoo daavoo requested a review from a team as a code owner August 26, 2021 08:06
@daavoo daavoo requested a review from karajan1001 August 26, 2021 08:06
@pmrowla
Copy link
Contributor

pmrowla commented Aug 26, 2021

It doesn't have to be an SSH URL in every case though.

The SSH requirement is specifically for pushing (writing) to a remote git server that requires authentication (meaning pushing to github/gitlab). exp pull and exp list will work with any git URL (including HTTP) as long as the git server is public. If you are hosting a personal git server somewhere that was configured to allow pushing over unauthenticated HTTP, you could use the HTTP git URL format (and would not be required to use SSH).

And yes, read operations that require authentication (pull/list for private github/gitlab repos) would require using SSH.

@daavoo
Copy link
Contributor Author

daavoo commented Aug 26, 2021

It doesn't have to be an SSH URL in every case though.

The SSH requirement is specifically for pushing (writing) to a remote git server that requires authentication (meaning pushing to github/gitlab). exp pull and exp list will work with any git URL (including HTTP) as long as the git server is public. If you are hosting a personal git server somewhere that was configured to allow pushing over unauthenticated HTTP, you could use the HTTP git URL format (and would not be required to use SSH).

And yes, read operations that require authentication (pull/list for private github/gitlab repos) would require using SSH.

Not sure if there is a way of synthesizing those cases for the CLI help or if we should just say any valid URL and leave the detailed info for the docs.

cc @jorgeorpinel

@pmrowla
Copy link
Contributor

pmrowla commented Aug 27, 2021

IMO the CLI help text should just use a valid Git URL, and a more detailed explanation about the SSH requirement when server auth is required belongs in the online docs.

@skshetry
Copy link
Member

@pmrowla, is it possible to catch the exception and provide the hint (maybe with the link to the proper documentation)?

@pmrowla
Copy link
Contributor

pmrowla commented Aug 27, 2021

@pmrowla, is it possible to catch the exception and provide the hint (maybe with the link to the proper documentation)?

Yes, in the dulwich backend we need to check for these exceptions when we do a GitClient operation
https://github.com/dulwich/dulwich/blob/5559440256dc231637896d97f1016f9052cc7dc3/dulwich/client.py#L2101

@daavoo
Copy link
Contributor Author

daavoo commented Aug 27, 2021

IMO the CLI help text should just use a valid Git URL, and a more detailed explanation about the SSH requirement when server auth is required belongs in the online docs.

So I just added a minor change here including the word valid. At the very least it implies that there some URLs could be invalid. The proper definition of what valid implies would be in the docs.

@pmrowla, is it possible to catch the exception and provide the hint (maybe with the link to the proper documentation)?

This would be a good improvement

@daavoo daavoo marked this pull request as ready for review August 27, 2021 10:29
@dberenbaum
Copy link
Collaborator

Agree that giving a hint seems much more useful. Seems like we can just add another exception handler for HTTPUnauthorized in

?

Even better would be to prompt for username and password and try again. Is there a reason we wouldn't want to do that?

@pmrowla
Copy link
Contributor

pmrowla commented Aug 30, 2021

Even better would be to prompt for username and password and try again. Is there a reason we wouldn't want to do that?

Using prompts won't work in automated contexts (like the CML auto-push feature) but it would be possible for us to support it. The main issue will just be implementing proper support for git HTTP auth (see dulwich tests for examples: https://github.com/dulwich/dulwich/blob/76dd8356f9d8870f44226e4232c24fbc1e573e93/dulwich/tests/test_client.py#L565)

But it should be noted that even if we add the prompts and username/password HTTP support for git operations, it won't work for pushing to github (https://github.blog/2020-12-15-token-authentication-requirements-for-git-operations/)

@dberenbaum
Copy link
Collaborator

Using prompts won't work in automated contexts (like the CML auto-push feature) but it would be possible for us to support it.

Right, if we make it easier in non-automated scenarios, there could be confusion if people try to migrate to something like CML. Hopefully they would realize any manual steps would need to be automated? Is it better to give flexibility or require the preferred workflow? It seems like generally we try to let users make these decisions themselves. On the other hand, we don't prompt for remote credentials, but that may be because it would get too complex to handle.

The main issue will just be implementing proper support for git HTTP auth (see dulwich tests for examples: https://github.com/dulwich/dulwich/blob/76dd8356f9d8870f44226e4232c24fbc1e573e93/dulwich/tests/test_client.py#L565)

Can we do this:

  1. Catch the HTTPUnauthorized exception
  2. Prompt for the username and password
  3. Recreate the git client with those keyword args
  4. Retry
  5. Throw if there is still an exception

Does dvc need to worry about the details of http auth beyond asking for username and password?

But it should be noted that even if we add the prompts and username/password HTTP support for git operations, it won't work for pushing to github (https://github.blog/2020-12-15-token-authentication-requirements-for-git-operations/)

From a quick test, it looks like users should be able to push to github with http if they replace their passwords with their tokens.

@daavoo
Copy link
Contributor Author

daavoo commented Sep 6, 2021

Can we do this:

1. Catch the `HTTPUnauthorized` exception

2. Prompt for the username and password

3. Recreate the git client with those keyword args

4. Retry

5. Throw if there is still an exception

Does dvc need to worry about the details of http auth beyond asking for username and password?

This would be prompted on each command run (i.e. dvc exp push), right? I feel like that could be somehow annoying.

Not sure if that would be a good user experience or if it would be more helpful to just raise an exception with a link to the docs.

@daavoo daavoo marked this pull request as draft September 6, 2021 18:08
@daavoo daavoo changed the title Specify SSH protocol for git_remote Added GitRemoteAuthError Sep 6, 2021
@dberenbaum
Copy link
Collaborator

dberenbaum commented Sep 7, 2021

This would be prompted on each command run (i.e. dvc exp push), right? I feel like that could be somehow annoying.

Not sure if that would be a good user experience or if it would be more helpful to just raise an exception with a link to the docs.

It could be annoying, but is it any more annoying than doing git push via HTTP? SSH seems like the way to go, but I worry that it will cause trouble for some users (for example, if SSH port is blocked).

We could start with the warning and keep the HTTP authorization open as a separate issue to see if users do encounter it (or wait to see if someone opens that issue themselves).

Edit: I meant we can start with raising an exception for unauthorized http for now.

@skshetry
Copy link
Member

skshetry commented Sep 7, 2021

For autopush, we may need some kind of memoization like the way we do for http(s) filesystems.

dvc/dvc/fs/http.py

Lines 12 to 18 in 6b582bd

@wrap_with(threading.Lock())
@memoize
def ask_password(host, user):
return prompt.password(
"Enter a password for "
"host '{host}' user '{user}'".format(host=host, user=user)
)

@dberenbaum
Copy link
Collaborator

Not sure if there is a way of synthesizing those cases for the CLI help or if we should just say any valid URL and leave the detailed info for the docs.

Is "any valid URL" correct? It seems more like it's "any supported URL" since I think the HTTP URL is also valid according to Git. I also think we could probably summarize behavior as not supporting HTTP authentication. Cc @jorgeorpinel

Copy link
Contributor

@jorgeorpinel jorgeorpinel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

πŸ’…

See also iterative/dvc.org#2731

dvc/command/experiments.py Outdated Show resolved Hide resolved
dvc/command/experiments.py Outdated Show resolved Hide resolved
dvc/command/experiments.py Outdated Show resolved Hide resolved
dvc/scm/base.py Outdated Show resolved Hide resolved
@jorgeorpinel
Copy link
Contributor

Is "any valid URL" correct? It seems more like it's "any supported URL"
summarize behavior as not supporting HTTP authentication

Good points @dberenbaum. I updated my suggestions ☝️

Co-authored-by: Jorge Orpinel <jorgeorpinel@users.noreply.github.com>
@daavoo
Copy link
Contributor Author

daavoo commented Sep 8, 2021

Need to add tests catching the exception

@daavoo daavoo changed the title Added GitRemoteAuthError Added GitAuthError Sep 9, 2021
@dberenbaum
Copy link
Collaborator

Pending discussion in iterative/dvc.org#2731 (comment).

@karajan1001
Copy link
Contributor

So when is it ready? Still a draft one?

@daavoo
Copy link
Contributor Author

daavoo commented Sep 13, 2021

So when is it ready? Still a draft one?

I guess that we can now consider this ready (docs PR was also approved iterative/dvc.org#2731 (comment)) and have #6586 as a follow-up.

@pmrowla pmrowla merged commit 5c8a642 into master Sep 13, 2021
@pmrowla pmrowla deleted the ssh-git-url branch September 13, 2021 08:56
@efiop efiop added the enhancement Enhances DVC label Sep 13, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Enhances DVC
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants