Skip to content

Conversation

@gudmundur-heimisson
Copy link

Updates to WebHDFS remote documentation to reflect changes in:
treeverse/dvc#6936

Also see dvc issue:
treeverse/dvc#6935

@gudmundur-heimisson
Copy link
Author

This has now been released with 2.8.3.

Copy link
Contributor

@shcheklein shcheklein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shcheklein shcheklein merged commit 234d11a into treeverse:master Nov 8, 2021
Comment on lines -866 to +868
- `user` - user name to access the remote, can be empty in case of using `token`
or if using a `HdfsCLI` cfg file. May only be used when Hadoop security is
off. Defaults to current user as determined by `whoami`.
Only provide the `user` parameter if you are not using `kerberos` or `token`
authentication, since those authentication methods already contain the user
information.
Copy link
Contributor

@jorgeorpinel jorgeorpinel Nov 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit confused. So there's no more user parameter? This removes that bullet completely.

I see it's removed from the explicit schema in https://github.com/iterative/dvc/pull/6936/files but it's still used on the test so it must be somewhere I think.

@gudmundur-heimisson @efiop thanks

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

p.s. for now I'm reinstating it in #3019.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the test it is read out of the URL string. It feels confusing to me to be able to provide the user in both the URL string and also as an explicit parameter so I just removed it as a parameter, but I don't feel that strongly about it.

As a side-note for WebHDFS if the cluster is secured you will never be using this parameter anyway because you must use kerberos or the token (unless you have a Cloudera cluster which can allow LDAP, but that is not part of the Apache standard).

If my understanding is correct, because of this line if we allow it as an explicit parameter as well it would override whatever is in the URL string, which we should probably document somewhere:

https://github.com/iterative/dvc/blob/e0fd0790fc77b19694aabef3c94b52e7cd0135f1/dvc/fs/__init__.py#L120

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Still confused though, so is the user param deprecated currently? @efiop

jorgeorpinel added a commit that referenced this pull request Nov 29, 2021
* ref: copy edit Kerberos info in `remote add` (WebHDFS)
rel. #3009

* ref: re-explain Kerberos info in `remote modify` (WebHDFS)

* typos & CE

* ref: separate HDFS and WebHDFS config notes
per #3019 (review)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants