webhdfs: migrate to fsspec #6662

Merged: 2 commits, Oct 7, 2021

isidentical

Resolves #6583, awaiting an fsspec release.

isidentical added the "fs: webhdfs" (Related to the WebHDFS filesystem) label Sep 21, 2021
isidentical requested a review from efiop September 21, 2021 13:32
Comment on lines 41 to 43
try:
    return hdfs.config.Config(self.hdfscli_config).get_client(
        self.alias
    )
except hdfs.util.HdfsError as exc:
    exc_msg = str(exc)
    errors = (
        "No alias specified",
        "Invalid configuration file",
        f"Alias {self.alias} not found",
    )
    if not any(err in exc_msg for err in errors):
        raise

http_url = f"http://{self.host}:{self.port}"
logger.debug("URL: %s", http_url)

if self.token is not None:
    client = hdfs.TokenClient(http_url, token=self.token, root="/")
else:
    client = hdfs.InsecureClient(
        http_url, user=self.user, root="/"
    config = Config(path)
except HdfsError:
    return None

For the record: we discussed that we might just drop the hdfscli config, since we no longer use hdfscli as a backend.

@isidentical Wdyt, removing it or not?

Sure, we can.
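
For illustration, here is a minimal sketch of what connection setup could look like without the hdfscli config lookup: the arguments come straight from the remote's own settings, keeping the token-or-user branch from the removed code. The helper name `webhdfs_fs_args` and the default port are hypothetical and not taken from this PR; the keys are assumed to line up with the `host`, `port`, `user`, `token`, and `use_https` parameters of fsspec's WebHDFS implementation.

```python
from typing import Any, Dict, Optional


def webhdfs_fs_args(
    host: str,
    port: int = 50070,  # assumed default, not taken from the PR
    user: Optional[str] = None,
    token: Optional[str] = None,
    use_https: bool = False,
) -> Dict[str, Any]:
    """Build keyword arguments for an fsspec-style WebHDFS filesystem.

    Hypothetical helper: no hdfscli config file or alias is consulted.
    """
    args: Dict[str, Any] = {"host": host, "port": port, "use_https": use_https}
    if token is not None:
        # A delegation token already identifies the user, so the user
        # name is left out when a token is given.
        args["token"] = token
    elif user is not None:
        args["user"] = user
    return args


if __name__ == "__main__":
    print(webhdfs_fs_args("namenode.example.com", user="dvc"))
    # With fsspec installed, the result could be passed on as:
    #   fsspec.filesystem("webhdfs", **webhdfs_fs_args(...))
```

Keeping this as a plain function that returns a dict means the branch can be checked without a running HDFS server.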

dvc/fs/webhdfs.py (outdated, resolved)

efiop commented Oct 6, 2021

@Mergifyio rebase

mergify bot commented Oct 6, 2021

Command rebase: success

Branch has been successfully rebased

isidentical marked this pull request as ready for review October 7, 2021 08:44
isidentical requested a review from a team as a code owner October 7, 2021 08:44
isidentical (Contributor Author) commented

It seems like Docker is randomly failing on some runs with timeouts. I've increased the timeout from 30 to 60 seconds; hopefully that will solve it. (In the past we mocked hdfs-cli's InsecureClient, so the tests never used a real HDFS server.)
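
As a generic sketch of the kind of wait this timeout governs (not the actual test fixture in this PR), a readiness check can be polled until a deadline passes. The `wait_until` helper and its parameters are hypothetical names chosen for illustration; the 60-second default simply mirrors the new value mentioned above.

```python
import time
from typing import Callable


def wait_until(
    check: Callable[[], bool],
    timeout: float = 60.0,
    interval: float = 1.0,
) -> bool:
    """Poll `check` until it returns True or `timeout` seconds elapse.

    Connection-level errors (OSError and subclasses) are treated as
    "not ready yet" rather than as failures.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            if check():
                return True
        except OSError:
            pass  # service is still starting up
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A larger timeout only changes how long the loop keeps retrying; a container that is already up still returns on the first successful check.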

dvc/fs/webhdfs.py (outdated, resolved)
efiop merged commit 4f3b28e into iterative:master Oct 7, 2021
skshetry added a commit to endremborza/dvc that referenced this pull request Oct 8, 2021
* master: (47 commits)
  make dependabot updates work (iterative#6760)
  move defaults to the class level (iterative#6762)
  build(deps): Bump dvclive from 0.3.0 to 0.4.0 (iterative#6753)
  move requirements to setup.cfg (iterative#6758)
  gha: cancel previous workflows (iterative#6756)
  pyinstaller: hooks: remove webhdfs
  webhdfs: migrate to fsspec (iterative#6662)
  gha: benchmarks: fix pytest flags (iterative#6749)
  hdfs: migrate to fsspec (iterative#6604)
  Pin dvclive to 0.3.0
  command: Should not consume 'print' to output default remote info (iterative#6650)
  command: output prettify json in tty (iterative#6743)
  gha: run benchmarks on PRs (iterative#6733)
  Run on Python3.10 (iterative#6745)
  Fix dvclive running subdir (iterative#6740)
  exp init: only ask for that are not provided in an interactive mode (iterative#6739)
  move exp init logic to dvc.repo.experiments.init (iterative#6738)
  build(deps): Bump jaraco-windows from 5.6.0 to 5.7.0 (iterative#6735)
  build(deps): Bump pytest-cov from 2.12.1 to 3.0.0 (iterative#6736)
  Bump typing_extensions to >=3.7.4 (iterative#6731) (iterative#6732)
  ...
efiop added the "enhancement" (Enhances DVC) label Oct 11, 2021
Labels: enhancement (Enhances DVC), fs: webhdfs (Related to the WebHDFS filesystem)