Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Contrib] Added default non-verbose to download_testdata(), pass to download() #8533

Merged
merged 3 commits into from
Jul 29, 2021

Conversation

Lunderberg
Copy link
Contributor

Intent is to make this utility easier to use in one-off debugging scripts, where extra print statements may make it harder to track down bugs. Did a small bit of cleanup as well, while I was in the file.

  • Use tempfile.TemporaryDirectory instead of explicit cleanup.

  • Pass through verbose/retries arguments if replacing a corrupted copy.

Copy link
Contributor

@tkonolige tkonolige left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing great work @Lunderberg!

Copy link
Contributor

@leandron leandron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this PR, @Lunderberg. A few comments below.

verbose: int, optional
Verbose level
verbose: bool, optional
If status messages should be printed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
If status messages should be printed.
If status messages should be printed. Defaults to false.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it, and I would add it if not for the change/removal of verbose following your suggestion to use logging.

print("exist file got corrupted, downloading %s file freshly..." % path)
download(url, path, True, False)
if verbose:
print(f"Existing file {path} has incorrect size, downloading fresh copy")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the use of print() intentional? as opposed to using the use of logging module, similar to other parts of our code base?

One advantage of logging is that it can set various log levels, removing the need for the blocks like if verbose: print("message"), and instead, we could use verbose to set log level into e.g. DEBUG in this module.

What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No particular reason for me, other than it being the way the file was previously written. Looking through the history, there aren't any commits that suggest a need for print statements, so I agree that logging is the better route to go.

With logging, I think that we can drop the verbose flag altogether. The user can still specify whether they want the verbose output by setting that logger's log level, but it doesn't need to be part of the main function call. For the information in these print statuses, I think they should be at INFO level, except for the download progress bar which can be at DEBUG level.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree with the log levels you proposed. Thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem! I ended up using WARNING level a couple of times as well, for the corrupted file and error-retry paths.

else:
TEST_DATA_ROOT_PATH = Path(Path("~").expanduser(), ".tvm_test_data")
TEST_DATA_ROOT_PATH.mkdir(parents=True, exist_ok=True)


def download_testdata(url, relpath, module=None, overwrite=False):
def download_testdata(url, relpath, module=None, overwrite=False, verbose=False):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also add overwrite to the docstring?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, and can do.

Copy link
Contributor

@leandron leandron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Happy to merge after CI

…ownload().

Minor cleanup as well, while in the file

- Using tempfile.TemporaryDirectory instead of explicit cleanup.

- Pass through verbose/retries arguments if replacing a corrupted
  copy.
Initial implementation using tempfile.TemporaryDirectory assumed that
the tempdir and output location were on the same drive, and could be
renamed.  This update falls back to copying from the temporary
directory, in case the tempdir is on a different drive.
@Lunderberg Lunderberg force-pushed the download_testdata_nonverbose branch from 36b1479 to 44fa935 Compare July 28, 2021 13:06
@masahi masahi merged commit 6b2cbfe into apache:main Jul 29, 2021
@Lunderberg Lunderberg deleted the download_testdata_nonverbose branch July 29, 2021 10:21
ylc pushed a commit to ylc/tvm that referenced this pull request Sep 29, 2021
…ownload() (apache#8533)

* [Contrib] Added default non-verbose to download_testdata(), pass to download().

Minor cleanup as well, while in the file

- Using tempfile.TemporaryDirectory instead of explicit cleanup.

- Pass through verbose/retries arguments if replacing a corrupted
  copy.

* [Contrib] Switched download.py from print statements to logging

* [Contrib] Added shutil.copy2 fallback after downloading file.

Initial implementation using tempfile.TemporaryDirectory assumed that
the tempdir and output location were on the same drive, and could be
renamed.  This update falls back to copying from the temporary
directory, in case the tempdir is on a different drive.

Co-authored-by: Eric Lunderberg <elunderberg@octoml.ai>
ylc pushed a commit to ylc/tvm that referenced this pull request Jan 13, 2022
…ownload() (apache#8533)

* [Contrib] Added default non-verbose to download_testdata(), pass to download().

Minor cleanup as well, while in the file

- Using tempfile.TemporaryDirectory instead of explicit cleanup.

- Pass through verbose/retries arguments if replacing a corrupted
  copy.

* [Contrib] Switched download.py from print statements to logging

* [Contrib] Added shutil.copy2 fallback after downloading file.

Initial implementation using tempfile.TemporaryDirectory assumed that
the tempdir and output location were on the same drive, and could be
renamed.  This update falls back to copying from the temporary
directory, in case the tempdir is on a different drive.

Co-authored-by: Eric Lunderberg <elunderberg@octoml.ai>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants