Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean up utils.hub using the latest from hf_hub #18857

Merged
merged 4 commits into from
Sep 2, 2022
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion setup.py
Original file line number Diff line number Diff line change
Expand Up @@ -116,7 +116,7 @@
"fugashi>=1.0",
"GitPython<3.1.19",
"hf-doc-builder>=0.3.0",
"huggingface-hub>=0.8.1,<1.0",
"huggingface-hub>=0.9.0,<1.0",
"importlib_metadata",
"ipadic>=1.0.0,<2.0",
"isort>=5.5.4",
Expand Down
2 changes: 1 addition & 1 deletion src/transformers/dependency_versions_table.py
Original file line number Diff line number Diff line change
Expand Up @@ -22,7 +22,7 @@
"fugashi": "fugashi>=1.0",
"GitPython": "GitPython<3.1.19",
"hf-doc-builder": "hf-doc-builder>=0.3.0",
"huggingface-hub": "huggingface-hub>=0.8.1,<1.0",
"huggingface-hub": "huggingface-hub>=0.9.0,<1.0",
"importlib_metadata": "importlib_metadata",
"ipadic": "ipadic>=1.0.0,<2.0",
"isort": "isort>=5.5.4",
Expand Down
59 changes: 15 additions & 44 deletions src/transformers/utils/hub.py
Original file line number Diff line number Diff line change
Expand Up @@ -21,7 +21,6 @@
import sys
import traceback
import warnings
from contextlib import contextmanager
from pathlib import Path
from typing import Dict, List, Optional, Tuple, Union
from uuid import uuid4
Expand Down Expand Up @@ -249,28 +248,6 @@ def try_to_load_from_cache(cache_dir, repo_id, filename, revision=None, commit_h
return cached_file if os.path.isfile(cached_file) else None


# If huggingface_hub changes the class of error for this to FileNotFoundError, we will be able to avoid that in the
# future.
LOCAL_FILES_ONLY_HF_ERROR = (
"Cannot find the requested files in the disk cache and outgoing traffic has been disabled. To enable hf.co "
"look-ups and downloads online, set 'local_files_only' to False."
)


# In the future, this ugly contextmanager can be removed when huggingface_hub as a released version where we can
# activate/deactivate progress bars.
@contextmanager
def _patch_hf_hub_tqdm():
"""
A context manager to make huggingface hub use the tqdm version of Transformers (which is controlled by some utils)
in logging.
"""
old_tqdm = huggingface_hub.file_download.tqdm
huggingface_hub.file_download.tqdm = tqdm
yield
huggingface_hub.file_download.tqdm = old_tqdm


def cached_file(
path_or_repo_id: Union[str, os.PathLike],
filename: str,
Expand Down Expand Up @@ -375,20 +352,19 @@ def cached_file(
user_agent = http_user_agent(user_agent)
try:
# Load from URL or cache if already cached
with _patch_hf_hub_tqdm():
resolved_file = hf_hub_download(
path_or_repo_id,
filename,
subfolder=None if len(subfolder) == 0 else subfolder,
revision=revision,
cache_dir=cache_dir,
user_agent=user_agent,
force_download=force_download,
proxies=proxies,
resume_download=resume_download,
use_auth_token=use_auth_token,
local_files_only=local_files_only,
)
resolved_file = hf_hub_download(
path_or_repo_id,
filename,
subfolder=None if len(subfolder) == 0 else subfolder,
revision=revision,
cache_dir=cache_dir,
user_agent=user_agent,
force_download=force_download,
proxies=proxies,
resume_download=resume_download,
use_auth_token=use_auth_token,
local_files_only=local_files_only,
)

except RepositoryNotFoundError:
raise EnvironmentError(
Expand Down Expand Up @@ -421,13 +397,8 @@ def cached_file(
return None

raise EnvironmentError(f"There was a specific connection error when trying to load {path_or_repo_id}:\n{err}")
except ValueError as err:
# HuggingFace Hub returns a ValueError for a missing file when local_files_only=True we need to catch it here
# This could be caught above along in `EntryNotFoundError` if hf_hub sent a different error message here
if LOCAL_FILES_ONLY_HF_ERROR in err.args[0] and local_files_only and not _raise_exceptions_for_missing_entries:
return None

# Otherwise we try to see if we have a cached version (not up to date):
except ValueError:
# We try to see if we have a cached version (not up to date):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here to be more precise you can use except LocalEntryNotFoundError.

And since the newly introduced exception is defined as (EntryNotFoundError, FileNotFoundError, ValueError), you will never catch it because you catch EntryNotFoundError line 382 above.

    except EntryNotFoundError:
        if not _raise_exceptions_for_missing_entries:
            return None
        if revision is None:
            revision = "main"
        raise EnvironmentError(
            f"{path_or_repo_id} does not appear to have a file named {full_filename}. Checkout "
            f"'https://huggingface.co/{path_or_repo_id}/{revision}' for available files."
        )

In the end, I suggest to use except LocalEntryNotFoundError: instead of except ValueError: and to reorder the except (...).

(Here is a link to the doc.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! Will adapt.

resolved_file = try_to_load_from_cache(cache_dir, path_or_repo_id, full_filename, revision=revision)
if resolved_file is not None:
return resolved_file
Expand Down
4 changes: 4 additions & 0 deletions src/transformers/utils/logging.py
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,8 @@

from tqdm import auto as tqdm_lib

import huggingface_hub.utils as hf_hub_utils


_lock = threading.Lock()
_default_handler: Optional[logging.Handler] = None
Expand Down Expand Up @@ -336,9 +338,11 @@ def enable_progress_bar():
"""Enable tqdm progress bar."""
global _tqdm_active
_tqdm_active = True
hf_hub_utils.enable_progress_bars()


def disable_progress_bar():
"""Disable tqdm progress bar."""
global _tqdm_active
_tqdm_active = False
hf_hub_utils.disable_progress_bars()
18 changes: 6 additions & 12 deletions tests/utils/test_logging.py
Original file line number Diff line number Diff line change
Expand Up @@ -14,10 +14,10 @@

import os
import unittest
from unittest.mock import patch

import transformers.models.bart.tokenization_bart
from transformers import AutoConfig, logging
from huggingface_hub.utils import are_progress_bars_disabled
from transformers import logging
from transformers.testing_utils import CaptureLogger, mockenv, mockenv_context
from transformers.utils.logging import disable_progress_bar, enable_progress_bar

Expand Down Expand Up @@ -126,14 +126,8 @@ def test_advisory_warnings(self):


def test_set_progress_bar_enabled():
TINY_MODEL = "hf-internal-testing/tiny-random-distilbert"
with patch("tqdm.auto.tqdm") as mock_tqdm:
disable_progress_bar()
_ = AutoConfig.from_pretrained(TINY_MODEL, force_download=True)
mock_tqdm.assert_not_called()
disable_progress_bar()
assert are_progress_bars_disabled()

mock_tqdm.reset_mock()

enable_progress_bar()
_ = AutoConfig.from_pretrained(TINY_MODEL, force_download=True)
mock_tqdm.assert_called()
enable_progress_bar()
assert not are_progress_bars_disabled()
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not manage to adapt this test with mock since:

  • the tqdm from huggingface_hub.file_download is always called
  • the calls to tqdm.auto.tqdm are not detected anymore
    If someone has a genius idea, happy to hear!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In huggingface_hub I test that by using the capsys fixture from pytest that catch the output while running the tests. Then you can check if the output stayed quiet or if you have progress bars in it.

You can have a look here for the setup and here for a test where tqdm is enabled.

However from transformers's point of view, this is a job done by an external library (huggingface_hub) so testing it here would be a duplicate of what is already tested. It's would not be wrong to test it but the current test you wrote is good enough I think 👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And in general, are transformers's _tqdm_active and _tqdm_cls used anywhere now ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No they're not used anywhere now (we do use tqdm in other places, so should switch to the one in logging, but probably for another PR).