Skip to content

Commit

Permalink
Use HF_HUB_OFFLINE + fix has_file in offline mode (huggingface#31016)
Browse files Browse the repository at this point in the history
* Fix has_file in offline mode

* harmonize env variable for offline mode

* Switch to HF_HUB_OFFLINE

* fix test

* revert test_offline to test TRANSFORMERS_OFFLINE

* Add new offline test

* merge conflicts

* docs
  • Loading branch information
Wauplin authored May 29, 2024
1 parent bfe6f51 commit c3044ec
Show file tree
Hide file tree
Showing 16 changed files with 148 additions and 76 deletions.
4 changes: 2 additions & 2 deletions docs/source/de/installation.md
Original file line number Diff line number Diff line change
Expand Up @@ -162,7 +162,7 @@ Transformers verwendet die Shell-Umgebungsvariablen `PYTORCH_TRANSFORMERS_CACHE`

## Offline Modus

Transformers ist in der Lage, in einer Firewall- oder Offline-Umgebung zu laufen, indem es nur lokale Dateien verwendet. Setzen Sie die Umgebungsvariable `TRANSFORMERS_OFFLINE=1`, um dieses Verhalten zu aktivieren.
Transformers ist in der Lage, in einer Firewall- oder Offline-Umgebung zu laufen, indem es nur lokale Dateien verwendet. Setzen Sie die Umgebungsvariable `HF_HUB_OFFLINE=1`, um dieses Verhalten zu aktivieren.

<Tip>

Expand All @@ -179,7 +179,7 @@ python examples/pytorch/translation/run_translation.py --model_name_or_path goog
Führen Sie das gleiche Programm in einer Offline-Instanz mit aus:

```bash
HF_DATASETS_OFFLINE=1 TRANSFORMERS_OFFLINE=1 \
HF_DATASETS_OFFLINE=1 HF_HUB_OFFLINE=1 \
python examples/pytorch/translation/run_translation.py --model_name_or_path google-t5/t5-small --dataset_name wmt16 --dataset_config ro-en ...
```

Expand Down
4 changes: 2 additions & 2 deletions docs/source/en/installation.md
Original file line number Diff line number Diff line change
Expand Up @@ -169,7 +169,7 @@ Pretrained models are downloaded and locally cached at: `~/.cache/huggingface/hu

## Offline mode

Run 🤗 Transformers in a firewalled or offline environment with locally cached files by setting the environment variable `TRANSFORMERS_OFFLINE=1`.
Run 🤗 Transformers in a firewalled or offline environment with locally cached files by setting the environment variable `HF_HUB_OFFLINE=1`.

<Tip>

Expand All @@ -178,7 +178,7 @@ Add [🤗 Datasets](https://huggingface.co/docs/datasets/) to your offline train
</Tip>

```bash
HF_DATASETS_OFFLINE=1 TRANSFORMERS_OFFLINE=1 \
HF_DATASETS_OFFLINE=1 HF_HUB_OFFLINE=1 \
python examples/pytorch/translation/run_translation.py --model_name_or_path google-t5/t5-small --dataset_name wmt16 --dataset_config ro-en ...
```

Expand Down
4 changes: 2 additions & 2 deletions docs/source/es/installation.md
Original file line number Diff line number Diff line change
Expand Up @@ -154,7 +154,7 @@ Los modelos preentrenados se descargan y almacenan en caché localmente en: `~/.

## Modo Offline

🤗 Transformers puede ejecutarse en un entorno con firewall o fuera de línea (offline) usando solo archivos locales. Configura la variable de entorno `TRANSFORMERS_OFFLINE=1` para habilitar este comportamiento.
🤗 Transformers puede ejecutarse en un entorno con firewall o fuera de línea (offline) usando solo archivos locales. Configura la variable de entorno `HF_HUB_OFFLINE=1` para habilitar este comportamiento.

<Tip>

Expand All @@ -171,7 +171,7 @@ python examples/pytorch/translation/run_translation.py --model_name_or_path goog
Ejecuta este mismo programa en una instancia offline con el siguiente comando:

```bash
HF_DATASETS_OFFLINE=1 TRANSFORMERS_OFFLINE=1 \
HF_DATASETS_OFFLINE=1 HF_HUB_OFFLINE=1 \
python examples/pytorch/translation/run_translation.py --model_name_or_path google-t5/t5-small --dataset_name wmt16 --dataset_config ro-en ...
```

Expand Down
4 changes: 2 additions & 2 deletions docs/source/fr/installation.md
Original file line number Diff line number Diff line change
Expand Up @@ -171,7 +171,7 @@ Les modèles pré-entraînés sont téléchargés et mis en cache localement dan

## Mode hors ligne

🤗 Transformers peut fonctionner dans un environnement cloisonné ou hors ligne en n'utilisant que des fichiers locaux. Définissez la variable d'environnement `TRANSFORMERS_OFFLINE=1` pour activer ce mode.
🤗 Transformers peut fonctionner dans un environnement cloisonné ou hors ligne en n'utilisant que des fichiers locaux. Définissez la variable d'environnement `HF_HUB_OFFLINE=1` pour activer ce mode.

<Tip>

Expand All @@ -180,7 +180,7 @@ Ajoutez [🤗 Datasets](https://huggingface.co/docs/datasets/) à votre processu
</Tip>

```bash
HF_DATASETS_OFFLINE=1 TRANSFORMERS_OFFLINE=1 \
HF_DATASETS_OFFLINE=1 HF_HUB_OFFLINE=1 \
python examples/pytorch/translation/run_translation.py --model_name_or_path google-t5/t5-small --dataset_name wmt16 --dataset_config ro-en ...
```

Expand Down
4 changes: 2 additions & 2 deletions docs/source/it/installation.md
Original file line number Diff line number Diff line change
Expand Up @@ -152,7 +152,7 @@ I modelli pre-allenati sono scaricati e memorizzati localmente nella cache in: `

## Modalità Offline

🤗 Transformers può essere eseguita in un ambiente firewalled o offline utilizzando solo file locali. Imposta la variabile d'ambiente `TRANSFORMERS_OFFLINE=1` per abilitare questo comportamento.
🤗 Transformers può essere eseguita in un ambiente firewalled o offline utilizzando solo file locali. Imposta la variabile d'ambiente `HF_HUB_OFFLINE=1` per abilitare questo comportamento.

<Tip>

Expand All @@ -169,7 +169,7 @@ python examples/pytorch/translation/run_translation.py --model_name_or_path goog
Esegui lo stesso programma in un'istanza offline con:

```bash
HF_DATASETS_OFFLINE=1 TRANSFORMERS_OFFLINE=1 \
HF_DATASETS_OFFLINE=1 HF_HUB_OFFLINE=1 \
python examples/pytorch/translation/run_translation.py --model_name_or_path google-t5/t5-small --dataset_name wmt16 --dataset_config ro-en ...
```

Expand Down
4 changes: 2 additions & 2 deletions docs/source/ja/installation.md
Original file line number Diff line number Diff line change
Expand Up @@ -157,7 +157,7 @@ conda install conda-forge::transformers

## オフラインモード

🤗 Transformersはローカルファイルのみを使用することでファイアウォールやオフラインの環境でも動作させることができます。この動作を有効にするためには、環境変数`TRANSFORMERS_OFFLINE=1`を設定します。
🤗 Transformersはローカルファイルのみを使用することでファイアウォールやオフラインの環境でも動作させることができます。この動作を有効にするためには、環境変数`HF_HUB_OFFLINE=1`を設定します。

<Tip>

Expand All @@ -174,7 +174,7 @@ python examples/pytorch/translation/run_translation.py --model_name_or_path goog
オフラインインスタンスでこの同じプログラムを実行します:

```bash
HF_DATASETS_OFFLINE=1 TRANSFORMERS_OFFLINE=1 \
HF_DATASETS_OFFLINE=1 HF_HUB_OFFLINE=1 \
python examples/pytorch/translation/run_translation.py --model_name_or_path google-t5/t5-small --dataset_name wmt16 --dataset_config ro-en ...
```

Expand Down
4 changes: 2 additions & 2 deletions docs/source/ko/installation.md
Original file line number Diff line number Diff line change
Expand Up @@ -157,7 +157,7 @@ conda install conda-forge::transformers

## 오프라인 모드[[offline-mode]]

🤗 Transformers를 로컬 파일만 사용하도록 해서 방화벽 또는 오프라인 환경에서 실행할 수 있습니다. 활성화하려면 `TRANSFORMERS_OFFLINE=1` 환경 변수를 설정하세요.
🤗 Transformers를 로컬 파일만 사용하도록 해서 방화벽 또는 오프라인 환경에서 실행할 수 있습니다. 활성화하려면 `HF_HUB_OFFLINE=1` 환경 변수를 설정하세요.

<Tip>

Expand All @@ -174,7 +174,7 @@ python examples/pytorch/translation/run_translation.py --model_name_or_path goog
오프라인 기기에서 동일한 프로그램을 다음과 같이 실행할 수 있습니다.

```bash
HF_DATASETS_OFFLINE=1 TRANSFORMERS_OFFLINE=1 \
HF_DATASETS_OFFLINE=1 HF_HUB_OFFLINE=1 \
python examples/pytorch/translation/run_translation.py --model_name_or_path google-t5/t5-small --dataset_name wmt16 --dataset_config ro-en ...
```

Expand Down
4 changes: 2 additions & 2 deletions docs/source/pt/installation.md
Original file line number Diff line number Diff line change
Expand Up @@ -173,7 +173,7 @@ No Windows, este diretório pré-definido é dado por `C:\Users\username\.cache\
## Modo Offline

O 🤗 Transformers também pode ser executado num ambiente de firewall ou fora da rede (offline) usando arquivos locais.
Para tal, configure a variável de ambiente de modo que `TRANSFORMERS_OFFLINE=1`.
Para tal, configure a variável de ambiente de modo que `HF_HUB_OFFLINE=1`.

<Tip>

Expand All @@ -191,7 +191,7 @@ python examples/pytorch/translation/run_translation.py --model_name_or_path goog
Execute esse mesmo programa numa instância offline com o seguinte comando:

```bash
HF_DATASETS_OFFLINE=1 TRANSFORMERS_OFFLINE=1 \
HF_DATASETS_OFFLINE=1 HF_HUB_OFFLINE=1 \
python examples/pytorch/translation/run_translation.py --model_name_or_path google-t5/t5-small --dataset_name wmt16 --dataset_config ro-en ...
```

Expand Down
4 changes: 2 additions & 2 deletions docs/source/zh/installation.md
Original file line number Diff line number Diff line change
Expand Up @@ -169,7 +169,7 @@ conda install conda-forge::transformers

## 离线模式

🤗 Transformers 可以仅使用本地文件在防火墙或离线环境中运行。设置环境变量 `TRANSFORMERS_OFFLINE=1` 以启用该行为。
🤗 Transformers 可以仅使用本地文件在防火墙或离线环境中运行。设置环境变量 `HF_HUB_OFFLINE=1` 以启用该行为。

<Tip>

Expand All @@ -186,7 +186,7 @@ python examples/pytorch/translation/run_translation.py --model_name_or_path goog
在离线环境中运行相同的程序:

```bash
HF_DATASETS_OFFLINE=1 TRANSFORMERS_OFFLINE=1 \
HF_DATASETS_OFFLINE=1 HF_HUB_OFFLINE=1 \
python examples/pytorch/translation/run_translation.py --model_name_or_path google-t5/t5-small --dataset_name wmt16 --dataset_config ro-en ...
```

Expand Down
2 changes: 2 additions & 0 deletions src/transformers/modeling_flax_utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -823,6 +823,8 @@ def from_pretrained(
"revision": revision,
"proxies": proxies,
"token": token,
"cache_dir": cache_dir,
"local_files_only": local_files_only,
}
if has_file(pretrained_model_name_or_path, SAFE_WEIGHTS_INDEX_NAME, **has_file_kwargs):
is_sharded = True
Expand Down
2 changes: 2 additions & 0 deletions src/transformers/modeling_tf_utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -2864,6 +2864,8 @@ def from_pretrained(
"revision": revision,
"proxies": proxies,
"token": token,
"cache_dir": cache_dir,
"local_files_only": local_files_only,
}
if has_file(pretrained_model_name_or_path, SAFE_WEIGHTS_INDEX_NAME, **has_file_kwargs):
is_sharded = True
Expand Down
5 changes: 5 additions & 0 deletions src/transformers/modeling_utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -3405,6 +3405,8 @@ def from_pretrained(
"revision": revision,
"proxies": proxies,
"token": token,
"cache_dir": cache_dir,
"local_files_only": local_files_only,
}
cached_file_kwargs = {
"cache_dir": cache_dir,
Expand Down Expand Up @@ -3432,6 +3434,8 @@ def from_pretrained(
"revision": revision,
"proxies": proxies,
"token": token,
"cache_dir": cache_dir,
"local_files_only": local_files_only,
}
if has_file(pretrained_model_name_or_path, TF2_WEIGHTS_NAME, **has_file_kwargs):
raise EnvironmentError(
Expand Down Expand Up @@ -3459,6 +3463,7 @@ def from_pretrained(
f" {_add_variant(WEIGHTS_NAME, variant)}, {_add_variant(SAFE_WEIGHTS_NAME, variant)},"
f" {TF2_WEIGHTS_NAME}, {TF_WEIGHTS_NAME} or {FLAX_WEIGHTS_NAME}."
)

except EnvironmentError:
# Raise any environment error raise by `cached_file`. It will have a helpful error message adapted
# to the original exception.
Expand Down
56 changes: 47 additions & 9 deletions src/transformers/utils/hub.py
Original file line number Diff line number Diff line change
Expand Up @@ -51,9 +51,11 @@
GatedRepoError,
HFValidationError,
LocalEntryNotFoundError,
OfflineModeIsEnabled,
RepositoryNotFoundError,
RevisionNotFoundError,
build_hf_headers,
get_session,
hf_raise_for_status,
send_telemetry,
)
Expand All @@ -75,7 +77,7 @@

logger = logging.get_logger(__name__) # pylint: disable=invalid-name

_is_offline_mode = True if os.environ.get("TRANSFORMERS_OFFLINE", "0").upper() in ENV_VARS_TRUE_VALUES else False
_is_offline_mode = huggingface_hub.constants.HF_HUB_OFFLINE


def is_offline_mode():
Expand Down Expand Up @@ -599,11 +601,17 @@ def has_file(
revision: Optional[str] = None,
proxies: Optional[Dict[str, str]] = None,
token: Optional[Union[bool, str]] = None,
*,
local_files_only: bool = False,
cache_dir: Union[str, Path, None] = None,
repo_type: Optional[str] = None,
**deprecated_kwargs,
):
"""
Checks if a repo contains a given file without downloading it. Works for remote repos and local folders.
If offline mode is enabled, checks if the file exists in the cache.
<Tip warning={false}>
This function will raise an error if the repository `path_or_repo` is not valid or if `revision` does not exist for
Expand All @@ -621,15 +629,41 @@ def has_file(
raise ValueError("`token` and `use_auth_token` are both specified. Please set only the argument `token`.")
token = use_auth_token

# If path to local directory, check if the file exists
if os.path.isdir(path_or_repo):
return os.path.isfile(os.path.join(path_or_repo, filename))

url = hf_hub_url(path_or_repo, filename=filename, revision=revision)
headers = build_hf_headers(token=token, user_agent=http_user_agent())
# Else it's a repo => let's check if the file exists in local cache or on the Hub

# Check if file exists in cache
# This information might be outdated so it's best to also make a HEAD call (if allowed).
cached_path = try_to_load_from_cache(
repo_id=path_or_repo,
filename=filename,
revision=revision,
repo_type=repo_type,
cache_dir=cache_dir,
)
has_file_in_cache = isinstance(cached_path, str)

# If local_files_only, don't try the HEAD call
if local_files_only:
return has_file_in_cache

r = requests.head(url, headers=headers, allow_redirects=False, proxies=proxies, timeout=10)
# Check if the file exists
try:
hf_raise_for_status(r)
response = get_session().head(
hf_hub_url(path_or_repo, filename=filename, revision=revision, repo_type=repo_type),
headers=build_hf_headers(token=token, user_agent=http_user_agent()),
allow_redirects=False,
proxies=proxies,
timeout=10,
)
except OfflineModeIsEnabled:
return has_file_in_cache

try:
hf_raise_for_status(response)
return True
except GatedRepoError as e:
logger.error(e)
Expand All @@ -640,16 +674,20 @@ def has_file(
) from e
except RepositoryNotFoundError as e:
logger.error(e)
raise EnvironmentError(f"{path_or_repo} is not a local folder or a valid repository name on 'https://hf.co'.")
raise EnvironmentError(
f"{path_or_repo} is not a local folder or a valid repository name on 'https://hf.co'."
) from e
except RevisionNotFoundError as e:
logger.error(e)
raise EnvironmentError(
f"{revision} is not a valid git identifier (branch name, tag name or commit id) that exists for this "
f"model name. Check the model page at 'https://huggingface.co/{path_or_repo}' for available revisions."
)
) from e
except EntryNotFoundError:
return False # File does not exist
except requests.HTTPError:
# We return false for EntryNotFoundError (logical) as well as any connection error.
return False
# Any authentication/authorization error will be caught here => default to cache
return has_file_in_cache


class PushToHubMixin:
Expand Down
1 change: 0 additions & 1 deletion tests/test_configuration_utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,6 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import json
import os
import shutil
Expand Down
19 changes: 16 additions & 3 deletions tests/utils/test_hub_utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,7 @@
import unittest.mock as mock
from pathlib import Path

from huggingface_hub import hf_hub_download
from requests.exceptions import HTTPError

from transformers.utils import (
Expand All @@ -33,6 +34,7 @@


RANDOM_BERT = "hf-internal-testing/tiny-random-bert"
TINY_BERT_PT_ONLY = "hf-internal-testing/tiny-bert-pt-only"
CACHE_DIR = os.path.join(TRANSFORMERS_CACHE, "models--hf-internal-testing--tiny-random-bert")
FULL_COMMIT_HASH = "9b8c223d42b2188cb49d29af482996f9d0f3e5a6"

Expand Down Expand Up @@ -99,9 +101,20 @@ def test_non_existence_is_cached(self):
mock_head.assert_called()

def test_has_file(self):
self.assertTrue(has_file("hf-internal-testing/tiny-bert-pt-only", WEIGHTS_NAME))
self.assertFalse(has_file("hf-internal-testing/tiny-bert-pt-only", TF2_WEIGHTS_NAME))
self.assertFalse(has_file("hf-internal-testing/tiny-bert-pt-only", FLAX_WEIGHTS_NAME))
self.assertTrue(has_file(TINY_BERT_PT_ONLY, WEIGHTS_NAME))
self.assertFalse(has_file(TINY_BERT_PT_ONLY, TF2_WEIGHTS_NAME))
self.assertFalse(has_file(TINY_BERT_PT_ONLY, FLAX_WEIGHTS_NAME))

def test_has_file_in_cache(self):
with tempfile.TemporaryDirectory() as tmp_dir:
# Empty cache dir + offline mode => return False
assert not has_file(TINY_BERT_PT_ONLY, WEIGHTS_NAME, local_files_only=True, cache_dir=tmp_dir)

# Populate cache dir
hf_hub_download(TINY_BERT_PT_ONLY, WEIGHTS_NAME, cache_dir=tmp_dir)

# Cache dir + offline mode => return True
assert has_file(TINY_BERT_PT_ONLY, WEIGHTS_NAME, local_files_only=True, cache_dir=tmp_dir)

def test_get_file_from_repo_distant(self):
# `get_file_from_repo` returns None if the file does not exist
Expand Down
Loading

0 comments on commit c3044ec

Please sign in to comment.