
FileNotFoundError when reading file symlink from cache dir. #1388

Closed
ivlcic opened this issue Mar 13, 2023 · 4 comments
Labels
bug Something isn't working

Comments


ivlcic commented Mar 13, 2023

Describe the bug

When using huggingface_hub==0.13.1, the following cache dir structure is created:

0 ✓ nikola@koshast ~/project/tmp $ tree
.
└── xlmrb
    └── models--xlm-roberta-base
        ├── blobs
        │   └── 1960141250d189366dfb76630ba794a9c104ec07
        ├── refs
        │   └── main
        └── snapshots
            └── 42f548f32366559214515ec137cdd16002968bf6
                └── config.json -> tmp/xlmrb/models--xlm-roberta-base/blobs/1960141250d189366dfb76630ba794a9c104ec07

7 directories, 3 files
0 ✓ nikola@koshast ~/project/tmp $

Loading the model from this cache then fails with:

2023-03-13 11:55:46 [INFO]  ner_train [43]: Will run on specified cuda [0] device only!
2023-03-13 11:56:52 DEBUG   urllib3.connectionpool 1003: Starting new HTTPS connection (1): huggingface.co:443
2023-03-13 11:56:53 DEBUG   urllib3.connectionpool 456: https://huggingface.co:443 "HEAD /xlm-roberta-base/resolve/main/tokenizer_config.json HTTP/1.1" 404 0
2023-03-13 11:58:14 DEBUG   urllib3.connectionpool 1003: Starting new HTTPS connection (1): huggingface.co:443
2023-03-13 11:58:15 DEBUG   urllib3.connectionpool 456: https://huggingface.co:443 "HEAD /xlm-roberta-base/resolve/main/config.json HTTP/1.1" 200 0
2023-03-13 11:59:14 DEBUG   filelock 172: Attempting to acquire lock 140655888411680 on tmp/xlmrb/models--xlm-roberta-base/blobs/1960141250d189366dfb76630ba794a9c104ec07.lock
2023-03-13 11:59:14 DEBUG   filelock 176: Lock 140655888411680 acquired on tmp/xlmrb/models--xlm-roberta-base/blobs/1960141250d189366dfb76630ba794a9c104ec07.lock
2023-03-13 11:59:43 DEBUG   urllib3.connectionpool 1003: Starting new HTTPS connection (1): huggingface.co:443
2023-03-13 11:59:43 DEBUG   urllib3.connectionpool 456: https://huggingface.co:443 "GET /xlm-roberta-base/resolve/main/config.json HTTP/1.1" 200 615
Downloading (…)lve/main/config.json: 100%|██████████| 615/615 [00:00<00:00, 202kB/s]
2023-03-13 12:06:12 DEBUG   filelock 209: Attempting to release lock 140655888411680 on tmp/xlmrb/models--xlm-roberta-base/blobs/1960141250d189366dfb76630ba794a9c104ec07.lock
2023-03-13 12:06:12 DEBUG   filelock 212: Lock 140655888411680 released on tmp/xlmrb/models--xlm-roberta-base/blobs/1960141250d189366dfb76630ba794a9c104ec07.lock
Traceback (most recent call last):
  File "/home/nikola/project/venv/lib/python3.10/site-packages/transformers/configuration_utils.py", line 650, in _get_config_dict
    config_dict = cls._dict_from_json_file(resolved_config_file)
  File "/home/nikola/project/venv/lib/python3.10/site-packages/transformers/configuration_utils.py", line 736, in _dict_from_json_file
    with open(json_file, "r", encoding="utf-8") as reader:
FileNotFoundError: [Errno 2] No such file or directory: 'tmp/xlmrb/models--xlm-roberta-base/snapshots/42f548f32366559214515ec137cdd16002968bf6/config.json'

When invoking:

AutoTokenizer.from_pretrained('xlm-roberta-base', cache_dir='tmp/xlmrb')
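The root cause can be demonstrated with the standard library alone: a relative symlink target is resolved against the directory *containing the link*, not against the process working directory. The tree above shows 0.13.1 wrote the target as seen from the CWD (`tmp/xlmrb/...`), producing a dangling link, while 0.11.1 wrote it relative to the snapshot dir (`../../blobs/...`). A minimal sketch of both layouts (file names shortened for illustration):

```python
import os
import tempfile

# Work in a scratch directory so the sketch is self-contained.
work = tempfile.mkdtemp()
os.chdir(work)

blobs = "tmp/xlmrb/models--xlm-roberta-base/blobs"
snap = "tmp/xlmrb/models--xlm-roberta-base/snapshots/42f5"
os.makedirs(blobs)
os.makedirs(snap)
with open(os.path.join(blobs, "blob"), "w") as f:
    f.write("{}")

# Broken link (what 0.13.1 produced): target is relative to the CWD,
# but it gets resolved relative to the snapshots dir -> dangling.
os.symlink(os.path.join(blobs, "blob"), os.path.join(snap, "config.json"))
print(os.path.exists(os.path.join(snap, "config.json")))   # False

# Working link (what 0.11.1 produced): target relative to the link's dir.
os.symlink("../../blobs/blob", os.path.join(snap, "config_ok.json"))
print(os.path.exists(os.path.join(snap, "config_ok.json")))  # True
```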

When using huggingface_hub==0.11.1, the following cache dir structure is created and everything works:

0 ✓ nikola@koshast ~/project/tmp $ tree
.
└── xlmrb
    └── models--xlm-roberta-base
        ├── blobs
        │   ├── 1960141250d189366dfb76630ba794a9c104ec07
        │   ├── 463f3414782c1c9405828c9b31bfa36dda1f45c5
        │   ├── 9d83baaafea92d36de26002c8135a427d55ee6fdc4faaa6e400be4c47724a07e
        │   └── db9af13bf09fd3028ca32be90d3fb66d5e470399
        ├── refs
        │   └── main
        └── snapshots
            └── 42f548f32366559214515ec137cdd16002968bf6
                ├── config.json -> ../../blobs/1960141250d189366dfb76630ba794a9c104ec07
                ├── pytorch_model.bin -> ../../blobs/9d83baaafea92d36de26002c8135a427d55ee6fdc4faaa6e400be4c47724a07e
                ├── sentencepiece.bpe.model -> ../../blobs/db9af13bf09fd3028ca32be90d3fb66d5e470399
                └── tokenizer.json -> ../../blobs/463f3414782c1c9405828c9b31bfa36dda1f45c5

7 directories, 9 files
0 ✓ nikola@koshast ~/project/tmp $

The following packages are installed in the venv:

0 ✓ nikola@koshast ~/project/tmp $ pip list
Package                  Version
------------------------ ----------
aiohttp                  3.8.4
aiosignal                1.3.1
async-timeout            4.0.2
attrs                    22.2.0
certifi                  2022.12.7
charset-normalizer       2.0.12
classla                  1.2.0
contourpy                1.0.7
cycler                   0.11.0
datasets                 2.10.1
dill                     0.3.6
eland                    8.3.0
elastic-transport        8.4.0
elasticsearch            8.6.2
emoji                    2.2.0
evaluate                 0.4.0
filelock                 3.9.0
fonttools                4.39.0
frozenlist               1.3.3
fsspec                   2023.3.0
huggingface-hub          0.13.1
idna                     3.4
joblib                   1.2.0
kiwisolver               1.4.4
lxml                     4.9.2
matplotlib               3.7.1
multidict                6.0.4
multiprocess             0.70.14
numpy                    1.24.2
nvidia-cublas-cu11       11.10.3.66
nvidia-cuda-nvrtc-cu11   11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11        8.5.0.96
obeliks                  1.1.6
packaging                23.0
pandas                   1.5.3
Pillow                   9.4.0
pip                      23.0.1
protobuf                 4.21.2
pyarrow                  11.0.0
pyparsing                3.0.9
python-dateutil          2.8.2
pytz                     2022.7.1
PyYAML                   6.0
regex                    2022.10.31
reldi-tokeniser          1.0.2
requests                 2.28.0
responses                0.18.0
scikit-learn             1.2.2
scipy                    1.10.1
seqeval                  1.2.2
setuptools               65.5.0
six                      1.16.0
stanza                   1.4.2
threadpoolctl            3.1.0
tokenizers               0.13.2
torch                    1.13.1
tqdm                     4.62.3
transformers             4.26.1
typing_extensions        4.5.0
urllib3                  1.26.15
wheel                    0.38.4
xxhash                   3.2.0
yarl                     1.8.2

Reproduction

No response

Logs

No response

System info

- huggingface_hub version: 0.13.1
- Platform: Linux-6.1.12-arch1-1-x86_64-with-glibc2.37
- Python version: 3.10.9
- Running in iPython ?: No
- Running in notebook ?: No
- Running in Google Colab ?: No
- Token path ?: /home/nikola/.cache/huggingface/token
- Has saved token ?: False
- Configured git credential helpers: 
- FastAI: N/A
- Tensorflow: N/A
- Torch: 1.13.1
- Jinja2: N/A
- Graphviz: N/A
- Pydot: N/A
- Pillow: 9.4.0
- hf_transfer: N/A
- ENDPOINT: https://huggingface.co
- HUGGINGFACE_HUB_CACHE: /home/nikola/.cache/huggingface/hub
- HUGGINGFACE_ASSETS_CACHE: /home/nikola/.cache/huggingface/assets
- HF_TOKEN_PATH: /home/nikola/.cache/huggingface/token
- HF_HUB_OFFLINE: False
- HF_HUB_DISABLE_TELEMETRY: False
- HF_HUB_DISABLE_PROGRESS_BARS: None
- HF_HUB_DISABLE_SYMLINKS_WARNING: False
- HF_HUB_DISABLE_IMPLICIT_TOKEN: False
- HF_HUB_ENABLE_HF_TRANSFER: False
@ivlcic ivlcic added the bug Something isn't working label Mar 13, 2023
Wauplin (Contributor) commented Mar 13, 2023

Hi @ivlcic , thanks for reporting this issue. I'll try to reproduce it and keep you updated.

Just to be sure, does it work with huggingface_hub==0.13.1 when passing

AutoTokenizer.from_pretrained('xlm-roberta-base', cache_dir='/tmp/xlmrb')

with /tmp? (Or any absolute path, if /tmp is not the right one.)
That doesn't mean it isn't a bug in huggingface_hub, but it would give you a workaround while we fix this.
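The suggestion above can be generalized by expanding whatever cache dir you use to an absolute path before passing it in. A minimal sketch (the `from_pretrained` call is left commented out since it needs network access and an installed transformers):

```python
import os

# Resolve the relative cache dir against the current working directory once,
# so the symlink targets written by huggingface_hub 0.13.1 stay valid no
# matter where the process is later run from.
cache_dir = os.path.abspath("tmp/xlmrb")
print(os.path.isabs(cache_dir))  # True

# AutoTokenizer.from_pretrained('xlm-roberta-base', cache_dir=cache_dir)
```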

ivlcic (Author) commented Mar 13, 2023

Yes, with an absolute path it works:

0 ✓ nikola@koshast /tmp/mcbert $ tree
.
└── models--bert-base-multilingual-cased
    ├── blobs
    │   ├── 420b0fc31334c64ddf53cc3e9222a6d4c59d0cae
    │   ├── b122e74db13b415ea824c074da33c1c44f0d13a3
    │   ├── e3c6d456fb2616f01a9a6cd01a1be1a36353ed22
    │   └── e837bab60a5d204e29622d127c2dafe508aa0731
    ├── refs
    │   └── main
    └── snapshots
        └── fdfce55e83dbed325647a63e7e1f5de19f0382ba
            ├── config.json -> /tmp/mcbert/models--bert-base-multilingual-cased/blobs/b122e74db13b415ea824c074da33c1c44f0d13a3
            ├── tokenizer_config.json -> /tmp/mcbert/models--bert-base-multilingual-cased/blobs/e3c6d456fb2616f01a9a6cd01a1be1a36353ed22
            ├── tokenizer.json -> /tmp/mcbert/models--bert-base-multilingual-cased/blobs/420b0fc31334c64ddf53cc3e9222a6d4c59d0cae
            └── vocab.txt -> /tmp/mcbert/models--bert-base-multilingual-cased/blobs/e837bab60a5d204e29622d127c2dafe508aa0731

6 directories, 9 files
0 ✓ nikola@koshast /tmp/mcbert $

Wauplin (Contributor) commented Mar 13, 2023

Hi @ivlcic, I made a patch release to fix this: huggingface_hub==0.13.2 (release notes). Before using it, you must delete the existing corrupted tmp/xlmrb folder so the cache is recreated. Thanks again for reporting, and please let me know if you encounter any related issue.
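For reference, the cleanup step can be sketched with the standard library (the path is the relative cache dir from the report above; this is just an illustrative sketch, not part of the official fix):

```python
import os
import shutil

# Remove the cache dir containing the dangling symlinks so that, after
# upgrading to huggingface_hub 0.13.2, the files are re-downloaded cleanly.
broken_cache = "tmp/xlmrb"
if os.path.isdir(broken_cache):
    shutil.rmtree(broken_cache)
print(os.path.isdir(broken_cache))  # False
```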

@Wauplin Wauplin closed this as completed Mar 13, 2023
ivlcic (Author) commented Mar 13, 2023

Wow, that was super fast!
Thanks for everything!
