AutoTokenizer.from_pretrained does not propagate token #39030

@anakin87

Description

Since Transformers 4.52, the loading mechanism behind AutoTokenizer.from_pretrained has changed, and a token passed via the token parameter is no longer propagated correctly.

System Info

(Colab)

  • transformers version: 4.52.4
  • Platform: Linux-6.1.123+-x86_64-with-glibc2.35
  • Python version: 3.11.13
  • Huggingface_hub version: 0.33.0
  • Safetensors version: 0.5.3
  • Accelerate version: 1.7.0
  • Accelerate config: not found
  • DeepSpeed version: not installed
  • PyTorch version (GPU?): 2.6.0+cu124 (False)
  • Tensorflow version (GPU?): 2.18.0 (False)
  • Flax version (CPU?/GPU?/TPU?): 0.10.6 (cpu)
  • Jax version: 0.5.2
  • JaxLib version: 0.5.1
  • Using distributed or parallel set-up in script?:

Who can help?

@Rocketknight1 @ArthurZucker @Wauplin
I have the impression that this is related to #36588

Reproduction

In this code example, I am trying to load a private tokenizer using the token parameter (not the env var).

import os
from transformers import AutoTokenizer

# we first make sure that the token is not present in environment variables
# if the env var is present, THE BUG DOES NOT OCCUR
os.environ.pop('HF_TOKEN', None)


model = "deepset/bert-base-NER" # a valid private model I can access
token = "..."

tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path=model, token=token)
Error

/usr/local/lib/python3.11/dist-packages/huggingface_hub/utils/_auth.py:86: UserWarning: Access to the secret `HF_TOKEN` has not been granted on this notebook. You will not be requested again. Please restart the session if you want to be prompted again.
warnings.warn(
---------------------------------------------------------------------------
HTTPError Traceback (most recent call last)
/usr/local/lib/python3.11/dist-packages/huggingface_hub/utils/_http.py in hf_raise_for_status(response, endpoint_name)
408 try:
--> 409 response.raise_for_status()
410 except HTTPError as e:

8 frames
/usr/local/lib/python3.11/dist-packages/requests/models.py in raise_for_status(self)
1023 if http_error_msg:
-> 1024 raise HTTPError(http_error_msg, response=self)
1025

HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/api/models/deepset/bert-base-NER/tree/main/additional_chat_templates?recursive=False&expand=False

The above exception was the direct cause of the following exception:

RepositoryNotFoundError Traceback (most recent call last)
/tmp/ipython-input-6-4075835132.py in <cell line: 0>()
6
----> 7 tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path=model, token=token)

/usr/local/lib/python3.11/dist-packages/transformers/models/auto/tokenization_auto.py in from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs)
1030
1031 if tokenizer_class_fast and (use_fast or tokenizer_class_py is None):
-> 1032 return tokenizer_class_fast.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
1033 else:
1034 if tokenizer_class_py is not None:

/usr/local/lib/python3.11/dist-packages/transformers/tokenization_utils_base.py in from_pretrained(cls, pretrained_model_name_or_path, cache_dir, force_download, local_files_only, token, revision, trust_remote_code, *init_inputs, **kwargs)
1966 )
1967 else:
-> 1968 for template in list_repo_templates(
1969 pretrained_model_name_or_path,
1970 local_files_only=local_files_only,

/usr/local/lib/python3.11/dist-packages/transformers/utils/hub.py in list_repo_templates(repo_id, local_files_only, revision, cache_dir)
159 if not local_files_only:
160 try:
--> 161 return [
162 entry.path.removeprefix(f"{CHAT_TEMPLATE_DIR}/")
163 for entry in list_repo_tree(

/usr/local/lib/python3.11/dist-packages/transformers/utils/hub.py in <listcomp>(.0)
159 if not local_files_only:
160 try:
--> 161 return [
162 entry.path.removeprefix(f"{CHAT_TEMPLATE_DIR}/")
163 for entry in list_repo_tree(

/usr/local/lib/python3.11/dist-packages/huggingface_hub/hf_api.py in list_repo_tree(self, repo_id, path_in_repo, recursive, expand, revision, repo_type, token)
3166 encoded_path_in_repo = "/" + quote(path_in_repo, safe="") if path_in_repo else ""
3167 tree_url = f"{self.endpoint}/api/{repo_type}s/{repo_id}/tree/{revision}{encoded_path_in_repo}"
-> 3168 for path_info in paginate(path=tree_url, headers=headers, params={"recursive": recursive, "expand": expand}):
3169 yield (RepoFile(**path_info) if path_info["type"] == "file" else RepoFolder(**path_info))
3170

/usr/local/lib/python3.11/dist-packages/huggingface_hub/utils/_pagination.py in paginate(path, params, headers)
35 session = get_session()
36 r = session.get(path, params=params, headers=headers)
---> 37 hf_raise_for_status(r)
38 yield from r.json()
39

/usr/local/lib/python3.11/dist-packages/huggingface_hub/utils/_http.py in hf_raise_for_status(response, endpoint_name)
457 " https://huggingface.co/docs/huggingface_hub/authentication"
458 )
--> 459 raise _format(RepositoryNotFoundError, message, response) from e
460
461 elif response.status_code == 400:

RepositoryNotFoundError: 401 Client Error. (Request ID: Root=1-685bc7f1-448cc03d560cc5bc2bc95865;1708f136-0e94-4e7d-a6b4-4e38a9c50920)

Repository Not Found for url: https://huggingface.co/api/models/deepset/bert-base-NER/tree/main/additional_chat_templates?recursive=False&expand=False.
Please make sure you specified the correct repo_id and repo_type.
If you are trying to access a private or gated repo, make sure you are authenticated. For more details, see https://huggingface.co/docs/huggingface_hub/authentication
Invalid username or password.
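
To rule out a problem with the token itself, it may help to call the Hub API directly with the token passed explicitly, bypassing Transformers. The sketch below is only a diagnostic idea, not part of the original report: `can_list_repo` is a hypothetical helper, and the `lister` parameter exists only so the function can be exercised without network access. It assumes that `huggingface_hub.list_repo_tree` accepts `token` as a keyword argument, as the signature in the traceback above suggests.

```python
def can_list_repo(repo_id, token, lister=None):
    """Return True if the repo's file tree can be listed with `token`.

    Hypothetical diagnostic helper. `lister` defaults to
    huggingface_hub.list_repo_tree and can be replaced for offline testing.
    """
    if lister is None:
        # Imported lazily so the helper can be defined without huggingface_hub.
        from huggingface_hub import list_repo_tree as lister
    try:
        # list_repo_tree is a lazy generator, so the request is only sent
        # once iteration starts; one entry is enough for this check.
        for _ in lister(repo_id, token=token):
            break
        return True
    except Exception as exc:  # deliberately broad: this is a diagnostic only
        print(f"Listing {repo_id} failed: {type(exc).__name__}: {exc}")
        return False
```

If `can_list_repo(model, token)` returns True while AutoTokenizer.from_pretrained still fails with the 401 above, that would be consistent with the token being valid but not forwarded on the chat-template listing call.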

Expected behavior

The tokenizer loads without errors.
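
Possible workaround (untested sketch): the reproduction notes that the bug does not occur when the HF_TOKEN environment variable is present, so the token could be exposed through that variable for the duration of the call. `temporary_hf_token` and `load_private_tokenizer` are hypothetical helper names, and the sketch assumes that huggingface_hub reads HF_TOKEN at request time rather than only at import time.

```python
import os
from contextlib import contextmanager


@contextmanager
def temporary_hf_token(token):
    """Temporarily expose `token` as the HF_TOKEN environment variable."""
    previous = os.environ.get("HF_TOKEN")
    os.environ["HF_TOKEN"] = token
    try:
        yield
    finally:
        # Restore the previous state so the token does not outlive the call.
        if previous is None:
            os.environ.pop("HF_TOKEN", None)
        else:
            os.environ["HF_TOKEN"] = previous


def load_private_tokenizer(model, token):
    """Load a tokenizer with the token available both ways (hypothetical helper)."""
    # Imported lazily so temporary_hf_token can be used without transformers.
    from transformers import AutoTokenizer

    with temporary_hf_token(token):
        # The token is still passed explicitly for code paths that honor it.
        return AutoTokenizer.from_pretrained(model, token=token)
```

Usage would be `tokenizer = load_private_tokenizer(model, token)` with the `model` and `token` values from the reproduction. Restoring the variable afterwards keeps the token from leaking into unrelated code in the same session.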
