
Bug when processor uses a model cached locally #28697

Closed
2 of 4 tasks
wwx007121 opened this issue Jan 25, 2024 · 5 comments · Fixed by #28709
Comments

@wwx007121

System Info

version: transformers>=4.37.0

The bug occurs in https://github.com/huggingface/transformers/blob/main/src/transformers/processing_utils.py, line 466.

I understand the purpose of this code, but it conflicts with the code in utils/hub.py, line 466: the error message wording there may have changed, so the string match in processing_utils.py no longer recognizes it.
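
For context, the failing interaction looks roughly like this. This is a minimal sketch of the pattern, not the exact transformers source; the helper name cached_file_sketch and the paraphrased comments are mine:

# utils/hub.py (paraphrased): when the connection fails and the file is not in
# the cache, cached_file raises an EnvironmentError whose wording differs from
# the "does not appear to have a file named ..." message used for other cases.
def cached_file_sketch(repo_id, filename):
    raise EnvironmentError(
        f"We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it "
        f"in the cached files and it looks like {repo_id} is not the path to a directory "
        f"containing a file named {filename}."
    )

# processing_utils.py, around line 466 (paraphrased): the except branch only
# tolerates one wording, so the offline wording above falls through.
try:
    cached_file_sketch("openai/whisper-large-v3", "processor_config.json")
except EnvironmentError as e:
    if "does not appear to have a file named processor_config.json." in str(e):
        pass  # repos that simply lack processor_config.json are tolerated
    else:
        raise  # the offline wording lands here and surfaces as the OSError below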

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

As a simple workaround, I changed my local code from

if "does not appear to have a file named processor_config.json." in str(e):

to

if "processor_config.json." in str(e):

Alternatively, downgrading to 4.36.2 also works.
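
A quick check of why the relaxed substring matters, using the exact error text from the traceback further down in this thread (just an illustration, not library code):

old_needle = "does not appear to have a file named processor_config.json."
new_needle = "processor_config.json."
offline_msg = (
    "We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it "
    "in the cached files and it looks like distil-whisper/distil-large-v2 is not the path "
    "to a directory containing a file named processor_config.json."
)
print(old_needle in offline_msg)  # False -> the except branch re-raises the error
print(new_needle in offline_msg)  # True  -> the relaxed check tolerates it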

Expected behavior

I think there may be a better solution.

@amyeroberts
Collaborator

Hi @wwx007121, thanks for raising an issue!

Could you give some more details about the exact bug that is occurring, i.e. the error being encountered (including the full traceback) and a minimal code snippet to reproduce the issue?

cc @ydshieh

@wwx007121
Author

wwx007121 commented Jan 25, 2024


    model_id = "openai/whisper-large-v3"
    pretrain_model = AutoModelForSpeechSeq2Seq.from_pretrained(
        model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True, cache_dir=model_cache
    )
    pretrain_model.to(device)
    print("load model done")

    processor = AutoProcessor.from_pretrained(model_id, cache_dir=model_cache)

The model in the cache was downloaded by another process that shares the same Docker environment.

Traceback (most recent call last):
  File "script/get_whisper_result.py", line 28, in <module>
    processor = AutoProcessor.from_pretrained(model_id, cache_dir=model_cache)
  File "/opt/miniconda/lib/python3.8/site-packages/transformers/models/auto/processing_auto.py", line 313, in from_pretrained
    return processor_class.from_pretrained(
  File "/opt/miniconda/lib/python3.8/site-packages/transformers/processing_utils.py", line 464, in from_pretrained
    processor_dict, kwargs = cls.get_processor_dict(pretrained_model_name_or_path, **kwargs)
  File "/opt/miniconda/lib/python3.8/site-packages/transformers/processing_utils.py", line 308, in get_processor_dict
    resolved_processor_file = cached_file(
  File "/opt/miniconda/lib/python3.8/site-packages/transformers/utils/hub.py", line 425, in cached_file
    raise EnvironmentError(
OSError: We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like distil-whisper/distil-large-v2 is not the path to a directory containing a file named processor_config.json.
Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.

@ydshieh ydshieh self-assigned this Jan 25, 2024
@ydshieh
Collaborator

ydshieh commented Jan 25, 2024

Hi @wwx007121, it looks like I should indeed modify the condition; thank you for reporting this.

However, to be sure, I would really like to be able to reproduce the issue. So far I am doing the following, which should match the situation you described, but this code snippet works without any error.

Could you describe in more detail how to reproduce it, please?

You mentioned that the cache was downloaded in another process. When running the provided code example, is the connection cut or disabled?

from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq

model_id = "openai/whisper-large-v3"

model_cache = "my_cache"

pretrain_model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, use_safetensors=True, cache_dir=model_cache
)

processor = AutoProcessor.from_pretrained(model_id, cache_dir=model_cache)

@ydshieh
Collaborator

ydshieh commented Jan 25, 2024

Well, I tried disabling the internet connection and I can reproduce the issue. I will open a PR to fix it. Thanks again for reporting!
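
For anyone reproducing this later: one way to simulate a severed connection (an assumption on my side, the comment above does not say how the connection was disabled) is transformers' documented offline mode:

import os

# Must be set before transformers is imported; it makes hub lookups skip the network.
os.environ["TRANSFORMERS_OFFLINE"] = "1"

from transformers import AutoProcessor

# With the whisper weights already cached but no processor_config.json in the
# cache, affected versions raise an OSError like the one in the traceback above.
processor = AutoProcessor.from_pretrained("openai/whisper-large-v3", cache_dir="my_cache")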

@ydshieh
Collaborator

ydshieh commented Jan 26, 2024

@wwx007121

The fix is merged into main. Thanks again!
