Fix dynamic module import error #21646

ydshieh · 2023-02-15T15:08:50Z

What does this PR do?

Issue

We have a failing test:

FAILED tests/models/auto/test_modeling_auto.py::AutoModelTest::test_from_pretrained_dynamic_model_distant

ModuleNotFoundError: No module named 'transformers_modules.local.modeling'

The full trace is given at the end.

After a long debugging session, it turns out that when reloading from the saved model

model = AutoModel.from_pretrained("hf-internal-testing/test_dynamic_model", trust_remote_code=True)
with tempfile.TemporaryDirectory() as tmp_dir:
    model.save_pretrained(tmp_dir)
    reloaded_model = AutoModel.from_pretrained(tmp_dir, trust_remote_code=True)

if configuration.py is present in the dynamic module directory (here transformers_modules/local), it sometimes interferes with the import of transformers_modules.local.modeling. I don't have a clear explanation for this, however.

What this PR fixes

This PR therefore tries to keep other module files out of the dynamic module directory while the code imports a specific module file, around this line:

def get_class_in_module():
    ...
    module = importlib.import_module(module_path)
    ...

Result

Running the reproduction snippet (provided in the comment below) in a loop of 300 iterations:

with this PR: the issue doesn't appear (job run)
without the fix: the issue appears with 50% probability (job run)

Full traceback

Traceback (most recent call last):
    ...
    reloaded_model = AutoModel.from_pretrained(tmp_dir, trust_remote_code=True)
  File "/home/circleci/.pyenv/versions/3.7.12/lib/python3.7/site-packages/transformers/models/auto/auto_factory.py", line 463, in from_pretrained
    pretrained_model_name_or_path, module_file + ".py", class_name, **hub_kwargs, **kwargs
  File "/home/circleci/.pyenv/versions/3.7.12/lib/python3.7/site-packages/transformers/dynamic_module_utils.py", line 367, in get_class_from_dynamic_module
    return get_class_in_module(class_name, final_module.replace(".py", ""))
  File "/home/circleci/.pyenv/versions/3.7.12/lib/python3.7/site-packages/transformers/dynamic_module_utils.py", line 147, in get_class_in_module
    module = importlib.import_module(module_path)
  File "/home/circleci/.pyenv/versions/3.7.12/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 965, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'transformers_modules.local.modeling'

HuggingFaceDocBuilderDev · 2023-02-15T15:26:47Z

The documentation is not available anymore as the PR was closed or merged.

ydshieh · 2023-02-15T15:26:56Z

Run the following command

python run_debug.py

with the 2 files

run_debug.py

import os

for i in range(300):
    print(i)
    with open("output.txt", "a+") as fp:
        fp.write(str(i) + "\n")
    os.system("python3 debug.py")

(we need to run the debugging code foo (contained in the file debug.py) in a different process each time, instead of running the script debug.py with a for loop defined inside it, as that would always stay in the same process)

debug.py

import time, traceback, tempfile, os
from transformers.utils import HF_MODULES_CACHE


def foo():
    from transformers import AutoModel

    model = AutoModel.from_pretrained("hf-internal-testing/test_dynamic_model", trust_remote_code=True)
    # Test model can be reloaded.
    with tempfile.TemporaryDirectory() as tmp_dir:
        model.save_pretrained(tmp_dir)
        try:
            reloaded_model = AutoModel.from_pretrained(tmp_dir, trust_remote_code=True)
        except Exception as e:
            print(e)
            with open("output.txt", "a+") as fp:
                fp.write(f"{traceback.format_exc()}" + "\n")


if __name__ == "__main__":
    timeout = os.environ.get("PYTEST_TIMEOUT", 10)
    timeout = int(timeout)
    for i in range(1):
        time.sleep(1)
        print(i)
        with open("output.txt", "a+") as fp:
            fp.write(str(i) + "\n")
        try:
            os.system(f'rm -rf "{HF_MODULES_CACHE}"')
        except:
            pass
        foo()
        print("=" * 80)
        with open("output.txt", "a+") as fp:
            fp.write("=" * 80 + "\n")
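The one-process-per-iteration idea can also be sketched with subprocess, which makes it easy to count failing runs. This is only an illustration: count_failures is a hypothetical helper, and it assumes the child script exits with a non-zero status on failure (debug.py above catches the exception, so it would need to re-raise or call sys.exit(1) for this to apply).

```python
import subprocess
import sys


def count_failures(code: str, runs: int) -> int:
    """Run `code` in a fresh interpreter `runs` times and count non-zero exits."""
    failures = 0
    for _ in range(runs):
        # sys.executable is the interpreter running this script, so the child
        # uses the same environment but starts with a clean import state.
        result = subprocess.run([sys.executable, "-c", code])
        if result.returncode != 0:
            failures += 1
    return failures


if __name__ == "__main__":
    # A trivial child that always succeeds; replace with the real reproduction.
    print(count_failures("import sys; sys.exit(0)", runs=3))
```

Each child starts with an empty sys.modules for the dynamic modules, which is what the loop in run_debug.py relies on.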

sgugger · 2023-02-15T15:38:22Z

Thanks for working on this! I was going to have a look at it when back from vacation, but you beat me to it ;-)

My solution would have been to change the way the local module works: for now I dump every file there without structure. I wanted to add a folder per model (so given by pretrained_model_name_or_path), which would also fix this issue, I believe.

ydshieh · 2023-02-15T15:47:18Z

@sgugger I am open to exploring further, but I have some doubts regarding

I wanted to add a folder per model (so given by pretrained_model_name_or_path) which would also fix this issue I believe.

While I am debugging (this single test), the only module directories that appear are

transformers_modules/hf-internal-testing/test_dynamic_model/12345678901234567890.../
transformers_modules/local/

so I don't see multiple models sharing the same folder, but the issue still occurs. So, I am not sure how to proceed with the solution you mentioned above.

ydshieh · 2023-02-15T15:48:20Z

Hmm, this seems to affect other related tests. I will have to take a look 😭

sgugger · 2023-02-15T15:50:08Z

I believe the conflict is between two files in local being written/deleted concurrently (but I might be wrong), hence making sure we have things like

transformers_modules/local/123456...
transformers_modules/local/777888...

might fix the issue.

ydshieh · 2023-02-15T15:55:57Z

I believe the conflict is between two files in local being written/deleted concurrently

On (CircleCI) CI, we run pytest -n 8, which might cause the situation you mentioned. But I am debugging by running the following function in a loop (and the issue still appears), so I kind of feel the issue does not come from concurrent read/write/delete operations

def foo():
    from transformers import AutoModel

    model = AutoModel.from_pretrained("hf-internal-testing/test_dynamic_model", trust_remote_code=True)
    # Test model can be reloaded.
    with tempfile.TemporaryDirectory() as tmp_dir:
        model.save_pretrained(tmp_dir)
        reloaded_model = AutoModel.from_pretrained(tmp_dir, trust_remote_code=True)

I could explore anyway - but maybe let me finalize the current PR (make CI green) first

ydshieh · 2023-02-15T20:59:07Z

src/transformers/dynamic_module_utils.py

@@ -212,7 +244,7 @@ def get_cached_module_file(
    # Download and cache module_file from the repo `pretrained_model_name_or_path` of grab it if it's a local file.
    pretrained_model_name_or_path = str(pretrained_model_name_or_path)
    if os.path.isdir(pretrained_model_name_or_path):
-        submodule = "local"
+        submodule = f"local_{pretrained_model_name_or_path.replace(os.path.sep, '_')}"


@sgugger You already mentioned this in your comment. As I said, the issue doesn't seem to come from concurrent file operations. However, the fix I implemented in this PR adds more operations on the module directory, and at some point it looks like it runs into a race condition (not 100% confident).

Therefore, I went ahead and made the module directory depend on pretrained_model_name_or_path, but I need to add replace(os.path.sep, '_') to handle the case where pretrained_model_name_or_path looks like /tmp/xxxyyy.

You can just taje the xxxyyy which should solve the issue for the tests (since they are all in tmp dirs that have unique names).

@sgugger Sorry, but what is taje the xxxyyy?

Regarding they are all in tmp dirs that have unique names -> should solve the issue for the tests:
I guess what I did here also gives unique names (during testing), but without the (latest) changes in get_class_in_module, we still get the same issue, as I have already run it several times.

If you ever want to double check: run this code snippet

This test issue is really tricky to reproduce

You're the one who called your folder /tmp/xxxyyy in your first comment. I'm just saying you should take the last part, so pretrained_model_name_or_path.split(os.path.sep)[-1]

Done!

(My brain also has tmp memory regarding xxxyyy)
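To make the two naming schemes from this thread concrete, here is a small sketch comparing them. The helper names are hypothetical and are not taken from dynamic_module_utils.py; the normpath call is an added assumption so that a trailing separator does not yield an empty name.

```python
import os


def submodule_from_full_path(path: str) -> str:
    """First attempt in this PR: flatten the whole path into one name."""
    return f"local_{path.replace(os.path.sep, '_')}"


def submodule_from_last_component(path: str) -> str:
    """Reviewer's suggestion: keep only the last path component."""
    # normpath drops a trailing separator (assumption added for this sketch).
    return os.path.normpath(path).split(os.path.sep)[-1]


if __name__ == "__main__":
    tmp_like = os.path.join(os.path.sep, "tmp", "xxxyyy")
    print(submodule_from_full_path(tmp_like))
    print(submodule_from_last_component(tmp_like))
```

The second form gives a shorter module name and stays unique in the tests because each temporary directory already has a unique last component.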

sgugger · 2023-02-16T14:53:51Z

src/transformers/dynamic_module_utils.py

+    # remove `configuration.py`: this is necessary when we try to import modeling module, or other tokenizer/processor
+    # modules, while configuration module has been imported previously.
+    # TODO: This is only a simple heuristic. In general, we might need to consider any dynamic module that has been
+    #       imported. However, we don't have this information so far.
+    if os.path.isfile(f"{module_dir}/configuration.py"):
+        os.remove(f"{module_dir}/configuration.py")


This is very weird and way too specific. Just because the tests call the file configuration doesn't mean it will always be called this way.

We no longer need to deal with this specific file, but the same trick is required for the module file (the one we want to import).

sgugger · 2023-02-16T14:54:27Z

src/transformers/dynamic_module_utils.py

@@ -212,7 +244,7 @@ def get_cached_module_file(
    # Download and cache module_file from the repo `pretrained_model_name_or_path` of grab it if it's a local file.
    pretrained_model_name_or_path = str(pretrained_model_name_or_path)
    if os.path.isdir(pretrained_model_name_or_path):
-        submodule = "local"
+        submodule = f"local_{pretrained_model_name_or_path.replace(os.path.sep, '_')}"


You can just taje the xxxyyy which should solve the issue for the tests (since they are all in tmp dirs that have unique names).

ydshieh · 2023-02-16T15:18:19Z

src/transformers/dynamic_module_utils.py

+        # copy to a temporary directory
+        shutil.copy(f"{module_dir}/{module_file_name}", tmp_dir)
+        cmd = f'import os; os.remove("{module_dir}/{module_file_name}")'
+        os.system(f"python3 -c '{cmd}'")


No more test errors if we remove the file in a subprocess.

That's a bit crazy! Can you use the subprocess command instead of os.system? Not sure if this is going to fly well on Windows for instance.

Changed to subprocess. Tested on my Windows env. and it works.
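A minimal sketch of the copy, remove-in-a-subprocess, copy-back sequence discussed here, written with subprocess instead of os.system. It is an illustration under assumptions, not the merged code: refresh_file_via_subprocess is a hypothetical name, and passing the path through sys.argv (rather than formatting it into the command string) is a choice made here to avoid quoting and backslash issues on Windows.

```python
import os
import shutil
import subprocess
import sys
import tempfile

# Child code: remove the file whose path is given as the first argument.
REMOVE_CODE = "import os, sys; os.remove(sys.argv[1])"


def refresh_file_via_subprocess(module_dir: str, module_file_name: str) -> None:
    """Park the file in a temp dir, delete the original in a child process, restore it."""
    target = os.path.join(module_dir, module_file_name)
    with tempfile.TemporaryDirectory() as tmp_dir:
        # The temp dir only holds the file temporarily; nothing is imported from it.
        shutil.copy(target, tmp_dir)
        subprocess.run([sys.executable, "-c", REMOVE_CODE, target])
        shutil.copyfile(os.path.join(tmp_dir, module_file_name), target)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as demo_dir:
        with open(os.path.join(demo_dir, "modeling.py"), "w") as fp:
            fp.write("class NewModel:\n    pass\n")
        refresh_file_via_subprocess(demo_dir, "modeling.py")
        print(os.listdir(demo_dir))
```

Using sys.executable instead of a hard-coded "python" or "python3" keeps the child on the same interpreter as the parent.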

ydshieh · 2023-02-16T15:18:27Z

src/transformers/dynamic_module_utils.py

+        shutil.copy(f"{module_dir}/{module_file_name}", tmp_dir)
+        cmd = f'import os; os.remove("{module_dir}/{module_file_name}")'
+        os.system(f"python3 -c '{cmd}'")
+        # os.remove(f"{module_dir}/{module_file_name}")


os.remove(f"{module_dir}/{module_file_name}") is not working!!!!!!!

ydshieh · 2023-02-16T15:21:00Z

src/transformers/dynamic_module_utils.py

-    module_path = module_path.replace(os.path.sep, ".")
-    module = importlib.import_module(module_path)
-    return getattr(module, class_name)
+    with tempfile.TemporaryDirectory() as tmp_dir:


This is not the location the module is loaded from. It just holds the file temporarily, and the file will be copied back to its original place.

ydshieh · 2023-02-16T15:26:09Z

Finally got it:

- we don't need to remove other files (config, __init__.py) or the __pycache__ folder
- the point is: we need to remove module_file_name in a subprocess, then copy it back
  - os.system("rm -rf ...") works, as it runs in another process
  - os.system(f"python3 -c '{cmd}'") works too, and we don't use a Linux-specific command --> way to go
  - os.remove(...) does not work! I can't explain why (I don't know the reason behind it) 😢
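The thread leaves the os.remove behaviour unexplained. One documented mechanism that can produce this kind of intermittent ModuleNotFoundError is the import system's finder caches: the importlib documentation notes that after creating a module file while the interpreter is running, you may need to call importlib.invalidate_caches() for it to be noticed. Whether that is the cause here is an assumption and is not confirmed anywhere in this conversation. The sketch below only demonstrates the documented call; write_and_import and demo_dynamic_pkg are hypothetical names.

```python
import importlib
import os
import sys
import tempfile


def write_and_import(root: str, package: str, module: str, source: str):
    """Write `source` to <root>/<package>/<module>.py, then import it."""
    package_dir = os.path.join(root, package)
    os.makedirs(package_dir, exist_ok=True)
    init_file = os.path.join(package_dir, "__init__.py")
    if not os.path.exists(init_file):
        open(init_file, "w").close()
    with open(os.path.join(package_dir, f"{module}.py"), "w") as fp:
        fp.write(source)
    if root not in sys.path:
        sys.path.insert(0, root)
    # Path finders cache directory listings; clear them so the file that was
    # just written is visible to the next import.
    importlib.invalidate_caches()
    return importlib.import_module(f"{package}.{module}")


with tempfile.TemporaryDirectory() as root:
    # Import one module first, then add a second file to the same package,
    # mirroring configuration.py being imported before modeling.py.
    config = write_and_import(root, "demo_dynamic_pkg", "configuration", "VALUE = 1\n")
    modeling = write_and_import(root, "demo_dynamic_pkg", "modeling", "class NewModel:\n    pass\n")
    print(config.VALUE, modeling.NewModel.__name__)
```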

ydshieh · 2023-02-16T16:11:05Z

I don't know why we get an error where the missing module is not a Python file but a package. See below.
I can't reproduce it so far, but the fix works for the auto model dynamic loading test.

FAILED tests/models/auto/test_image_processing_auto.py::AutoImageProcessorTest::test_from_pretrained_dynamic_image_processor

 - ModuleNotFoundError: No module named 'transformers_modules.local__tmp_tmpkcj_lb5j'

ydshieh · 2023-02-16T16:55:59Z

This PR is ready for review.

There is one failure that I can't reproduce with the same code snippet. See this comment. It seems to happen much more rarely, and we can probably investigate it if it happens again.

ydshieh · 2023-02-17T09:15:03Z

src/transformers/dynamic_module_utils.py

+        shutil.copy(f"{module_dir}/{module_file_name}", tmp_dir)
+        # On Windows, we need this character `r` before the path argument of `os.remove`
+        cmd = f'import os; os.remove(r"{module_dir}{os.path.sep}{module_file_name}")'
+        subprocess.run(["python", "-c", cmd])


If something goes wrong in subprocess.run, no error will be raised (in the process that calls this method).
I think we should capture/check the output of subprocess.run and do something:

either: not to call shutil.copyfile below (although this makes the test flaky in this logic branch)

or: throw an error manually with some information

Let me know if you have any suggestions :-)

Why do we need to do something? If there is a problem deleting the file (which we copy just after), at worst we get the flaky failure again (though it should be extremely rare at this stage).

yeah, right!
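For reference, this is roughly what checking the child process would look like if it were ever wanted. subprocess.run returns a CompletedProcess whose returncode is non-zero when the child raised; remove_in_subprocess is a hypothetical helper and is not part of this PR, which deliberately ignores the result.

```python
import os
import subprocess
import sys
import tempfile
from typing import Tuple


def remove_in_subprocess(path: str) -> Tuple[bool, str]:
    """Delete `path` in a child interpreter; return (succeeded, child stderr)."""
    result = subprocess.run(
        [sys.executable, "-c", "import os, sys; os.remove(sys.argv[1])", path],
        capture_output=True,
        text=True,
    )
    # An uncaught exception in the child gives a non-zero return code and
    # leaves the traceback on stderr; the parent never sees an exception.
    return result.returncode == 0, result.stderr


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as demo_dir:
        target = os.path.join(demo_dir, "modeling.py")
        open(target, "w").close()
        print(remove_in_subprocess(target)[0])  # file existed
        print(remove_in_subprocess(target)[0])  # file already gone
```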

sgugger · 2023-02-17T16:42:03Z

Thanks for investigating this issue so deeply!

* fix dynamic module import error --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

ydshieh added 8 commits February 15, 2023 20:57

fix dynamic module import error

6d83254

temp for history

2472668

fix dynamic module import error

37a046c

update

ab57566

update

da2e785

update

93c973d

update

7bf2582

update

e347d17

ydshieh force-pushed the fix_dynamic_module_import_flaky_error branch from b678fba to e347d17 Compare February 15, 2023 20:50

ydshieh commented Feb 15, 2023

ydshieh added 2 commits February 15, 2023 22:38

update

c4cadb7

update

d13ed84

sgugger reviewed Feb 16, 2023

ydshieh added 2 commits February 16, 2023 15:55

update

f5c1d12

update

f10b2d8

ydshieh commented Feb 16, 2023

ydshieh changed the title ~~[WIP] Fix dynamic module import error~~ Fix dynamic module import error Feb 16, 2023

ydshieh marked this pull request as ready for review February 16, 2023 16:49

ydshieh requested a review from sgugger February 16, 2023 19:46

ydshieh added 2 commits February 17, 2023 03:10

update

cbf45dc

update

b9c583e

ydshieh commented Feb 17, 2023

sgugger approved these changes Feb 17, 2023

ydshieh merged commit 7f1cdf1 into main Feb 17, 2023

ydshieh deleted the fix_dynamic_module_import_flaky_error branch February 17, 2023 20:22

ydshieh mentioned this pull request Feb 20, 2023

Fix get_class_in_module #21709

Merged

ArthurZucker pushed a commit to ArthurZucker/transformers that referenced this pull request Mar 2, 2023

Fix dynamic module import error (huggingface#21646)

9ac4609

* fix dynamic module import error --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

This was referenced Apr 3, 2023

Dynamic module import error when using ddp #22506

Closed

Remove hack for dynamic modules and use Python functions instead #22537

Merged
