Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update deprecated/unused dependencies 🧹 🧹 #36419

Open
wants to merge 20 commits into
base: main
Choose a base branch
from

Conversation

gante
Copy link
Member

@gante gante commented Feb 26, 2025

What does this PR do?

Updates deprecated (because of our minimum versions, like python>=3.9 or torch>=2.0) and unused dependency-related code.

  • Removes dependencies/maximum version pins from _deps (each change is commented in this PR with further information about why we can safely make the change)
  • Removes unused is_xxx_available and requires_xxx functions
  • Updates code that can no longer be reached or is redundant due to newer versions (like code to check that we are using torch>=2.0 or dynamo+nvfuser)
  • Other minor dependency-related changes

✅ CI is green on dev docker images (failing TF tests are unrelated, and a result of [test-all]). dev image PR: #36427


Post merge:

  • Update CI image docs on notion (commit message now doesn't need to be exactly dev-ci, it only needs to include dev-ci)

@gante gante changed the title up until blobfile Update deprecated/unused dependencies Feb 26, 2025
Comment on lines -78 to -91
# Remove stale transformers.egg-info directory to avoid https://github.com/pypa/pip/issues/5466
stale_egg_info = Path(__file__).parent / "transformers.egg-info"
if stale_egg_info.exists():
print(
(
"Warning: {} exists.\n\n"
"If you recently updated transformers to 3.0 or later, this is expected,\n"
"but it may prevent transformers from installing in editable mode.\n\n"
"This directory is automatically generated by Python's packaging tools.\n"
"I will remove it now.\n\n"
"See https://github.com/pypa/pip/issues/5466 for details.\n"
).format(stale_egg_info)
)
shutil.rmtree(stale_egg_info)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The root issue was sorted and added to pip 20.1, released in early 2020. Our minimum python version is 3.9, released in late 2020, and it comes with a pip version more recent than 20.1.

@@ -345,6 +327,7 @@ def run(self):
)
+ extras["retrieval"]
+ extras["modelcreation"]
+ extras["tiktoken"]
Copy link
Member Author

@gante gante Feb 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needed to test py.test tests/models/llama/test_tokenization_llama.py::TikTokenIntegrationTests, so it makes sense to be in the testing extra

Some of our CI images rely on installing testing (e.g. see scheduled daily torch tests), and were not running those tests as a result

Comment on lines +847 to +848
@require_tiktoken
@require_read_token
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@require_read_token behaves slightly different from other @require_xxx: it doesn't add a @unittest.skipUnless. On my machine these tests were not being run, and I can't find them on CI either (but perhaps I'm not looking in the right place)

"accelerate>=0.26.0",
"av",
"beautifulsoup4",
"blobfile",
"codecarbon>=2.8.1",
"cookiecutter==1.7.3",
"dataclasses",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dataclasses is part of the standard library since python 3.7: https://peps.python.org/pep-0557/

"datasets!=2.5.0",
"deepspeed>=0.9.3",
"diffusers",
"dill<0.3.5",
Copy link
Member Author

@gante gante Feb 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dill was pinned to sort example tests (#17368) -- removing to test whether it is still needed

"evaluate>=0.2.0",
"faiss-cpu",
"fastapi",
"filelock",
"flax>=0.4.1,<=0.7.0",
"fsspec<2023.10.0",
Copy link
Member Author

@gante gante Feb 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fsspec was pinned to fix CI (#27010) -- removing to test whether it is still needed

"ftfy",
"fugashi>=1.0",
"GitPython<3.1.19",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pin on GitPython was added to fix CI (#12858) -- removing to test whether it is still needed

@@ -130,11 +108,10 @@
"keras>2.9,<2.16",
"keras-nlp>=0.3.1,<0.14.0", # keras-nlp 0.14 doesn't support keras 2, see pin on keras.
"librosa",
"natten>=0.14.6,<0.15.0",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pin on GitPython was added to fix CI (#28432) -- removing to test whether it is still needed

"nltk<=3.8.1",
"num2words",
"numpy>=1.17",
"onnxconverter-common",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

onnxconverter-common (library name) nor onnxconverter_common (corresponding import) are present in our library

"nltk<=3.8.1",
"num2words",
"numpy>=1.17",
"onnxconverter-common",
"onnxruntime-tools>=1.4.2",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

"protobuf",
"psutil",
"pyyaml>=5.1",
"pydantic",
"pytest>=7.2.0,<8.0.0",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pin on pytest was added to fix CI (#28758) -- removing to test whether it is still needed

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is for doctest ... and I kinda think we have issues running doctests on circleci for some time.
But I think it's time for us to move doctest from circleci anyway, so ok for me here.

@@ -158,7 +135,6 @@
"requests",
"rhoknp>=1.1.0,<1.3.1",
"rjieba",
"rouge-score!=0.0.7,!=0.0.8,!=0.1,!=0.1.1",
Copy link
Member Author

@gante gante Feb 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no references to rouge-score (library name) nor rouge_score (import name) in our library

It was pinned at some point (#18247), but the actual dependency predates this PR

Comment on lines -196 to -197
"libcst",
"rich",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(moved, out of alphabetical order)

@@ -192,9 +169,6 @@
"unidic_lite>=1.0.7",
"urllib3<2.0.0",
"uvicorn",
"pytest-rich",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no references to pytest-rich (library name) nor --rich (the flag it enables) in our library

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe it's there in case someone want to use this pytest feature (that's why it is not seen in our library)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was added in this PR and, from the commit messages, to help with debugging halfway through it.

But it is not used anywhere in the library, and we should try to minimize dependencies (source of issues) 👀

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@gante gante changed the title Update deprecated/unused dependencies Update deprecated/unused dependencies 🧹 🧹 Feb 26, 2025
@@ -67,7 +66,6 @@ def check_forward(test_module, model, batch_size=1, context_size=1024):
test_module.assertEqual(out.shape[1], context_size)


@require_torchao
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

require_torchao_version_greater_or_equal implies require_torchao

optimizing memory utilization, speeding up the training, or both. If you'd like to understand how GPU is utilized during
training, please refer to the [Model training anatomy](model_memory_anatomy) conceptual guide first. This guide
focuses on practical techniques.
This guide demonstrates practical techniques that you can use to increase the efficiency of your model's training by
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(sorry, my IDE clears trailing whitespeces 🙈 the actual diff is L392-393)

Comment on lines -392 to -393
* `dynamo.optimize("nvfuser")` - nvFuser with TorchScript. [Read more](https://dev-discuss.pytorch.org/t/tracing-with-primitives-update-1-nvfuser-and-its-primitives/593)
* `dynamo.optimize("aot_nvfuser")` - nvFuser with AotAutograd. [Read more](https://dev-discuss.pytorch.org/t/tracing-with-primitives-update-1-nvfuser-and-its-primitives/593)
Copy link
Member Author

@gante gante Feb 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nvfuser was removed: pytorch/pytorch#105789

Comment on lines -319 to -320
* `dynamo.optimize("nvfuser")` - nvFuser with TorchScriptを使用します。 [詳細はこちら](https://dev-discuss.pytorch.org/t/tracing-with-primitives-update-1-nvfuser-and-its-primitives/593)
* `dynamo.optimize("aot_nvfuser")` - nvFuser with AotAutogradを使用します。 [詳細はこちら](https://dev-discuss.pytorch.org/t/tracing-with-primitives-update-1-nvfuser-and-its-primitives/593)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nvfuser was removed: pytorch/pytorch#105789

@@ -733,7 +733,7 @@ class TrainingArguments:
distributed training. Important: this will negatively impact the performance, so only use it for debugging.
torchdynamo (`str`, *optional*):
If set, the backend compiler for TorchDynamo. Possible choices are `"eager"`, `"aot_eager"`, `"inductor"`,
`"nvfuser"`, `"aot_nvfuser"`, `"aot_cudagraphs"`, `"ofi"`, `"fx2trt"`, `"onnxrt"` and `"ipex"`.
Copy link
Member Author

@gante gante Feb 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nvfuser was removed: pytorch/pytorch#105789

(note that this does not remove support, only removes it from the docs -- this feature was deprecated in torch 2.1 and removed in torch 2.2)

@@ -3990,91 +3988,12 @@ def test_torchdynamo_full_eval(self):
del trainer
torchdynamo.reset()

# 3. TorchDynamo nvfuser
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nvfuser was removed: pytorch/pytorch#105789

@unittest.skip(reason="torch 2.0.0 gives `ModuleNotFoundError: No module named 'torchdynamo'`.")
@require_torch_non_multi_gpu
@require_torchdynamo
def test_torchdynamo_memory(self):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test compares nvfuser to eager, nvfuser was removed: pytorch/pytorch#105789

(the test was also being skipped 👀 )

@@ -84,7 +84,7 @@ def __post_init__(self):
else:
# BIG HACK WILL REMOVE ONCE FETCHER IS UPDATED
print(os.environ.get("GIT_COMMIT_MESSAGE"))
if "[build-ci-image]" in os.environ.get("GIT_COMMIT_MESSAGE", "") or os.environ.get("GIT_COMMIT_MESSAGE", "") == "dev-ci":
if "[build-ci-image]" in os.environ.get("GIT_COMMIT_MESSAGE", "") or "dev-ci" in os.environ.get("GIT_COMMIT_MESSAGE", ""):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we can do [test-all] dev-ci 🤗

@@ -431,7 +395,7 @@ def run(self):
deps["numpy"],
deps["packaging"], # utilities from PyPA to e.g., compare versions
deps["pyyaml"], # used for the model cards metadata
deps["regex"], # for OpenAI GPT
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inaccurate comment, it is now used in more places

@gante gante marked this pull request as ready for review February 26, 2025 18:04
@gante gante requested a review from ydshieh February 26, 2025 18:04
Copy link
Collaborator

@ydshieh ydshieh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand it's nice to cleanup the codebase. From previous experience, sometimes other libraries (or just some codes/scripts) use these attributes/function too, and removing them will break, and I had to put it back again.

We could go with this PR and see if we get some reports, and think of a better way to deal with these old stuff in a better way that won't break things.

WDYT, @ArthurZucker @LysandreJik ?

"protobuf",
"psutil",
"pyyaml>=5.1",
"pydantic",
"pytest>=7.2.0,<8.0.0",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is for doctest ... and I kinda think we have issues running doctests on circleci for some time.
But I think it's time for us to move doctest from circleci anyway, so ok for me here.

@@ -192,9 +169,6 @@
"unidic_lite>=1.0.7",
"urllib3<2.0.0",
"uvicorn",
"pytest-rich",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe it's there in case someone want to use this pytest feature (that's why it is not seen in our library)

@@ -833,6 +775,7 @@ def is_psutil_available():


def is_py3nvml_available():
logger.warning_once("`is_py3nvml_available` is deprecated and will be removed in v4.51.")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this is kept here specially ..?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is unused, but is present in the main init 👀

The other ones are not in the main init. I can remove this one, or add the deprecation warning to the other ones, if you prefer 🤗

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. I see is_torch_tpu_available is also in main init file and remove in this PR. Is this expected (considering your message above).

@gante
Copy link
Member Author

gante commented Feb 27, 2025

@ydshieh PR comments addressed/answered 🤗

The main motivation behind this PR is to attempt to contain dependency bloat: we frequently add dependencies, but we rarely review them. For instance, maximum version pins to fix transient CI issues may cause conflicts with other packages.

By keeping our codebase leaner where we can, without breaking BC, it becomes easier to read and maintain 🙌

@ydshieh
Copy link
Collaborator

ydshieh commented Feb 27, 2025

@ydshieh PR comments addressed/answered 🤗

The main motivation behind this PR is to attempt to contain dependency bloat: we frequently add dependencies, but we rarely review them. For instance, maximum version pins to fix transient CI issues may cause conflicts with other packages.

By keeping our codebase leaner where we can, without breaking BC, it becomes easier to read and maintain 🙌

Yes I understand. I would still want to know the core maintainers' opinion about removing those functions , as even not used in transformers they might be still used in external codebases, something like #35734

@gante
Copy link
Member Author

gante commented Feb 28, 2025

@ydshieh 😭 I see your point

Would you be okay with a deprecation warning instead (on the removed functions that could be imported, even if they are not a top-level import)? That would give time for downstream libraries to stop using them, while giving us margin to keep the library clean 🤗

@ydshieh
Copy link
Collaborator

ydshieh commented Mar 4, 2025

Would you be okay with a deprecation warning instead (on the removed functions that could be imported, even if they are not a top-level import)?

Yes, whenever that could be applied 🙏 .

I also have a small suggestion: the changes in setup.py could be separated from the changes of removing unused function/variables (for which now we decide to add warnings first but still keep them for some time).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants