@sergiopaniego

What does this PR do?

Removes tokenizer/processor creation from the example scripts where possible. The idea comes from this comment.

Some trainers still require the tokenizer/processor to be passed directly, so the scripts for those trainers are not updated here.
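
For readers unfamiliar with the pattern, here is a minimal, self-contained sketch of the idea. It does not use TRL: `ToyTrainer`, `load_processing_class`, and the model IDs are hypothetical stand-ins, and the sketch assumes (as the description implies) that a trainer can build the tokenizer/processor from the model identifier when none is passed.

```python
from typing import Callable, Optional


def load_processing_class(model_id: str) -> dict:
    """Hypothetical loader standing in for a tokenizer/processor factory."""
    return {"loaded_from": model_id}


class ToyTrainer:
    """Hypothetical trainer used only for illustration; not TRL's implementation."""

    def __init__(
        self,
        model: str,
        processing_class: Optional[dict] = None,
        loader: Callable[[str], dict] = load_processing_class,
    ):
        # Fall back to loading from the model ID when nothing is passed,
        # so calling scripts do not have to create the tokenizer themselves.
        if processing_class is None:
            processing_class = loader(model)
        self.model = model
        self.processing_class = processing_class


# A script after this kind of change: no explicit tokenizer creation.
trainer = ToyTrainer(model="org/some-model")

# A script that still needs a specific tokenizer passes it explicitly.
custom = {"loaded_from": "org/other-tokenizer"}
trainer_custom = ToyTrainer(model="org/some-model", processing_class=custom)
```

In this sketch the first call ends up with a processing class derived from the model ID, while the second keeps the object it was given, which mirrors the two cases described above.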

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@qgallouedec qgallouedec left a comment

much better, thanks!

@sergiopaniego sergiopaniego merged commit ae6837f into main Oct 6, 2025
3 of 12 checks passed
@sergiopaniego sergiopaniego deleted the scripts-no-autotokenizers branch October 6, 2025 16:40
qgallouedec added a commit that referenced this pull request Oct 6, 2025
commit ae6837f
Author: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
Date:   Mon Oct 6 18:40:18 2025 +0200

    Removed tokenizer/processor creation from example scripts (#4211)

commit 56a8f11
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Mon Oct 6 17:45:44 2025 +0200

    Replace setup with pyproject and fix packaging unintended modules (#4194)

commit 5291015
Author: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
Date:   Mon Oct 6 16:04:06 2025 +0200

    Remove `Optional` from `processing_class` in `PPOTrainer` (#4212)

commit 0588b1f
Author: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
Date:   Mon Oct 6 15:57:17 2025 +0200

    Updated vLLM integration guide (#4162)

    Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

commit 45ee98b
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Mon Oct 6 11:14:54 2025 +0200

    Replace unittest with pytest (#4188)

commit 3800a6e
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Mon Oct 6 11:13:21 2025 +0200

    Hotfix: Exclude transformers 4.57.0 for Python 3.9 (#4209)

    Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>

commit 7ad9ce8
Author: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
Date:   Mon Oct 6 11:04:20 2025 +0200

    Remove tokenizer creation from `sft` example script (#4197)

commit 0c2dc14
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Mon Oct 6 08:31:58 2025 +0200

    Remove custome_container for building the docs (#4198)

commit ced8b33
Author: burtenshaw <ben.burtenshaw@gmail.com>
Date:   Mon Oct 6 08:23:11 2025 +0200

    [DOCS/FIX] lora without regrets - fix lr (#4207)
qgallouedec added a commit that referenced this pull request Oct 6, 2025
commit 65eb45c
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date:   Mon Oct 6 13:07:18 2025 -0600

    Apply style and revert change in `sft_video_llm` example (#4214)
