Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support loading tokenizer from local folder #76 #81

Conversation

vinu-vanjari
Copy link
Contributor

Resolves #76

  • Added new method in AutoTokenizer to load from local model folder.

  • Also added supporting unit tests at Tests/TokenizersTests/FactoryTests.swift

  • Although we could have simply loaded tokenizer_config.json and tokenizer.json files directly into configurations, I chose to go via LanguageModelConfigurationFromHub because it has it's own magic for choosing appropriate tokenizer class and it also selects fallback tokenizer if needed refer getter var tokenizerConfig: Config?

Copy link
Member

@pcuenca pcuenca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice, thanks a lot for working on this! 🙌

This is in very good shape, I only have a couple of suggestions and we can merge.

Maybe we should rename LanguageModelConfigurationFromHub in a future PR.

Sources/Hub/Hub.swift Show resolved Hide resolved
Comment on lines 100 to 101
/// Assumes the file has already present at local url.
/// `url` is complete local file path for given model
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Assumes the file has already present at local url.
/// `url` is complete local file path for given model
/// Assumes the file is already present at local url.
/// `url` is a complete local file path for the given model


/// Assumes the file has already present at local url.
/// `url` is complete local file path for given model
func configuration(url: URL) throws -> Config {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe call the parameter fileURL to make it clear?

@vinu-vanjari vinu-vanjari force-pushed the support-loading-tokenizer-from-local-folder branch from 08d4b56 to 800353c Compare March 25, 2024 17:48
@vinu-vanjari vinu-vanjari requested a review from pcuenca March 25, 2024 17:49
Copy link
Member

@pcuenca pcuenca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! I'll try to trigger a CI run before we merge.

@pcuenca
Copy link
Member

pcuenca commented Mar 25, 2024

Something's up with the CI, but the tests pass locally so I'll merge. Thanks a lot @vinu-vanjari!

@pcuenca pcuenca merged commit 18b62e5 into huggingface:main Mar 25, 2024
1 check failed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support loading tokenizer from local folder
2 participants