Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

move tokenizer model to third party llama2 #27

Merged
merged 1 commit into from
Apr 3, 2024

Conversation

FanhaiLu1
Copy link
Contributor

Move tokenizer model to third party llama2

Copy link
Collaborator

@patemotter patemotter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM. Would some of these tests break if we were to change the tokenizer being used?

@@ -48,7 +48,7 @@ def decode(self, t: int) -> str:

class TokenUtilsTest(unittest.TestCase):
def setup(self):
tokenizer_path = "tokenizer.model"
tokenizer_path = "third_party/llama2/tokenizer.model"
current_dir = os.path.dirname(__file__)
tokenizer_path = os.path.join(current_dir, tokenizer_path)
print(f"model_path: {tokenizer_path}")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit outside of this change: Should probably be renamed to tokenizer_path.

Also typos in comments above "Tokenier" -> "Tokenizer"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Pate! there could be more data set from llama2. Keep it as "third_party/llama2"

@FanhaiLu1
Copy link
Contributor Author

Overall LGTM. Would some of these tests break if we were to change the tokenizer being used?

The test_underscore_in_output test is specific for a under score token id, it may fail if tokenizer changed. All other 3 test are tokenizer independent

@FanhaiLu1 FanhaiLu1 merged commit 426c915 into AI-Hypercomputer:main Apr 3, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants