
Implement compute_output_spec() for tokenizers with vocabulary. #1523

Merged: 2 commits merged into keras-team:master on Mar 29, 2024.


briango28 (Contributor) commented:

Small fix for issue #1522.

Implements the same compute_output_spec() method for BytePairTokenizer, WordPieceTokenizer, and SentencePieceTokenizer.

briango28 (Contributor, Author) commented:

The previous version used keras.KerasTensor, which apparently does not exist in Keras 2.
Updated it to use keras.Input instead.

briango28 (Contributor, Author) commented:

Ran format.sh.
I was working behind a MITM proxy without a proper Linux environment and had to copy changes over manually, which turned out to be rather unwieldy.
Hopefully the tests will pass now.

mattdangerw self-requested a review on March 29, 2024 00:01.

mattdangerw (Member) left a comment:


Thanks! Just a few comments.

Two review threads on keras_nlp/tokenizers/byte_pair_tokenizer.py (both now outdated and resolved).
briango28 closed this on Mar 29, 2024.
briango28 reopened this on Mar 29, 2024.
briango28 (Contributor, Author) commented:

Applied the changes from the discussion above. The method now looks like this:

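```python
import keras  # Assumes Keras 3, which provides keras.KerasTensor.


class TokenizerWithVocabulary:  # Placeholder for the three tokenizers above.
    def compute_output_spec(self, input_spec) -> keras.KerasTensor:
        # Keep the input shape and append a token axis of length sequence_length.
        return keras.KerasTensor(
            input_spec.shape + (self.sequence_length,), dtype=self.compute_dtype
        )
```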
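To make the shape rule concrete, here is a dependency-free sketch of the arithmetic the method performs: the output spec keeps the input's shape and appends one trailing token axis of length sequence_length. FakeSpec and FakeTokenizer are hypothetical stand-ins for keras.KerasTensor and the tokenizer classes, used only so the example runs without Keras; the "int32" default dtype is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class FakeSpec:
    """Hypothetical stand-in for keras.KerasTensor: only a shape and a dtype."""
    shape: Tuple[Optional[int], ...]
    dtype: str


class FakeTokenizer:
    """Hypothetical stand-in for a tokenizer with a vocabulary."""

    def __init__(self, sequence_length=None, compute_dtype="int32"):
        self.sequence_length = sequence_length
        self.compute_dtype = compute_dtype  # assumed default, for illustration

    def compute_output_spec(self, input_spec):
        # Same rule as the method in this PR: append one token axis.
        return FakeSpec(
            input_spec.shape + (self.sequence_length,), dtype=self.compute_dtype
        )


# A batch of raw strings with an unknown batch size.
strings = FakeSpec(shape=(None,), dtype="string")

dense = FakeTokenizer(sequence_length=128).compute_output_spec(strings)
print(dense.shape)  # (None, 128)

ragged = FakeTokenizer().compute_output_spec(strings)
print(ragged.shape)  # (None, None)
```

With sequence_length left as None, the trailing axis is unknown, which matches the variable-length (ragged) output these tokenizers produce when no fixed sequence length is set.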

mattdangerw added the kokoro:force-run (Runs Tests on GPU) label on Mar 29, 2024.
kokoro-team removed the kokoro:force-run (Runs Tests on GPU) label on Mar 29, 2024.
mattdangerw (Member) commented:

Thank you!

mattdangerw merged commit 5341426 into keras-team:master on Mar 29, 2024.
10 checks passed
abuelnasr0 pushed a commit to abuelnasr0/keras-nlp that referenced this pull request on Apr 2, 2024:
…s-team#1523)

* Implement compute_output_spec() for tokenizers with vocabulary. (restarted from new point in master branch)

* Remove type annotation from compute_output_spec() in tokenizers
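The commit message does not say why the annotation was dropped. One plausible reason, given the Keras 2 problem mentioned earlier in the thread, is that Python before 3.14 (which defers annotation evaluation) evaluates a return annotation when the def statement runs. Annotating with keras.KerasTensor would then raise AttributeError as soon as the class is defined under a Keras version lacking that name, even if the method is never called. The sketch below uses a types.SimpleNamespace as a hypothetical stand-in for such a Keras 2 module; define_tokenizer and defines_cleanly are illustrative helpers, not part of keras-nlp.

```python
import sys
import types

# Hypothetical stand-in for a Keras 2 module: it has Input but no KerasTensor.
keras2 = types.SimpleNamespace(Input=object)


def define_tokenizer(keras, annotate):
    """Define a toy tokenizer class against the given keras-like module."""
    if annotate:
        class Tokenizer:
            # Before Python 3.14 this annotation is evaluated when the def
            # statement runs, i.e. when the class is defined (import time).
            def compute_output_spec(self, input_spec) -> keras.KerasTensor:
                return keras.KerasTensor(input_spec.shape)
    else:
        class Tokenizer:
            # Without the annotation, keras.KerasTensor is only looked up
            # when the method is actually called.
            def compute_output_spec(self, input_spec):
                return keras.KerasTensor(input_spec.shape)
    return Tokenizer


def defines_cleanly(annotate):
    """Return True if the class definition itself succeeds."""
    try:
        define_tokenizer(keras2, annotate)
        return True
    except AttributeError:
        return False


print(defines_cleanly(annotate=False))  # True
print(defines_cleanly(annotate=True))   # False on Python < 3.14
```

Dropping the annotation (or adding from __future__ import annotations) postpones the lookup of keras.KerasTensor until the method actually runs.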
3 participants