@briango28
Contributor

Small fix for Issue 1522

Implements the same compute_output_spec() method for BytePairTokenizer, WordPieceTokenizer, and SentencePieceTokenizer.

@briango28
Contributor Author

The previous version used keras.KerasTensor, which apparently does not exist in Keras 2.
Updated to use keras.Input instead.

@briango28
Contributor Author

Ran format.sh.
I was working behind a MITM proxy without a proper Linux environment and had to resort to manual copying, which turned out to be rather unwieldy.
Hopefully the tests will pass now.

@mattdangerw self-requested a review March 29, 2024 00:01
Member

@mattdangerw left a comment

Thanks! Just a few comments.

@briango28 closed this Mar 29, 2024
@briango28 reopened this Mar 29, 2024
@briango28
Contributor Author

Applied the changes discussed above. The function now looks like this:

class TokenizerWithVocabulary:
    def compute_output_spec(self, input_spec) -> keras.KerasTensor:
        return keras.KerasTensor(
            input_spec.shape + (self.sequence_length,), dtype=self.compute_dtype
        )
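For readers following the shape logic, here is a minimal sketch of what the snippet above computes. It runs without Keras: `SymbolicSpec` and `ToyTokenizer` are hypothetical stand-ins invented for illustration (they are not keras or keras-nlp classes), and the sketch assumes the input spec exposes its shape as a tuple.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SymbolicSpec:
    """Hypothetical stand-in for keras.KerasTensor: shape and dtype only, no data."""

    shape: Tuple[Optional[int], ...]
    dtype: str = "string"


class ToyTokenizer:
    """Hypothetical tokenizer mirroring the snippet above (not the keras-nlp class)."""

    def __init__(self, sequence_length: int, compute_dtype: str = "int32"):
        self.sequence_length = sequence_length
        self.compute_dtype = compute_dtype

    def compute_output_spec(self, input_spec: SymbolicSpec) -> SymbolicSpec:
        # Append one token axis of length sequence_length to the input shape.
        return SymbolicSpec(
            tuple(input_spec.shape) + (self.sequence_length,),
            dtype=self.compute_dtype,
        )


tokenizer = ToyTokenizer(sequence_length=8)

# A batch of strings with unknown batch size has shape (None,).
batch_spec = tokenizer.compute_output_spec(SymbolicSpec(shape=(None,)))

# A single scalar string has shape ().
scalar_spec = tokenizer.compute_output_spec(SymbolicSpec(shape=()))

print(batch_spec.shape, batch_spec.dtype)  # (None, 8) int32
print(scalar_spec.shape)  # (8,)
```

The point of the method is that only shape and dtype are needed, so a symbolic input can be traced through the tokenizer without running tokenization on real strings.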

@mattdangerw added the kokoro:force-run (Runs Tests on GPU) label Mar 29, 2024
@kokoro-team removed the kokoro:force-run (Runs Tests on GPU) label Mar 29, 2024
@mattdangerw
Member

Thank you!

@mattdangerw merged commit 5341426 into keras-team:master Mar 29, 2024
abuelnasr0 pushed a commit to abuelnasr0/keras-nlp that referenced this pull request Apr 2, 2024
…s-team#1523)

* Implement compute_output_spec() for tokenizers with vocabulary. (restarted from new point in master branch)

* Remove type annotation from compute_output_spec() in tokenizers