Exception: Custom Normalizer cannot be serialized #1361

Closed
shivanraptor opened this issue Oct 9, 2023 · 1 comment

@shivanraptor

I took the code from here, and when I tried to save the trained tokenizer, it raised:

Exception: Custom Normalizer cannot be serialized

How can I resolve this exception?

The custom normalizer is as follows:

from tokenizers import NormalizedString, Regex


class CustomNormalizer:
    def normalize(self, normalized: NormalizedString):
        # Most of this can be replaced by a `Sequence` combining some provided Normalizers,
        # (i.e. Sequence([NFKC(), Replace(Regex(r"\s+"), " "), Lowercase()]))
        # and that should be the preferred way. That being said, here is an example of the
        # kind of thing that can be done here:
        try:
            if normalized is None:
                normalized = NormalizedString("")
            else:
                normalized.nfkc()
                normalized.filter(lambda char: not char.isnumeric())
                normalized.replace(Regex(r"\s+"), " ")
                normalized.lowercase()
        except TypeError as te:
            print("CustomNormalizer TypeError:", te)
            print(normalized)

And the custom Tokenizer is as follows:

from tokenizers import Tokenizer, models, trainers
from tokenizers.normalizers import Normalizer

model = models.WordPiece(unk_token="[UNK]")
tokenizer = Tokenizer(model)
tokenizer.normalizer = Normalizer.custom(CustomNormalizer())
trainer = trainers.WordPieceTrainer(
    vocab_size=2500,
    special_tokens=special_tokens,
    show_progress=True,
)
tokenizer.train_from_iterator(get_training_corpus(), trainer=trainer, length=len(dataset))

# Save the tokenizer
tokenizer.save('saved.json')  # this line raises the Exception
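
For context: a normalizer wrapped with Normalizer.custom() is arbitrary Python and cannot be written into the tokenizer JSON, which only stores the library's built-in components, hence the exception on save. A minimal sketch of the workaround hinted at in the comment inside CustomNormalizer above, assuming the built-in normalizers cover what it does (NFKC, dropping digits via Replace, collapsing whitespace, lowercasing):

from tokenizers import Regex, normalizers
from tokenizers.normalizers import NFKC, Replace, Lowercase

# A serializable equivalent built only from provided normalizers.
# Note: this is an approximation; Python's str.isnumeric() also matches
# some non-digit numeric characters that the \d pattern below does not.
tokenizer.normalizer = normalizers.Sequence([
    NFKC(),
    Replace(Regex(r"\d"), ""),    # drop digit characters
    Replace(Regex(r"\s+"), " "),  # collapse whitespace runs
    Lowercase(),
])

tokenizer.save('saved.json')  # serializes cleanly
tokenizer = Tokenizer.from_file('saved.json')  # and loads back without the custom class

If the custom Python logic is genuinely needed, it has to be re-attached with Normalizer.custom(CustomNormalizer()) after every load, since it is never stored in the file.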
@shivanraptor
Author

Similar case to #581
