Load exceptions last in Tokenizer.from_bytes (explosion#12553)
In `Tokenizer.from_bytes`, the exceptions should be loaded last so that they are processed only once while the model is loaded. In the background, the exceptions are tokenized as phrase matcher patterns, and this internal tokenization has to stay in sync with all the remaining tokenizer settings. If the exceptions are not loaded last, the caches are reloaded more often than necessary during deserialization, so loading exceptions through `Tokenizer.from_bytes`/`Tokenizer.from_disk` is slower than adding them with `Tokenizer.add_special_case`.
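To see why load order matters, here is a minimal Python sketch. It assumes a simplified model, not spaCy's actual code: every time a setting changes, the already-loaded exceptions are re-processed. `ToyTokenizer`, `load`, and the `rules_last` flag are hypothetical names used only for illustration. The setting keys echo tokenizer attribute names, but here they are plain dictionary entries, and the expensive re-tokenization is replaced by a counter.

```python
from typing import Any, Dict, List


class ToyTokenizer:
    """Hypothetical tokenizer (not spaCy's): every setting change
    re-processes the special-case exceptions that are already loaded."""

    def __init__(self) -> None:
        self.settings: Dict[str, Any] = {}
        self.rules: Dict[str, List[str]] = {}
        self.reloads = 0  # how many times the exceptions were re-processed

    def _reload_exceptions(self) -> None:
        # Stand-in for the expensive step (re-tokenizing every exception).
        # If no exceptions are loaded yet, there is nothing to re-process.
        if self.rules:
            self.reloads += 1

    def set_setting(self, key: str, value: Any) -> None:
        self.settings[key] = value
        self._reload_exceptions()

    def set_rules(self, rules: Dict[str, List[str]]) -> None:
        self.rules = dict(rules)
        self._reload_exceptions()


def load(data: Dict[str, Any], rules_last: bool) -> ToyTokenizer:
    """Deserialize `data`, putting the "rules" key first or last in the load order."""
    tok = ToyTokenizer()
    keys = [k for k in data if k != "rules"]
    order = keys + ["rules"] if rules_last else ["rules"] + keys
    for key in order:
        if key == "rules":
            tok.set_rules(data["rules"])
        else:
            tok.set_setting(key, data[key])
    return tok


data = {
    "prefix_search": r"^\(",
    "suffix_search": r"\)$",
    "infix_finditer": r"-",
    "rules": {"don't": ["do", "n't"]},
}

print(load(data, rules_last=False).reloads)  # 4: once for rules, then once per setting
print(load(data, rules_last=True).reloads)   # 1: exceptions processed only once
```

In this toy model, loading the exceptions first means every later setting triggers another pass over them. Loading them last means they are processed exactly once, against the final settings.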