Release v2.0.3: Improvements to tokenizer caching and serialization, plus various bug fixes · explosion/spaCy

✨ New features and improvements

Fix issue #1248: Update English tokenizer and norm exceptions for "-in" and "-in'" verbs.
Fix issue #1506: Fix KeyError from cleaning up strings during Language.pipe (work in progress).
Fix issue #1521: Ensure path in Doc.to_disk and Doc.from_disk.
Fix issues #1525, #1582: Update fastText example to accommodate whitespace.
Fix issue #1541: Remove broken link from documentation.
Fix issue #1546: Add missing import to make util.minibatch work correctly.
Fix issue #1557: Add dummy serialization methods to Japanese tokenizer to allow saving and loading models.
Fix caching in Tokenizer (partially addresses performance regression in #1371 and #1508).
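
For readers unfamiliar with the helper behind #1546, here is a minimal sketch of the kind of batching that util.minibatch provides: splitting a stream of items into fixed-size lists. This is a rough stand-in written with the standard library only, not spaCy's own implementation. The name minibatch_sketch and the default size of 8 are assumptions made for illustration.

```python
from itertools import islice


def minibatch_sketch(items, size=8):
    """Yield successive lists of up to `size` items from any iterable.

    Hypothetical helper for illustration; not spaCy's util.minibatch.
    """
    iterator = iter(items)
    while True:
        # Pull at most `size` items; an empty batch means the input is exhausted.
        batch = list(islice(iterator, int(size)))
        if not batch:
            break
        yield batch


# Example: five texts grouped into batches of two (the last batch is shorter).
texts = ["doc %d" % i for i in range(5)]
batches = list(minibatch_sketch(texts, size=2))
```

Because the helper consumes an iterator lazily, it works on generators as well as lists, which is why batching helpers like this are commonly paired with streaming calls such as Language.pipe.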

Thanks to @MathiasDesch, @mcsalgado, @Wahib, @ligser, @abhi18av, @DuyguA, @KMLDS and @yogendrasoni for the pull requests and contributions.