Releases: gandersen101/spaczz
v0.6.1 Regex[Searcher/Matcher] Bugfix
What’s Changed
🪲 Fixes
- Updating readthedocs config (#86) @gandersen101
- Partial regex matcher doesn't work if the found token has index 0 (#82) @adinowi
🚨 Testing
- Adding Test for Partial Regex Search at 0 Index (#85) @gandersen101
- Updating Dependencies to Test Against (#83) @gandersen101
👷 Continuous Integration
- Updating GH action versions (#84) @gandersen101
- Updating Dependencies to Test Against (#83) @gandersen101
📚 Documentation
- Updating readthedocs config (#86) @gandersen101
v0.6.0 Returning Patterns, Consistency and Support Updates
- All matchers now return the matching pattern. This is a breaking change: matches are now tuples of length 5 instead of 4.
- Regex and token matches now return match ratios.
- Support for `python<=3.11,>=3.7`, along with `rapidfuzz>=1.0.0`.
- Dropped support for spaCy v2. Sorry to do this without a deprecation cycle, but I stepped away from this project for a long time.
- Removed support for `"spaczz_"`-prepended optional `SpaczzRuler` init arguments. Also, sorry to do this without a deprecation cycle.
- `Matcher.pipe` methods, which were deprecated, are now removed.
- `spaczz_span` custom attribute, which was deprecated, is now removed.
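Because matches are now 5-tuples, code that unpacks four values per match will raise a `ValueError`. The following is a minimal sketch of the new unpacking, using hand-written tuples in place of real matcher output. The field order `(match_id, start, end, ratio, pattern)` is an assumption based on the pattern being the newly added element, and the sample values are made up for illustration.

```python
# Illustrative stand-ins for matcher output; the values are made up.
# Assumed layout: (match_id, start, end, ratio, pattern).
matches = [
    ("NAME", 0, 3, 80, "Grant Andersen"),
    ("GPE", 16, 17, 82, "Nashville"),
]


def describe(match):
    """Unpack one 5-tuple match into a readable summary string."""
    match_id, start, end, ratio, pattern = match
    return f"{match_id} tokens[{start}:{end}] ratio={ratio} pattern={pattern!r}"


summaries = [describe(m) for m in matches]
for line in summaries:
    print(line)
```

Loops written for earlier releases as `for match_id, start, end, ratio in matches` would need the fifth name added.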
v0.5.4 RegexSearcher Bugfix
What’s Changed
- BugFix for german Combination words for RegexSearcher (#66) @JonasHablitzel
- Including flake8 plugins in pre-commit (#63) @gandersen101
📚 Documentation
- Updating available fuzzyfuncs in docs (#62) @gandersen101
v0.5.3 Bugfix: TokenMatcher Match Order
- Fixed a "bug" in the `TokenMatcher`. Spaczz expects token matches returned in order of ascending match start, then descending match length. However, spaCy's `Matcher` does not return matches in this order by default. Added a sort in the `TokenMatcher` to ensure this.
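The ordering described above (ascending match start, then descending match length) can be sketched with a plain sort key. This is an illustration only, not the code in spaczz: `sort_matches` is a hypothetical helper, and the `(label, start, end)` tuples are simplified stand-ins for real matches.

```python
# Hypothetical matches as (label, start, end) tuples.
# Real spaczz matches carry more fields than this.
def sort_matches(matches):
    """Order by ascending start, then by descending length (end - start)."""
    return sorted(matches, key=lambda m: (m[1], -(m[2] - m[1])))


unsorted_matches = [("B", 4, 6), ("A", 0, 2), ("A", 0, 5), ("C", 4, 9)]
ordered = sort_matches(unsorted_matches)
print(ordered)
```

Negating the length in the key lets a single ascending sort put the longest match first among matches that share a start.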
v0.5.2 CI/Dev Updates
- Minor updates to pre-commits and noxfile.
v0.5.1 Dependency and Typing Updates
- Minor updates to allowed dependency versions and CI.
- Switched back to using typing types instead of generic types because spaCy v3 uses Pydantic and Pydantic does not support generic types in Python < 3.9. I don't know if this would actually cause any issues but I am playing it safe. Potentially more changes for spaczz to play nicely with Pydantic to follow.
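For readers unfamiliar with the distinction, here is a small sketch of the `typing`-style annotations this refers to. `count_labels` is a made-up function, not part of spaczz. The PEP 585 equivalent would be `list[str]` and `dict[str, int]`, which cannot be evaluated at runtime before Python 3.9.

```python
from typing import Dict, List


def count_labels(labels: List[str]) -> Dict[str, int]:
    """Count how often each label occurs, annotated with typing types."""
    counts: Dict[str, int] = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return counts


result = count_labels(["NAME", "GPE", "NAME"])
print(result)
```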
v0.5.0 spaCy v3 Support
What’s Changed
🚀 Features
- Enhancement spacy3 support (#52) @gandersen101
- Support for spaCy v3.
- If using spaCy v3, the `SpaczzRuler` optional arguments no longer need to be prepended with `"spaczz_"`. This will still work in most cases, offering some backwards compatibility. However, optional arguments prepended with `"spaczz_"` will not work with spaCy v3's new `spacy.load` and `nlp.add_pipe` config-driven APIs. It is therefore recommended that users move away from the prepended versions if using spaCy v3. Note, however, that the prepended arguments are still necessary if using spaczz with spaCy v2.
- `Matcher.pipe` methods are now deprecated in accordance with spaCy v3.
- `spaczz_span` custom attribute is deprecated in favor of `spaczz_ent`. They both have the same functionality but the `spaczz_ent` name makes more sense.
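When moving existing settings away from the prepended names, the renaming itself is mechanical. The sketch below uses a hypothetical helper, `strip_spaczz_prefix`, that is not part of spaczz. The option names and values in `old_config` are placeholders chosen for illustration.

```python
PREFIX = "spaczz_"


def strip_spaczz_prefix(config):
    """Return a copy of config with a leading 'spaczz_' removed from each key."""
    cleaned = {}
    for key, value in config.items():
        new_key = key[len(PREFIX):] if key.startswith(PREFIX) else key
        if new_key in cleaned:
            # Guard against e.g. both "spaczz_x" and "x" being present.
            raise ValueError(f"duplicate option after renaming: {new_key}")
        cleaned[new_key] = value
    return cleaned


# Placeholder options; the empty dict stands in for real defaults.
old_config = {"spaczz_token_defaults": {}, "overwrite_ents": True}
new_config = strip_spaczz_prefix(old_config)
print(new_config)
```

The resulting dictionary is the kind of value one would expect to pass as a component config under spaCy v3, though the exact keys accepted should be checked against the spaczz documentation.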
v0.4.2 SpaczzRuler Bug Fixes
- Fixed a bug where TokenMatcher callbacks did nothing.
- Fixed a bug where spaczz_token_defaults in the SpaczzRuler did nothing.
- Fixed a bug where defaults would not be added to their respective matchers when loading from bytes/disk in the SpaczzRuler.
- Fixed some inconsistencies in the SpaczzRuler which will be particularly noticeable with ent_ids. See the "Known Issues" section below for more details.
- Small tweaks to spaczz custom attributes.
- Available fuzzy matching functions have changed in RapidFuzz and have changed in spaczz accordingly.
- Preparing for spaCy v3 updates.
v0.4.1 Phrasesearch Performance Improvements
- Spaczz's phrase searching algorithm has been further optimized so both the FuzzyMatcher and SimilarityMatcher should run considerably faster.
- The FuzzyMatcher and SimilarityMatcher now include a thresh parameter that defaults to 100. When matching, if flex > 0 and the match ratio is >= thresh during the initial scan of the document, no optimization will be attempted. By default perfect matches don't need to be run through match optimization.
- flex now defaults to len(pattern) // 2. This creates a more meaningful difference between "default" and "max" with longer patterns.
- PEP585 code updates.
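The `thresh` and `flex` rules above can be summarized in a few lines. This is a sketch of the decision logic as described in these notes, not the spaczz implementation. `default_flex` and `should_optimize` are hypothetical names, and treating `flex == 0` as "no optimization" is an assumption, since the notes only cover the `flex > 0` case.

```python
def default_flex(pattern_len):
    """Default flex: half the pattern length, rounded down."""
    return pattern_len // 2


def should_optimize(ratio, flex, thresh=100):
    """Decide whether a candidate match goes through match optimization.

    Assumption: with flex == 0 there is nothing to adjust, so skip it.
    With flex > 0, skip only when the initial ratio already meets thresh.
    """
    return flex > 0 and ratio < thresh


print(default_flex(5), default_flex(10))
print(should_optimize(ratio=100, flex=2))
print(should_optimize(ratio=85, flex=2))
```

With the default `thresh` of 100, only perfect matches skip optimization. Lowering `thresh` trades some boundary accuracy for speed.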
v0.4.0 TokenMatcher
Adds the `TokenMatcher` to spaczz and integrates it with the `SpaczzRuler`. Also overhauls spaczz's custom attributes and includes some quality of life improvements and bug fixes.