- 🐛 Fix trailing period/ellipses with spaces - #83
- 🐛 Regex escape for parentheses - #87
- 🐛 Better handling of consecutive periods and reserved special symbols - allenai/scholarphi#114
- Add CONTRIBUTING.md
- 🐛 ✅ Enforce `clean=True` when `doc_type="pdf"` - #75
- 🚑 ✅ Handle newline character & update tests
- ✨ 💫 Support Multiple languages - #2
- 🏎⚡️💯 Benchmark across Segmentation Tools, Libraries and Algorithms
- 🎨 ♻️ Update sentence `char_span` logic
- ⚡️ Performance improvements - #41
- ♻️ 🐛 Refactor `AbbreviationReplacer`
- ✨ 💫 Sentence `char_span` through spaCy & regex approaches - #63
- ♻️ Refactoring to support multiple languages
- ✨ 💫 Initial language support for Hindi, Marathi, Chinese, Spanish
- ✅ Updated tests - more coverage & regression tests for issues
- 👷👷🏻‍♀️ GitHub Actions for CI/CD
- 💚☂️ Add code coverage with coverage.py & Codecov
- 🐛 Fix incorrect text span & vanilla pySBD vs spaCy output discrepancy - #49, #53, #55, #59
- 🐛 Fix `NUMBERED_REFERENCE_REGEX` to match zero or one time - #58
- 🔐 Fix security vulnerability in bleach - #62
- 🐛 Performance improvement in `abbreviation_replacer` - #50
- 🐛 Fix unbalanced parenthesis - #47
- ✨ pySBD as a spaCy component through entry points
- ✨ Add optional `char_span` parameter to get each sentence & its (start, end) character offsets in the original text
- ✨ pySBD as a spaCy component example
- 🐛 Fix double question mark swallow bug - #39
- 🐛 Handle text containing only punctuation - #36
- 🐛 Handle exclamation marks at EOL - #37
- ✨ ✅ Handle intermittent punctuation - #34
- 🐛 Fix `lists_item_replacer` - #29
- 🐛 Fix & ♻️ refactor `replace_multi_period_abbreviations` - #30
- 🐛 Fix `abbreviation_replacer` - #31
- ✅ Add regression tests for issues
- 🐛 Bug fix - `IndexError` in `scanlists` function
- English language support only
- Support for other languages - WIP
- Initial Release