v0.2.2
Highlights
- Added Token Estimate-Validate Loops (TEVL) inside the SentenceChunker, making chunking up to ~5x faster in some cases
- Added `auto` thresholding mode for SemanticChunkers, removing the hard requirement on `similarity_threshold`. SemanticChunkers can now decide their own threshold based on the minimum and maximum similarity (usage sketch below)
- Added `OverlapRefinery` for adding overlap context to the chunks. The `chunk_overlap` parameter will be deprecated in the future in favor of `OverlapRefinery` (usage sketch below)
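
A minimal sketch of the new `auto` thresholding mode. The exact parameter name, default embedding model, and chunk attributes are assumptions based on this release's description; check docs.chonkie.ai for the actual signature.

```python
from chonkie import SemanticChunker

text = "Chonkie chunks text. It can now pick a similarity threshold on its own. No more manual tuning."

# Passing "auto" instead of a float lets the chunker derive the threshold
# statistically from the observed similarity range (assumed parameter name).
chunker = SemanticChunker(similarity_threshold="auto")

for chunk in chunker.chunk(text):
    print(chunk.text, chunk.token_count)
```

And a sketch of adding overlap context as a post-processing step with `OverlapRefinery` instead of setting `chunk_overlap` on the chunker. The `context_size` parameter and `refine()` method are assumptions, not a verified signature.

```python
from chonkie import SentenceChunker, OverlapRefinery

long_document = "First sentence. Second sentence. Third sentence. " * 100

# Chunk first, without relying on chunk_overlap (slated for deprecation).
chunker = SentenceChunker(chunk_size=512)
chunks = chunker.chunk(long_document)

# Then refine the finished chunks to attach overlap context to each one.
refinery = OverlapRefinery(context_size=64)
chunks_with_context = refinery.refine(chunks)
```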
What's Changed
- [Fix] AutoEmbeddings not loading `all-minilm-l6-v2` but loads `All-MiniLM-L6-V2` by @bhavnicksm in #57
- [Fix] Add fix for #55 by @bhavnicksm in #58
- [Refactor] Add min_chunk_size parameter to SemanticChunker and SentenceChunker by @bhavnicksm in #60
- [Update] Bump version to 0.2.1.post1 and require Python 3.9 or higher by @bhavnicksm in #62
- [Update] Change default embedding model in SemanticChunkers by @bhavnicksm in #63
- Add `min_chunk_size` to SDPMChunker + Lint codebase with ruff + minor changes by @bhavnicksm in #68
- Added automated testing using Github Actions by @pratyushmittal in #66
- Add support for automated testing with Github Actions by @bhavnicksm in #69
- [Fix] Allow for functions as token_counters in BaseChunkers by @bhavnicksm in #70
- Add TEVL to speed up sentence chunker by @bhavnicksm in #71
- Add TEVL to speed up sentence chunking by @bhavnicksm in #72
- Update the docs path to docs.chonkie.ai by @bhavnicksm in #75
- [FEAT] Add BaseRefinery and OverlapRefinery support by @bhavnicksm in #77
- Add support for BaseRefinery and OverlapRefinery + minor changes by @bhavnicksm in #78
- [FEAT] Add "auto" threshold configuration via Statistical analysis in SemanticChunker + minor fixes by @bhavnicksm in #79
- [Fix] Unify dataclasses under a types.py for ease by @bhavnicksm in #80
- Expose the separation delimiter for simple multilingual chunking by @bhavnicksm in #81
- Bump version to v0.2.2 for release by @bhavnicksm in #82
New Contributors
- @pratyushmittal made their first contribution in #66
Full Changelog: v0.2.1...v0.2.2