v0.2.2

@bhavnicksm bhavnicksm released this 06 Dec 22:56
· 363 commits to main since this release
475f08d

Highlights

  • Added Token Estimate Validate Loops (TEVL) inside the SentenceChunker, giving speedups of up to ~5x in some cases
  • Added an auto thresholding mode for SemanticChunkers, removing the hard requirement for similarity_threshold. SemanticChunkers can now decide on their own threshold, based on the minimum and maximum
  • Added OverlapRefinery for adding overlap context to chunks. The chunk_overlap parameter will be deprecated in the future in favor of OverlapRefinery.
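
For readers unfamiliar with the estimate-then-validate idea behind the first highlight, here is a minimal, library-independent sketch. It is not the SentenceChunker implementation. The names chunk_sentences, count_tokens and chars_per_token are hypothetical, and the whitespace token counter is a stand-in for a real tokenizer. The point it illustrates is that the expensive token counter runs once per candidate chunk rather than once per sentence, with a cheap character-based estimate choosing the candidate.

```python
from typing import Callable, List


def count_tokens(text: str) -> int:
    """Stand-in token counter (whitespace split); a real tokenizer would go here."""
    return len(text.split())


def chunk_sentences(
    sentences: List[str],
    chunk_size: int,
    token_counter: Callable[[str], int] = count_tokens,
    chars_per_token: float = 4.0,  # assumed starting ratio, refined as we go
) -> List[str]:
    """Group sentences into chunks of at most chunk_size tokens.

    A single sentence longer than chunk_size is emitted as its own chunk.
    """
    chunks: List[str] = []
    start = 0
    while start < len(sentences):
        # Estimate: extend the candidate while the estimated token total
        # (character length / chars_per_token) stays within the budget.
        end = start + 1
        estimate = len(sentences[start]) / chars_per_token
        while end < len(sentences):
            nxt = len(sentences[end]) / chars_per_token
            if estimate + nxt > chunk_size:
                break
            estimate += nxt
            end += 1

        # Validate: count real tokens on the candidate chunk.
        text = " ".join(sentences[start:end])
        tokens = token_counter(text)

        # Loop: if the estimate was too optimistic, drop trailing sentences.
        while tokens > chunk_size and end - start > 1:
            end -= 1
            text = " ".join(sentences[start:end])
            tokens = token_counter(text)

        # Feed the observed ratio back into the next estimate.
        if tokens > 0:
            chars_per_token = len(text) / tokens

        chunks.append(text)
        start = end
    return chunks
```

In this sketch an estimate that is too pessimistic only produces smaller chunks than necessary; it never exceeds the budget, because the validation step always checks the real count.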

What's Changed

  • [Fix] AutoEmbeddings not loading all-minilm-l6-v2 but loads All-MiniLM-L6-V2 by @bhavnicksm in #57
  • [Fix] Add fix for #55 by @bhavnicksm in #58
  • [Refactor] Add min_chunk_size parameter to SemanticChunker and SentenceChunker by @bhavnicksm in #60
  • [Update] Bump version to 0.2.1.post1 and require Python 3.9 or higher by @bhavnicksm in #62
  • [Update] Change default embedding model in SemanticChunkers by @bhavnicksm in #63
  • Add min_chunk_size to SDPMChunker + Lint codebase with ruff + minor changes by @bhavnicksm in #68
  • Added automated testing using Github Actions by @pratyushmittal in #66
  • Add support for automated testing with Github Actions by @bhavnicksm in #69
  • [Fix] Allow for functions as token_counters in BaseChunkers by @bhavnicksm in #70
  • Add TEVL to speed up sentence chunker by @bhavnicksm in #71
  • Add TEVL to speed-up sentence chunking by @bhavnicksm in #72
  • Update the docs path to docs.chonkie.ai by @bhavnicksm in #75
  • [FEAT] Add BaseRefinery and OverlapRefinery support by @bhavnicksm in #77
  • Add support for BaseRefinery and OverlapRefinery + minor changes by @bhavnicksm in #78
  • [FEAT] Add "auto" threshold configuration via Statistical analysis in SemanticChunker + minor fixes by @bhavnicksm in #79
  • [Fix] Unify dataclasses under a types.py for ease by @bhavnicksm in #80
  • Expose the separation delim for simple multilingual chunking by @bhavnicksm in #81
  • Bump version to v0.2.2 for release by @bhavnicksm in #82

New Contributors

Full Changelog: v0.2.1...v0.2.2