Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Refactor SDPM and SemanticChunker, adding "auto" thresholding and fixing percentile mode

- Updated SDPMChunker and SemanticChunker to replace similarity_threshold and similarity_percentile with a unified threshold parameter, enhancing clarity and usability.
- Introduced new parameters: mode, min_sentences, min_characters_per_sentence, and threshold_step to provide more control over chunking behavior.
- Refactored chunking logic to support both cumulative and window-based grouping of sentences, improving flexibility in semantic chunking.
- Enhanced docstrings and method signatures for better documentation and understanding of class functionalities.
- Updated tests to reflect changes in parameter names and ensure proper initialization and functionality of chunkers.
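
To make the difference between cumulative and window-based grouping concrete, here is a minimal sketch. It is not the code from this commit: the function names (cosine, mean_vector, group_sentences), the toy 2-D embeddings, and the exact meaning given to each mode are assumptions made for illustration. Here "window" compares each sentence with the one before it, while "cumulative" compares it with the mean embedding of the group built so far; the chunkers in this repository may define the modes differently.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def group_sentences(embeddings, threshold, mode="window"):
    """Group sentence indices by similarity (hypothetical helper, not the library API).

    mode="window":     compare each sentence with the previous sentence only.
    mode="cumulative": compare each sentence with the mean of the current group.
    A sentence joins the current group when the similarity is >= threshold.
    """
    if mode not in ("window", "cumulative"):
        raise ValueError(f"unknown mode: {mode!r}")
    if not embeddings:
        return []

    groups = [[0]]
    for i in range(1, len(embeddings)):
        current = groups[-1]
        if mode == "window":
            reference = embeddings[current[-1]]
        else:
            reference = mean_vector([embeddings[j] for j in current])

        if cosine(reference, embeddings[i]) >= threshold:
            current.append(i)
        else:
            groups.append([i])
    return groups


# Toy embeddings: unit vectors drifting by 40 degrees per sentence.
angles = [0, 40, 80, 120]
toy = [[math.cos(math.radians(a)), math.sin(math.radians(a))] for a in angles]

window_groups = group_sentences(toy, threshold=0.7, mode="window")
cumulative_groups = group_sentences(toy, threshold=0.7, mode="cumulative")
print(window_groups)      # [[0, 1, 2, 3]]
print(cumulative_groups)  # [[0, 1], [2, 3]]
```

With slowly drifting embeddings, every neighbouring pair stays above the threshold, so the window variant keeps everything in one group, while the cumulative variant splits once a sentence has moved too far from the group's mean. That trade-off is one plausible reason to expose both behaviours behind a mode parameter.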
1 parent 05418e9, commit 5800480
Showing 5 changed files with 295 additions and 129 deletions.