v0.0.1a4 #1

Merged
merged 21 commits into from
Nov 2, 2024

bhavnicksm
Collaborator

This pull request adds documentation and a new semantic chunking strategy to the chonkie library, and updates the package initialization files. It also adds a shared base class for chunkers. The changes are grouped by area below.

Documentation:

  • DOCS.md: Added detailed documentation for the TokenChunker class, including initialization, methods, and examples.
  • README.md: Enhanced the README with a logo, a detailed introduction, usage instructions, and citation guidelines.

New Features:

  • chonkie/chunker/semantic.py: Introduced a new SemanticChunker class that groups sentences based on semantic similarity and splits them into chunks while maintaining sentence boundaries.
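
To illustrate the idea of grouping sentences by similarity while keeping sentence boundaries intact, here is a minimal sketch. It is not the SemanticChunker implementation from this PR: the function names, the `threshold` parameter, and the bag-of-words cosine similarity (a stand-in for a real embedding model) are all assumptions made for the example.

```python
import math
import re
from collections import Counter
from typing import List


def split_sentences(text: str) -> List[str]:
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def bow_vector(sentence: str) -> Counter:
    # Toy "embedding": lowercase word counts. A real chunker would use a model.
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_chunks(text: str, threshold: float = 0.2) -> List[str]:
    """Group consecutive sentences whose similarity is at least `threshold`."""
    sentences = split_sentences(text)
    if not sentences:
        return []
    groups = [[sentences[0]]]
    for prev, cur in zip(sentences, sentences[1:]):
        if cosine(bow_vector(prev), bow_vector(cur)) >= threshold:
            groups[-1].append(cur)   # similar enough: extend the current chunk
        else:
            groups.append([cur])     # topic shift: start a new chunk
    # Sentences are only ever joined whole, so boundaries are preserved.
    return [" ".join(group) for group in groups]


text = "Cats chase mice. Cats chase birds. Stocks fell sharply today."
for chunk in semantic_chunks(text):
    print(chunk)
```

Because chunks are built by joining whole sentences, a chunk can never end mid-sentence. A production version would presumably also cap chunk size, which this sketch leaves out.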

Structural Updates:

  • chonkie/__init__.py: Updated the initialization file to include imports for all chunker classes and defined __all__ for module exports.
  • chonkie/chunker/__init__.py: Added imports for all chunker classes and defined __all__ for module exports.

Base Class Addition:

  • chonkie/chunker/base.py: Added a BaseChunker abstract class and a Chunk dataclass to serve as the foundation for all chunker implementations.
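
As a rough picture of how an abstract base class and a chunk dataclass can anchor several chunker implementations, consider the sketch below. The class names BaseChunker and Chunk come from this PR, but the field names, the `chunk` method signature, and the `WhitespaceChunker` subclass are hypothetical and may differ from what chonkie/chunker/base.py actually defines.

```python
import re
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    # Field names are illustrative assumptions, not the library's schema.
    text: str
    start_index: int  # character offset where the chunk begins
    end_index: int    # character offset just past the chunk's last character


class BaseChunker(ABC):
    """Common interface: every chunker turns a string into a list of Chunks."""

    @abstractmethod
    def chunk(self, text: str) -> List[Chunk]:
        ...


class WhitespaceChunker(BaseChunker):
    """Hypothetical subclass: packs a fixed number of words into each chunk."""

    def __init__(self, words_per_chunk: int = 3) -> None:
        if words_per_chunk < 1:
            raise ValueError("words_per_chunk must be at least 1")
        self.words_per_chunk = words_per_chunk

    def chunk(self, text: str) -> List[Chunk]:
        # Record (start, end) character offsets of every whitespace-delimited word.
        spans = [m.span() for m in re.finditer(r"\S+", text)]
        chunks = []
        for i in range(0, len(spans), self.words_per_chunk):
            group = spans[i:i + self.words_per_chunk]
            start, end = group[0][0], group[-1][1]
            chunks.append(Chunk(text=text[start:end], start_index=start, end_index=end))
        return chunks


for c in WhitespaceChunker(words_per_chunk=2).chunk("a bb ccc dddd e"):
    print(c)
```

Declaring `chunk` as abstract means the base class cannot be instantiated directly, so each concrete chunker is forced to supply its own splitting logic while callers rely on one return type.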

@bhavnicksm bhavnicksm merged commit 380fdc2 into main Nov 2, 2024