Commit

Merge pull request #8 from moekidev/add-chunks-with-metadata
Add chunks with metadata
moekiorg authored Sep 26, 2023
2 parents feb2fd8 + aa44dd6 commit 0832571
Showing 5 changed files with 16 additions and 6 deletions.
2 changes: 1 addition & 1 deletion lib/baran/character_text_splitter.rb
@@ -14,4 +14,4 @@ def splitted(text)
       merged(splits, @separator)
     end
   end
-end
+end
8 changes: 5 additions & 3 deletions lib/baran/text_splitter.rb
@@ -14,12 +14,14 @@ def splitted(text)
       raise NotImplementedError, "splitted method should be implemented in a subclass"
     end
 
-    def chunks(text)
+    def chunks(text, metadata: nil)
       cursor = 0
       chunks = []
 
       splitted(text).compact.each do |chunk|
-        chunks << { text: chunk, cursor: cursor }
+        chunk = { text: chunk, cursor: cursor }
+        chunk[:metadata] = metadata if metadata
+        chunks << chunk
         cursor += chunk.length
       end
 
@@ -56,4 +58,4 @@ def merged(splits, separator)
       results
     end
   end
-end
+end
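
A note on the `cursor += chunk.length` line in the `chunks` hunk of `lib/baran/text_splitter.rb`: once the block variable `chunk` is rebound to a hash, `chunk.length` returns the number of keys (2, or 3 when metadata is present) rather than the length of the text. The following is a minimal sketch, not the library's code, of how the loop might look if the cursor is meant to advance by text length. `chunks_with_metadata` and its `pieces` argument are hypothetical stand-ins for `chunks` and the array returned by `splitted(text)`.

```ruby
# Hypothetical stand-alone version of the loop in TextSplitter#chunks.
# `pieces` stands in for the array returned by splitted(text).
def chunks_with_metadata(pieces, metadata: nil)
  cursor = 0
  chunks = []

  pieces.compact.each do |piece|
    chunk = { text: piece, cursor: cursor }
    # The :metadata key is only added when the caller supplies metadata.
    chunk[:metadata] = metadata if metadata
    chunks << chunk
    # Advance by the length of the text, not the size of the hash.
    cursor += piece.length
  end

  chunks
end

result = chunks_with_metadata(['text', 'one'], metadata: { page: 1 })
```

Using a separate name (`piece`) for the string keeps it available after the hash is built, so the cursor arithmetic does not depend on which object `chunk` currently refers to.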
2 changes: 1 addition & 1 deletion test/test_character_text_splitter.rb
@@ -13,4 +13,4 @@ def test_chunks
 
     assert_equal(chunks.length, 3)
   end
-end
+end
2 changes: 1 addition & 1 deletion test/test_recursive_character_text_splitter.rb
@@ -20,4 +20,4 @@ def test_empty_chunks
 
     assert_equal(chunks.length, 6)
   end
-end
+end
8 changes: 8 additions & 0 deletions test/test_text_splitter.rb
@@ -55,6 +55,14 @@ def test_chunks
     assert_equal 'text', documents[0][:text]
   end
 
+  def test_chunks_with_metadata
+    text = 'text one'
+    metadata = { page: 1 }
+    documents = @test_splitter.chunks(text, metadata: metadata)
+
+    assert_equal({ page: 1 }, documents[0][:metadata])
+  end
+
   def test_joined
     items = ['one', 'two', 'three']
     separator = ' '
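
The `test_chunks_with_metadata` test checks equality only. Because the change in `lib/baran/text_splitter.rb` assigns the caller's `metadata` object directly, every chunk holds a reference to the same hash. The sketch below uses plain hashes rather than the library to show the consequence, along with one possible alternative (a shallow `dup` per chunk). Whether sharing is intended is not stated in this commit, so treat the second variant as an assumption, not a proposed fix.

```ruby
metadata = { page: 1 }

# Same object on every chunk, as in `chunk[:metadata] = metadata`.
shared = %w[text one].map { |t| { text: t, metadata: metadata } }
shared[0][:metadata][:page] = 2
shared_pages = shared.map { |c| c[:metadata][:page] }

# Shallow copy per chunk, so each chunk can be edited independently.
metadata = { page: 1 }
copied = %w[text one].map { |t| { text: t, metadata: metadata.dup } }
copied[0][:metadata][:page] = 2
copied_pages = copied.map { |c| c[:metadata][:page] }
```

In the shared variant, editing one chunk's metadata is visible through every other chunk and through the caller's original hash; in the copied variant only the edited chunk changes.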