Bring back section chunking #322
Conversation
…section, improve vision api function
co-authored by me so approved :)
```ts
for (let i = 0; i < markdown.length; i += chunkSize - overlapSize) {
  const chunk = markdown.slice(i, i + chunkSize).trim()
  if (chunk.length > 0) chunks.push(chunk)
}
```
I liked this idea, but you likely have a reason for reverting to the section-based chunking we used before.
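For contrast, a minimal sketch of what heading-based section chunking could look like, since that is the approach this PR brings back. The function name and the heading regex are assumptions; the actual implementation in the repo may differ:

```ts
// Hypothetical section chunker: split markdown at headings ("#" through
// "######") so each chunk holds one logical section instead of a
// fixed-size window of characters.
function chunkBySections(markdown: string): string[] {
  const chunks: string[] = []
  let current = ''
  for (const line of markdown.split('\n')) {
    if (/^#{1,6}\s/.test(line) && current.trim().length > 0) {
      chunks.push(current.trim())
      current = ''
    }
    current += line + '\n'
  }
  if (current.trim().length > 0) chunks.push(current.trim())
  return chunks
}
```

Unlike the sliding window, section chunks vary in size but never cut a section's content in half.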
src/workers/nlmExtractTables.ts (Outdated)
```diff
  const filename = tablesOnPage[0].filename
  const markdown = await extractTextViaVisionAPI(
    { filename, name: `Tables from page ${page_idx}` },
-   context.slice(-3).join('\n')
+   lastPageMarkdown.slice(0, 5000)
```
Why limit to 5000 characters here?
Does this mean that we send a larger context compared to before when we only sent the last 3 tables?
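For comparison, a rough sketch of the two context strategies in question, assuming `context` is an array of previously extracted table markdown strings and `lastPageMarkdown` holds the full markdown of the preceding page:

```ts
// Old behavior: join the last 3 extracted tables. Size tracks the tables
// themselves and has no fixed upper bound.
const oldContext = context.slice(-3).join('\n')

// New behavior: at most the first 5000 characters of the previous page's
// markdown. Bounded, but it may include non-table text and can be larger
// or smaller than what the last 3 tables added up to.
const newContext = lastPageMarkdown.slice(0, 5000)
```

Whether this sends a larger context than before depends on how big the last 3 tables were; 5000 characters is a cap, not a guarantee either way.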
🎸 Fix chroma range error bug
🎸 Improved reduce function for vision API
Fix #276