update unstructured and associated examples #688
Deploying datachain-documentation with Cloudflare Pages
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #688      +/-   ##
==========================================
+ Coverage   87.31%   87.34%   +0.02%
==========================================
  Files         113      113
  Lines       10791    10791
  Branches     1479     1479
==========================================
+ Hits         9422     9425       +3
+ Misses        991      989       -2
+ Partials      378      377       -1
@@ -51,16 +52,17 @@ def process_pdf(file: File) -> Iterator[Chunk]:
)
chunk.apply(replace_unicode_quotes)
chunk.apply(group_broken_paragraphs)
text_chunks.append({"text": str(chunk)})
Caused by this
Thanks, @mattseddon!
Follow-up to #687. This means we don't have to wait for Unstructured-IO/unstructured#3730 to upgrade the unstructured package (I will close that PR if we merge this).
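
For anyone reading the `process_pdf` hunk above without the full file, here is a rough, dependency-free sketch of the pattern it uses: apply cleaner functions to each chunk in place, then collect `{"text": str(chunk)}` records. `TextChunk`, `straighten_quotes`, `join_broken_lines`, and `to_records` are hypothetical stand-ins written for illustration. They are not the unstructured implementations of `replace_unicode_quotes` or `group_broken_paragraphs`, and the real cleaners may behave differently.

```python
import re
from typing import Callable, Dict, Iterator, List


class TextChunk:
    """Hypothetical stand-in for an element exposing apply() and str()."""

    def __init__(self, text: str) -> None:
        self.text = text

    def apply(self, *cleaners: Callable[[str], str]) -> None:
        # Run each cleaner in order, replacing the text in place.
        for cleaner in cleaners:
            self.text = cleaner(self.text)

    def __str__(self) -> str:
        return self.text


def straighten_quotes(text: str) -> str:
    """Simplified cleaner (assumption): map curly quotes to ASCII quotes."""
    table = str.maketrans(
        {"\u201c": '"', "\u201d": '"', "\u2018": "'", "\u2019": "'"}
    )
    return text.translate(table)


def join_broken_lines(text: str) -> str:
    """Simplified cleaner (assumption): join single line breaks within a
    paragraph while keeping blank-line paragraph boundaries."""
    paragraphs = re.split(r"\n\s*\n", text)
    joined = (re.sub(r"\s*\n\s*", " ", p).strip() for p in paragraphs)
    return "\n\n".join(joined)


def to_records(chunks: List[TextChunk]) -> Iterator[Dict[str, str]]:
    # Mirrors the shape of the hunk: clean each chunk, then emit a dict.
    for chunk in chunks:
        chunk.apply(straighten_quotes)
        chunk.apply(join_broken_lines)
        yield {"text": str(chunk)}


records = list(to_records([TextChunk("\u201cHello\nworld\u201d\n\nNext para")]))
print(records)
```

Calling `str(chunk)` after cleaning, rather than storing the chunk object itself, keeps each record a plain string payload, which appears to be the point of the reviewed line.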