Currently, the Textractor pipeline supports only a limited set of Markdown (tables and lists). Markdown has been shown to improve LLM generation accuracy, since LLMs already have a baseline understanding of what emphasis, links and headings signify.
This issue will add support for the following Markdown items:

- Headings
- Blockquotes
- Code
- Emphasis and links
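To make the list concrete, the following minimal Python sketch shows one way HTML elements could map to these Markdown items. It is not the Textractor implementation: the `MarkdownSketch` class and the `to_markdown` and `tidy` functions are hypothetical, the code uses only the standard library's `html.parser`, and it assumes reasonably compact, well-formed HTML (whitespace handling is minimal).

```python
import re
from html.parser import HTMLParser


class MarkdownSketch(HTMLParser):
    """Hypothetical converter: headings, blockquotes, code, emphasis, links -> Markdown."""

    HEADINGS = {f"h{n}": "#" * n for n in range(1, 7)}
    EMPHASIS = {"em": "*", "i": "*", "strong": "**", "b": "**"}

    def __init__(self):
        super().__init__()
        self.stack = [[]]  # output buffers; each open <blockquote> pushes a new one
        self.links = []    # href of each open <a>, or None if it had no href
        self.pre = 0       # nesting depth of open <pre> elements

    def emit(self, text):
        self.stack[-1].append(text)

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self.emit("\n\n" + self.HEADINGS[tag] + " ")
        elif tag == "p":
            self.emit("\n\n")
        elif tag == "blockquote":
            self.stack.append([])  # buffer the quote so every line can be prefixed
        elif tag == "pre":
            self.pre += 1
            self.emit("\n\n```\n")
        elif tag == "code" and not self.pre:
            self.emit("`")  # inline code; <code> inside <pre> is already fenced
        elif tag in self.EMPHASIS:
            self.emit(self.EMPHASIS[tag])
        elif tag == "a":
            href = dict(attrs).get("href")
            self.links.append(href)
            if href:
                self.emit("[")

    def handle_endtag(self, tag):
        if tag in self.HEADINGS or tag == "p":
            self.emit("\n\n")
        elif tag == "blockquote" and len(self.stack) > 1:
            inner = tidy("".join(self.stack.pop()))
            quoted = "\n".join("> " + line if line else ">" for line in inner.split("\n"))
            self.emit("\n\n" + quoted + "\n\n")
        elif tag == "pre" and self.pre:
            self.pre -= 1
            self.emit("\n```\n\n")
        elif tag == "code" and not self.pre:
            self.emit("`")
        elif tag in self.EMPHASIS:
            self.emit(self.EMPHASIS[tag])
        elif tag == "a" and self.links:
            href = self.links.pop()
            if href:
                self.emit(f"]({href})")

    def handle_data(self, data):
        self.emit(data)


def tidy(text):
    # Collapse runs of blank lines and trim leading/trailing whitespace
    return re.sub(r"\n{3,}", "\n\n", text).strip()


def to_markdown(html):
    parser = MarkdownSketch()
    parser.feed(html)
    parser.close()
    # Join every buffer so text inside an unclosed <blockquote> is not lost
    return tidy("".join(part for buffer in parser.stack for part in buffer))
```

For example, `to_markdown('<h2>Title</h2><p>See <a href="https://example.com">the <em>docs</em></a>.</p>')` yields a `## Title` heading followed by `See [the *docs*](https://example.com).`. Buffering blockquote content separately lets a multi-paragraph quote get a `>` prefix on every line, which a purely streaming conversion would miss.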
This change will also improve article text parsing so that it ignores elements that are unlikely to be part of the main article content.
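One simple way to ignore elements unlikely to belong to the main article is a tag-based filter. The Python sketch below is an assumption, not the approach this issue commits to: the `SKIP` and `MAIN` tag sets and the `ArticleText` class and `article_text` helper are hypothetical, and real pages often also need class-, role- or text-density-based heuristics.

```python
from html.parser import HTMLParser


class ArticleText(HTMLParser):
    """Hypothetical filter: keep text likely to belong to the main article."""

    # Assumed set of elements that rarely hold article content
    SKIP = {"nav", "aside", "footer", "form", "script", "style", "noscript"}
    # Elements assumed to wrap the main content when present
    MAIN = {"article", "main"}

    def __init__(self):
        super().__init__()
        self.skip = 0   # nesting depth inside skipped elements
        self.main = 0   # nesting depth inside <article>/<main>
        self.text = []  # all text outside skipped elements
        self.body = []  # the subset of self.text found inside <article>/<main>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip += 1
        elif tag in self.MAIN:
            self.main += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip:
            self.skip -= 1
        elif tag in self.MAIN and self.main:
            self.main -= 1

    def handle_data(self, data):
        # Drop text inside skipped elements and whitespace-only runs
        if self.skip or not data.strip():
            return
        self.text.append(data.strip())
        if self.main:
            self.body.append(data.strip())


def article_text(html):
    parser = ArticleText()
    parser.feed(html)
    parser.close()
    # Prefer <article>/<main> content; fall back to all unskipped text
    return " ".join(parser.body or parser.text)
```

`<header>` is deliberately left out of `SKIP` because headers inside an `<article>` often carry the title; page-level headers are still dropped whenever an `<article>` or `<main>` element is present, since that content is preferred.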
This change will also bypass text extraction when the MIME type is `text/plain`.
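Bypassing extraction for plain text amounts to checking the media type before any parsing happens. A minimal Python sketch, assuming a hypothetical `extract_text` entry point that receives the raw bytes, the MIME type string and the normal parsing function; the UTF-8 default charset is also an assumption.

```python
from typing import Callable


def extract_text(data: bytes, mimetype: str, parse: Callable[[bytes], str]) -> str:
    """Hypothetical dispatcher: skip text extraction for text/plain input."""
    # Split "type/subtype; param=value" into the media type and its parameters
    mediatype, *params = [part.strip() for part in mimetype.split(";")]

    if mediatype.lower() == "text/plain":
        charset = "utf-8"  # assumed default when no charset parameter is given
        for param in params:
            key, _, value = param.partition("=")
            if key.strip().lower() == "charset" and value:
                charset = value.strip().strip('"')
        # Plain text is already the target format: decode it and return as-is
        try:
            return data.decode(charset, errors="replace")
        except LookupError:
            # Unknown charset name: fall back to UTF-8
            return data.decode("utf-8", errors="replace")

    # Any other media type goes through the full extraction pipeline
    return parse(data)
```

Comparing only the media type (case-insensitively) keeps values such as `text/plain; charset=ISO-8859-1` on the bypass path.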