Support MarkItDown to extract text. #950
alkampfergit
started this conversation in
2. Feature requests
The new Microsoft library MarkItDown (https://github.com/microsoft/markitdown) promises to extract structured text from PDFs and other documents.
We should allow using it to chunk documents. This could be done either by invoking it directly from the command line with process.invoke, or by wrapping it in a Python file with FastAPI (more complex to deploy).
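To make the command-line option concrete, here is a rough sketch in Python of what the direct invocation might look like. It assumes the `markitdown` CLI is installed and on the PATH, and that `markitdown <file>` writes the converted Markdown to stdout, as the project README describes. The names `extract_markdown`, `chunk_markdown`, and the `runner` and `max_chars` parameters are made up for illustration; they are not part of MarkItDown or of this project, and the chunker is only a naive placeholder.

```python
import subprocess
from pathlib import Path


def extract_markdown(path, executable="markitdown", runner=subprocess.run):
    """Convert a document to Markdown by invoking the MarkItDown CLI.

    `runner` is a hypothetical injection point so the process call can be
    replaced in tests; by default it is subprocess.run.
    """
    # Assumption: `markitdown <file>` prints the converted Markdown to stdout.
    completed = runner(
        [executable, str(Path(path))],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout


def chunk_markdown(markdown, max_chars=1000):
    """Naive, hypothetical chunker: pack blank-line-separated blocks
    into chunks of roughly max_chars characters."""
    chunks, current = [], ""
    for block in markdown.split("\n\n"):
        block = block.strip()
        if not block:
            continue
        candidate = f"{current}\n\n{block}" if current else block
        if current and len(candidate) > max_chars:
            # Adding this block would overflow the chunk: close it, start anew.
            chunks.append(current)
            current = block
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Usage would be along the lines of `chunks = chunk_markdown(extract_markdown("report.pdf"))`. Spawning a process per document keeps deployment simple (only the CLI has to be installed), at the cost of process startup on every call; the FastAPI wrapper would avoid that overhead but adds a service to deploy.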