-
Thank you for your feedback! We could add another predefined pipeline, similar to the ones we have here, to make indexing simpler: https://docs.haystack.deepset.ai/docs/pipeline-templates#indexing
-
Hi there, this is a great suggestion, and we'll look at ways to make this easier for our users. In the meantime, have you seen our integration with unstructured.io? They offer an easy way to load documents of different types seamlessly.
-
Hi Greg, let us know how you get on! We are always keen to hear from our users so we can prioritise what to work on. Would you be open to a 15-minute chat with me? Your input would be extremely valuable to us.
-
I am in the "Processing Different File Types" section and I see the following code:
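```python
from haystack.components.converters import MarkdownToDocument, PyPDFToDocument, TextFileToDocument
from haystack.components.routers import FileTypeRouter

file_type_router = FileTypeRouter(mime_types=["text/plain", "application/pdf", "text/markdown"])
text_file_converter = TextFileToDocument()
markdown_converter = MarkdownToDocument()
pdf_converter = PyPDFToDocument()
```
And then later:
```python
from haystack import Pipeline
from haystack.components.joiners import DocumentJoiner

# document_joiner is created elsewhere in the same section; assumed here to be a default DocumentJoiner
document_joiner = DocumentJoiner()

preprocessing_pipeline = Pipeline()
preprocessing_pipeline.add_component(instance=file_type_router, name="file_type_router")
preprocessing_pipeline.add_component(instance=text_file_converter, name="text_file_converter")
preprocessing_pipeline.add_component(instance=markdown_converter, name="markdown_converter")
preprocessing_pipeline.add_component(instance=pdf_converter, name="pypdf_converter")
preprocessing_pipeline.add_component(instance=document_joiner, name="document_joiner")
```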
to form a pipeline. But is there a facility to process the documents automatically? It seems like extra work to determine which file types you have and then create a different component for each file type. For advanced cases, I suppose this gives you the ability to configure how each type gets processed, but for the default case, is there a way to just handle each document dynamically behind the scenes?
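To illustrate what such a "default case" could look like, here is a rough, framework-free sketch in plain Python (standard library only; this is not Haystack's API). It guesses each file's MIME type with `mimetypes`, loosely mirroring the routing step, looks up a converter in a registry, and concatenates the results, so routing, conversion, and joining happen in one call. `route_by_mime_type`, `read_text`, `CONVERTERS`, and `auto_convert` are hypothetical names; the text converter simply reads the raw file, and PDF support is deliberately left out because parsing PDFs needs a third-party library.

```python
import mimetypes
import tempfile
from pathlib import Path
from typing import Callable, Dict, List

# Private MimeTypes instance: uses only Python's built-in extension table,
# so results do not depend on the host's mime.types files.
_MIME = mimetypes.MimeTypes()
# ".md" is not guaranteed to be in the built-in table, so register it explicitly.
_MIME.add_type("text/markdown", ".md")


def route_by_mime_type(paths: List[str], mime_types: List[str]) -> Dict[str, List[str]]:
    """Group paths by guessed MIME type; anything not listed goes to 'unclassified'."""
    routes: Dict[str, List[str]] = {mime: [] for mime in mime_types}
    routes["unclassified"] = []
    for path in paths:
        guessed, _encoding = _MIME.guess_type(path)
        key = guessed if guessed in mime_types else "unclassified"
        routes[key].append(path)
    return routes


def read_text(path: str) -> List[str]:
    """Hypothetical converter: one 'document' per file, holding its raw text."""
    return [Path(path).read_text(encoding="utf-8")]


# Hypothetical registry: MIME type -> converter. Supporting a new type is one entry.
CONVERTERS: Dict[str, Callable[[str], List[str]]] = {
    "text/plain": read_text,
    "text/markdown": read_text,  # a real converter might strip Markdown syntax
}


def auto_convert(paths: List[str]) -> Dict[str, List[str]]:
    """Route, convert, and join in one call; unsupported files are reported, not read."""
    routes = route_by_mime_type(paths, list(CONVERTERS))
    documents: List[str] = []
    for mime, converter in CONVERTERS.items():
        for path in routes[mime]:
            documents.extend(converter(path))
    return {"documents": documents, "skipped": routes["unclassified"]}


# Usage: two supported files plus a PDF that has no registered converter.
with tempfile.TemporaryDirectory() as tmp:
    notes = Path(tmp, "notes.txt")
    notes.write_text("plain text", encoding="utf-8")
    readme = Path(tmp, "README.md")
    readme.write_text("# Title", encoding="utf-8")
    result = auto_convert([str(notes), str(readme), str(Path(tmp, "scan.pdf"))])
```

The design trade-off is the one raised in the question: a single registry keeps the common case short, but any per-type configuration then has to live inside each converter function instead of on a separately wired component.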