ValueError: text/html content not found in email #290
Comments
We already solved that in h2oGPT. See https://github.com/h2oai/h2ogpt, specifically https://github.com/h2oai/h2ogpt/blob/main/gpt_langchain.py#L364-L375. Basically, the email loader defaults to text/html and does not auto-detect the content type. The other option is text/plain.
How can I add it to privateGPT?
You can make a PR to add the same kind of code I shared above. I gave a link to the code itself.
Thanks. I am not a coder. I will find out how to make a PR to add your code.
Then ask one of the devs here to do it.
Done. #294
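For readers who want to see the shape of the fallback described in the first comment, here is a minimal sketch. It is not the h2oGPT code or the change in #294. The helper name `load_email_with_fallback` and its `loader_factory` parameter are made up for illustration. It assumes the loader forwards a `content_source` keyword to `partition_email`, as the traceback in this issue suggests.

```python
from typing import Any, Callable, List


def load_email_with_fallback(
    file_path: str,
    loader_factory: Callable[..., Any],
) -> List[Any]:
    """Try the loader's default (text/html) first, then retry as text/plain.

    `loader_factory` is a hypothetical parameter: any callable that builds a
    loader with a .load() method and accepts a `content_source` keyword.
    """
    try:
        return loader_factory(file_path).load()
    except ValueError as err:
        # Only handle the specific failure reported in this issue;
        # any other ValueError is re-raised unchanged.
        if "text/html content not found in email" not in str(err):
            raise
        return loader_factory(file_path, content_source="text/plain").load()


# Possible usage with LangChain (assumes langchain and unstructured are installed):
# from langchain.document_loaders import UnstructuredEmailLoader
# docs = load_email_with_fallback("message.eml", UnstructuredEmailLoader)
```

Matching on the error message is brittle if the library ever rewords it, so treat this as a stopgap rather than a permanent fix.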
Describe the bug and how to reproduce it
I put ~4000 eml files in the source_document folder, ran ingest.py, and got:
File "/Users/pchan3/Desktop/privateGPT/ingest.py", line 78, in main
documents = load_documents(source_directory)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pchan3/Desktop/privateGPT/ingest.py", line 65, in load_documents
return [load_single_document(file_path) for file_path in all_files]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pchan3/Desktop/privateGPT/ingest.py", line 65, in <listcomp>
return [load_single_document(file_path) for file_path in all_files]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pchan3/Desktop/privateGPT/ingest.py", line 53, in load_single_document
return loader.load()[0]
^^^^^^^^^^^^^
File "/Users/pchan3/miniconda3/lib/python3.11/site-packages/langchain/document_loaders/unstructured.py", line 70, in load
elements = self._get_elements()
^^^^^^^^^^^^^^^^^^^^
File "/Users/pchan3/miniconda3/lib/python3.11/site-packages/langchain/document_loaders/email.py", line 24, in _get_elements
return partition_email(filename=self.file_path, **self.unstructured_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pchan3/miniconda3/lib/python3.11/site-packages/unstructured/partition/email.py", line 249, in partition_email
raise ValueError(f"{content_source} content not found in email")
ValueError: text/html content not found in email
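With thousands of files, it can help to check which messages actually carry a text/html part before ingesting them. The sketch below uses only Python's standard `email` package. The function name `leaf_content_types` and the sample message are made up for illustration, and this is not how `unstructured` itself inspects messages.

```python
from email import message_from_string, policy


def leaf_content_types(raw_message: str) -> set:
    """Return the content types of all non-multipart parts of a message."""
    msg = message_from_string(raw_message, policy=policy.default)
    return {
        part.get_content_type()
        for part in msg.walk()
        if not part.is_multipart()
    }


# A made-up plain-text-only message, the kind that triggers the error above.
sample = (
    "From: a@example.com\n"
    "To: b@example.com\n"
    "Subject: hello\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "\n"
    "Hello there\n"
)

types = leaf_content_types(sample)
has_html = "text/html" in types  # False for this sample
```

For files on disk, `email.message_from_binary_file` with the same `policy` argument can take the place of `message_from_string`.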
Environment (please complete the following information):