Skip to content

Commit

Permalink
Add ntbk
Browse files Browse the repository at this point in the history
  • Loading branch information
rlancemartin committed Jun 25, 2023
1 parent 0195b86 commit 3bd124e
Show file tree
Hide file tree
Showing 2 changed files with 73 additions and 0 deletions.
Original file line number Diff line number Diff line change
@@ -0,0 +1,71 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "87067cdf",
"metadata": {},
"source": [
"# mhtml\n",
"\n",
"MHTML is a is used both for emails but also for archived webpages. MHTML, sometimes referred as MHT, stands for MIME HTML is a single file in which entire webpage is archived. When one saves a webpage as MHTML format, this file extension will contain HTML code, images, audio files, flash animation etc."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5d4c6174",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import MHTMLLoader"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "12dcebc8",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"page_content='LangChain\\nLANG CHAIN 🦜️🔗Official Home Page\\xa0\\n\\n\\n\\n\\n\\n\\n\\nIntegrations\\n\\n\\n\\nFeatures\\n\\n\\n\\n\\nBlog\\n\\n\\n\\nConceptual Guide\\n\\n\\n\\n\\nPython Repo\\n\\n\\nJavaScript Repo\\n\\n\\n\\nPython Documentation \\n\\n\\nJavaScript Documentation\\n\\n\\n\\n\\nPython ChatLangChain \\n\\n\\nJavaScript ChatLangChain\\n\\n\\n\\n\\nDiscord \\n\\n\\nTwitter\\n\\n\\n\\n\\nIf you have any comments about our WEB page, you can \\nwrite us at the address shown above. However, due to \\nthe limited number of personnel in our corporate office, we are unable to \\nprovide a direct response.\\n\\nCopyright © 2023-2023 LangChain Inc.\\n\\n\\n' metadata={'source': '../../../../../../tests/integration_tests/examples/example.mht', 'title': 'LangChain'}\n"
]
}
],
"source": [
"# Create a new loader object for the MHTML file\n",
"loader = MHTMLLoader(file_path='../../../../../../tests/integration_tests/examples/example.mht')\n",
"\n",
"# Load the document from the file\n",
"documents = loader.load()\n",
"\n",
"# Print the documents to see the results\n",
"for doc in documents:\n",
" print(doc)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
2 changes: 2 additions & 0 deletions langchain/document_loaders/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -68,6 +68,7 @@
from langchain.document_loaders.max_compute import MaxComputeLoader
from langchain.document_loaders.mediawikidump import MWDumpLoader
from langchain.document_loaders.merge import MergedDataLoader
from langchain.document_loaders.mhtml import MHTMLLoader
from langchain.document_loaders.modern_treasury import ModernTreasuryLoader
from langchain.document_loaders.notebook import NotebookLoader
from langchain.document_loaders.notion import NotionDirectoryLoader
Expand Down Expand Up @@ -204,6 +205,7 @@
"MathpixPDFLoader",
"MaxComputeLoader",
"MergedDataLoader",
"MHTMLLoader",
"ModernTreasuryLoader",
"NotebookLoader",
"NotionDBLoader",
Expand Down

0 comments on commit 3bd124e

Please sign in to comment.