
Added an MHTML document loader #6311

Merged: 3 commits merged on Jun 25, 2023

Commits on Jun 25, 2023

  1. Added an MHTML document loader

    MHTML is an interesting format because it is used both for emails and
    for archived web pages. Some scraping projects store pages on disk to
    process them later, and MHTML is well suited to that use case.
    
    This is heavily inspired by the BeautifulSoup HTML loader, but it first
    extracts the HTML part from the MHTML file.
    masylum authored and rlancemartin committed Jun 25, 2023
    b830317
  2. Format

    rlancemartin committed Jun 25, 2023
    0195b86
  3. Add ntbk (notebook)

    rlancemartin committed Jun 25, 2023
    3bd124e
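The first commit message describes the approach only at a high level: pull the HTML part out of the MHTML container, then extract text the way the BeautifulSoup HTML loader does. The following is a minimal sketch of that idea, assuming only the Python standard library. `email` parses the MIME container, and `html.parser` stands in for BeautifulSoup. The names `load_mhtml_text` and `_TextExtractor`, the returned dict shape, and the hand-written sample document are hypothetical illustrations, not the loader's actual API.

```python
from __future__ import annotations

import email
from email import policy
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Hypothetical helper: collects visible text and the <title>, skipping <script>/<style>."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []
        self.title = ""
        self._in_title = False
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def load_mhtml_text(raw: str) -> dict | None:
    """Hypothetical loader: return {"text", "title"} from the first text/html part, else None."""
    message = email.message_from_string(raw, policy=policy.default)
    # walk() visits the root message and every nested MIME part.
    for part in message.walk():
        if part.get_content_type() == "text/html":
            # get_content() undoes quoted-printable/base64 and decodes the charset.
            html = part.get_content()
            parser = _TextExtractor()
            parser.feed(html)
            parser.close()
            return {"text": "\n".join(parser.chunks), "title": parser.title.strip()}
    return None


# Hand-written sample archive (hypothetical), shaped like a saved web page.
SAMPLE = """From: <Saved by Blink>
Subject: Example
MIME-Version: 1.0
Content-Type: multipart/related; type="text/html"; boundary="BOUNDARY"

--BOUNDARY
Content-Type: text/html; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Content-Location: https://example.com/

<html><head><title>Hello</title><style>p {color: red}</style></head>=
<body><p>First paragraph</p><p>Second</p></body></html>
--BOUNDARY--
"""

if __name__ == "__main__":
    print(load_mhtml_text(SAMPLE))
```

Walking every MIME part, rather than assuming the HTML is always the first subpart, keeps the sketch working both for single-part files and for archives that bundle images and stylesheets alongside the page.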