The PDF Summarizer is a command-line tool designed to help users manage and perform various operations on PDF files. This README provides a clear overview of how to use the tool, highlighting its key functionalities and their implementations.
-
Clone the repository:
git clone https://github.com/Programming-Sai/PDF-Summarizer.git
-
Navigate to the project directory
cd PDF-Summarizer
-
Create a Virtual Environment and activate it.
python -m venv .ospdf-venv .ospdf-venv\Scripts\activate # Windows OR source .ospdf-venv/bin/activate # MacOS/Linux
Important
Make sure to select the new virtual environment .ospdf-venv
as your interpreter in VS Code. Use the shortcut Ctrl + Shift + P
(Windows/Linux) or Cmd + Shift + P
(Mac), then type and select "Python: Select Interpreter". Choose the interpreter option marked Recommended
or Python 3.x.x ('.ospdf-venv':venv)
.
-
Install the required dependencies:
pip install -r requirements.txt
-
Run the application:
python -u main.py
On running the application, you should see an output similar to this:
_____ _____ ( ___ )---------------------------( ___ ) | | | | | | _ __ | | | | ___ ___ _ __ __| |/ _| | | | | / _ \/ __| '_ \ / _` | |_ | | | | | (_) \__ \ |_) | (_| | _| | | | | \___/|___/ .__/ \__,_|_| | | | | |_| | | |___| |___| (_____)---------------------------(_____) Welcome to PDF Summarizer! Version: 0.0.1 PDF Summarizer helps you manage and work with PDFs. Here are some of the things you can do: - Summarize PDF content based on highlighted text. - Split a PDF into individual pages or ranges. - Merge multiple PDFs into one. - Convert a PDF page into an image. Tips ------ - Use `init` to set the input file once and avoid specifying it repeatedly. - Reset your session with `init -r` to start fresh. - Use `-h` or `--help` when in doubt.
Description: Extract and summarize highlighted text from a PDF file.
- Implementation:
- Parses the PDF for annotations.
- Extracts the highlighted content.
- Optionally includes images from the PDF in the output.
- What it does:
- Produces a summary as plain text, a PDF, or a Word document.
Usage:
python main.py summarize --input-path <path_to_pdf> --output-path <output_path>
Description: Extract specific pages or ranges of pages from a PDF.
- Implementation:
- Uses a PDF parser to split the document based on page indices.
- Saves the extracted pages as a new PDF.
- What it does:
- Enables breaking large PDFs into smaller, more manageable files.
Usage:
python main.py split <path_to_pdf> <output_pdf> --start-page <start> --end-page <end>
Description: Combine multiple PDF files into one.
- Implementation:
- Reads the input PDFs.
- Concatenates their pages in the specified order.
- Outputs a single, merged PDF.
- What it does:
- Consolidates multiple related documents into a single file.
Usage:
python main.py merge <output_pdf> <input_pdf_1> <input_pdf_2> ...
Description: Convert a single page of a PDF into an image.
- Implementation:
- Extracts the specified page from the PDF.
- Renders the page as an image.
- Saves the image in the desired format (e.g., PNG, JPEG).
- What it does:
- Enables visual representation of PDF content for use in presentations or web pages.
Usage:
python main.py pdf2img <path_to_pdf> <output_image> <page_number>
- Initialization: Use the
init
command to set a default PDF file for your session, eliminating the need to specify the file repeatedly for each operation. - Help: Add
-h
or--help
to any command for detailed usage instructions. - Reset: Start fresh by resetting the session with the
init -r
command.
-
Ensure you have Python 3.10+ installed.
-
Verify dependencies are correctly installed using:
pip list
-
If a command fails, check the help menu for correct syntax.