Artificial Intelligence Laboratory (6th semester) course's project.

rayan2162/extractive_text_summarization_using_TF-IDF

Extractive Text Summarization using TF-IDF

Project for the 6th-semester Artificial Intelligence Laboratory course.

This project builds an extractive text summarization model that uses Term Frequency-Inverse Document Frequency (TF-IDF) to generate concise summaries from large text datasets. An extractive summary is made of sentences selected from the source text, not newly generated ones.
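
For reference, one common way to define TF-IDF for sentence ranking is sketched below, treating each sentence as its own document. The exact weighting used in the notebooks is not stated in this README, so the unsmoothed IDF and the summed sentence score are assumptions.

```latex
\mathrm{tf}(t, s) = \frac{f_{t,s}}{\sum_{t'} f_{t',s}}, \qquad
\mathrm{idf}(t) = \log\frac{N}{n_t}, \qquad
\mathrm{score}(s) = \sum_{t \in s} \mathrm{tf}(t, s)\,\mathrm{idf}(t)
```

Here f_{t,s} is the number of times term t occurs in sentence s, N is the total number of sentences, n_t is the number of sentences containing t, and the last sum runs over the distinct terms of s.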

Project Structure

1. Code Folder

  • tf_idf_built_in_function.ipynb: Implementation of the TF-IDF algorithm using built-in Python functions.
  • tf_idf_raw_code.ipynb: Manual implementation of the TF-IDF algorithm from scratch.
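
As a rough illustration of what a from-scratch implementation can look like, here is a minimal sketch that scores sentences with TF-IDF using only the standard library. It is not the notebook's code: the function names `tokenize` and `tfidf_sentence_scores`, the regex tokenizer, and the choice to treat each sentence as a document are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into word tokens (hypothetical helper)."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def tfidf_sentence_scores(sentences):
    """Return one TF-IDF score per sentence, treating each sentence as a document."""
    tokenized = [tokenize(s) for s in sentences]
    n_sentences = len(tokenized)

    # Document frequency: number of sentences that contain each term.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        if not tokens:
            scores.append(0.0)
            continue
        counts = Counter(tokens)
        score = 0.0
        for term, count in counts.items():
            tf = count / len(tokens)                       # relative term frequency
            idf = math.log(n_sentences / doc_freq[term])   # unsmoothed IDF (assumption)
            score += tf * idf
        scores.append(score)
    return scores


sentences = [
    "The viper is a venomous snake.",
    "The viper is a venomous snake found in South Asia.",
    "Hospitals reported more snakebites this year.",
]
scores = tfidf_sentence_scores(sentences)
```

With this weighting, a sentence made mostly of words that appear nowhere else scores higher than one whose words are repeated across the text, so the third sample sentence ranks first.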

2. Final Report Folder

  • Images Folder:
    • output.png: The output of the summarization process.
    • process_flow.png: A visual representation of the process flow.
  • ai_lab_final_report: The final report, available in .docx, .pdf, and .zip (LaTeX source) formats.

3. Preprocessing Folder

  • Generated Text Data Folder:
    • bangladesh_small.txt: Sample text data used in the project.
  • text_pre_processing.ipynb: Jupyter notebook for preprocessing text data.
  • web_scraper.ipynb: Jupyter notebook for scraping text data from the web.
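
To give a sense of the kind of cleaning a preprocessing step typically performs before TF-IDF scoring, here is a small sketch using only the standard library. It does not reproduce text_pre_processing.ipynb: the helper names `split_sentences` and `clean_tokens`, the tiny stop-word list, and the regex-based sentence splitting are assumptions chosen for illustration.

```python
import re

# Tiny illustrative stop-word list (an assumption; a real pipeline would use a fuller one).
STOP_WORDS = {"a", "an", "and", "by", "in", "is", "of", "the", "to"}


def split_sentences(text):
    """Collapse whitespace, then split at '.', '!' or '?' followed by a space."""
    text = re.sub(r"\s+", " ", text).strip()
    if not text:
        return []
    return re.split(r"(?<=[.!?])\s+", text)


def clean_tokens(sentence):
    """Lowercase the sentence, drop punctuation, and remove stop words."""
    words = re.findall(r"[a-z0-9]+", sentence.lower())
    return [w for w in words if w not in STOP_WORDS]


raw = "Hospitals in  rural Bangladesh reported an increase.\nThe viper is venomous!"
sentences = split_sentences(raw)
tokens = [clean_tokens(s) for s in sentences]
```

The original sentences are kept alongside the cleaned tokens because an extractive summary prints the untouched sentences, while the scores are computed on the cleaned tokens.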

4. Presentation Folder

  • final_presentation.pptx: The final presentation for the project.

5. Project Proposal Folder

  • ai_proposal_cover: The cover page of the project proposal in .docx and .pdf formats.
  • ai_proposal_main: The main content of the project proposal in .docx and .pdf formats.

6. Reference Materials

  • reference_pdf.zip: A collection of research papers and other reference materials for convenience.

Note: reference_pdf.zip contains research papers that were used for this project and downloaded as PDFs for convenience.

  • These PDFs and materials may be subject to copyright.
  • I do not own these materials nor do I have permission to distribute them.
  • They are provided solely for educational purposes, to facilitate access to reference papers.
  • Please cite these sources appropriately if you use them.

How to Use

  1. Code Execution:

    • The code for the project is located in the Code folder.
    • Use tf_idf_raw_code.ipynb to explore the raw implementation.
    • Use tf_idf_built_in_function.ipynb for a version using built-in functions.
  2. Preprocessing:

    • The Preprocessing folder contains the notebooks used to collect, clean, and preprocess the text data.
    • text_pre_processing.ipynb handles text data cleaning.
    • web_scraper.ipynb is used to scrape data from web sources.
  3. Final Report:

    • The Final Report folder contains the final documentation of the project.
    • You can find output.png and process_flow.png in the Images folder.
    • The final report is available in .docx, .pdf, and .zip (for LaTeX) formats.
  4. Presentation:

    • The Presentation folder includes final_presentation.pptx, the slide deck that summarizes the project.
  5. Project Proposal:

    • The Project Proposal folder contains the proposal documents in both .docx and .pdf formats.

Example Usage

Enter size of your summary: 3

3 lines sized summary:

Sentence: Russell's viper (Daboia russelii) is responsible for nearly half of snakebites in neighboring India, but in Bangladesh, where it’s known as chandra bora, it was thought to be an exceedingly rare species for more than a century.
Sentence: Hospitals in rural Bangladesh have reported an increase in people being bitten by snakes, especially by the Russell's viper, which is found in South Asia.
Sentence: A series of stories have been making rounds on social media, of people dying in different parts of Bangladesh from the bite of the Russell's viper, a venomous snake.
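
The selection step behind output like the above can be sketched as follows: given one score per sentence, keep the requested number of top-scoring sentences and print each with a "Sentence:" prefix. The names `top_sentences` and `format_summary` and the sample scores are hypothetical, and whether the notebooks print sentences in score order or in document order is not stated here; this sketch assumes score order.

```python
def top_sentences(sentences, scores, size):
    """Return the `size` highest-scoring sentences, best first.

    sorted() is stable, so sentences with equal scores keep their document order.
    """
    order = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in order[:size]]


def format_summary(selected):
    """Prefix each selected sentence the way the sample output does."""
    return [f"Sentence: {s}" for s in selected]


sentences = ["First sentence.", "Second sentence.", "Third sentence."]
scores = [0.2, 0.9, 0.5]  # hypothetical scores, for illustration only
size = 2                  # stands in for the value typed at the prompt

summary = top_sentences(sentences, scores, size)
print(f"{size} lines sized summary:")
for line in format_summary(summary):
    print(line)
```

To keep the summary in reading order instead, the selected indices could be sorted again before the sentences are looked up.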

Contributing

Feel free to fork this repository, create a new branch, and submit pull requests.

License

This project is open-source and available under the MIT License.