Project for the 6th-semester Artificial Intelligence Laboratory course.
This project builds an extractive text summarization model that uses Term Frequency-Inverse Document Frequency (TF-IDF) to generate concise summaries from large text datasets.
- tf_idf_built_in_function.ipynb: Implementation of the TF-IDF algorithm using built-in Python functions.
- tf_idf_raw_code.ipynb: Manual implementation of the TF-IDF algorithm from scratch.
- Images Folder:
  - output.png: The output of the summarization process.
  - process_flow.png: A visual representation of the process flow.
- ai_lab_final_report: The final report, available in .docx, .pdf, and .zip (for LaTeX) formats.
- Generated Text Data Folder:
  - bangladesh_small.txt: Sample text data used in the project.
- text_pre_processing.ipynb: Jupyter notebook for preprocessing text data.
- web_scraper.ipynb: Jupyter notebook for scraping text data from the web.
- final_presentation.pptx: The final presentation for the project.
- ai_proposal_cover: The cover page of the project proposal, in .docx and .pdf formats.
- ai_proposal_main: The main content of the project proposal, in .docx and .pdf formats.
- reference_pdf.zip: A collection of research papers and other reference materials for convenience.
Note: The reference_pdf.zip file contains research papers that were downloaded in PDF format for convenience and used for this project.
- These PDFs and materials may be subject to copyright.
- I do not own these materials nor do I have permission to distribute them.
- They are provided solely for educational purposes, to facilitate access to reference papers.
- Please cite these sources appropriately if you use them.
- Code Execution:
  - The code for the project is located in the Code folder.
  - Use tf_idf_raw_code.ipynb to explore the raw implementation.
  - Use tf_idf_built_in_function.ipynb for a version using built-in functions.
- Preprocessing:
  - The Preprocessing folder contains the scripts used to clean and preprocess the text data.
  - text_pre_processing.ipynb handles text data cleaning.
  - web_scraper.ipynb is used to scrape data from web sources.
- Final Report:
  - The Final Report folder contains the final documentation of the project.
  - You can find output.png and process_flow.png in the Images folder.
  - The final report is available in .docx, .pdf, and .zip (for LaTeX) formats.
- Presentation:
  - The Presentation folder includes final_presentation.pptx, which summarizes the project for presentations.
- Project Proposal:
  - The Project Proposal folder contains the proposal documents in both .docx and .pdf formats.
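As a rough illustration of the cleaning step that text_pre_processing.ipynb is described as handling, the sketch below splits raw text into sentences and reduces each sentence to lowercase tokens. It is not taken from the notebook: the function names `split_sentences` and `clean_tokens` and the tiny `STOP_WORDS` set are hypothetical, and the actual notebook may use different rules or a library tokenizer.

```python
import re
import string

# Hypothetical, deliberately small stop-word list (an assumption for this sketch).
STOP_WORDS = {"a", "an", "and", "the", "is", "in", "of", "to", "it", "was", "by"}


def split_sentences(text):
    """Split text into sentences at '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def clean_tokens(sentence):
    """Lowercase a sentence, strip ASCII punctuation, and drop stop words."""
    table = str.maketrans("", "", string.punctuation)
    words = sentence.lower().translate(table).split()
    return [w for w in words if w not in STOP_WORDS]


if __name__ == "__main__":
    raw = "Snakes bite. The viper is rare!  It was found."
    for sentence in split_sentences(raw):
        print(sentence, "->", clean_tokens(sentence))
```

The regular-expression split is a simplification: it will break on abbreviations such as "Dr." and does not handle non-ASCII punctuation.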
Sample run:
Enter size of your summary: 3
3 lines sized summary:
Sentence: Russell's viper (Daboia russelii) is responsible for nearly half of snakebites in neighboring India, but in Bangladesh, where it’s known as chandra bora, it was thought to be an exceedingly rare species for more than a century.
Sentence: Hospitals in rural Bangladesh have reported an increase in people being bitten by snakes, especially by the Russell's viper, which is found in South Asia.
Sentence: A series of stories have been making rounds on social media, of people dying in different parts of Bangladesh from the bite of the Russell's viper, a venomous snake.
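One way a fixed-size summary like the one above could be produced is sketched below: each sentence is treated as a document, every term gets a TF-IDF weight, sentences are scored by the sum of their term weights, and the top n are kept. This is a minimal sketch under stated assumptions, not the code from tf_idf_raw_code.ipynb. The names `tokenize` and `summarize` are hypothetical, the IDF variant used here is the plain `log(N / df)`, and the notebooks may normalize or rank differently.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and return its alphanumeric tokens."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def summarize(sentences, n):
    """Return the n highest-scoring sentences, kept in their original order."""
    docs = [tokenize(s) for s in sentences]
    total = len(docs)

    # Document frequency: in how many sentences each term appears.
    df = Counter(term for doc in docs for term in set(doc))

    scores = []
    for doc in docs:
        tf = Counter(doc)
        # Sentence score: sum over distinct terms of (count / length) * log(N / df).
        score = sum(
            (count / len(doc)) * math.log(total / df[term])
            for term, count in tf.items()
        )
        scores.append(score)

    # Pick the indices of the n best scores, then restore document order.
    top = sorted(range(total), key=lambda i: scores[i], reverse=True)[:n]
    return [sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    text = ["the cat sat", "the cat sat", "vipers bite farmers"]
    for line in summarize(text, 2):
        print("Sentence:", line)
```

Returning the selected sentences in document order is a design choice of this sketch, made so the summary reads naturally. Ranking by score instead would only require dropping the final `sorted(top)`.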
Feel free to fork this repository, create a new branch, and submit pull requests.
This project is open-source and available under the MIT License.