Jupyter notebooks for text summarization using Deep Learning techniques
The purpose of this project is to produce a model for abstractive text summarization, starting with an RNN encoder-decoder as the baseline model. From there, we explore the effectiveness of different attention methods for abstractive summarization. Abstractive methods first try to understand the text and then rephrase it in a shorter form, possibly using different words. For an ideal abstractive summary, the model has to truly understand the document and then express that understanding concisely, possibly using new words and phrases. We use an encoder-decoder recurrent neural network with LSTM units and attention to generate a summary from a given text.
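To make the attention step concrete, here is a minimal sketch of one decoder step in plain Python. It assumes dot-product scoring purely for illustration; the project does not state which scoring function its notebooks use, and the names `dot_attention`, `encoder_states`, and `decoder_state` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot_attention(decoder_state, encoder_states):
    """Return (weights, context) for a single decoder step.

    Assumption: scores are plain dot products between the decoder state
    and each encoder hidden state. Other scoring functions are possible.
    """
    scores = [
        sum(d * h for d, h in zip(decoder_state, enc))
        for enc in encoder_states
    ]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    # The context vector is the attention-weighted sum of encoder states.
    context = [
        sum(w * enc[i] for w, enc in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context


# Toy example: three encoder time steps with 2-dimensional hidden states.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [1.0, 0.0]
weights, context = dot_attention(decoder_state, encoder_states)
```

In a full model the context vector would be combined with the decoder state before predicting the next summary word.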
- Word Embeddings using GloVe (Global Vectors)
- Encoder-decoder using RNN (Recurrent Neural Network)
- Python
- Keras Library
- TensorFlow
- Jupyter
- etc.
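As a rough illustration of the word-embedding step listed above, the sketch below parses GloVe-style text lines (a word followed by space-separated floats) and builds an embedding matrix for a vocabulary. The helper names, the `word_index` mapping, and the tiny sample vectors are hypothetical and are not taken from the project's notebooks.

```python
def load_glove_vectors(lines):
    """Parse GloVe-style lines ("word v1 v2 ...") into a dict of vectors."""
    embeddings = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        embeddings[parts[0]] = [float(v) for v in parts[1:]]
    return embeddings


def build_embedding_matrix(word_index, embeddings, dim):
    """Row i holds the vector for the word with index i.

    Assumption: index 0 is reserved for padding, and words without a
    pretrained vector keep an all-zero row.
    """
    matrix = [[0.0] * dim for _ in range(len(word_index) + 1)]
    for word, idx in word_index.items():
        vector = embeddings.get(word)
        if vector is not None:
            matrix[idx] = vector
    return matrix


# Tiny in-memory stand-in for a real embeddings file.
sample_lines = ["the 0.1 0.2 0.3", "news 0.4 0.5 0.6"]
embeddings = load_glove_vectors(sample_lines)
word_index = {"the": 1, "news": 2, "unseen": 3}
matrix = build_embedding_matrix(word_index, embeddings, dim=3)
```

With a real embeddings file, an open file handle can be passed in place of `sample_lines`, and the resulting matrix could then initialize the weights of an embedding layer.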
In this project we use a sample dataset of news articles (CNN, Daily Mail). We are currently facing a problem implementing the pointer-generator network.
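In the CNN / Daily Mail story files, the article text is followed by reference summary sentences, each introduced by an `@highlight` marker line. The sketch below shows one way to split a story into article and highlights; the function name `split_story` and the sample text are made up for illustration and do not come from the project's preprocessing scripts.

```python
def split_story(raw_text):
    """Split one story into (article, highlights).

    Assumption: each "@highlight" marker is followed by exactly one
    non-empty line containing a summary sentence.
    """
    article_lines, highlights = [], []
    next_is_highlight = False
    for line in raw_text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines separate paragraphs and markers
        if line == "@highlight":
            next_is_highlight = True
        elif next_is_highlight:
            highlights.append(line)
            next_is_highlight = False
        else:
            article_lines.append(line)
    return " ".join(article_lines), highlights


sample_story = (
    "A storm hit the coast on Monday.\n\n"
    "Officials urged residents to leave.\n\n"
    "@highlight\n\n"
    "Storm hits coast\n\n"
    "@highlight\n\n"
    "Residents urged to leave\n"
)
article, highlights = split_story(sample_story)
```

The highlights can then be joined into a single target summary for each article.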
CyclicLR(mode='triangular2', base_lr= 0.2, max_lr= 0.001, step_size= (len(padded_sorted_texts)*0.9/BATCH_SIZE) * 2)
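For readers unfamiliar with the `triangular2` policy, the following sketch computes the learning rate at a given training iteration using the usual formulation: the rate rises linearly from `base_lr` to `max_lr` over `step_size` iterations, falls back over the next `step_size`, and the amplitude is halved after every full cycle. The values below are placeholders chosen so that `base_lr` is the lower bound and `max_lr` the upper bound; they are not the project's settings, and `triangular2_lr` is a hypothetical helper rather than part of any library.

```python
import math


def triangular2_lr(iteration, base_lr, max_lr, step_size):
    """Learning rate at `iteration` under a triangular2 cyclic schedule."""
    cycle = math.floor(1 + iteration / (2 * step_size))
    x = abs(iteration / step_size - 2 * cycle + 1)
    scale = 1 / (2 ** (cycle - 1))  # halve the amplitude each cycle
    return base_lr + (max_lr - base_lr) * max(0.0, 1 - x) * scale


# Placeholder setup: 50 training batches per epoch, step size of two epochs.
num_train_batches = 50
step_size = num_train_batches * 2

lr_start = triangular2_lr(0, base_lr=0.001, max_lr=0.2, step_size=step_size)
lr_peak = triangular2_lr(100, base_lr=0.001, max_lr=0.2, step_size=step_size)
lr_second_peak = triangular2_lr(300, base_lr=0.001, max_lr=0.2, step_size=step_size)
```

Here the first peak reaches `max_lr`, while the second peak only reaches the midpoint between `base_lr` and `max_lr`.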
- ConceptNet Numberbatch word embeddings were used to encode the word meanings
- Clone this repo (for help see this tutorial).
- Raw data is kept in local storage at ~/Text-Summarization/Original_data/cnn/stories
- Data processing/transformation scripts are kept [here](Repo folder containing data processing scripts/notebooks)
- Installation steps: Use single backticks to call out code or a command within a sentence. To format code or text into its own distinct block, use triple backticks. Example:
git status
git commit -m
Team Lead (Contact): Nikhil Gupta
Blair Fernandes, Asjad Baig
- Feel free to contact me at nikhil.css97@gmail.com with any questions or if you are interested in contributing!