- Baseline Classifiers: https://developers.google.com/machine-learning/guides/text-classification
  Followed the step-by-step process to create a classifier with Embedding, Dropout and Dense layers using the Keras API.
- BERT Classifiers: https://github.com/ThilinaRajapakse/simpletransformers#minimal-start-for-multiclass-classification
  Used the starter code provided for the BERT classification wrapper.
- load_embeddings function taken from the NLP assignments.
- Emoticons list for pre-processing: https://towardsdatascience.com/extracting-twitter-data-pre-processing-and-sentiment-analysis-using-python-3-0-7192bd8b47cf
- Embeddings downloaded from: https://worksheets.codalab.org/worksheets/0x84b71dd010cf4bff8d9f59cc22b49344
- Resampling code adapted from: https://elitedatascience.com/imbalanced-classes
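
The resampling credited above is up-sampling: minority-class examples are drawn with replacement until every class matches the size of the largest one. The following is a minimal sketch of that idea using only the Python standard library. It is not the notebook's code; the function name `upsample` and the toy data are hypothetical, and the notebook may rely on a library helper instead.

```python
import random
from collections import Counter, defaultdict


def upsample(texts, labels, seed=0):
    """Draw minority-class examples with replacement up to the majority-class size.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)

    # Group the examples by class label.
    by_label = defaultdict(list)
    for text, label in zip(texts, labels):
        by_label[label].append(text)

    # Every class is grown to the size of the largest class.
    target = max(len(items) for items in by_label.values())

    out_texts, out_labels = [], []
    for label, items in by_label.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)

    # Shuffle so the classes are interleaved rather than grouped.
    order = list(range(len(out_texts)))
    rng.shuffle(order)
    return [out_texts[i] for i in order], [out_labels[i] for i in order]


# Toy data (made up): four examples of class 1, one of class 0.
texts = ["good", "fine", "great", "nice", "bad"]
labels = [1, 1, 1, 1, 0]
balanced_texts, balanced_labels = upsample(texts, labels)
print(Counter(balanced_labels))
```

Up-sampling is usually applied to the training split only, so that duplicated examples do not leak into the validation set.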
- Change the data directory to point to where the dataset is located.
- Each section of the provided IPython notebook, 'Final NLP_Project.ipynb', is clearly marked.
- If you want to start with the baseline classifiers:
  Run all the cells up to and including the one under the section Model Training. This loads the data, performs pre-processing and up-sampling, converts the text to token sequences, and trains the models one by one for each experiment.
  Once training is complete, a classification report for the validation set is generated along with a submission.csv file to submit to Kaggle.
  The trained model is also saved in the output/ directory.
- From here on, the rest of the experiments can be run individually.
- To train the BERT model, run the cells under Bert Loading and Training. The directories for caching and saving models can be passed as parameters to the MultiLabelClassification model wrapper.
- For the Binary Classification and Twitter Transfer Learning experiments, run the code under sections 7 and 8 respectively.
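
The "convert the text to token sequences" step mentioned in the baseline instructions can be illustrated with a small standard-library sketch: build a word-to-id vocabulary from the training text, then map each text to a fixed-length list of ids. This is a stand-in under stated assumptions, not the notebook's code. The names `build_vocab` and `to_sequences`, the reserved ids, whitespace tokenisation, and padding at the end are all illustrative choices; the notebook most likely uses the Keras preprocessing utilities, whose defaults may differ.

```python
from collections import Counter

PAD, UNK = 0, 1  # reserved ids for padding and out-of-vocabulary words (assumed convention)


def build_vocab(texts, max_words=20000):
    """Map the most frequent words to integer ids, starting after the reserved ids."""
    counts = Counter(word for text in texts for word in text.lower().split())
    return {word: i + 2 for i, (word, _) in enumerate(counts.most_common(max_words))}


def to_sequences(texts, vocab, max_len):
    """Convert each text to a list of ids, truncated or padded to max_len."""
    sequences = []
    for text in texts:
        ids = [vocab.get(word, UNK) for word in text.lower().split()][:max_len]
        sequences.append(ids + [PAD] * (max_len - len(ids)))
    return sequences


# Toy data (made up) to show the shape of the output.
train_texts = ["the movie was great", "the plot was thin"]
vocab = build_vocab(train_texts)
sequences = to_sequences(["the movie was thin and slow"], vocab, max_len=8)
print(sequences)
```

Fixed-length integer sequences of this kind are what an Embedding layer expects as input, which is why the conversion comes before model training.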
Code implemented on Google Colab.
- TensorFlow 2.x
- Python 3.6
- Use the Keras API bundled with TensorFlow
- For the SimpleTransformers BERT implementation, the commands are provided within the notebook, but some are commented out. Please run them as required:
i) install Anaconda (which provides conda)
ii) install pytorch and cudatoolkit using conda
iii) install simpletransformers