This project is a text prediction task: given two texts (A and B), the model has to predict which one is easier for a person to understand.
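
As a rough illustration of this pairwise framing, the sketch below turns a pair of texts into a single vector of feature differences (A minus B). The two features and the helper names `simple_features` and `pair_features` are assumptions made for illustration; they are not the project's actual feature set.

```python
import re

def simple_features(text):
    """Return two crude readability proxies (hypothetical features)."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    avg_word_len = sum(len(w) for w in words) / max(len(words), 1)
    avg_sentence_len = len(words) / max(len(sentences), 1)
    return [avg_word_len, avg_sentence_len]

def pair_features(text_a, text_b):
    """Feature differences A - B; negative values mean A has shorter words or sentences."""
    fa, fb = simple_features(text_a), simple_features(text_b)
    return [a - b for a, b in zip(fa, fb)]

text_a = "The cat sat on the mat. It was warm."
text_b = "Notwithstanding considerable meteorological uncertainty, the feline remained stationary."
diff = pair_features(text_a, text_b)
```

A classifier can then be trained on such difference vectors, with the label indicating whether A or B was judged easier to read.
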

| Library | Description |
| --- | --- |
| nltk (Natural Language Toolkit) | Library for natural language processing tasks in Python |
| stopwords | Corpus from the nltk library listing common words that are often ignored in NLP tasks |
| PorterStemmer | Class from the nltk library used for stemming (reducing words to their stems) |
| pandas | Library for data manipulation and analysis |
| numpy | Library for numerical operations in Python |
| TextBlob | Library for processing textual data; supports common NLP tasks such as part-of-speech tagging, sentiment analysis, classification, and translation |
| sklearn.ensemble.RandomForestClassifier | Class from the scikit-learn library implementing the Random Forest algorithm |
| sklearn.preprocessing.MinMaxScaler | Class from the scikit-learn library for scaling features to a given range between a minimum and a maximum value |
| word_tokenize | Function from the nltk library that tokenizes text (splits it into individual words) |
| FreqDist | Class from the nltk library for counting word frequencies |
| cmudict | Corpus from the nltk library containing phonetic pronunciations of words |
| textstat | Python library for computing text statistics |
| sklearn.model_selection.GridSearchCV | Class from the scikit-learn library that searches over a grid of hyperparameter combinations, evaluating each one with cross-validation |
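
To show how the scikit-learn pieces listed above might fit together, here is a minimal sketch that scales features with `MinMaxScaler`, fits a `RandomForestClassifier`, and tunes it with `GridSearchCV`. The synthetic data, the parameter grid, and the use of `Pipeline` are assumptions made for illustration, not the project's actual setup.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for per-pair feature differences (assumption):
# 80 samples, 3 features, label 1 when a linear combination is negative.
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 3))
y = (X[:, 0] + 0.5 * X[:, 1] < 0).astype(int)

# Chain scaling and the classifier so scaling is refit inside each CV fold.
pipeline = Pipeline([
    ("scale", MinMaxScaler()),
    ("forest", RandomForestClassifier(random_state=0)),
])

# Hypothetical grid: every combination is scored with 3-fold cross-validation.
param_grid = {
    "forest__n_estimators": [10, 50],
    "forest__max_depth": [2, None],
}
search = GridSearchCV(pipeline, param_grid, cv=3)
search.fit(X, y)

# After fitting, the search object predicts with the best combination found.
predictions = search.predict(X[:5])
```

After fitting, `search.best_params_` holds the selected hyperparameters and `search.best_score_` the mean cross-validated score for that combination.
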