nicu-chiriac/Text-Prediction

Text-Prediction

This project is a text prediction task: given two texts (A and B), the model has to predict which one is easier for a person to understand.

The project is written in Python and relies mainly on the following libraries: Pandas, NumPy, scikit-learn (sklearn) and NLTK.

| Library | Description |
| --- | --- |
| nltk (Natural Language Toolkit) | Used for natural language processing tasks in Python |
| stopwords | Corpus from the nltk library containing a list of commonly used words that are often ignored in NLP tasks |
| PorterStemmer | Class from the nltk library used for stemming (reducing words to their stems) |
| pandas | Used for data manipulation and analysis |
| numpy | Used for numerical operations in Python |
| TextBlob | Library for processing textual data; supports common natural language processing tasks such as part-of-speech tagging, sentiment analysis, classification, translation, and more |
| sklearn.ensemble.RandomForestClassifier | Class implementing the Random Forest algorithm in the scikit-learn library |
| sklearn.preprocessing.MinMaxScaler | Class from the scikit-learn library for scaling data to a given range between a minimum and a maximum value |
| word_tokenize | Function from the nltk library used to tokenize text (split it into individual words) |
| FreqDist | Class from the nltk library used for counting word frequency |
| cmudict | Corpus from the nltk library containing phonetic pronunciations of words |
| textstat | Python library for text statistics, such as readability scores |
| sklearn.model_selection.GridSearchCV | Class from the scikit-learn library that searches a grid of hyperparameter combinations and scores each one with cross-validation |
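
To make the role of the NLTK pieces listed above more concrete, here is a minimal sketch of a pre-processing step built from `PorterStemmer` and `FreqDist`. It is an illustration, not the project's actual code: the `preprocess` and `top_words` helpers and the sample sentence are hypothetical. To keep the sketch runnable without downloading NLTK data, a small hand-written stop word set stands in for the `stopwords` corpus and a regular expression stands in for `word_tokenize`.

```python
import re

from nltk.probability import FreqDist
from nltk.stem import PorterStemmer

# Assumption: a tiny hand-written stand-in for
# nltk.corpus.stopwords.words("english"), which requires
# nltk.download("stopwords") before first use.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in", "on"}

stemmer = PorterStemmer()


def preprocess(text):
    """Lowercase, tokenize, drop stop words, and stem the remaining tokens."""
    # Assumption: a simple regex tokenizer replaces nltk.word_tokenize here,
    # because word_tokenize needs the "punkt" tokenizer data to be downloaded.
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stemmer.stem(tok) for tok in tokens if tok not in STOP_WORDS]


def top_words(text, n=3):
    """Return the n most frequent stems together with their counts."""
    return FreqDist(preprocess(text)).most_common(n)


# Hypothetical sample text, used only for illustration.
sample = "The cats are running and the cat is sitting on the mat."
stems = preprocess(sample)
most_common = top_words(sample, 1)
```

With the real corpora installed, `STOP_WORDS` and the regex could be swapped for `stopwords.words("english")` and `word_tokenize` without changing the rest of the sketch.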

The main steps in this project are: text pre-processing, feature extraction, dictionary associations, model training, validation and testing.
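
The feature extraction, training and validation steps can be sketched roughly as follows. This is a sketch under stated assumptions, not the project's implementation: the two features (average and maximum word length), the idea of feeding the model the difference between the features of text A and text B, the toy sentences, and the hyperparameter grid are all invented for illustration. Only `MinMaxScaler`, `RandomForestClassifier` and `GridSearchCV` come from the library list above.

```python
import re

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler


def features(text):
    """Hypothetical features: average and maximum word length."""
    lengths = [len(w) for w in re.findall(r"[A-Za-z]+", text)]
    return np.array([sum(lengths) / len(lengths), max(lengths)], dtype=float)


def pair_features(text_a, text_b):
    """Assumption: describe a pair by the feature difference A minus B."""
    return features(text_a) - features(text_b)


# Toy data invented for this sketch; it is not the project's dataset.
easy = ["The cat sat on the mat.", "I like my dog.",
        "We go to the park.", "She has a red hat."]
hard = ["Photosynthesis converts electromagnetic radiation into chemical energy.",
        "Macroeconomic fluctuations influence institutional investment strategies.",
        "Epidemiological investigations require comprehensive statistical methodologies.",
        "Constitutional jurisprudence necessitates meticulous interpretation."]

X, y = [], []
for e, h in zip(easy, hard):
    X.append(pair_features(e, h))
    y.append("A")  # text A is the easier one
    X.append(pair_features(h, e))
    y.append("B")  # text B is the easier one
X = np.array(X)

# Scaling lives inside the pipeline so each cross-validation fold
# fits the scaler on its own training split only.
pipeline = Pipeline([
    ("scale", MinMaxScaler()),
    ("rf", RandomForestClassifier(random_state=0)),
])
param_grid = {"rf__n_estimators": [10, 50], "rf__max_depth": [2, None]}
search = GridSearchCV(pipeline, param_grid, cv=2)
search.fit(X, y)


def easier_text(text_a, text_b):
    """Predict which text of the pair, "A" or "B", is easier to read."""
    return search.predict(np.array([pair_features(text_a, text_b)]))[0]


prediction = easier_text(
    "The sun is hot.",
    "Thermodynamic equilibrium presupposes homogeneous temperature distribution.",
)
```

After fitting, `search.best_params_` holds the selected hyperparameter combination, and `easier_text` can be called on any new pair of texts.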

A more detailed explanation can be found in the "Predictie SII Chiriac Nicu Manuel.docx" file (written in Romanian).
