
Phocket---ML-Internship

This repository consists of machine learning models, deep learning models, and some NLP tasks such as topic modelling, sequence generation, sentiment analysis, and a recommendation system.

1. Designing the preprocessing template

  • The template loads the dataset on its own.
  • Missing values are filled using pandas fillna(), with the imputation technique chosen per column.
  • A standard scaler is used to standardize the numerical columns.
  • Categorical features are one-hot encoded so they can be fed to algorithms that only accept numerical input (a minimal sketch follows this list).
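
Below is a minimal sketch of what such a preprocessing template could look like; the function name, column lists, and CSV path are illustrative assumptions, not the exact code in the notebooks.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

def preprocess(csv_path, numeric_cols, categorical_cols):
    """Load a CSV, impute missing values, scale numerics, one-hot encode categoricals."""
    df = pd.read_csv(csv_path)  # the template loads the dataset on its own

    # Fill missing values: median for numeric columns, mode for categoricals
    for col in numeric_cols:
        df[col] = df[col].fillna(df[col].median())
    for col in categorical_cols:
        df[col] = df[col].fillna(df[col].mode()[0])

    # Standardize the numerical attributes
    scaler = StandardScaler()
    df[numeric_cols] = scaler.fit_transform(df[numeric_cols])

    # One-hot encode categorical features so models receive numerical input
    df = pd.get_dummies(df, columns=categorical_cols)
    return df

# Hypothetical usage:
# data = preprocess("data.csv", ["Age", "Purchase"], ["Gender", "City_Category"])
```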

2. Designing a template which identifies the 3 most important independent features in the dataset

  • Used the above-mentioned preprocessing template to preprocess the data, which also demonstrates its reusability in practice.
  • The Black Friday dataset was used as a reference: a popular, highly skewed dataset with categorical independent features and a continuous target.
  • Designed a template which splits the data on a user-supplied train/test ratio and then trains and tests the model; six different algorithms were trained and their results compared.
  • Also applied PCA, derived 4 principal components, and trained and tested the model on them (see the sketch after this list).
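
A rough sketch of how the top-3 features and the 4 principal components could be derived, assuming the data has already been preprocessed into a pandas DataFrame; the random forest and split ratio shown here are illustrative and not necessarily one of the six algorithms actually used.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.decomposition import PCA

def top_features_and_pca(X, y, test_ratio=0.2):
    """Rank feature importances and project the data onto 4 principal components."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_ratio, random_state=42)  # user-supplied split ratio

    model = RandomForestRegressor(n_estimators=100, random_state=42)
    model.fit(X_train, y_train)

    importances = pd.Series(model.feature_importances_, index=X.columns)
    top3 = importances.nlargest(3)                     # 3 most important features
    print("Top 3 features:\n", top3)
    print("R^2 on test set:", model.score(X_test, y_test))

    pca = PCA(n_components=4)                          # 4 principal components
    X_train_pca = pca.fit_transform(X_train)
    X_test_pca = pca.transform(X_test)
    return top3, X_train_pca, X_test_pca
```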

3. Evaluation of a classification model

  • Analysis of the ROC curve.
  • Identifying when the model is overfitting and when it is underfitting.
  • The ROC curve also helps show the effect of the different hyperparameters used in the algorithms.
  • Accuracy plays a significant role, but it cannot be the only metric used to judge the model's utility.
  • A health dataset was used as a reference (a plotting sketch follows this list).
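
A minimal sketch of the ROC analysis, assuming any scikit-learn classifier with a predict_proba method; the health dataset itself is not shown. Comparing the train and test curves is one way to spot overfitting (a large gap between the curves) or underfitting (both curves close to the diagonal).

```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, roc_auc_score

def plot_roc(model, X_train, y_train, X_test, y_test, label="model"):
    """Plot train vs. test ROC curves for a fitted binary classifier."""
    for X, y, split in [(X_train, y_train, "train"), (X_test, y_test, "test")]:
        scores = model.predict_proba(X)[:, 1]          # positive-class probabilities
        fpr, tpr, _ = roc_curve(y, scores)
        auc = roc_auc_score(y, scores)
        plt.plot(fpr, tpr, label=f"{label} {split} (AUC={auc:.2f})")

    plt.plot([0, 1], [0, 1], "k--")                    # chance line
    plt.xlabel("False positive rate")
    plt.ylabel("True positive rate")
    plt.legend()
    plt.show()
```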

4. Topic Modelling

  • Twitter's climate dataset was used as a reference to extract the different topics discussed in the tweets.
  • NLP techniques such as tokenization, lemmatization, stop-word removal, and POS tagging were used.
  • A proper template was built to show how a text-based dataset is preprocessed.
  • Important features such as popular hashtags, popular mentions, and popular tweets were identified.
  • A correlation matrix was built among all three to identify strong positive and negative relationships between these values.
  • The algorithms used for topic modelling were LDA (Latent Dirichlet Allocation) and NMF (see the sketch after this list).
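
A condensed sketch of the LDA and NMF topic models using scikit-learn; the list of cleaned tweets and the number of topics are assumptions.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.decomposition import LatentDirichletAllocation, NMF

def show_topics(model, feature_names, n_words=8):
    """Print the top words of every topic learned by LDA or NMF."""
    for idx, topic in enumerate(model.components_):
        words = [feature_names[i] for i in topic.argsort()[-n_words:][::-1]]
        print(f"Topic {idx}: {' '.join(words)}")

def topic_model(tweets, n_topics=5):
    # LDA works on raw term counts
    count_vec = CountVectorizer(stop_words="english", max_df=0.95, min_df=2)
    counts = count_vec.fit_transform(tweets)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=42).fit(counts)
    show_topics(lda, count_vec.get_feature_names_out())

    # NMF is usually run on TF-IDF weights instead
    tfidf_vec = TfidfVectorizer(stop_words="english", max_df=0.95, min_df=2)
    tfidf = tfidf_vec.fit_transform(tweets)
    nmf = NMF(n_components=n_topics, random_state=42).fit(tfidf)
    show_topics(nmf, tfidf_vec.get_feature_names_out())
```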

5. Sequence2Sequence Modelling

  • Prediction of song lyrics and other text based on the data fed into the model.
  • Completed all the modules of the Coursera course and its assignments.
  • Some extra assignments were given by the mentors to test whether we had really understood the concepts.
  • 3D visualization of these models with the TensorFlow library and tools.
  • A sarcasm dataset was used as a reference for this task (a model sketch follows this list).
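
A stripped-down sketch of a word-level next-word prediction model in Keras, of the kind used for lyric and text generation; the sequence length, layer sizes, and training settings are placeholder assumptions.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

def build_lyrics_model(texts, seq_len=10):
    """Tokenize the corpus and train an LSTM to predict the next word."""
    tokenizer = Tokenizer()
    tokenizer.fit_on_texts(texts)
    vocab_size = len(tokenizer.word_index) + 1

    # Build (up to seq_len words -> next word) training pairs from every line
    sequences = []
    for line in texts:
        ids = tokenizer.texts_to_sequences([line])[0]
        for i in range(1, len(ids)):
            sequences.append(ids[max(0, i - seq_len):i + 1])
    sequences = pad_sequences(sequences, maxlen=seq_len + 1)
    X, y = sequences[:, :-1], sequences[:, -1]

    model = Sequential([
        Embedding(vocab_size, 64),
        LSTM(128),
        Dense(vocab_size, activation="softmax"),
    ])
    model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
    model.fit(X, y, epochs=20, verbose=0)
    return model, tokenizer
```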

6. Combining different models in a Flask web app

  • Learned how to combine Flask with the trained machine learning models so they can be served through a web app.
  • Around 3-4 projects were in progress in which I combined the different models (a minimal serving sketch follows this list).
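
A minimal sketch of serving a trained model through Flask; the pickle filename, route, and input format are hypothetical.

```python
from flask import Flask, request, jsonify
import pickle
import numpy as np

app = Flask(__name__)

# Load a previously trained and pickled model (hypothetical filename)
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    """Accept a JSON list of features and return the model's prediction."""
    features = np.array(request.json["features"]).reshape(1, -1)
    prediction = model.predict(features)
    return jsonify({"prediction": prediction.tolist()})

if __name__ == "__main__":
    app.run(debug=True)
```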