Phocket---ML-Internship

This repository contains machine learning models, deep learning models, and NLP tasks such as topic modelling, sequence generation, sentiment analysis, and a recommendation system.

Link to Colab files: https://drive.google.com/open?id=1X07MHhVhrY8oWvP2VadjUjrzfJkCN5pW
Link to datasets used: https://drive.google.com/open?id=1NC4CmlifjKnT94bNJvSV4_r9xD1UOWrA

1. Designing the preprocessing template

The template loads the dataset on its own.
It fills missing values using fillna() methods and notes the technique used to fill them.
It standardizes the numeric attributes using standard scaler functions.
It one-hot encodes categorical features so that they can be passed to algorithms that require numerical input to build the model.
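
The steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the `preprocess` function, the median and most-frequent-value fill strategies, and the toy columns are all assumptions made for the example. It relies on pandas and scikit-learn.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Impute, standardize, and one-hot encode a DataFrame (illustrative only)."""
    df = df.copy()
    numeric_cols = df.select_dtypes(include="number").columns
    categorical_cols = df.columns.difference(numeric_cols)

    # Assumed fill strategy: median for numeric, most frequent value for categorical.
    for col in numeric_cols:
        df[col] = df[col].fillna(df[col].median())
    for col in categorical_cols:
        df[col] = df[col].fillna(df[col].mode().iloc[0])

    # Standardize numeric columns to zero mean and unit variance.
    if len(numeric_cols) > 0:
        df[numeric_cols] = StandardScaler().fit_transform(df[numeric_cols])

    # One-hot encode the categorical columns so every feature is numeric.
    return pd.get_dummies(df, columns=list(categorical_cols))


# Hypothetical toy data, not taken from the repository's datasets.
toy = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 31.0],
    "city": ["Delhi", "Pune", None, "Delhi"],
})
clean = preprocess(toy)
```

After the call, `clean` has no missing values, a standardized `age` column, and one indicator column per city.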

2. Designing a template that identifies the 3 most important independent features in the dataset

The preprocessing template above was used to preprocess the data, which in a way demonstrates its usefulness in practice.
The Black Friday dataset was used as a reference. It is a popular dataset that is highly skewed, with categorical independent features as input and a continuous output.
Designed a template that splits the data according to a user-supplied ratio and then trains and tests the model. I used 6 different algorithms to train the model and compared the results.
I also applied PCA, derived 4 principal components, and trained and tested the model on them.
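
One possible way to rank features and apply PCA is sketched below. The README does not name the 6 algorithms, so the random forest regressor here is an assumption chosen only because it exposes feature importances; the `top_features` helper and the synthetic data are likewise hypothetical stand-ins for the Black Friday notebook.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split


def top_features(X: pd.DataFrame, y, test_size: float = 0.2, k: int = 3):
    """Return the k highest-importance feature names and the held-out R^2 score."""
    # test_size plays the role of the user-supplied split ratio.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=0
    )
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(X_train, y_train)
    ranked = pd.Series(model.feature_importances_, index=X.columns)
    return list(ranked.nlargest(k).index), model.score(X_test, y_test)


# Synthetic stand-in for a preprocessed dataset: only f0, f1, f2 drive the target.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(300, 6)), columns=[f"f{i}" for i in range(6)])
y = 5 * X["f0"] + 3 * X["f1"] - 4 * X["f2"] + rng.normal(scale=0.1, size=300)

best, r2 = top_features(X, y, test_size=0.25)

# Reduce the same features to 4 principal components, as in the PCA experiment.
components = PCA(n_components=4).fit_transform(X)
```

Swapping in another estimator only requires that it offer some importance measure, for example coefficients of a linear model on standardized inputs.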

3. Evaluation of classification models

Analysis of the ROC curve.
Identifying when the model is overfitting and when it is underfitting.
The ROC curve also helps in assessing the effect of the different hyperparameters used in the algorithms.
Accuracy plays a significant role, but it cannot be the only metric used to judge the usefulness of a model.
A health dataset was used as a reference.
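
As a rough illustration of the points above, the sketch below compares training and test ROC AUC while varying one hyperparameter. The decision tree, its `max_depth` values, and the synthetic data are assumptions for the example; the health dataset and the models used in the notebook are not reproduced here.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data standing in for the health dataset.
X, y = make_classification(n_samples=600, n_features=20, n_informative=5,
                           flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

results = {}
for depth in (1, 4, None):  # None lets the tree grow until leaves are pure
    clf = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
    train_auc = roc_auc_score(y_train, clf.predict_proba(X_train)[:, 1])
    test_auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
    results[depth] = (train_auc, test_auc)
    print(f"max_depth={depth}: train AUC={train_auc:.3f}, test AUC={test_auc:.3f}")

# Points of the ROC curve for the last model, ready for plotting.
fpr, tpr, thresholds = roc_curve(y_test, clf.predict_proba(X_test)[:, 1])
```

A large gap between training and test AUC suggests overfitting, while low AUC on both suggests underfitting.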

4. Topic Modelling

Twitter's climate dataset was used as a reference to extract the different topics discussed in the tweets.
NLP techniques such as tokenization, lemmatization, stop-word removal, and POS tagging were used.
A template was built to show how text-based datasets are preprocessed.
Important features such as popular hashtags, popular mentions, and popular tweets were identified.
A correlation matrix was built among all three to identify strong and negative relationships between them.
The topic modelling algorithms used were LDA (Latent Dirichlet Allocation) and NMF (Non-negative Matrix Factorization).

5. Sequence-to-Sequence Modelling

Prediction of song lyrics and other text based on the data fed into the model.
Completed all the modules and assignments of the Coursera course.
The mentors gave some extra assignments to test whether we had really understood the concepts.
3D visualization of these models using the TensorFlow library and tools.
A sarcasm dataset was used as a reference for this task.
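
The README does not describe the model architecture, so the sketch below covers only a common data preparation step for next-word prediction: turning each line of text into padded prefixes with the following word as the label. The function names and the two-line corpus are hypothetical, and whether the notebooks prepare data this way is an assumption.

```python
def build_vocab(lines):
    """Map each distinct word to a positive integer id (0 is reserved for padding)."""
    vocab = {}
    for line in lines:
        for word in line.lower().split():
            vocab.setdefault(word, len(vocab) + 1)
    return vocab


def make_training_pairs(lines, vocab):
    """Turn each line into left-padded prefixes with the next word as the label."""
    sequences = []
    for line in lines:
        ids = [vocab[w] for w in line.lower().split()]
        # Every prefix of length >= 2 gives one (input, label) example.
        sequences.extend(ids[:end] for end in range(2, len(ids) + 1))
    max_len = max(len(seq) for seq in sequences)
    padded = [[0] * (max_len - len(seq)) + seq for seq in sequences]
    inputs = [seq[:-1] for seq in padded]
    labels = [seq[-1] for seq in padded]
    return inputs, labels


# Hypothetical two-line corpus standing in for a lyrics file.
lyrics = ["the night is young", "the night is ours tonight"]
vocab = build_vocab(lyrics)
inputs, labels = make_training_pairs(lyrics, vocab)
```

The resulting `inputs` and `labels` could then be fed to an embedding layer followed by a recurrent layer and a softmax over the vocabulary.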

6. Combining different models in a Flask web app

Learning how to combine Flask with machine learning models.
I combined the different models across around 3-4 ongoing projects.
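
One way to serve several models from a single Flask app is sketched below. The route, the `MODELS` registry, and the two stub functions are hypothetical; in a real app the stubs would be replaced by trained models loaded from disk.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def sentiment_stub(text: str) -> str:
    """Placeholder for a trained model's predict call."""
    return "positive" if "good" in text.lower() else "negative"


def keyword_stub(text: str) -> list:
    """Placeholder that returns up to three of the longest words as 'keywords'."""
    return sorted(set(text.lower().split()), key=len, reverse=True)[:3]


# Hypothetical registry: one Flask app dispatching to several models by name.
MODELS = {"sentiment": sentiment_stub, "keywords": keyword_stub}


@app.route("/predict/<model_name>", methods=["POST"])
def predict(model_name):
    model = MODELS.get(model_name)
    if model is None:
        return jsonify(error=f"unknown model '{model_name}'"), 404
    payload = request.get_json(silent=True) or {}
    if "text" not in payload:
        return jsonify(error="request body must contain a 'text' field"), 400
    return jsonify(model=model_name, prediction=model(payload["text"]))


# Exercise the endpoint without starting a server.
client = app.test_client()
response = client.post("/predict/sentiment", json={"text": "a good day"})
```

Keeping the models behind a name-to-callable registry means a new model can be added without touching the route.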

Files in this repository:
Black_Friday.ipynb
Datasets Link.txt
Flower_Recognition.ipynb
Keyword_Extraction.ipynb
Pre-processing Template.ipynb
README.md
Recommendation_System.ipynb
Topic_Modeling_in_Tweets.ipynb
Twitter_Sentiment.ipynb
views_1_1_py.ipynb

kush1912/Phocket---ML-Internship