This repository contains machine learning models, deep learning models, and NLP tasks such as topic modelling, sequence generation, sentiment analysis, and a recommendation system.
- Link to Colab files: https://drive.google.com/open?id=1X07MHhVhrY8oWvP2VadjUjrzfJkCN5pW
- Link to datasets used: https://drive.google.com/open?id=1NC4CmlifjKnT94bNJvSV4_r9xD1UOWrA
1. Designing the preprocessing template
- Loads the dataset on its own.
- Fills missing values using the fillna() method with the chosen fill technique.
- Standardizes the numeric attributes using a standard scaler.
- One-hot encodes categorical features so that they can be passed to algorithms that require numerical input.
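
The steps above can be sketched roughly as follows. This is a minimal illustration, not the template from the Colab files: the `preprocess` helper, the sample columns, and the choice of median and most-frequent value as fill strategies are all assumptions made for the example. A real run would load the data first, for instance with `pd.read_csv`.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Fill missing values, standardize numeric columns, one-hot encode the rest."""
    numeric = df.select_dtypes(include="number")
    categorical = df.select_dtypes(exclude="number")

    # Assumed fill strategy: median for numeric gaps, most frequent value otherwise.
    numeric = numeric.fillna(numeric.median())
    categorical = categorical.fillna(categorical.mode().iloc[0])

    # Standardize numeric columns to zero mean and unit variance.
    scaled = pd.DataFrame(
        StandardScaler().fit_transform(numeric),
        columns=numeric.columns,
        index=df.index,
    )

    # One-hot encode categorical columns so models receive numeric input only.
    encoded = pd.get_dummies(categorical)
    return pd.concat([scaled, encoded], axis=1)


# Hypothetical sample data standing in for a loaded dataset.
sample = pd.DataFrame(
    {
        "age": [25.0, np.nan, 40.0, 31.0],
        "city": ["A", "B", None, "B"],
    }
)
processed = preprocess(sample)
print(processed)
```
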
2. Designing a template that identifies the 3 most important independent features in the dataset.
- Used the preprocessing template above to preprocess the data, which in a way shows its utility in practice.
- The Black Friday dataset was used as reference: a popular dataset that is highly skewed, with categorical independent features as input and a continuous output.
- Designed a template that splits the data according to a user-supplied ratio, then trains and tests the model. I used 6 different algorithms to train the model and compared the results.
- I also applied PCA, derived 4 principal components, and trained and tested the model on them.
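
A rough sketch of this workflow is shown below. It runs on synthetic data rather than the Black Friday dataset, and it uses a random forest's `feature_importances_` to rank features. That choice is an assumption for illustration; the 6 algorithms actually compared are not listed here.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))
# Synthetic target: only features 0, 2 and 4 carry signal.
y = 5 * X[:, 0] + 3 * X[:, 2] + 2 * X[:, 4] + rng.normal(scale=0.1, size=300)

test_ratio = 0.2  # in the template this would come from user input
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=test_ratio, random_state=0
)

# Rank features by importance and keep the indices of the top 3.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
top3 = np.argsort(model.feature_importances_)[::-1][:3]
r2_full = model.score(X_test, y_test)

# Reduce to 4 principal components, then train and test again.
pca = PCA(n_components=4).fit(X_train)
pca_model = RandomForestRegressor(n_estimators=100, random_state=0)
pca_model.fit(pca.transform(X_train), y_train)
r2_pca = pca_model.score(pca.transform(X_test), y_test)

print("top 3 feature indices:", top3)
print("R^2 on all features:", r2_full, "R^2 on 4 components:", r2_pca)
```
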
3. Evaluation of a classification model
- Analysis of the ROC curve.
- Identifying when the model is overfitting and when it is underfitting.
- The ROC curve also helps in assessing the effect of the different hyperparameters used in the algorithms.
- Accuracy plays a significant role, but it cannot be the only metric used to assess the utility of the model.
- A health dataset was used as reference.
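
As a small illustration of these points, the sketch below compares train and test ROC AUC for decision trees of different depths. The synthetic data, the choice of a decision tree, and `max_depth` as the hyperparameter are assumptions for the example, not the setup used on the health dataset. A large gap between train and test AUC suggests overfitting; low AUC on both suggests underfitting.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data with 10% label noise.
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=5, flip_y=0.1, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Vary one hyperparameter and record (train AUC, test AUC) for each setting.
results = {}
for depth in (1, 4, None):  # None lets the tree grow until it fits the training set
    clf = DecisionTreeClassifier(max_depth=depth, random_state=0)
    clf.fit(X_train, y_train)
    train_auc = roc_auc_score(y_train, clf.predict_proba(X_train)[:, 1])
    test_auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
    results[depth] = (train_auc, test_auc)

# Points of the ROC curve for the last model, ready for plotting.
fpr, tpr, thresholds = roc_curve(y_test, clf.predict_proba(X_test)[:, 1])

for depth, (train_auc, test_auc) in results.items():
    print(f"max_depth={depth}: train AUC={train_auc:.3f}, test AUC={test_auc:.3f}")
```
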
4. Topic Modelling
- A Twitter climate dataset was used as reference, to extract the topics discussed in the tweets.
- NLP techniques such as tokenization, lemmatization, stop-word removal, and POS tagging were used.
- A template was built to show how a text-based dataset is preprocessed.
- Important features such as popular hashtags, popular mentions, and popular tweets were identified.
- A correlation matrix was built across all three to identify strong and negative relationships between them.
- The algorithms used for topic modelling were LDA (Latent Dirichlet Allocation) and NMF (Non-negative Matrix Factorization).
5. Sequence-to-sequence modelling
- Prediction of song lyrics and other text based on the data fed into the model.
- Completed all the modules and assignments of the Coursera course.
- Some extra assignments were given by the mentors to test whether we had really understood the concepts.
- 3D visualization of these models using the TensorFlow library and tools.
- A sarcasm dataset was used as reference for this task.
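
One common way to prepare text for next-word prediction is to turn each line into growing prefixes, pad them to the same length, and use the last word of each prefix as the label. The sketch below does this in plain Python. The `build_training_pairs` helper and the sample lines are hypothetical, and the notebooks may prepare their data differently.

```python
def build_training_pairs(lines):
    """Turn lines of text into left-padded prefixes and next-word labels."""
    # Assign each word an integer id; 0 is reserved for padding.
    vocab = {}
    for line in lines:
        for word in line.lower().split():
            vocab.setdefault(word, len(vocab) + 1)

    # Every prefix of length >= 2 becomes one training sequence.
    sequences = []
    for line in lines:
        ids = [vocab[word] for word in line.lower().split()]
        for end in range(2, len(ids) + 1):
            sequences.append(ids[:end])

    # Left-pad so all sequences share one length, then split off the label.
    max_len = max(len(seq) for seq in sequences)
    padded = [[0] * (max_len - len(seq)) + seq for seq in sequences]
    inputs = [seq[:-1] for seq in padded]
    labels = [seq[-1] for seq in padded]
    return vocab, inputs, labels


# Hypothetical lyric lines.
lines = ["the sun will rise", "the sun sets"]
vocab, inputs, labels = build_training_pairs(lines)

for x, y in zip(inputs, labels):
    print(x, "->", y)
```

The resulting `inputs` and `labels` could then be fed to a sequence model, with the labels treated as classes over the vocabulary.
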
6. Combining different models in a Flask web app:
- Learned how to integrate machine learning models into a Flask app.
- There were around 3-4 ongoing projects in which I combined the different models.
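
A minimal sketch of serving a model from Flask is shown below. The `/predict` route, the JSON payload shape, and the toy logistic regression are assumptions for illustration; a real app would more likely load a previously trained model from disk.

```python
from flask import Flask, jsonify, request
from sklearn.linear_model import LogisticRegression

# Toy model trained inline; stands in for a model loaded from disk.
model = LogisticRegression().fit([[0.0], [1.0], [2.0], [3.0]], [0, 0, 1, 1])

app = Flask(__name__)


@app.route("/predict", methods=["POST"])
def predict():
    """Read a feature vector from the JSON body and return the predicted class."""
    payload = request.get_json(force=True)
    features = payload["features"]  # assumed shape: a flat list of numbers
    prediction = model.predict([features])[0]
    return jsonify({"prediction": int(prediction)})


# Exercise the endpoint without starting a server, using Flask's test client.
client = app.test_client()
response = client.post("/predict", json={"features": [3.0]})
print(response.get_json())
```

Several models can be combined in one app by giving each its own route or by selecting the model from a field in the request.
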