Welcome to my portfolio! Here, you'll find brief descriptions of the projects I've undertaken to enhance my skills as a Data Scientist.
Table of Contents:
-
Tools : NumPy, Pandas, Scikit-Learn, Matplotlib, Seaborn, SQL, Statistics
-
Tools : TensorFlow, PyTorch, OpenCV, Albumentations, HuggingFace Transformers
-
Tools : spaCy, NLTK, Embeddings, LSTM, RNN
Laptop Price Prediction | Streamlit App | Deployment on Render
- Goal of the Project : Predict the price of a laptop from parameters such as brand, RAM, OS, touchscreen, HDD, and SSD.
Body Fat Estimation Flask Web Application
- Goal of Project : Illustrate multiple regression techniques for accurate body fat estimation, since most measurement tools are inconvenient or costly and easier methods are desirable.
- Highlights :
  - Checked for outliers using lower and upper limits of -3 and 3 standard deviations, assuming a normal distribution.
  - Feature Selection : Extra Tree Regressor, Mutual Information Gain, Variance Inflation Factor
- Result : Random Forest Regression performs best with R2 Score : 99%
- Hosted the website using the Flask framework.
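The outlier check above can be sketched in plain Python as follows. This is a minimal illustration, assuming the limits of -3 and 3 refer to z-scores computed from the column mean and population standard deviation; the function name `zscore_filter` and the sample values are hypothetical, not taken from the project.

```python
from statistics import mean, pstdev


def zscore_filter(values, limit=3.0):
    """Split values into (kept, outliers) using the rule |z| <= limit."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values are identical, so nothing can be an outlier.
        return list(values), []
    kept, outliers = [], []
    for v in values:
        z = (v - mu) / sigma
        (kept if abs(z) <= limit else outliers).append(v)
    return kept, outliers


# Made-up body fat percentages with one implausible reading (95.0).
body_fat = [12.3, 6.1, 25.3, 10.4, 28.7, 20.9, 19.2, 12.4, 4.1, 11.7,
            7.1, 7.8, 20.8, 21.2, 22.1, 20.9, 29.0, 22.9, 16.0, 16.5, 95.0]
kept, outliers = zscore_filter(body_fat)
print(outliers)
```

Note that a single extreme value inflates the standard deviation, so on very small samples this rule may fail to flag anything.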
SpaceX Falcon 9 1st stage Landing Prediction
- Goal of the Project : SpaceX advertises Falcon 9 rocket launches on its website at a cost of 62 million dollars; other providers charge upward of 165 million dollars per launch. Much of the savings comes from SpaceX being able to reuse the first stage. Therefore, if we can predict whether the first stage will land, we can estimate the cost of a launch.
- Highlights:
  - HTML web scraping from Wikipedia and requests to the SpaceX API
  - Connected to an IBM DB2 database and used SQL queries to explore the data.
  - Used the Seaborn and Matplotlib libraries
  - Algorithms : Logistic Regression, KNN, SVM, Decision Tree.
  - Hyper-Parameter Tuning : Using GridSearchCV | Decision Trees Performed Best | Accuracy : ~ 90%
HR Department Case Study: Employees Attrition Prediction
- Goal of Project : Perform classification analysis to determine whether an employee will leave the company. Small business owners spend 40% of their working hours on tasks that do not generate any income, such as hiring. Companies spend 15-20% of an employee's salary to recruit a new candidate.
- Highlights :
  - Imported Libraries, CSV Dataset | Data Cleaning : Nulls, Dropped Unrelated Columns
  - Data Viz + Statistical Analysis : Correlation Matrix, KDE Plots, Box Plots, Count Plots
  - Performed ANOVA and Chi-square tests for feature selection
  - OneHot Encoder, Min Max Scaler
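To illustrate the chi-square test used for feature selection, here is a minimal sketch of the Pearson chi-square statistic for a contingency table of a categorical feature against the attrition label. The function name `chi_square_statistic` and the counts are hypothetical; the project itself may well have relied on a library routine instead.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the feature and the label were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical counts: rows = overtime (yes, no), columns = attrition (left, stayed).
counts = [[10, 20], [20, 10]]
stat = chi_square_statistic(counts)
print(round(stat, 3))
```

For a 2x2 table there is one degree of freedom, so a statistic above roughly 3.84 indicates dependence at the 5% significance level, which would argue for keeping the feature.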
- Goal of Project : Help a business better understand its customers and make it easier to modify products according to customers' specific needs, behaviors, and concerns.
- Highlights :
  - Feature Engineering, Outlier Removal, Dimensionality Reduction using PCA, Agglomerative Clustering, Elbow Method, 3D Plotting, Profiling, Deriving Conclusions
Mask Detection using Detectron2 Library
- Goal of Project : Create an application that can detect whether a person is wearing a mask, wearing it improperly, or not wearing one.
- Highlights
  - Used the Detectron2 library developed by Facebook AI Research
  - Used faster_rcnn_R_50_FPN_3x for object detection
  - Trained for 1000 Iterations
Arthropod Taxonomy Orders Object Detection
- Goal of Project : Create an object detection model that can accurately and efficiently detect objects in an image or video stream in real-time.
- Highlights
  - Used YOLOv8 developed by Ultralytics
  - Converted JSON file containing maps information to DataFrame
  - Dataset : Arthropod Taxonomy Orders Object Detection Dataset | Total of 15,000 Images | Images spread across 7 classes
  - mAP at 50% IoU threshold is 0.525 | mAP50-95 : 0.365 | Fitness score of 0.6 on val.
  - Exported model to ONNX format
- Goal of Project : Create an object detection model that can accurately and efficiently detect objects in an image or video stream in real-time.
- Highlights :
  - Parsed XML files containing the annotations (object detection information) for the training images.
  - Albumentations Library
  - Used EfficientNet1 and changed the top two layers to CNN layers instead of fully connected layers.
  - Defined an Intersection Over Union function to measure the overlap between two sets of bounding boxes.
  - Defined the YOLO Loss function
  - Trained for 50 Epochs | Results displayed on Validation Dataset.
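The Intersection Over Union function mentioned above could look roughly like this. It is a minimal sketch for a single pair of boxes, assuming each box is given as `(x_min, y_min, x_max, y_max)` corner coordinates; the box format actually used in the project is not stated, and YOLO-style code often stores boxes as center, width, and height instead.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle, if there is one.
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Clamp at zero so disjoint boxes get an intersection area of 0.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # overlap area 1, union area 7
```

A batched version over tensors follows the same arithmetic, with `max` and `min` replaced by element-wise operations.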
- Goal of the Project : Develop a deep learning model that accurately recognizes emotions from facial expressions for potential applications in psychology and marketing.
- Highlights:
  - Dataset : ~30,000 Training Images, belonging to 7 different Classes
  - Data Augmentation, LeNet, ResNet34, Transfer Learning with EfficientNet, Fine-Tuning EfficientNet, Vision Transformer, Using HuggingFace Transformers
  - The model downloaded from HuggingFace performed best : Accuracy : ~70%
Malaria Detection by Blood Sample Images
- Goal of the Project : Detect whether a blood sample is infected by the malarial parasite. This model can help detect malaria cases easily. In remote places where doctors and technicians are not available, this deep learning model can aid faster diagnosis and save lives.
- Highlights:
  - Prefetched the dataset to make training faster | Mixup data augmentation, Cutmix augmentation and Albumentations
  - CNN and used callbacks | Plotted Loss Curves, Confusion Matrix | Accuracy : 93%
  - As this is a medical diagnosis case, we have to reduce false positives (here, diagnosing a person as uninfected despite being parasitized).
  - Used the ROC Curve and calculated the Threshold parameter.
  - Then re-plotted the Confusion Matrix
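One common way to derive a decision threshold from an ROC curve is to maximize Youden's J statistic (TPR minus FPR). The sketch below shows that idea in plain Python; whether the project used this criterion is not stated, so treat the selection rule, the function names, and the toy labels and scores as assumptions.

```python
def roc_points(y_true, scores):
    """Return (threshold, fpr, tpr) for each distinct score, highest threshold first."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = []
    for threshold in sorted(set(scores), reverse=True):
        # A sample is predicted positive when its score reaches the threshold.
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((threshold, fp / negatives, tp / positives))
    return points


def best_threshold(y_true, scores):
    """Pick the threshold that maximizes Youden's J = TPR - FPR."""
    return max(roc_points(y_true, scores), key=lambda p: p[2] - p[1])[0]


# Toy labels and predicted probabilities, made up for illustration.
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.6, 0.4, 0.5, 0.7, 0.8, 0.9]
threshold = best_threshold(y_true, scores)
print(threshold)
```

In a medical setting the criterion is often skewed further, for example by choosing the highest threshold whose false positive rate stays under a fixed budget, and the confusion matrix is then recomputed at the chosen threshold.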
Food Vision 101 : Image Classification model using TensorFlow
Semantic Segmentation for Self Driving cars
- Goal of the project : Assign a specific class label to each pixel or region in the image, allowing the autonomous vehicle's perception system to understand the environment in a more detailed and meaningful way.
- Highlights :
  - Dataset consists of 5000 images with 13 classes, divided into train, val, and test sets | lyft-udacity-challenge dataset, from Kaggle
  - Preprocessed the data and created tuples of images and masks => split into train, test, val
  - Defined upsampling blocks, downsampling blocks and unet_model
  - After training for only 5 epochs, achieved Accuracy of 95% on test_set.
- Fine-tuned DeBERTa from HuggingFace for Intent Classification.
- Fine-tuned RoBERTa from HuggingFace for NER.
Sentiment Analysis : Alexa Reviews
- Goal of Project : Based on reviews, predicting whether customers are satisfied with the product or not.
- Dataset consists of ~ 3000 Amazon customer reviews (input text), star ratings, date of review, variant, and feedback for various Amazon Alexa products such as Alexa Echo, Echo Dots, Alexa Firesticks etc.
- Highlights :
  - Data Evaluation, WordCloud, Cleaning : dropping unimportant columns, removing punctuation
  - Baseline Model : TfidfVectorizer, MultinomialNB
  - Model 1 : Conv1D with token embeddings : layers.Embedding
  - Model 2 : Feature extraction with pretrained token embeddings : hub.KerasLayer("https://tfhub.dev/google/universal-sentence-encoder/4")
  - Model 3 : Conv1D with character embeddings
  - custom_token_embed_conv1d performs slightly better than the other models.
    - Accuracy : 94.179894
    - Precision : 0.927576
    - Recall : 0.941799
    - F1 Score : 0.921988
- Goal of Project : Text generation model that outputs Drake-style lyrics from any English-language input. Given a sequence of characters from the data, the model is trained to predict the next character in the sequence. Longer sequences of text can be generated by calling the model repeatedly.
- Highlights :
  - Text Processing : StringLookup, tf.strings.unicode_split
  - Layers : Embeddings, GRU
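The next-character setup described in the goal can be sketched without TensorFlow. The plain-Python code below mimics what a character split followed by a lookup table would produce: a character vocabulary, integer ids, and input/target pairs where the target is the input shifted by one character. The helper names and the stride-1 sliding window are assumptions made for illustration, not the project's actual pipeline.

```python
def build_vocab(text):
    """Map each distinct character to an integer id (sorted for determinism)."""
    chars = sorted(set(text))
    return {ch: i for i, ch in enumerate(chars)}


def make_examples(text, char_to_id, seq_length):
    """Build (input_ids, target_ids) pairs; each target is its input shifted by one."""
    ids = [char_to_id[ch] for ch in text]
    examples = []
    for start in range(len(ids) - seq_length):
        window = ids[start:start + seq_length + 1]
        examples.append((window[:-1], window[1:]))
    return examples


lyrics = "hello"  # stand-in for the lyrics corpus
vocab = build_vocab(lyrics)
examples = make_examples(lyrics, vocab, seq_length=4)
print(examples)
```

At generation time the trained model would be fed a seed sequence, one character sampled from its output distribution, and that character appended to the input for the next call.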
Disaster Tweet Prediction : NLP
- Goal of Project :
  - Construct a deep learning classification model to predict which Tweets are about real disasters and which ones aren't
- Highlights :
  - Model 1 : Feed-Forward neural network | Model 2 : LSTM model | Model 3 : GRU model | Model 4 : Bidirectional-LSTM model | Model 5 : 1D CNN | Model 6 : TensorFlow Hub Pretrained Feature Extractor
Sales Department Case Study : Sales Forecasting
- Goal of Project :
- Forecast sales using store, promotion, and competitor data
- To stay competitive and accelerate growth, companies need to leverage AI/ML to build predictive models that forecast future sales. These models forecast future sales from historical data while taking into account seasonality effects, demand, holidays, promotions, and competition.
- The Process:
  - Dataset from Kaggle : Sales Data, Store Information Data | Data Cleaning : Checked Nulls, Dropped unimportant columns, Merged both datasets on 'Store Dataset'
  - Statistical Analysis
  - Used the Facebook Prophet algorithm for prediction.
- Bridges the gap between non-technical users and databases. Developed a script that prompts users for input and uses OpenAI's "text-davinci-003" model to process the prompts and generate tailored SQL queries.
- Goal of the Project : Generate recipes based on leftover ingredients and even provide stunning images of the dishes!
- Highlights :
  - Combines the power of OpenAI's DALL·E and OpenAI's text-davinci-003
- OpenCV
- Google Deep Dream : The DeepDream algorithm asks a pre-trained CNN to look at an image, identify the patterns it recognizes, and amplify them. It uses representations learned by CNNs to produce these hallucinogenic or 'trippy' images.