by Patcharanat P.
- Machine Learning Research
  - Machine Learning Development Research
    - ML Development Process & Algorithms
  - Machine Learning Evaluation
    - Metrics according to specific tasks
  - Deep Learning Research
- Machine Learning Development Research
  - Edible or Poisonous Mushroom Classification
  - Time Series Forecasting using Regression Model for Temperature Prediction
  - AI-Driven Interpretable Dynamic Customer Segmentation
  - MBTI Classification (MBTI-IPIP)
  - Finance
  - OpenAI Swarm - POC
  - Agentic Workflow - POC
  - Basic NN
folder: mushroom
Overview
Since the dataset was downloaded from Kaggle, it is already clean and ready to be used with the models. The purpose of this project is only to learn the basics of machine learning models and data-preprocessing approaches. The work may later be extended with feature engineering, such as feature selection.
dataset: Mushroom Classification By UCI Machine Learning from Kaggle
Models used in the project
- DecisionTreeClassifier
- RandomForestClassifier
- ExtraTreesClassifier
- Support Vector Classification (SVC)
- XGBoost Classifier (XGBClassifier)
Aspects learned in the project
- Comparison of accuracy scores between the label encoding and one-hot encoding approaches
- Hyperparameter tuning
- Using GridSearchCV and RandomizedSearchCV
- Metrics used to evaluate classification tasks, including precision, recall, F-score (F1-score), and the confusion matrix
- Checking for overfitting (comparing accuracy on the training set and on the test set)
-- see more details: mushroom/mushroom_notebook.ipynb
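A minimal sketch of the label-encoding vs one-hot-encoding comparison, assuming scikit-learn. The letter-coded data below is a hypothetical stand-in for the Kaggle mushroom CSV, not the real dataset, and `OrdinalEncoder` is used as the feature-side counterpart of `LabelEncoder` (which scikit-learn intends for targets). Printing training and test accuracy side by side doubles as the overfitting check listed above.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 400
# Hypothetical letter-coded features standing in for the mushroom columns
X = np.column_stack([
    rng.choice(list("bcxfks"), size=n),
    rng.choice(list("nbcgrpuewy"), size=n),
    rng.choice(list("alcyfmnps"), size=n),
])
# Hypothetical target rule: the class depends only on the third column
y = np.isin(X[:, 2], list("cyfmps")).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

encoders = {
    # One integer code per category (categories sorted alphabetically)
    "label (ordinal) encoding": OrdinalEncoder(
        handle_unknown="use_encoded_value", unknown_value=-1
    ),
    # One binary column per category
    "one-hot encoding": OneHotEncoder(handle_unknown="ignore"),
}

results = {}
for name, encoder in encoders.items():
    model = make_pipeline(encoder, DecisionTreeClassifier(random_state=0))
    model.fit(X_train, y_train)
    train_acc = model.score(X_train, y_train)
    test_acc = model.score(X_test, y_test)
    results[name] = (train_acc, test_acc)
    # A large train/test gap is the overfitting signal
    print(f"{name}: train={train_acc:.3f} test={test_acc:.3f} gap={train_acc - test_acc:.3f}")
```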
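The classification metrics above all come from the four confusion-matrix counts. This plain-Python sketch uses made-up labels and treats "p" (poisonous) as the positive class; `sklearn.metrics.confusion_matrix` and `sklearn.metrics.precision_recall_fscore_support` return the same numbers.

```python
def confusion_counts(y_true, y_pred, positive):
    """Return (tp, fp, fn, tn) for a binary problem, treating `positive` as the positive class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall_f1(tp, fp, fn):
    """Precision, recall and their harmonic mean (F1), guarding against division by zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical labels: "p" = poisonous (positive class), "e" = edible
y_true = ["p", "p", "p", "p", "e", "e", "e", "e", "e", "e"]
y_pred = ["p", "p", "p", "e", "p", "p", "e", "e", "e", "e"]

tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive="p")
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
# Layout [[tn, fp], [fn, tp]] matches sklearn for labels ordered [negative, positive]
print(f"confusion matrix = [[{tn}, {fp}], [{fn}, {tp}]]")
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
# → confusion matrix = [[4, 2], [1, 3]]
# → precision=0.600 recall=0.750 f1=0.667
```

For this task, recall on the poisonous class is arguably the number to watch, since a false negative means a poisonous mushroom predicted as edible.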
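A hedged sketch of how the two search classes differ: `GridSearchCV` fits every combination in `param_grid`, while `RandomizedSearchCV` samples `n_iter` candidates from lists or scipy distributions. The synthetic data from `make_classification` and the parameter ranges are illustrative assumptions, not the settings used in the notebook.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Grid search: every combination is tried (3 x 3 = 9 candidates)
grid = GridSearchCV(
    RandomForestClassifier(n_estimators=50, random_state=0),
    param_grid={"max_depth": [3, 5, None], "min_samples_leaf": [1, 3, 5]},
    cv=3,
    scoring="accuracy",
)
grid.fit(X, y)

# Random search: a fixed budget of n_iter candidates is sampled
rand = RandomizedSearchCV(
    RandomForestClassifier(n_estimators=50, random_state=0),
    param_distributions={
        "max_depth": [3, 5, None],             # sampled uniformly from the list
        "min_samples_leaf": randint(1, 10),    # integers 1..9
    },
    n_iter=5,
    cv=3,
    scoring="accuracy",
    random_state=0,
)
rand.fit(X, y)

for name, search in (("grid", grid), ("random", rand)):
    n_candidates = len(search.cv_results_["params"])
    print(f"{name}: {n_candidates} candidates, best={search.best_params_}, "
          f"CV accuracy={search.best_score_:.3f}")
```

Random search is usually the cheaper choice when the grid would be large, because its cost is set by `n_iter` rather than by the product of all grid sizes.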
folder: regression
Overview
The main purpose of this project is to learn different approaches to developing an ML model for a regression task: temperature prediction. The dataset was collected roughly, so data-cleaning steps are included. Attribute definitions for the dataset were not given, so domain knowledge and a thorough understanding of the data's meaning and patterns are not emphasized in this project. The project mainly focuses on multivariate imputation and regression result analysis.
dataset: Weather Conditions in World War Two By United States National Oceanic and Atmospheric Administration from Kaggle
Model used in the project
- RandomForestRegressor with Random Search
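A minimal sketch of tuning `RandomForestRegressor` with random search, assuming scikit-learn. The daily series is synthetic (a hypothetical min-temperature feature with seasonal terms, and a hypothetical max-temperature target), and pairing the search with `TimeSeriesSplit` so that each validation fold comes after its training data is an assumption about the setup, not a copy of the notebook.

```python
import numpy as np
from scipy.stats import randint
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV, TimeSeriesSplit

rng = np.random.default_rng(0)
days = np.arange(365)                     # one hypothetical year of daily rows, in time order
season = 2 * np.pi * days / 365
min_temp = 15 + 10 * np.sin(season) + rng.normal(0, 1, days.size)
X = np.column_stack([np.sin(season), np.cos(season), min_temp])
y = min_temp + 8 + rng.normal(0, 1, days.size)  # hypothetical max-temperature target

search = RandomizedSearchCV(
    RandomForestRegressor(random_state=0),
    param_distributions={
        "n_estimators": randint(50, 150),
        "max_depth": [None, 5, 10],
        "min_samples_leaf": randint(1, 5),
        "max_features": [1.0, "sqrt"],
    },
    n_iter=5,
    cv=TimeSeriesSplit(n_splits=3),       # forward-chaining folds, no shuffling
    scoring="neg_mean_absolute_error",    # sklearn maximizes scores, so MAE is negated
    random_state=0,
)
search.fit(X, y)
print("best params:", search.best_params_)
print("best CV MAE:", round(-search.best_score_, 3))
```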
Aspects learned in the project
- Filling missing values with the mean or median
- Multivariate feature imputation (IterativeImputer)
- (MSE and MAE explanation)
- A Baseline model & A Benchmark model
- Normalization (MinMaxScaler, RobustScaler, and StandardScaler, excluding Normalizer)
- Random Forest Tuning
- Using time-series cross-validation
- Dimensionality Reduction with PCA and t-SNE (in progress...)
-- see more details: regression/regression_test.ipynb
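The two imputation approaches listed above differ in what they use to fill a gap: `SimpleImputer` fills each column from that column's own mean or median, while `IterativeImputer` models each column with missing values as a function of the other columns. A small sketch on a hypothetical two-column table where max temperature is roughly min temperature + 8; note that `IterativeImputer` is still experimental in scikit-learn and needs the `enable_iterative_imputer` import.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (exposes IterativeImputer)
from sklearn.impute import IterativeImputer, SimpleImputer

# Hypothetical columns: [min_temp, max_temp]
X = np.array([
    [10.0, 18.0],
    [12.0, 20.0],
    [14.0, 22.0],
    [16.0, np.nan],   # max_temp missing
    [18.0, 26.0],
    [np.nan, 28.0],   # min_temp missing
])

mean_filled = SimpleImputer(strategy="mean").fit_transform(X)
median_filled = SimpleImputer(strategy="median").fit_transform(X)
iter_filled = IterativeImputer(random_state=0).fit_transform(X)

print("mean:     ", mean_filled[3, 1], mean_filled[5, 0])      # → 22.8 14.0
print("median:   ", median_filled[3, 1], median_filled[5, 0])  # → 22.0 14.0
# The multivariate imputer uses the min/max relationship, so it lands near 24 and 20
print("iterative:", iter_filled[3, 1].round(2), iter_filled[5, 0].round(2))
```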
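For reference, MAE = (1/n) * sum(|y_i - y_hat_i|) and MSE = (1/n) * sum((y_i - y_hat_i)^2). The plain-Python sketch below uses made-up temperatures to show how a single large error dominates MSE but not MAE.

```python
def mae(y_true, y_pred):
    """Mean absolute error: average size of the errors, in the target's units."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error: squaring makes large errors count disproportionately."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [20.0, 22.0, 25.0, 30.0]
y_pred = [21.0, 21.0, 25.0, 26.0]   # errors: 1, 1, 0 and one large miss of 4

print("MAE =", mae(y_true, y_pred))  # → MAE = 1.5
print("MSE =", mse(y_true, y_pred))  # → MSE = 4.5
```

The miss of 4 contributes 4 of the 6 total absolute error but 16 of the 18 total squared error, which is why MSE (and RMSE, its square root) is the stricter metric when large errors are costly.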
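Definitions of baseline and benchmark vary. One common reading, assumed here, is that a baseline is a trivial predictor such as always returning the training mean, and a benchmark is a simple untuned model that the tuned model should beat. A sketch with scikit-learn's `DummyRegressor` and a default `RandomForestRegressor` on synthetic data, split without shuffling as one would for time-ordered rows:

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(1)
n = 300
min_temp = rng.uniform(5, 30, n)                 # hypothetical feature
X = min_temp.reshape(-1, 1)
y = min_temp + 8 + rng.normal(0, 1, n)           # hypothetical target

split = int(n * 0.8)                             # first 80% train, last 20% test, no shuffling
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

# Baseline: ignores the features and always predicts the training mean
baseline = DummyRegressor(strategy="mean").fit(X_train, y_train)
# Benchmark: an untuned model that any tuned model should improve on
benchmark = RandomForestRegressor(random_state=0).fit(X_train, y_train)

baseline_mae = mean_absolute_error(y_test, baseline.predict(X_test))
benchmark_mae = mean_absolute_error(y_test, benchmark.predict(X_test))
print(f"baseline MAE={baseline_mae:.3f}  benchmark MAE={benchmark_mae:.3f}")
```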
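A quick comparison of the three scalers on one hypothetical feature column with an outlier. `MinMaxScaler` maps the column to [0, 1], `StandardScaler` subtracts the mean and divides by the standard deviation, and `RobustScaler` subtracts the median and divides by the interquartile range, so the outlier distorts it least. `Normalizer` is shown separately because it rescales each sample (row) to unit norm rather than each feature (column).

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, Normalizer, RobustScaler, StandardScaler

# One hypothetical feature column; 48.0 is an outlier
x = np.array([[10.0], [12.0], [14.0], [16.0], [48.0]])

scaled = {}
for scaler in (MinMaxScaler(), StandardScaler(), RobustScaler()):
    name = type(scaler).__name__
    scaled[name] = scaler.fit_transform(x).ravel()
    print(name, scaled[name].round(3))
# RobustScaler: median 14, IQR 16 - 12 = 4, giving -1, -0.5, 0, 0.5, 8.5

# Normalizer works per row: [3, 4] has L2 norm 5, so it becomes [0.6, 0.8]
unit_rows = Normalizer().fit_transform(np.array([[3.0, 4.0]]))
print("Normalizer", unit_rows)
```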
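`TimeSeriesSplit` keeps the rows in order: every validation fold comes strictly after its training data, and the training window grows with each split, so the model is never validated on the past using the future. Printing the indices on 12 dummy time steps shows the pattern:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(12).reshape(-1, 1)   # 12 time-ordered observations
tscv = TimeSeriesSplit(n_splits=3)

folds = []
for train_idx, test_idx in tscv.split(X):
    folds.append((train_idx.tolist(), test_idx.tolist()))
    print("train:", train_idx.tolist(), "test:", test_idx.tolist())
# → train: [0, 1, 2] test: [3, 4, 5]
# → train: [0, 1, 2, 3, 4, 5] test: [6, 7, 8]
# → train: [0, 1, 2, 3, 4, 5, 6, 7, 8] test: [9, 10, 11]
```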
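Since the dimensionality-reduction step is still in progress, this is only a hedged sketch of how PCA and t-SNE are typically called in scikit-learn, on synthetic features (three columns driven by one hypothetical latent factor, plus one independent column). Features are standardized first because PCA is sensitive to scale; t-SNE is mainly a 2-D visualization tool, while PCA's `explained_variance_ratio_` tells how much variance each component keeps.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 60
latent = rng.normal(size=n)
# Three correlated columns from one latent factor, plus one independent column
X = np.column_stack([
    latent + 0.1 * rng.normal(size=n),
    2 * latent + 0.1 * rng.normal(size=n),
    -latent + 0.1 * rng.normal(size=n),
    rng.normal(size=n),
])
X_std = StandardScaler().fit_transform(X)

pca = PCA(n_components=2).fit(X_std)
X_pca = pca.transform(X_std)
# The first component should capture most of the variance of the three correlated columns
print("explained variance ratio:", pca.explained_variance_ratio_.round(3))

# perplexity must be smaller than the number of samples
X_tsne = TSNE(n_components=2, perplexity=10, init="pca", random_state=0).fit_transform(X_std)
print("t-SNE embedding shape:", X_tsne.shape)
```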
folder: dynamic_segmentation
- Revised logic notebook
- Related to End-to-end E-commerce Data Project - AI-Driven Interpretable Dynamic Customer Segmentation
folder: mbti_ipip
- Model development notebook
- Related to MLOps-ml-system project.
About Dataset
- Big Five Personality Test - Kaggle
- Local Data Dict
- International Personality Item Pool
- Converting IPIP Item Responses to Scale Scores
- Big-Five Factor Markers - Questions Classification
- Interpreting Individual IPIP Scale Scores
- MBTI - Letter personalities explained
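The IPIP conversion page linked above scores each item on a 1 to 5 scale, reversing negatively keyed items (a response r becomes 6 - r) before summing the items of a scale. A plain-Python sketch with a hypothetical four-item keying; the item names and keys are illustrative, not the dataset's actual keying.

```python
# Hypothetical keying: +1 = positively keyed, -1 = negatively (reverse) keyed
KEYS = {"item1": 1, "item2": -1, "item3": 1, "item4": -1}


def scale_score(responses, keys, low=1, high=5):
    """Sum item responses for one scale, reverse-scoring negatively keyed items."""
    total = 0
    for item, key in keys.items():
        r = responses[item]
        if not low <= r <= high:
            # Out-of-range values (e.g. missing-value codes) are rejected, not scored
            raise ValueError(f"{item}: response {r} outside {low}-{high}")
        total += r if key > 0 else (low + high) - r
    return total


answers = {"item1": 5, "item2": 1, "item3": 4, "item4": 2}
print(scale_score(answers, KEYS))  # → 18  (5 + 5 + 4 + 4)
```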
folder: finance
In progress . . .
dataset: Financial Transactions Dataset: Analytics
folder: openai_swarm
Research + POC of AI Agentic Workflow: OpenAI Swarm
folder: agentic_workflow
Research + POC of AI Agentic Workflow: CrewAI
folder: basic_nn
Learning neural networks