The Waffer Fault Detection project in Python involves using machine learning algorithms to detect and classify defects on silicon waffers in semiconductor manufacturing.
For each Waffer we get values from 500+ sensors and based on that we tell whether the waffer is functioning(+1) or faulty(-1)
Click Here to get Data
In this step, we perform different steps of validation like,
Is the filename valid?
Are all columns present?
Name of each column
Data type of each columns
In this step we perform the following things,
Database Creation and connection
Table creation in the database
Insertion of files in the table
The data in a stored database is exported as a CSV file to be used for model training.
In this step we check for null values in each column. If null values are present we use KNN Imputer to fill in those values with the mean of k neighbours of it.
Also we will remove the columns which have a standard deviation of 0, it means all the values in that column are same and hence that column won't add any meaning to the model training.
The idea behind clustering is to find enteries(rows) that are relatively similar to each other, create cluster and train separte model for each cluster. This Technique lets us get better accuracy by grouping similar data together.
We use Kmeans to cluster of preprocessed data and save the model for later use.
After clusters are created, we find the best model for each cluster. Two algorithms are used RandomForest and XGBoost. We perform Grid Search CV to get both models for best paramenters and then compare their accuracy to get the better model.
Here also all the above steps like Data Validation, Data Insertion in Database, Data Preprocessing and Clustering is performed. Based on the cluster group, the model is loaded and prediction is made.