EDA stands for Exploratory Data Analysis: an approach to analyzing datasets that summarizes their main characteristics, often with visual methods. Here, I have analyzed the Company Bankruptcy dataset.
TASK 1:
- What is the distribution of bankruptcy and non-bankruptcy classes in the dataset? Are the classes balanced or imbalanced?
- How does the distribution of the "Operating Profit Rate" differ between bankrupt and non-bankrupt companies? Can you create a suitable plot to visualize this difference?
- Plot a bar graph showing how many companies are bankrupt and how many are not (this overlaps with the first question).
- Plot a countplot for the Liability Assets Flag (use the Bankrupt column for color encoding).
- Plot a heatmap that excludes the Bankrupt column (using the seaborn library).
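The Task 1 questions can be approached with a few pandas and seaborn calls. The following is only a minimal sketch, not a reference solution. It assumes the label column is named `Bankrupt?` and the feature columns are named `Operating Profit Rate` and `Liability-Assets Flag`; check these against `df.columns` after loading the CSV, since the headers in the downloaded file may differ (for example, by leading spaces). The helper names and the synthetic frame are hypothetical and exist only so the sketch runs without the Kaggle download. The heatmap is assumed to be a correlation heatmap, and a box plot is one of several reasonable choices for comparing the two groups.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Assumed column names; verify against df.columns for the real CSV.
TARGET = "Bankrupt?"
PROFIT = "Operating Profit Rate"
FLAG = "Liability-Assets Flag"


def make_demo_frame(n=500, seed=0):
    """Synthetic stand-in for the Kaggle table (hypothetical values)."""
    rng = np.random.default_rng(seed)
    target = (rng.random(n) < 0.05).astype(int)
    return pd.DataFrame({
        TARGET: target,
        PROFIT: rng.normal(0.99, 0.01, n) - 0.005 * target,
        FLAG: (rng.random(n) < 0.02).astype(int),
        "Some Other Ratio": rng.random(n),
    })


def class_balance(df):
    """Count and share of rows per class, to judge (im)balance."""
    counts = df[TARGET].value_counts().sort_index()
    return pd.DataFrame({"count": counts, "share": counts / counts.sum()})


def plot_eda(df):
    """Draw the four Task 1 plots on one figure and return it."""
    fig, axes = plt.subplots(2, 2, figsize=(12, 9))
    # Bar graph of bankrupt vs. non-bankrupt counts.
    class_balance(df)["count"].plot.bar(ax=axes[0, 0], rot=0)
    axes[0, 0].set_title("Class counts")
    # Operating Profit Rate distribution per class.
    sns.boxplot(data=df, x=TARGET, y=PROFIT, ax=axes[0, 1])
    # Countplot of the flag, colored by the label column.
    sns.countplot(data=df, x=FLAG, hue=TARGET, ax=axes[1, 0])
    # Correlation heatmap with the label column dropped.
    corr = df.drop(columns=[TARGET]).corr()
    sns.heatmap(corr, cmap="coolwarm", center=0, ax=axes[1, 1])
    fig.tight_layout()
    return fig


# Replace with pd.read_csv(<path to the downloaded CSV>) for the real data.
df = make_demo_frame()
balance = class_balance(df)
print(balance)
fig = plot_eda(df)
```

With the real data, call `plt.show()` or `fig.savefig(...)` afterwards. If one class holds only a small share of the rows in `balance`, the dataset is imbalanced, which matters for the choice of metrics in Task 2.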
TASK 2:
Perform the following steps on the same dataset you used for EDA.
- Data Preprocessing (as required)
- Feature Engineering
- Split the dataset into train and test sets (80:20 ratio)
- Model selection
- Model training
- Model evaluation
- Fine-tune the model
- Make predictions
Summarize your model's performance using evaluation metrics.
Link to dataset: https://www.kaggle.com/datasets/fedesoriano/company-bankruptcy-prediction
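The Task 2 steps can be chained with scikit-learn. What follows is a rough sketch under stated assumptions rather than the expected solution: logistic regression is an arbitrary model choice, the parameter grid is illustrative, the `Bankrupt?` label name is assumed, and the synthetic features (`ratio_0` to `ratio_4`) are hypothetical placeholders so the code runs without the real CSV. The split is stratified and the classifier uses balanced class weights because bankruptcy labels are typically rare.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

TARGET = "Bankrupt?"  # assumed label column name

# Synthetic stand-in for the real table (hypothetical feature names).
rng = np.random.default_rng(0)
n = 1000
X = pd.DataFrame(rng.normal(size=(n, 5)),
                 columns=[f"ratio_{i}" for i in range(5)])
logit = 2.0 * X["ratio_0"] - 1.5 * X["ratio_1"] - 2.5
y = pd.Series((rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int),
              name=TARGET)

# Preprocessing / feature engineering placeholder:
# drop constant columns, if any, since they carry no signal.
X = X.loc[:, X.nunique() > 1]

# 80:20 split, stratified so both sets keep the class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Model selection: scaling + logistic regression in one pipeline,
# so the scaler is fitted on training folds only.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(class_weight="balanced", max_iter=1000)),
])

# Training and fine-tuning: grid search over regularization strength.
search = GridSearchCV(pipe, {"clf__C": [0.01, 0.1, 1.0, 10.0]},
                      scoring="f1", cv=5)
search.fit(X_train, y_train)

# Predictions and evaluation on the held-out 20%.
y_pred = search.predict(X_test)
y_prob = search.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, y_prob)

print("Best params:", search.best_params_)
print(classification_report(y_test, y_pred, digits=3))
print("ROC AUC:", round(auc, 3))
```

On an imbalanced target, accuracy alone can look high even when few bankruptcies are caught, so the per-class precision, recall, and F1 in the report, together with ROC AUC, give a more honest summary.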