This module focuses on understanding current data through exercises that include creating visualizations, analyzing correlations, standardizing and normalizing data, and splitting datasets to prepare for predictive modeling.
- Create a graph to visualize the data using the "Test_knight.csv" file.
- Additionally, create a graph to understand the interaction between the knight’s skills (features) and the "knight" column (target) using the "Train_knight.csv" file.
- Compute the correlation factor between the target column "knight" and every feature column to identify the features most strongly correlated with the target.
- Display 4 graphs using both "Train_knight.csv" and "Test_knight.csv".
- For each file, one graph must visually separate the clusters, while the other should mix them.
- Standardize and print your data.
- Display one of the graphs from the previous exercise using the standardized data.
- Ensure compatibility with both "Train_knight.csv" and "Test_knight.csv".
- Normalize and print your data.
- Display the other graphs from Exercise 02 using the normalized data.
- Ensure compatibility with both "Train_knight.csv" and "Test_knight.csv".
- Write a program to randomly split "Train_knight.csv" into "Training_knight.csv" and "Validation_knight.csv".
- You must explain what percentage of the rows goes into each file and the reasoning behind that choice.
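
The numeric steps in the exercises above (correlation ranking, standardization, normalization, and the random split) can be sketched as follows. This is a minimal sketch using only the Python standard library, not the expected solution. The feature names `Strength` and `Agility`, the `Jedi`/`Sith` labels, the in-memory toy rows, the 0/1 encoding of the target, the 80/20 ratio, and the seed are all assumptions made for illustration. A real solution would read its rows from "Train_knight.csv" and write the two parts to "Training_knight.csv" and "Validation_knight.csv".

```python
import math
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    # A constant column has no defined correlation; report 0.0 for it.
    return cov / (sx * sy) if sx and sy else 0.0


def standardize(xs):
    """Z-score: subtract the mean, divide by the population standard deviation."""
    mean, std = statistics.fmean(xs), statistics.pstdev(xs)
    return [(x - mean) / std if std else 0.0 for x in xs]


def normalize(xs):
    """Min-max scaling into the range [0, 1]."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]


def split_rows(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of the rows and cut it into (training, validation)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical toy rows standing in for the contents of "Train_knight.csv".
rows = [
    {"Strength": 2.0, "Agility": 7.0, "knight": "Jedi"},
    {"Strength": 3.0, "Agility": 1.0, "knight": "Jedi"},
    {"Strength": 4.0, "Agility": 6.0, "knight": "Jedi"},
    {"Strength": 5.0, "Agility": 2.0, "knight": "Jedi"},
    {"Strength": 6.0, "Agility": 8.0, "knight": "Sith"},
    {"Strength": 7.0, "Agility": 3.0, "knight": "Sith"},
    {"Strength": 8.0, "Agility": 9.0, "knight": "Sith"},
    {"Strength": 9.0, "Agility": 4.0, "knight": "Sith"},
    {"Strength": 10.0, "Agility": 5.0, "knight": "Jedi"},
    {"Strength": 11.0, "Agility": 5.0, "knight": "Sith"},
]
features = ["Strength", "Agility"]

# Encode the target as 0/1 (an assumption) so it can enter the correlation.
target = [1.0 if row["knight"] == "Sith" else 0.0 for row in rows]

# Rank features by the absolute value of their correlation with the target.
correlations = {f: pearson([row[f] for row in rows], target) for f in features}
ranked = sorted(correlations.items(), key=lambda kv: abs(kv[1]), reverse=True)

strength = [row["Strength"] for row in rows]
strength_standardized = standardize(strength)
strength_normalized = normalize(strength)

training, validation = split_rows(rows)

for name, value in ranked:
    print(f"{name}: {value:.3f}")
print(len(training), "training rows,", len(validation), "validation rows")
```

An 80/20 split is a common starting point: most rows go to training so the model sees enough examples, while the held-out fifth still gives a usable estimate of how the model behaves on data it has not seen. With a small dataset, a larger validation share may be the better trade-off.
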
This module emphasizes the importance of data analysis in predicting future outcomes based on historical data, preparing you for real-world applications in data science.