The Present

Exercises Overview

Introduction

This module focuses on understanding the current data through a series of exercises: creating visualizations, analyzing correlations, standardizing and normalizing the data, and splitting the dataset in preparation for predictive modeling.

Exercise 00: Histogram

  • Create histograms to visualize the data in the "Test_knight.csv" file.

  • Additionally, create a graph to understand the interaction between the knight’s skills (features) and the "knight" column (target) using the "Train_knight.csv" file.
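A histogram only needs bin counts, so the sketch below computes them with the Python standard library to stay self-contained; in practice you would more likely pass the columns to a plotting library such as matplotlib. It assumes every column other than "knight" is numeric and has no missing values. The helper `load_rows`, the feature name `Strength`, and the labels `"A"`/`"B"` are hypothetical placeholders, not taken from the dataset. `histogram_by_class` uses one set of bin edges for all classes so the per-class counts can be overlaid on the same axis.

```python
import csv


def load_rows(path, target="knight"):
    """Read a CSV into dicts, converting every column except `target` to float.

    Hypothetical helper: assumes all feature columns are numeric and complete.
    """
    with open(path, newline="") as f:
        return [
            {k: (v if k == target else float(v)) for k, v in row.items()}
            for row in csv.DictReader(f)
        ]


def histogram(values, bins=10, lo=None, hi=None):
    """Count how many values fall into each of `bins` equal-width bins."""
    lo = min(values) if lo is None else lo
    hi = max(values) if hi is None else hi
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 when all values are equal
    counts = [0] * bins
    for v in values:
        # The maximum value would map to index `bins`, so clamp it into the last bin.
        counts[min(int((v - lo) / width), bins - 1)] += 1
    return counts


def histogram_by_class(rows, feature, target="knight", bins=10):
    """Per-class bin counts for one feature, using bin edges shared by all classes."""
    values = [r[feature] for r in rows]
    lo, hi = min(values), max(values)
    groups = {}
    for r in rows:
        groups.setdefault(r[target], []).append(r[feature])
    return {label: histogram(vals, bins, lo, hi) for label, vals in groups.items()}


# Hypothetical rows; real ones could come from load_rows("Train_knight.csv").
rows = [
    {"Strength": 1.0, "knight": "A"},
    {"Strength": 2.0, "knight": "A"},
    {"Strength": 3.0, "knight": "B"},
    {"Strength": 4.0, "knight": "B"},
]
print(histogram([r["Strength"] for r in rows], bins=2))  # [2, 2]
print(histogram_by_class(rows, "Strength", bins=2))  # {'A': [2, 0], 'B': [0, 2]}
```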

Exercise 01: Correlation

  • Write a program that computes the correlation coefficient between the target column "knight" and each feature column, and use it to identify the features most strongly correlated with the target.
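One common choice of correlation factor is the Pearson coefficient between each feature and the target. The sketch below assumes "knight" holds exactly two class labels and encodes them as 0/1. Which label becomes 1 is an arbitrary choice that only flips the sign of each coefficient, so features are ranked by absolute value. The feature names and labels in the example rows are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column is treated as uncorrelated
    return cov / (sx * sy)


def correlations_with_target(rows, target="knight", positive_label=None):
    """Return (feature, r) pairs sorted by |r|, strongest first."""
    labels = sorted({r[target] for r in rows})
    positive = labels[-1] if positive_label is None else positive_label
    y = [1.0 if r[target] == positive else 0.0 for r in rows]
    features = [k for k in rows[0] if k != target]
    scores = {f: pearson([r[f] for r in rows], y) for f in features}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical rows with two features.
rows = [
    {"Strength": 1.0, "Agility": 4.0, "knight": "A"},
    {"Strength": 2.0, "Agility": 1.0, "knight": "A"},
    {"Strength": 3.0, "Agility": 3.0, "knight": "B"},
    {"Strength": 4.0, "Agility": 2.0, "knight": "B"},
]
for feature, r in correlations_with_target(rows):
    print(f"{feature}: {r:.3f}")
# Strength: 0.894
# Agility: 0.000
```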

Exercise 02: It’s Raining Cats No Points!

  • Display 4 graphs: two using "Train_knight.csv" and two using "Test_knight.csv".

  • For each file, one graph must visually separate the clusters, while the other should show them mixed.
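Whether clusters appear separated or mixed depends largely on which pair of features is plotted and on whether the points are coloured by class. A minimal data-preparation sketch, assuming scatter plots: group the (x, y) pairs by the "knight" label to draw one coloured series per class, or pass no target to get a single series. If a file has no "knight" column, only the single-series form applies. `scatter_series` and the feature names are hypothetical.

```python
def scatter_series(rows, x, y, target=None):
    """Group (x, y) points for a scatter plot.

    With a target column, returns one list of points per class label, which a
    plotting library can draw in distinct colours. Without one, every point
    goes into a single "all" series.
    """
    series = {}
    for r in rows:
        key = r[target] if target is not None else "all"
        series.setdefault(key, []).append((r[x], r[y]))
    return series


# Hypothetical rows with two features.
rows = [
    {"Strength": 1.0, "Agility": 4.0, "knight": "A"},
    {"Strength": 2.0, "Agility": 1.0, "knight": "A"},
    {"Strength": 3.0, "Agility": 3.0, "knight": "B"},
    {"Strength": 4.0, "Agility": 2.0, "knight": "B"},
]
print(scatter_series(rows, "Strength", "Agility", "knight"))
print(scatter_series(rows, "Strength", "Agility"))
```

Each list of points can then be unzipped into x and y sequences and passed to a scatter-plot call. Choosing the feature pair that best separates the classes (for example, the most strongly correlated features from Exercise 01) is one heuristic, not a requirement of the exercise.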

Exercise 03: Standardization

  • Standardize and print your data.

  • Display one of the graphs from the previous exercise using the standardized data.

  • Ensure compatibility with both "Train_knight.csv" and "Test_knight.csv".
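Standardization rescales each feature to mean 0 and standard deviation 1 using z = (x - mean) / std. The sketch below splits fitting from applying, so the parameters computed on one file can be reused on another. The helper names and example rows are hypothetical, and it uses the population standard deviation.

```python
import statistics


def fit_standardizer(rows, features):
    """Return {feature: (mean, population std)} computed from `rows`."""
    params = {}
    for f in features:
        col = [r[f] for r in rows]
        params[f] = (statistics.fmean(col), statistics.pstdev(col))
    return params


def standardize(rows, params):
    """Apply z = (x - mean) / std to each fitted feature; other columns pass through."""
    out = []
    for r in rows:
        new = dict(r)
        for f, (mean, std) in params.items():
            new[f] = (r[f] - mean) / std if std else 0.0  # constant column -> 0
        out.append(new)
    return out


# Hypothetical rows.
rows = [{"Strength": 1.0, "knight": "A"}, {"Strength": 3.0, "knight": "B"}]
params = fit_standardizer(rows, ["Strength"])
print(standardize(rows, params))
# [{'Strength': -1.0, 'knight': 'A'}, {'Strength': 1.0, 'knight': 'B'}]
```

One way to read "compatibility with both files" is to fit the parameters on "Train_knight.csv" and reuse them for "Test_knight.csv", which keeps both files on the same scale. This is a common convention, not something the exercise states explicitly.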

Exercise 04: Normalization

  • Normalize and print your data.

  • Display the other graphs from Exercise 02 using the normalized data.

  • Ensure compatibility with both "Train_knight.csv" and "Test_knight.csv".
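Normalization here is sketched as min-max scaling, x' = (x - min) / (max - min), which maps the fitted data onto [0, 1]. Min-max is an assumption about what "normalize" means in this exercise. As with standardization, fitting and applying are separate so one file's range can be reused on the other; the helper names and rows are hypothetical.

```python
def fit_normalizer(rows, features):
    """Return {feature: (min, max)} computed from `rows`."""
    return {f: (min(r[f] for r in rows), max(r[f] for r in rows)) for f in features}


def normalize(rows, params):
    """Apply x' = (x - min) / (max - min) to each fitted feature."""
    out = []
    for r in rows:
        new = dict(r)
        for f, (lo, hi) in params.items():
            new[f] = (r[f] - lo) / (hi - lo) if hi != lo else 0.0  # constant column -> 0
        out.append(new)
    return out


# Hypothetical rows.
rows = [{"Strength": 1.0, "knight": "A"}, {"Strength": 3.0, "knight": "B"}]
params = fit_normalizer(rows, ["Strength"])
print(normalize(rows, params))
# [{'Strength': 0.0, 'knight': 'A'}, {'Strength': 1.0, 'knight': 'B'}]

# A value outside the fitted range lands outside [0, 1].
print(normalize([{"Strength": 5.0}], params))  # [{'Strength': 2.0}]
```

If the parameters are fitted on "Train_knight.csv" and reused for "Test_knight.csv", test values beyond the training range fall outside [0, 1], as the last line shows.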

Exercise 05: Split

  • Write a program to randomly split "Train_knight.csv" into "Training_knight.csv" and "Validation_knight.csv".

  • You must explain what percentage of the data goes into each file and the reasoning behind that choice.
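A minimal sketch of a reproducible random split using only the standard library. The 80/20 ratio and the seed are illustrative assumptions, not requirements; the exercise asks you to choose and justify your own percentages. Shuffling a copy with a seeded `random.Random` leaves the input order untouched and makes the split repeatable. `split_rows` and `split_file` are hypothetical helper names.

```python
import csv
import random


def split_rows(rows, train_ratio=0.8, seed=42):
    """Shuffle a copy of `rows` and cut it into (train, validation) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def split_file(src, train_path, valid_path, train_ratio=0.8, seed=42):
    """Split a CSV file into two CSV files that keep the original header."""
    with open(src, newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        rows = list(reader)
    train, valid = split_rows(rows, train_ratio, seed)
    for path, part in ((train_path, train), (valid_path, valid)):
        with open(path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(part)
    return len(train), len(valid)


train, valid = split_rows(list(range(10)))
print(len(train), len(valid))  # 8 2
```

Usage: `split_file("Train_knight.csv", "Training_knight.csv", "Validation_knight.csv")`. If the classes in "knight" are imbalanced, splitting each class separately (a stratified split) keeps the class proportions similar in both output files.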

Conclusion

This module emphasizes the importance of data analysis in predicting future outcomes based on historical data, preparing you for real-world applications in data science.