🌟 Hit star button to save this repo in your profile

Assignment 4: Feature Engineering

Introduction

In this assignment, you will explore feature engineering, an essential concept in data science. Feature engineering is a critical step in data preprocessing, in which you transform existing features and create new ones from your dataset to improve the performance of machine learning models.

Task Overview

  1. Dataset Selection: Your first task is to select a suitable dataset for this assignment. You can choose a dataset from various sources, such as Kaggle, UCI Machine Learning Repository, or any other relevant dataset source. Make sure the dataset is in a format that can be easily loaded into Google Colab.

  2. Loading the Dataset: Use Python libraries like Pandas to load the selected dataset into your Colab notebook. You can upload the dataset from your local machine or load it directly from an online source.

  3. Exploratory Data Analysis (EDA): Perform a basic exploratory data analysis to understand the dataset's characteristics. This includes checking for missing values, understanding the data types, and getting a sense of the dataset's structure.

  4. Feature Selection: Identify which features are relevant for your analysis. You can use techniques like correlation analysis, domain knowledge, or feature importance to choose the most important features. Create a new DataFrame containing only the selected features.

  5. Feature Preprocessing: Clean the selected features as needed. This may include handling missing values, dealing with outliers, and standardizing or normalizing data.

  6. Feature Transformation: Apply transformations to the selected features. You can use techniques like one-hot encoding for categorical features, logarithmic transformations for skewed data, or any other relevant transformations to make the data suitable for modeling.

  7. Feature Creation: Create new features if they can provide valuable information. This could involve combining or aggregating existing features or engineering new ones based on domain knowledge.

  8. Visualization: Visualize the transformed data to gain insights into feature distributions and relationships.

  9. Conclusion: Summarize your findings, the feature engineering steps you've taken, and why you made those decisions.
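As a rough illustration of steps 2 through 7, the sketch below runs a small pipeline on a made-up taxi-style table. Everything in it is an assumption for demonstration only: the column names (`distance_km`, `duration_min`, `fare`, `payment_type`), the inline CSV, and the helper `engineer_features` are hypothetical, and median imputation, IQR clipping, and `log1p` are just one reasonable set of choices among many. Your own dataset will need its own decisions. Visualization (step 8) is left out to keep the example short.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for a real dataset. In a Colab notebook you would
# normally call pd.read_csv on an uploaded file or a URL instead.
RAW_CSV = """trip_id,distance_km,duration_min,fare,payment_type,note
1,2.0,10,8.0,card,a
2,5.0,20,15.0,cash,b
3,,15,11.0,card,c
4,12.0,40,30.0,card,d
5,1.0,5,200.0,cash,e
6,8.0,25,21.0,cash,f
"""


def engineer_features(df: pd.DataFrame) -> pd.DataFrame:
    """Apply selection, preprocessing, transformation, and creation steps."""
    # Step 4 (selection): keep only the columns judged relevant here.
    selected = df[["distance_km", "duration_min", "fare", "payment_type"]].copy()

    # Step 5 (preprocessing): fill missing distances with the column median.
    median_distance = selected["distance_km"].median()
    selected["distance_km"] = selected["distance_km"].fillna(median_distance)

    # Step 5 (preprocessing): clip fare outliers to the 1.5 * IQR fences.
    q1, q3 = selected["fare"].quantile([0.25, 0.75])
    iqr = q3 - q1
    selected["fare"] = selected["fare"].clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

    # Step 6 (transformation): log-transform the skewed fare column and
    # one-hot encode the categorical payment type.
    selected["log_fare"] = np.log1p(selected["fare"])
    selected = pd.get_dummies(selected, columns=["payment_type"], dtype=int)

    # Step 7 (creation): derive average speed from distance and duration.
    selected["speed_kmh"] = selected["distance_km"] / (selected["duration_min"] / 60)
    return selected


# Step 2 (loading): read the CSV text into a DataFrame.
raw = pd.read_csv(io.StringIO(RAW_CSV))

# Step 3 (EDA): structure, data types, and missing values per column.
print(raw.shape)
print(raw.dtypes)
print(raw.isna().sum())

features = engineer_features(raw)
print(features.head())
```

Keeping the steps in one function makes it easy to rerun the whole pipeline after changing a decision, for example swapping the median for another imputation strategy, and to explain each choice in your conclusion.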

Submission

  1. Create a new Markdown document in Google Colab and name it "Feature_Engineering.md"

  2. Provide clear and organized explanations of each step in your Markdown document using appropriate headings and bullet points.

  3. Include Python code snippets where necessary to demonstrate your implementation.

  4. Attach your Colab notebook (.ipynb file) with all the code, annotations, and visualizations to your submission.

  5. Make sure to include your name, student ID, and date in the Markdown document.

  6. Share the Markdown document and Colab notebook with your instructor as instructed for evaluation.

🚀 Form project teams of three to four students. Teamwork is essential for this assignment. Please record your group information on the Google Sheets page here, and keep your group's entry below up to date:

No Group File Dataset
0. Sample
1. Truth Archive NYC Yellow Taxi Trip Data
2. TheBoys Brooklyn Home Sales
3. Kicap Sambal Flight Price Data
4. F4 Used Cars Dataset
5. Avengers 10+ M. Beatport Tracks / Spotify Audio Features
6. RAM NIFTY-50 Stock Market Data (2000 - 2021)
7. Ayam Rendang Predict Diabetes

Academic Integrity

🚫 Uphold the highest standards of academic integrity. Any candidate suspected of cheating in the assignment will face disciplinary action, which may include suspension or expulsion from the University. Moreover, any materials or devices found to be in violation of examination rules and regulations will be confiscated.

Submission Requirements

📝 Prepare a comprehensive document that outlines the step-by-step process for creating the case study. The deadline for submission is 26 November 2023 at 5:00 PM. Late submissions will not be accepted.

File and Folder Structure

You must place your files in the submission folder. Within the bdm/ folder, create a folder named after your group. Name the default file readme.md. Suggested folder structure for this project:

bdm/your_group/
├── 📄 feature_eng.md
└── 📄 readme.md

Grading

Your assignment will be evaluated based on the following criteria:

  • Dataset selection and loading
  • Exploratory Data Analysis
  • Feature selection and preprocessing
  • Feature transformation and creation
  • Clarity of explanations
  • Proper documentation and comments in the code
  • Correctness of the code and results

Additional Resources

You can refer to the following resources for guidance and inspiration:

  • Python libraries like Pandas, NumPy, and Matplotlib/Seaborn
  • Online tutorials and documentation for feature engineering
  • Books on data preprocessing and feature engineering in data science

Good luck with your assignment! If you have any questions or need help, don't hesitate to reach out to your instructor or fellow students.

Contribution 🛠️

Please create an Issue for any improvements, suggestions or errors in the content.

You can also contact me on LinkedIn with any other queries or feedback.
