🌟 Hit the star button to save this repo to your profile

Assignment 4: Feature Engineering

Introduction

In this assignment, you will explore feature engineering, an essential concept in data science. Feature engineering is a critical step in data preprocessing, where you transform existing features and create new ones from your dataset to improve the performance of machine learning models.

Task Overview

Dataset Selection: Your first task is to select a suitable dataset for this assignment. You can choose a dataset from various sources, such as Kaggle, UCI Machine Learning Repository, or any other relevant dataset source. Make sure the dataset is in a format that can be easily loaded into Google Colab.
Loading the Dataset: Use Python libraries like Pandas to load the selected dataset into your Colab notebook. You can upload the dataset from your local machine or load it directly from an online source.
Exploratory Data Analysis (EDA): Perform a basic exploratory data analysis to understand the dataset's characteristics. This includes checking for missing values, understanding the data types, and getting a sense of the dataset's structure.
Feature Selection: Identify which features are relevant for your analysis. You can use techniques like correlation analysis, domain knowledge, or feature importance to choose the most important features. Create a new DataFrame containing only the selected features.
Feature Preprocessing: Clean the selected features as needed. This may include handling missing values, dealing with outliers, and standardizing or normalizing data.
Feature Transformation: Apply transformations to the selected features. You can use techniques like one-hot encoding for categorical features, logarithmic transformations for skewed data, or any other relevant transformations to make the data suitable for modeling.
Feature Creation: Create new features if they can provide valuable information. This could involve combining or aggregating existing features or engineering new ones based on domain knowledge.
Visualization: Visualize the transformed data to gain insights into feature distributions and relationships.
Conclusion: Summarize your findings, the feature engineering steps you've taken, and why you made those decisions.
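
The steps above can be sketched in a few lines of Pandas. This is only a minimal illustration, not a required solution: the toy DataFrame and its column names (trip_km, fare, payment, row_id) are hypothetical stand-ins for whatever dataset your group selects, and the median imputation and 95th-percentile clipping are assumed choices you should justify or replace for your own data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for a real dataset; in Colab you would
# normally load your own file with pd.read_csv(...) instead.
raw = pd.DataFrame({
    "trip_km": [1.2, 3.4, np.nan, 8.0, 150.0, 2.5],
    "fare": [5.0, 11.0, 7.5, 24.0, 30.0, 9.0],
    "payment": ["card", "cash", "card", None, "card", "cash"],
    "row_id": [101, 102, 103, 104, 105, 106],
})

# EDA: data types and missing values per column.
print(raw.dtypes)
missing = raw.isna().sum()
print(missing)

# Feature selection: keep the columns judged relevant (row_id is only an identifier).
features = raw[["trip_km", "fare", "payment"]].copy()

# Preprocessing: impute missing values, then clip an extreme outlier.
features["trip_km"] = features["trip_km"].fillna(features["trip_km"].median())
features["payment"] = features["payment"].fillna("unknown")
upper = features["trip_km"].quantile(0.95)
features["trip_km"] = features["trip_km"].clip(upper=upper)

# Transformation: log1p for the skewed column, one-hot encoding for the categorical one.
features["log_trip_km"] = np.log1p(features["trip_km"])
features = pd.get_dummies(features, columns=["payment"], prefix="payment")

# Creation: a ratio feature that combines two existing columns.
features["fare_per_km"] = features["fare"] / features["trip_km"]

# Standardization: rescale a numeric column to zero mean and unit variance.
features["fare_z"] = (features["fare"] - features["fare"].mean()) / features["fare"].std()

print(features.head())
```

For the visualization step, a histogram or scatter plot of the transformed columns (for example with Matplotlib or Seaborn) is usually enough to show whether a transformation reduced skew.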

Submission

Create a new Markdown document in Google Colab and name it "Feature_Engineering.md".
Provide clear and organized explanations of each step in your Markdown document using appropriate headings and bullet points.
Include Python code snippets where necessary to demonstrate your implementation.
Attach your Colab notebook (.ipynb file) with all the code, annotations, and visualizations to your submission.
Make sure to include your name, student ID, and date in the Markdown document.
Share the Markdown document and Colab notebook with your instructor as instructed for evaluation.

🚀 Form project teams of three to four students. Teamwork is essential for this assignment. Please fill in your group information on the Google Sheets page here, and keep the list below up to date:

No	Group	Dataset
0.	Sample
1.	Truth Archive	NYC Yellow Taxi Trip Data
2.	TheBoys	Brooklyn Home Sales
3.	Kicap Sambal	Flight Price Data
4.	F4	Used Cars Dataset
5.	Avengers	10+ M. Beatport Tracks / Spotify Audio Features
6.	RAM	NIFTY-50 Stock Market Data (2000 - 2021)
7.	Ayam Rendang	Predict Diabetes

Academic Integrity

🚫 Uphold the highest standards of academic integrity. Any candidate suspected of cheating in the assignment will face disciplinary action, which may include suspension or expulsion from the University. Moreover, any materials or devices found to be in violation of examination rules and regulations will be confiscated.

Submission Requirements

📝 Prepare a comprehensive document that outlines the step-by-step process for creating the case study. The deadline for submission is 26 November 2023, at 5:00 PM. Late submissions will not be accepted.

File and Folder Structure

You must place your files in the submission folder. Within the bdm/ folder, create a folder named after your group, and name the default file readme.md. Suggested folder structure for this project:

bdm/your_group/
├── 📄 feature_eng.md
└── 📄 readme.md

Grading

Your assignment will be evaluated based on the following criteria:

Dataset selection and loading
Exploratory Data Analysis
Feature selection and preprocessing
Feature transformation and creation
Clarity of explanations
Proper documentation and comments in the code
Correctness of the code and results

Additional Resources

You can refer to the following resources for guidance and inspiration:

Python libraries like Pandas, NumPy, and Matplotlib/Seaborn
Online tutorials and documentation for feature engineering
Books on data preprocessing and feature engineering in data science

Good luck with your assignment! If you have any questions or need help, don't hesitate to reach out to your instructor or fellow students.

Contribution 🛠️

Please create an Issue for any improvements, suggestions or errors in the content.

You can also contact me on LinkedIn with any other queries or feedback.
