Kaggle Datathon: DataHub 2.0

Problem Statement

Music is as much a powerful form of human expression as it is entertainment. Over time, it has developed from the earliest calls and rhythms into a huge variety of genres. The contrast between the simplicity of folk songs, the complexity of classical symphonies, and the hypnotic rhythms of dance music suggests that we can assign a category to each song based on its musical elements. For example, genres can be defined by the use of specific instruments. If a piece is played in a certain style on orchestral instruments, we could classify it as classical music. Similarly, if the instruments are highly distorted guitars, we would classify it as rock or heavy metal. Likewise, drum and bass uses a very fast tempo (BPM) and is primarily electronic.

Currently, many music aggregator applications rely on machine learning to power their recommendation engines and curate playlists.

In this challenge, you are expected to use the given dataset to develop a machine learning model that classifies music into genres, taking the relevant features into account.

Objective

Your goal is to predict the correct genre of each music record, given its features.

Evaluation Metric

The evaluation metric for this competition is Categorization Accuracy - the percentage of predictions that are correct.
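As a minimal sketch of the metric described above, accuracy is the number of correct predictions divided by the total number of predictions. The function name and the genre labels below are made up for illustration and are not taken from the competition data.

```python
def categorization_accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Illustrative labels only (not from the dataset): 3 of 4 predictions match.
true_genres = ["rock", "jazz", "classical", "rock"]
predicted_genres = ["rock", "jazz", "rock", "rock"]
score = categorization_accuracy(true_genres, predicted_genres)
print(f"accuracy = {score:.2%}")
```

Multiplying the returned fraction by 100 gives the percentage form used in the metric definition.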

Submission Format

Submissions must follow the format of Sample_submission.csv.

Dataset

Use the Kaggle API command below to download the dataset:

$ kaggle competitions download -c datahub-2021

Or use Git to clone this repository:

$ git clone https://github.com/Alpha-github/Kaggle_Competition_Datahub2021.git

Files	Description
`train.csv`	The training set
`test_x.csv`	The test set
`Sample_submission.csv`	A sample submission file in the correct format
`metaData.csv`	Supplemental information about the data

Output File	Description
`submission.csv`	This file contains the id of each test record and its predicted genre
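A hedged sketch of how such an output file could be assembled with pandas is shown here. The column names `id` and `genre` and the sample values are assumptions for illustration; the real header should be copied from `Sample_submission.csv`.

```python
import pandas as pd


def build_submission(test_ids, predictions, id_col="id", target_col="genre"):
    """Pair each test id with its prediction (column names are hypothetical)."""
    if len(test_ids) != len(predictions):
        raise ValueError("need exactly one prediction per test id")
    return pd.DataFrame({id_col: list(test_ids), target_col: list(predictions)})


# Placeholder ids and labels, not real competition data.
submission = build_submission([101, 102, 103], ["rock", "jazz", "rock"])
csv_text = submission.to_csv(index=False)  # pass a file path instead to write to disk
print(csv_text)
```

Calling `submission.to_csv("submission.csv", index=False)` would write the file itself; `index=False` keeps the pandas row index out of the output.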

Program Description

The program preprocesses the data with pandas and builds a predictive classification model.
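The preprocessing step might look roughly like the following sketch. It is not the code from music.py: the column names (`tempo`, `key`, `genre`) and the choices of median imputation and one-hot encoding are assumptions made for illustration.

```python
import pandas as pd


def preprocess(df, target_col):
    """Split features from the target, fill numeric gaps, and encode text columns."""
    features = df.drop(columns=[target_col]).copy()
    # Replace missing numeric values with each column's median.
    numeric_cols = features.select_dtypes(include="number").columns
    features[numeric_cols] = features[numeric_cols].fillna(features[numeric_cols].median())
    # One-hot encode the remaining text/categorical feature columns.
    features = pd.get_dummies(features)
    # Map genre names to integer codes; `classes` maps the codes back to names.
    labels, classes = pd.factorize(df[target_col])
    return features, labels, classes


# Toy frame with hypothetical columns, standing in for train.csv.
toy = pd.DataFrame({
    "tempo": [120.0, None, 174.0, 90.0],
    "key": ["C", "G", "C", "A"],
    "genre": ["rock", "classical", "dnb", "classical"],
})
X, y, classes = preprocess(toy, target_col="genre")
print(X)
print(y, list(classes))
```

In a real run, any statistics used for imputation should come from the training set only and then be reused on the test set, so that both are transformed consistently.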

Because accuracy is the key metric, three models have been built: two using scikit-learn (GaussianNB and DecisionTreeClassifier) and one deep learning model using TensorFlow Keras. The model with the highest accuracy is selected. Replace PATH_TO_TRAIN_CSV and PATH_TO_TEST_CSV with the paths of your train.csv and test_x.csv files.
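The idea of selecting the most accurate model can be sketched as below for the two scikit-learn classifiers. This is an illustration under stated assumptions, not the code in music.py: it uses synthetic data from `make_classification` in place of train.csv, and a 25% validation split chosen arbitrarily.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the preprocessed training data (assumption).
X, y = make_classification(
    n_samples=600, n_features=12, n_informative=6, n_classes=4, random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

candidates = {
    "GaussianNB": GaussianNB(),
    "DecisionTreeClassifier": DecisionTreeClassifier(random_state=0),
}

# Fit each candidate and record its accuracy on the held-out validation split.
scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_val, model.predict(X_val))

best_name = max(scores, key=scores.get)
print(scores, "->", best_name)
```

The validation accuracy of the Keras model could be added to the same `scores` dictionary so that all three models are compared on the same split.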

Feel free to experiment with the deep learning model by tweaking hyperparameters, the number of layers, the optimizer, the loss function, etc.
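To make the tunable parts concrete, here is a hedged sketch of a small Keras classifier whose layer sizes, dropout rate, and learning rate are exposed as arguments. The architecture and default values are assumptions for illustration and are not the network defined in music.py.

```python
import tensorflow as tf


def build_model(n_features, n_classes, hidden_units=(64, 32),
                dropout=0.2, learning_rate=1e-3):
    """Build a small dense classifier; every argument is a knob to experiment with."""
    model = tf.keras.Sequential()
    model.add(tf.keras.Input(shape=(n_features,)))
    for units in hidden_units:
        model.add(tf.keras.layers.Dense(units, activation="relu"))
        model.add(tf.keras.layers.Dropout(dropout))
    # One output unit per genre; softmax yields class probabilities.
    model.add(tf.keras.layers.Dense(n_classes, activation="softmax"))
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        loss="sparse_categorical_crossentropy",  # expects integer-encoded labels
        metrics=["accuracy"],
    )
    return model


# Hypothetical sizes: 12 input features and 4 genres.
model = build_model(n_features=12, n_classes=4)
model.summary()
```

Changing `hidden_units` alters the number and width of layers, while swapping the optimizer or loss in `compile` covers the other options mentioned above.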

Important: Take care not to overfit your model, or it won't perform well on the test dataset. The final prediction on the test data is stored in submission.csv.

NOTE: As training neural networks is hardware intensive, it's better to run the model on Google Colab.
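One simple way to watch for the overfitting mentioned above is to compare training accuracy with accuracy on a held-out validation split: a large gap suggests the model has memorized the training data. The sketch below does this for a decision tree at several depths, using synthetic data as a stand-in for train.csv (an assumption).

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the preprocessed training data (assumption).
X, y = make_classification(
    n_samples=600, n_features=12, n_informative=6, n_classes=4, random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)


def train_val_accuracy(max_depth):
    """Fit a tree with the given depth limit; return (train, validation) accuracy."""
    tree = DecisionTreeClassifier(max_depth=max_depth, random_state=0)
    tree.fit(X_train, y_train)
    return tree.score(X_train, y_train), tree.score(X_val, y_val)


# max_depth=None lets the tree grow until it fits the training set completely.
report = {depth: train_val_accuracy(depth) for depth in (3, 6, None)}
for depth, (train_acc, val_acc) in report.items():
    print(f"max_depth={depth}: train={train_acc:.3f}, val={val_acc:.3f}")
```

The same comparison applies to the Keras model, where validation accuracy can be monitored per epoch during training.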

Technology

Python - An easy to pick up programming language that is fun to play with.

Pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.

TensorFlow Keras - The core open source library to help you develop and train ML models.

scikit-learn:
- Simple and efficient tools for predictive data analysis
- Accessible to everybody, and reusable in various contexts
- Built on NumPy, SciPy, and matplotlib

Matplotlib is a plotting library for the Python programming language and its numerical mathematics extension NumPy.

NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.

Support Libraries:

Logging:
- Python has a built-in module, logging, that allows writing status messages to a file or any other output stream.
Seaborn:
- Seaborn is a data visualization library built on top of matplotlib and closely integrated with pandas data structures in Python.
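
As a brief sketch of the logging usage described above, the snippet below configures a logger that writes status messages to a file. The logger name, file name, message format, and the logged value are hypothetical placeholders, not taken from music.py.

```python
import logging
import os
import tempfile


def make_logger(log_path, name="music_genre"):
    """Return a logger that writes timestamped INFO messages to log_path."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid attaching duplicate handlers on repeated calls
        handler = logging.FileHandler(log_path, mode="w", encoding="utf-8")
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger


# Write to the system temp directory so the example has no side effects elsewhere.
log_path = os.path.join(tempfile.gettempdir(), "music_genre_run.log")
logger = make_logger(log_path)
logger.info("validation accuracy: %.3f", 0.5)  # placeholder value, not a real result
```

Logging milestones such as data loading, training start, and final accuracy makes long training runs easier to follow than scattered print calls.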

Setup

This project was built using Windows 10

Install TensorFlow using pip

System requirements:

Python 3.6–3.9. Python 3.9 support requires TensorFlow 2.5 or later. Python 3.8 support requires TensorFlow 2.2 or later.

Important: For more information on proper installation and setting up GPU support, see the official TensorFlow installation guide.

pip install tensorflow

Install NumPy using pip

pip install numpy

Install Pandas using pip

pip install pandas

Install Seaborn using pip

pip install seaborn

This project was built on Python version 3.9

To download Python, visit the Python Downloads page.
