Project_Log.txt
In our project we would like to use machine learning algorithms to classify the genre of a given song.
We would like to answer or explore the following questions while making this project:
1. How do we use the mp3/wav files for machine learning? Do we need to convert them all to one format, or
convert them to an image or some other representation in order to identify the genre?
2. What kind of data manipulation, if any, is needed?
3. How do we use and create random forests, and how do they compare to the other machine learning algorithms:
   1. Random Forests.
   2. Decision Trees.
   3. KNN.
   4. Adaboost.
   5. Neural Networks.
-- Created the project and downloaded the datasets.
-- Converted all the mp3 files to wav files in the emotional dataset using the following script
(note that {}.wav names each output with a double extension, e.g. song.mp3.wav):
find . -type f -name "*.mp3" -exec ffmpeg -i {} -acodec pcm_s16le -ar 44100 {}.wav \;
We discovered the different strategies for using .wav files with machine learning models:
1. Mel-Frequency Cepstral Coefficients (MFCCs)
2. Spectrogram
3. Chroma Features
We also take into consideration that some genres have more audio files than others when combining the data.
Some research found 3 different approaches for this problem:
1. Under-sampling: Reduce the number of instances in the over-represented classes.
2. Over-sampling: Increasing the number of instances in the under-represented classes by duplicating.
3. Weighted Training: Adjusting the weight of classes in the loss function when training.
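The three approaches can be sketched on a toy imbalanced label array (class names and counts here are illustrative, not from our datasets):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.utils import resample

rng = np.random.default_rng(0)
X = rng.normal(size=(130, 4))
y = np.array(["rock"] * 100 + ["jazz"] * 30)   # rock is over-represented

rock_idx = np.where(y == "rock")[0]
jazz_idx = np.where(y == "jazz")[0]

# 1. Under-sampling: keep only as many rock instances as jazz ones.
keep = rng.choice(rock_idx, size=len(jazz_idx), replace=False)
under_idx = np.concatenate([keep, jazz_idx])           # 30 + 30 rows

# 2. Over-sampling: duplicate jazz instances (sampling with replacement)
#    until they match the rock count.
extra = resample(jazz_idx, n_samples=len(rock_idx), random_state=0)
over_idx = np.concatenate([rock_idx, extra])           # 100 + 100 rows

# 3. Weighted training: keep the data as-is and let the classifier
#    weight classes inversely to their frequency.
clf = RandomForestClassifier(class_weight="balanced", random_state=0)
clf.fit(X, y)
```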
-- This is specifically relevant only for the spectrogram strategy, as the other strategies compute averages over the audio waveform.
We decided to use the MFCCs and chroma features, since our data comes from 2 different datasets with different
lengths - one with 30-second audio files and the other with 1-minute audio files - so the original length of
the audio file doesn't matter. The approach for the imbalanced data we will choose later. Our next step is to extract
the MFCCs from our data and join them into one big table dataset.
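A sketch of assembling per-file MFCC vectors into one table with pandas; the fake_mfcc_vector helper, file names and column names are our assumed schema, standing in for real librosa output:

```python
import numpy as np
import pandas as pd

# Stand-in for librosa.feature.mfcc(...).mean(axis=1) on a real file.
def fake_mfcc_vector(seed, n_mfcc=13):
    return np.random.default_rng(seed).normal(size=n_mfcc)

files = [("jazz", "jazz.00001.wav"),
         ("jazz", "jazz.00002.wav"),
         ("rock", "rock.00001.wav")]

rows = []
for i, (genre, name) in enumerate(files):
    vec = fake_mfcc_vector(i)
    row = {"filename": name, "genre": genre}
    # One column per MFCC coefficient: mfcc_0 .. mfcc_12.
    row.update({f"mfcc_{j}": v for j, v in enumerate(vec)})
    rows.append(row)

df = pd.DataFrame(rows)                       # one row per audio file
df.to_csv("features_mfcc.csv", index=False)   # the "big table" csv
```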
While extracting the MFCCs, we encountered a corrupt file - data/gen-dataset/genres_original/jazz/jazz.00054.wav -
in our dataset.
After handling it and creating the MFCCs, we again use the library "librosa" to extract the chroma features into
the features/chroma folder.
-- In total we got 2 csv's that contain, for each audio file, its class (genre) and its features according to the
strategy we chose.
-- We would like to explore which feature extracting algorithm is better for our problem.
After creating the dataset, we would like to start with the KNN classifier. For that we created a
working version with k=5, and then built the project structure to follow OOP principles.
Now, when we want to create a new classifier, we only have to implement the train, predict and evaluate
methods.
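The structure could look like the following sketch (class and method names beyond train/predict/evaluate are our assumptions): an abstract base class owns evaluate, and each new classifier only fills in train and predict.

```python
from abc import ABC, abstractmethod
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

class Classifier(ABC):
    @abstractmethod
    def train(self, X, y): ...

    @abstractmethod
    def predict(self, X): ...

    def evaluate(self, X, y):
        # Shared accuracy computation; subclasses only supply predictions.
        return float(np.mean(self.predict(X) == np.asarray(y)))

class KNNClassifier(Classifier):
    def __init__(self, k=5):
        self.model = KNeighborsClassifier(n_neighbors=k)

    def train(self, X, y):
        self.model.fit(X, y)

    def predict(self, X):
        return self.model.predict(X)
```

Any further classifier (SVM, random forest, ...) then plugs into the same train/predict/evaluate interface.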
Running KNN with k=5 on the MFCC features we get 57.75% accuracy, and for the chroma features we get
35.3% accuracy.
* We would like to explore and see how changing the k will impact the results, for each of the features lists.
* We also decided to use for each algorithm the 2 feature files we created and compare between them.
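The planned k sweep might look like this sketch, with synthetic data standing in for our MFCC/chroma tables and a held-out split for scoring:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in: 13 features mirrors our 13 MFCC coefficients.
X, y = make_classification(n_samples=300, n_features=13, n_informative=8,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=0)

results = {}
for k in (1, 3, 5, 7, 9, 15):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
    results[k] = knn.score(X_te, y_te)   # accuracy on the held-out split

best_k = max(results, key=results.get)
```

Running the same loop once per feature csv gives the per-feature comparison we want.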
Now, after the KNN, we implement other machine learning algorithms for our problem. In total:
1. KNN
2. SVM
3. Random Forests
4. Decision Trees
5. Neural Networks
* Ideas for future work:
1. Use the spectrogram strategy - which requires augmenting the data so that all files have the same length and all the spectrogram images have the same dimensions. On those images we can run, for example, KNN.
2. Try other ML algorithms, like combining different classifiers into a better one using Adaboost.
3. Run the ML algorithms with a combined data table of the chroma and the mfcc features.
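For idea 3, joining the two feature csv's would come down to a pandas merge on the file name; the tiny frames and column names below are our assumed schema:

```python
import pandas as pd

mfcc = pd.DataFrame({"filename": ["a.wav", "b.wav"],
                     "genre": ["jazz", "rock"],
                     "mfcc_0": [1.0, 2.0]})
chroma = pd.DataFrame({"filename": ["a.wav", "b.wav"],
                       "genre": ["jazz", "rock"],
                       "chroma_0": [0.1, 0.2]})

# Drop the duplicated genre column from one side, then join on filename
# so each row carries both feature sets.
combined = mfcc.merge(chroma.drop(columns="genre"), on="filename")
```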