Merge pull request #34 from neurodata/dalabengba
Complete pre-print draft
PSSF23 authored Sep 1, 2021
2 parents abe0e82 + dfde103 commit 9a65a8a
Showing 31 changed files with 3,625 additions and 187 deletions.
27 changes: 9 additions & 18 deletions README.md
@@ -1,18 +1,9 @@
-# rf-dn-paper
-Random forests (RF) and deep networks (DN) are two of the primary machine
-learning methods in current literature, yet they are known to yield different
-levels of performance on different data modalities. In particular, RF (and
-other tree based ensemble methods) is one of the leading means for tabular
-data, while DNs are known to perform best on structured data. We wish to
-further explore and establish the conditions and domains in which each
-approach excels, particularly in the context of sample size. To address these
-issues, our team is analyzing the performance of these models across different
-settings using varying model parameters and architectures. For our data, we
-are focusing on well-known image, tabular, and audio datasets to observe
-performance from well-documented sources. For image data, we used CIFAR-10/100
-with fixed sample sizes across different numbers of classes. For tabular data,
-we used OpenML-CC18, a collection of datasets representing a benchmark for
-machine learning analysis. For audio data, we used UrbanSound8k dataset across
-ten classes employing Mel-frequency cepstrum as a feature representation for
-audio. The goal of this project is to observe novel trends in model
-classification accuracy visible across a range of sample sizes.
+# When are Deep Networks really better than Random Forests at small sample sizes?
+
+[![arXiv](https://img.shields.io/badge/arXiv-2108.13637-red.svg?style=flat)](https://arxiv.org/abs/2108.13637)
+[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
+[![License](https://img.shields.io/badge/License-MIT-blue)](https://opensource.org/licenses/MIT)
+
+**This is preliminary work. More details will be available.**
+
+Random forests (RF) and deep networks (DN) are two of the most popular machine learning methods in the current scientific literature and yield differing levels of performance on different data modalities. We wish to further explore and establish the conditions and domains in which each approach excels, particularly in the context of sample size and feature dimension. To address these issues, we tested the performance of these approaches across tabular, image, and audio settings using varying model parameters and architectures. Our focus is on datasets with at most 10,000 samples, which represent a large fraction of scientific and biomedical datasets. In general, we found RF to excel at tabular and structured data (image and audio) with small sample sizes, whereas DN performed better on structured data with larger sample sizes. Although we plan to continue updating this technical report in the coming months, we believe the current preliminary results may be of interest to others.
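The RF-versus-DN sample-size comparison described in the README can be sketched with off-the-shelf models. The snippet below is a minimal, hypothetical illustration assuming scikit-learn and a synthetic dataset; it is not the paper's actual benchmark suite, which uses CIFAR-10/100, OpenML-CC18, and UrbanSound8k with tuned architectures.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for a small tabular dataset (hypothetical, not from the paper)
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=10, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

# Evaluate both model families at increasing training-set sizes
for n in (50, 200, 1000):
    rf = RandomForestClassifier(n_estimators=100, random_state=0)
    dn = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500, random_state=0)
    rf.fit(X_train[:n], y_train[:n])
    dn.fit(X_train[:n], y_train[:n])
    print(n, rf.score(X_test, y_test), dn.score(X_test, y_test))
```

Plotting test accuracy against `n` for each model family is the basic shape of the learning-curve analysis the report describes.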
4 changes: 2 additions & 2 deletions audio/audio_toolbox.py
@@ -21,7 +21,7 @@ def load_spoken_digit(path_recordings, feature_type="spectrogram"):

     audio_data = []  # audio data
     x_spec = []  # STFT spectrogram
-    x_spec_mini = []  # resized image, 28*28
+    x_spec_mini = []  # resized image, 32*32
     y_number = []  # label of number
     y_speaker = []  # label of speaker
     if feature_type == "spectrogram":
@@ -34,7 +34,7 @@ def load_spoken_digit(path_recordings, feature_type="spectrogram"):
         x, sr = librosa.load(path_recordings + i, sr=8000)
         x_stft_db = a(torch.tensor(x)).numpy()
         # Convert an amplitude spectrogram to dB-scaled spectrogram
-        x_stft_db_mini = cv2.resize(x_stft_db, (32, 32))  # Resize into 28 by 28
+        x_stft_db_mini = cv2.resize(x_stft_db, (32, 32))  # Resize into 32 by 32
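The pipeline this diff touches (audio → STFT magnitude → dB scale → 32×32 image) can be approximated in plain NumPy. The sketch below is a hypothetical stand-in for `load_spoken_digit`'s feature step: the real code uses librosa, a torchaudio spectrogram transform, and `cv2.resize`, while this version uses manual framing, `np.fft.rfft`, and nearest-neighbour index sampling.

```python
import numpy as np

def stft_db_32x32(x, n_fft=64, hop=32):
    """Magnitude STFT in dB, crudely resized to 32x32 (hypothetical sketch)."""
    # Frame the signal and take an FFT per frame -> magnitude spectrogram
    frames = [x[i:i + n_fft] for i in range(0, len(x) - n_fft + 1, hop)]
    spec = np.abs(np.fft.rfft(np.stack(frames), axis=1)).T  # (freq, time)
    # Amplitude -> dB, clipping to avoid log(0)
    spec_db = 20.0 * np.log10(np.maximum(spec, 1e-10))
    # Nearest-neighbour "resize" to 32x32 by index sampling
    rows = np.linspace(0, spec_db.shape[0] - 1, 32).astype(int)
    cols = np.linspace(0, spec_db.shape[1] - 1, 32).astype(int)
    return spec_db[np.ix_(rows, cols)]

x = np.sin(2 * np.pi * 440 * np.arange(8000) / 8000)  # 1 s of 440 Hz at 8 kHz
mini = stft_db_32x32(x)
print(mini.shape)  # (32, 32)
```

The fixed 32×32 output is what lets the downstream image-style classifiers consume audio features, which is why the comments corrected in this commit matter.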
Binary file removed audio/figures/mel.pdf
Binary file removed audio/figures/mfcc.pdf
Binary file removed audio/figures/spoken.pdf