This project provides a minimal example of training a neural network to recognise short audio clips. It is built on top of PyTorch and torchaudio and exposes a small command line interface for training and prediction.
## Features

- Convert audio files to Mel-spectrograms on the fly (see the sketch after this list)
- Simple convolutional neural network architecture
- CLI commands for training and live microphone prediction
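Audio is converted to Mel-spectrograms as it is loaded rather than in a separate preprocessing step. The snippet below is a minimal sketch of how such an on-the-fly conversion can look with torchaudio; the function name and transform parameters (`target_sample_rate`, `n_fft`, `n_mels`) are illustrative assumptions, not the values used in `song_recognizer/data.py`.

```python
# Sketch only: parameters are assumptions, not the project's actual settings.
import torch
import torchaudio


def load_mel_spectrogram(path: str, target_sample_rate: int = 16_000) -> torch.Tensor:
    waveform, sample_rate = torchaudio.load(path)
    # Resample if the file does not match the expected rate.
    if sample_rate != target_sample_rate:
        waveform = torchaudio.transforms.Resample(sample_rate, target_sample_rate)(waveform)
    mel = torchaudio.transforms.MelSpectrogram(
        sample_rate=target_sample_rate,
        n_fft=1024,
        n_mels=64,
    )(waveform)
    # Log scaling keeps the dynamic range manageable for a small CNN.
    return torchaudio.transforms.AmplitudeToDB()(mel)
```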
## Installation

- Create a virtual environment (optional but recommended)
- Install the dependencies:
  ```bash
  pip install -r requirements.txt
  ```

## Usage

Place your audio files (e.g. WAV or MP3) in a directory and run:
```bash
python main.py train /path/to/audio
```

The trained model will be saved to `song_recognizer.pth`.
To make a prediction using the microphone, run:
```bash
python main.py predict
```

or provide a prerecorded file:

```bash
python main.py predict --input_file sample.wav
```
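If you want to reuse the saved weights outside the CLI, the checkpoint can be loaded back into the network before inference. The sketch below assumes the checkpoint stores a plain `state_dict` and that the model class is importable from `song_recognizer.model`; `SongCNN` and its `num_classes` argument are placeholders, not necessarily the real class or signature.

```python
# Hypothetical loading sketch; SongCNN and num_classes are placeholders for
# the actual class and constructor arguments in song_recognizer/model.py,
# and the checkpoint is assumed to contain a plain state_dict.
import torch

from song_recognizer.model import SongCNN  # placeholder class name

model = SongCNN(num_classes=10)  # placeholder argument
model.load_state_dict(torch.load("song_recognizer.pth", map_location="cpu"))
model.eval()  # switch to inference mode before predicting
```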
## Project structure

```
├── song_recognizer
│   ├── __init__.py
│   ├── data.py
│   ├── model.py
│   ├── recognition.py
│   └── train.py
├── main.py
├── requirements.txt
└── README.md
```
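For orientation, `song_recognizer/model.py` holds the "simple convolutional neural network" mentioned in the feature list. The real architecture may differ; the snippet below is only a sketch of what a small CNN classifier over Mel-spectrogram input typically looks like in PyTorch, with layer sizes and `num_classes` chosen arbitrarily (the same placeholder name `SongCNN` is used as in the loading example above).

```python
# Illustrative sketch only; the network in song_recognizer/model.py may use
# different layers, channel counts, or pooling.
import torch
from torch import nn


class SongCNN(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        # Two small conv blocks over a (batch, 1, n_mels, time) spectrogram.
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),  # fixed-size output regardless of clip length
        )
        self.classifier = nn.Linear(32 * 4 * 4, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        return self.classifier(torch.flatten(x, start_dim=1))
```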
## License

MIT