- Table of Contents
- Introduction
- HW1 MLP | Phoneme Recognition
- HW2 CNN | Face Recognition and Verification
- HW3 RNN - Forward/Backward/CTC Beam Search | Connectionist Temporal Classification
- HW4 Word-level Neural Language Models using RNNs | Attention Mechanisms and Memory Networks
This repo contains the course projects of 11785 Deep Learning at CMU. The projects start with MLPs and progress to more advanced concepts such as attention and seq2seq models. Each homework assignment consists of two parts. Part 1 is the Autolab software engineering component, which involves engineering my own version of the PyTorch library, implementing important algorithms, and developing optimization methods from scratch. Part 2 is the Kaggle data science component, which applies these tools to projects on hot AI topics such as speech recognition, face recognition, and neural machine translation.
- HW1P1 Implement simple MLP activations, loss functions, and batch normalization.
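For a flavor of these components, here is a minimal NumPy sketch of a ReLU activation, a softmax cross-entropy loss, and a training-time batch-norm forward pass. The actual Autolab handout defines its own class interfaces, so all names below are illustrative.

```python
import numpy as np

def relu_forward(z):
    """ReLU activation: max(0, z), applied elementwise."""
    return np.maximum(0.0, z)

def relu_backward(dout, z):
    """Gradient of ReLU: pass the upstream gradient through only where z > 0."""
    return dout * (z > 0)

def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy loss over a batch of integer class labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    return -np.log(probs[np.arange(len(labels)), labels]).mean()

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    """Training-time batch norm: normalize each feature over the batch dimension."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta
```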
- HW1P2 Kaggle challenge: Frame-level classification of speech.
This task applies knowledge of feedforward neural networks to a speech recognition problem. The provided dataset consists of audio recordings (utterances) and their phoneme state (subphoneme) labels. The data comes from articles published in the Wall Street Journal (WSJ) that are read aloud and labelled using the original text. The job is to identify the phoneme state label for each frame in the test dataset. It is important to note that utterances are of variable length.
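A common approach is to classify each frame from a window of neighboring context frames fed to an MLP. The sketch below illustrates the idea in PyTorch; the context size, layer widths, and number of phoneme states are assumptions, not the handout's values.

```python
import numpy as np
import torch
import torch.nn as nn

# Illustrative sizes: K context frames on each side, 40-dim features,
# and a placeholder number of phoneme states.
K, FEAT_DIM, NUM_STATES = 12, 40, 71

def frame_with_context(utterance, t, k=K):
    """Pad the utterance at its edges and return frames [t-k, t+k], flattened."""
    padded = np.pad(utterance, ((k, k), (0, 0)), mode="constant")
    return padded[t : t + 2 * k + 1].reshape(-1)

mlp = nn.Sequential(
    nn.Linear((2 * K + 1) * FEAT_DIM, 1024), nn.ReLU(), nn.BatchNorm1d(1024),
    nn.Linear(1024, 512), nn.ReLU(), nn.BatchNorm1d(512),
    nn.Linear(512, NUM_STATES),
)

utterance = np.random.randn(300, FEAT_DIM).astype(np.float32)  # one variable-length utterance
batch = torch.from_numpy(
    np.stack([frame_with_context(utterance, t) for t in range(8)])
)
logits = mlp(batch)  # (8, NUM_STATES): one score per phoneme state per frame
```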
- HW2P1 Implement a NumPy-based Convolutional Neural Network library.
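For intuition, here is a minimal sketch of the kind of Conv1d forward pass such a library implements; the handout's class interface and variable names differ.

```python
import numpy as np

def conv1d_forward(x, weight, bias, stride=1):
    """
    x:      (batch, in_channels, in_width)
    weight: (out_channels, in_channels, kernel_size)
    bias:   (out_channels,)
    returns (batch, out_channels, out_width)
    """
    batch, in_c, in_w = x.shape
    out_c, _, k = weight.shape
    out_w = (in_w - k) // stride + 1
    out = np.zeros((batch, out_c, out_w))
    for i in range(out_w):
        window = x[:, :, i * stride : i * stride + k]   # (batch, in_c, k)
        # Correlate every window with every filter in one tensordot.
        out[:, :, i] = np.tensordot(window, weight, axes=([1, 2], [1, 2])) + bias
    return out
```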
- HW2P2 Kaggle challenge: Face Classification & Verification using Convolutional Neural Networks.
Given an image of a person’s face, the task of classifying the ID of the face is known as face classification. The input to the system will be a face image and the system will have to predict the ID of the face. The ground truth will be present in the training data and the network will be doing an N-way classification to get the prediction. The system is provided with a validation set for fine-tuning the model.
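A typical setup shares one CNN backbone between the two tasks: an N-way classification head for face classification, and cosine similarity between penultimate-layer embeddings for verification. The sketch below illustrates this in PyTorch; the tiny backbone, the number of IDs, and the 0.5 similarity threshold are placeholders, not values from the handout.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_IDS, EMB_DIM = 2300, 512                    # illustrative sizes

backbone = nn.Sequential(                       # stand-in for a ResNet-style CNN
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(64, EMB_DIM),
)
classifier = nn.Linear(EMB_DIM, NUM_IDS)        # N-way face classification head

def verify(img_a, img_b, threshold=0.5):
    """Return cosine similarity of the two embeddings and a same/different call."""
    emb_a, emb_b = backbone(img_a), backbone(img_b)
    sim = F.cosine_similarity(emb_a, emb_b)
    return sim, sim > threshold

imgs = torch.randn(4, 3, 64, 64)                # dummy batch of face crops
logits = classifier(backbone(imgs))             # (4, NUM_IDS) classification scores
sim, same = verify(imgs[:2], imgs[2:])          # pairwise verification decision
```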
- HW3P1 Implement RNN and GRU cells for a PyTorch-like deep learning library.
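As an illustration, a single GRU cell forward step in NumPy might look like the sketch below; gate conventions and weight layouts vary, and the handout defines its own interface.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell_forward(x, h_prev, Wz, Uz, Wr, Ur, Wh, Uh):
    """
    x:      (input_dim,)   current input
    h_prev: (hidden_dim,)  previous hidden state
    W*:     (hidden_dim, input_dim),  U*: (hidden_dim, hidden_dim)
    """
    z = sigmoid(Wz @ x + Uz @ h_prev)                 # update gate
    r = sigmoid(Wr @ x + Ur @ h_prev)                 # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h_prev))     # candidate state
    return (1 - z) * h_prev + z * h_tilde             # new hidden state
```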
- HW3P2 Kaggle challenge: Utterance to Phoneme Mapping.
This challenge works with speech data. The contest uses unaligned labels, which means the alignment between the features and labels is not given explicitly and the model will have to figure it out by itself. Hence the data contains a list of phonemes for each utterance, but not which frames correspond to which phonemes. The main task for this assignment is to predict the phonemes contained in the utterances of the test set. The training data does not contain aligned phonemes, and it is not a requirement to produce an alignment for the test data.
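Unaligned labels are usually handled with a CTC criterion: the network emits per-frame scores over the phonemes plus a blank symbol, and the loss marginalizes over all valid alignments. The sketch below shows this with PyTorch's nn.CTCLoss; the phoneme count and model sizes are illustrative.

```python
import torch
import torch.nn as nn

NUM_PHONEMES, FEAT_DIM, HIDDEN = 41, 40, 256   # illustrative; index 0 is the CTC blank

rnn = nn.LSTM(FEAT_DIM, HIDDEN, num_layers=3, bidirectional=True, batch_first=True)
output_layer = nn.Linear(2 * HIDDEN, NUM_PHONEMES + 1)
ctc_loss = nn.CTCLoss(blank=0)

x = torch.randn(4, 200, FEAT_DIM)                       # 4 utterances, 200 frames each
targets = torch.randint(1, NUM_PHONEMES + 1, (4, 30))   # unaligned phoneme sequences
input_lengths = torch.full((4,), 200, dtype=torch.long)
target_lengths = torch.full((4,), 30, dtype=torch.long)

hidden, _ = rnn(x)
log_probs = output_layer(hidden).log_softmax(dim=-1)    # (batch, time, classes)
# CTCLoss expects (time, batch, classes) log-probabilities.
loss = ctc_loss(log_probs.transpose(0, 1), targets, input_lengths, target_lengths)
```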
- HW4P1 Train a Recurrent Neural Network on the WikiText-2 Language Modeling Dataset. This task uses a recurrent network to model and generate text, and applies various techniques to regularize recurrent networks and improve their performance.
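A minimal word-level language model of this kind, with weight tying as one common regularization technique, might look like the following sketch; the dimensions and hyperparameters are illustrative choices, not requirements from the handout.

```python
import torch
import torch.nn as nn

class WordLM(nn.Module):
    def __init__(self, vocab_size, emb_dim=400, hidden=400, layers=3, dropout=0.3):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, num_layers=layers,
                            dropout=dropout, batch_first=True)
        self.decoder = nn.Linear(hidden, vocab_size)
        self.decoder.weight = self.embedding.weight   # weight tying (needs hidden == emb_dim)

    def forward(self, tokens, state=None):
        out, state = self.lstm(self.embedding(tokens), state)
        return self.decoder(out), state               # (batch, seq, vocab) next-word scores

model = WordLM(vocab_size=33278)                      # commonly quoted WikiText-2 vocab size
tokens = torch.randint(0, 33278, (8, 70))             # batch of 8 sequences of 70 tokens
logits, _ = model(tokens)
# Predict each next token from the current one.
loss = nn.functional.cross_entropy(logits[:, :-1].reshape(-1, 33278),
                                   tokens[:, 1:].reshape(-1))
```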
- HW4P2 Kaggle challenge: Deep Learning Transcript Generation with Attention.
In this challenge, a combination of Recurrent Neural Networks (RNNs) / Convolutional Neural Networks (CNNs) and Dense Networks is used to design a system for speech-to-text transcription. End-to-end, the system should be able to transcribe a given speech utterance into its corresponding transcript.
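At the heart of such a system is an attention step that lets the decoder attend over the encoder outputs at every decoding step. Below is a minimal dot-product attention sketch in PyTorch; the dimensions are illustrative.

```python
import torch
import torch.nn as nn

class DotProductAttention(nn.Module):
    def forward(self, query, keys, values, mask=None):
        """
        query:  (batch, d)          current decoder state projection
        keys:   (batch, time, d)    encoder outputs projected to keys
        values: (batch, time, d_v)  encoder outputs projected to values
        """
        energy = torch.bmm(keys, query.unsqueeze(2)).squeeze(2)        # (batch, time)
        if mask is not None:                                           # ignore padded frames
            energy = energy.masked_fill(~mask, float("-inf"))
        attention = energy.softmax(dim=1)                              # alignment weights
        context = torch.bmm(attention.unsqueeze(1), values).squeeze(1) # (batch, d_v)
        return context, attention

attend = DotProductAttention()
keys = values = torch.randn(4, 120, 128)       # encoder outputs for 4 utterances
query = torch.randn(4, 128)                    # one decoder step
context, weights = attend(query, keys, values)
```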