All contents in this document are tentative.
- Alice Oh, alice.oh@kaist.edu
- Jiseon Kim, jiseon_kim@kaist.ac.kr
- Dongkwan Kim, dongkwan.kim@kaist.ac.kr
When you send an email, please address it to all TAs and Prof. Oh, and put "CS475" in the subject line (e.g., "[CS475] Do we have a class on Thanksgiving Day?").
This course will cover advanced and state-of-the-art machine learning for text data. The ML methods covered will include graphical models, Bayesian inference, nonparametric models, and deep learning. By the end of the course, students will be able to:
- Understand important concepts in NLP
- Read current research papers in NLP
- Implement some of the basic ML models for NLP
- Conduct replication studies based on a recent NLP+ML paper
- Communicate in written and spoken English about NLP+ML research
- You need to have good programming skills in Python.
- You need to have a basic understanding of ML concepts. You do not need to have taken CS376 or any other undergraduate ML course, but you need to know concepts such as train vs test data, clustering vs classification, accuracy/precision/recall, overfitting, and basic classification models such as SVM, random forest, etc. You can learn these concepts as we go along, but you may find some lectures and papers difficult to understand if you do not put in extra time to learn these concepts.
- We will use well-known frameworks for machine learning. You may start with little prior experience and learn these libraries during the semester, but that will require extra time and effort. Note that no lectures will be devoted to teaching these frameworks.
- The course topics include Korean NLP. You do not need to be fluent in Korean, but you need to know what the Korean alphabet (Hangeul) is and how its letters combine to form syllables and words.
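As a rough illustration of the Hangeul prerequisite above, the sketch below shows how letters (jamo) combine into a single precomposed syllable under the Unicode Hangul syllable encoding. The function names `compose` and `decompose` are hypothetical and chosen only for this example; the course may use other tools or representations.

```python
# Minimal sketch: composing and decomposing Hangeul syllables with the
# arithmetic defined by the Unicode Hangul Syllables block (U+AC00..U+D7A3).
# Function names are illustrative, not part of any course material.

HANGUL_BASE = 0xAC00   # code point of the first syllable, "가"
NUM_INITIALS = 19      # initial consonants
NUM_MEDIALS = 21       # medial vowels
NUM_FINALS = 28        # final consonants, where index 0 means "no final"


def compose(initial: int, medial: int, final: int = 0) -> str:
    """Build one syllable from the indices of its initial, medial, and final."""
    if not (0 <= initial < NUM_INITIALS
            and 0 <= medial < NUM_MEDIALS
            and 0 <= final < NUM_FINALS):
        raise ValueError("jamo index out of range")
    return chr(HANGUL_BASE + (initial * NUM_MEDIALS + medial) * NUM_FINALS + final)


def decompose(syllable: str) -> tuple[int, int, int]:
    """Split one precomposed syllable into (initial, medial, final) indices."""
    offset = ord(syllable) - HANGUL_BASE
    if not 0 <= offset < NUM_INITIALS * NUM_MEDIALS * NUM_FINALS:
        raise ValueError("not a precomposed Hangeul syllable")
    initial, rest = divmod(offset, NUM_MEDIALS * NUM_FINALS)
    medial, final = divmod(rest, NUM_FINALS)
    return initial, medial, final


if __name__ == "__main__":
    for syllable in "한글":
        print(syllable, decompose(syllable))
```

Each syllable block is therefore fully determined by at most three letters, which is why Korean text can be processed at the word, syllable, or letter level.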
- Papers from JMLR, ICML, NeurIPS, IJCAI, AAAI, ICLR, ACL, EMNLP, arXiv, etc.
- Jacob Eisenstein, Natural Language Processing
Lecture Schedule
Topics include (not necessarily in this order):
- Word Vectors & Distributed Semantics
- Text Classification
- Language Models: N-grams
- Sequence Models: RNNs
- Machine Translation
- Korean NLP
- Neural Language Models
- NLP Applications (QA, Dialogue, Information Extraction, etc.)
TBA.
Your grade will be a combination of the following:
- 50% Participation and attendance
- 50% Project