We introduce basic principles and techniques in the fields of data mining and machine learning. These are some of the key tools behind the emerging field of data science and the popularity of the "big data" buzzword. These techniques are now running behind the scenes to discover patterns and make predictions in various applications in our daily lives. We'll focus on many of the core data mining and machine learning technologies, with motivating applications from a variety of disciplines.
We will roughly cover the following topics:
- Supervised learning with frequencies and distances.
- Data clustering, outlier detection, and association rules.
- Linear prediction, regularization, and kernels.
- Latent-factor models and collaborative filtering.
- Neural networks and deep learning.
Lectures:
Day | Time | Location |
---|---|---|
Mon/Wed-Fri | 16:00-16:50 | Zoom |
Tutorials:
The list of tutorials can be found here. See the ones starting with "T2". All tutorials will be on Zoom - see the link above. You do not need to be registered for a tutorial to be registered in the course. If possible, please try to attend the tutorial you are registered for; however, it's also fine if you attend a different tutorial.
Office hours: See the calendar.
Instructor: Mike Gelbart
Teaching Assistants:
- Daniel Ajisafe
- Ramya Basava
- Daniele Reda
- Rubia Guerra
- Amit Kadan
- Kattie Sepehri
- Joshua Tindall
- Meng Wang
Prerequisites:
- Basic algorithms and data structures (CPSC 221, or both of CPSC 260 and EECE 320 as well as one of CPSC 210, EECE 201, or EECE 309).
- Linear algebra (one of MATH 152, 221, or 223).
- Probability (one of STAT 200, STAT 203, STAT 241, STAT 251, STAT 302, MATH 302, MATH 318, or BIOL 300).
- Multivariate calculus (one of MATH 200, 217, 226, 253, or 263).
Undergraduate and graduate students from any department are welcome to take the class, provided that they satisfy the prerequisites. If you do not satisfy the exact prerequisites but would still like to enroll in the class, see here. For graduate students from outside the CS department, see here.
Auditing: because the class/classroom is full, we may not have seats for auditors. If there is space and you would like to audit the course, please contact the instructor.
The grading scheme for the course is as follows:
Component | Weight |
---|---|
Syllabus quiz | 1% |
Assignments | 30% |
Midterm | 19% |
Final | 50% |
The tentative plan is for all assignments to be weighted equally. The instructor reserves the right to change this in case, for example, one of the assignments needs to be shortened or cut.
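As a rough illustration of how the weights in the table combine, here is a minimal sketch that computes an overall grade, assuming every component is scored as a percentage and that assignments are weighted equally (the tentative plan). The function name `course_grade` and the example scores are hypothetical and are not part of any official grading tool.

```python
# Weights from the grading table, as fractions of the course grade.
WEIGHTS = {"syllabus_quiz": 0.01, "assignments": 0.30, "midterm": 0.19, "final": 0.50}


def course_grade(syllabus_quiz, assignments, midterm, final):
    """Return the overall grade in percent.

    All scores are percentages in [0, 100]. `assignments` is a list of
    per-assignment scores, averaged with equal weight (an assumption
    based on the tentative plan, which the instructor may change).
    """
    if not assignments:
        raise ValueError("need at least one assignment score")
    assignment_avg = sum(assignments) / len(assignments)
    return (
        WEIGHTS["syllabus_quiz"] * syllabus_quiz
        + WEIGHTS["assignments"] * assignment_avg
        + WEIGHTS["midterm"] * midterm
        + WEIGHTS["final"] * final
    )


# Hypothetical scores, for illustration only.
print(round(course_grade(100, [80, 90, 70], 75, 85), 2))  # 81.75
```

In this example the assignment average is 80%, so the total is 1 + 24 + 14.25 + 42.5 = 81.75.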
All grading concerns and challenges must be submitted through Gradescope.
If you perceive a problem with your homework or exam grade, you have one week to raise a concern from the time that your grades were posted. After that, your grade is final.
Grades are not perfect; some randomness in grading is normal, meaning that you'll generally get more than you deserve in some cases and less than you deserve in other cases. Thus, it is possible to exploit the system by consistently complaining when your grade is too low but not when it is too high. Unfortunately, this takes time away from the course staff which could have been spent on making the course better for everyone. Thus, in my view, students who overzealously contest grades are penalizing their classmates for personal gain.
Sometimes serious grading errors are made, for example when a grader did not see your answer to a question or completely deviated from standard grading practices for some unknown reason. Such situations can be quite frustrating for students, and we want you to feel that the course is fair. In these cases, it makes sense for the student to bring the error to our attention.
Balancing these two sides is difficult. In this course we will use the following policy: if a grade is challenged in a way that is deemed unreasonable, the student will receive a warning. This decision will be made by the instructor. If this happens a second time, the student will lose the privilege to challenge grades for the remainder of the course. Examples of unreasonable requests include extremely minor complaints (e.g. half a mark on an assignment) or repeatedly contesting the same issue once a decision has been reached. This policy applies to both assignments and exams.
We will be using Python for this course because it is open source and widely used in machine learning and data science. We will use Python 3 (in particular 3.7 or higher). We recommend the Anaconda Python distribution because it comes bundled with a bunch of useful packages (NumPy, SciPy, scikit-learn, Jupyter notebook) pre-installed. You can download Anaconda from their website for free. For some more info on Python, see here.
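To confirm that your setup meets these requirements, a quick check along the following lines may help. This is only a sketch: the helper name `check_environment` is hypothetical, and it tests only the Python version and whether NumPy, SciPy, and scikit-learn can be imported.

```python
import importlib
import sys

# Minimum Python version stated above.
MIN_PYTHON = (3, 7)

# Import names for the packages listed above. scikit-learn is imported as
# `sklearn`. The Jupyter notebook is a tool rather than a library you
# import, so it is not checked here.
PACKAGES = ["numpy", "scipy", "sklearn"]


def check_environment():
    """Return a dict mapping each requirement to True (met) or False."""
    results = {"python>=3.7": sys.version_info >= MIN_PYTHON}
    for name in PACKAGES:
        try:
            importlib.import_module(name)
            results[name] = True
        except ImportError:
            results[name] = False
    return results


for requirement, ok in check_environment().items():
    print(f"{requirement}: {'OK' if ok else 'MISSING'}")
```

If any line reports `MISSING`, installing or updating Anaconda should normally resolve it.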
You will also need a way of compiling LaTeX documents. We recommend Overleaf.
UBC has a policy on academic concession for cases in which a student may be unable to complete coursework. According to this policy, grounds for academic concession can be illness, conflicting responsibilities, or compassionate grounds. Examples of compassionate grounds, from the above policy, include “a traumatic event experienced by the student, a family member, or a close friend; an act of sexual assault or other sexual misconduct experienced by the student, a family member, or a close friend; a death in the family or of a close friend.” If you would like to request an academic concession, please fill out the academic concession form and email it to the instructor as soon as possible. You may be asked for further documentation. The instructor will evaluate the situation and make a decision on whether to grant the concession and, if so, how to proceed.
By default, late submissions will not be accepted. The rationale is that we will be posting the solutions shortly after the assignment deadline, and we cannot accept submissions after the solutions are posted. I do not like this, but I believe the overall policy is best for the class as a whole.
In exceptional circumstances a late submission may be accepted with an academic concession - see above.
If you use information from students outside your group or from online sources or lecture notes, cite this at the start of each question. You will receive a mark of 0 for the assignment (and possibly other consequences) if you are found copying from other sources without citation.
The midterm exam will take place on Friday, Feb 26, 2021 during class time. The exam is open book, meaning you are allowed to consult course materials, the internet, etc. However, you are NOT allowed to communicate with anyone else in any way during the exam. The exam will be on Canvas and you will have 50 minutes to complete it.
Missed midterm exam. There is no makeup midterm exam. If you miss the midterm exam, or anticipate missing the midterm exam, please see the Academic concessions section above. In most cases, if you have missed the midterm exam for a justified reason, the weight of the midterm component of the course will be transferred to the final exam.
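As a small illustration of the weight transfer described above, the sketch below moves the midterm weight onto the final exam. The helper `transfer_weight` is hypothetical, and the actual adjustment in any individual case is up to the instructor.

```python
def transfer_weight(weights, source, target):
    """Return a copy of `weights` with the weight of `source` moved to `target`."""
    updated = dict(weights)
    updated[target] += updated.pop(source)
    return updated


# Percent weights from the grading scheme.
weights = {"syllabus_quiz": 1, "assignments": 30, "midterm": 19, "final": 50}
print(transfer_weight(weights, "midterm", "final"))
# {'syllabus_quiz': 1, 'assignments': 30, 'final': 69}
```

With the midterm weight transferred, the final exam would count for 69% of the course grade.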
The final exam will be held during the exam period but it will be take-home in the sense that you will have 24-48 hours to complete it (exact duration TBD). The exam will be open book, meaning you are allowed to consult course materials, the internet, etc. However, you are NOT allowed to communicate with anyone else in any way during the exam.
- Do not distribute any course materials (slides, homework assignments, solutions, notes, etc.) without permission.
- If you commit to working with a partner on an assignment, do your fair share of the work.
- If you have a problem or complaint, let the instructor(s) know immediately. Maybe we can fix it!
- This is a tough course. If you're struggling with something, I recommend trying to figure it out for 10-30 min before asking for help. Spending too little time will hinder your learning, whereas spending hours may not result in efficient learning either.
We're working together on this course during a global pandemic. Everyone is struggling to some extent. If you tell me you're having trouble, I'm not going to judge you or think less of you. I hope you'll extend me the same grace!
- You're always welcome to talk to me about things you're going through.
- If I can't help you, I might know someone who can.
- If you need extra help, I'm here to work with you. We're in this together.
Credit: adapted from here.
During this pandemic, the shift to online learning has greatly altered teaching and studying at UBC, including changes to health and safety considerations. Keep in mind that some UBC courses might cover topics that are censored or considered illegal by non-Canadian governments. This may include, but is not limited to, human rights, representative government, defamation, obscenity, gender or sexuality, and historical or current geopolitical controversies. If you are a student living abroad, you will be subject to the laws of your local jurisdiction, and your local authorities might limit your access to course material or take punitive action against you. UBC is strongly committed to academic freedom, but has no control over foreign authorities (please visit http://www.calendar.ubc.ca/vancouver/index.cfm?tree=3,33,86,0 for an articulation of the values of the University conveyed in the Senate Statement on Academic Freedom). Thus, we recognize that students will have legitimate reason to exercise caution in studying certain subjects. If you have concerns regarding your personal situation, consider postponing a course with manifest risks until you are back on campus, or reach out to your academic advisor to find substitute courses. For further information and support, please visit: http://academic.ubc.ca/support-resources/freedom-expression.