\documentclass[11pt]{article}
\pagestyle{empty}
\usepackage{fullpage}
\usepackage{times}
%\usepackage{url}
\usepackage{algorithm}
\usepackage{mathtools}
\usepackage{mathptmx}
\usepackage{amssymb}
\usepackage{color}
\usepackage{paralist}
\usepackage{ulem}
\normalem
\usepackage{hyperref}
\begin{document}
\pagenumbering{arabic}
\begin{center}
\LARGE CAS CS 506 \& ENG EC 500\\
\Large Computational Tools for Data Science\\
\Large\rm Spring 2018\\~\\
\end{center}
\noindent{\large\bf Lectures:} Tuesdays and Thursdays, 3:30--4:45
PM in Photonics 211

\noindent \textbf{Lecture schedule}:
\url{https://tinyurl.com/cs506-spring2018}

\noindent \textbf{Piazza site:} \url{https://piazza.com/bu/spring2018/cs506}

\noindent \textbf{Lecture materials:} \url{https://github.com/adamdavisonsmith/BU-CS506-Spring2018}
\medskip
\noindent
\begin{small}
\begin{tabular}{|l|l|l|l|l|}
\hline
Name & Role & Email & Office & Office Hours\\
& & (Use Piazza if possible) & & \\
\hline
\hline
Adam Smith & Instructor & \texttt{ads22@bu.edu}
& MCS 135F & Tue 11:00am--12:30pm, \\
& & & & Thu
12:00pm--1:30pm \\
\hline
Sofia Nikolakaki & TF & \texttt{smnikol@bu.edu} & MCS B08 &
Tue 1:00--2:30pm, \\
& & & & Wed 1:00--2:30pm \\
\hline
Ziba Cranmer & Spark! coordinator & \texttt{zcranmer@bu.edu}& & \\
\hline
Dora Erdos & Co-instructor (projects) & \texttt{edori@bu.edu} &
& \\
\hline
\end{tabular}
\end{small}
\medskip
\noindent \textbf{Action items} you should complete today:
\begin{compactitem}
\item Read the syllabus
\item Sign and return academic honesty policy page
\item Add yourself to the course Piazza site and set notification preferences
\item Make sure you have a github.com account
\end{compactitem}
\noindent
Over the next few days:
\begin{compactitem}
\item Register for Top Hat (instructions forthcoming on Piazza)
\item Install Python and Git (see the notes for Lecture 02)
\item Fill out the background survey (to be posted on Piazza, due
by Tuesday, January 23)
\item Get started on HW 0!
\end{compactitem}
\section*{Overview of the Course}
This course is targeted at students who require a basic level of
proficiency in working with and analyzing data. % The course emphasizes
% practical skills in working with data, while introducing students to a
% wide range of techniques that are commonly used in the analysis of data,
% such as clustering, classification, regression, and network analysis.
The goal of the class is to provide students with a hands-on understanding
of classical data analysis techniques and to develop proficiency in
applying these techniques in a modern programming language (Python).
%
% Broadly speaking, the course breaks down into three main components,
% which we will take in order of increasing complication: (a)
% unsupervised methods and summaries; (b) supervised methods; and (c) methods for
% structured data.
%
Lectures aim to help
students (a) understand the practical settings in which these methods are
useful and (b) interpret the results (and assess the significance) of their analyses.
% Class discussion will study use cases and will go over relevant
% Python packages that will enable the students to perform hands-on
% experiments with their data.
\section*{Prerequisites}
Students taking this class \textbf{must} have
\begin{compactitem}
\item \emph{Experience programming}, at the level of CS 105, 108, or 111, or
equivalent.
\item \emph{Linear algebra}. A solid understanding of linear algebra is
required: CS 132 or equivalent (MA 242, MA 442). In particular, students
should be comfortable with the notions of linear independence, rank,
eigenvalues, and eigenvectors.
\item \emph{Probability and statistics}. Familiarity
with basic concepts of probability (independence, random variables, expectation,
variance) and statistics (point estimates, regression, confidence intervals,
hypothesis tests).
\item \emph{General scientific mathematics}. Calculus, elementary
combinatorics, boolean logic.
\end{compactitem}
\noindent Other useful background:
\begin{compactitem}
\item Data structures and algorithms (CS 112, 131)
\item Probability for computer scientists (CS 337)
\item Vector calculus
\item Optimization
\end{compactitem}
\section*{Learning Outcomes}
Students who successfully complete this course will be proficient in
basic data acquisition, manipulation, and analysis. They will be able
to understand and carry out the most commonly used methods of clustering,
classification, and regression. They will be able to interpret their
results, and discuss the limitations of their methodology.
They will also understand and be able to articulate
efficiency and systems issues related to working on very large
datasets.
\section*{Slides}
There is no required textbook for the course; lecture notes will be posted online.
Many of the slides are \texttt{jupyter notebook} files containing executable
Python code. You can download and run the lectures on your own computer,
and you can modify them any way you like to experiment.

The slides I use in lecture are published on \texttt{github} in the
repository
\url{https://github.com/adamdavisonsmith/BU-CS506-Spring2018}. You are
welcome to access the repository using \texttt{git}. If you
find a bug, submit a pull request (or ask about it on Piazza).
\subsection*{Additional Reading}
Some other recommended texts:
\begin{compactitem}
\item \emph{Introduction to Data Mining,} by
Tan, Steinbach and Kumar. Some of the lectures are based on this book.
\item \emph{Python for Data Analysis}
(\url{http://shop.oreilly.com/product/0636920023784.do}). This is the
definitive text for \emph{Pandas}.
\item Larry Wasserman, \emph{All of Statistics: A Concise Course in Statistical
Inference}, Springer, 2004. A survey of major statistical
concepts, written for scientists outside of statistics.
% \item \emph{Programming Collective Intelligence}
% (\url{http://shop.oreilly.com/product/9780596529321.do})
\end{compactitem}
\noindent We will post links to additional reading on Piazza as the semester
proceeds.
\section*{Homeworks and Project}
\begin{enumerate}
\item There will be seven to nine homework assignments. In a typical
assignment you will
analyze one or more datasets using the tools and techniques presented in
class.
Homeworks will be submitted via \texttt{github}. For this, you will
need a GitHub account (create one if you don't already have
one; they are free).
You are expected to work individually on homeworks.
\item There will be a final project, conducted in teams. For the project you
will extract knowledge or conclusions from the analysis of a dataset of
your choice, using a subset of the methods covered in class. The
final project will require a proposal, two progress reports, and a final
presentation in poster form.

The project will have three essential components: 1) a data collection
piece (which may involve crawling or calls to an API, combining data
from different sources, etc.), 2) a data analysis piece (applying
techniques covered in class), and 3) a conclusion component (drawing
conclusions from the results of the analysis). Students will submit a
5-page report clearly explaining all three components of their project. That
report will be revised based on feedback from course staff. Finally,
students will present their work and results in front of their poster
at a required poster presentation.
As an example, you may choose to collect data from Twitter related to
a specific topic (e.g., the Ebola virus) and then measure the intensity of
posts about that topic in different areas of the world. Other examples
of projects include (but are not limited to) analysis of MBTA
data, analysis of census data from your favorite country, and crawling of
YouTube (or other social media) data to analyze social behavior
such as trolling and bullying.

See the lecture schedule for project deadlines.
\end{enumerate}
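To make the Twitter example above a little more concrete, here is a minimal Python sketch of one way to measure topic intensity by region: the fraction of each region's posts that mention a keyword. The posts, the regions, and the helper \texttt{topic\_intensity} are hypothetical stand-ins; a real project would first collect its own data (for instance through an API or a crawler) and would likely use \emph{Pandas} rather than plain dictionaries.

```python
from collections import Counter

# Hypothetical toy records: (region, post text). Not real data; a real project
# would collect its own posts, e.g. via an API or a crawler.
posts = [
    ("West Africa", "Ebola case counts rising again"),
    ("West Africa", "Market prices today"),
    ("Europe", "New Ebola vaccine trial announced"),
    ("Europe", "Football results"),
    ("Europe", "Weather update"),
    ("North America", "Traffic on I-90"),
]


def topic_intensity(records, keyword):
    """Return {region: fraction of that region's posts mentioning keyword}.

    Matching is a simple case-insensitive substring test (an assumption;
    real analyses may need tokenization, hashtags, language handling, etc.).
    """
    totals = Counter()  # all posts per region
    hits = Counter()    # posts per region that mention the keyword
    kw = keyword.lower()
    for region, text in records:
        totals[region] += 1
        if kw in text.lower():
            hits[region] += 1
    # Normalize by each region's total so busy regions are not over-counted.
    return {region: hits[region] / totals[region] for region in totals}


for region, share in sorted(topic_intensity(posts, "ebola").items()):
    print(f"{region}: {share:.2f}")
```

Dividing by each region's total post count, rather than reporting raw counts, keeps regions with many active users from looking more interested in the topic simply because they post more.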
\section*{Piazza}
Piazza is a website that allows you to ask questions, either to
instructors or course-wide. We will be using Piazza for almost all course
communication outside of the classroom. Please sign up at
\url{https://piazza.com/bu/spring2018/cs506} and set your
email notification options so that you receive announcements.
Piazza allows you to ask questions that are visible only to
instructors, but it also allows you to ask questions to the entire
class, and answer others' questions. When someone posts a question on
Piazza, if you know the answer, please go ahead and post it. However,
please \emph{do not} provide answers to homework questions on Piazza.
It's OK to tell people \emph{where to look} to get answers, or to
correct mistakes; just don't provide actual solutions to
homeworks. Also, be polite.
See the post ``Ethics and Etiquette on Piazza''
for details about our expectations.
\section*{Top Hat}
Top Hat is a software platform that lets me ask you multiple-choice questions
in class; it provides functionality similar to ``clickers''. I use Top Hat to
help students participate, and to gauge whether the pace of the class
is right.
%
There is a subscription fee of about \$26 (but no textbooks to buy).
%
We will post details of how to sign up for this class on Piazza.
\newpage
\section*{Course and Grading Administration}
\begin{tabular}{rl}
45\% & Homework assignments \\
5\% & Class participation (in-class discussion, Piazza, Top Hat)\\
50\% & Final Project \\
\end{tabular}
Homeworks are due at 7pm on Fridays.
Assignments will be submitted using \texttt{github}; Ms.\ Nikolakaki will
explain how to submit assignments.

Late assignments \textbf{WILL NOT} be accepted, with one exception: you
may submit \textbf{one} homework up to 3 days late. You
\textbf{must} contact Ms.\ Nikolakaki via Piazza before the deadline if you intend to
submit a homework late.

The exact cutoffs for final grades will be determined after the class is
complete.
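As a rough illustration of how the grading weights combine, the Python sketch below computes a weighted overall score. It assumes, hypothetically, that each component is already expressed as a percentage; the syllabus does not specify how individual homeworks are averaged, and letter-grade cutoffs are set only after the course ends. The function name \texttt{weighted\_score} and the component keys are illustrative, not part of any course tool.

```python
from typing import Mapping

# Weights from the grading table: 45% homework, 5% participation, 50% project.
WEIGHTS = {"homework": 0.45, "participation": 0.05, "project": 0.50}


def weighted_score(scores: Mapping[str, float]) -> float:
    """Combine component scores (each assumed to be on a 0-100 scale)
    into a 0-100 overall score using the syllabus weights."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    return sum(weight * scores[name] for name, weight in WEIGHTS.items())


# Hypothetical student: 90 on homeworks, 80 on participation, 85 on the project.
print(round(weighted_score({"homework": 90, "participation": 80, "project": 85}), 2))
```

For that hypothetical student the weighted score is $0.45\cdot 90 + 0.05\cdot 80 + 0.50\cdot 85 = 40.5 + 4 + 42.5 = 87$. Because the project alone carries half the weight, a 10-point change in the project score moves the overall score by 5 points.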
\section*{Lecture Schedule}
We will maintain a detailed lecture schedule here:
\begin{center}
\url{https://tinyurl.com/cs506-spring2018}
\end{center}
\noindent
Tentative topics:
\begin{compactitem}
\item Introduction
\begin{compactitem}
\item Python and essential tools (Git, Jupyter Notebook, Pandas)
\item The process of ``data science''
\item Probability and Statistics Refresher
\item Linear Algebra Refresher
\end{compactitem}
\item A sampling of techniques
\begin{compactitem}
\item Distance and Similarity Functions, Time Series
\item Clustering
% \item Clustering I: k-means
% \item Clustering II: In practice
% \item Clustering III: Hierarchical Clustering
% \item Clustering IV: GMM and Expectation Maximization
\item Assessing significance: correlations and clustering
\item Singular Value Decomposition and Dimensionality Reduction
% \item Singular Value Decomposition I : Low Rank Approximation
% \item SVD II: Dimensionality Reduction
% \item SVD III: Anomaly Detection
\item Web Scraping
\item Classification and Regression
\end{compactitem}
\item Interpretation, assessment, and confidence
\begin{compactitem}
\item Statistical validity
\item p-hacking and multiple hypothesis testing
\end{compactitem}
\item Ethics of data: privacy, transparency, accountability, representation, fairness
\item Advanced technical topics
\begin{compactitem}
\item Parallel architectures and MapReduce
\item Collaborative filtering and recommender systems
\item Analyzing Graphs and Networks
\end{compactitem}
% \item Network Analysis IA
% \item Network Analysis IB
% \item Networks II: Centrality
% \item Networks III: Clustering
\end{compactitem}
\newpage
\section*{CS 506 Academic Honesty Policy---Sign and Return this Page}
You may discuss homework assignments with classmates, but you are
solely responsible for what you turn in. Collaboration in the form of
discussion is allowed, but all forms of cheating (copying parts of a
classmate's assignment, plagiarism from books or old posted solutions)
are NOT allowed. Both teaching staff and students are expected
to abide by the guidelines and rules of the Academic Conduct Code
at
\url{http://www.bu.edu/academics/policies/academic-conduct-code/}.
Graduate students must also be aware of and abide by the GRS Academic
Conduct Code at
{\footnotesize \url{http://www.bu.edu/cas/students/graduate/forms-policies-procedures/academic-discipline-procedures/}.}
\vspace{0.3in}

I, the undersigned, have read and understand the academic honesty
policy.

\vspace{0.3in}
\makebox[1.5in]{Signature:\hfill} \rule{3in}{0.5pt}

\vspace{0.3in}
\makebox[1.5in]{Date: \hfill} \rule{3in}{0.5pt}

\vspace{0.3in}
\makebox[1.5in]{Your name (print): \hfill} \rule{3in}{0.5pt}

\vspace{0.3in}
\makebox[1.5in]{BU ID number: \hfill} \rule{3in}{0.5pt}

\vspace{1in}
You can probably, if you try hard enough, find solutions for homework
problems online. That said,
\begin{enumerate}
\item It is against the course honesty policy (see above).
\item If you are looking online for an answer because you don't know how
to start thinking about a problem, talk to Ms.\ Nikolakaki or me; we may be
able to give you pointers to get you started. Piazza is great for
this: you can usually get an answer within an hour, often within minutes.
\item If you are looking online for an answer because you want to see if
your solution is correct, ask yourself if there is some way to verify
the solution yourself. Usually, there is. You will understand what you have done
\emph{much} better if you do that.
\end{enumerate}
\end{document}