\chapter*{Course plan}
\thispagestyle{empty}
Below, I present the course plan for PO-235.
Please send any questions about the classes via Google Classroom. If your question is of
general interest, use the main stream; if it is personal or concerns a specific
assignment or grade, use the private stream.
You can contact me via email at \href{mailto:verri@ita.br}{verri@ita.br} or
\href{mailto:filipe.verri@gp.ita.br}{filipe.verri@gp.ita.br}.
\newpage
\thispagestyle{empty}
\section*{PO-235 Data Science Project}
\emph{Course plan (\the\year{})}
Prof. Filipe A. N. Verri
\paragraph{Important:} Only graduate students are permitted to enroll in this course.
\paragraph{Number of students:} Approximately 20
\paragraph{Course load:} 3--0--0--4
\paragraph{Requirements:} Advanced programming skills, a strong statistical background, and
beginner-level machine learning skills.
\paragraph{Course program:}
Brief history of data science. Fundamental data concepts. Methodologies for data science
projects. Structured data, database normalization, and tidy data. Data handling
operators and their properties. Learning from data and principles of statistical learning
theory. Data preprocessing tasks. Evaluation and validation of data science products.
\paragraph{Goals:}
To provide the theoretical foundations and practical concepts needed to develop an
end-to-end data science project for an inductive task.
\paragraph{Teaching methodology:}
Expository classes in a regular classroom, using a whiteboard, slide presentations, coding
examples, books, and scientific papers. Supplementary teaching materials will be available
in Google Classroom. The case study, including programming and scientific paper writing,
will be developed during home study hours. All classes will be taught in English.
Students are encouraged to ask questions in English, but Portuguese is also permitted.
All written and oral assignments must be in English.
\paragraph{Grading:} Three individual written tests, two in the \nth{1} quarter ($T_1$ and
$T_2$) and one in the \nth{2} quarter ($T_3$), and a group activity ($L$) that comprises
developing a data science product, giving a 30-minute presentation, and, optionally,
writing a scientific paper.
Final grades will be calculated as
\begin{equation*}
\sqrt{\frac{T_1 + T_2 + T_3}{3} \cdot L}
\end{equation*}
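For illustration only, consider hypothetical grades $T_1 = 10$, $T_2 = 8$, $T_3 = 9$, and
$L = 4$, all on the same scale. The formula is the geometric mean of the average test
grade and the group activity grade:
\begin{equation*}
% Hypothetical grades chosen for illustration, not taken from any actual student
\sqrt{\frac{10 + 8 + 9}{3} \cdot 4} = \sqrt{9 \cdot 4} = \sqrt{36} = 6,
\end{equation*}
whereas the arithmetic mean of $9$ and $4$ would be $6.5$. A low grade in either component
therefore lowers the final grade more than it would under an arithmetic mean, and $L = 0$
yields a final grade of $0$ regardless of the test grades.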
\paragraph{Case study:} At most 6 groups will be formed, and each group will be responsible
for one case study. Each group must choose a real-world problem and develop a data science
project covering data collection, data handling, inductive learning, validation,
documentation, and deployment. The results must be presented in a 30-minute presentation.
Extra points will be awarded to groups that write a scientific paper about the case study.
The trained models must be incorporated into a data science product, such as a web
application, a mobile application, or a web service.
\paragraph{Bibliography:}
\begin{itemize}
\itemsep 0pt
\item Filipe A. N. Verri (2025). \emph{Data Science Project: An Inductive Learning
Approach}. Victoria, British Columbia, Canada: Leanpub.
\item \fullcite{Zumel2019}.
\item \fullcite{Wickham2023}.
\end{itemize}
Any required extra material will be made available in Google Classroom.
\thispagestyle{empty}
\paragraph{Calendar:} The expected schedule is presented below.
\thispagestyle{empty}
\begin{center}
\begin{tabular}{ll}
\toprule
\multicolumn{2}{c}{\bfseries \nth{1} Quarter} \\
\midrule
Week & Topics \\
\midrule
\multirow{2}{*}{1} & Mathematical foundations review \\
& Brief history of data science \\
\midrule
\multirow{2}{*}{2} & \bfseries Written test (60 min) \\
& Brief history of data science \\
\midrule
3 & Fundamental data concepts \\
\midrule
4 & Methodologies for data science projects \\
\midrule
5 & Structured data, database normalization, and tidy data \\
\midrule
6 & Data handling operators and their properties \\
\midrule
7 & Learning from data and principles of statistical learning theory \\
\midrule
\multirow{2}{*}{8} & \bfseries Written test (60 min) \\
& Project discussions \\
\bottomrule
\end{tabular}
\end{center}
\begin{center}
\begin{tabular}{ll}
\toprule
\multicolumn{2}{c}{\bfseries \nth{2} Quarter} \\
\midrule
Week & Topics \\
\midrule
1 & Learning from data and principles of statistical learning theory \\
\midrule
2 & Data preprocessing tasks \\
\midrule
3 & Evaluation and validation of data science products \\
\midrule
\multirow{2}{*}{4} & \bfseries Written test (60 min) \\
& Project discussions \\
\midrule
5 & \multirow{2}{*}{Project discussions} \\
6 & \\
\midrule
7 & \multirow{2}{*}{\bfseries Presentations} \\
8 & \\
\bottomrule
\end{tabular}
\end{center}
Case studies will be presented during exam weeks. At most three case studies will be
presented per day, with 30 minutes for each presentation and 20 minutes for questions.
\thispagestyle{empty}
% vim: set spell spelllang=en: