Week 1: Introduction

Readings

Slides

For Next Week

Homework

Post a little bit about yourself in the Introductions forum, following the instructions there.

Lab Task

This week's lab task is mostly to play! It is intended to get you comfortable with out-of-the-box text analysis tools.

Use Voyant to visualize a text or set of texts. It can be anything you want: a book, a set of lyrics, scripts from a show you like, news articles. Try out the various features in Voyant: phrases, keywords in contexts, etc.

Once you've had a chance to play with Voyant, post a short response to the lab task forum (no more than 300 words) about your experience. Some possible things to post about: What was interesting or confusing about the tool? Did you find anything intriguing about your text or texts? Did it find any recurring patterns or phrases? Did you find any visualizations beyond the word cloud to be interesting? Any other thoughts? Don't forget to tell us what text you used with Voyant.

Week 2: Fundamentals

Just a reminder that 'readings' refers to the readings you should have done before the lecture, while lab tasks are due by the next week. Both are related to the current week's theme: the readings prepare you for the lecture, and the lab task lets you practice that learning.

Readings

Slides

For Next Week

This week's lab task is about getting started with powerful tools that will underlie many of the skills you learn in the course. The lab task is posted in a Jupyter notebook format on GitHub.

Week 3: Treating Text as Data - Features

Readings

Supplemental:

Slides

For Next Week

Lab Task

This week's lab task is again a series of questions, following along with a worksheet. Find it here.

Week 4: Text Mining for Art and Criticism

Readings

The following three readings are web articles related to Twitter bots: for activism, for recontextualization, and a roundup of interesting bots. Not all of these are text-related, but they serve as a good overview.

Slides

Assignments

The Twitter Bot assignment is posted on the Assignments page. A draft posting (describing your plans) is due next week, and the final version is due in two weeks.

For Next Week

Week 5.1: Document Access

Readings

Against Cleaning - Katie Rawson, Trevor Muñoz

Week 5.2: Understanding Words - Natural Language Processing 1, Part of Speech Tagging

Readings

  • Natural Language Processing for programmers part 2 - Liza Daly
    • This talks about an old concept, but is written from a beginner perspective and is useful for your assignment.
  • Part of Speech Tagging - Chapter 10 (up to 10.4) of Speech and Language Processing (3rd ed. draft)
  • Chapter 5.7 of the NLTK Book - Bird et al.
    • Just section 7, but sections 1-2 and 4-6 are useful as supplements to the SLP reading if you need more info or simply find it interesting. Section 7 is the conclusion of the chapter, which succinctly describes the ways that we understand a part of speech. (A minimal NLTK tagging sketch follows this list.)
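
To see tagging in action before the readings get formal, here is a minimal NLTK sketch (resource names can vary slightly across NLTK versions, and the example sentence is only illustrative):

```python
import nltk

# One-time downloads of the tokenizer and tagger models.
nltk.download("punkt")
nltk.download("averaged_perceptron_tagger")

sentence = "The students tagged every word in the corpus."
tokens = nltk.word_tokenize(sentence)   # split the sentence into word tokens
tagged = nltk.pos_tag(tokens)           # assign a Penn Treebank tag (DT, NNS, VBD, ...) to each token

print(tagged)
# e.g. [('The', 'DT'), ('students', 'NNS'), ('tagged', 'VBD'), ...]
```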

Slides

05 - Getting Data

For Next Week

Twitter bot: Post to the Twitter Bot Final forum.

No lab task. Complete your bot!

Week 6: Understanding Words - Natural Language Processing 2, Information Extraction and Dependency Parsing

Readings

Optional Reading

  • Google's approach to dependency parsing, SyntaxNet, and their model trained on it - Parsey McParseFace - are the current state of the art. This tutorial, while optional, offers a look at part-of-speech tagging using feed-forward neural networks and gives a clear description of transition-based dependency parsing. (A short parsing sketch follows this list.)
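
If you want to try dependency parsing yourself without setting up SyntaxNet, one lightweight option (not covered in the readings, so treat this only as a sketch) is spaCy with its small English model:

```python
import spacy

# Assumes: pip install spacy && python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")
doc = nlp("The professor assigned a dependency parsing tutorial.")

# Each token points to its syntactic head, labelled with a dependency relation.
for token in doc:
    print(f"{token.text:<12} {token.dep_:<10} head: {token.head.text}")
```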
    

Slides

06 - Natural Language Processing 1 - Part of Speech Tagging

For Next Week

Week 7: Classification 1

Readings

Naive Bayes Classification and Sentiment, Speech and Language Processing (3rd edition). Dan Jurafsky and James H. Martin.

Notation

We're getting to the point of the term where some mathematical notation is necessary for our readings to communicate the underlying theory.

If you are unfamiliar with Bayesian inference, the description on the 3rd page of this chapter might not satisfy your curiosity. The introduction to Bayes' Theorem from Khan Academy can help equip you with some more background about what we use Bayes' Theorem for.

Since we're looking at classes, you'll start seeing set notation, like c ∈ C. This means 'c' is an element of 'C', or, in the context of our reading, that this class (c) is part of the set of all the possible classes (C). Why is that something we'd want to state? Because for Naive Bayes classification, we'll be choosing the class c with the highest probability given the evidence. The equations simply need a way to state "consider P(c|d) for all possible classes and choose the class with the highest value", which they do with the argmax notation: ĉ = argmax over c ∈ C of P(c|d).
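
Written out, that decision rule and the steps behind it look like this (a sketch of the standard Naive Bayes derivation; the chapter's own notation may differ slightly):

```latex
\begin{align*}
\hat{c} &= \operatorname*{arg\,max}_{c \in C} P(c \mid d)
        && \text{pick the most probable class given document } d \\
        &= \operatorname*{arg\,max}_{c \in C} \frac{P(d \mid c)\,P(c)}{P(d)}
        && \text{Bayes' Theorem} \\
        &= \operatorname*{arg\,max}_{c \in C} P(d \mid c)\,P(c)
        && \text{$P(d)$ is the same for every class, so it can be dropped} \\
        &\approx \operatorname*{arg\,max}_{c \in C} P(c)\prod_{i} P(w_i \mid c)
        && \text{the ``naive'' step: treat the words $w_i$ as independent given the class}
\end{align*}
```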

Slides

For Next Week

Week 8.1: Classification 2

Week 8.2 Ethics in Text Mining

Readings

No required readings this week; focus on the lab task!

Optional Reading

As with our class on art and criticism, some of the most accessible work on ethics is from the bot-making community.

Slides

For Next Week

Week 9: Clustering

Readings

  • Textual Analysis - John Burrows, A Companion to Digital Humanities
  • Clustering - scikit-learn documentation: read the Overview and the intros to 2.3.2 (K-Means) and 2.3.6 (Hierarchical clustering). (A minimal K-Means sketch follows this list.)
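
To make the scikit-learn reading concrete, here is a minimal K-Means sketch on TF-IDF document vectors (the toy corpus is only illustrative; swap in your own documents):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# A tiny stand-in corpus: two "pet" documents and two "finance" documents.
docs = [
    "the cat sat on the mat",
    "dogs and cats are pets",
    "stock prices fell sharply today",
    "the market rallied after the announcement",
]

# Turn the documents into TF-IDF feature vectors.
X = TfidfVectorizer(stop_words="english").fit_transform(docs)

# Cluster the documents into two groups with K-Means.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)   # cluster assignment for each document
```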

Supplemental Readings

Slides

Week 9 - Clustering

For the next two weeks

Lab 08 Worksheet

Spring Break Week

Spring Break. No class.

Week 10: Topic Modeling and Dimensionality Reduction 1

Readings

Topic modeling made just simple enough. 2012. Ted Underwood.

Probabilistic Topic Models. 2012. David Blei.

Supplemental

Introduction to Latent Dirichlet Allocation. 2011. Edwin Chen.

Slides

Topic Modeling Slides

For Next Week

Lab task 09 - Dimensionality Reduction and Sentiment Analysis

Recommended: Get started on your topic modeling assignment. Make sure you can get MALLET running on your system.
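
While you're getting MALLET installed, you can get a feel for what a topic model produces with a few lines of Python. This sketch uses gensim rather than MALLET, so treat it as a warm-up, not a substitute for the assignment workflow:

```python
from gensim import corpora, models

# Toy tokenized documents; in practice you'd tokenize and stop-word your own corpus.
texts = [
    ["topic", "models", "find", "latent", "themes"],
    ["mallet", "implements", "lda", "topic", "models"],
    ["restaurant", "reviews", "mention", "food", "service"],
    ["reviews", "of", "food", "use", "sentiment", "words"],
]

dictionary = corpora.Dictionary(texts)                  # map words to integer ids
corpus = [dictionary.doc2bow(text) for text in texts]   # bag-of-words representation

# Train a small LDA model; num_topics is far too low for real data.
lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary, passes=10)
for topic in lda.print_topics():
    print(topic)
```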

For Two Weeks from Now

Topic Modeling Assignment Due. See description on the Assignments page.

Post the Problem Statement for your Text Mining Project. See description on the Assignments page.

Week 11.1 Topic Modelling 2

Week 11.2 Sentiment Analysis

Readings

Narrative framing of consumer sentiment in online restaurant reviews. Dan Jurafsky, Victor Chahuneau, Bryan R. Routledge, Noah A. Smith.

Optional but Recommended

Indexing by Latent Semantic Analysis. Deerwester, Dumais, Furnas, Landauer, Harshman.

This is one of our core papers in Library and Information Science - 13k citations can't be wrong. You'll notice that these famous papers are particularly easy to read - ChengXiang Zhai's smoothing paper is another example - a good reminder that being clever is only useful if you can communicate it.
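
If you want to see the LSA idea in a few lines of code, here is a minimal sketch using scikit-learn's TruncatedSVD on a TF-IDF matrix (the paper itself works from a plain term-document matrix, and the toy documents below are only illustrative):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

docs = [
    "human machine interface for computer applications",
    "a survey of user opinion of computer system response time",
    "graph minors and the widths of trees",
    "the intersection graph of paths in trees",
]

# Term-document weighting, then a low-rank SVD: the LSA recipe in miniature.
X = TfidfVectorizer().fit_transform(docs)
lsa = TruncatedSVD(n_components=2, random_state=0)
doc_vectors = lsa.fit_transform(X)   # each document as a point in a 2-d "concept" space
print(doc_vectors)
```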

Slides

Topic Modelling II and Sentiment Analysis

For Next Week

Topic Modeling Assignment Due. See description on the Assignments page.

Post the Problem Statement for your Text Mining Project. See description on the Assignments page.

Week 12: Visualization

Readings

It's a busy time; no readings this week!

Slides

Week 13 - Visualization

For Next Week

  • Literature Review and Data Collection for your final project.

Week 13: Word Embeddings

Readings

Supplemental (Optional)

Bonus

Something to play with: the "Bonus App" at the bottom of Radim Řehůřek's Word2Vec tutorial.
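
If you'd rather experiment locally than in the Bonus App, here is a minimal sketch using gensim (the library behind Řehůřek's tutorial); the toy corpus is far too small to learn meaningful vectors and is only there to show the API:

```python
from gensim.models import Word2Vec

# Tiny toy corpus: a real model needs millions of tokens to give sensible neighbours.
sentences = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["dogs", "and", "cats", "are", "animals"],
    ["cats", "chase", "mice"],
]

# Train word vectors on the toy corpus.
model = Word2Vec(sentences, min_count=1, workers=1, seed=42)

# Words that end up near "king" in the learned vector space.
print(model.wv.most_similar("king", topn=3))
```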

Week 14: What's Next: Remainder Notes from Text Mining

Slides

Week 15 - What's Next

Reminders

May 3rd is the last day to turn in late lab tasks! Get them in!