Pratt Institute, Center for Continuing and Professional Studies, Spatial Analysis and Visualization Initiative (SAVI)
Instructor: Neil Freeman
Location: ISC Building, Lower Level, Room 003
Continuing Education Units (C.E.U.s): 3.0
- Course Overview
- Course Requirements
- Course Readings
- Class Format
- Submitting Assignments
- Course Outline
- Resources
This course introduces the tools, techniques, and general approaches used to acquire, clean, analyze, and visualize open data, with particular emphasis on using web-based technologies and open-source tools at each step of the process.
- You will learn to formulate and articulate a meaningful research question with public open data, as well as meaningfully critique the work of others
- You will learn how to acquire data through open data portals, application programming interfaces (APIs), and web scraping
- You will learn how to clean data using open source tools in preparation for analysis
- You will learn how to conduct exploratory data analysis using descriptive statistics
- You will learn to visualize your analytical findings in meaningful and visually-engaging graphics, as well as meaningfully critique the work of others
- You will learn the basics of cartographic design as it relates to visualizing open data
All students will need to bring their own laptop for exercises during class. Time will be set aside to help install, configure, and run the programs necessary for all assignments, projects, and exercises. Where possible, all programs will be free and open-source. All assigned work using services hosted online can be run using free accounts. Please update your system to the latest version of your preferred operating system prior to the first day of class to ensure you're able to successfully install and use the tools in class.
You will be required to have free accounts with the following services:
In addition, please install a free text editor of your choice:
- Sublime Text (All systems)
- TextWrangler (Mac)
- Notepad++ (Windows)
The required readings for this course consist of book chapters, newspaper articles, and short blog posts. The intention is to help give you a foundation in the critical skills ahead of class lectures. All required readings are available online or will be made available through the class portal. Recommended readings are suggestions if you wish to study further the topics covered in class. The books listed in the Suggested Readings section below offer even more depth and an extended discussion of the material we cover in class.
Class runs from 9:30am to 5:30pm. Each day will consist of 80- to 90-minute blocks broken up by 10-minute breaks and a half-hour break for lunch. Class will be a mix of lecture and exercise work, emphasizing the application of skills covered in the lecture portion of the class. You will have ample time in class to work on practical exercises based on the information presented in lectures.
All assignments will be submitted by adding your content to this repository. See [assignments/](assignments/README.md) for details. Assignments aren't considered submitted until the pull request has been issued. We will have ample time in class to address any technical issues and a reference guide for the process.
Area | Total Points |
---|---|
Class Participation | 25 |
Visualization Critiques | 25 |
Visualizations | 25 |
Final Project | 25 |
Total | 100 |
Regular, prompt attendance is required.
Your engagement makes class sessions richer and more fulfilling for everyone. Questions are encouraged, and active participation in class discussion and in-class exercises is very important.
Topics will be covered that day in class. Reading assignments are to be read before class in preparation for the lecture and exercises. Assignments are due before the start of the next class and build on the information presented in class.
Find an interesting or visually compelling map (interactive or static) or visualization online and write 2-3 paragraphs on the visualization, discussing the data source(s), the visual style, and how well the data was represented. Feel free to use the visualization resources listed below. Submit your analysis (include a link to the visualization) to this repository before each class.
- Introduction
- Data on the web
- Introduction to mapping and cartography
- Introduction to HTML and CSS
- Introduction to Git and GitHub
Please take the student skills survey.
- Complete the CARTO “Online Mapping for Beginners” course.
- Identify a research question that you would like to explore in this class, with the intention of creating maps and visualizations that will help answer the question or clarify the topic. Write 2-3 paragraphs on what question you would like to answer, what data you'd like to use to explore it, and what you hope to contribute with your work.
- Thomas Levine, Introduction to web scraping
- Introduction to APIs ch 1-5
- Ben Wellington "Mapping the Sharing Economy"
- Heer, Jeffrey, Michael Bostock, and Vadim Ogievetsky. "A tour through the visualization zoo." Commun. ACM 53.6 (2010): 59-67.
- CARTO “Introduction to Map Design”
- Web scraping
- Introduction to APIs
- Introduction to the command line and parsing data with csvkit
- Opening closed data with Tabula
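As a rough illustration of the API-to-CSV step in this session, the sketch below parses a JSON payload of the kind an open data API might return and writes it out as CSV, which tools such as csvkit can then read. The payload, its field names and values, and the helper `records_to_csv` are all made up for this example; a real API will have its own structure.

```python
import csv
import io
import json

# Hypothetical payload standing in for the body of an API response.
# The field names and values are invented for illustration only.
SAMPLE_RESPONSE = """
[
  {"borough": "Brooklyn", "complaints": 120},
  {"borough": "Queens", "complaints": 95},
  {"borough": "Bronx"}
]
"""


def records_to_csv(json_text):
    """Convert a JSON array of flat objects into CSV text."""
    records = json.loads(json_text)

    # Collect every key that appears, in first-seen order, so records
    # with missing fields still line up under the right column.
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)

    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)  # missing fields are written as empty cells
    return buffer.getvalue()


csv_text = records_to_csv(SAMPLE_RESPONSE)
print(csv_text)
```

Note that the third record has no `complaints` value, so its cell is left empty rather than dropped. Gaps like this are common in open data and worth checking for before analysis.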
Create a second map, using new data scraped from the web or pulled via an API. Write 2-3 paragraphs discussing any challenges you encountered working with the data and/or creating your map in CARTO.
- Introduction to the Census Factfinder
- Introduction to SQL
- Introduction to Spatial SQL
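To give a feel for what joining data from two sources looks like in SQL, here is a minimal sketch using Python's built-in `sqlite3` module. It is an assumption-laden stand-in: CARTO runs on PostgreSQL with PostGIS, not SQLite, and the `tracts` and `libraries` tables, their columns, and all values are invented for this example. The join itself is ordinary, non-spatial SQL.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical tables with made-up values, for illustration only.
conn.executescript("""
CREATE TABLE tracts (tract_id TEXT PRIMARY KEY, population INTEGER);
CREATE TABLE libraries (name TEXT, tract_id TEXT);
INSERT INTO tracts VALUES ('A', 4000), ('B', 2500), ('C', 3100);
INSERT INTO libraries VALUES ('Central', 'A'), ('Annex', 'A'), ('Eastside', 'C');
""")

# LEFT JOIN keeps every tract, including those with no matching library.
# COUNT(l.name) counts only non-NULL names, so such tracts get 0.
query = """
SELECT t.tract_id, t.population, COUNT(l.name) AS library_count
FROM tracts AS t
LEFT JOIN libraries AS l ON l.tract_id = t.tract_id
GROUP BY t.tract_id, t.population
ORDER BY t.tract_id;
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)

conn.close()
```

A `LEFT JOIN` is used rather than an inner join so that tracts with no libraries still appear in the result, which matters when the joined table will be mapped.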
- Complete the SQL and PostGIS in CARTO course. Update your maps (or create a new one) using data joined from two sources
- Work through "The Basics" at Learn Python (you can skip "String Formatting". If you're feeling good, jump ahead to "List Comprehensions")
- Prepare a simple draft map for your project, using one or two sources. Embed it into an HTML file in `assignments/assignment3`. Include a short description of the sources and any processing you did (or would like to do!).
- Python for scraping the web
- Advanced topics TBD
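As one possible starting point for scraping with Python, the sketch below pulls the rows out of an HTML table using only the standard library's `html.parser`. The HTML snippet, its values, and the `TableParser` class are invented for this example. Real pages are messier, and a dedicated scraping library may be a better fit in practice.

```python
from html.parser import HTMLParser

# Hypothetical page fragment with made-up values, for illustration only.
SAMPLE_HTML = """
<table>
  <tr><th>Station</th><th>Riders</th></tr>
  <tr><td>Atlantic Av</td><td>1200</td></tr>
  <tr><td>Jay St</td><td>950</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of each table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []         # finished rows, each a list of cell strings
        self._row = None       # the row currently being built, if any
        self._in_cell = False  # True while inside a <td> or <th>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_data(self, data):
        # Text outside a cell (such as whitespace between tags) is ignored.
        if self._in_cell:
            self._row[-1] += data

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append([cell.strip() for cell in self._row])
            self._row = None


parser = TableParser()
parser.feed(SAMPLE_HTML)
parser.close()

table_rows = parser.rows
for row in table_rows:
    print(row)
```

Every cell comes back as a string, so numeric columns such as `Riders` still need converting before any analysis.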
- Stack Overflow: a question-and-answer community for programming and technology
- GIS Stack Exchange: the same kind of community, for mapping and GIS
- The Quartz guide to bad data
- JSON to CSV converter
- Table to TSV bookmarklet (drag to toolbar or "save as bookmark")
- What is the Command Line (series of pages with links to history articles)
- Lifehacker guide to the command line
- Basic Unix commands
- Photos of historic command line interfaces
- Codecademy Python Course
- MIT Introduction to Computer Science and Programming with Python (free course)
- Learn Python the Hard Way
- U.S. Government open data
- Census TIGER map data
- New York City Open Data Portal
- New York State Open Data Portal
- UK open data
- Awesome Public Datasets
- Kirk Borne's list of open data sources
- NYPL Space/Time Directory
- https://data.ny.gov/
- https://data.sfgov.org/
- https://data.cityofchicago.org/
- https://data.cityofboston.gov/
- https://data.seattle.gov/
- https://data.kcmo.org/
- http://data.lexingtonky.gov/
- Carto Academy
- Elements of Cartographic Style by Paul Cote
- Fry, Ben. Visualizing Data: Exploring and Explaining Data with the Processing Environment. O'Reilly Media, Inc., 2007.
- Garrard, Chris. Geoprocessing with Python. Manning Publications Co., forthcoming.
- Janert, Philipp K. Data analysis with open source tools. O'Reilly Media, Inc., 2010.
- McCallum, Q. Ethan. Bad Data Handbook: Cleaning Up The Data So You Can Get Back To Work. O'Reilly Media, Inc., 2012.
- Munzner, Tamara. Visualization Analysis and Design. AK Peters, 2014.
- Murray, Scott. Interactive data visualization for the Web. O'Reilly Media, Inc., 2013.
- Tufte, Edward R. The visual display of quantitative information. Cheshire, CT: Graphics Press, 1983.
This course builds from material prepared by Richard Dunks under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.